\name{uniquecombs}
\alias{uniquecombs}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Find the unique rows in a matrix or data frame}
\description{
This routine returns a matrix or data frame containing all the unique rows of the
matrix or data frame supplied as its argument. That is, all the duplicate rows are
stripped out. Note that the ordering of the rows on exit need not be the same
as on entry. It also returns an index attribute for relating the result back 
to the original matrix.
}
\usage{
uniquecombs(x,ordered=FALSE)
}
%- maybe also `usage' for other objects documented here.
\arguments{
 \item{x}{ an \R matrix (numeric) or data frame. }
 \item{ordered}{ set to \code{TRUE} to have the rows of the returned object in the same order regardless of input ordering.}
}
\details{ Models with more parameters than unique combinations of
  covariates are not identifiable. This routine provides a means of
  evaluating the number of unique combinations of covariates in a
  model. 

  When \code{x} has only one column the routine
  uses \code{\link{unique}} and \code{\link{match}} to get the index. When there are
  multiple columns it uses \code{\link{paste0}} to produce a label for each row,
  which should be unique if the row is unique. \code{unique} and \code{match}
  can then be applied to the labels as in the single column case. The pasting is
  inefficient, but still quicker for large n than the C based code that this routine
  used to call, which had O(n log(n)) cost. In principle a hash table based solution in C
  would be only O(n) and much quicker in the multicolumn case.

 \code{\link{unique}} and \code{\link{duplicated}} can be used
 in place of this routine if the full index is not needed. Relative performance is variable.

 If \code{x} is not a matrix or data frame on entry then an attempt is made to coerce
 it to a data frame.
}
\value{
A matrix or data frame consisting of the unique rows of \code{x} (in arbitrary order
unless \code{ordered=TRUE}).

The matrix or data frame has an \code{"index"} attribute. \code{index[i]} gives the row of the returned
matrix that contains row i of the original matrix.
 
}

\seealso{\code{\link{unique}}, \code{\link{duplicated}}, \code{\link{match}}.}

\author{ Simon N. Wood \email{simon.wood@r-project.org} with thanks to Jonathan Rougier}


\examples{
require(mgcv)

## matrix example...
X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)
print(X)
Xu <- uniquecombs(X);Xu
ind <- attr(Xu,"index")
## find the value for row 3 of the original from Xu
Xu[ind[3],];X[3,]

## same with fixed output ordering
Xu <- uniquecombs(X,TRUE);Xu
ind <- attr(Xu,"index")
## find the value for row 3 of the original from Xu
Xu[ind[3],];X[3,]
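
## The index should also recover every row of the original at once, and
## nrow(Xu) then counts the unique combinations (a small check, assuming
## the "index" attribute behaves as described under Value)
all(Xu[ind,]==X)
nrow(Xu)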


## data frame example...
df <- data.frame(f=factor(c("er",3,"b","er",3,3,1,2,"b")),
      x=c(.5,1,1.4,.5,1,.6,4,3,1.7))
uniquecombs(df)
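
## A rough base R sketch of the label based approach described in Details.
## This is an illustration only, not the code used by uniquecombs: the
## names M, lab, ulab, idx and Mu, and the ":" separator, are assumptions
## made for this example.
M <- matrix(c(1,2,3,1,2,3,4,5,6,1,1,1),4,3,byrow=TRUE)
lab <- do.call(paste,c(as.data.frame(M),sep=":")) ## one label per row
ulab <- unique(lab)                    ## labels of the unique rows
idx <- match(lab,ulab)                 ## unique row matching each original row
Mu <- M[match(ulab,lab),,drop=FALSE]   ## first occurrence of each unique row
all(Mu[idx,]==M)                       ## idx relates Mu back to M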
}
\keyword{models} \keyword{regression}%-- one or more ..