% Dirk Eddelbuettel committed Apr 10, 2018
\name{uniquecombs}
\alias{uniquecombs}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{find the unique rows in a matrix }
\description{
This routine returns a matrix or data frame containing all the unique rows of the
matrix or data frame supplied as its argument. That is, all the duplicate rows are
stripped out. Note that the ordering of the rows on exit need not be the same
as on entry. It also returns an index attribute for relating the result back
to the original matrix.
}
\usage{
uniquecombs(x,ordered=FALSE)
}
%- maybe also `usage' for other objects documented here.
\arguments{
 \item{x}{ is an \R matrix (numeric), or data frame. }
 \item{ordered}{ set to \code{TRUE} to have the rows of the returned object in the same order regardless of input ordering.}
}
\details{ Models with more parameters than unique combinations of
 covariates are not identifiable. This routine provides a means of
 evaluating the number of unique combinations of covariates in a
 model.

 When \code{x} has only one column then the routine
 uses \code{\link{unique}} and \code{\link{match}} to get the index. When there are
 multiple columns then it uses \code{\link{paste0}} to produce labels for each row,
 which should be unique if the row is unique. Then \code{unique} and \code{match}
 can be used as in the single column case.
 The pasting is inefficient, but for large n it is still quicker than the C based code
 that this routine used to call, which had O(n log(n)) cost. In principle a hash table
 based solution in C would be only O(n) and much quicker in the multicolumn case.

 \code{\link{unique}} and \code{\link{duplicated}} can be used
 in place of this, if the full index is not needed. Relative performance is variable.

 If \code{x} is not a matrix or data frame on entry then an attempt is made to coerce
 it to a data frame.
}
\value{
A matrix or data frame consisting of the unique rows of \code{x} (in arbitrary order
unless \code{ordered=TRUE}).

The matrix or data frame has an \code{"index"} attribute. \code{index[i]} gives the row of the returned
matrix that contains row i of the original matrix.

}

\seealso{\code{\link{unique}}, \code{\link{duplicated}}, \code{\link{match}}.}

\author{ Simon N. Wood \email{simon.wood@r-project.org} with thanks to Jonathan Rougier}

\examples{
require(mgcv)

## matrix example...
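## First, a hand-rolled sketch of the label/unique/match idea described in
## the details section. This is NOT the mgcv source: Z, lab, ulab, idx and Zu
## are illustrative names, and the "*" separator is an assumption made here
## so that rows such as (1,23) and (12,3) cannot share a label.
Z <- matrix(c(1,2, 3,4, 1,2, 5,6, 3,4),5,2,byrow=TRUE)
lab <- paste(Z[,1],Z[,2],sep="*")     ## one text label per row
ulab <- unique(lab)                   ## labels of the distinct rows
idx <- match(lab,ulab)                ## idx[i]: which distinct row equals row i of Z
Zu <- Z[!duplicated(lab),,drop=FALSE] ## distinct rows, in order of first appearance
stopifnot(all(Zu[idx,]==Z))           ## the index recovers the original rows

## uniquecombs returns the unique rows and the index in one call...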
X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)
print(X)
Xu <- uniquecombs(X);Xu
ind <- attr(Xu,"index")
## find the value for row 3 of the original from Xu
Xu[ind[3],];X[3,]

## same with fixed output ordering
Xu <- uniquecombs(X,TRUE);Xu
ind <- attr(Xu,"index")
## find the value for row 3 of the original from Xu
Xu[ind[3],];X[3,]

## data frame example...
df <- data.frame(f=factor(c("er",3,"b","er",3,3,1,2,"b")),
      x=c(.5,1,1.4,.5,1,.6,4,3,1.7))
uniquecombs(df)
}
\keyword{models} \keyword{regression}%-- one or more ..