Commit e9a5cc94 authored by Dirk Eddelbuettel's avatar Dirk Eddelbuettel

Import Upstream version 1.7-28

parent 5b7bea96
** denotes quite substantial/important changes
*** denotes really big changes
1.7-28
* exclude.too.far updated to use kd-tree instead of inefficient search for
neighbours. This can make plot.gam *much* faster for large datasets.
* Change in smoothCon, so that sweep and drop constraints (default for bam
for efficiency reasons) are no longer allowed with by variables and matrix
arguments (could lead to confusing results with factor by variables in bam).
* 'ti' terms now allow control of which marginals to constrain, via 'mc'.
Allows e.g. y ~ ti(x) + ti(x,z,mc=c(0,1)) - for experts only!
* tensor.prod.model.matrix re-written to call C code. Around 5-10 times
faster than old version for large data sets.
* re-write of mini.mf function used by bam to generate a reduced size
model frame for model setup. New version ensures that all factor levels
are present in reduced frame, and avoids production of unrealistic
combinations of variables in multi-dimensional smooths which could occur
with old version.
* bam models could fail if a penalty matrix was 1 by 1, or if multiple
penalties on a smooth were in fact seperable into single penalties.
Fixed. Thanks to Martijn weiling for reporting.
* Constant in tps basis computation was different to published version
for odd dimensions - makes no difference to fit, but annoying if you
are trying to test a re-implementation. Thanks to Weijie Cai at SAS.
* prediction for "cc" and "cp" classes is now cyclic - values outside the
range of knots are wrapped back into the interval.
* ldTweedie now returns derivatives w.r.t. a transform of p as well as
w.r.t log of scale parameter phi.
* gamm can now handle 'varComb' variance functions (thanks Sven Neulinger
for reporting that it didn't).
* fix of a bug which could cause bam to seg fault for a model with no smooths
(insufficient storage allocated in C in this case). Thanks Martijn Weiling.
1.7-27
* Further multi-threading in gam fits - final two leading order matrix
......
Package: mgcv
Version: 1.7-27
Version: 1.7-28
Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org>
Title: Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness
......@@ -14,7 +14,7 @@ Suggests: splines, parallel
LazyLoad: yes
ByteCompile: yes
License: GPL (>= 2)
Packaged: 2013-09-30 20:38:14 UTC; sw283
Packaged: 2014-01-29 11:37:25 UTC; sw283
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2013-10-01 09:49:00
Date/Publication: 2014-01-29 19:02:48
1174ecab552f0e905f3d1bf460abc073 *ChangeLog
7e3c4dfbb9e547587aba9c97f5bb5c54 *DESCRIPTION
984183b5dfa8613c710cce579aa067da *ChangeLog
df13385f38c692c79bf3f091be66d151 *DESCRIPTION
eb723b61539feef013de476e68b5c50a *GPL-2
c481dc91d92fe1366ba65dac94c4dd97 *NAMESPACE
5b5d847f492e7c5a687bafe83f241a8c *R/bam.r
38df74295474deb8d94f007d15931bfc *R/fast-REML.r
3aa1f87a3ee196f94610febde5cb2f4f *R/gam.fit3.r
eecaba353d94fd7d9cbddcb55cd5a282 *R/bam.r
1608ddeec23c5fbfe7e2ead4067d4928 *R/fast-REML.r
c052a873fafe7a46f596596449d6a00f *R/gam.fit3.r
d71d1748010ca83b1f1b11f49a32b70b *R/gam.fit4.r
902657a0ee2dedc3fdfa501bf3b37c5b *R/gam.sim.r
2ef4d06f28ea2157372507107970ed82 *R/gamm.r
7431eef037b2dddfb853cd88b53c0bc4 *R/mgcv.r
cc6cc65db0fb325267edc9e861a6dd92 *R/misc.r
f8b84b98c0524c2664885294650af3b8 *R/plots.r
b0b6e71c15742608f2dec02c1d8ae71b *R/smooth.r
f63cb67513ca8587099c43ee02da5689 *R/gamm.r
5fbd35702a44eb809fc58cfab7466fa1 *R/mgcv.r
68c99b8080c1c1da9b65c14cff7c33f9 *R/misc.r
c515404b216c7fecc040dd3f77e1b0fc *R/plots.r
12df1a0bef9538c2064d550e5f2eb837 *R/smooth.r
90f72ef91dfc2a25c262fc002f781bf7 *R/soap.r
0c5d2acdc985d2af9de6bb736d4df2ab *R/sparse.r
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda
9388a0979dab4fe1988b777ae4dcfb0a *inst/CITATION
074adce4131eced4cc71c67ad1d63c52 *inst/CITATION
5bf12ddc0dab9daae72271b96a15c539 *man/Predict.matrix.Rd
c45c0f78f753461b33a295883461e732 *man/Predict.matrix.cr.smooth.Rd
e9b0a2e31b130cf2cb38618ec50d1919 *man/Predict.matrix.soap.film.Rd
f12468625253dd3603907de233762fd6 *man/Rrank.Rd
3e5e30b44947d3ddf00fcd55be219038 *man/Tweedie.Rd
27b7ae82bab920543222c4428d33cee1 *man/anova.gam.Rd
80f8763baa4987579e2aa56073a9e94e *man/anova.gam.Rd
e7301fab74ef54c73a122b397fc101d0 *man/bam.Rd
71bde8b8caa24a36529ce7e0ac3165d8 *man/bam.update.Rd
bb5ec26743d46d7ce1dbf128bceab35a *man/cSplineDes.Rd
a2beb811b1093c5e82ef32d7de1f7d32 *man/cSplineDes.Rd
9b4d616d1b6c4a46ca77d16cded3f806 *man/choose.k.Rd
c03748964ef606621418e428ae49b103 *man/columb.Rd
4196ba59f1fa8449c9cd0cab8a347978 *man/concurvity.Rd
......@@ -38,10 +39,10 @@ d87ff6287d7e343ea24d2053f4724b35 *man/formula.gam.Rd
6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd
d7392fabfe137e327d89cc9686946740 *man/gam.Rd
fe61dd0efab9e920c17335faf3d5764c *man/gam.check.Rd
9ba1c43c08de6de712138a0741a94e68 *man/gam.control.Rd
e15fa51d0b8c63101b858a89e118a40d *man/gam.control.Rd
44db24b66ce63bc16d2c8bc3f5b42ac5 *man/gam.convergence.Rd
58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/gam.fit.Rd
21339a5d1eb8c83679dd9022ab682b5e *man/gam.fit3.Rd
dcf10ab3cc3102f7fb36d3ddf44013f5 *man/gam.fit3.Rd
fef20e1357eeb7d936d3c737cbebccce *man/gam.models.Rd
e969287d1a5c281faa7eb6cfce31a7c5 *man/gam.outer.Rd
96676186808802344a99f9d3170bf775 *man/gam.selection.Rd
......@@ -57,36 +58,36 @@ a2ea1233d43fac89e0cacbc09a8d31e2 *man/in.out.Rd
2f222eeeb3d7bc42f93869bf8c2af58a *man/influence.gam.Rd
bafea2eef12fdc819f8ac1fb41d8b914 *man/initial.sp.Rd
aba56a0341ba9526a302e39d33aa9042 *man/interpret.gam.Rd
74720435528f6731265143c58ffb89ce *man/ldTweedie.Rd
07d2c259b9edf164f42935170b4fccd0 *man/ldTweedie.Rd
58e73ac26b93dc9d28bb27c8699e12cf *man/linear.functional.terms.Rd
5de18c3ad064a5bda4f9027d9455170a *man/logLik.gam.Rd
611f5f6acac9c5f40869c01cf7f75dd3 *man/ls.size.Rd
19a3f7da03e76d7d2fc0a4e544564c72 *man/magic.Rd
5169af4be5fccf9fa79b6de08e9ea035 *man/magic.post.proc.Rd
9427464e63d70da15fc468961ef0c29b *man/mgcv-FAQ.Rd
2db8cd353119cff94b57ff8ae06a6f5d *man/mgcv-package.Rd
0acf87e2fdcdf8b50aa44c204b201770 *man/mgcv-package.Rd
8365171cd0280a0aa5ad90c30349c949 *man/mgcv-parallel.Rd
00ccf213c31910cd14f1df65a300eb33 *man/model.matrix.gam.Rd
bc9b89db7e7ff246749551c16f5f1f07 *man/mono.con.Rd
d33914a328f645af13f5a42914ca0f35 *man/mroot.Rd
7931452d2f9534c49cfd6e8aa4c3d62f *man/negbin.Rd
167e96db9e32f28b45814ddfb3d4b519 *man/negbin.Rd
8b766a6ad848b0f1ca469e381ded0169 *man/new.name.Rd
7d8f62e182f7c428cc9d46ddd4d97d43 *man/notExp.Rd
445571cf0b115bf19a2d3474e70f59e7 *man/notExp2.Rd
9532f2ce26e3e6040c34db75976a0462 *man/null.space.dimension.Rd
2f0baa6f36924e60391c5e35b2261db9 *man/pcls.Rd
c64b44bc684cbadfe77e7f6dc833ddb4 *man/pcls.Rd
717d796acbaab64216564daf898b6d04 *man/pdIdnot.Rd
8c0f8575b427f30316b639a326193aeb *man/pdTens.Rd
1721f1b266d9e14827e8226e2cb74a81 *man/pen.edf.Rd
edf57071572275a8443b2f0b66d44424 *man/place.knots.Rd
84d54e8081b82cb8d96a33de03741843 *man/plot.gam.Rd
39eeb155a4f7e7cb39c68f928ed09e91 *man/plot.gam.Rd
c27a6b886929b1dc83bf4b90cae848f9 *man/polys.plot.Rd
afca36f5b1a5d06a7fcab2eaaa029e7e *man/predict.bam.Rd
df63d7045f83a1dc4874fcac18a2303c *man/predict.gam.Rd
a594eb641cae6ba0b83d094acf4a4f81 *man/print.gam.Rd
b8ac3ed8fc05bb14c605b996404853cd *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
58b5c7bee1277e2362b029317d2f7e24 *man/random.effects.Rd
5ff3bd5034e8f2afa3174c0a2d989d97 *man/random.effects.Rd
37669f97e17507f3ae2d6d1d74feb9d7 *man/residuals.gam.Rd
f6f1333be2587ffef5970905b13468ea *man/rig.Rd
7258dfc0149fff020054117fd2ee6bd8 *man/s.Rd
......@@ -109,11 +110,11 @@ ce311e544603f1089562747652abce20 *man/smooth.construct.sos.smooth.spec.Rd
b55a396da77559dac553613146633f97 *man/sp.vcov.Rd
83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd
700699103b50f40d17d3824e35522c85 *man/step.gam.Rd
c168e9934229117f51576c1b6a7fefbe *man/summary.gam.Rd
60057ac5868dd7584807460bca69bde2 *man/summary.gam.Rd
84a714ed8ef26e46e51947abe959762c *man/t2.Rd
2ca0b136690ac2239e7872f5d7e07368 *man/te.Rd
63e2050f9a01b12096d640e87916b78a *man/te.Rd
6eebb6ef90374ee09453d6da6449ed79 *man/tensor.prod.model.matrix.Rd
06ca54aeca7231f01ce06187a1f6cae8 *man/uniquecombs.Rd
71add05e5823dcc0509f435ab4dc4627 *man/uniquecombs.Rd
5b05216eaeea867397ab130bb1332164 *man/vcov.gam.Rd
0e045fc94ea00c17e703ae07923f549b *man/vis.gam.Rd
becbe3e1f1588f7292a74a97ef07a9ae *po/R-de.po
......@@ -126,20 +127,20 @@ d5a0f198090ecbfedaa6549f2918b997 *po/R-po.po
1a4a267ddcb87bb83f09c291d3e97523 *po/fr.po
813514ea4e046ecb4563eb3ae8aa202a *po/mgcv.pot
749ac663240fd6eaa2b725602d47ef2a *po/po.po
5d6b6d2ce66b7d3a78ebd10dc0c8da91 *src/Makevars
f278ee892368a664c1c4bfefad77eff4 *src/gdi.c
fa59bab566913ff89c255e38326ea830 *src/Makevars
8c13279b905295d111f824dfaa457091 *src/gdi.c
49af97195accb65adc75620183d39a4c *src/general.h
f08642ce338f08fb00f829fc01868216 *src/init.c
336b2ab5e56a10dea1124ee9a8c7714f *src/init.c
6dbd109048a3bd18b67db6230c064c21 *src/magic.c
7a3b3d7d2504574094c59ea70912f125 *src/mat.c
6c9101f596440a09ecc3966a2374efd2 *src/matrix.c
068bafbc5695caf742fa84829f571db9 *src/mat.c
43f97e7abd78f21e8b85ed28ebbea94b *src/matrix.c
0f8448f67d16668f9027084a2d9a1b52 *src/matrix.h
96e43aa805215a0463fd13fd4b3d41f8 *src/mgcv.c
568d1195622cd01aef2f614140b94be0 *src/mgcv.h
c6b8c3093e2970654237c81ccfab1473 *src/misc.c
7a12ef0cbd8f5312cd67669a36d75ebf *src/mgcv.c
7aef13a547ca99e11a1fb038cc1901ef *src/mgcv.h
f49e1eb844668a24bba6ab1be1240183 *src/misc.c
97bc53816f97fc68a37b2e0716a6dc0a *src/qp.c
cd563899be5b09897d1bf36a7889caa0 *src/qp.h
15ed450da804a9dbd0833459af1f2474 *src/soap.c
296af5dc0f71ab0199d462c81937ae37 *src/sparse-smooth.c
42f015e636b5fbd2a3a3bb46cbe24eaa *src/tprs.c
b3ff61207cd26308abf386d6a0666320 *src/sparse-smooth.c
816d29cf2a8f0f1b473a68ccff24185a *src/tprs.c
5352d5d2298acd9b03ee1895933d4fb4 *src/tprs.h
......@@ -31,16 +31,29 @@ chol2qr <- function(XX,Xy) {
## takes X'X and X'y and returns R and f
## equivalent to qr update. Uses simple
## regularization if XX not +ve def.
#d <- diag(XX)
#ind <- d>0
#d[!ind] <- 1;d[ind] <- 1/sqrt(d[ind])
#XX <- t(d*t(d*XX)) ## diagonal pre-conditioning
XXeps <- norm(XX)*.Machine$double.eps^.9
n <- ncol(XX)
for (i in 0:20) {
ok <- TRUE
R <- try(chol(XX+diag(n)*XXeps*i),silent=TRUE) ## R'R = X'X
R <- try(chol(XX+diag(n)*XXeps*i,pivot=TRUE),silent=TRUE) ## R'R = X'X
if (inherits(R,"try-error")) ok <- FALSE else {
f <- try(forwardsolve(t(R),Xy),silent=TRUE)
ipiv <- piv <- attr(R,"pivot")
#f <- try(forwardsolve(t(R),(Xy*d)[piv]),silent=TRUE)
f <- try(forwardsolve(t(R),Xy[piv]),silent=TRUE)
if (inherits(f,"try-error")) ok <- FALSE
}
if (ok) break; ## success
if (ok) {
ipiv[piv] <- 1:ncol(R)
#R <- t(t(R[,ipiv])/d)
R <- R[,ipiv]
break; ## success
}
}
if (i==20 && !ok) stop("Choleski based method failed, switch to QR")
list(R=R,f=f)
......@@ -110,8 +123,13 @@ qr.up <- function(arg) {
mini.mf <-function(mf,chunk.size) {
## takes a model frame and produces a representative subset of it, suitable for
## basis setup.
## first count the minimum number of rows required for representiveness
mn <- 0
for (j in 1:length(mf)) mn <- mn + if (is.factor(mf[[j]])) length(levels(mf[[j]])) else 2
if (chunk.size < mn) chunk.size <- mn
n <- nrow(mf)
if (n<=chunk.size) return(mf)
if (n <= chunk.size) return(mf)
seed <- try(get(".Random.seed",envir=.GlobalEnv),silent=TRUE) ## store RNG seed
if (inherits(seed,"try-error")) {
runif(1)
......@@ -119,25 +137,60 @@ mini.mf <-function(mf,chunk.size) {
}
kind <- RNGkind(NULL)
RNGkind("default", "default")
set.seed(66)
set.seed(66)
## randomly sample from original frame...
ind <- sample(1:n,chunk.size)
RNGkind(kind[1], kind[2])
assign(".Random.seed", seed, envir = .GlobalEnv)
mf0 <- mf[ind,] ## random sample of rows
## now need to ensure that max and min are in sample for each element of mf0
## note that min and max might occur twice, but this shouldn't matter (and
## is better than min overwriting max, for example)
mf0 <- mf[ind,]
## ... now need to ensure certain sorts of representativeness
## work through elements collecting the rows containing
## max and min for each variable, and a random row for each
## factor level....
ind <- sample(1:n,n,replace=FALSE) ## randomized index for stratified sampling w.r.t. factor levels
fun <- function(X,fac,ind) ind[fac[ind]==X][1] ## stratified sampler
k <- 0
for (j in 1:length(mf)) if (is.numeric(mf0[[j]])) {
if (is.matrix(mf0[[j]])) { ## find row containing minimum
j.min <- min((1:n)[as.logical(rowSums(mf[[j]]==min(mf[[j]])))])
j.max <- min((1:n)[as.logical(rowSums(mf[[j]]==max(mf[[j]])))])
mf0[[j]][1,] <- mf[[j]][j.min,]
mf0[[j]][2,] <- mf[[j]][j.max,]
} else { ## vector
mf0[[j]][1] <- min(mf[[j]])
mf0[[j]][2] <- max(mf[[j]])
j.min <- min(which(mf[[j]]==min(mf[[j]])))
j.max <- min(which(mf[[j]]==max(mf[[j]])))
}
k <- k + 1; mf0[k,] <- mf[j.min,]
k <- k + 1; mf0[k,] <- mf[j.max,]
} else if (is.factor(mf[[j]])) { ## factor variable...
## randomly sample one row from each factor level...
find <- apply(X=as.matrix(levels(mf[[j]])),MARGIN=1,FUN=fun,fac=mf[[j]],ind=ind)
nf <- length(find)
mf0[(k+1):(k+nf),] <- mf[find,]
k <- k + nf
}
RNGkind(kind[1], kind[2])
assign(".Random.seed", seed, envir = .GlobalEnv)
## problems with the following are...
## 1. you can produce model frame rows that are wholly un-representative of the
## data for multi dimensional smooths this way, by pairing extreme values
## with values of other variables that they never occur near.
## 2. Nothing is done to ensure that all factor levels are present.
# mf0 <- mf[ind,] ## random sample of rows
## now need to ensure that max and min are in sample for each element of mf0
## note that min and max might occur twice, but this shouldn't matter (and
## is better than min overwriting max, for example)
# for (j in 1:length(mf)) if (is.numeric(mf0[[j]])) {
# if (is.matrix(mf0[[j]])) { ## find row containing minimum
# j.min <- min((1:n)[as.logical(rowSums(mf[[j]]==min(mf[[j]])))])
# j.max <- min((1:n)[as.logical(rowSums(mf[[j]]==max(mf[[j]])))])
# mf0[[j]][1,] <- mf[[j]][j.min,]
# mf0[[j]][2,] <- mf[[j]][j.max,]
# } else { ## vector
# mf0[[j]][1] <- min(mf[[j]])
# mf0[[j]][2] <- max(mf[[j]])
# }
# }
mf0
}
......@@ -346,7 +399,8 @@ bgam.fit <- function (G, mf, chunk.size, gp ,scale ,gamma,method, coef=NULL,etas
if (method=="GCV.Cp") {
fit <- magic(qrx$f,qrx$R,G$sp,G$S,G$off,L=G$L,lsp0=G$lsp0,rank=G$rank,
H=G$H,C=G$C,gamma=gamma,scale=scale,gcv=(scale<=0),
H=G$H,C=matrix(0,0,ncol(qrx$R)), ##C=G$C,
gamma=gamma,scale=scale,gcv=(scale<=0),
extra.rss=rss.extra,n.score=nobs+nobs.extra)
post <- magic.post.proc(qrx$R,fit,qrx$f*0+1)
......@@ -555,7 +609,8 @@ bgam.fit2 <- function (G, mf, chunk.size, gp ,scale ,gamma,method, etastart = NU
if (method=="GCV.Cp") {
fit <- magic(qrx$f,qrx$R,G$sp,G$S,G$off,L=G$L,lsp0=G$lsp0,rank=G$rank,
H=G$H,C=G$C,gamma=gamma,scale=scale,gcv=(scale<=0),
H=G$H,C= matrix(0,0,ncol(qrx$R)), ##C=G$C,
gamma=gamma,scale=scale,gcv=(scale<=0),
extra.rss=rss.extra,n.score=G$n)
post <- magic.post.proc(qrx$R,fit,qrx$f*0+1)
......@@ -931,7 +986,8 @@ bam.fit <- function(G,mf,chunk.size,gp,scale,gamma,method,rho=0,
if (method=="GCV.Cp") {
fit <- magic(qrx$f,qrx$R,G$sp,G$S,G$off,L=G$L,lsp0=G$lsp0,rank=G$rank,
H=G$H,C=G$C,gamma=gamma,scale=scale,gcv=(scale<=0),
H=G$H,C=matrix(0,0,ncol(qrx$R)), ##C=G$C,
gamma=gamma,scale=scale,gcv=(scale<=0),
extra.rss=rss.extra,n.score=n)
post <- magic.post.proc(qrx$R,fit,qrx$f*0+1)
......@@ -1088,10 +1144,10 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
method <- "REML"
warning("sparse=TRUE not supported with fast REML, reset to REML.")
}
gp<-interpret.gam(formula) # interpret the formula
cl<-match.call() # call needed in gam object for update to work
mf<-match.call(expand.dots=FALSE)
mf$formula<-gp$fake.formula
gp <- interpret.gam(formula) # interpret the formula
cl <- match.call() # call needed in gam object for update to work
mf <- match.call(expand.dots=FALSE)
mf$formula <- gp$fake.formula
mf$method <- mf$family<-mf$control<-mf$scale<-mf$knots<-mf$sp<-mf$min.sp <- mf$gc.level <-
mf$gamma <- mf$paraPen<- mf$chunk.size <- mf$rho <- mf$sparse <- mf$cluster <-
mf$use.chol <- mf$samfrac <- mf$...<-NULL
......@@ -1103,6 +1159,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
pmf <- eval(pmf, parent.frame()) # pmf contains all data for parametric part
pterms <- attr(pmf,"terms") ## pmf only used for this
rm(pmf);
if (gc.level>0) gc()
mf <- eval(mf, parent.frame()) # the model frame now contains all the data
......@@ -1131,7 +1188,8 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
if (sparse) sparse.cons <- 2 else sparse.cons <- -1
G <- gam.setup(gp,pterms=pterms,data=mf0,knots=knots,sp=sp,min.sp=min.sp,
G <- gam.setup(gp,pterms=pterms,
data=mf0,knots=knots,sp=sp,min.sp=min.sp,
H=NULL,absorb.cons=TRUE,sparse.cons=sparse.cons,select=FALSE,
idLinksBases=TRUE,scale.penalty=control$scalePenalty,
paraPen=paraPen)
......@@ -1141,7 +1199,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
G$var.summary <- var.summary
G$family <- family
G$terms<-terms;G$pterms<-pterms
G$terms<-terms;##G$pterms<-pterms
pvars <- all.vars(delete.response(terms))
G$pred.formula <- if (length(pvars)>0) reformulate(pvars) else NULL
......@@ -1153,12 +1211,12 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
G$offset <- model.offset(mf)
if (is.null(G$offset)) G$offset <- rep(0,n)
if (ncol(G$X)>nrow(mf)+nrow(G$C)) stop("Model has more coefficients than data")
if (ncol(G$X)>nrow(mf)) stop("Model has more coefficients than data") ##+nrow(G$C)) stop("Model has more coefficients than data")
G$cl<-cl;
G$am <- am
G$min.edf<-G$nsdf-dim(G$C)[1]
G$min.edf<-G$nsdf #-dim(G$C)[1]
if (G$m) for (i in 1:G$m) G$min.edf<-G$min.edf+G$smooth[[i]]$null.space.dim
G$formula<-formula
......@@ -1374,7 +1432,8 @@ bam.update <- function(b,data,chunk.size=10000) {
if (b$method=="GCV") scale <- -1 else scale = b$sig2
fit <- magic(b$qrx$f,b$qrx$R,b$sp,b$G$S,b$G$off,L=b$G$L,lsp0=b$G$lsp0,rank=b$G$rank,
H=b$G$H,C=b$G$C,gamma=b$gamma,scale=scale,gcv=(scale<=0),
H=b$G$H,C= matrix(0,0,ncol(b$qrx$R)),##C=b$G$C,
gamma=b$gamma,scale=scale,gcv=(scale<=0),
extra.rss=rss.extra,n.score=n)
post <- magic.post.proc(b$qrx$R,fit,b$qrx$f*0+1)
......
......@@ -19,7 +19,7 @@ Sl.setup <- function(G) {
## - Applies to cols/params from start:stop.
## - If numeric then X[,start:stop]%*%diag(D) is repara X[,start:stop],
## b.orig = D*b.repara
## - If matrix then X[,start:stop]%*%diag(D) is repara X[,start:stop],
## - If matrix then X[,start:stop]%*%D is repara X[,start:stop],
## b.orig = D%*%b.repara
## The penalties in Sl are in the same order as those in G
## Also returns attribute "E" a square root of the well scaled total
......@@ -109,9 +109,10 @@ Sl.setup <- function(G) {
for (j in 1:m) {
Sl[[b]] <- list()
ind <- sbStart[j]:sbStop[j]
Sl[[b]]$S <- list(G$smooth[[i]]$S[[j]][ind,ind])
Sl[[b]]$S <- list(G$smooth[[i]]$S[[j]][ind,ind,drop=FALSE])
Sl[[b]]$start <- G$smooth[[i]]$first.para + sbStart[j]-1
Sl[[b]]$stop <- G$smooth[[i]]$first.para + sbStop[j]-1
Sl[[b]]$rank <- G$smooth[[i]]$rank[j]
b <- b + 1
}
} else { ## not possible to split
......@@ -151,9 +152,11 @@ Sl.setup <- function(G) {
ind <- rep(FALSE,length(D))
ind[1:Sl[[b]]$rank] <- TRUE ## index penalized elements
D[ind] <- 1/sqrt(D[ind]);D[!ind] <- 1
D <- t(D*t(U)) ## D <- U%*%diag(D)
Sl[[b]]$D <- t(D*t(U)) ## D <- U%*%diag(D)
Sl[[b]]$Di <- t(U)/D
## so if X is smooth model matrix X%*%D is re-parameterized form
Sl[[b]]$D <- D; Sl[[b]]$ind <- ind
## Sl[[b]]$D <- D;
Sl[[b]]$ind <- ind
}
## add penalty square root into E
ind <- (Sl[[b]]$start:Sl[[b]]$stop)[Sl[[b]]$ind]
......@@ -198,20 +201,36 @@ Sl.setup <- function(G) {
Sl ## the penalty list
} ## end of Sl.setup
Sl.initial.repara <- function(Sl,X,inverse=FALSE) {
Sl.initial.repara <- function(Sl,X,inverse=FALSE,both.sides=TRUE,cov=TRUE) {
## Routine to apply initial Sl re-parameterization to model matrix X,
## or, if inverse==TRUE, to apply inverse re-para to parameter vector
## or cov matrix
## or cov matrix. if inverse is TRUE and both.sides=FALSE then
## re-para only applied to rhs, as appropriate for a choleski factor.
if (inverse) { ## apply inverse re-para
if (is.matrix(X)) { ## then assume it's a covariance matrix
for (b in 1:length(Sl)) {
ind <- Sl[[b]]$start:Sl[[b]]$stop
if (is.matrix(Sl[[b]]$D)) {
X[ind,] <- Sl[[b]]$D%*%X[ind,,drop=FALSE]
X[,ind] <- X[,ind,drop=FALSE]%*%t(Sl[[b]]$D)
} else {
X[,ind] <- t(Sl[[b]]$D * t(X[,ind,drop=FALSE]))
X[ind,] <- Sl[[b]]$D * X[ind,,drop=FALSE]
if (is.matrix(X)) {
if (cov) { ## then it's a covariance matrix
for (b in 1:length(Sl)) {
ind <- Sl[[b]]$start:Sl[[b]]$stop
if (is.matrix(Sl[[b]]$D)) {
if (both.sides) X[ind,] <- Sl[[b]]$D%*%X[ind,,drop=FALSE]
X[,ind] <- X[,ind,drop=FALSE]%*%t(Sl[[b]]$D)
} else { ## Diagonal D
X[,ind] <- t(Sl[[b]]$D * t(X[,ind,drop=FALSE]))
if (both.sides) X[ind,] <- Sl[[b]]$D * X[ind,,drop=FALSE]
}
}
} else { ## regular matrix: need to use Di
for (b in 1:length(Sl)) {
ind <- Sl[[b]]$start:Sl[[b]]$stop
if (is.matrix(Sl[[b]]$D)) {
Di <- if(is.null(Sl[[b]]$Di)) t(Sl[[b]]$D) else Sl[[b]]$Di
if (both.sides) X[ind,] <- t(Di)%*%X[ind,,drop=FALSE]
X[,ind] <- X[,ind,drop=FALSE]%*%Di
} else { ## Diagonal D
Di <- 1/Sl[[b]]$D
X[,ind] <- t(Di * t(X[,ind,drop=FALSE]))
if (both.sides) X[ind,] <- Di * X[ind,,drop=FALSE]
}
}
}
} else { ## it's a parameter vector
......@@ -227,15 +246,18 @@ Sl.initial.repara <- function(Sl,X,inverse=FALSE) {
X[,ind] <- t(Sl[[b]]$D*t(X[,ind,drop=FALSE])) ## X[,ind]%*%diag(Sl[[b]]$D)
}
X
} ## end Sl.initial.repra
} ## end Sl.initial.repara
ldetS <- function(Sl,rho,fixed,np,root=FALSE) {
## Get log generalized determinant of S stored blockwise in an Sl list.
## Any multi-term blocks will be re-parameterized using gam.reparam, and
## a re-parameterization object supplied in the returned object.
## rho contains log smoothing parameters, fixed is an array indicating whether they
## are fixed (or free). np is the number of coefficients. root indicates
## whether or not to return E.
## Returns: Sl, with modified rS terms, if needed and rho added to each block
## rp, a re-parameterization list
## E a total penalty square root such that E'E = S_tot
## E a total penalty square root such that E'E = S_tot (if root==TRUE)
## ldetS,ldetS1,ldetS2 the value, grad vec and Hessian
n.deriv <- sum(!fixed)
k.deriv <- k.sp <- k.rp <- 1
......@@ -295,15 +317,17 @@ ldetS <- function(Sl,rho,fixed,np,root=FALSE) {
list(ldetS=ldS,ldet1=d1.ldS,ldet2=d2.ldS,Sl=Sl,rp=rp,E=E)
} ## end ldetS
Sl.repara <- function(rp,X,inverse=FALSE) {
## Apply re-parameterization from ldetS to X, blockwise
## if inverse==TRUE applies the inverse of the re-para to
## vector X or cov matrix X...
Sl.repara <- function(rp,X,inverse=FALSE,both.sides=TRUE) {
## Apply re-parameterization from ldetS to X, blockwise.
## If X is a matrix it is assumed to be a model matrix
## whereas if X is a vector it is assumed to be a parameter vector.
## If inverse==TRUE applies the inverse of the re-para to
## parameter vector X or cov matrix X...
nr <- length(rp);if (nr==0) return(X)
if (inverse) {
if (is.matrix(X)) { ## X is a cov matrix
for (i in 1:nr) {
X[rp[[i]]$ind,] <-
if (both.sides) X[rp[[i]]$ind,] <-
rp[[i]]$Qs %*% X[rp[[i]]$ind,,drop=FALSE]
X[,rp[[i]]$ind] <-
X[,rp[[i]]$ind,drop=FALSE] %*% t(rp[[i]]$Qs)
......@@ -312,7 +336,11 @@ Sl.repara <- function(rp,X,inverse=FALSE) {
for (i in 1:nr) X[rp[[i]]$ind] <- rp[[i]]$Qs %*% X[rp[[i]]$ind]
}
} else { ## apply re-para to X
for (i in 1:nr) X[,rp[[i]]$ind] <- X[,rp[[i]]$ind]%*%rp[[i]]$Qs
if (is.matrix(X)) {
for (i in 1:nr) X[,rp[[i]]$ind] <- X[,rp[[i]]$ind]%*%rp[[i]]$Qs
} else {
for (i in 1:nr) X[rp[[i]]$ind] <- t(rp[[i]]$Qs) %*% X[rp[[i]]$ind]
}
}
X
} ## end Sl.repara
......@@ -393,11 +421,11 @@ Sl.termMult <- function(Sl,A,full=FALSE) {
ind <- (Sl[[b]]$start:Sl[[b]]$stop)[Sl[[b]]$ind]
if (full) { ## return zero answer with all zeroes in place
B <- A*0
if (Amat) B[ind,] <- Sl[[b]]$lambda*A[ind,] else
if (Amat) B[ind,] <- Sl[[b]]$lambda*A[ind,,drop=FALSE] else
B[ind] <- Sl[[b]]$lambda*A[ind]
SA[[k]] <- B
} else { ## strip zero rows from answer
if (Amat) SA[[k]] <- Sl[[b]]$lambda*A[ind,] else
if (Amat) SA[[k]] <- Sl[[b]]$lambda*A[ind,,drop=FALSE] else
SA[[k]] <- as.numeric(Sl[[b]]$lambda*A[ind])
attr(SA[[k]],"ind") <- ind
}
......@@ -407,11 +435,11 @@ Sl.termMult <- function(Sl,A,full=FALSE) {
k <- k + 1
if (full) { ## return answer with all zeroes in place
B <- A*0
if (Amat) B[ind,] <- Sl[[b]]$Srp[[i]]%*%A[ind,] else
if (Amat) B[ind,] <- Sl[[b]]$Srp[[i]]%*%A[ind,,drop=FALSE] else
B[ind] <- Sl[[b]]$Srp[[i]]%*%A[ind]
SA[[k]] <- B
} else { ## strip zero rows from answer
if (Amat) SA[[k]] <- Sl[[b]]$Srp[[i]]%*%A[ind,] else
if (Amat) SA[[k]] <- Sl[[b]]$Srp[[i]]%*%A[ind,,drop=FALSE] else
SA[[k]] <- as.numeric(Sl[[b]]$Srp[[i]]%*%A[ind])
attr(SA[[k]],"ind") <- ind
}
......@@ -432,7 +460,7 @@ d.detXXS <- function(Sl,PP) {
d1[i] <- sum(diag(SPP[[i]][,indi,drop=FALSE]))
for (j in i:nd) {
indj <- attr(SPP[[j]],"ind")
d2[i,j] <- d2[j,i] <- -sum(t(SPP[[i]][,indj])*SPP[[j]][,indi])
d2[i,j] <- d2[j,i] <- -sum(t(SPP[[i]][,indj,drop=FALSE])*SPP[[j]][,indi,drop=FALSE])
}
d2[i,i] <- d2[i,i] + d1[i]
}
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -2,6 +2,23 @@
## Many of the following are simple wrappers for C functions, used largely
## for testing purposes
pinv <- function(X,svd=FALSE) {
## a pseudoinverse for n by p, n>p matrices
qrx <- qr(X,tol=0,LAPACK=TRUE)
R <- qr.R(qrx);Q <- qr.Q(qrx)
rr <- Rrank(R)
if (svd&&rr<ncol(R)) {
piv <- 1:ncol(X); piv[qrx$pivot] <- 1:ncol(X)
er <- svd(R[,piv])
d <- er$d*0;d[1:rr] <- 1/er$d[1:rr]
X <- Q%*%er$u%*%(d*t(er$v))
} else {
Ri <- R*0
Ri[1:rr,1:rr] <- backsolve(R[1:rr,1:rr],diag(rr))
X[,qrx$pivot] <- Q%*%t(Ri)
}
X
} ## end pinv
pqr2 <- function(x,nt=1) {
## Function for parallel pivoted qr decomposition of a matrix using LAPACK
......
......@@ -191,15 +191,17 @@ k.check <- function(b,subsample=5000,n.rep=400) {
} else modf <- b$model
nr <- length(rsd)
for (k in 1:m) { ## work through smooths
dat <- as.data.frame(ExtractData(b$smooth[[k]],modf,NULL)$data)
ok <- TRUE
dat <- ExtractData(b$smooth[[k]],modf,NULL)$data
if (!is.null(attr(dat,"index"))||!is.null(attr(dat[[1]],"matrix"))||is.matrix(dat[[1]])) ok <- FALSE
if (ok) dat <- as.data.frame(dat)
snames[k] <- b$smooth[[k]]$label
ind <- b$smooth[[k]]$first.para:b$smooth[[k]]$last.para
kc[k] <- length(ind)
edf[k] <- sum(b$edf[ind])
nc <- b$smooth[[k]]$dim
ok <- TRUE
if (ok && ncol(dat)>nc) dat <- dat[,1:nc] ## drop any by variables
for (j in 1:nc) if (is.factor(dat[[j]])) ok <- FALSE
if (!is.null(attr(dat[[1]],"matrix"))) ok <- FALSE
if (!ok) {
p.val[k] <- v.obs[k] <- NA ## can't do this test with summation convention/factors
} else { ## normal term
......@@ -1021,7 +1023,8 @@ plot.gam <- function(x,residuals=FALSE,rug=TRUE,se=TRUE,pages=0,select=NULL,scal
scheme <- rep(scheme[1],m)
}
order <- attr(x$pterms,"order") # array giving order of each parametric term
## array giving order of each parametric term...
order <- if (is.list(x$pterms)) unlist(lapply(x$pterms,attr,"order")) else attr(x$pterms,"order")
if (all.terms) # plot parametric terms as well
n.para <- sum(order==1) # plotable parametric terms
......@@ -1187,9 +1190,11 @@ plot.gam <- function(x,residuals=FALSE,rug=TRUE,se=TRUE,pages=0,select=NULL,scal
{ class(x) <- c("gam","glm","lm") # needed to get termplot to call model.frame.glm
if (is.null(select)) {
attr(x,"para.only") <- TRUE
termplot(x,se=se,rug=rug,col.se=1,col.term=1,...)
termplot(x,se=se,rug=rug,col.se=1,col.term=1,main=attr(x$pterms,"term.labels"),...)
} else { # figure out which plot is required
if (select > m) {
## can't figure out how to get this to work with more than first linear predictor
## as termplots relies on matching terms to names in original data...
select <- select - m # i.e. which parametric term
term.labels <- attr(x$pterms,"term.labels")
term.labels <- term.labels[order==1]
......@@ -1225,9 +1230,12 @@ exclude.too.far<-function(g1,g2,d1,d2,dist)
if (m!=length(d2)) stop("data vectors are of different lengths")
if (dist<0) stop("supplied dist negative")
distance<-array(0,n)
o<-.C(C_MinimumSeparation,as.double(g1),as.double(g2),as.integer(n),as.double(d1),as.double(d2),
as.integer(m),distance=as.double(distance))
res<-rep(FALSE,n)
o<-.C(C_MinimumSeparation,x=as.double(cbind(g1,g2)),n=as.integer(n), d=as.integer(2),
t=as.double(cbind(d1,d2)),m=as.integer(m),distance=as.double(distance))
#o<-.C(C_MinimumSeparation,as.double(g1),as.double(g2),as.integer(n),as.double(d1),as.double(d2),
# as.integer(m),distance=as.double(distance))