Commit b955da99 authored by Dirk Eddelbuettel

Import Upstream version 1.7-16

parent 7aa70fc3
Package: mgcv Package: mgcv
Version: 1.7-13 Version: 1.7-16
Author: Simon Wood <simon.wood@r-project.org> Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org> Maintainer: Simon Wood <simon.wood@r-project.org>
Title: GAMs with GCV/AIC/REML smoothness estimation and GAMMs by PQL Title: Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness
estimation
Description: Routines for GAMs and other generalized ridge regression Description: Routines for GAMs and other generalized ridge regression
with multiple smoothing parameter selection by GCV, REML or with multiple smoothing parameter selection by GCV, REML or
UBRE/AIC. Also GAMMs by REML or PQL. Includes a gam() function. UBRE/AIC. Also GAMMs. Includes a gam() function.
Priority: recommended Priority: recommended
Depends: R (>= 2.14.0), stats, graphics Depends: R (>= 2.14.0), stats, graphics
Imports: nlme, methods, Matrix Imports: nlme, methods, Matrix
...@@ -13,6 +14,6 @@ Suggests: nlme (>= 3.1-64), splines, Matrix, parallel ...@@ -13,6 +14,6 @@ Suggests: nlme (>= 3.1-64), splines, Matrix, parallel
LazyLoad: yes LazyLoad: yes
ByteCompile: yes ByteCompile: yes
License: GPL (>= 2) License: GPL (>= 2)
Packaged: 2012-01-21 16:33:06 UTC; sw283 Packaged: 2012-04-30 07:09:06 UTC; sw283
Repository: CRAN Repository: CRAN
Date/Publication: 2012-01-22 09:59:42 Date/Publication: 2012-04-30 08:22:16
5d5d72ee6d284c96b4525e9eb748bc0f *DESCRIPTION 9f562aa60504f1265daa8ff8095b6333 *DESCRIPTION
29e13076f8e7c500f10e2b64b0821984 *NAMESPACE 50152051f123a389d421aa3130dce252 *NAMESPACE
ecfb144fb5214dde68dffac22f219a1f *R/bam.r cf4210b25d2ece355a79cb8ed5e4455a *R/bam.r
f4f5c9bb8776c2248e088c9ee3517208 *R/fast-REML.r
b160632e8f38fa99470e2f8cba82a495 *R/gam.fit3.r b160632e8f38fa99470e2f8cba82a495 *R/gam.fit3.r
902657a0ee2dedc3fdfa501bf3b37c5b *R/gam.sim.r 902657a0ee2dedc3fdfa501bf3b37c5b *R/gam.sim.r
e137c06cabb48551c18cf0cc3512d297 *R/gamm.r ad57e83090b4633ee50041fd3571c016 *R/gamm.r
c61836edb704dbd7b718c754d714a291 *R/mgcv.r a7d790a4fe2640fd69646e1dcf161d80 *R/mgcv.r
bf70158e37e33ea1136efdeab97569f8 *R/plots.r 5c5f68e76697c356b95e084bee5d7776 *R/plots.r
d20083082ba7e1bf361ac7d404efd8a3 *R/smooth.r bb2b4220a103364afc87249157a040b7 *R/smooth.r
fe9745f610246ee1f31eb915ca0d76a9 *R/sparse.r fb66d6c18398411a99ffcb788b854f13 *R/sparse.r
76637934ae66a4b74a0637e698f71469 *changeLog 020a4b9253d806cb55b0521412715dc7 *changeLog
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda 40874e3ced720a596750f499ded8a60a *data/columb.rda
88d77139cc983317b6acd8c5f1252ab9 *gnugpl2.txt 88d77139cc983317b6acd8c5f1252ab9 *gnugpl2.txt
...@@ -18,10 +19,10 @@ f693920e12f8a6f2b6cab93648628150 *index ...@@ -18,10 +19,10 @@ f693920e12f8a6f2b6cab93648628150 *index
c51c9b8c9c73f81895176ded39b91394 *man/Predict.matrix.cr.smooth.Rd c51c9b8c9c73f81895176ded39b91394 *man/Predict.matrix.cr.smooth.Rd
612ab6354541ebe38a242634d73b66ba *man/Tweedie.Rd 612ab6354541ebe38a242634d73b66ba *man/Tweedie.Rd
6d711de718a09e1e1ae2a6967abade33 *man/anova.gam.Rd 6d711de718a09e1e1ae2a6967abade33 *man/anova.gam.Rd
c4d1ad309698994e7c1c75f7db294a58 *man/bam.Rd fa6e8f98dc01de508e95d1008ae84d60 *man/bam.Rd
b385d6d5419d0d6aefe03af1a79d5c4e *man/bam.update.Rd b385d6d5419d0d6aefe03af1a79d5c4e *man/bam.update.Rd
4e925cb579f4693d1b8ec2d5092c0b37 *man/cSplineDes.Rd 4e925cb579f4693d1b8ec2d5092c0b37 *man/cSplineDes.Rd
9753a8051d9b495d9855537f3f26f491 *man/choose.k.Rd 9b4d616d1b6c4a46ca77d16cded3f806 *man/choose.k.Rd
c03748964ef606621418e428ae49b103 *man/columb.Rd c03748964ef606621418e428ae49b103 *man/columb.Rd
4196ba59f1fa8449c9cd0cab8a347978 *man/concurvity.Rd 4196ba59f1fa8449c9cd0cab8a347978 *man/concurvity.Rd
f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd
...@@ -29,20 +30,20 @@ f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd ...@@ -29,20 +30,20 @@ f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd
44ad0563add1c560027d502ce41483f5 *man/fix.family.link.Rd 44ad0563add1c560027d502ce41483f5 *man/fix.family.link.Rd
75373268c1203ee110e1eede633752aa *man/fixDependence.Rd 75373268c1203ee110e1eede633752aa *man/fixDependence.Rd
9ac808f5a2a43cf97f24798c0922c9bf *man/formXtViX.Rd 9ac808f5a2a43cf97f24798c0922c9bf *man/formXtViX.Rd
34308f4ada8e2aca9981a17794dac30b *man/formula.gam.Rd bb099e6320a6c1bd79fe4bf59e0fde08 *man/formula.gam.Rd
6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd 6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd
c7f0549fe7b9da0624417e33ed92344d *man/gam.Rd 39b4fa3782cc33a445d34a9b26df44f8 *man/gam.Rd
aeb7ec80d75244bc4f2f2fd796f86efd *man/gam.check.Rd 69c6ef61a3cfc397cbeacd21e5e6cc9b *man/gam.check.Rd
847599e287ecf79fbb7be2cb06d72742 *man/gam.control.Rd 96c9417e4ac5d79ec9ed3f363adfc4e9 *man/gam.control.Rd
fd98327327ba74bb1a61a6519f12e936 *man/gam.convergence.Rd fd98327327ba74bb1a61a6519f12e936 *man/gam.convergence.Rd
58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/gam.fit.Rd 58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/gam.fit.Rd
32b5cd1b6f63027150817077f3914cf4 *man/gam.fit3.Rd 21339a5d1eb8c83679dd9022ab682b5e *man/gam.fit3.Rd
dd35a8a851460c2d2106c03d544c8241 *man/gam.models.Rd dd35a8a851460c2d2106c03d544c8241 *man/gam.models.Rd
468d116a2ef9e60f683af48f4f100ef5 *man/gam.outer.Rd e969287d1a5c281faa7eb6cfce31a7c5 *man/gam.outer.Rd
7e5ba69a44bc937ddca04e4f153c7975 *man/gam.selection.Rd 96676186808802344a99f9d3170bf775 *man/gam.selection.Rd
76651917bd61fc6bc447bbb40b887236 *man/gam.side.Rd 76651917bd61fc6bc447bbb40b887236 *man/gam.side.Rd
78588cf8ed0af8eca70bba3bbed64dbe *man/gam.vcomp.Rd 78588cf8ed0af8eca70bba3bbed64dbe *man/gam.vcomp.Rd
278e0b3aa7baa44dfb96e235ceb07f4c *man/gam2objective.Rd a66a814cc4c6f806e824751fda519ae0 *man/gam2objective.Rd
4d5b3b1266edc31ce3b0e6be11ee9166 *man/gamObject.Rd 4d5b3b1266edc31ce3b0e6be11ee9166 *man/gamObject.Rd
0ac5fb78c9db628ce554a8f68588058c *man/gamSim.Rd 0ac5fb78c9db628ce554a8f68588058c *man/gamSim.Rd
6078c49c55f4e7ce20704e4fbe3bba8a *man/gamm.Rd 6078c49c55f4e7ce20704e4fbe3bba8a *man/gamm.Rd
...@@ -55,30 +56,29 @@ aba56a0341ba9526a302e39d33aa9042 *man/interpret.gam.Rd ...@@ -55,30 +56,29 @@ aba56a0341ba9526a302e39d33aa9042 *man/interpret.gam.Rd
58e73ac26b93dc9d28bb27c8699e12cf *man/linear.functional.terms.Rd 58e73ac26b93dc9d28bb27c8699e12cf *man/linear.functional.terms.Rd
5de18c3ad064a5bda4f9027d9455170a *man/logLik.gam.Rd 5de18c3ad064a5bda4f9027d9455170a *man/logLik.gam.Rd
611f5f6acac9c5f40869c01cf7f75dd3 *man/ls.size.Rd 611f5f6acac9c5f40869c01cf7f75dd3 *man/ls.size.Rd
8ef61987727e1b857edf3a366d21b66c *man/magic.Rd c4d7e46cead583732e391d680fecc572 *man/magic.Rd
496388445d8cde9b8e0c3917cbe7461d *man/magic.post.proc.Rd 496388445d8cde9b8e0c3917cbe7461d *man/magic.post.proc.Rd
5c55658a478bd34d66daad46e324d7f4 *man/mgcv-FAQ.Rd d564d1c5b2f780844ff10125348f2e2c *man/mgcv-FAQ.Rd
904b19ba280010d85d59a4b21b6d2f94 *man/mgcv-package.Rd 41df245a5821b3964db4c74b1930c0fe *man/mgcv-package.Rd
196ad09f09d6a5a44078e2282eb0a56f *man/mgcv.Rd
bb420a39f1f8155f0084eb9260fad89c *man/mgcv.control.Rd
18a9858b6f3ffde288b0bf9e1a5da2f6 *man/model.matrix.gam.Rd 18a9858b6f3ffde288b0bf9e1a5da2f6 *man/model.matrix.gam.Rd
3edd2618dcb4b366eeb405d77f3f633c *man/mono.con.Rd bc9b89db7e7ff246749551c16f5f1f07 *man/mono.con.Rd
3a4090ac778273861d97077681a55df2 *man/mroot.Rd 3a4090ac778273861d97077681a55df2 *man/mroot.Rd
8aea04d0764d195409da798b33516051 *man/negbin.Rd 8aea04d0764d195409da798b33516051 *man/negbin.Rd
41de8762baab4fc0cf1224df168520fe *man/new.name.Rd 41de8762baab4fc0cf1224df168520fe *man/new.name.Rd
dffa2d51c704c610088fa02d7220b05e *man/notExp.Rd dffa2d51c704c610088fa02d7220b05e *man/notExp.Rd
150d7f8a427117353c5c2e466ff0bfae *man/notExp2.Rd 150d7f8a427117353c5c2e466ff0bfae *man/notExp2.Rd
95b3e6686e9557b3278e21e350704ce9 *man/null.space.dimension.Rd 95b3e6686e9557b3278e21e350704ce9 *man/null.space.dimension.Rd
3720c8867aa31d7705dae102eeaa2364 *man/pcls.Rd 19939543d691f128e84d86fb5423541e *man/pcls.Rd
717d796acbaab64216564daf898b6d04 *man/pdIdnot.Rd 717d796acbaab64216564daf898b6d04 *man/pdIdnot.Rd
8c0f8575b427f30316b639a326193aeb *man/pdTens.Rd 8c0f8575b427f30316b639a326193aeb *man/pdTens.Rd
b388d29148264fd3cd636391fde87a83 *man/pen.edf.Rd b388d29148264fd3cd636391fde87a83 *man/pen.edf.Rd
de454d1dc268bda008ff46639a89acec *man/place.knots.Rd de454d1dc268bda008ff46639a89acec *man/place.knots.Rd
ced71ada93376fcdffa28ad08009cf49 *man/plot.gam.Rd 84d54e8081b82cb8d96a33de03741843 *man/plot.gam.Rd
3d1484b6c3c2ea93efe41f6fc3801b8d *man/polys.plot.Rd 3d1484b6c3c2ea93efe41f6fc3801b8d *man/polys.plot.Rd
fdd6b7e03fde145e274699fe9ea8996c *man/predict.gam.Rd afca36f5b1a5d06a7fcab2eaaa029e7e *man/predict.bam.Rd
df63d7045f83a1dc4874fcac18a2303c *man/predict.gam.Rd
a594eb641cae6ba0b83d094acf4a4f81 *man/print.gam.Rd a594eb641cae6ba0b83d094acf4a4f81 *man/print.gam.Rd
d837c87f037760c81906a51635476298 *man/qq.gam.Rd 5311a1e83ae93aef5f9ae38f7492536a *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
827743e1465089a859a877942ba2f4a9 *man/random.effects.Rd 827743e1465089a859a877942ba2f4a9 *man/random.effects.Rd
37669f97e17507f3ae2d6d1d74feb9d7 *man/residuals.gam.Rd 37669f97e17507f3ae2d6d1d74feb9d7 *man/residuals.gam.Rd
...@@ -97,12 +97,12 @@ d202c6718fb1138fdd99e6102250aedf *man/smooth.construct.re.smooth.spec.Rd ...@@ -97,12 +97,12 @@ d202c6718fb1138fdd99e6102250aedf *man/smooth.construct.re.smooth.spec.Rd
8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd 8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd
4b9bd43c3acbab6ab0159d59967e19db *man/smooth.construct.tp.smooth.spec.Rd 4b9bd43c3acbab6ab0159d59967e19db *man/smooth.construct.tp.smooth.spec.Rd
1de9c315702476fd405a85663bb32d1c *man/smooth.terms.Rd 1de9c315702476fd405a85663bb32d1c *man/smooth.terms.Rd
6aa3bcbd3198d2bbc3b9ca12c9c9cd7e *man/smoothCon.Rd 0d12daea17e0b7aef8ab89b5f801adf1 *man/smoothCon.Rd
5ae47a140393009e3dba7557af175170 *man/sp.vcov.Rd 5ae47a140393009e3dba7557af175170 *man/sp.vcov.Rd
83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd 83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd
a17981f0fa2a6a50e637c98c672bfc45 *man/step.gam.Rd 700699103b50f40d17d3824e35522c85 *man/step.gam.Rd
dd54c87fb87c284d3894410f50550047 *man/summary.gam.Rd dd54c87fb87c284d3894410f50550047 *man/summary.gam.Rd
22b571cbc0bd1e31f195ad927434c27e *man/t2.Rd 7f383eaaca246c8bf2d5b74d841f7f8a *man/t2.Rd
04076444b2c99e9287c080298f9dc1d7 *man/te.Rd 04076444b2c99e9287c080298f9dc1d7 *man/te.Rd
c3c23641875a293593fe4ef032b44aae *man/tensor.prod.model.matrix.Rd c3c23641875a293593fe4ef032b44aae *man/tensor.prod.model.matrix.Rd
fbd45cbb1931bdb5c0de044e22fdd028 *man/uniquecombs.Rd fbd45cbb1931bdb5c0de044e22fdd028 *man/uniquecombs.Rd
...@@ -117,20 +117,18 @@ becbe3e1f1588f7292a74a97ef07a9ae *po/R-de.po ...@@ -117,20 +117,18 @@ becbe3e1f1588f7292a74a97ef07a9ae *po/R-de.po
1a4a267ddcb87bb83f09c291d3e97523 *po/fr.po 1a4a267ddcb87bb83f09c291d3e97523 *po/fr.po
813514ea4e046ecb4563eb3ae8aa202a *po/mgcv.pot 813514ea4e046ecb4563eb3ae8aa202a *po/mgcv.pot
cd54024d76a9b53dc17ef26323fc053f *src/Makevars cd54024d76a9b53dc17ef26323fc053f *src/Makevars
a25e39145f032e8e37433651bba92ddf *src/gcv.c 94a2bcbb75cc60e8460e72ed154678c9 *src/gdi.c
2798411be2cb3748b8bd739f2d2016ee *src/gcv.h
d40012dcda1a10ee535a9b3de9b46c19 *src/gdi.c
49af97195accb65adc75620183d39a4c *src/general.h 49af97195accb65adc75620183d39a4c *src/general.h
da280ee5538a828afde0a4f6c7b8328a *src/init.c 6f301e977834b4743728346184ea11ba *src/init.c
8b37eb0db498a3867dc83364dc65f146 *src/magic.c 7f9fcb495707a003817e78f4802ceeba *src/magic.c
066af9db587e5fe6e5cc4ff8c09ae9c2 *src/mat.c 066af9db587e5fe6e5cc4ff8c09ae9c2 *src/mat.c
d21847ac9a1f91ee9446c70bd93a490a *src/matrix.c de0ae24ea5cb533640a3ab57e0383595 *src/matrix.c
54ce9309b17024ca524e279612a869d6 *src/matrix.h 0f8448f67d16668f9027084a2d9a1b52 *src/matrix.h
08c94a2af4cd047ecd79871ecbafe33a *src/mgcv.c 6a9f57b44d2aab43aa32b01ccb26bd6e *src/mgcv.c
99204b3b20c2e475d9e14022e0144804 *src/mgcv.h c62652f45ad1cd3624a849005858723a *src/mgcv.h
2a1c4f1c10510a4338e5cc34defa65f6 *src/misc.c fcbe85d667f8c7818d17509a0c3c5935 *src/misc.c
7e0ba698a21a01150fda519661ef9857 *src/qp.c 7e0ba698a21a01150fda519661ef9857 *src/qp.c
cd563899be5b09897d1bf36a7889caa0 *src/qp.h cd563899be5b09897d1bf36a7889caa0 *src/qp.h
e9cab4a461eb8e086a0e4834cbf16f30 *src/sparse-smooth.c e9cab4a461eb8e086a0e4834cbf16f30 *src/sparse-smooth.c
3a251ecac78b25c315de459cd2ba0b04 *src/tprs.c 985ef1e19c7b5d97b8e29ed78e709fc5 *src/tprs.c
d0531330f4c1209a1cdd7a75b1854724 *src/tprs.h 5352d5d2298acd9b03ee1895933d4fb4 *src/tprs.h
...@@ -12,7 +12,7 @@ export(anova.gam, bam, bam.update, concurvity, cSplineDes, ...@@ -12,7 +12,7 @@ export(anova.gam, bam, bam.update, concurvity, cSplineDes,
gam.side, gam.side,
get.var,ldTweedie, get.var,ldTweedie,
initial.sp,logLik.gam,ls.size, initial.sp,logLik.gam,ls.size,
magic, magic.post.proc, mgcv, mgcv.control, model.matrix.gam, magic, magic.post.proc, model.matrix.gam,
mono.con, mroot, negbin, new.name, mono.con, mroot, negbin, new.name,
notExp,notExp2,notLog,notLog2,pcls,null.space.dimension, notExp,notExp2,notLog,notLog2,pcls,null.space.dimension,
pen.edf,pdIdnot,pdTens, pen.edf,pdIdnot,pdTens,
......
...@@ -733,7 +733,7 @@ smooth2random.tensor.smooth <- function(object,vnames,type=1) { ...@@ -733,7 +733,7 @@ smooth2random.tensor.smooth <- function(object,vnames,type=1) {
## first sort out the re-parameterization... ## first sort out the re-parameterization...
sum.S <- object$S[[1]]/mean(abs(object$S[[1]])) sum.S <- object$S[[1]]/mean(abs(object$S[[1]]))
null.rank <- ncol(object$margin[[1]]$X)-object$margin[[1]]$rank ## null space rank null.rank <- ncol(object$margin[[1]]$X)-object$margin[[1]]$rank ## null space rank
bs.dim <- object$margin[[1]]$bs.dim bs.dim <- ncol(object$margin[[1]]$X)
if (length(object$S)>1) for (l in 2:length(object$S)) { if (length(object$S)>1) for (l in 2:length(object$S)) {
sum.S <- sum.S + object$S[[l]]/mean(abs(object$S[[l]])) sum.S <- sum.S + object$S[[l]]/mean(abs(object$S[[l]]))
dfl <- ncol(object$margin[[l]]$X) ## actual df of term (`df' may not be set by constructor) dfl <- ncol(object$margin[[l]]$X) ## actual df of term (`df' may not be set by constructor)
......
...@@ -163,8 +163,88 @@ qq.gam <- function(object, rep=0, level=.9,s.rep=10, ...@@ -163,8 +163,88 @@ qq.gam <- function(object, rep=0, level=.9,s.rep=10,
} }
k.check <- function(b,subsample=5000,n.rep=400) {
## function to check k in a gam fit...
## does a randomization test looking for evidence of residual
## pattern attributable to covariates of each smooth.
m <- length(b$smooth)
if (m==0) return(NULL)
rsd <- residuals(b)
ve <- rep(0,n.rep)
p.val<-v.obs <- kc <- edf<- rep(0,m)
snames <- rep("",m)
n <- nrow(b$model)
if (n>subsample) { ## subsample to avoid excessive cost
ind <- sample(1:n,subsample)
modf <- b$model[ind,]
rsd <- rsd[ind]
} else modf <- b$model
nr <- length(rsd)
for (k in 1:m) { ## work through smooths
dat <- as.data.frame(mgcv:::ExtractData(b$smooth[[k]],modf,NULL)$data)
snames[k] <- b$smooth[[k]]$label
ind <- b$smooth[[k]]$first.para:b$smooth[[k]]$last.para
kc[k] <- length(ind)
edf[k] <- sum(b$edf[ind])
nc <- b$smooth[[k]]$dim
ok <- TRUE
for (j in 1:nc) if (is.factor(dat[[j]])) ok <- FALSE
if (!is.null(attr(dat[[1]],"matrix"))) ok <- FALSE
if (!ok) {
p.val[k] <- v.obs[k] <- NA ## can't do this test with summation convention/factors
} else { ## normal term
if (nc==1) { ## 1-D term
e <- diff(rsd[order(dat[,1])])
v.obs[k] <- mean(e^2)/2
for (i in 1:n.rep) {
e <- diff(rsd[sample(1:nr,nr)]) ## shuffle
ve[i] <- mean(e^2)/2
}
p.val[k] <- mean(ve<v.obs[k])
v.obs[k] <- v.obs[k]/mean(rsd^2)
} else { ## multi-D
if (!is.null(b$smooth[[k]]$margin)) { ## tensor product (have to consider scaling)
## get the scale factors...
beta <- coef(b)[ind]
f0 <- PredictMat(b$smooth[[k]],dat)%*%beta
gr.f <- rep(0,ncol(dat))
for (i in 1:nc) {
datp <- dat;dx <- diff(range(dat[,i]))/1000
datp[,i] <- datp[,i] + dx
fp <- PredictMat(b$smooth[[k]],datp)%*%beta
gr.f[i] <- mean(abs(fp-f0))/dx
}
for (i in 1:nc) { ## rescale distances
dat[,i] <- dat[,i] - min(dat[,i])
dat[,i] <- gr.f[i]*dat[,i]/max(dat[,i])
}
}
nn <- 3
ni <- mgcv:::nearest(nn,as.matrix(dat))$ni
e <- rsd - rsd[ni[,1]]
for (j in 2:nn) e <- c(e,rsd-rsd[ni[,j]])
v.obs[k] <- mean(e^2)/2
for (i in 1:n.rep) {
rsdr <- rsd[sample(1:nr,nr)] ## shuffle
e <- rsdr - rsdr[ni[,1]]
for (j in 2:nn) e <- c(e,rsdr-rsdr[ni[,j]])
ve[i] <- mean(e^2)/2
}
p.val[k] <- mean(ve<v.obs[k])
v.obs[k] <- v.obs[k]/mean(rsd^2)
}
}
}
k.table <- cbind(kc,edf,v.obs, p.val)
dimnames(k.table) <- list(snames, c("k\'","edf","k-index", "p-value"))
k.table
} ## end of k.check
gam.check <- function(b, old.style=FALSE, gam.check <- function(b, old.style=FALSE,
type=c("deviance","pearson","response"), type=c("deviance","pearson","response"),
k.sample=5000,k.rep=200,
## arguments passed to qq.gam() {w/o warnings !}: ## arguments passed to qq.gam() {w/o warnings !}:
rep=0, level=.9, rl.col=2, rep.col="gray80", ...) rep=0, level=.9, rl.col=2, rep.col="gray80", ...)
# takes a fitted gam object and produces some standard diagnostic plots # takes a fitted gam object and produces some standard diagnostic plots
...@@ -183,7 +263,7 @@ gam.check <- function(b, old.style=FALSE, ...@@ -183,7 +263,7 @@ gam.check <- function(b, old.style=FALSE,
hist(resid,xlab="Residuals",main="Histogram of residuals",...) hist(resid,xlab="Residuals",main="Histogram of residuals",...)
plot(fitted(b), napredict(b$na.action, b$y), plot(fitted(b), napredict(b$na.action, b$y),
xlab="Fitted Values",ylab="Response",main="Response vs. Fitted Values",...) xlab="Fitted Values",ylab="Response",main="Response vs. Fitted Values",...)
if (!(b$method%in%c("GCV","GACV","UBRE","REML","ML","P-ML","P-REML"))) { ## gamm `gam' object if (!(b$method%in%c("GCV","GACV","UBRE","REML","ML","P-ML","P-REML","fREML"))) { ## gamm `gam' object
par(old.par) par(old.par)
return(invisible()) return(invisible())
} }
...@@ -219,6 +299,13 @@ gam.check <- function(b, old.style=FALSE, ...@@ -219,6 +299,13 @@ gam.check <- function(b, old.style=FALSE,
} }
} }
cat("\n") cat("\n")
## now check k
kchck <- k.check(b,subsample=k.sample,n.rep=k.rep)
if (!is.null(kchck)) {
cat("Basis dimension (k) checking results. Low p-value (k-index<1) may\n")
cat("indicate that k is too low, especially if edf is close to k\'.\n\n")
printCoefmat(kchck,digits=3);
}
par(old.par) par(old.par)
## } else plot(linpred,resid,xlab="linear predictor",ylab="residuals",...) ## } else plot(linpred,resid,xlab="linear predictor",ylab="residuals",...)
} ## end of gam.check } ## end of gam.check
......
...@@ -119,7 +119,7 @@ kd.vis <- function(X,cex=.5) { ...@@ -119,7 +119,7 @@ kd.vis <- function(X,cex=.5) {
} }
nearest <- function(k,X,get.a=FALSE,balanced=FALSE,cut.off=5) { nearest <- function(k,X,gt.zero = FALSE,get.a=FALSE,balanced=FALSE,cut.off=5) {
## The rows of X contain coordinates of points. ## The rows of X contain coordinates of points.
## For each point, this routine finds its k nearest ## For each point, this routine finds its k nearest
## neighbours, returning a list of 2, n by k matrices: ## neighbours, returning a list of 2, n by k matrices:
...@@ -133,8 +133,14 @@ nearest <- function(k,X,get.a=FALSE,balanced=FALSE,cut.off=5) { ...@@ -133,8 +133,14 @@ nearest <- function(k,X,get.a=FALSE,balanced=FALSE,cut.off=5) {
## for neighbours chosen to be on either side of the box in each ## for neighbours chosen to be on either side of the box in each
## direction in this case k>2*ncol(X). These neighbours are only used ## direction in this case k>2*ncol(X). These neighbours are only used
## if closer than cut.off*max(k nearest distances). ## if closer than cut.off*max(k nearest distances).
## gt.zero indicates that neighbours must have distances greater
## than zero...
require(mgcv) require(mgcv)
Xu <- uniquecombs(X);ind <- attr(Xu,"index") ## Xu[ind,] == X if (balanced) gt.zero <- TRUE
if (gt.zero) {
Xu <- uniquecombs(X);ind <- attr(Xu,"index") ## Xu[ind,] == X
} else { Xu <- X; ind <- 1:nrow(X)}
if (k>nrow(Xu)) stop("not enough unique values to find k nearest")
nobs <- length(ind) nobs <- length(ind)
n <- nrow(Xu) n <- nrow(Xu)
d <- ncol(Xu) d <- ncol(Xu)
...@@ -154,7 +160,8 @@ nearest <- function(k,X,get.a=FALSE,balanced=FALSE,cut.off=5) { ...@@ -154,7 +160,8 @@ nearest <- function(k,X,get.a=FALSE,balanced=FALSE,cut.off=5) {
rind <- 1:nobs rind <- 1:nobs
rind[ind] <- 1:nobs rind[ind] <- 1:nobs
ni <- matrix(rind[oo$ni+1],n,k)[ind,] ni <- matrix(rind[oo$ni+1],n,k)[ind,]
list(ni=ni,dist=dist,a=oo$a[ind]) if (get.a) a=oo$a[ind] else a <- NULL
list(ni=ni,dist=dist,a=a)
} }
......
** denotes quite substantial/important changes ** denotes quite substantial/important changes
*** denotes really big changes *** denotes really big changes
1.7-16
* There was an uninitialized variable bug in the 1.7-14 re-written "cr" basis
code for the case k=3. Fixed.
* gam.check modified slightly so that k test only applied to smooths of
numeric variables, not factors.
1.7-15
* Several packages had documentation linking to the 'mgcv' function
help page (now removed), when a link to the package was meant. An alias
has been added to mgcv-package.Rd to fix/correct these links.
1.7-14
** predict.bam now added as a wrapper for predict.gam, allowing parallel
computation
** bam now has method="fREML" option which uses faster REML optimizer:
can make a big difference on parameter rich models.
* bam can now use a cross product and Choleski based method to accumulate
the required model matrix factorization. Faster, but less stable than
the QR based default.
* bam can now obtain starting values using a random sub sample of the data.
Useful for seriously large datasets.
* check of adequacy of basis dimensions added to gam.check
* magic can now deal with model matrices with more columns than rows.
* p-value reference distribution approximations improved.
* bam returns objects of class "bam" inheriting from "gam"
* bam now uses newdata.guaranteed=TRUE option when predicting as part
of model matrix decomposition accumulation. Speeds things up.
* More efficient `sweep and drop' centering constraints added as default for
bam. Constraint null space unchanged, but computation is faster.
* Underlying "cr" basis code re-written for greater efficiency.
* routine mgcv removed, it now being many years since there has been any
reason to use it. C source code heavily pruned as a result.
* coefficient name generation moved from estimate.gam to gam.setup.
* smooth2random.tensor.smooth had a bug that could produce a nonsensical
penalty null space rank and an error, in some cases (e.g. "cc" basis)
causing te terms to fail in gamm. Fixed.
* minor change to te constructor. Any unpenalized margin now has
corresponding penalty rank dropped along with penalty.
* Code for handling sp's fixed at exactly zero was badly thought out, and
could easily fail. fixed.
* TPRS prediction code made more efficient, partly by use of BLAS. Large
dataset setup also made more efficient using BLAS.
* smooth.construct.tensor.smooth.spec now handles marginals with factor
arguments properly (there was a knot generation bug in this case)
* bam now uses LAPACK version of qr, for model matrix QR, since it's
faster and uses BLAS.
1.7-13 1.7-13
** The Lanczos routine in mat.c was using a stupidly inefficient check for ** The Lanczos routine in mat.c was using a stupidly inefficient check for
......
...@@ -15,9 +15,10 @@ for large datasets. \code{bam} can also compute on a cluster set up by the \link ...@@ -15,9 +15,10 @@ for large datasets. \code{bam} can also compute on a cluster set up by the \link
} }
\usage{ \usage{
bam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL, bam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action=na.omit, offset=NULL,method="REML",control=list(), na.action=na.omit, offset=NULL,method="fREML",control=list(),
scale=0,gamma=1,knots=NULL,sp=NULL,min.sp=NULL,paraPen=NULL, scale=0,gamma=1,knots=NULL,sp=NULL,min.sp=NULL,paraPen=NULL,
chunk.size=10000,rho=0,sparse=FALSE,cluster=NULL,gc.level=1,...) chunk.size=10000,rho=0,sparse=FALSE,cluster=NULL,gc.level=1,
use.chol=FALSE,samfrac=1,...)
} }
%- maybe also `usage' for other objects documented here. %- maybe also `usage' for other objects documented here.
...@@ -60,7 +61,8 @@ included in \code{formula}: this conforms to the behaviour of ...@@ -60,7 +61,8 @@ included in \code{formula}: this conforms to the behaviour of
\item{method}{The smoothing parameter estimation method. \code{"GCV.Cp"} to use GCV for unknown scale parameter and \item{method}{The smoothing parameter estimation method. \code{"GCV.Cp"} to use GCV for unknown scale parameter and
Mallows' Cp/UBRE/AIC for known scale. \code{"GACV.Cp"} is equivalent, but using GACV in place of GCV. \code{"REML"} Mallows' Cp/UBRE/AIC for known scale. \code{"GACV.Cp"} is equivalent, but using GACV in place of GCV. \code{"REML"}
for REML estimation, including of unknown scale, \code{"P-REML"} for REML estimation, but using a Pearson estimate for REML estimation, including of unknown scale, \code{"P-REML"} for REML estimation, but using a Pearson estimate
of the scale. \code{"ML"} and \code{"P-ML"} are similar, but using maximum likelihood in place of REML. } of the scale. \code{"ML"} and \code{"P-ML"} are similar, but using maximum likelihood in place of REML. Default
\code{"fREML"} uses fast REML computation.}
\item{control}{A list of fit control parameters to replace defaults returned by \item{control}{A list of fit control parameters to replace defaults returned by
\code{\link{gam.control}}. Any control parameters not supplied stay at their default values.} \code{\link{gam.control}}. Any control parameters not supplied stay at their default values.}
...@@ -116,6 +118,14 @@ single machine). See details and example code. ...@@ -116,6 +118,14 @@ single machine). See details and example code.
} }
\item{gc.level}{to keep the memory footprint down, it helps to call the garbage collector often, but this takes \item{gc.level}{to keep the memory footprint down, it helps to call the garbage collector often, but this takes
a substantial amount of time. Setting this to zero means that garbage collection only happens when R decides it should. Setting to 2 gives frequent garbage collection. 1 is in between.} a substantial amount of time. Setting this to zero means that garbage collection only happens when R decides it should. Setting to 2 gives frequent garbage collection. 1 is in between.}
\item{use.chol}{By default \code{bam} uses a very stable QR update approach to obtaining the QR decomposition
of the model matrix. For well conditioned models an alternative accumulates the crossproduct of the model matrix
and then finds its Choleski decomposition, at the end. This is somewhat more efficient, computationally.}
\item{samfrac}{For very large sample size Generalized additive models the number of iterations needed for the model fit can
be reduced by first fitting a model to a random sample of the data, and using the results to supply starting values. This initial fit is run with sloppy convergence tolerances, so is typically very low cost. \code{samfrac} is the sampling fraction to use. 0.1 is often reasonable. }
\item{...}{further arguments for \item{...}{further arguments for
passing on e.g. to \code{gam.fit} (such as \code{mustart}). } passing on e.g. to \code{gam.fit} (such as \code{mustart}). }
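The idea behind `use.chol` can be sketched in base R. This is an illustration only, not the code `bam` runs: it fits a plain unpenalized least squares problem, and the chunk size, the simulated data and all variable names are assumptions made for the example. It accumulates the crossproduct of the model matrix over chunks of rows, takes one Choleski factor at the end, and compares the resulting coefficients with those from a QR decomposition of the full matrix.

```r
## Sketch: chunked crossproduct + Choleski versus QR (illustrative, not bam internals)
set.seed(2)
n <- 2000; p <- 5
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  ## well conditioned model matrix
y <- drop(X %*% seq_len(p)) + rnorm(n)

## accumulate X'X and X'y one chunk of rows at a time
chunks <- split(seq_len(n), ceiling(seq_len(n) / 500))
XtX <- matrix(0, p, p); Xty <- numeric(p)
for (ind in chunks) {
  Xc <- X[ind, , drop = FALSE]
  XtX <- XtX + crossprod(Xc)
  Xty <- Xty + drop(crossprod(Xc, y[ind]))
}

## Choleski factor found once, at the end: XtX = R'R with R upper triangular
R <- chol(XtX)
beta.chol <- backsolve(R, forwardsolve(t(R), Xty))

## reference solution from a QR decomposition of the whole matrix
beta.qr <- qr.coef(qr(X), y)
```

Forming the crossproduct squares the condition number of the problem, which is why this route suits well conditioned models and why the QR approach is described as the more stable one.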
...@@ -184,18 +194,18 @@ The negbin family is only supported for the *known theta* case. ...@@ -184,18 +194,18 @@ The negbin family is only supported for the *known theta* case.
\code{\link{linear.functional.terms}}, \code{\link{s}}, \code{\link{linear.functional.terms}}, \code{\link{s}},
\code{\link{te}} \code{\link{predict.gam}}, \code{\link{te}} \code{\link{predict.gam}},
\code{\link{plot.gam}}, \code{\link{summary.gam}}, \code{\link{gam.side}}, \code{\link{plot.gam}}, \code{\link{summary.gam}}, \code{\link{gam.side}},
\code{\link{gam.selection}},\code{\link{mgcv}}, \code{\link{gam.control}} \code{\link{gam.selection}}, \code{\link{gam.control}}
\code{\link{gam.check}}, \code{\link{linear.functional.terms}} \code{\link{negbin}}, \code{\link{magic}},\code{\link{vis.gam}} \code{\link{gam.check}}, \code{\link{linear.functional.terms}} \code{\link{negbin}}, \code{\link{magic}},\code{\link{vis.gam}}
} }
\examples{ \examples{
library(mgcv) library(mgcv)
## following is not *very* large, for obvious reasons... ## Some moderately large examples...
dat <- gamSim(1,n=15000,dist="normal",scale=20) dat <- gamSim(1,n=100000,dist="normal",scale=20)
bs <- "ps";k <- 20 bs <- "cr";k <- 20
b <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+ b <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="REML") s(x3,bs=bs,k=k),data=dat)
summary(b) summary(b)
plot(b,pages=1,rug=FALSE) ## plot smooths, but not rug plot(b,pages=1,rug=FALSE) ## plot smooths, but not rug
plot(b,pages=1,rug=FALSE,seWithMean=TRUE) ## `with intercept' CIs plot(b,pages=1,rug=FALSE,seWithMean=TRUE) ## `with intercept' CIs
...@@ -206,7 +216,7 @@ summary(ba) ...@@ -206,7 +216,7 @@ summary(ba)
## A Poisson example... ## A Poisson example...
dat <- gamSim(1,n=15000,dist="poisson",scale=.1) dat <- gamSim(1,n=35000,dist="poisson",scale=.1)
system.time(b1 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+ system.time(b1 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="ML",family=poisson())) s(x3,bs=bs,k=k),data=dat,method="ML",family=poisson()))
b1 b1
...@@ -227,13 +237,16 @@ system.time(b2 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+ ...@@ -227,13 +237,16 @@ system.time(b2 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
system.time(b2 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+ system.time(b2 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="ML",family=poisson(),cluster=cl)) s(x3,bs=bs,k=k),data=dat,method="ML",family=poisson(),cluster=cl))
fv <- predict(b2,cluster=cl) ## parallel prediction
if (!is.null(cl)) stopCluster(cl) if (!is.null(cl)) stopCluster(cl)
b2 b2
## Sparse smoothers example... ## Sparse smoother example...
b3 <- bam(y ~ te(x0,x1,bs="ps",k=10,np=FALSE)+s(x2,bs="ps",k=30)+ dat <- gamSim(1,n=10000,dist="poisson",scale=.1)
s(x3,bs="ps",k=30),data=dat,method="ML", system.time( b3 <- bam(y ~ te(x0,x1,bs="ps",k=10,np=FALSE)+
family=poisson(),sparse=TRUE) s(x2,bs="ps",k=30)+s(x3,bs="ps",k=30),data=dat,
method="REML",family=poisson(),sparse=TRUE))
b3 b3
} }
......
...@@ -42,6 +42,10 @@ doing this, then \code{k} was large enough. (Change in the smoothness selection ...@@ -42,6 +42,10 @@ doing this, then \code{k} was large enough. (Change in the smoothness selection
and/or the effective degrees of freedom, when \code{k} is increased, provide the obvious and/or the effective degrees of freedom, when \code{k} is increased, provide the obvious
numerical measures for whether the fit has changed substantially.) numerical measures for whether the fit has changed substantially.)
\code{\link{gam.check}} runs a simple simulation based check on the basis dimensions, which can
help to flag up terms for which \code{k} is too low. Grossly too small \code{k}
will also be visible from partial residuals available with \code{\link{plot.gam}}.
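For a one-dimensional smooth, the check can be sketched in base R as follows. The sketch mirrors the 1-D branch of `k.check` in the `R/plots.r` hunk of this commit, but `k_index_1d` is a hypothetical helper name and the simulated residuals are assumptions chosen for illustration. Residuals are ordered by the covariate, a variance estimate is formed from differences of neighbouring residuals, and that estimate is compared with the same statistic computed after shuffling the residuals.

```r
## Sketch of the 1-D randomization test (hypothetical helper, not part of mgcv)
k_index_1d <- function(rsd, x, n.rep = 400) {
  ## variance estimate from differences of residuals that are neighbours in x
  e <- diff(rsd[order(x)])
  v.obs <- mean(e^2) / 2
  ## reference distribution: same statistic after shuffling the residuals
  ve <- numeric(n.rep)
  for (i in seq_len(n.rep)) {
    e <- diff(sample(rsd))
    ve[i] <- mean(e^2) / 2
  }
  list(k.index = v.obs / mean(rsd^2), p.value = mean(ve < v.obs))
}

set.seed(1)
x <- runif(500)
patterned <- sin(6 * pi * x) + rnorm(500, sd = 0.3)  ## residuals with leftover structure in x
clean <- rnorm(500)                                   ## residuals with no structure
res.patterned <- k_index_1d(patterned, x)
res.clean <- k_index_1d(clean, x)
```

When pattern in the covariate remains in the residuals, neighbouring residuals are more alike than shuffled ones, so the k-index falls well below 1 and the p-value is small. Residuals without such pattern give a k-index near 1.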
One scenario that can cause confusion is this: a model is fitted with One scenario that can cause confusion is this: a model is fitted with
\code{k=10} for a smooth term, and the EDF for the term is estimated as 7.6, \code{k=10} for a smooth term, and the EDF for the term is estimated as 7.6,
some way below the maximum of 9. The model is then refitted with \code{k=20} some way below the maximum of 9. The model is then refitted with \code{k=20}
@@ -68,14 +72,17 @@ Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. CRC.
 \examples{
 ## Simulate some data ....
 library(mgcv)
-set.seed(0)
+set.seed(1)
 dat <- gamSim(1,n=400,scale=2)
 ## fit a GAM with quite low `k'
 b<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),data=dat)
-plot(b,pages=1)
+plot(b,pages=1,residuals=TRUE) ## hint of a problem in s(x2)
+## the following suggests a problem with s(x2)
+gam.check(b)
-## Economical tactic (see below for more obvious approach)....
+## Another approach (see below for more obvious method)....
 ## check for residual pattern, removable by increasing `k'
 ## typically `k', below, should be substantially larger than
 ## the original `k', but certainly less than n/2.
@@ -87,13 +94,19 @@ gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
 gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
 gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
+## refit...
+b <- gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=20)+s(x3,k=6),data=dat)
+gam.check(b) ## better
 ## similar example with multi-dimensional smooth
 b1 <- gam(y~s(x0)+s(x1,x2,k=15)+s(x3),data=dat)
 rsd <- residuals(b1)
 gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
 gam(rsd~s(x1,x2,k=100,bs="ts"),gamma=1.4,data=dat) ## `k' too low
 gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
+gam.check(b1) ## shows same problem
 ## and a `te' example
 b2 <- gam(y~s(x0)+te(x1,x2,k=4)+s(x3),data=dat)
 rsd <- residuals(b2)
@@ -101,10 +114,15 @@ gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
 gam(rsd~te(x1,x2,k=10,bs="cs"),gamma=1.4,data=dat) ## `k' too low
 gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
+gam.check(b2) ## shows same problem
 ## same approach works with other families in the original model
 dat <- gamSim(1,n=400,scale=.25,dist="poisson")
-bp<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),
-family=poisson,data=dat)
+bp<-gam(y~s(x0,k=5)+s(x1,k=5)+s(x2,k=5)+s(x3,k=5),
+family=poisson,data=dat,method="ML")
+gam.check(bp)
 rsd <- residuals(bp)
 gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
 gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
......
@@ -26,7 +26,13 @@ Smooth terms are specified by expressions of the form: \cr
 where \code{x1}, \code{x2}, etc. are the covariates which the smooth
 is a function of, and \code{k} is the dimension of the basis used to
represent the smooth term. If \code{k} is not