Commit ac677317 authored by Dirk Eddelbuettel

Import Upstream version 1.7-8

parent b404fe09
Package: mgcv
-Version: 1.7-6
+Version: 1.7-8
Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org>
Title: GAMs with GCV/AIC/REML smoothness estimation and GAMMs by PQL
@@ -12,6 +12,6 @@ Imports: graphics, stats, nlme, Matrix
Suggests: nlme (>= 3.1-64), splines, Matrix
LazyLoad: yes
License: GPL (>= 2)
-Packaged: 2011-04-14 11:16:32 UTC; simon
+Packaged: 2011-10-10 17:19:53 UTC; simon
Repository: CRAN
-Date/Publication: 2011-04-14 17:01:29
+Date/Publication: 2011-10-10 17:51:59
ed855c1a5ee811c55fe614288ddb85da *DESCRIPTION
abc5e760b0f5dd22c81d7614c0b304f0 *NAMESPACE
4553efb64e4def029f7c5bdfc0168936 *R/bam.r
257095aabda92b283b68e1b4a55a7976 *R/gam.fit3.r
902657a0ee2dedc3fdfa501bf3b37c5b *R/gam.sim.r
085b8b34e68292c173a22e55aca86562 *R/gamm.r
143fbfcc4e84daea905504cdcb88382d *R/mgcv.r
027eb86db613780842ec085493fd7e79 *R/plots.r
bff0e42dd1b9de24f3834d4d08245252 *R/smooth.r
fe9745f610246ee1f31eb915ca0d76a9 *R/sparse.r
d70df05baa5919d6971b26b226e88ae8 *changeLog
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda
88d77139cc983317b6acd8c5f1252ab9 *gnugpl2.txt
f693920e12f8a6f2b6cab93648628150 *index
9388a0979dab4fe1988b777ae4dcfb0a *inst/CITATION
5bf12ddc0dab9daae72271b96a15c539 *man/Predict.matrix.Rd
c51c9b8c9c73f81895176ded39b91394 *man/Predict.matrix.cr.smooth.Rd
612ab6354541ebe38a242634d73b66ba *man/Tweedie.Rd
bd6e48ef8323677776e6e6e213482bc0 *man/anova.gam.Rd
c06df4877abdaf07e67fc32c85fd1a87 *man/bam.Rd
b385d6d5419d0d6aefe03af1a79d5c4e *man/bam.update.Rd
4e925cb579f4693d1b8ec2d5092c0b37 *man/cSplineDes.Rd
9753a8051d9b495d9855537f3f26f491 *man/choose.k.Rd
c03748964ef606621418e428ae49b103 *man/columb.Rd
4196ba59f1fa8449c9cd0cab8a347978 *man/concurvity.Rd
f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd
702afc89845d640c9359253e394bcadc *man/extract.lme.cov.Rd
44ad0563add1c560027d502ce41483f5 *man/fix.family.link.Rd
75373268c1203ee110e1eede633752aa *man/fixDependence.Rd
9ac808f5a2a43cf97f24798c0922c9bf *man/formXtViX.Rd
34308f4ada8e2aca9981a17794dac30b *man/formula.gam.Rd
6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd
6e4bbb08f94dbe91d808f8ad1335ff3b *man/gam.Rd
aeb7ec80d75244bc4f2f2fd796f86efd *man/gam.check.Rd
847599e287ecf79fbb7be2cb06d72742 *man/gam.control.Rd
fd98327327ba74bb1a61a6519f12e936 *man/gam.convergence.Rd
58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/gam.fit.Rd
32b5cd1b6f63027150817077f3914cf4 *man/gam.fit3.Rd
dd35a8a851460c2d2106c03d544c8241 *man/gam.models.Rd
468d116a2ef9e60f683af48f4f100ef5 *man/gam.outer.Rd
7e5ba69a44bc937ddca04e4f153c7975 *man/gam.selection.Rd
71e12bd2f7a166ca573aaf27fd984b25 *man/gam.side.Rd
78588cf8ed0af8eca70bba3bbed64dbe *man/gam.vcomp.Rd
278e0b3aa7baa44dfb96e235ceb07f4c *man/gam2objective.Rd
4d5b3b1266edc31ce3b0e6be11ee9166 *man/gamObject.Rd
0ac5fb78c9db628ce554a8f68588058c *man/gamSim.Rd
d31bdc3f9ce07fb1caefe5a609fb3bfb *man/gamm.Rd
1f5d723f2fa931297940e1a4a840d792 *man/get.var.Rd
a2ea1233d43fac89e0cacbc09a8d31e2 *man/in.out.Rd
2f222eeeb3d7bc42f93869bf8c2af58a *man/influence.gam.Rd
bafea2eef12fdc819f8ac1fb41d8b914 *man/initial.sp.Rd
aba56a0341ba9526a302e39d33aa9042 *man/interpret.gam.Rd
6e6a0ae08d83d92ee137bb0c0ddf512b *man/ldTweedie.Rd
58e73ac26b93dc9d28bb27c8699e12cf *man/linear.functional.terms.Rd
5de18c3ad064a5bda4f9027d9455170a *man/logLik.gam.Rd
611f5f6acac9c5f40869c01cf7f75dd3 *man/ls.size.Rd
8ef61987727e1b857edf3a366d21b66c *man/magic.Rd
496388445d8cde9b8e0c3917cbe7461d *man/magic.post.proc.Rd
5c55658a478bd34d66daad46e324d7f4 *man/mgcv-FAQ.Rd
904b19ba280010d85d59a4b21b6d2f94 *man/mgcv-package.Rd
eaf9a06f1a4849970f1130263c826441 *man/mgcv.Rd
bb420a39f1f8155f0084eb9260fad89c *man/mgcv.control.Rd
18a9858b6f3ffde288b0bf9e1a5da2f6 *man/model.matrix.gam.Rd
3edd2618dcb4b366eeb405d77f3f633c *man/mono.con.Rd
3a4090ac778273861d97077681a55df2 *man/mroot.Rd
8aea04d0764d195409da798b33516051 *man/negbin.Rd
41de8762baab4fc0cf1224df168520fe *man/new.name.Rd
dffa2d51c704c610088fa02d7220b05e *man/notExp.Rd
150d7f8a427117353c5c2e466ff0bfae *man/notExp2.Rd
95b3e6686e9557b3278e21e350704ce9 *man/null.space.dimension.Rd
3720c8867aa31d7705dae102eeaa2364 *man/pcls.Rd
717d796acbaab64216564daf898b6d04 *man/pdIdnot.Rd
8c0f8575b427f30316b639a326193aeb *man/pdTens.Rd
b388d29148264fd3cd636391fde87a83 *man/pen.edf.Rd
de454d1dc268bda008ff46639a89acec *man/place.knots.Rd
82f8427d8807e1ef1999798b8972a7ca *man/plot.gam.Rd
3d1484b6c3c2ea93efe41f6fc3801b8d *man/polys.plot.Rd
fdd6b7e03fde145e274699fe9ea8996c *man/predict.gam.Rd
a594eb641cae6ba0b83d094acf4a4f81 *man/print.gam.Rd
d837c87f037760c81906a51635476298 *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
827743e1465089a859a877942ba2f4a9 *man/random.effects.Rd
37669f97e17507f3ae2d6d1d74feb9d7 *man/residuals.gam.Rd
472cc8de5077f5511fe7eb2ad454767e *man/rig.Rd
7258dfc0149fff020054117fd2ee6bd8 *man/s.Rd
690dddb35fd5986a6eeb39dd79fa33f9 *man/slanczos.Rd
47f5ed6af7c784f26e0fe8e027ee1d90 *man/smooth.construct.Rd
10a456bc3d7151f60be24a7e3bba3d30 *man/smooth.construct.ad.smooth.spec.Rd
9f05449c0114748608f99693ec25cf3f *man/smooth.construct.cr.smooth.spec.Rd
995e79cf7a23027c60d113481083c1cd *man/smooth.construct.ds.smooth.spec.Rd
70738776c2f6f885e08aad9396b2f494 *man/smooth.construct.fs.smooth.spec.Rd
651d90947f23423ac6cf65176d318932 *man/smooth.construct.mrf.smooth.spec.Rd
78688702b6396f24692b74c554483428 *man/smooth.construct.ps.smooth.spec.Rd
0b134a171b9c5f8194787bf243f9c962 *man/smooth.construct.re.smooth.spec.Rd
9ff90a4670411d01d2cfaa0a2bd464cc *man/smooth.construct.sos.smooth.spec.Rd
8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd
a84c09609eedb6902b3ef7c8885010dd *man/smooth.construct.tp.smooth.spec.Rd
1de9c315702476fd405a85663bb32d1c *man/smooth.terms.Rd
6aa3bcbd3198d2bbc3b9ca12c9c9cd7e *man/smoothCon.Rd
5ae47a140393009e3dba7557af175170 *man/sp.vcov.Rd
83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd
a17981f0fa2a6a50e637c98c672bfc45 *man/step.gam.Rd
75a0ca35d038c3738ab26705cd6a6bfe *man/summary.gam.Rd
22b571cbc0bd1e31f195ad927434c27e *man/t2.Rd
04076444b2c99e9287c080298f9dc1d7 *man/te.Rd
c3c23641875a293593fe4ef032b44aae *man/tensor.prod.model.matrix.Rd
fbd45cbb1931bdb5c0de044e22fdd028 *man/uniquecombs.Rd
6e4e84b961c3ade48e23acebacd39c70 *man/vcov.gam.Rd
0e045fc94ea00c17e703ae07923f549b *man/vis.gam.Rd
becbe3e1f1588f7292a74a97ef07a9ae *po/R-de.po
0bdfcf98961b0d52b60f806dc1dba77e *po/R-en@quot.po
9126ed91030cd1c168d669854cf1f510 *po/R-fr.po
756303bf89ba1b0da99771c0850de3c3 *po/R-mgcv.pot
3550a794504bdfd4f75a83de1148bb40 *po/de.po
93f72334356fe6f05a64e567efd35c8e *po/en@quot.po
1a4a267ddcb87bb83f09c291d3e97523 *po/fr.po
813514ea4e046ecb4563eb3ae8aa202a *po/mgcv.pot
cd54024d76a9b53dc17ef26323fc053f *src/Makevars
a25e39145f032e8e37433651bba92ddf *src/gcv.c
2798411be2cb3748b8bd739f2d2016ee *src/gcv.h
d1a84be43974752e429c66bb7bdf7f0e *src/gdi.c
49af97195accb65adc75620183d39a4c *src/general.h
da280ee5538a828afde0a4f6c7b8328a *src/init.c
b42f1336e4bfd6d909d40573d09412af *src/magic.c
edf95099e3559dfa083270cd1f4bd4b3 *src/mat.c
770a885a5bb6c7a37b3d08709325949b *src/matrix.c
54ce9309b17024ca524e279612a869d6 *src/matrix.h
66946ed100818db02c6775c868b2069c *src/mgcv.c
9c7041e09719bb67e0cd9717096dde82 *src/mgcv.h
2a1c4f1c10510a4338e5cc34defa65f6 *src/misc.c
7e0ba698a21a01150fda519661ef9857 *src/qp.c
cd563899be5b09897d1bf36a7889caa0 *src/qp.h
e9cab4a461eb8e086a0e4834cbf16f30 *src/sparse-smooth.c
319f8d9fd086f39b529ee966b18006b0 *src/tprs.c
6279fd50c8e27f485c8707a20aebbd3d *src/tprs.h
@@ -48,6 +48,7 @@ export(anova.gam, bam, bam.update, concurvity, cSplineDes,
smooth.construct.sos.smooth.spec,
smooth.construct.ad.smooth.spec,
summary.gam,sp.vcov,
+spasm.construct,spasm.sp,spasm.smooth,
t2,te,tensor.prod.model.matrix,tensor.prod.penalties,
Tweedie,uniquecombs, vcov.gam, vis.gam)
@@ -60,7 +61,7 @@ importFrom(stats,.checkMFClasses,.getXlevels, as.formula, anova,anova.glmlist,co
predict,printCoefmat ,quantile,qqnorm, reformulate,residuals,rnorm,runif,termplot,terms,terms.formula, uniroot,
var,vcov)
importFrom(nlme,Dim,corMatrix,logDet,pdConstruct,pdFactor,pdMatrix)
-importFrom(Matrix,t,mean,colMeans,colSums)
+importFrom(Matrix,t,mean,colMeans,colSums,chol,solve,diag)
S3method(anova, gam)
S3method(influence, gam)
@@ -93,8 +94,6 @@ S3method(pdMatrix,pdIdnot)
S3method(solve,pdIdnot)
S3method(summary,pdIdnot)
S3method(smooth.construct,ad.smooth.spec)
S3method(smooth.construct,ps.smooth.spec)
S3method(smooth.construct,cp.smooth.spec)
@@ -114,6 +113,11 @@ S3method(Predict.matrix,cyclic.smooth)
S3method(Predict.matrix,tensor.smooth)
S3method(Predict.matrix,tprs.smooth)
+S3method(spasm.construct,cus)
+S3method(spasm.sp,cus)
+S3method(spasm.smooth,cus)
+S3method(smooth2random,mgcv.smooth)
+S3method(smooth2random,fs.interaction)
+S3method(smooth2random,tensor.smooth)
+S3method(smooth2random,t2.smooth)
@@ -40,10 +40,13 @@ gam.reparam <- function(rS,lsp,deriv)
r.tol = as.double(r.tol),
fixed_penalty = as.integer(fixed.penalty))
S <- matrix(oo$S,q,q)
S <- (S+t(S))*.5
p <- abs(diag(S))^.5 ## by Choleski, p can not be zero if S +ve def
p[p==0] <- 1 ## but it's possible to make a mistake!!
##E <- t(t(chol(t(t(S/p)/p)))*p)
-E <- t(mroot(t(t(S/p)/p),rank=q)*p) ## the square root S, with column separation
+St <- t(t(S/p)/p)
+St <- (St + t(St))*.5 ## force exact symmetry -- avoids very rare mroot fails
+E <- t(mroot(St,rank=q)*p) ## the square root S, with column separation
Qs <- matrix(oo$Qs,q,q) ## the reparameterization matrix t(Qs)%*%S%*%Qs -> S
k0 <- 1
for (i in 1:length(rS)) { ## unpack the rS in the new space
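A standalone sketch of the issue this guards against (illustrative R only, not package code): diagonal rescaling of an exactly symmetric matrix can leave rounding-level asymmetry, which is what the explicit symmetrization above removes.

set.seed(1)
S <- crossprod(matrix(rnorm(25),5,5))  ## exactly symmetric +ve def matrix
p <- abs(diag(S))^.5
St <- t(t(S/p)/p)      ## diag(1/p) %*% S %*% diag(1/p)
max(abs(St - t(St)))   ## can be non-zero at rounding level
St <- (St + t(St))*.5  ## force exact symmetry before mroot/chol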
@@ -257,7 +260,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
## need an initial `null deviance' to test for initial divergence...
## null.coef <- qr.coef(qr(x),family$linkfun(mean(y)+0*y))
## null.coef[is.na(null.coef)] <- 0
-null.eta <- x%*%null.coef + as.numeric(offset)
+null.eta <- as.numeric(x%*%null.coef + as.numeric(offset))
old.pdev <- sum(dev.resids(y, linkinv(null.eta), weights)) + t(null.coef)%*%St%*%null.coef
## ... if the deviance exceeds this then there is an immediate problem
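The underlying gotcha (illustrative R only): %*% returns a one-column matrix, not a vector, which the added as.numeric() coercion fixes.

X <- matrix(1,3,2); b <- c(1,1); offset <- 0
is.matrix(X %*% b + offset)               ## TRUE: a 3 x 1 matrix
null.eta <- as.numeric(X %*% b + offset)  ## a plain numeric vector, as in the fix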
@@ -1962,7 +1965,10 @@ fix.family.ls<-function(fam)
if (!is.null(fam$ls)) return(fam)
family <- fam$family
if (family=="gaussian") {
-fam$ls <- function(y,w,n,scale) c(-sum(w)*log(2*pi*scale)/2,-sum(w)/(2*scale),sum(w)/(2*scale*scale))
+fam$ls <- function(y,w,n,scale) {
+nobs <- sum(w>0)
+c(-nobs*log(2*pi*scale)/2 + sum(log(w[w>0]))/2,-nobs/(2*scale),nobs/(2*scale*scale))
+}
return(fam)
}
if (family=="poisson") {
@@ -1982,12 +1988,20 @@ fix.family.ls<-function(fam)
if (family=="Gamma") {
fam$ls <- function(y,w,n,scale) {
res <- rep(0,3)
-k <- -lgamma(1/scale) - log(scale)/scale - 1/scale
-res[1] <- sum(w*(k-log(y)))
+y <- y[w>0];w <- w[w>0]
+scale <- scale/w
+k <- -lgamma(1/scale) - log(scale)/scale - 1/scale
+res[1] <- sum(k-log(y))
k <- (digamma(1/scale)+log(scale))/(scale*scale)
-res[2] <- sum(w*k)
+res[2] <- sum(k/w)
k <- (-trigamma(1/scale)/(scale) + (1-2*log(scale)-2*digamma(1/scale)))/(scale^3)
-res[3] <- sum(w*k)
+res[3] <- sum(k/w^2)
+# k <- -lgamma(1/scale) - log(scale)/scale - 1/scale
+# res[1] <- sum(w*(k-log(y)))
+# k <- (digamma(1/scale)+log(scale))/(scale*scale)
+# res[2] <- sum(w*k)
+# k <- (-trigamma(1/scale)/(scale) + (1-2*log(scale)-2*digamma(1/scale)))/(scale^3)
+# res[3] <- sum(w*k)
res
}
return(fam)
@@ -1995,12 +2009,18 @@ fix.family.ls<-function(fam)
if (family=="quasi"||family=="quasipoisson"||family=="quasibinomial") {
## fam$ls <- function(y,w,n,scale) rep(0,3)
## Uses extended quasi-likelihood form...
-fam$ls <- function(y,w,n,scale) c(-sum(w)*log(scale)/2,-sum(w)/(2*scale),sum(w)/(2*scale*scale))
+fam$ls <- function(y,w,n,scale) {
+nobs <- sum(w>0)
+c(-nobs*log(scale)/2 + sum(log(w[w>0]))/2,-nobs/(2*scale),nobs/(2*scale*scale))
+}
return(fam)
}
if (family=="inverse.gaussian") {
-fam$ls <- function(y,w,n,scale) c(-sum(w*log(2*pi*scale*y^3))/2,
--sum(w)/(2*scale),sum(w)/(2*scale*scale))
+fam$ls <- function(y,w,n,scale) {
+nobs <- sum(w>0)
+c(-sum(log(2*pi*scale*y^3))/2 + sum(log(w[w>0]))/2,-nobs/(2*scale),nobs/(2*scale*scale))
+## c(-sum(w*log(2*pi*scale*y^3))/2,-sum(w)/(2*scale),sum(w)/(2*scale*scale))
+}
return(fam)
}
stop("family not recognised")
......
** denotes quite substantial/important changes
*** denotes really big changes
ISSUES:
1.7-8
* gamm.setup fix. Bug introduced in 1.7-7 whereby gamm with no smooths would
fail.
* gamm now gives its returned object the class "gamm".
1.7-7
* "fs" smooth factor interaction class introduced, for smooth factor
interactions where the smoothing parameters are the same at each factor level.
Very efficient with gamm, so good for e.g. individual subject smooths.
* qq.gam default method modified for increased power.
* "re" terms now allowed as tensor product marginals.
* log saturated likelihoods modified w.r.t. weight handling, so that weights
are treated as modifying the scale parameter, when scale parameter is free.
i.e. obs specific scale parameter is overall scale parameter divided by
obs weight. This ensures that when the scale parameter is free, RE/ML based
inference is invariant to multiplicative rescaling of weights.
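  (A quick way to see this invariance, as an illustrative sketch rather than package code:
  refit with rescaled weights and compare.)
  library(mgcv)
  set.seed(2)
  dat <- gamSim(1,n=200,scale=2)
  w <- runif(200,.5,1.5)
  b1 <- gam(y~s(x0)+s(x1),data=dat,weights=w,method="REML")
  b2 <- gam(y~s(x0)+s(x1),data=dat,weights=w*10,method="REML")
  range(coef(b1)-coef(b2))  ## should be essentially zero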
* te and t2 now accept lists for 'm'. This allows more flexibility with
marginals that can have vector 'm' arguments (Duchon splines, P splines).
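  (Illustrative call, assuming the documented list syntax: cubic P-spline marginals
  with different penalty orders.)
  library(mgcv)
  dat <- gamSim(1,n=400,scale=2)
  b <- gam(y~te(x0,x1,bs="ps",m=list(c(2,3),c(2,1))),data=dat)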
* minor mroot fix/gam.reparam fix. Could declare symmetric matrix
not symmetric and halt gam fit.
* argument sparse added to bam to allow exploitation of sparsity in fitting,
but results disappointing.
* "mrf" now evaluates rank of penalty null space numerically (previously
assumed it was always one, which it need not be with e.g. a supplied
penalty).
* gam.side now corrects the penalty rank in smooth objects that have
been constrained, to account for the constraint. Avoids some nested
model failures.
* gamm and gamm.setup code restructured to allow smooths nested in factors
and for cleaner object oriented conversion of smooths to random effects.
* gam.fit3 bug. Could fail on immediate divergence as null.eta was a matrix.
* slanczos bug fixes --- could segfault if k negative. Could also fail to
return correct values when k small and kl < 0 (due to a convergence
testing bug, now fixed).
* gamm bug --- could fail if the only smooth was a fixed one, by looking for
a non-existent sp vector. Fixed.
* 'cc' Predict.matrix bug fix - prediction failed for single points.
* summary.gam failed for single coefficient random effects. Fixed.
* gam returns rV, where rV%*%t(rV)*scale is the Bayesian cov matrix.
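  (Illustrative check of this relationship, matching the gamObject documentation;
  not package code.)
  library(mgcv)
  dat <- gamSim(1,n=200,scale=2)
  b <- gam(y~s(x0)+s(x1),data=dat,method="REML")
  range(tcrossprod(b$rV)*b$sig2 - b$Vp)  ## should be essentially zero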
1.7-6
** factor `by' variable handling extended: if a by variable is an
......
@@ -17,7 +17,7 @@ for large datasets.
bam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action=na.omit, offset=NULL,method="REML",control=list(),
scale=0,gamma=1,knots=NULL,sp=NULL,min.sp=NULL,paraPen=NULL,
-chunk.size=10000,rho=0,...)
+chunk.size=10000,rho=0,sparse=FALSE,...)
}
%- maybe also `usage' for other objects documented here.
@@ -106,6 +106,10 @@ if smooths share smoothing parameters).}
\item{rho}{An AR1 error model can be used for the residuals (based on dataframe order) of Gaussian-identity
link models. This is the AR1 correlation parameter.}
+\item{sparse}{If all smooths are P-splines and all tensor products are of the form \code{te(...,bs="ps",np=FALSE)}
+then in principle computation could be made faster using sparse matrix methods, and you could set this to \code{TRUE}.
+In practice the speed up is disappointing, and the computation is less well conditioned than the default. See details.}
\item{...}{further arguments for
passing on e.g. to \code{gam.fit} (such as \code{mustart}). }
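An illustrative call using the rho argument (sketch with arbitrary values, not from the package documentation):

library(mgcv)
dat <- gamSim(1,n=20000,dist="normal",scale=2)
b <- bam(y~s(x0)+s(x1),data=dat,rho=0.6)  ## AR1(0.6) residuals, dataframe order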
@@ -123,17 +127,26 @@ block processing, fitting takes place, without the need to ever form the whole m
In the generalized case, the same trick is used with the weighted model matrix and weighted pseudodata, at each step of the PIRLS.
Smoothness selection is performed on the working model at each stage (performance oriented iteration), to maintain the
-small memory footprint. This is easy to
-justify in the case of GCV or Cp/UBRE/AIC based model selection, and also for REML/ML with known scale parameter. It is slightly
-less well justified in the REML/ML unknown scale parameter case, but it seems unlikely that this case actually behaves any less well.
+small memory footprint. This is trivial to justify in the case of GCV or Cp/UBRE/AIC based model selection, and
+for REML/ML is justified via the asymptotic multivariate normality of Q'z where z is the IRLS pseudodata.
Note that POI is not as stable as the default nested iteration used with \code{\link{gam}}, but that for very large, information rich,
datasets, this is unlikely to matter much.
-Note that it is possible to spend most of the computational time on basis evaluation, if an expensive basis is used. In practice
+Note also that it is possible to spend most of the computational time on basis evaluation, if an expensive basis is used. In practice
this means that the default \code{"tp"} basis should be avoided: almost any other basis (e.g. \code{"cr"} or \code{"ps"})
can be used in the 1D case, and tensor product smooths (\code{te}) are typically much less costly in the multi-dimensional case.
+If the argument \code{sparse=TRUE} then QR updating is replaced by an alternative scheme, in which the model matrix is stored whole
+as a sparse matrix. This only makes sense if all smooths are P-splines and all tensor products are of the
+form \code{te(...,bs="ps",np=FALSE)}, but no check is made. The computations are then based on the Choleski decomposition of
+the crossproduct of the sparse model matrix. Although this crossproduct is nearly dense, sparsity should make its
+formation efficient, which is useful as it is the leading order term in the operations count. However there is no benefit
+in using sparse methods to form the Choleski decomposition, given that the crossproduct is dense.
+In practice the sparse matrix handling overheads mean that modest or no speed ups are produced
+by this approach, while the computation is less stable than the default, and the memory footprint often higher
+(but please let the author know if you find an example where the speedup is really worthwhile).
}
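The scheme just described can be sketched directly with the Matrix package (an illustration of the idea, not bam's internal code; the sparse model matrix here is a stand-in):

library(Matrix)
set.seed(3)
X <- sparseMatrix(i=c(1:50,sample(1000,2000,replace=TRUE)),
                  j=c(1:50,sample(50,2000,replace=TRUE)),
                  x=rnorm(2050),dims=c(1000,50))  ## stand-in sparse model matrix
y <- rnorm(1000)
XtX <- as.matrix(crossprod(X))  ## crossproduct formed sparsely, stored dense
R <- chol(XtX)                  ## dense Choleski factor
beta <- backsolve(R,forwardsolve(t(R),as.numeric(crossprod(X,y))))  ## X'X beta = X'y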
@@ -175,23 +188,28 @@ library(mgcv)
dat <- gamSim(1,n=15000,dist="normal",scale=20)
bs <- "ps";k <- 20
-b<-bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
+b <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="REML")
summary(b)
plot(b,pages=1,rug=FALSE) ## plot smooths, but not rug
plot(b,pages=1,rug=FALSE,seWithMean=TRUE) ## `with intercept' CIs
-ba<-bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
+ba <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="GCV.Cp") ## use GCV
summary(ba)
## A Poisson example...
dat <- gamSim(1,n=15000,dist="poisson",scale=.1)
-b1<-bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
+b1 <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
s(x3,bs=bs,k=k),data=dat,method="ML",family=poisson())
b1
+## Sparse smoothers example...
+b2 <- bam(y ~ te(x0,x1,bs="ps",k=10,np=FALSE)+s(x2,bs="ps",k=30)+
+s(x3,bs="ps",k=30),data=dat,method="ML",
+family=poisson(),sparse=TRUE)
+b2
}
......
@@ -113,7 +113,9 @@ which is a numeric or factor variable of the same dimension as the covariates of
If a \code{by} variable is numeric, then its \eqn{i^{th}}{ith} element multiplies the \eqn{i^{th}}{ith}
row of the model matrix corresponding to the smooth term concerned.
-If a \code{by} variable is a \code{\link{factor}} then it generates an indicator vector for each level
+Factor smooth interactions (see also \code{\link{factor.smooth.interaction}}).
+If a \code{by} variable is a \code{\link{factor}} then it
+generates an indicator vector for each level
of the factor, unless it is an \code{\link{ordered}} factor.
In the non-ordered case, the model matrix for the smooth term is then replicated for each factor level,
and each copy has its rows multiplied by the corresponding rows of its
......
@@ -132,6 +132,8 @@ for smoothness estimation. }
\item{residuals}{the working residuals for the fitted model.}
\item{rV}{If present, \code{rV\%*\%t(rV)*sig2} gives the estimated Bayesian covariance matrix.}
\item{scale}{when present, the scale (as \code{sig2})}
+\item{scale.estimated}{ \code{TRUE} if the scale parameter was estimated, \code{FALSE} otherwise.}
......
@@ -10,7 +10,7 @@
departures from this line).
}
\usage{
-qq.gam(object, rep=0, level=.9,
+qq.gam(object, rep=0, level=.9,s.rep=10,
type=c("deviance","pearson","response"),
pch=".", rl.col=2, rep.col="gray80", \dots)
}
@@ -22,6 +22,7 @@ qq.gam(object, rep=0, level=.9,
the object family.}
\item{level}{If simulation is used for the quantiles, then reference intervals can be provided for the QQ-plot, this specifies the level.
0 or less for no intervals, 1 or more to simply plot the QQ plot for each replicate generated.}
+\item{s.rep}{how many times to randomize uniform quantiles to data under direct computation.}
\item{type}{what sort of residuals should be plotted? See
\code{\link{residuals.gam}}.}
\item{pch}{plot character to use. 19 is good.}
@@ -30,16 +31,14 @@ qq.gam(object, rep=0, level=.9,
\item{...}{extra graphics parameters to pass to plotting functions.}
}
\details{QQ-plots of the model residuals can be produced in one of two ways. The cheapest method generates reference quantiles by
-ordering the deviance residuals, associating a quantile of the uniform distribution with the corresponding datum according to the ordering,
-and then feeding these uniform quantiles into the quantile function associated with each datum. The resulting quantiles are then used in place of
-each datum to generate residuals. When sorted, these give approximate quantiles for the residual distribution expected for the fitted model.
+associating a quantile of the uniform distribution with each datum, and feeding these uniform quantiles into the quantile function associated with each datum. The resulting quantiles are then used in place of each datum to generate approximate quantiles of residuals.
+The residual quantiles are averaged over \code{s.rep} randomizations of the uniform quantiles to data.
-The second method is to simulate. For each replicate, data are simulated from the fitted model, and the corresponding residuals computed.
+The second method is to use direct simulation. For each replicate, data are simulated from the fitted model, and the corresponding residuals computed. This is repeated \code{rep} times.
Quantiles are readily obtained from the empirical distribution of residuals so obtained. From this method reference bands are also computable.
-Even if \code{rep} is set to zero, the routine will attempt to simulate quantiles if no quantile function is available for the family. If no random
-deviate generating function family is available (e.g. for the quasi families), then a normal QQ-plot is produced. The routine conditions on the
-fitted model coefficents and the scale parameter estimate.
+Even if \code{rep} is set to zero, the routine will attempt to simulate quantiles if no quantile function is available for the family. If no random deviate generating function family is available (e.g. for the quasi families), then a normal QQ-plot is produced. The routine conditions on the fitted model coefficients and the scale parameter estimate.
The plots are very similar to those proposed in Ben and Yohai (2004), but are substantially cheaper to produce (the interpretation of
residuals for binary data in Ben and Yohai is not recommended).
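A stripped down sketch of the cheap method, for known standard normal residuals (an illustration only; qq.gam itself handles general families and scale estimation):

set.seed(4)
n <- 100; rsd <- sort(rnorm(n))  ## sorted residuals to assess
s.rep <- 10; q <- rep(0,n)
for (i in 1:s.rep) {
  u <- (1:n - runif(n))/n        ## randomized uniform quantiles
  q <- q + qnorm(u)/s.rep        ## feed through the quantile function and average
}
plot(q,rsd,xlab="theoretical quantiles",ylab="residuals"); abline(0,1,col=2)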
......
@@ -6,24 +6,37 @@
}
\usage{
-slanczos(A,k=10,kl=-1)
+slanczos(A,k=10,kl=-1,tol=.Machine$double.eps^.5)
}
%- maybe also `usage' for other objects documented here.
\arguments{
\item{A}{A symmetric matrix.}
-\item{k}{If \code{kl} is negative, then the \code{k} largest magnitude eigenvalues are found, together with the corresponding eigenvectors. Otherwise the \code{k} highest eigenvalues are found.}
+\item{k}{Must be non-negative. If \code{kl} is negative, then the \code{k} largest magnitude eigenvalues
+are found, together with the corresponding eigenvectors. If \code{kl} is non-negative then the \code{k}
+highest eigenvalues are found together with their eigenvectors and the \code{kl} lowest eigenvalues with
+eigenvectors are also returned.}
-\item{kl}{The \code{kl} lowest eigenvalues are returned, unless \code{kl} is negative. }
+\item{kl}{If \code{kl} is non-negative then the \code{kl} lowest eigenvalues are returned together with their
+corresponding eigenvectors (in addition to the \code{k} highest eigenvalues + vectors).
+Negative \code{kl} signals that the \code{k} largest magnitude eigenvalues should be returned, with eigenvectors.}
+\item{tol}{tolerance to use for convergence testing of eigenvalues. Error in eigenvalues will be less
+than the magnitude of the dominant eigenvalue multiplied by \code{tol} (or the machine precision!).}
}
-\details{ The routine implements Lanczos iteration with full re-orthogonalization as described in Demmel (1997). Lanczos
+\details{ If \code{kl} is non-negative, returns the highest \code{k} and lowest \code{kl} eigenvalues,
+with their corresponding eigenvectors. If \code{kl} is negative, returns the largest magnitude \code{k}
+eigenvalues, with corresponding eigenvectors.
+The routine implements Lanczos iteration with full re-orthogonalization as described in Demmel (1997). Lanczos
iteration iteratively constructs a tridiagonal matrix, the eigenvalues of which converge to the eigenvalues of \code{A},
as the iteration proceeds (most extreme first). Eigenvectors can also be computed. For small \code{k} and \code{kl} the
approach is faster than computing the full symmetric eigendecomposition. The tridiagonal eigenproblems are handled using LAPACK.
-The implementation is not optimal: in particular the inner triadiagonal problems could be handled more efficiently.
+The implementation is not optimal: in particular the inner tridiagonal problems could be handled more efficiently, and
+there would be some savings to be made by not always returning eigenvectors.
}
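Usage sketch (an illustrative check against a full eigen decomposition, not from the package documentation):

library(mgcv)
set.seed(5)
A <- crossprod(matrix(rnorm(100*100),100,100))  ## symmetric test matrix
er <- slanczos(A,k=3)  ## 3 largest magnitude eigenvalues, with vectors
er$values
eigen(A,symmetric=TRUE,only.values=TRUE)$values[1:3]  ## should agree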
\value{ A list with elements \code{values} (array of eigenvalues); \code{vectors} (matrix with eigenvectors in its columns);
......
@@ -120,7 +120,8 @@ for example because they are inverse variance components. }
unpenalized and unconstrained). If this is null then \code{smoothCon} will set it to the basis
dimension. \code{smoothCon} will reduce this by the number of constraints.}
-\item{te.ok}{\code{FALSE} if this term should not be used as a tensor product marginal. Set to \code{TRUE} if \code{NULL}.}
+\item{te.ok}{\code{0} if this term should not be used as a tensor product marginal, \code{1} if
+it can be used and plotted, and \code{2} if it can be used but not plotted. Set to \code{1} if \code{NULL}.}
\item{plot.me}{Set to \code{FALSE} if this smooth should not be plotted by \code{\link{plot.gam}}. Set to \code{TRUE} if \code{NULL}.}
......
@@ -13,7 +13,7 @@ Duchon's (1977) construction generalizes the usual thin plate spline penalty as
of the squared Euclidean norm of a vector of mixed partial derivatives of the function w.r.t. its arguments. Duchon re-expresses
this penalty in the Fourier domain, and then weights the squared norm in the integral by the Euclidean norm of the Fourier frequencies,
raised to the power 2s. s is a user selected constant taking integer values divided by 2. If d is the number of arguments of the smooth,
-then it is required that -d/2 < m < d/2. To obtain smooth functions we further require that m + s > d/2. If s=0 then the usual thin plate
+then it is required that -d/2 < s < d/2. To obtain continuous functions we further require that m + s > d/2. If s=0 then the usual thin plate
spline is recovered.
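For example (an illustrative sketch, using values satisfying the constraints above for d=2: m=1, s=1/2):

library(mgcv)
eg <- gamSim(2,n=300,scale=.05)  ## bivariate example data
b <- gam(y~s(x,z,bs="ds",m=c(1,.5),k=50),data=eg$data)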
The construction is amenable to exactly the low rank approximation method given in Wood (2003) to thin plate splines, with similar
@@ -61,12 +61,12 @@ To use for basis setup it is recommended to use \code{\link{smooth.construct2}}.
For these classes the specification \code{object} will contain
information on how to handle large datasets in their \code{xt} field. The default is to randomly
-subsample 2000 `knots' from which to produce a tprs basis, if the number of
-unique predictor variable combinations in excess of 2000. The default can be
+subsample 2000 `knots' from which to produce a reduced rank eigen approximation to the full basis,
+if the number of unique predictor variable combinations is in excess of 2000. The default can be
modified via the \code{xt} argument to \code{\link{s}}. This is supplied as a
list with elements \code{max.knots} and \code{seed} containing a number
to use in place of 2000, and the random number seed to use (either can be
-missing).
+missing). Note that the random sampling will not affect the state of R's RNG.
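Illustrative call (a sketch raising the subsample size and fixing the seed via xt; values are arbitrary):

library(mgcv)
dat <- gamSim(1,n=3000,scale=2)
b <- gam(y~s(x0,x1,bs="tp",xt=list(max.knots=2500,seed=2)),data=dat)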
For these bases \code{knots} has two uses. Firstly, as mentioned already, for large datasets
the calculation of the \code{tp} basis can be time-consuming. The user can retain most of the advantages of the
......
\name{smooth.construct.fs.smooth.spec}
\alias{smooth.construct.fs.smooth.spec}
\alias{Predict.matrix.fs.interaction}
\alias{factor.smooth.interaction}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Factor smooth interactions in GAMs}
\description{Simple factor smooth interactions, which are efficient when used with \code{\link{gamm}}.
This smooth class allows a separate smooth for each level of a factor, with the same smoothing parameter for all
smooths. It is an alternative to using factor \code{by} variables.
See the discussion of \code{by} variables in \code{\link{gam.models}} for more general alternatives
for factor smooth interactions (including interactions of tensor product smooths with factors).
}
\usage{
\method{smooth.construct}{fs.smooth.spec}(object, data, knots)
\method{Predict.matrix}{fs.interaction}(object, data)
}
\arguments{
\item{object}{For the \code{smooth.construct} method a smooth specification object,
usually generated by a term \code{s(x,...,bs="fs")}. For the \code{Predict.matrix} method
an object of class \code{"fs.interaction"} produced by the \code{smooth.construct} method.}
\item{data}{a list containing just the data (including any \code{by} variable) required by this term,
with names corresponding to \code{object$term}.}
\item{knots}{ a list containing any knots supplied for smooth basis setup.}
}
\value{ An object of class \code{"fs.interaction"} or a matrix mapping the coefficients of the factor smooth interaction to the smooths themselves.
}
\details{This class produces a smooth for each level of a single factor variable. Within a \code{\link{gam}}
formula this is done with something like \code{s(x,fac,bs="fs")}, which is almost equivalent to \code{s(x,by=fac,id=1)}
(with the \code{gam} argument \code{select=TRUE}). The terms are fully penalized, with separate penalties on each null
space component: for this reason they are not centred (no sum-to-zero constraint).
The class is particularly useful for use with \code{\link{gamm}}, where estimation efficiently exploits
the nesting of the smooth within the factor. Note however that: i) \code{gamm} only allows one conditioning
factor for smooths, so \code{s(x)+s(z,fac,bs="fs")+s(v,fac,bs="fs")} is OK, but \code{s(x)+s(z,fac1,bs="fs")+s(v,fac2,bs="fs")}
is not; ii) all additional random effects and correlation structures will be treated as nested within the factor
of the smooth factor interaction.
Note that \code{gamm4} from the \code{gamm4} package suffers from none of the restrictions that apply to \code{gamm}, and
\code{"fs"} terms can be used without side-effects.
Any singly penalized basis can be used to smooth at each factor level. The default is \code{"tp"}, but alternatives can
be supplied in the \code{xt} argument of \code{s} (e.g. \code{s(x,fac,bs="fs",xt="cr")} or
\code{s(x,fac,bs="fs",xt=list(bs="cr"))}). The \code{k} argument to \code{s(...,bs="fs")} refers to the basis dimension to
use for each level of the factor variable.
Note one computational bottleneck: currently \code{\link{gamm}} (or \code{gamm4}) will produce the full posterior covariance matrix for the
smooths, including the smooths at each level of the factor. This matrix can get large and computationally costly if there
are more than a few hundred levels of the factor. Even at one or two hundred levels, care should be taken to keep
down \code{k}.
}
\author{ Simon N. Wood \email{simon.wood@r-project.org}}
\seealso{\code{\link{gam.models}}, \code{\link{gamm}}}
\examples{
library(mgcv)
set.seed(0)
## simulate data...
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x,a=2,b=-1) exp(a * x)+b
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
(10 * x)^3 * (1 - x)^10
n <- 500;nf <- 25
fac <- sample(1:nf,n,replace=TRUE)
x0 <- runif(n);x1 <- runif(n);x2 <- runif(n)
a <- rnorm(nf)*.2 + 2;b <- rnorm(nf)*.5
f <- f0(x0) + f1(x1,a[fac],b[fac]) + f2(x2)
fac <- factor(fac)
y <- f + rnorm(n)*2
## so response depends on global smooths of x0 and
## x2, and a smooth of x1 for each level of fac.
## fit model...
bm <- gamm(y~s(x0)+ s(x1,fac,bs="fs",k=5)+s(x2,k=20))
plot(bm$gam,pages=1)
## Could also use...
## b <- gam(y~s(x0)+ s(x1,fac,bs="fs",k=5)+s(x2,k=20),method="ML")
## ... but it's slower (increasingly so with increasing nf)
## b <- gam(y~s(x0)+ t2(x1,fac,bs=c("tp","re"),k=5,full=TRUE)+
##          s(x2,k=20),method="ML")
## ... is exactly equivalent.
}
\keyword{models} \keyword{regression}%-- one or more ..
@@ -89,7 +89,7 @@ Wood S.N. (2006) Generalized additive models: an introduction with R CRC.
\author{ Simon N. Wood \email{simon.wood@r-project.org} and Thomas Kneib
(Fabian Scheipl prototyped the low rank MRF idea) }
-\seealso{\code{\link{in.out}}, code{\link{polys.plot}}}
+\seealso{\code{\link{in.out}}, \code{\link{polys.plot}}}
\examples{
library(mgcv)
......
@@ -60,7 +60,7 @@ then it represents the parametric interaction of these arguments, with the coeff
assumption that the coefficients are i.i.d. normal random effects. See \code{\link{smooth.construct.re.smooth.spec}}. Note that random effect terms are not suitable
for use in tensor product smooths.}
-\item{Markov Random Fields}{\code{bs="mrf"}. These are popular when space is split up into disctrete contiguous
+\item{Markov Random Fields}{\code{bs="mrf"}. These are popular when space is split up into discrete contiguous
geographic units (districts of a town, for example). In this case a simple smoothing penalty is constructed
based on the neighbourhood structure of the geographic units. See \code{\link{mrf}} for details and an example.}
@@ -85,12 +85,21 @@ metres and seconds: a tensor product smooth will give the same
answer in both cases (see \code{\link{te}} for an example of this). Note that tensor product terms are knot based, and the
thin plate splines seem to offer no advantage over cubic or P-splines as marginal bases.
-Finally univariate and bivariate adaptive (\code{bs="ad"}) smooths are available (see \code{\link{adaptive.smooth}}).
+Some further specialist smoothers that are not suitable for use in tensor products are also available.
+\describe{
+\item{Adaptive smoothers}{\code{bs="ad"}.
+Univariate and bivariate adaptive smooths are available (see \code{\link{adaptive.smooth}}).
+These are appropriate when the degree of smoothing should itself vary with the covariates to be smoothed, and the
+data contain sufficient information to be able to estimate the appropriate variation. Because this flexibility is
+achieved by splitting the penalty into several `basis penalties' these terms are not suitable as components of tensor
+product smooths, and are not supported by \code{gamm}.}
+\item{Factor smooth interactions}{\code{bs="fs"}.
+Smooth factor interactions are often produced using \code{by} variables (see \code{\link{gam.models}}), but a special smoother
+class (see \code{\link{factor.smooth.interaction}}) is available for the case in which a smooth is required at each of a large number of factor levels (for example a smooth for each patient in a study), and each smooth should have the same smoothing parameter. The \code{"fs"} smoothers are set up to be efficient when used with \code{\link{gamm}}, and have penalties on each null space component (i.e. they are fully `random effects').}
+}
}
......
\name{spasm.construct}
\alias{spasm.construct}
\alias{spasm.sp}
\alias{spasm.smooth}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Experimental sparse smoothers}
\description{These are experimental sparse smoothing functions, and should
be left well alone!
}
\usage{
spasm.construct(object,data)
spasm.sp(object,sp,w=rep(1,object$nobs),get.trH=TRUE,block=0,centre=FALSE)
spasm.smooth(object,X,residual=FALSE,block=0)
}
%- maybe also `usage' for other objects documented here.
\arguments{
\item{object}{sparse smooth object}
\item{data}{data frame}
\item{sp}{smoothing parameter value}
\item{w}{optional weights}
\item{get.trH}{Should the (estimated) trace of the sparse smoother matrix be returned?}
\item{block}{index of block, 0 for all blocks}
\item{centre}{should sparse smooth be centred?}
\item{X}{what to smooth}
\item{residual}{apply residual operation?}
}
%\value{}
%\details{}
%\references{}
\author{Simon N. Wood \email{simon.wood@r-project.org}}