Commit 59a2c26e authored by Dirk Eddelbuettel's avatar Dirk Eddelbuettel

Import Upstream version 1.7-23

parent b33cf22f
Package: mgcv
Version: 1.7-22
Version: 1.7-23
Author: Simon Wood <>
Maintainer: Simon Wood <>
Title: Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness
......@@ -14,6 +14,7 @@ Suggests: nlme (>= 3.1-64), splines, Matrix, parallel
LazyLoad: yes
ByteCompile: yes
License: GPL (>= 2)
Packaged: 2012-10-14 19:28:35 UTC; sw283
Packaged: 2013-05-20 17:25:14 UTC; sw283
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2012-10-15 11:19:58
Date/Publication: 2013-05-20 22:23:35
f18b9a3e2aae6b0e649d8590a0eb17f1 *DESCRIPTION
3e152f2a5b0efaa9f28a3a976ed087e4 *NAMESPACE
7b5db03a3a80878eb8007f8c0848583a *R/bam.r
3833a46aaf23732d739f65e4143985ba *DESCRIPTION
5f405da54d89b1e236e272dad685cc2c *NAMESPACE
26423bcf418f00a9a89c06ddaa0646b5 *R/bam.r
e59ee1842b4546cb64ebe39e6a7f00ee *R/fast-REML.r
2073e50d30aacc57abac1db8b3133467 *R/gam.fit3.r
bed563ba6663fc3d2fa3e2e35c71130e *R/gam.fit3.r
902657a0ee2dedc3fdfa501bf3b37c5b *R/gam.sim.r
3527e64fa9d32baba5644c69c7d1eaf6 *R/gamm.r
adf0eadd4f8b76ac0a0feb6dfaabc7f5 *R/mgcv.r
8b4c61012d19bfd07321cc17fc4f9927 *R/plots.r
034e8f9ace2bd40a1fb860c49419e677 *R/smooth.r
484800e6369585d11f6c7aef7631d9b0 *R/soap.r
fd470608d9222676fc4c5e8c5e6530dd *R/gamm.r
51fed051fcdeb2d4159389f0a8e9eed2 *R/mgcv.r
2370f3d746b680472cd51ccb9dc10284 *R/plots.r
7c562d025d05df4a5002cd0bd2c0030c *R/smooth.r
bc0821492c2db83bc43ecef857b455f9 *R/soap.r
fb66d6c18398411a99ffcb788b854f13 *R/sparse.r
21f750feb19056a490a9afb39aac1622 *changeLog
146e60ee2eaed061319f27ef75a559b6 *changeLog
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda
88d77139cc983317b6acd8c5f1252ab9 *gnugpl2.txt
......@@ -20,7 +20,7 @@ f693920e12f8a6f2b6cab93648628150 *index
c51c9b8c9c73f81895176ded39b91394 *man/
e9b0a2e31b130cf2cb38618ec50d1919 *man/
3e5e30b44947d3ddf00fcd55be219038 *man/Tweedie.Rd
0d24940b2e0a3e9fa7b3dc0224d049ab *man/anova.gam.Rd
27b7ae82bab920543222c4428d33cee1 *man/anova.gam.Rd
585d6b6a2e7816df83d5f9193195a233 *man/bam.Rd
b385d6d5419d0d6aefe03af1a79d5c4e *man/bam.update.Rd
bb5ec26743d46d7ce1dbf128bceab35a *man/cSplineDes.Rd
......@@ -28,23 +28,23 @@ bb5ec26743d46d7ce1dbf128bceab35a *man/cSplineDes.Rd
c03748964ef606621418e428ae49b103 *man/columb.Rd
4196ba59f1fa8449c9cd0cab8a347978 *man/concurvity.Rd
f764fb7cb9e63ff341a0075a3854ab5d *man/exclude.too.far.Rd
ee14c3984a1d6ad58abd16c9a611df3a *man/extract.lme.cov.Rd
309a4b74fdb1e36ee53642bb96a2e069 *man/extract.lme.cov.Rd
44ad0563add1c560027d502ce41483f5 *man/
4d4eea9ad2b78e765a96b2a0065725c1 *man/fixDependence.Rd
9ac808f5a2a43cf97f24798c0922c9bf *man/formXtViX.Rd
bb099e6320a6c1bd79fe4bf59e0fde08 *man/formula.gam.Rd
c7561a0f2e114247955e212aeabc6ae9 *man/formXtViX.Rd
d87ff6287d7e343ea24d2053f4724b35 *man/formula.gam.Rd
4da4d585b329769eb44f0c7a6e7dd554 *man/fs.test.Rd
6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd
d7781fc09ad742a63746ea92a927eee4 *man/gam.Rd
d7392fabfe137e327d89cc9686946740 *man/gam.Rd
fe61dd0efab9e920c17335faf3d5764c *man/gam.check.Rd
96c9417e4ac5d79ec9ed3f363adfc4e9 *man/gam.control.Rd
44db24b66ce63bc16d2c8bc3f5b42ac5 *man/gam.convergence.Rd
58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/
21339a5d1eb8c83679dd9022ab682b5e *man/gam.fit3.Rd
714e72ecfeda30b49bfc6c20690076d0 *man/gam.models.Rd
fef20e1357eeb7d936d3c737cbebccce *man/gam.models.Rd
e969287d1a5c281faa7eb6cfce31a7c5 *man/gam.outer.Rd
96676186808802344a99f9d3170bf775 *man/gam.selection.Rd
3b4d858771c3c0a8a33ff426e194db76 *man/gam.side.Rd
956bf1a6ac1361dd0403c28153b03a9b *man/gam.side.Rd
b2ff252537dd2155524b774b2435e66e *man/gam.vcomp.Rd
a66a814cc4c6f806e824751fda519ae0 *man/gam2objective.Rd
807a8e175d84ba3eac87f8ef2931f6e4 *man/gamObject.Rd
......@@ -56,17 +56,17 @@ a2ea1233d43fac89e0cacbc09a8d31e2 *man/in.out.Rd
2f222eeeb3d7bc42f93869bf8c2af58a *man/influence.gam.Rd
bafea2eef12fdc819f8ac1fb41d8b914 *man/initial.sp.Rd
aba56a0341ba9526a302e39d33aa9042 *man/interpret.gam.Rd
6e6a0ae08d83d92ee137bb0c0ddf512b *man/ldTweedie.Rd
74720435528f6731265143c58ffb89ce *man/ldTweedie.Rd
58e73ac26b93dc9d28bb27c8699e12cf *man/linear.functional.terms.Rd
5de18c3ad064a5bda4f9027d9455170a *man/logLik.gam.Rd
611f5f6acac9c5f40869c01cf7f75dd3 *man/ls.size.Rd
396548e4a40652c54e80751acbfb2e2c *man/magic.Rd
496388445d8cde9b8e0c3917cbe7461d *man/
9427464e63d70da15fc468961ef0c29b *man/mgcv-FAQ.Rd
42cc4e7dcfa64979db73c672980ac9b1 *man/mgcv-package.Rd
2db8cd353119cff94b57ff8ae06a6f5d *man/mgcv-package.Rd
00ccf213c31910cd14f1df65a300eb33 *man/model.matrix.gam.Rd
bc9b89db7e7ff246749551c16f5f1f07 *man/mono.con.Rd
0fec75a585ffc9524e36367f194ce23c *man/mroot.Rd
d33914a328f645af13f5a42914ca0f35 *man/mroot.Rd
f0363e3309cca00db840ad0fb8359c0f *man/negbin.Rd
8b766a6ad848b0f1ca469e381ded0169 *man/
7d8f62e182f7c428cc9d46ddd4d97d43 *man/notExp.Rd
......@@ -93,22 +93,22 @@ b5a06918956fd10f23285b453c05bdb4 *man/smooth.construct.Rd
58198db8810ffe0f285d7393de893860 *man/
76013feaf70d00976bba0154b6f2c946 *man/
e571d0064d328b2c1698c4d06c2eba06 *man/smooth.construct.ds.smooth.spec.Rd
1bb6748d4d2934e48f0572bc5114ffcb *man/smooth.construct.fs.smooth.spec.Rd
db75c958cbfb561914a3291ab58b9786 *man/smooth.construct.fs.smooth.spec.Rd
772e6c18d25cfaf3d9c194031ee042fe *man/smooth.construct.mrf.smooth.spec.Rd
abe15377f471a2d8957a59c19eeef0bb *man/
d202c6718fb1138fdd99e6102250aedf *man/
b67bb662654b968a583301789ecaaf62 *man/
ce311e544603f1089562747652abce20 *man/smooth.construct.sos.smooth.spec.Rd
8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd
76ca359e3eda6e32398da0d6cdf9903b *man/
f0a8dc771f043b027fade5d40aec4792 *man/smooth.terms.Rd
574641abfbf14339fbb01948235a2a19 *man/
75a042d74460dd0c6e0c0a95c1f5d3f7 *man/smooth.terms.Rd
0d12daea17e0b7aef8ab89b5f801adf1 *man/smoothCon.Rd
b55a396da77559dac553613146633f97 *man/sp.vcov.Rd
83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd
700699103b50f40d17d3824e35522c85 *man/step.gam.Rd
b8a08ccd4c9371ff2735b5aefd3058b2 *man/summary.gam.Rd
c168e9934229117f51576c1b6a7fefbe *man/summary.gam.Rd
84a714ed8ef26e46e51947abe959762c *man/t2.Rd
27205a151fe03ce66941d2a53ff82f0f *man/te.Rd
2ca0b136690ac2239e7872f5d7e07368 *man/te.Rd
6eebb6ef90374ee09453d6da6449ed79 *man/
06ca54aeca7231f01ce06187a1f6cae8 *man/uniquecombs.Rd
5b05216eaeea867397ab130bb1332164 *man/vcov.gam.Rd
......@@ -126,17 +126,17 @@ d5a0f198090ecbfedaa6549f2918b997 *po/R-po.po
cd54024d76a9b53dc17ef26323fc053f *src/Makevars
94a2bcbb75cc60e8460e72ed154678c9 *src/gdi.c
49af97195accb65adc75620183d39a4c *src/general.h
227ab8e09ea73b4c3a68069d8e2dca74 *src/init.c
622474c4de77d29db86e9a3f73dc7862 *src/init.c
1635205c9d5bb9a5713663775f47fd2a *src/magic.c
14a2d29c6be01b083cbd9b5fff15025f *src/mat.c
ad5838098ac3fe9d20443fcad3e9a62d *src/mat.c
de0ae24ea5cb533640a3ab57e0383595 *src/matrix.c
0f8448f67d16668f9027084a2d9a1b52 *src/matrix.h
6a9f57b44d2aab43aa32b01ccb26bd6e *src/mgcv.c
fd0b2cb110575b93a7d826b25226f12b *src/mgcv.h
fcbe85d667f8c7818d17509a0c3c5935 *src/misc.c
d6407d2677ddf834dbeda77cd0768cee *src/mgcv.c
2b38840c84b6f8d6c2dce7603df8b1e9 *src/mgcv.h
b9328a6d6aa86380fe52a971deccd3d4 *src/misc.c
7e0ba698a21a01150fda519661ef9857 *src/qp.c
cd563899be5b09897d1bf36a7889caa0 *src/qp.h
3236a7cbc7030edbd6f1ebc3563c9faa *src/soap.c
dcd47b55e6a2e5f21eac7ddbfb357fba *src/soap.c
e9cab4a461eb8e086a0e4834cbf16f30 *src/sparse-smooth.c
3a9712b746b358a0636b49088d79e582 *src/tprs.c
a0de1174b314d662859a6ba6f1d7f03b *src/tprs.c
5352d5d2298acd9b03ee1895933d4fb4 *src/tprs.h
......@@ -55,7 +55,7 @@ export(anova.gam, bam, bam.update, concurvity, cSplineDes,,
Tweedie,uniquecombs, vcov.gam, vis.gam)
......@@ -86,6 +86,7 @@ qr.up <- function(arg) {
ind <- arg$start[b]:arg$stop[b]
##arg$G$model <- arg$mf[ind,]
X <- predict(arg$G,newdata=arg$mf[ind,],type="lpmatrix",newdata.guaranteed=TRUE,block.size=length(ind))
rownames(X) <- NULL
if (is.null(arg$coef)) eta1 <- arg$eta[ind] else eta1 <- drop(X%*%arg$coef) + arg$offset[ind]
mu <- arg$linkinv(eta1)
y <- arg$G$y[ind] ## arg$G$model[[arg$response]]
......@@ -265,6 +266,7 @@ <- function (G, mf, chunk.size, gp ,scale ,gamma,method, coef=NULL,etas
ind <- start[b]:stop[b]
##G$model <- mf[ind,]
X <- predict(G,newdata=mf[ind,],type="lpmatrix",newdata.guaranteed=TRUE,block.size=length(ind))
rownames(X) <- NULL
if (is.null(coef)) eta1 <- eta[ind] else eta1 <- drop(X%*%coef) + offset[ind]
mu <- linkinv(eta1)
y <- G$y[ind] ## G$model[[gp$response]] ## - G$offset[ind]
......@@ -1259,7 +1261,7 @@ bam.update <- function(b,data,chunk.size=10000) {
gp<-interpret.gam(b$formula) # interpret the formula
X <- predict(b,newdata=data,type="lpmatrix",na.action=b$NA.action) ## extra part of model matrix
rownames(X) <- NULL
cnames <- names(b$coefficients)
## now get the new data in model frame form...
......@@ -549,6 +549,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
ls <- family$ls(y,weights,n,scale)*n.true/nobs ## saturated likelihood and derivatives
Dp <- dev + oo$conv.tol + dev.extra
REML <- Dp/(2*scale) - ls[1] + oo$rank.tol/2 - rp$det/2 - remlInd*Mp/2*log(2*pi*scale)
attr(REML,"Dp") <- Dp/(2*scale)
if (deriv) {
REML1 <- oo$D1/(2*scale) + oo$trA1/2 - rp$det1/2
if (deriv==2) REML2 <- (matrix(oo$D2,nSp,nSp)/scale + matrix(oo$trA2,nSp,nSp) - rp$det2)/2
......@@ -581,7 +582,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
K <- oo$rank.tol/2 - rp$det/2
REML <- Dp/(2*phi) - ls[1] + K - Mp/2*log(2*pi*phi)*remlInd
attr(REML,"Dp") <- Dp/(2*phi)
if (deriv) {
phi1 <- oo$P1; Dp1 <- oo$D1; K1 <- oo$trA1/2 - rp$det1/2;
REML1 <- Dp1/(2*phi) - phi1*(Dp/(2*phi^2)+Mp/(2*phi)*remlInd + ls[2]) + K1
......@@ -2040,7 +2041,7 @@<-function(fam)
negbin <- function (theta = stop("'theta' must be specified"), link = "log") {
## modified from Venables and Ripley's MASS library to work with gam.fit3,
## and to allow a range of `theta' values to be specified
## single `theta' to specify fixed value; 2 theta values (first smaller that second)
## single `theta' to specify fixed value; 2 theta values (first smaller than second)
## are limits within which to search for theta; otherwise supplied values make up
## search set.
## Note: to avoid warnings, get(".Theta")[1] is used below. Otherwise the initialization
......@@ -2161,17 +2162,19 @@ totalPenaltySpace <- function(S,H,off,p)
mini.roots <- function(S,off,np)
mini.roots <- function(S,off,np,rank=NULL)
# function to obtain square roots, B[[i]], of S[[i]]'s having as few
# columns as possible. S[[i]]=B[[i]]%*%t(B[[i]]). np is the total number
# of parameters. S is in packed form.
# of parameters. S is in packed form. rank[i] is the optional supplied
# rank of S[[i]]; set rank[i] < 1, or rank=NULL, to estimate it.
{ m<-length(S)
if (m<=0) return(list())
if (is.null(rank)) rank <- rep(-1,m)
for (i in 1:m)
{ b<-mroot(S[[i]])
{ b <- mroot(S[[i]],rank=rank[i])
B[[i]] <- matrix(0,np,ncol(b))
B[[i]][off[i]:(off[i]+nrow(b)-1),] <- b
......@@ -1242,6 +1242,7 @@ formXtViX <- function(V,X)
qrz <- qr(Z,LAPACK=TRUE)
R <- qr.R(qrz);R[,qrz$pivot] <- R
colnames(R) <- colnames(X)
#res <- crossprod(R)
......@@ -1617,7 +1618,8 @@ gamm <- function(formula,random=NULL,correlation=NULL,family=gaussian(),data=lis
if (G$m>0) for (i in 1:G$m) {# Accumulate the total penalty matrix
if (is.null(object$smooth[[i]]$fac)) { ## simple regular smooth...
ind <- first.para[i]:last.para[i]
for (l in 1:length(object$smooth[[i]]$S)) {
ns <-length(object$smooth[[i]]$S)
if (ns) for (l in 1:ns) {
S[ind,ind] <- S[ind,ind] +
k <- k+1
......@@ -1627,7 +1629,7 @@ gamm <- function(formula,random=NULL,correlation=NULL,family=gaussian(),data=lis
ind <- first.para[i]:(first.para[i]+object$smooth[[i]]$n.para-1)
ns <- length(object$smooth[[i]]$S)
for (j in 1:length(flev)) {
for (l in 1:ns) {
if (ns) for (l in 1:ns) {
S[ind,ind] <- S[ind,ind] +
k <- k+1
......@@ -83,12 +83,23 @@ qq.gam <- function(object, rep=0, level=.9,s.rep=10,
pch=".", rl.col=2, rep.col="gray80",...) {
## get deviance residual quantiles under good fit
type <- match.arg(type)
ylab <- paste(type,"residuals")
if (inherits(object,c("glm","gam"))) {
if (is.null(object$sig2)) object$sig2 <- summary(object)$dispersion
} else stop("object is not a glm or gam")
## in case of NA & na.action="na.exclude", we need the "short" residuals:
object$na.action <- NULL
D <- residuals(object,type=type)
if (object$method %in% c("PQL","lme.ML","lme.REML","lmer.REML","lmer.ML","glmer.ML")) {
## then it's come out of a gamm fitter and qq.gam can't see the random effects
## that would be necessary to get quantiles. Fall back to normal QQ plot.
lim <- Dq <- NULL
if (rep==0) {
fam <-$family)
......@@ -144,8 +155,6 @@ qq.gam <- function(object, rep=0, level=.9,s.rep=10,
ylab <- paste(type,"residuals")
if (!is.null(Dq))
{ qqplot(Dq,D,ylab=ylab,xlab="theoretical quantiles",ylim=range(c(lim,D)),
......@@ -374,7 +383,9 @@ repole <- function(lo,la,lop,lap) {
## get distances to meridian point
d <- sqrt((x-xm)^2+(y-ym)^2+(z-zm)^2)
## angles to meridian plane (i.e. plane containing origin, meridian point and pole)...
theta <- acos((1+cos(phi)^2-d^2)/(2*cos(phi)))
theta <- (1+cos(phi)^2-d^2)/(2*cos(phi))
theta[theta < -1] <- -1; theta[theta > 1] <- 1
theta <- acos(theta)
## now decide which side of meridian plane...
......@@ -778,7 +789,8 @@ plot.mgcv.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
if (!x$||x$dim>2) return(NULL) ## shouldn't or can't plot
if (x$dim==1) { ## get basic plotting data for 1D terms
raw <- data[x$term][[1]]
xx<-seq(min(raw),max(raw),length=n) # generate x sequence for prediction
if (is.null(xlim)) xx <- seq(min(raw),max(raw),length=n) else # generate x sequence for prediction
xx <- seq(xlim[1],xlim[2],length=n)
if (x$by!="NA") # deal with any by variables
{ by<-rep(1,n);dat<-data.frame(x=xx,by=by)
......@@ -799,8 +811,10 @@ plot.mgcv.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
raw <- data.frame(x=as.numeric(data[xterm][[1]]),
n2 <- max(10,n2)
xm <- seq(min(raw$x),max(raw$x),length=n2)
ym <- seq(min(raw$y),max(raw$y),length=n2)
if (is.null(xlim)) xm <- seq(min(raw$x),max(raw$x),length=n2) else
xm <- seq(xlim[1],xlim[2],length=n2)
if (is.null(ylim)) ym <- seq(min(raw$y),max(raw$y),length=n2) else
ym <- seq(ylim[1],ylim[2],length=n2)
xx <- rep(xm,n2)
yy <- rep(ym,rep(n2,n2))
if (too.far>0)
......@@ -1063,7 +1077,7 @@ plot.gam <- function(x,residuals=FALSE,rug=TRUE,se=TRUE,pages=0,select=NULL,scal
X1[,first:last] <- P$X <- sqrt(rowSums((X1%*%x$Vp)*X1))
} else <- ## se in centred (or anyway unconstrained) space only
if (!is.null(P$exclude)) P$[P$exclude] <- NA
} ## standard errors for fit completed
if (partial.resids) { P$p.resid <- fv.terms[,length(order)+i] + w.resid }
......@@ -752,6 +752,9 @@ <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
## plot method function for soap.smooth terms
if (scheme==3) {
if (is.null(P)) outline <- FALSE else outline <- TRUE
if (is.null(xlim)) xlim <- c(x$sd$x0,x$sd$x0+ncol(x$sd$G)*x$sd$dx)
if (is.null(ylim)) ylim <- c(x$sd$y0,x$sd$y0+nrow(x$sd$G)*x$sd$dy)
P0 <- plot.mgcv.smooth(x=x,P=P,data=data,label=label,se1.mult=se1.mult,se2.mult=se2.mult,
......@@ -773,24 +776,24 @@ <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
soap.basis(x$sd,film=film,wiggly=wiggly,plot=TRUE,beta=beta) -> G
if (is.null(xlab)) xlabel<- x$term[1] else xlabel <- xlab
if (is.null(ylab)) ylabel <- x$term[2] else ylabel <- ylab
xscale <- x$sd$x0 + 0:(ncol(G)-1) * x$sd$dx
yscale <- x$sd$y0 + 0:(nrow(G)-1) * x$sd$dy
xscale <- x$sd$x0 + 0:(nrow(G)-1) * x$sd$dx
yscale <- x$sd$y0 + 0:(ncol(G)-1) * x$sd$dy
} else { ## do plot
if (scheme==0) {
xlim <- range(P$xscale);dx = xlim[2] - xlim[1]
ylim <- range(P$yscale);dy = ylim[2] - ylim[1]
} else if (scheme==1) {
} else if (scheme==2) {
if (is.null(names(P$bnd))) {
for (i in 1:length(P$bnd)) lines(P$bnd[[i]],lwd=2)
** denotes quite substantial/important changes
*** denotes really big changes
* users can use as.factor(a) in formula, and variable.summary won't know
about it - causes a problem if a is character, for example. Could coerce
character to factor, but is that general enough? what about if a was numeric
then summary doesn't reflect how it's used?
* F needed in gam object?
* still not really happy with nesting constraints, especially under rank
*** Fix of severe bug introduced with R 2.15.2 LAPACK change. The shipped
version of dsyevr can fail to produce orthogonal eigenvectors when
uplo='U' (upper triangle of symmetric matrix used), as opposed to 'L'.
This led to a substantial number of gam smoothing parameter estimation
convergence failures, as the key stabilizing re-parameterization was
substantially degraded. The issue did not affect gaussian additive models
with GCV model selection. Other models could fail to converge any further
as soon as any smoothing parameter became `large', as happens when a
smooth is estimated as a straight line. check.gam reported the lack of full
convergence, but the issue could also generate complete fit failures.
Picked up late as full test suite had only been run on R > 2.15.1 with an
external LAPACK.
** 'ti' smooth specification introduced, which provides a much better (and
very simple) way of allowing nested models based on 'te' type tensor
product smooths. 'ti' terms are used to set up smooth interactions
excluding main effects (so ti(x,z) is like x:z while te(x,z) is more
like x*z, although the analogy is not exact).
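As a sketch of the distinction (not from the changeLog; data simulated with mgcv's gamSim, so the variable names x1, x2 are just its defaults):

```r
## ti(x1,x2) adds only the pure x1:x2 interaction, so main effects can
## be included separately; te(x1,x2) would absorb them into one smooth.
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 400, scale = 2)        # standard mgcv test data
b  <- gam(y ~ ti(x1) + ti(x2) + ti(x1, x2), data = dat)  # main effects + interaction
b2 <- gam(y ~ te(x1, x2), data = dat)       # one joint smooth, x*z style
summary(b)   # the ti(x1,x2) line tests the interaction alone
```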
* summary.gam now uses a more efficient approach to p-value computation
for smooths, using the factor R from the QR factorization of the weighted
model matrix produced during fitting. This is a weighted version of the
Wood (2013) statistic used previously - the simulation results in that
paper are essentially unchanged by the change.
* summary.gam now deals gracefully with terms such as "fs" smooths
estimated using gamm, for which p-values can not be computed. (thanks to
Gavin Simpson).
* gam.check/qq.gam now uses a normal QQ-plot when the model has been fitted
using gamm or gamm4, since qq.gam cannot compute correct quantiles in
the presence of random effects in these cases.
* gamm could fail with fixed smooths while assembling total
penalty matrix, by attempting to access non-existent penalty
matrix. (Thanks Ainars Aunins for reporting this.)
* stripped rownames from model matrix, eta, linear predictor etc. Saves
memory and time.
* could switch axis ranges. Fixed.
* plot.mgcv.smooth now sets smooth plot range on basis of xlim and
ylim if present.
* formXtViX documentation fixed + return matrix labels.
* fixDependence related negative index failures for completely confounded
terms - now fixed.
* sos smooth model matrix re-scaled for better conditioning.
* sos plot method could produce NaNs by a rounding error in argument to
acos - fixed.
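The acos fix above amounts to clamping the computed cosine into [-1, 1] before calling acos, as in the sos plotting code change in this commit; a minimal sketch:

```r
## floating-point error can push a computed cosine fractionally
## outside [-1, 1], where acos() returns NaN...
costheta <- c(-1 - 1e-15, 0.5, 1 + 1e-15)
acos(costheta)                              # the two endpoints give NaN
## ...so clamp first, exactly as the fix does:
costheta[costheta < -1] <- -1; costheta[costheta > 1] <- 1
acos(costheta)                              # now finite: pi, acos(0.5), 0
```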
......@@ -77,7 +77,7 @@ which is in fact an object returned from \code{\link{summary.gam}}.
Scheipl, F., Greven, S. and Kuchenhoff, H. (2008) Size and power of tests for a zero random effect variance or polynomial
regression in additive and linear mixed models. Comp. Statist. Data Anal. 52, 3283-3299
Wood, S.N. (2012) On p-values for smooth componentes of an extended generalized additive model. in press Biometrika
Wood, S.N. (2013) On p-values for smooth components of an extended generalized additive model. Biometrika 100:221-228
\author{ Simon N. Wood \email{} with substantial
......@@ -87,6 +87,7 @@ and Hall/CRC Press.
## see also ?formXtViX for use of extract.lme.cov2
......@@ -4,7 +4,7 @@
\title{ Form component of GAMM covariance matrix}
\description{ This is a service routine for \code{\link{gamm}}. Given
\eqn{V}{V}, an estimated covariance matrix obtained using \code{\link{extract.lme.cov2}} this
routine forms \eqn{ X^TV^{-1}X}{X'inv(V)X} as efficiently as possible, given
routine forms a matrix square root of \eqn{ X^TV^{-1}X}{X'inv(V)X} as efficiently as possible, given
the structure of \eqn{V}{V} (usually sparse).
......@@ -27,7 +27,7 @@ this matrix.
\value{ A matrix.
\value{ A matrix, R such that \code{crossprod(R)} gives \eqn{ X^TV^{-1}X}{X'inv(V)X}.
......@@ -50,6 +50,17 @@ Generalized Additive Mixed Models. Biometrics 62(4):1025-1036
\code{\link{gamm}}, \code{\link{extract.lme.cov2}}
b <- lme(effort ~ Type, data=ergoStool, random=~1|Subject)
V1 <- extract.lme.cov(b, ergoStool)
V2 <- extract.lme.cov2(b, ergoStool)
X <- model.matrix(b, data=ergoStool)
crossprod(formXtViX(V2, X))
\keyword{models} \keyword{smooth} \keyword{regression}%-- one or more ..
......@@ -55,15 +55,19 @@ An alternative for specifying smooths of more than one covariate is e.g.: \cr
smooth of the two covariates \code{x} and \code{z} constructed from marginal t.p.r.s. bases
of dimension 5 and 10 with marginal penalties of order 2 and 3. Any combination of basis types is
possible, as is any number of covariates. \code{\link{te}} provides further information.
\code{\link{ti}} terms are a variant designed to be used as interaction terms when the main
effects (and any lower order interactions) are present. \code{\link{t2}} produces tensor product
smooths that are the natural low rank analogue of smoothing spline ANOVA models.
Both \code{s} and \code{te} terms accept an \code{sp} argument of supplied smoothing parameters: positive
\code{s}, \code{te}, \code{ti} and \code{t2} terms accept an \code{sp} argument of supplied smoothing parameters: positive
values are taken as fixed values to be used, negative to indicate that the parameter should be estimated. If
\code{sp} is supplied then it over-rides whatever is in the \code{sp} argument to \code{gam}, if it is not supplied
then it defaults to all negative, but does not over-ride the \code{sp} argument to \code{gam}.
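A minimal sketch of the sp convention just described (data via gamSim; the particular values are illustrative assumptions):

```r
library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 200, scale = 2)
## fix the smoothing parameter for s(x1); a negative value for s(x2)
## means "estimate this one"...
b <- gam(y ~ s(x1, sp = 0.01) + s(x2, sp = -1), data = dat)
b$sp   # smoothing parameters actually used in the fit
```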
Formulae can involve nested or ``overlapping'' terms such as \cr
\code{y~s(x)+s(z)+s(x,z)} or \code{y~s(x,z)+s(z,v)}:\cr see
\code{\link{gam.side}} for further details and examples.
\code{y~s(x)+s(z)+s(x,z)} or \code{y~s(x,z)+s(z,v)}\cr
but nested models should really be set up using \code{\link{ti}} terms:
see \code{\link{gam.side}} for further details and examples.
Smooth terms in a \code{gam} formula will accept matrix arguments as covariates (and corresponding \code{by} variable),
in which case a `summation convention' is invoked. Consider the example of \code{s(X,Z,by=L)} where \code{X}, \code{Z}
......@@ -222,10 +222,10 @@ facilitate optimization, and \code{\link{gam.fit3}} is used in place of
Several alternative basis-penalty types are built in for representing model
smooths, but alternatives can easily be added (see \code{\link{smooth.terms}}
for an overview and \code{\link{smooth.construct}} for how to add smooth classes). In practice the
default basis is usually the best choice, but the choice of the basis dimension (\code{k} in the
\code{s} and \code{te} terms) is something that should be considered carefully (the exact value is not critical,
but it is important not to make it restrictively small, nor very large and computationally costly). The basis should
for an overview and \code{\link{smooth.construct}} for how to add smooth classes). The choice of the basis dimension
(\code{k} in the \code{s}, \code{te}, \code{ti} and \code{t2} terms) is something that should be considered carefully
(the exact value is not critical, but it is important not to make it restrictively small, nor very large and
computationally costly). The basis should
be chosen to be larger than is believed to be necessary to approximate the smooth function concerned.
The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on
the term, and (usually) selected automatically (with an upper limit set by
......@@ -387,6 +387,12 @@ plot(bt,pages=1)
plot(bt,pages=1,scheme=2) ## alternative visualization
AIC(b0,bt) ## interaction worse than additive
## Alternative: test for interaction with a smooth ANOVA
## decomposition (this time between x2 and x1)
bt <- gam(y~s(x0)+s(x1)+s(x2)+s(x3)+ti(x1,x2,k=6),
## If it is believed that x0 and x1 are naturally on
## the same scale, and should be treated isotropically
## then could try...
......@@ -81,26 +81,22 @@ specifies that the tensor product should use a rank 6 cubic regression spline ma
and a rank 7 P-spline marginal to create a smooth with basis dimension 42.
\section{Nested terms}{
Another common modelling task is to decide
if a model like:
\deqn{E(y_i) = f(x_i,z_i)}{E(y)=f(x,z)}
is really necessary or whether:
\deqn{E(y_i) = f_1(x_i) + f_2(z_i)}{E(y) = f1(x) + f2(z)}
would do just as well. One possibility is to examine the results
of fitting:\cr
\code{y ~ s(x) + s(z) + s(x,z)}.\cr
\code{gam} automatically generates side conditions to make this model
identifiable. You can also estimate `overlapping' models like:\cr
\code{y ~ s(x,z) + s(z,v)}.
Note that optimum interpretability of such models is obtained by ensuring that
lower order terms are strictly nested within higher order ones. This is most easily achieved by
employing tensor product smooths for multidimensional terms, and ensuring that lower order terms are
using the same (marginal) bases as higher order ones. For example:\cr
\code{y ~ s(x,bs="cr",k=5) + s(z,bs="cr",k=5) + te(x,z)}\cr
would ensure strict nesting of the main effects within the interaction (given the \code{\link{te}} default
\code{bs} and \code{k} arguments), since the main effects employ the same bases used as marginals for the \code{te} term.
\section{Nested terms/functional ANOVA}{
Sometimes it is interesting to specify smooth models with a main effects + interaction structure such as
\deqn{E(y_i) = f_1(x_i) + f_2(z_i) + f_3(x_i,z_i)}{E(y) = f1 (x) + f2(z) + f3(x,z)}
\deqn{E(y_i)=f_1(x_i) + f_2(z_i) + f_3(v_i)
+ f_4(x_i,z_i) + f_5(z_i,v_i) + f_6(x_i,v_i) + f_7(x_i,z_i,v_i) }{
E(y) = f1(x) + f2(z) + f3(v) + f4(x,z) + f5(z,v) + f6(x,v) + f7(x,z,v)}
for example. Such models should be set up using \code{\link{ti}} terms in the model formula. For example: \cr
\code{y ~ ti(x) + ti(z) + ti(x,z)}, or\cr
\code{y ~ ti(x) + ti(z) + ti(v) + ti(x,z) + ti(x,v) + ti(z,v)+ti(x,z,v)}. \cr
The \code{ti} terms produce interactions with the component main effects excluded appropriately. (There is in fact no need to use \code{ti} terms for the main effects here, \code{s} terms could also be used.)
\code{gam} allows nesting (or `overlap') of \code{te} and \code{s} smooths, and automatically generates side conditions to
make such models identifiable, but the resulting models are much less stable and interpretable than those constructed using \code{ti} terms.
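A sketch of the two-covariate case of the decomposition above, set up with ti terms (data from gamSim's bivariate example, which supplies the x and z names):

```r
library(mgcv)
set.seed(3)
dat <- gamSim(2, n = 400, scale = 0.1)$data  # test function of x and z
## main effects plus a pure interaction, no side conditions needed...
b <- gam(y ~ ti(x) + ti(z) + ti(x, z), data = dat)
plot(b, pages = 1)   # f1(x), f2(z) and the interaction component f3(x,z)
```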
\section{`by' variables}{
......@@ -2,7 +2,7 @@
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Identifiability side conditions for a GAM}
\description{ GAM formulae with repeated variables only correspond to
\description{ GAM formulae with repeated variables may only correspond to
identifiable models given some side conditions. This routine works
out appropriate side conditions, based on zeroing redundant parameters.
It is called from \code{mgcv:::gam.setup} and is not intended to be called by users.
......@@ -53,23 +53,37 @@ prediction matrices for the term.
\author{ Simon N. Wood \email{}}
## The first two examples here illustrate models that cause
## gam.side to impose constraints, but both are a bad way
## of estimating such models. The 3rd example is the right
## way....
dat <- gamSim(n=400,scale=2) ## simulate data
## estimate model with redundant smooth interaction...
## estimate model with redundant smooth interaction (bad idea).
## Simulate data with real interaction...
dat <- gamSim(2,n=500,scale=.1)
## a fully nested tensor product example
## a fully nested tensor product example (bad idea)
b <- gam(y~s(x,bs="cr",k=6)+s(z,bs="cr",k=6)+te(x,z,k=6),
## A fully nested tensor product example, done properly,
## so that gam.side is not needed to ensure identifiability.
## ti terms are designed to produce interaction smooths
## suitable for adding to main effects (we could also have
## used s(x) and s(z) without a problem, but not s(z,x)
## or te(z,x)).
b <- gam(y ~ ti(x,k=6) + ti(z,k=6) + ti(x,z,k=6),
......@@ -35,9 +35,7 @@ case then the \code{tweedie} package is the place to start.
%- maybe also `usage' for other objects documented here.
\author{ Simon N. Wood \email{}
modified from Venables and Ripley's \code{negative.binomial} family.
\author{ Simon N. Wood \email{}}
Dunn, P.K. and G.K. Smyth (2005) Series evaluation of Tweedie exponential dispersion model densities.
......@@ -28,7 +28,8 @@ are also provided, for extracting information from a fitted \code{\link{gamObjec
Use of \code{\link{gam}} is much like use of \code{\link{glm}}, except that
within a \code{gam} model formula, isotropic smooths of any number of predictors can be specified using
\code{\link{s}} terms, while scale invariant smooths of any number of
predictors can be specified using \code{\link{te}} terms. \code{\link{smooth.terms}} provides an
predictors can be specified using \code{\link{te}}, \code{\link{ti}} or \code{\link{t2}} terms.
\code{\link{smooth.terms}} provides an
overview of the built in smooth classes, and \code{\link{random.effects}} should be referred to for an overview
of random effects terms (see also \code{\link{mrf}} for Markov random fields). Estimation is by
penalized likelihood or quasi-likelihood maximization, with smoothness
......@@ -14,14 +14,14 @@ mroot(A,rank=NULL,method="chol")
\item{A}{ The positive semi-definite matrix, a square root of which is
to be found.}
\item{rank}{if the rank of the matrix \code{A} is known then it should
be supplied.}
be supplied. \code{NULL} or <1 imply that it should be estimated.}
\item{method}{ \code{"chol"} to use pivoted Choleski decomposition,
which is fast but tends to over-estimate rank. \code{"svd"} to use
singular value decomposition, which is slow, but is the most accurate way
to estimate rank.}
\details{ The routine uses an LAPACK SVD routine, or the LINPACK pivoted
\details{ The function uses SVD, or a pivoted
Choleski routine. It is primarily of use for turning penalized regression
problems into ordinary regression problems.}
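A small sketch of what mroot returns, per the description above: B has as many columns as the rank of A, and B B' reconstructs A.

```r
library(mgcv)
set.seed(4)
X <- matrix(rnorm(12), 4, 3)
A <- X %*% t(X)                 # 4x4 positive semi-definite, rank 3
B <- mroot(A, method = "svd")   # "svd" estimates rank most accurately
ncol(B)                         # one column per unit of rank
range(A - B %*% t(B))           # zero, up to rounding error
```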
\value{ A matrix, \eqn{ {\bf B}}{B} with as many columns as the rank of
......@@ -83,9 +83,11 @@ y <- f + rnorm(n)*2
## so response depends on global smooths of x0 and
## x2, and a smooth of x1 for each level of fac.
## fit model...
## fit model (note p-values not available when fit