Commit b90a627e authored by Dirk Eddelbuettel

Import Upstream version 1.8-15

parent fe531dd4
......@@ -5,6 +5,23 @@ Currently deprecated and liable to be removed:
- single penalty tensor product smooths.
- p.type!=0 in summary.gam.
1.8-15
* Fix of survival function prediction in cox.ph family. Code used expression
(8.8.5) in Klein and Moeschberger (2003), which is missing a term. Correct
expression is, e.g., (10) from Andersen, Weis Bentzon and Klein (1996)
Scandinavian Journal of Statistics.
* Added help file 'cox.pht' for Cox PH regression with time dependent
covariates.
* Fix of potential seg fault in gdi.c:get_bSb if single smooth model
rank deficient (insufficient workspace allocated).
* gam.fit5 modified to step half if trial penalized likelihood is infinite.
* Fix so that bam works properly with drop.intercept=TRUE.
1.8-14
* bug fix to smoothCon that could generate NAs in model matrix when using bam
......
Package: mgcv
Version: 1.8-14
Version: 1.8-15
Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org>
Title: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness
......@@ -16,6 +16,6 @@ LazyLoad: yes
ByteCompile: yes
License: GPL (>= 2)
NeedsCompilation: yes
Packaged: 2016-08-19 14:43:58 UTC; sw283
Packaged: 2016-09-13 10:00:39 UTC; sw283
Repository: CRAN
Date/Publication: 2016-08-20 12:07:14
Date/Publication: 2016-09-14 18:51:00
7df0bec9645b1dadeee2a563262c54a8 *ChangeLog
ca275901f459de1d298d7f5865990c2a *DESCRIPTION
0d9b05a5c3954f23ca64d831c45daf89 *ChangeLog
992b7d6af50703c1753fe3dd458a9df3 *DESCRIPTION
eb723b61539feef013de476e68b5c50a *GPL-2
595f7fa74dd678ae4be1f6968e174555 *NAMESPACE
5d66b16a4068c8848cfca1665e6cc5c7 *R/bam.r
aa23a9b76c09f06b4ec8e71ea903a0a8 *R/coxph.r
41b51c0155f4cb0b9d746d896108c160 *R/bam.r
87933a9e2845f871334d96f537ee11e9 *R/coxph.r
fdb1dd621eb177107bbfb5f5c11777b2 *R/efam.r
4b370253beb1eda19d389132ade30212 *R/fast-REML.r
dea8bcbb4668ad0a6dfe3420bebebf48 *R/gam.fit3.r
94ffbce0c3d9504fa0fabd41ca07e5fe *R/gam.fit4.r
6f57e0f2cb348d07754b0e41a374e76c *R/gam.fit4.r
1b620b840ca67c9759639bd6e53824e3 *R/gam.sim.r
a03433fd11911b28b59a2e339af9d100 *R/gamlss.r
66257913c556f135657b7c12b6ed733d *R/gamlss.r
cceac26b3d01513f8f87eacae91ddae0 *R/gamm.r
4a0ce642d904d7f871b5b766c04a2af4 *R/jagam.r
91a24cd64e1f7b155079716cd1c3c179 *R/mgcv.r
71d23c1ee219ca4242d5f49ccd164970 *R/mgcv.r
2feca0dc9d354de7bc707c67a5891364 *R/misc.r
03772998ab05887f2eeae10dd6efe983 *R/mvam.r
24d69c678592e666bd2304e671dd8dca *R/plots.r
9533ebaace4334adbccb112a5d49ce86 *R/smooth.r
1dde3ff2f6c3227d1a4765e41d0cf92b *R/soap.r
586b0b4447aeb73b28fac9bdfefd3e21 *R/plots.r
38a7e6b503af65a76ffe999ac66049bb *R/smooth.r
666d7fd36fda68b928993d5388b0d7fc *R/soap.r
76cc875719bf0ef9eab45ea5bfeccda6 *R/sparse.r
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda
......@@ -39,7 +39,7 @@ d0fa291cbbcef359c61e426f6ba38fbb *man/Predict.matrix.soap.film.Rd
f12468625253dd3603907de233762fd6 *man/Rrank.Rd
d3e09165c6ca581db78d820a3c39329c *man/Tweedie.Rd
c93cf5b2e0def2e36cced3ee68912958 *man/anova.gam.Rd
209aff0fbe9653fc80f01ef80d04435f *man/bam.Rd
871531b320262b21eab35467f334cc6a *man/bam.Rd
8c95ed4822108db4e8bc59d8eac9e77c *man/bam.update.Rd
cf5f1ee0aab639c7c4b9b357434f15b2 *man/bandchol.Rd
745cbf31eb14fc1c5916fc634c74d998 *man/bug.reports.mgcv.Rd
......@@ -47,10 +47,11 @@ cf5f1ee0aab639c7c4b9b357434f15b2 *man/bandchol.Rd
a72647229fd92763636941e61597995d *man/choose.k.Rd
c03748964ef606621418e428ae49b103 *man/columb.Rd
9906a83ce29a3b17044bc2a3c9940cee *man/concurvity.Rd
9fda5546321683530d3351ea1849fe7a *man/coxph.Rd
c355ae144e02759a86da79dcfca1a6c1 *man/coxph.Rd
5f4e410d9dbc40e99b723a5caa1d9e01 *man/coxpht.Rd
0a6d4b3858cbb69a5d375ecd09282256 *man/exclude.too.far.Rd
069e7d174bb0d35bf30675e74a47dfd3 *man/extract.lme.cov.Rd
2878ef784cc418a8520133194f31bb44 *man/family.mgcv.Rd
0423770bd423bf3de94a53e7c707678c *man/family.mgcv.Rd
44ad0563add1c560027d502ce41483f5 *man/fix.family.link.Rd
4d4eea9ad2b78e765a96b2a0065725c1 *man/fixDependence.Rd
e75719779e18c723ee1fd17e44e7901b *man/formXtViX.Rd
......@@ -72,10 +73,10 @@ b2ff252537dd2155524b774b2435e66e *man/gam.vcomp.Rd
eb8648cc6b3c9374b899abd2f2b12f7b *man/gam2objective.Rd
717401fd7efa3b39d90418a5d1d0c216 *man/gamObject.Rd
a2593990d6a87f7b783d0435062dec02 *man/gamSim.Rd
235410d3b67b06368b5d87b2a342a27a *man/gamm.Rd
351009f7345aaff40f3377e77f4ce7ad *man/gamm.Rd
ec2d1b2aa87cc3f8424e07b6dc0340d5 *man/gaulss.Rd
398a5c12285401c1d37a8edb58780bc3 *man/get.var.Rd
164f3e338ff038e941b0bf9db95dcd16 *man/gevlss.Rd
faf325db054dce698286f87e09ddc202 *man/gevlss.Rd
39b47f30a7ea45382d62ca1753d876a8 *man/identifiability.Rd
4f96476abbf9692f52030d3859580a31 *man/in.out.Rd
6c33026ebb458483d34a04548c05d664 *man/inSide.Rd
......@@ -112,11 +113,11 @@ ee9352ba4c531a8def16deddcab9a9fd *man/pdIdnot.Rd
931c3aefb8b5e42aa230cfedd281bed1 *man/place.knots.Rd
b903ebcf31703db156e033fdfa527d73 *man/plot.gam.Rd
c27a6b886929b1dc83bf4b90cae848f9 *man/polys.plot.Rd
49f15188b06a04ced66395511025f714 *man/predict.bam.Rd
d9fcc12228600e5ad422a4d082cd51c9 *man/predict.gam.Rd
1a9d83c9fc67e5f0fc85d66d3112f4ef *man/predict.bam.Rd
2892714c395537c0cca29914989b1d50 *man/predict.gam.Rd
cf14ce6cf8e4147f0f5c6e5b93b2af73 *man/print.gam.Rd
6d0ce4e574fabceffdbedd46c91364cb *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
22b7dcbc8ff4096365fa98ce56b957c9 *man/rTweedie.Rd
fc1985e7dd5222182c4a8a939963b965 *man/random.effects.Rd
c523210ae95cb9aaa0aaa1c37da1a4c5 *man/residuals.gam.Rd
3c747a8066bcc28ae706ccf74f903d3e *man/rig.Rd
......@@ -124,7 +125,7 @@ c523210ae95cb9aaa0aaa1c37da1a4c5 *man/residuals.gam.Rd
845ec29324583d18c8dc150625e153e3 *man/s.Rd
d515e51ec98d73af6166f7b31aeaba9b *man/scat.Rd
898e7cc2def2ee234475e68d0b904b29 *man/sdiag.Rd
9641bb5e2573b5fcbdc303fcf160f3e6 *man/single.index.Rd
8e968226c2b65ee89c8de2fd9869b086 *man/single.index.Rd
6f03e337d54221bc167d531e25af1eea *man/slanczos.Rd
8020154bd5c709d11f0e7cf043df886d *man/smooth.construct.Rd
4a689eba97e4fed138dccb8cad13205e *man/smooth.construct.ad.smooth.spec.Rd
......@@ -132,11 +133,11 @@ d515e51ec98d73af6166f7b31aeaba9b *man/scat.Rd
76013feaf70d00976bba0154b6f2c946 *man/smooth.construct.cr.smooth.spec.Rd
f5e6d0f5122f61c336827b3615482157 *man/smooth.construct.ds.smooth.spec.Rd
db75c958cbfb561914a3291ab58b9786 *man/smooth.construct.fs.smooth.spec.Rd
e38f6ea7f89cb068976b73a177878906 *man/smooth.construct.gp.smooth.spec.Rd
7eba0bd0ecf412e7db5dab49eba724cd *man/smooth.construct.gp.smooth.spec.Rd
4aaa84b520992fbc32b0c37f7f63c1dd *man/smooth.construct.mrf.smooth.spec.Rd
2523b6cefa306210c00f3477853b7f07 *man/smooth.construct.ps.smooth.spec.Rd
b45d8e71bda4ceca8203dffea577e441 *man/smooth.construct.re.smooth.spec.Rd
224a08b5edcd7c8af6c995f03b17320a *man/smooth.construct.so.smooth.spec.Rd
0607caa694345cef339bc2178674923c *man/smooth.construct.so.smooth.spec.Rd
0bfe981f2c3e6ea5b8d5372076ccde53 *man/smooth.construct.sos.smooth.spec.Rd
3cb4e59f915c8d64b90754eaeeb5a86f *man/smooth.construct.t2.smooth.spec.Rd
8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd
......@@ -170,9 +171,9 @@ b3dfaf74ca2d76ca26eec986a14f5584 *po/fr.po
24f393ff94fa39c8e66250eb0d0e2fcd *po/mgcv.pot
ed7cb61912e4990cb0076d4cdcf11da8 *po/pl.po
03972284b3400cf82cacd5d2dc4b8cb3 *src/Makevars
b0459e16b04844271cf5e6b53aca0e47 *src/coxph.c
342aa30c8f6f1e99ffa2576a6f29d7ce *src/coxph.c
555f6948880bff0b6fa23eeb51609c1c *src/discrete.c
ac57d36542385c458dca90cb0675439e *src/gdi.c
9721ea07a126278eacd041c814161968 *src/gdi.c
2436f9b328e80370ce2203dbf1dd813c *src/general.h
acf1a2cff05d68856f2b6acee1e63cf7 *src/init.c
5d9a48e07b438a7c3b32c94fe67ae30c *src/magic.c
......
......@@ -658,6 +658,7 @@ bgam.fit <- function (G, mf, chunk.size, gp ,scale ,gamma,method, coef=NULL,etas
offset <- G$offset
family <- G$family
G$family <- gaussian() ## needed if REML/ML used
G$family$drop.intercept <- family$drop.intercept ## needed in predict.gam
variance <- family$variance
dev.resids <- family$dev.resids
## aic <- family$aic
......@@ -1606,7 +1607,7 @@ AR.resid <- function(rsd,rho=0,AR.start=NULL) {
bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,na.action=na.omit,
offset=NULL,method="fREML",control=list(),select=FALSE,scale=0,gamma=1,knots=NULL,sp=NULL,
min.sp=NULL,paraPen=NULL,chunk.size=10000,rho=0,AR.start=NULL,discrete=FALSE,
cluster=NULL,nthreads=NA,gc.level=1,use.chol=FALSE,samfrac=1,
cluster=NULL,nthreads=NA,gc.level=1,use.chol=FALSE,samfrac=1,coef=NULL,
drop.unused.levels=TRUE,G=NULL,fit=TRUE,drop.intercept=NULL,...)
## Routine to fit an additive model to a large dataset. The model is stated in the formula,
......@@ -1674,7 +1675,8 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
mf$formula <- gp$fake.formula
mf$method <- mf$family<-mf$control<-mf$scale<-mf$knots<-mf$sp<-mf$min.sp <- mf$gc.level <-
mf$gamma <- mf$paraPen<- mf$chunk.size <- mf$rho <- mf$cluster <- mf$discrete <-
mf$use.chol <- mf$samfrac <- mf$nthreads <- mf$G <- mf$fit <- mf$select <- mf$drop.intercept <- mf$...<-NULL
mf$use.chol <- mf$samfrac <- mf$nthreads <- mf$G <- mf$fit <- mf$select <- mf$drop.intercept <-
mf$coef <- mf$...<-NULL
mf$drop.unused.levels <- drop.unused.levels
mf[[1]] <- quote(stats::model.frame) ## as.name("model.frame")
pmf <- mf
......@@ -1714,7 +1716,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
## from the span of the parametric effects?
if (is.null(family$drop.intercept)) { ## family does not provide information
if (is.null(drop.intercept)) drop.intercept <- FALSE else {
drop.intercept <- drop.intercept ## force drop.intercept to correct length
drop.intercept <- drop.intercept[1] ## force drop.intercept to correct length
if (drop.intercept) family$drop.intercept <- drop.intercept ## ensure prediction works
}
} else drop.intercept <- as.logical(family$drop.intercept) ## family overrides argument
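The `[1]` subsetting added in this hunk can be illustrated with a minimal standalone sketch (hypothetical input, not mgcv code): without it, a user-supplied vector would flow into later `if(drop.intercept)` tests, which need a length-one logical.

```r
## Sketch of the drop.intercept fix: [1] forces the argument to
## length one, so subsequent if() conditions see a scalar logical
## rather than a vector.
drop.intercept <- c(TRUE, FALSE)     ## hypothetical bad user input
drop.intercept <- drop.intercept[1]  ## the fix applied above
stopifnot(length(drop.intercept) == 1L, isTRUE(drop.intercept))
```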
......@@ -1920,13 +1922,12 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
object <- bam.fit(G,mf,chunk.size,gp,scale,gamma,method,rho=rho,cl=cluster,
gc.level=gc.level,use.chol=use.chol,npt=nthreads)
} else if (G$discretize) {
object <- bgam.fitd(G, mf, gp ,scale ,nobs.extra=0,rho=rho,
object <- bgam.fitd(G, mf, gp ,scale ,nobs.extra=0,rho=rho,coef=coef,
control = control,npt=nthreads,gc.level=gc.level,...)
} else {
G$X <- matrix(0,0,ncol(G$X)); if (gc.level>1) gc()
if (rho!=0) warning("AR1 parameter rho unused with generalized model")
coef <- NULL
if (samfrac<1 && samfrac>0) { ## sub-sample first to get close to right answer...
ind <- sample(1:nrow(mf),ceiling(nrow(mf)*samfrac))
if (length(ind)<2*ncol(G$X)) warning("samfrac too small - ignored") else {
......@@ -1935,7 +1936,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
control1 <- control
control1$epsilon <- 1e-2
object <- bgam.fit(G, mf[ind,], chunk.size, gp ,scale ,gamma,method=method,nobs.extra=0,
control = control1,cl=cluster,npt=nthreads,gc.level=gc.level,
control = control1,cl=cluster,npt=nthreads,gc.level=gc.level,coef=coef,
use.chol=use.chol,samfrac=1,...)
G$w <- Gw;G$offset <- Goffset
coef <- object$coefficients
......
......@@ -43,6 +43,7 @@ cox.ph <- function (link = "identity") {
## code to evaluate in estimate.gam, to do with data ordering and
## baseline hazard estimation...
## first get the estimated hazard and prediction information...
G$X <- Sl.initial.repara(G$Sl,G$X,inverse=TRUE,cov=FALSE,both.sides=FALSE)
object$family$data <- G$family$hazard(G$y,G$X,object$coefficients,G$w)
rumblefish <- G$family$hazard(G$y,matrix(0,nrow(G$X),0),object$coefficients,G$w)
s0.base <- exp(-rumblefish$h[rumblefish$r]) ## no model baseline survival
......@@ -76,7 +77,7 @@ cox.ph <- function (link = "identity") {
n=as.integer(nrow(X)),p=as.integer(ncol(X)),
nt=as.integer(nt),PACKAGE="mgcv")
p <- ncol(X)
list(tr=tr,h=oo$h,q=oo$q,a=matrix(oo$A[p*nt],p,nt),nt=nt,r=r,km=oo$km)
list(tr=tr,h=oo$h,q=oo$q,a=matrix(oo$A[1:(p*nt)],p,nt),nt=nt,r=r,km=oo$km)
}
residuals <- function(object,type=c("deviance","martingale")) {
......
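The change from `oo$A[p*nt]` to `oo$A[1:(p*nt)]` in the hunk above matters because R silently recycles a single element through `matrix()`. A standalone sketch with a hypothetical vector shows the difference:

```r
## A[p*nt] selects ONE element, which matrix() recycles to fill
## the whole p x nt matrix; A[1:(p*nt)] extracts the intended block.
A <- 1:12; p <- 3; nt <- 4
bad  <- matrix(A[p*nt], p, nt)      ## every entry is 12 (recycled)
good <- matrix(A[1:(p*nt)], p, nt)  ## the intended 3 x 4 matrix
stopifnot(all(bad == 12L), identical(good, matrix(1:12, 3, 4)))
```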
......@@ -845,7 +845,7 @@ gam.fit5 <- function(x,y,lsp,Sl,weights=NULL,offset=NULL,deriv=2,family,
ll <- llf(y,x,coef1,weights,family,offset=offset,deriv=1)
ll1 <- ll$l - (t(coef1)%*%St%*%coef1)/2
khalf <- 0;fac <- 2
while (ll1 < ll0 && khalf < 25) { ## step halve until it succeeds...
while ((!is.finite(ll1)||ll1 < ll0) && khalf < 25) { ## step halve until it succeeds...
step <- step/fac;coef1 <- coef + step
ll <- llf(y,x,coef1,weights,family,offset=offset,deriv=0)
ll1 <- ll$l - (t(coef1)%*%St%*%coef1)/2
......@@ -858,12 +858,12 @@ gam.fit5 <- function(x,y,lsp,Sl,weights=NULL,offset=NULL,deriv=2,family,
if (khalf>5) fac <- 5
} ## end step halve
if (ll1 < ll0) { ## switch to steepest descent...
if (!is.finite(ll1) || ll1 < ll0) { ## switch to steepest descent...
step <- -.5*drop(grad)*mean(abs(coef))/mean(abs(grad))
khalf <- 0
}
while (ll1 < ll0 && khalf < 25) { ## step cut until it succeeds...
while ((!is.finite(ll1)||ll1 < ll0) && khalf < 25) { ## step cut until it succeeds...
step <- step/10;coef1 <- coef + step
ll <- llf(y,x,coef1,weights,family,offset=offset,deriv=0)
ll1 <- ll$l - (t(coef1)%*%St%*%coef1)/2
......@@ -875,7 +875,7 @@ gam.fit5 <- function(x,y,lsp,Sl,weights=NULL,offset=NULL,deriv=2,family,
khalf <- khalf + 1
}
if (ll1 >= ll0||iter==control$maxit) { ## step ok. Accept and test
if ((is.finite(ll1)&&ll1 >= ll0)||iter==control$maxit) { ## step ok. Accept and test
coef <- coef + step
## convergence test...
ok <- (iter==control$maxit||(abs(ll1-ll0) < control$epsilon*abs(ll0)
......@@ -1097,7 +1097,7 @@ gam.fit5 <- function(x,y,lsp,Sl,weights=NULL,offset=NULL,deriv=2,family,
db.drho <- matrix(0,length(bdrop),ncol(d1b));db.drho[!bdrop,] <- d1b
} else db.drho <- d1b
## and undo re-para...
if (!is.null(d1b)) db.drho <- t(Sl.repara(rp$p,t(db.drho),inverse=TRUE,both.sides=FALSE))
if (!is.null(d1b)) db.drho <- t(Sl.repara(rp$rp,t(db.drho),inverse=TRUE,both.sides=FALSE))
ret <- list(coefficients=coef,family=family,y=y,prior.weights=weights,
fitted.values=fitted.values, linear.predictors=linear.predictors,
......
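The `is.finite()` guards added in the gam.fit5 hunks above handle a non-finite trial penalized likelihood. A minimal standalone R sketch (not mgcv code) of why the bare comparison `ll1 < ll0` is unsafe:

```r
## In R, a comparison involving NaN yields NA, and while(NA)/if(NA)
## raises "missing value where TRUE/FALSE needed"; guarding with
## !is.finite() keeps step halving going for both NaN and -Inf.
ll0 <- -10
stopifnot(is.na(NaN < ll0))                        ## bare test: NA
stopifnot(isTRUE(!is.finite(NaN)  || NaN  < ll0))  ## guarded: TRUE
stopifnot(isTRUE(!is.finite(-Inf) || -Inf < ll0))  ## -Inf also TRUE
```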
......@@ -1466,7 +1466,9 @@ gevlss <- function(link=list("identity","identity","logit")) {
stats[[i]]$d3link <- fam$d3link
stats[[i]]$d4link <- fam$d4link
}
if (link[[3]]=="logit") { ## shifted logit link to confine xi to (-1,.5)
if (link[[3]]=="logit") { ## shifted logit link to confine xi to (-1,.5)
## Smith '85 Biometrika shows that -1 limit needed for MLE consistency
## but would need -0.5 for normality...
stats[[3]]$linkfun <- function(mu) binomial()$linkfun((mu + 1)/1.5)
stats[[3]]$mu.eta <- function(eta) binomial()$mu.eta(eta)*1.5
stats[[3]]$linkinv <- function(eta) 1.5* binomial()$linkinv(eta) - 1
......
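The shifted logit link in the hunk above confines the GEV shape parameter xi to (-1, 0.5). A small standalone check, mirroring the two functions defined there (uses only base R's `binomial()` link):

```r
## Shifted logit: any real eta maps into (-1, 0.5), and
## linkfun/linkinv round-trip exactly.
linkfun <- function(mu)  binomial()$linkfun((mu + 1)/1.5)
linkinv <- function(eta) 1.5*binomial()$linkinv(eta) - 1
xi <- c(-0.9, 0, 0.4)
stopifnot(all.equal(linkinv(linkfun(xi)), xi))
eta <- c(-10, 10)
stopifnot(all(linkinv(eta) > -1 & linkinv(eta) < 0.5))
```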
......@@ -1506,7 +1506,7 @@ gam.outer <- function(lsp,fscale,family,control,method,optimizer,criterion,scale
object$control <- control
if (inherits(family,"general.family")) {
mv <- gam.fit5.post.proc(object,G$Sl,G$L,G$lsp0,G$S,G$off)
object$coefficients <- Sl.initial.repara(G$Sl,object$coefficients,inverse=TRUE)
## object$coefficients <- Sl.initial.repara(G$Sl,object$coefficients,inverse=TRUE)
} else mv <- gam.fit3.post.proc(G$X,G$L,G$lsp0,G$S,G$off,object)
## note: use of the following in place of Vp appears to mess up p-values for smooths,
## but doesn't change r.e. p-values of course.
......@@ -1561,6 +1561,7 @@ estimate.gam <- function (G,method,optimizer,control,in.out,scale,gamma,start=NU
method <- "REML" ## any method you like as long as it's REML
G$Sl <- Sl.setup(G) ## prepare penalty sequence
## Xorig <- G$X ## store original X incase it is needed by family - poor option pre=proc can manipulate G$X
G$X <- Sl.initial.repara(G$Sl,G$X,both.sides=FALSE) ## re-parameterize accordingly
if (!is.null(start)) start <- Sl.initial.repara(G$Sl,start,inverse=FALSE,both.sides=FALSE)
......@@ -1762,9 +1763,14 @@ estimate.gam <- function (G,method,optimizer,control,in.out,scale,gamma,start=NU
names(object$edf) <- G$term.names
names(object$edf1) <- G$term.names
## extended family may need to manipulate fit object...
if (!is.null(G$family$postproc)) eval(G$family$postproc)
if (inherits(family,"general.family")) {
object$coefficients <- Sl.initial.repara(G$Sl,object$coefficients,inverse=TRUE)
}
## extended family may need to manipulate fit object. Code
## will need to include the following line if G$X used...
## G$X <- Sl.initial.repara(G$Sl,G$X,inverse=TRUE,cov=FALSE,both.sides=FALSE)
if (!is.null(G$family$postproc)) eval(G$family$postproc)
if (!is.null(G$P)) { ## matrix transforming from fit to prediction parameterization
object$coefficients <- as.numeric(G$P %*% object$coefficients)
......
# R plotting routines for the package mgcv (c) Simon Wood 2000-2010
## R plotting routines for the package mgcv (c) Simon Wood 2000-2010
## With contributions from Henric Nilsson
......@@ -307,11 +307,11 @@ gam.check <- function(b, old.style=FALSE,
if (!b$mgcv.conv$fully.converged)
cat(" by steepest\ndescent step failure.\n") else cat(".\n")
cat("The RMS",b$method,"score gradiant at convergence was",b$mgcv.conv$rms.grad,".\n")
cat("The RMS",b$method,"score gradient at convergence was",b$mgcv.conv$rms.grad,".\n")
if (b$mgcv.conv$hess.pos.def)
cat("The Hessian was positive definite.\n") else cat("The Hessian was not positive definite.\n")
cat("The estimated model rank was ",b$mgcv.conv$rank,
" (maximum possible: ",b$mgcv.conv$full.rank,")\n",sep="")
#cat("The estimated model rank was ",b$mgcv.conv$rank,
# " (maximum possible: ",b$mgcv.conv$full.rank,")\n",sep="")
}
}
if (!is.null(b$rank)) {
......@@ -438,7 +438,7 @@ plot.sos.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
partial.resids=FALSE,rug=TRUE,se=TRUE,scale=-1,n=100,n2=40,
pers=FALSE,theta=30,phi=30,jit=FALSE,xlab=NULL,ylab=NULL,main=NULL,
ylim=NULL,xlim=NULL,too.far=0.1,shade=FALSE,shade.col="gray80",
shift=0,trans=I,by.resids=FALSE,scheme=0,colors=heat.colors(100),
shift=0,trans=I,by.resids=FALSE,scheme=0,hcolors=heat.colors(100),
contour.col=3,...) {
## plot method function for sos.smooth terms
if (scheme>1) return(plot.mgcv.smooth(x,P=P,data=data,label=label,se1.mult=se1.mult,se2.mult=se2.mult,
......@@ -446,7 +446,7 @@ plot.sos.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
pers=pers,theta=theta,phi=phi,jit=jit,xlab=xlab,ylab=ylab,main=main,
ylim=ylim,xlim=xlim,too.far=too.far,shade=shade,shade.col=shade.col,
shift=shift,trans=trans,by.resids=by.resids,scheme=scheme-2,
colors=colors,contour.col=contour.col,...))
hcolors=hcolors,contour.col=contour.col,...))
## convert location of pole in plotting grid to radians
phi <- phi*pi/180
theta <- theta*pi/180
......@@ -506,7 +506,7 @@ plot.sos.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
if (scheme == 0) {
col <- 1# "lightgrey
zz[P$ind] <- P$fit
image(P$xm,P$ym,matrix(zz,m,m),col=colors,axes=FALSE,xlab="",ylab="",...)
image(P$xm,P$ym,matrix(zz,m,m),col=hcolors,axes=FALSE,xlab="",ylab="",...)
if (rug) {
if (is.null(list(...)[["pch"]])) points(P$raw$x,P$raw$y,pch=".",...) else
points(P$raw$x,P$raw$y,...)
......@@ -696,7 +696,7 @@ plot.mgcv.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
partial.resids=FALSE,rug=TRUE,se=TRUE,scale=-1,n=100,n2=40,
pers=FALSE,theta=30,phi=30,jit=FALSE,xlab=NULL,ylab=NULL,main=NULL,
ylim=NULL,xlim=NULL,too.far=0.1,shade=FALSE,shade.col="gray80",
shift=0,trans=I,by.resids=FALSE,scheme=0,colors=heat.colors(50),
shift=0,trans=I,by.resids=FALSE,scheme=0,hcolors=heat.colors(50),
contour.col=3,...) {
## default plot method for smooth objects `x' inheriting from "mgcv.smooth"
## `x' is a smooth object, usually part of a `gam' fit. It has an attribute
......@@ -910,9 +910,9 @@ plot.mgcv.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
persp(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),xlab=P$xlab,ylab=P$ylab,
zlab=P$main,ylim=P$ylim,xlim=P$xlim,theta=theta,phi=phi,...)
} else if (scheme==2||scheme==3) {
if (scheme==3) colors <- grey(0:50/50)
if (scheme==3) hcolors <- grey(0:50/50)
image(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),xlab=P$xlab,ylab=P$ylab,
main=P$main,xlim=P$xlim,ylim=P$ylim,col=colors,...)
main=P$main,xlim=P$xlim,ylim=P$ylim,col=hcolors,...)
contour(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),add=TRUE,col=contour.col,...)
if (rug) {
if (is.null(list(...)[["pch"]])) points(P$raw$x,P$raw$y,pch=".",...) else
......@@ -957,9 +957,9 @@ plot.mgcv.smooth <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
persp(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),xlab=P$xlab,ylab=P$ylab,
zlab=P$main,theta=theta,phi=phi,xlim=P$xlim,ylim=P$ylim,...)
} else if (scheme==2||scheme==3) {
if (scheme==3) colors <- grey(0:50/50)
if (scheme==3) hcolors <- grey(0:50/50)
image(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),xlab=P$xlab,ylab=P$ylab,
main=P$main,xlim=P$xlim,ylim=P$ylim,col=colors,...)
main=P$main,xlim=P$xlim,ylim=P$ylim,col=hcolors,...)
contour(P$x,P$y,matrix(trans(P$fit+shift),n2,n2),add=TRUE,col=contour.col,...)
if (rug) {
if (is.null(list(...)[["pch"]])) points(P$raw$x,P$raw$y,pch=".",...) else
......
......@@ -3039,12 +3039,13 @@ gpE <- function(x,xk,defn = NA) {
if (rho <= 0) rho <- max(E) ## approximately the K & W choice
E <- E/rho
if (!type%in%1:5||k>2||k<=0) stop("incorrect arguments to GP smoother")
if (type>2) eE <- exp(-E)
E <- switch(type,
(1 - 1.5*E + 0.5 *E^3)*(E<=rho), ## 1 spherical
exp(-E^k), ## 2 power exponential
(1 + E) * exp(-E), ## 3 Matern k = 1.5
(1 + E + E^2/3) * exp(-E), ## 4 Matern k = 2.5
(1 + E + .4 * E^2 + E^3 / 15) * exp(-E) ## 5 Matern k = 3.5
(1 + E) * eE, ## 3 Matern k = 1.5
eE + (E*eE)*(1+E/3), ## 4 Matern k = 2.5
eE + (E*eE)*(1+.4*E+E^2/15) ## 5 Matern k = 3.5
)
attr(E,"defn") <- c(type,rho,k)
E
......
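The Matern expressions for k = 2.5 and 3.5 are refactored in the hunk above to reuse `eE = exp(-E)`. A quick standalone sanity check that the new forms agree with the originals:

```r
## Refactored vs original Matern correlation expressions:
## eE + (E*eE)*(1 + E/3)           == (1 + E + E^2/3)*exp(-E)
## eE + (E*eE)*(1 + .4*E + E^2/15) == (1 + E + .4*E^2 + E^3/15)*exp(-E)
E <- seq(0, 5, length.out = 11); eE <- exp(-E)
stopifnot(all.equal(eE + (E*eE)*(1 + E/3),
                    (1 + E + E^2/3)*exp(-E)),
          all.equal(eE + (E*eE)*(1 + .4*E + E^2/15),
                    (1 + E + .4*E^2 + E^3/15)*exp(-E)))
```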
......@@ -748,7 +748,7 @@ plot.soap.film <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
partial.resids=FALSE,rug=TRUE,se=TRUE,scale=-1,n=100,n2=40,
pers=FALSE,theta=30,phi=30,jit=FALSE,xlab=NULL,ylab=NULL,main=NULL,
ylim=NULL,xlim=NULL,too.far=0.1,shade=FALSE,shade.col="gray80",
shift=0,trans=I,by.resids=FALSE,scheme=0,colors=heat.colors(100),
shift=0,trans=I,by.resids=FALSE,scheme=0,hcolors=heat.colors(100),
contour.col=1,...) {
## plot method function for soap.smooth terms
if (scheme==3) {
......@@ -760,7 +760,7 @@ plot.soap.film <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
partial.resids=partial.resids,rug=rug,se=se,scale=scale,n=n,n2=n2,
pers=pers,theta=theta,phi=phi,jit=jit,xlab=xlab,ylab=ylab,main=main,
ylim=ylim,xlim=xlim,too.far=too.far,shade=shade,shade.col=shade.col,
shift=shift,trans=trans,by.resids=by.resids,scheme=scheme,colors=colors,
shift=shift,trans=trans,by.resids=by.resids,hcolors=hcolors, ## don't pass scheme!!
contour.col=contour.col,...)
if (outline) { if (is.null(names(P$bnd))) {
for (i in 1:length(P$bnd)) lines(P$bnd[[i]],lwd=2)
......@@ -789,7 +789,7 @@ plot.soap.film <- function(x,P=NULL,data=NULL,label="",se1.mult=1,se2.mult=2,
ylim <- range(P$yscale);dy = ylim[2] - ylim[1]
plot(P$xscale[1],P$yscale[1],xlab=P$xlab,ylab=P$ylab,main=P$main,xlim=xlim,ylim=ylim,...)
rect(xlim[1]-dx,ylim[1]-dy,xlim[2]+dx,ylim[2]+dy,col="lightgrey")
image(P$xscale,P$yscale,P$fit,add=TRUE,col=colors,...)
image(P$xscale,P$yscale,P$fit,add=TRUE,col=hcolors,...)
contour(P$xscale,P$yscale,P$fit,add=TRUE,col=contour.col,...)
} else if (scheme==1) {
image(P$xscale,P$yscale,P$fit,col=grey(0:50/50),xlab=P$xlab,
......
......@@ -21,8 +21,8 @@ bam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action=na.omit, offset=NULL,method="fREML",control=list(),
select=FALSE,scale=0,gamma=1,knots=NULL,sp=NULL,min.sp=NULL,
paraPen=NULL,chunk.size=10000,rho=0,AR.start=NULL,discrete=FALSE,
cluster=NULL,nthreads=NA,gc.level=1,use.chol=FALSE,
samfrac=1,drop.unused.levels=TRUE,G=NULL,fit=TRUE,drop.intercept=NULL,...)
cluster=NULL,nthreads=NA,gc.level=1,use.chol=FALSE,samfrac=1,
coef=NULL,drop.unused.levels=TRUE,G=NULL,fit=TRUE,drop.intercept=NULL,...)
}
%- maybe also `usage' for other objects documented here.
......@@ -148,6 +148,8 @@ and then finds its Choleski decomposition, at the end. This is somewhat more eff
\item{samfrac}{For generalized additive models with very large sample sizes, the number of iterations needed for the model fit can
be reduced by first fitting a model to a random sample of the data and using the results to supply starting values. This initial fit is run with sloppy convergence tolerances, so it is typically very cheap. \code{samfrac} is the sampling fraction to use; 0.1 is often reasonable.}
\item{coef}{initial values for model coefficients}
\item{drop.unused.levels}{by default unused levels are dropped from factors before fitting. For some smooths
involving factor variables you might want to turn this off. Only do so if you know what you are doing.}
......
......@@ -5,7 +5,9 @@
\description{The \code{cox.ph} family implements the Cox Proportional Hazards model with Peto's
correction for ties, and estimation by penalized partial likelihood maximization, for use with
\code{\link{gam}}. In the model formula, event time is the response. The \code{weights} vector provides
the censoring information (0 for censoring, 1 for event).
the censoring information (0 for censoring, 1 for event). It deals with the case in which each subject has
one event/censoring time and one row of covariate values. When each subject has several time dependent
covariates, see \code{\link{cox.pht}}.
}
......@@ -26,16 +28,19 @@ information is provided by the \code{weights} argument to \code{gam}, with 1 ind
censoring.
Prediction from the fitted model object (using the \code{predict} method) with \code{type="response"} will predict on the
survivor function scale. See example code below for extracting the baseline hazard/survival directly. Martingale or deviance
survivor function scale. See example code below for extracting the cumulative baseline hazard/survival directly.
Martingale or deviance
residuals can be extracted. The \code{fitted.values} stored in the model object are survival function estimates for each
subject at their event/censoring time.
Estimation of model coefficients is by maximising the log-partial likelihood penalized by the smoothing penalties. See e.g.
Hastie and Tibshirani, 1990, section 8.3. for the partial likelihood used (with Peto's approximation for ties), but note that
optimization of the partial likelihood does not follow Hastie and Tibshirani. See Klein and Moeschberger (2003) for
estimation of residuals, the baseline hazard, survival function and associated standard errors.
estimation of residuals, the cumulative baseline hazard, survival function and associated standard errors (the survival standard error expression has a typo).
The percentage deviance explained reported for Cox PH models is based on the sum of squares of the deviance residuals, as the model deviance, and the sum of squares of the deviance residuals when the covariate effects are set to zero, as the null deviance. The same baseline hazard estimate is used for both.
This family deals efficiently with the case in which each subject has one event/censoring time and one row of covariate values. For studies in which there are multiple time varying covariate measures for each subject then the equivalent Poisson model should be fitted to suitable pseudodata using \code{bam(...,discrete=TRUE)}.
}
\references{
......
\name{cox.pht}
\alias{cox.pht}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Additive Cox proportional hazard models with time varying covariates}
\description{The \code{cox.ph} family only allows one set of covariate values per subject. If each subject has several time varying covariate measurements then it is still possible to fit a proportional hazards regression model, via an equivalent Poisson model. The recipe is provided by Whitehead (1980) and is equally valid in the smooth additive case. Its drawback is that the equivalent Poisson dataset can be quite large.
The trick is to generate an artificial Poisson observation for each subject in the risk set at each non-censored event time. The corresponding covariate values for each subject are whatever they are at the event time, while the Poisson response is zero for all subjects except those experiencing the event at that time (this corresponds to Peto's correction for ties). The linear predictor for the model must include an intercept for each event time (the cumulative sum of the exponential of these is the Breslow estimate of the baseline hazard).
Below is some example code employing this trick for the \code{\link[survival]{pbcseq}} data from the \code{survival} package. It uses \code{\link{bam}} for fitting with the \code{discrete=TRUE} option for efficiency: there is some approximation involved in doing this, and the exact equivalent to what is done in \code{\link{cox.ph}} is instead obtained by using \code{\link{gam}} with \code{method="REML"} (taking some 14 times the computational time for the example below).
The function \code{tdpois} in the example code uses crude piecewise constant interpolation for the covariates, in which the covariate value at an event time is taken to be whatever it was the previous time that it was measured. Obviously more sophisticated interpolation schemes might be preferable.
}
\references{
Whitehead (1980) Fitting Cox's regression model to survival data using GLIM. Applied Statistics 29(3):268-275
}
\examples{
require(mgcv);require(survival)
## First define functions for producing Poisson model data frame
app <- function(x,t,to) {
## wrapper to approx for calling from apply...
y <- if (sum(!is.na(x))<1) rep(NA,length(to)) else
approx(t,x,to,method="constant",rule=2)$y
if (is.factor(x)) factor(levels(x)[y],levels=levels(x)) else y
} ## app
tdpois <- function(dat,event="z",et="futime",t="day",status="status1",
id="id") {
## dat is data frame. id is patient id; et is event time; t is
## observation time; status is 1 for death 0 otherwise;
## event is name for Poisson response.
if (event \%in\% names(dat)) warning("event name in use")
require(utils) ## for progress bar
te <- sort(unique(dat[[et]][dat[[status]]==1])) ## event times
sid <- unique(dat[[id]])
prg <- txtProgressBar(min = 0, max = length(sid), initial = 0,
char = "=",width = NA, title="Progress", style = 3)
## create dataframe for poisson model data
dat[[event]] <- 0; start <- 1
dap <- dat[rep(1:length(sid),length(te)),]
for (i in 1:length(sid)) { ## work through patients
di <- dat[dat[[id]]==sid[i],] ## ith patient's data
tr <- te[te <= di[[et]][1]] ## times required for this patient
## Now do the interpolation of covariates to event times...
um <- data.frame(lapply(X=di,FUN=app,t=di[[t]],to=tr))
## Mark the actual event...
if (um[[et]][1]==max(tr)&&um[[status]]==1) um[[event]][nrow(um)] <- 1
um[[et]] <- tr ## reset time to relevant event times
dap[start:(start-1+nrow(um)),] <- um ## copy to dap
start <- start + nrow(um)
setTxtProgressBar(prg, i)
}
close(prg)
dap[1:(start-1),]
} ## tdpois
## The following takes too long for CRAN checking
## (but typically takes a minute or less).
\dontrun{
## Convert pbcseq to equivalent Poisson form...
pbcseq$status1 <- as.numeric(pbcseq$status==2) ## death indicator
pb <- tdpois(pbcseq) ## conversion
pb$tf <- factor(pb$futime) ## add factor for event time
## Fit Poisson model...
b <- bam(z ~ tf - 1 + sex + trt + s(sqrt(protime)) + s(platelet)+ s(age)+
s(bili)+s(albumin), family=poisson,data=pb,discrete=TRUE,nthreads=2)
par(mfrow=c(2,3))
plot(b,scale=0)
## compute residuals...
chaz <- tapply(fitted(b),pb$id,sum) ## cum haz by subject
d <- tapply(pb$z,pb$id,sum) ## censoring indicator
mrsd <- d - chaz ## Martingale
drsd <- sign(mrsd)*sqrt(-2*(mrsd + d*log(chaz))) ## deviance
## plot survivor function and s.e. band for subject 25
te <- sort(unique(pb$futime)) ## event times
di <- pbcseq[pbcseq$id==25,] ## data for subject 25
pd <- data.frame(lapply(X=di,FUN=app,t=di$day,to=te)) ## interpolate to te
pd$tf <- factor(te)
X <- predict(b,newdata=pd,type="lpmatrix")
eta <- drop(X\%*\%coef(b)); H <- cumsum(exp(eta)) ## cumulative hazard
J <- apply(exp(eta)*X,2,cumsum) ## Jacobian dH/d(coef), for delta method
se <- diag(J\%*\%vcov(b)\%*\%t(J))^.5 ## delta method s.e. of H
plot(stepfun(te,c(1,exp(-H))),do.points=FALSE,ylim=c(0.7,1),
ylab="S(t)",xlab="t (days)",main="",lwd=2)
lines(stepfun(te,c(1,exp(-H+se))),do.points=FALSE)
lines(stepfun(te,c(1,exp(-H-se))),do.points=FALSE)
rug(pbcseq$day[pbcseq$id==25]) ## measurement times
}
}
\keyword{models} \keyword{regression}
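The `tdpois` conversion above relies on the standard equivalence between the Cox partial likelihood and a Poisson likelihood with one free log-baseline-hazard parameter per distinct event time. A minimal standalone sketch of that data expansion in plain R (all names here are illustrative, not mgcv internals):

```r
## Each subject contributes one pseudo-observation per distinct event
## time at which they are at risk; the Poisson response z is 1 only at
## the subject's own (uncensored) event time.
set.seed(1)
n <- 30
t.obs <- rexp(n)                      ## follow-up times
died  <- rbinom(n, 1, 0.7)            ## 1 = event observed, 0 = censored
x     <- rnorm(n)                     ## a fixed covariate
te <- sort(unique(t.obs[died == 1]))  ## distinct event times
pd <- do.call(rbind, lapply(1:n, function(i) {
  tr <- te[te <= t.obs[i]]            ## event times subject i is at risk for
  if (length(tr) == 0) return(NULL)
  data.frame(id = i, tf = factor(tr, levels = te), x = x[i],
             z = as.numeric(died[i] == 1 & tr == t.obs[i]))
}))
## one log-baseline-hazard parameter per event time via the factor tf,
## exactly as the 'tf - 1' term does in the bam() call above:
fit <- glm(z ~ tf - 1 + x, family = poisson, data = pd)
```

The coefficient on `x` then estimates the log hazard ratio, while the `tf` coefficients estimate the log baseline hazard at each event time.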
......@@ -31,6 +31,7 @@ The following families implement more general model classes. Usable only with \c
\itemize{
\item \code{\link{cox.ph}} the Cox Proportional Hazards model for survival data.
\item \code{\link{gaulss}} a Gaussian location-scale model where the mean and the standard deviation are both modelled using smooth linear predictors.
\item \code{\link{gevlss}} a generalized extreme value (GEV) model where the location, scale and shape parameters are each modelled using a linear predictor.
\item \code{\link{ziplss}} a `two-stage' zero inflated Poisson model, in which `potential-presence' is modelled with one linear predictor, and Poisson mean abundance
given potential presence is modelled with a second linear predictor.
\item \code{\link{mvn}}: multivariate normal additive models.
......
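The location-scale idea behind \code{gaulss} in the list above can be sketched without mgcv: the mean and the log standard deviation each get their own (here linear) predictor, estimated jointly by maximum likelihood. This is a bare-bones illustration under assumed linear predictors; \code{gaulss} adds smooth terms and penalised estimation on top of this.

```r
## Joint ML fit of a Gaussian whose mean and log-sd each depend on x.
set.seed(3)
n <- 500; x <- runif(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = exp(-1 + x))  ## truth: (1,2,-1,1)
nll <- function(p)  ## negative log likelihood over all 4 parameters
  -sum(dnorm(y, mean = p[1] + p[2] * x,
             sd = exp(p[3] + p[4] * x), log = TRUE))
fit <- optim(c(0, 0, 0, 0), nll, method = "BFGS")
```

The log link on the standard deviation keeps it positive, which is the same device the \code{gaulss} family uses for its second linear predictor.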
......@@ -243,7 +243,7 @@ Linked smoothing parameters and adaptive smoothing are not supported.
\code{\link{predict.gam}},
\code{\link{plot.gam}}, \code{\link{summary.gam}}, \code{\link{negbin}},
\code{\link{vis.gam}},\code{\link{pdTens}}, \code{gamm4} (
\url{http://cran.r-project.org/package=gamm4})
\url{https://cran.r-project.org/package=gamm4})
}
\examples{
......
......@@ -18,7 +18,7 @@ gevlss(link=list("identity","identity","logit"))
\details{Used with \code{\link{gam}} to fit Generalized Extreme Value location scale and shape models. \code{gam} is called with a list containing 3 formulae: the first specifies the response on the left hand side and the structure of the linear predictor for the location parameter on the right hand side. The second is one sided, specifying the linear predictor for the log scale parameter on the right hand side. The third is one sided specifying the linear predictor for the shape parameter.
Link functions \code{"identity"} and \code{"log"} are available for the location (mu) parameter. There is no choice of link for the log scale parameter (\eqn{\rho = \log \sigma}{rho = log sigma}). The shape parameter (xi) defaults to a modified logit link restricting its range to (-1,.5), the upper limit is required to ensure existence of an MLE.
Link functions \code{"identity"} and \code{"log"} are available for the location (mu) parameter. There is no choice of link for the log scale parameter (\eqn{\rho = \log \sigma}{rho = log sigma}). The shape parameter (xi) defaults to a modified logit link restricting its range to (-1,.5); the upper limit is required to ensure finite variance, while the lower limit ensures consistency of the MLE (Smith, 1985).
The fitted values for this family will be a three column matrix. The first column is the location parameter, the second column is the log scale parameter, the third column is the shape parameter.
......@@ -28,6 +28,9 @@ The derivative system code for this family is mostly auto-generated, and the fam
}
\references{
Smith, R.L. (1985) Maximum likelihood estimation in a class of
nonregular cases. Biometrika 72(1):67-90.
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and
model selection for general smooth models.
Journal of the American Statistical Association.
......
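One simple way to realise a modified logit link mapping the whole real line into (-1, .5), as described for the shape parameter above, is a shifted and scaled logistic. This is an illustrative construction; mgcv's internal parameterisation may differ in detail.

```r
## Map eta in (-Inf, Inf) to xi in (-1, 0.5) and back.
xi.linkinv <- function(eta) -1 + 1.5 * plogis(eta)   ## inverse link
xi.linkfun <- function(xi)  qlogis((xi + 1) / 1.5)   ## link
```

As eta decreases xi approaches -1 (the consistency limit), and as eta increases xi approaches 0.5 (the finite-variance limit), so the unconstrained linear predictor can never produce an inadmissible shape parameter.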
......@@ -135,7 +135,10 @@ Chambers and Hastie (1993) (but is not a clone).
}
\section{WARNING }{
Predictions are likely to be incorrect if data dependent transformations of the covariates
are used within calls to smooths. See examples in \code{\link{predict.gam}}.
}
\seealso{ \code{\link{bam}}, \code{\link{predict.gam}}}
......
......@@ -2,6 +2,7 @@
\alias{predict.gam}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Prediction from fitted GAM model}
\description{ Takes a fitted \code{gam} object produced by \code{gam()}
and produces predictions given a new set of values for the model covariates
or the original values used for the model fit. Predictions can be accompanied
......@@ -128,7 +129,11 @@ Chambers and Hastie (1993) (but is not a clone).
}
\section{WARNING }{ Note that the behaviour of this function is not identical to
\section{WARNING }{
Predictions are likely to be incorrect if data dependent transformations of the covariates
are used within calls to smooths. See examples.
Note that the behaviour of this function is not identical to
\code{predict.gam()} in Splus.
\code{type=="terms"} does not exactly match what \code{predict.lm} does for
......@@ -191,6 +196,19 @@ mean(res);var(res)
res <- colSums(log(abs(Xp \%*\% t(br))))
##################################################################
# illustration of unsafe scale dependent transforms in smooths....
##################################################################
b0 <- gam(y~s(x0)+s(x1)+s(x2)+scale(x3),data=dat) ## safe
b1 <- gam(y~s(x0)+s(I(x1/2))+s(x2)+scale(x3),data=dat) ## safe
b2 <- gam(y~s(x0)+s(scale(x1))+s(x2)+scale(x3),data=dat) ## unsafe
pd <- dat; pd$x1 <- pd$x1/2; pd$x3 <- pd$x3/2
par(mfrow=c(1,2))
plot(predict(b0,pd),predict(b1,pd),main="b0 and b1 predictions match");abline(0,1,col=2)
plot(predict(b0,pd),predict(b2,pd),main="b2 unsafe, doesn't match");abline(0,1,col=2)
##################################################################
## The following shows how to use use an "lpmatrix" as a lookup
## table for approximate prediction. The idea is to create
......
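The reason \code{s(scale(x1))} is unsafe in the example above: \code{scale()} standardises using the mean and standard deviation of whatever data it is handed, so at prediction time new data are centred by their own statistics rather than by those of the training data. A tiny standalone illustration (names are illustrative):

```r
## scale() recomputes centre/spread from its argument, so the same raw
## values map to different standardised values at fit and predict time.
set.seed(2)
x.train <- rnorm(100, mean = 5, sd = 2)   ## covariate seen at fit time
x.new   <- x.train[1:10] / 2              ## "new" covariate at predict time
s.new   <- as.numeric(scale(x.new))       ## centred by x.new's own stats
## safe alternative: freeze the training statistics explicitly
mu <- mean(x.train); sdev <- sd(x.train)
s.safe <- (x.new - mu) / sdev             ## what prediction should use
```

Precomputing \code{mu} and \code{sdev} outside the formula, as in the \code{b0} fit, keeps the transform fixed between fitting and prediction.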
......@@ -37,7 +37,7 @@ This is a restricted, but faster, version of \code{rtweedie} from the \code{twee
\references{
Peter K Dunn (2009). tweedie: Tweedie exponential family models. R