Commit 7ef10a6a authored by Dirk Eddelbuettel

Import Upstream version 1.8-2

parent 8b8f464f
** denotes quite substantial/important changes
*** denotes really big changes
1.8-2
* For exponential family gams, fitted by outer iteration, a warning is now
generated if the Pearson scale parameter estimate is more than twice
a robust estimate. This may indicate an unstable Pearson estimate.
* 'gam.control' now has an option 'scale.est' to allow selection of the
estimator to use for the scale parameter in exponential family GAMs.
See ?gam.scale, and the sketch after these 1.8-2 entries. Thanks to
Trevor Davies for providing a clear unstable Pearson estimate example.
* drop.unused.levels argument added to gam, bam and gamm to allow
"mrf" (and "re") terms to have unobserved factor levels.
* "mrf" constructor modified to deal properly with regions that contain no
observations.
* "fs" smooths are no longer eligible to have side conditions set, since
they are fully penalized terms and hence always identifiable (in theory).
* predict.bam was not declared as a method in NAMESPACE - fixed
* predict.bam modified to strip down object to save memory (especially in
parallel).
* predict.gam now has block.size=NULL as default. This implies a block
size of 1000 when newdata is supplied, and use of a single block if no
new data is supplied.
* some messages were not printing correctly after a change in
message handling to facilitate easier translation. Now fixed.
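For example, a minimal sketch of the 'scale.est' option (simulated data;
the model and settings are arbitrary illustrations):

library(mgcv)
set.seed(1)
dat <- gamSim(1, n=400, dist="poisson", scale=.2)
## use the robust scale estimator instead of the default Pearson one
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family=quasipoisson,
         data=dat, control=gam.control(scale.est="robust"))
b$sig2 ## the estimated scale parameter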
1.8-1
* bam modified so that Choleski based fitting works properly with rank
......
Package: mgcv
Version: 1.8-1
Version: 1.8-2
Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org>
Title: Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness
......@@ -14,7 +14,7 @@ Suggests: splines, parallel, survival, MASS
LazyLoad: yes
ByteCompile: yes
License: GPL (>= 2)
Packaged: 2014-07-05 09:43:28 UTC; sw283
Packaged: 2014-07-21 08:56:24 UTC; sw283
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-07-06 00:16:40
Date/Publication: 2014-07-21 12:12:11
bda4042be09d0aa5d24bd6426ff711e3 *ChangeLog
eda47726d381d2a2e3bf0a34d8046426 *DESCRIPTION
10b8bc7c708c2f3d89cc69d8bd67f3e3 *ChangeLog
ad04030a1a5f9a3888602dee013351e7 *DESCRIPTION
eb723b61539feef013de476e68b5c50a *GPL-2
d1659aca631e70420c5743e07bc71935 *NAMESPACE
752f0cb574845c1596bf656e9ff48ad8 *R/bam.r
870396939733c58c9d523cbc41f9da59 *NAMESPACE
ca641f4a3db11cd9494762ad2286665c *R/bam.r
9d5bb48cc0072b9ba638567986928a73 *R/coxph.r
d6f40f3d2acd3e30f3a59cbc85da6bc3 *R/efam.r
2fb8891700c1a813dc387cdce8dd817b *R/fast-REML.r
7953f7d1ff8d5ba96778c84934620e64 *R/gam.fit3.r
e72ee5107056471bd2ac183003ab5e25 *R/gam.fit3.r
6586de2772276237525c646362926f8b *R/gam.fit4.r
e63276b78a4ff2566af2e651bfc6aa9c *R/gam.sim.r
bd651b8eec80d83d6e0de5f47715a0de *R/gamlss.r
2c907093bcf015b79fdd1ac42ccc13f6 *R/gamm.r
183023013034703b80230bf238e35c72 *R/mgcv.r
449b21801a62d0d6b753e753d4861570 *R/gamm.r
ee07c79f181f6cc17edeafc8a33ae51b *R/mgcv.r
ed2cdcfa5c0f772bb75eee723b7294cd *R/misc.r
c287514d201a02f060fee98aab1e93c8 *R/mvam.r
4b0df0a2310fba6b38ea8164f76d0ccd *R/plots.r
3b176be0ae6388194fd53e1800f0bbe9 *R/smooth.r
8d3e8c0e0009b8028bf08a2b1919f850 *R/smooth.r
90f72ef91dfc2a25c262fc002f781bf7 *R/soap.r
0c5d2acdc985d2af9de6bb736d4df2ab *R/sparse.r
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
......@@ -28,7 +28,7 @@ e9b0a2e31b130cf2cb38618ec50d1919 *man/Predict.matrix.soap.film.Rd
f12468625253dd3603907de233762fd6 *man/Rrank.Rd
f6cadda5999c35800fd65c08d6812f7b *man/Tweedie.Rd
80f8763baa4987579e2aa56073a9e94e *man/anova.gam.Rd
b25d938d892cd35cda3effd2cddbab96 *man/bam.Rd
49fe84873e296267a96ed44c044d71c4 *man/bam.Rd
71bde8b8caa24a36529ce7e0ac3165d8 *man/bam.update.Rd
a2beb811b1093c5e82ef32d7de1f7d32 *man/cSplineDes.Rd
9b4d616d1b6c4a46ca77d16cded3f806 *man/choose.k.Rd
......@@ -44,21 +44,22 @@ c7561a0f2e114247955e212aeabc6ae9 *man/formXtViX.Rd
d87ff6287d7e343ea24d2053f4724b35 *man/formula.gam.Rd
4da4d585b329769eb44f0c7a6e7dd554 *man/fs.test.Rd
6f405acde2d7b6f464cf45f5395113ba *man/full.score.Rd
54bfbaff487b3f06a3a8ab71fcd34362 *man/gam.Rd
b113877121ece5b7e227cd07138b7623 *man/gam.Rd
fe61dd0efab9e920c17335faf3d5764c *man/gam.check.Rd
96c223bf025aaeb27c432119416b29cf *man/gam.control.Rd
a65bc22f606e45d185bc375fbf5698a1 *man/gam.control.Rd
44db24b66ce63bc16d2c8bc3f5b42ac5 *man/gam.convergence.Rd
58ab3b3d6f4fd0d008d73c3c4e6d3305 *man/gam.fit.Rd
dcf10ab3cc3102f7fb36d3ddf44013f5 *man/gam.fit3.Rd
8ba3991b5932b0775b452d20c9ff4d54 *man/gam.models.Rd
e969287d1a5c281faa7eb6cfce31a7c5 *man/gam.outer.Rd
39e2d1fa5374f3b5cff1d761b73f0daa *man/gam.scale.Rd
96676186808802344a99f9d3170bf775 *man/gam.selection.Rd
956bf1a6ac1361dd0403c28153b03a9b *man/gam.side.Rd
b2ff252537dd2155524b774b2435e66e *man/gam.vcomp.Rd
a66a814cc4c6f806e824751fda519ae0 *man/gam2objective.Rd
717401fd7efa3b39d90418a5d1d0c216 *man/gamObject.Rd
a2593990d6a87f7b783d0435062dec02 *man/gamSim.Rd
76276a29a43519f2be3bbb34cbdc385e *man/gamm.Rd
3acf983ba10da78af067d4d3a1794026 *man/gamm.Rd
df4a6db696749986fd5e20586fc9b718 *man/gaulss.Rd
39ae109127110152e97dc9f79e08bb14 *man/get.var.Rd
a2ea1233d43fac89e0cacbc09a8d31e2 *man/in.out.Rd
......@@ -93,7 +94,7 @@ edf57071572275a8443b2f0b66d44424 *man/place.knots.Rd
4aed50caabd7f058f90437dab4cdbee0 *man/plot.gam.Rd
c27a6b886929b1dc83bf4b90cae848f9 *man/polys.plot.Rd
afca36f5b1a5d06a7fcab2eaaa029e7e *man/predict.bam.Rd
3840817b3c4d9e122dfcc0604dbdc79c *man/predict.gam.Rd
900647986ab1365d437368cc28b62658 *man/predict.gam.Rd
7562cb5ce9a7fbf0cd60124cf4305315 *man/print.gam.Rd
b8ac3ed8fc05bb14c605b996404853cd *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
......@@ -108,9 +109,9 @@ b5a06918956fd10f23285b453c05bdb4 *man/smooth.construct.Rd
76013feaf70d00976bba0154b6f2c946 *man/smooth.construct.cr.smooth.spec.Rd
f5e6d0f5122f61c336827b3615482157 *man/smooth.construct.ds.smooth.spec.Rd
db75c958cbfb561914a3291ab58b9786 *man/smooth.construct.fs.smooth.spec.Rd
83e9283d1559be13aec5f5e3b0334ca0 *man/smooth.construct.mrf.smooth.spec.Rd
4aaa84b520992fbc32b0c37f7f63c1dd *man/smooth.construct.mrf.smooth.spec.Rd
abe15377f471a2d8957a59c19eeef0bb *man/smooth.construct.ps.smooth.spec.Rd
d202c6718fb1138fdd99e6102250aedf *man/smooth.construct.re.smooth.spec.Rd
6aaff8575f68775ed21930893ea9e03d *man/smooth.construct.re.smooth.spec.Rd
a8d5eb4772641567fee15714f01a4727 *man/smooth.construct.so.smooth.spec.Rd
9c35c35b9583281eaa55b97c9b9fe770 *man/smooth.construct.sos.smooth.spec.Rd
3cb4e59f915c8d64b90754eaeeb5a86f *man/smooth.construct.t2.smooth.spec.Rd
......@@ -133,7 +134,7 @@ a16b3a5a4d13c705dcab8d1cd1b3347e *man/vcov.gam.Rd
becbe3e1f1588f7292a74a97ef07a9ae *po/R-de.po
0bdfcf98961b0d52b60f806dc1dba77e *po/R-en@quot.po
9126ed91030cd1c168d669854cf1f510 *po/R-fr.po
a7075f1ed68a4bfc645dc2238ab0da48 *po/R-mgcv.pot
7563cdf47066589a08ba01eacfcf0bcb *po/R-mgcv.pot
ccbf345b8ca9544e9239fa85fff897a1 *po/R-pl.po
d5a0f198090ecbfedaa6549f2918b997 *po/R-po.po
3550a794504bdfd4f75a83de1148bb40 *po/de.po
......
......@@ -82,6 +82,7 @@ S3method(logLik, gam)
S3method(model.matrix,gam)
S3method(plot, gam)
S3method(predict, gam)
S3method(predict, bam)
S3method(print, anova.gam)
S3method(print, gam)
S3method(print, summary.gam)
......
......@@ -366,7 +366,7 @@ bgam.fit <- function (G, mf, chunk.size, gp ,scale ,gamma,method, coef=NULL,etas
rss.extra <- qrx$y.norm2 - sum(qrx$f^2)
if (control$trace)
gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv")
message(gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv"))
if (!is.finite(dev)) stop("Non-finite deviance")
......@@ -578,7 +578,7 @@ bgam.fit2 <- function (G, mf, chunk.size, gp ,scale ,gamma,method, etastart = NU
rss.extra <- qrx$y.norm2 - sum(qrx$f^2)
if (control$trace)
gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv")
message(gettextf("Deviance = %s Iterations - %d\n", dev, iter, domain = "R-mgcv"))
if (!is.finite(dev)) stop("Non-finite deviance")
......@@ -724,6 +724,12 @@ predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
block.size=50000,newdata.guaranteed=FALSE,na.action=na.pass,
cluster=NULL,...) {
## function for prediction from a bam object, possibly in parallel
## remove some un-needed stuff from object
object$Sl <- object$qrx <- object$R <- object$F <- object$Ve <-
object$Vc <- object$G <- object$residuals <- object$fitted.values <-
object$linear.predictors <- NULL
gc()
if (!is.null(cluster)&&inherits(cluster,"cluster")) {
require(parallel)
n.threads <- length(cluster)
......@@ -757,7 +763,12 @@ predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
arg[[i]]$newdata <- newdata[ind,]
}
} ## finished setting up arguments
## newdata and object no longer needed - all info in thread lists...
if (!missing(newdata)) rm(newdata)
rm(object)
gc()
res <- parLapply(cluster,arg,pabapr) ## perform parallel prediction
gc()
## and splice results back together...
if (type=="lpmatrix") {
X <- res[[1]]
......@@ -1100,7 +1111,7 @@ sparse.model.matrix <- function(G,mf,chunk.size) {
bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,na.action=na.omit,
offset=NULL,method="fREML",control=list(),scale=0,gamma=1,knots=NULL,
sp=NULL,min.sp=NULL,paraPen=NULL,chunk.size=10000,rho=0,AR.start=NULL,sparse=FALSE,cluster=NULL,
gc.level=1,use.chol=FALSE,samfrac=1,...)
gc.level=1,use.chol=FALSE,samfrac=1,drop.unused.levels=TRUE,...)
## Routine to fit an additive model to a large dataset. The model is stated in the formula,
## which is then interpreted to figure out which bits relate to smooth terms and which to
......@@ -1135,7 +1146,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
mf$method <- mf$family<-mf$control<-mf$scale<-mf$knots<-mf$sp<-mf$min.sp <- mf$gc.level <-
mf$gamma <- mf$paraPen<- mf$chunk.size <- mf$rho <- mf$sparse <- mf$cluster <-
mf$use.chol <- mf$samfrac <- mf$...<-NULL
mf$drop.unused.levels<-TRUE
mf$drop.unused.levels <- drop.unused.levels
mf[[1]]<-as.name("model.frame")
pmf <- mf
......
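The object stripping in predict.bam above mainly pays off for parallel
prediction, where each worker receives a copy of the fit. A minimal sketch
(simulated data; the model and cluster size are arbitrary):

library(mgcv)
library(parallel)
dat <- gamSim(1, n=20000)
b <- bam(y ~ s(x0) + s(x1) + s(x2), data=dat)
cl <- makeCluster(2)
fv <- predict(b, newdata=dat, cluster=cl) ## stripped-down copies sent to workers
stopCluster(cl)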
......@@ -76,6 +76,35 @@ get.Eb <- function(rS,rank)
t(mroot(S,rank=rank)) ## E such that E'E = S
} ## get.Eb
gam.scale <- function(wp,wd,dof,extra=0) {
## obtain estimates of the scale parameter, using the weighted Pearson and
## deviance residuals and the residual effective degrees of freedom.
## The problem is that the Pearson estimator is unbiased, but potentially
## unstable (e.g. when a count is 1 but the mean is tiny, the Pearson
## residual is enormous, although the deviance residual is much less extreme).
pearson <- (sum(wp^2)+extra)/dof
deviance <- (sum(wd^2)+extra)/dof
## now scale deviance residuals to have magnitude similar
## to pearson and compute new estimator.
kd <- wd
ind <- wd > 0
kd[ind] <- wd[ind]*median(wp[ind]/wd[ind])
ind <- wd < 0
kd[ind] <- wd[ind]*median(wp[ind]/wd[ind])
robust <- (sum(kd^2)+extra)/dof
## force estimate to lie between deviance and pearson estimators
if (pearson > deviance) {
if (robust < deviance) robust <- deviance
if (robust > pearson) robust <- pearson
} else {
if (robust > deviance) robust <- deviance
if (robust < pearson) robust <- pearson
}
list(pearson=pearson,deviance=deviance,robust=robust)
}
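## Illustrative sketch (not package source): with one wild Pearson residual
## the Pearson estimate explodes, while the robust estimate is pulled back
## towards the deviance estimate. Assumes residual vectors as they would be
## computed inside gam.fit3:
##   wp <- c(rnorm(100), 60)   ## weighted Pearson residuals, one huge outlier
##   wd <- c(rnorm(100), 4)    ## deviance residuals, far less extreme
##   gam.scale(wp, wd, dof=95) ## robust lies between deviance and pearson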
gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
weights = rep(1, nobs), start = NULL, etastart = NULL,
mustart = NULL, offset = rep(0, nobs),U1=diag(ncol(x)), Mp=-1, family = gaussian(),
......@@ -370,7 +399,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
dev <- sum(dev.resids(y, mu, weights))
if (control$trace)
gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv")
message(gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv"))
boundary <- FALSE
if (!is.finite(dev)) {
......@@ -421,7 +450,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
pdev <- dev + penalty ## the penalized deviance
if (control$trace)
gettextf("penalized deviance = %s", pdev, domain = "R-mgcv")
message(gettextf("penalized deviance = %s", pdev, domain = "R-mgcv"))
div.thresh <- 10*(.1+abs(old.pdev))*.Machine$double.eps^.5
......@@ -444,7 +473,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
dev <- sum(dev.resids(y, mu, weights))
pdev <- dev + t(start)%*%St%*%start ## the penalized deviance
if (control$trace)
gettextf("Step halved: new penalized deviance = %g", pdev, "\n")
message(gettextf("Step halved: new penalized deviance = %g", pdev, "\n"))
}
}
......@@ -475,8 +504,10 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
}
} ### end main loop
dev <- sum(dev.resids(y, mu, weights))
wdr <- dev.resids(y, mu, weights)
dev <- sum(wdr)
wdr <- sign(y-mu)*sqrt(pmax(wdr,0)) ## used below in scale estimation
## Now call the derivative calculation scheme. This requires the
## following inputs:
## z and w - the pseudodata and weights
......@@ -595,9 +626,18 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
mu <- linkinv(eta)
}
trA <- oo$trA;
pearson <- sum(weights*(y-mu)^2/family$variance(mu)) ## Pearson statistic
wpr <- (y-mu) *sqrt(weights/family$variance(mu)) ## weighted pearson residuals
se <- gam.scale(wpr,wdr,n.true-trA,dev.extra) ## get scale estimates
pearson.warning <- NULL
if (control$scale.est=="pearson") {
scale.est <- se$pearson
if (scale.est > 4 * se$robust) pearson.warning <- TRUE
} else scale.est <- if (control$scale.est=="deviance") se$deviance else se$robust
scale.est <- (pearson+dev.extra)/(n.true-trA)
#pearson <- sum(weights*(y-mu)^2/family$variance(mu)) ## Pearson statistic
#scale.est <- (pearson+dev.extra)/(n.true-trA)
#scale.est <- (dev+dev.extra)/(n.true-trA)
reml.scale <- NA
......@@ -755,7 +795,7 @@ gam.fit3 <- function (x, y, sp, Eb,UrS=list(),
list(coefficients = coef, residuals = residuals, fitted.values = mu,
family = family, linear.predictors = eta, deviance = dev,
null.deviance = nulldev, iter = iter, weights = wt, prior.weights = weights,
df.null = nulldf, y = y, converged = conv,
df.null = nulldf, y = y, converged = conv,pearson.warning = pearson.warning,
boundary = boundary,D1=D1,D2=D2,P=P,P1=P1,P2=P2,trA=trA,trA1=trA1,trA2=trA2,
GCV=GCV,GCV1=GCV1,GCV2=GCV2,GACV=GACV,GACV1=GACV1,GACV2=GACV2,UBRE=UBRE,
UBRE1=UBRE1,UBRE2=UBRE2,REML=REML,REML1=REML1,REML2=REML2,rV=rV,db.drho=db.drho,
......@@ -774,6 +814,10 @@ gam.fit3.post.proc <- function(X,L,object) {
F <- .Call(C_mgcv_pmmult2,PKt,sqrt(object$weights)*X,0,0,object$control$nthreads)
edf <- diag(F) ## effective degrees of freedom
edf1 <- 2*edf - rowSums(t(F)*F) ## alternative
## check on plausibility of scale (estimate)
if (object$scale.estimated&&!is.null(object$pearson.warning)) warning("Pearson scale estimate may be unstable. See ?gam.scale.")
## edf <- rowSums(PKt*t(sqrt(object$weights)*X))
## Ve <- PKt%*%t(PKt)*object$scale ## frequentist cov
Ve <- F%*%Vb ## not quite as stable as above, but quicker
......
......@@ -1188,7 +1188,7 @@ gammPQL <- function (fixed, random, family, data, correlation, weights,
gamm <- function(formula,random=NULL,correlation=NULL,family=gaussian(),data=list(),weights=NULL,
subset=NULL,na.action,knots=NULL,control=list(niterEM=0,optimMethod="L-BFGS-B"),
niterPQL=20,verbosePQL=TRUE,method="ML",...)
niterPQL=20,verbosePQL=TRUE,method="ML",drop.unused.levels=TRUE,...)
# Routine to fit a GAMM to some data. Fixed and smooth terms are defined in the formula, but the wiggly
# parts of the smooth terms are treated as random effects. The onesided formula random defines additional
# random terms. correlation describes the correlation structure. This routine is basically an interface
......@@ -1241,7 +1241,7 @@ gamm <- function(formula,random=NULL,correlation=NULL,family=gaussian(),data=lis
mf$correlation <- mf$random <- mf$family <- mf$control <- mf$scale <- mf$knots <- mf$sp <-
mf$min.sp <- mf$H <- mf$gamma <- mf$fit <- mf$niterPQL <- mf$verbosePQL <- mf$G <- mf$method <- mf$... <- NULL
}
mf$drop.unused.levels <- TRUE
mf$drop.unused.levels <- drop.unused.levels
mf[[1]] <- as.name("model.frame")
pmf <- mf
gmf <- eval(mf, parent.frame()) # the model frame now contains all the data, for the gam part only
......
......@@ -1739,7 +1739,7 @@ variable.summary <- function(pf,dl,n) {
gam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,na.action,offset=NULL,
method="GCV.Cp",optimizer=c("outer","newton"),control=list(),#gam.control(),
scale=0,select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,fit=TRUE,
paraPen=NULL,G=NULL,in.out=NULL,...) {
paraPen=NULL,G=NULL,in.out=NULL,drop.unused.levels=TRUE,...) {
## Routine to fit a GAM to some data. The model is stated in the formula, which is then
## interpreted to figure out which bits relate to smooth terms and which to parametric terms.
## Basic steps:
......@@ -1765,7 +1765,7 @@ gam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
mf$formula <- gp$fake.formula
mf$family <- mf$control<-mf$scale<-mf$knots<-mf$sp<-mf$min.sp<-mf$H<-mf$select <-
mf$gamma<-mf$method<-mf$fit<-mf$paraPen<-mf$G<-mf$optimizer <- mf$in.out <- mf$...<-NULL
mf$drop.unused.levels <- TRUE
mf$drop.unused.levels <- drop.unused.levels
mf[[1]] <- as.name("model.frame")
pmf <- mf
mf <- eval(mf, parent.frame()) # the model frame now contains all the data
......@@ -1926,7 +1926,7 @@ gam.control <- function (nthreads=1,irls.reg=0.0,epsilon = 1e-7, maxit = 200,
rank.tol=.Machine$double.eps^0.5,
nlm=list(),optim=list(),newton=list(),outerPIsteps=0,
idLinksBases=TRUE,scalePenalty=TRUE,
keepData=FALSE)
keepData=FALSE,scale.est="pearson")
# Control structure for a gam.
# irls.reg is the regularization parameter to use in the GAM fitting IRLS loop.
# epsilon is the tolerance to use in the IRLS MLE loop. maxit is the number
......@@ -1936,7 +1936,7 @@ gam.control <- function (nthreads=1,irls.reg=0.0,epsilon = 1e-7, maxit = 200,
# rank.tol is the tolerance to use for rank determination
# outerPIsteps is the number of performance iteration steps used to initialize
# outer iteration
{
{ scale.est <- match.arg(scale.est,c("robust","pearson","deviance"))
if (!is.numeric(nthreads) || nthreads <1) stop("nthreads must be a positive integer")
if (!is.numeric(irls.reg) || irls.reg <0.0) stop("IRLS regularizing parameter must be a non-negative number.")
if (!is.numeric(epsilon) || epsilon <= 0)
......@@ -1982,7 +1982,7 @@ gam.control <- function (nthreads=1,irls.reg=0.0,epsilon = 1e-7, maxit = 200,
rank.tol=rank.tol,nlm=nlm,
optim=optim,newton=newton,outerPIsteps=outerPIsteps,
idLinksBases=idLinksBases,scalePenalty=scalePenalty,
keepData=as.logical(keepData[1]))
keepData=as.logical(keepData[1]),scale.est=scale.est)
}
......@@ -2177,7 +2177,7 @@ gam.fit <- function (G, start = NULL, etastart = NULL,
G$X<-X[good,,drop=FALSE] # truncated design matrix
# must set G$sig2 to scale parameter or -1 here....
G$sig2<-scale
G$sig2 <- scale
if (sum(!is.finite(G$y))+sum(!is.finite(G$w))>0)
......@@ -2215,7 +2215,7 @@ gam.fit <- function (G, start = NULL, etastart = NULL,
eta <- linkfun(mu) # force eta/mu consistency even if linkinv truncates
dev <- sum(dev.resids(y, mu, weights))
if (control$trace)
gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv")
message(gettextf("Deviance = %s Iterations - %d", dev, iter, domain = "R-mgcv"))
boundary <- FALSE
if (!is.finite(dev)) {
if (is.null(coefold))
......@@ -2330,7 +2330,7 @@ model.matrix.gam <- function(object,...)
predict.gam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
block.size=1000,newdata.guaranteed=FALSE,na.action=na.pass,
block.size=NULL,newdata.guaranteed=FALSE,na.action=na.pass,
unconditional=FALSE,...) {
# This function is used for predicting from a GAM. 'object' is a gam object, newdata a dataframe to
......@@ -2442,7 +2442,8 @@ predict.gam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
if (!is.null(attr(newdata,"terms"))) nd.is.mf <- TRUE
response <- newdata[[yname]]
}
## now check the factor levels and split into blocks...
if (new.data.ok) {
......@@ -2465,7 +2466,29 @@ predict.gam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
if (is.null(dim(newdata[[1]]))) np <- length(newdata[[1]])
else np <- dim(newdata[[1]])[1]
nb <- length(object$coefficients)
if (is.null(block.size)) block.size <- 1000
if (block.size < 1) block.size <- np
# n.blocks <- np %/% block.size
# b.size <- rep(block.size,n.blocks)
# last.block <- np-sum(b.size)
# if (last.block>0) {
# n.blocks <- n.blocks+1
# b.size[n.blocks] <- last.block
# }
} else { # no new data, just use object$model
np <- nrow(object$model)
nb <- length(object$coefficients)
# n.blocks <- 1
# b.size <- array(np,1)
}
## split prediction into blocks, to avoid running out of memory
if (is.null(block.size)) {
## use one block as predicting using model frame
## and no block size supplied...
n.blocks <- 1
b.size <- array(np,1)
} else {
n.blocks <- np %/% block.size
b.size <- rep(block.size,n.blocks)
last.block <- np-sum(b.size)
......@@ -2473,13 +2496,9 @@ predict.gam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
n.blocks <- n.blocks+1
b.size[n.blocks] <- last.block
}
} else { # no new data, just use object$model
np <- nrow(object$model)
nb <- length(object$coefficients)
n.blocks <- 1
b.size <- array(np,1)
}
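## Worked illustration of the block split (hypothetical numbers, not package
## code): np=2500 with block.size=1000 gives n.blocks=2, b.size=c(1000,1000),
## last.block=500>0, hence n.blocks=3 and b.size=c(1000,1000,500).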
# setup prediction arrays...
n.smooth<-length(object$smooth)
......
......@@ -1568,7 +1568,7 @@ smooth.construct.ps.smooth.spec<-function(object,data,knots)
if (!is.null(k)) {
if (sum(colSums(object$X)==0)>0) warning("knot range is so wide that there is *no* information about some basis coefficients")
}
if (length(unique(x)) < object$bs.dim) warning("basis dimension is larger than number of unique covariates")
## now construct penalty
S<-diag(object$bs.dim);
if (m[2]) for (i in 1:m[2]) S <- diff(S)
......@@ -1729,7 +1729,7 @@ smooth.construct.fs.smooth.spec<-function(object,data,knots) {
object$te.ok <- 0
object$rank <- c(object$rank*nf,rep(nf,null.d))
}
object$side.constrain <- FALSE ## don't apply side constraints - these are really random effects
object$null.space.dim <- 0
object$C <- matrix(0,0,ncol(object$X)) # null constraint matrix
object$plot.me <- TRUE
......@@ -2002,7 +2002,7 @@ smooth.construct.re.smooth.spec <- function(object,data,knots)
class(object)<-"random.effect" # Give object a class
object
}
} ## smooth.construct.re.smooth.spec
......@@ -2175,12 +2175,17 @@ smooth.construct.mrf.smooth.spec <- function(object, data, knots) {
## natural parameterization given in Wood (2006) 4.1.14
if (object$bs.dim<length(levels(k))) { ## use low rank approx
rp <- nat.param(object$X,object$S[[1]],type=0)
mi <- which(colSums(object$X)==0) ## any regions missing observations?
np <- ncol(object$X)
if (length(mi)>0) { ## create dummy obs for missing...
object$X <- rbind(matrix(0,length(mi),np),object$X)
for (i in 1:length(mi)) object$X[i,mi[i]] <- 1
}
rp <- nat.param(object$X,object$S[[1]],type=0)
## now retain only bs.dim least penalized elements
## of basis, which are the final bs.dim cols of rp$X
ind <- (np-object$bs.dim+1):np
object$X <- rp$X[,ind] ## model matrix
object$X <- if (length(mi)) rp$X[-(1:length(mi)),ind] else rp$X[,ind] ## model matrix
object$P <- rp$P[,ind] ## re-para matrix
##ind <- ind[ind <= rp$rank] ## drop last element as zeros not returned in D
object$S[[1]] <- diag(c(rp$D[ind[ind <= rp$rank]],rep(0,sum(ind>rp$rank))))
......
......@@ -18,7 +18,7 @@ bam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action=na.omit, offset=NULL,method="fREML",control=list(),
scale=0,gamma=1,knots=NULL,sp=NULL,min.sp=NULL,paraPen=NULL,
chunk.size=10000,rho=0,AR.start=NULL,sparse=FALSE,cluster=NULL,gc.level=1,
use.chol=FALSE,samfrac=1,...)
use.chol=FALSE,samfrac=1,drop.unused.levels=TRUE,...)
}
%- maybe also `usage' for other objects documented here.
......@@ -130,6 +130,9 @@ and then finds its Choleski decomposition, at the end. This is somewhat more eff
\item{samfrac}{For generalized additive models with very large sample sizes, the number of iterations needed for the model fit can
be reduced by first fitting a model to a random sample of the data, and using the results to supply starting values. This initial fit is run with sloppy convergence tolerances, so is typically very low cost. \code{samfrac} is the sampling fraction to use. 0.1 is often reasonable. }
\item{drop.unused.levels}{by default unused levels are dropped from factors before fitting. For some smooths
involving factor variables you might want to turn this off. Only do so if you know what you are doing.}
\item{...}{further arguments for
passing on e.g. to \code{gam.fit} (such as \code{mustart}). }
......
......@@ -30,7 +30,7 @@ gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action,offset=NULL,method="GCV.Cp",
optimizer=c("outer","newton"),control=list(),scale=0,
select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
fit=TRUE,paraPen=NULL,G=NULL,in.out,...)
fit=TRUE,paraPen=NULL,G=NULL,in.out,drop.unused.levels=TRUE,...)
}
%- maybe also `usage' for other objects documented here.
......@@ -144,6 +144,9 @@ fixed s.p.s are not used); \code{scale} is the typical scale of the GCV/UBRE fun
for passing to the outer optimizer, or the initial value of the scale parameter, if this is to
be estimated by RE/ML.}
\item{drop.unused.levels}{by default unused levels are dropped from factors before fitting. For some smooths
involving factor variables you might want to turn this off. Only do so if you know what you are doing.}
\item{...}{further arguments for
passing on e.g. to \code{gam.fit} (such as \code{mustart}). }
......
......@@ -15,7 +15,7 @@ gam.control(nthreads=1,irls.reg=0.0,epsilon = 1e-07, maxit = 200,
rank.tol=.Machine$double.eps^0.5,
nlm=list(),optim=list(),newton=list(),
outerPIsteps=0,idLinksBases=TRUE,scalePenalty=TRUE,
keepData=FALSE)
keepData=FALSE,scale.est="pearson")
}
\arguments{
\item{nthreads}{Some parts of some smoothing parameter selection methods (e.g. REML) can use some
......@@ -75,6 +75,9 @@ if you are linking smoothing parameters but have set \code{idLinkBases} to \code
\item{keepData}{Should a copy of the original \code{data} argument be kept in the \code{gam}
object? Strict compatibility with class \code{glm} would keep it, but it wastes space to
do so. }
\item{scale.est}{How to estimate the scale parameter for exponential family models estimated
by outer iteration. See \code{\link{gam.scale}}.}
}
\details{
......
\name{gam.scale}
\alias{gam.scale}
%- Also NEED an `\alias' for EACH other topic documented here.
\title{Scale parameter estimation in GAMs}
\description{Scale parameter estimation in \code{\link{gam}} depends on the type of \code{family}. For
extended families the RE/ML estimate is used. For conventional exponential families, estimated by the default
outer iteration, the scale estimator can be controlled using argument \code{scale.est} in
\code{\link{gam.control}}. The options are \code{"robust"}, \code{"pearson"} (default) or \code{"deviance"}.
The Pearson estimator is the (weighted) sum of squares of the Pearson residuals, divided by the
effective residual degrees of freedom. The deviance estimator simply substitutes deviance residuals for Pearson residuals.
Usually the Pearson estimator is recommended for GLMs, since it is asymptotically unbiased. However, it can also be unstable at finite sample sizes, if a few Pearson residuals are very large. For example, a very low Poisson mean with a non-zero count can give a huge Pearson residual, even though the deviance residual is much more modest. The \code{"robust"} estimator in \code{\link{gam}} is an attempt to mitigate the bias of the deviance estimator and the instability of the Pearson estimator. It simply scales the deviance residuals by the median ratio of Pearson to deviance residuals, separately for positive and negative residuals (given the difference in skew of the two types), and is forced to lie between the Pearson and deviance estimators. All three options coincide for Gaussian data. A warning is generated if the Pearson scale estimate is more than twice the robust version. In this case you should consider changing to the robust estimator, especially if the Pearson residuals contain extreme outliers.
For performance iteration the Pearson estimator is always used.
\code{\link{gamm}} uses the estimate of the scale parameter from the underlying call to \code{lme}. \code{\link{bam}} uses the REML estimator if the method is \code{"fREML"}. Otherwise the estimator is a Pearson estimator.
}
\author{ Simon N. Wood \email{simon.wood@r-project.org} }
\seealso{ \code{\link{gam.control} } }
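\examples{
## illustrative sketch of the instability discussed above: a tiny Poisson
## mean with a count of one gives a huge Pearson residual, while the
## deviance residual stays modest...
y <- 1; mu <- 1e-4
(y-mu)/sqrt(mu)                          ## Pearson residual, about 100
sign(y-mu)*sqrt(2*(y*log(y/mu)-(y-mu)))  ## deviance residual, about 4
}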
\keyword{models} \keyword{smooth} \keyword{regression} %-- one or more ..
......@@ -33,7 +33,7 @@ To use this function effectively it helps to be quite familiar with the use of
gamm(formula,random=NULL,correlation=NULL,family=gaussian(),
data=list(),weights=NULL,subset=NULL,na.action,knots=NULL,
control=list(niterEM=0,optimMethod="L-BFGS-B"),
niterPQL=20,verbosePQL=TRUE,method="ML",...)
niterPQL=20,verbosePQL=TRUE,method="ML",drop.unused.levels=TRUE,...)
}
\arguments{
......@@ -102,6 +102,10 @@ version of R does not have the \code{nlminb} optimizer function.}
additive mixed model case when \code{lme} is called directly. Ignored in the
generalized case (or if the model has an offset), in which case \code{gammPQL} is used.}
\item{drop.unused.levels}{by default unused levels are dropped from factors before fitting. For some smooths
involving factor variables you might want to turn this off. Only do so if you know what you are doing.}
\item{...}{further arguments for passing on e.g. to \code{lme}}
}
%- maybe also `usage' for other objects documented here.
......
......@@ -14,7 +14,7 @@ for quantities derived from the model (e.g. derivatives of smooths), and for loo
\usage{
\method{predict}{gam}(object,newdata,type="link",se.fit=FALSE,terms=NULL,
block.size=1000,newdata.guaranteed=FALSE,na.action=na.pass,
block.size=NULL,newdata.guaranteed=FALSE,na.action=na.pass,
unconditional=FALSE,...)
}
%- maybe also `usage' for other objects documented here.
......@@ -52,7 +52,8 @@ will be returned.}
\item{block.size}{maximum number of predictions to process per call to underlying
code: larger is quicker, but more memory intensive. Set to < 1 to use total number
of predictions as this.}
of predictions as this. If \code{NULL} then the block size is 1000 if new data is
supplied, and the number of rows in the model frame otherwise; see the sketch below. }
\item{newdata.guaranteed}{Set to \code{TRUE} to turn off all checking of
\code{newdata} except for sanity of factor levels: this can speed things up
......
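A sketch of the resulting default behaviour (simulated data; model arbitrary):

library(mgcv)
dat <- gamSim(1, n=3000)
b <- gam(y ~ s(x0) + s(x1), data=dat)
fv1 <- predict(b)                             ## no newdata: one block over the model frame
fv2 <- predict(b, newdata=dat)                ## newdata: processed in blocks of 1000 rows
fv3 <- predict(b, newdata=dat, block.size=-1) ## block.size < 1: a single block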
......@@ -82,9 +82,10 @@ Note that smooths of this class have a built in plot method, and that the utilit
can be useful for working with discrete area data. The plot method has two schemes, \code{scheme==0} is colour,
\code{scheme==1} is grey scale.
The situation in which there are areas with no data requires special handling. Create one dummy observation for each
such area, but give the corresponding data frame row a weight of zero during fitting (using the \code{weight}
argument to \code{gam}). In this case the basis dimension, \code{k}, must be set to less than (or at most equal to) the number of areas with data.
The situation in which there are areas with no data requires special handling. You should set \code{drop.unused.levels=FALSE} in
the model fitting function, \code{\link{gam}}, \code{\link{bam}} or \code{\link{gamm}}, having first ensured that any fixed effect
factors do not contain unobserved levels. Also make sure that the basis dimension is small enough that the total number of
coefficients is less than the number of observations.
}
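For example, a sketch based on the columb data shipped with mgcv (the basis
dimension is an arbitrary illustration):

library(mgcv)
data(columb)       ## district-level data
data(columb.polys) ## district polygons
xt <- list(polys=columb.polys)
## retain all district levels, even any without observations:
b <- gam(crime ~ s(district, bs="mrf", xt=xt, k=20), data=columb,
         method="REML", drop.unused.levels=FALSE)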
......
......@@ -56,6 +56,10 @@ may therefore be relatively inefficient for models with large numbers of random
or \code{\link{gamm}} may be better alternatives. Note also that \code{\link{gam}} will not support
models with more coefficients than data.
The situation in which factor variable random effects intentionally have unobserved levels requires special handling.
You should set \code{drop.unused.levels=FALSE} in the model fitting function, \code{\link{gam}}, \code{\link{bam}}
or \code{\link{gamm}}, having first ensured that any fixed effect factors do not contain unobserved levels.
}
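For example, a sketch with simulated data, where the factor deliberately
carries unobserved levels:

library(mgcv)
set.seed(2)
dat <- gamSim(1, n=200)
## 20 nominal levels, only 10 of them observed
dat$fac <- factor(sample(1:10, 200, replace=TRUE), levels=1:20)
b <- gam(y ~ s(x0) + s(fac, bs="re"), data=dat, drop.unused.levels=FALSE)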
\references{
......
......@@ -2,7 +2,7 @@ msgid ""
msgstr ""
"Project-Id-Version: R 3.1.0\n"
"Report-Msgid-Bugs-To: bugs.r-project.org\n"
"POT-Creation-Date: 2014-06-22 22:21\n"
"POT-Creation-Date: 2014-07-21 09:50\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
......@@ -11,9 +11,6 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
msgid "Choleski based method failed, switch to QR"
msgstr ""
msgid "'family' argument seems not to be a valid family object"
msgstr ""
......@@ -197,6 +194,9 @@ msgstr ""
msgid "Algorithm stopped at boundary value"
msgstr ""
msgid "Pearson scale estimate maybe unstable. See ?gam.scale."
msgstr ""
msgid "deriv should be 1 or 2"
msgstr ""
......@@ -284,7 +284,7 @@ msgstr ""
msgid "step failed: max abs grad ="
msgstr ""
msgid "iteration limit reached: max abs grad ="
msgid "iteration limit reached: max abs grad = %g"
msgstr ""
msgid "gaulss requires 2 links specified as character strings"
......@@ -605,9 +605,6 @@ msgstr ""
msgid "nothing to do for this model"
msgstr ""
msgid "residuals not available"
msgstr ""
msgid "Pearson residuals not available for this family - returning deviance residuals"
msgstr ""
......@@ -875,6 +872,9 @@ msgstr ""
msgid "penalty order too high for basis dimension"
msgstr ""
msgid "basis dimension is larger than number of unique covariates"
msgstr ""
msgid "fs smooths can only have one factor argument"
msgstr ""