Commit 464de942 authored by Dirk Eddelbuettel

Import Upstream version 1.8-8

parent ee00362c
** denotes quite substantial/important changes
*** denotes really big changes
1.8-8
** New "gp" smooth class (see ?gp.smooth) implemeting the Matern
covariance based Gaussian process model of Kamman and Wand (2003), and a
variety of other simple GP smoothers.
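  A minimal usage sketch (illustrative only; the 'm' coding of the correlation
  function and range parameter is an assumption here - see ?gp.smooth for the
  exact specification):
    library(mgcv)
    set.seed(1)
    dat <- gamSim(1, n = 200, scale = 2)  ## simulated test data
    ## Matern-based GP smooth of x0; m[1] assumed to select the correlation
    ## function and m[2] the range parameter
    b <- gam(y ~ s(x0, bs = "gp", m = c(3, 0.25)) + s(x1, bs = "gp"), data = dat)
    plot(b, pages = 1)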
* some smooth plot methods now accept 'colors' and 'contour.col' arguments to
set the color palette in image plots and the contour line colors.
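  A sketch of the new arguments (assuming scheme = 2 gives the image + contour
  style plot of a 2-D smooth, and that these arguments are passed through to
  the image/contour calls as described above):
    library(mgcv)
    d2 <- gamSim(2, n = 300, scale = 0.1)$data  ## bivariate example data
    b2 <- gam(y ~ s(x, z), data = d2)
    plot(b2, scheme = 2, colors = heat.colors(50), contour.col = "blue")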
* predict.gam and predict.bam now accept an 'exclude' argument allowing
terms (e.g. random effects) to be zeroed for prediction. For efficiency,
smooth terms not listed in 'terms', or listed in 'exclude', are no longer
evaluated, and are instead set to zero or not returned. See ?predict.gam.
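  A sketch of excluding a random effect at prediction time (variable names are
  illustrative; the term label follows the usual "s(id)" naming):
    library(mgcv)
    dat <- gamSim(1, n = 200, scale = 2)
    dat$id <- factor(sample(1:10, nrow(dat), replace = TRUE))
    br <- gam(y ~ s(x0) + s(id, bs = "re"), data = dat)
    ## population-level prediction: the s(id) random effect is zeroed
    pr <- predict(br, newdata = dat, exclude = "s(id)")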
* ocat saturated likelihood definition changed to zero, leading to better
comparability of deviance between model fits (thanks to Herwig Friedl).
* null.deviance calculation for extended families modified to make more sense
when `mu' is the mean of a latent variable, rather than the response itself.
* bam now returns standardized residuals 'std.rsd' if `rho!=0'.
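  For example (a sketch; with the default gaussian family the AR(1) working
  model is applied at the supplied rho):
    library(mgcv)
    dat <- gamSim(1, n = 500, scale = 2)
    bf <- bam(y ~ s(x0) + s(x1), data = dat, rho = 0.6)
    head(bf$std.rsd)  ## residuals standardized by the AR(1) working correlation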
* bam(...,discrete=TRUE) can now handle 'fs' terms.
* bam(...,discrete=TRUE) now accepts 'by' variables. Thanks to Zheyuan Li
for help with debugging this.
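  A sketch combining the two items above ('fs' terms and 'by' variables under
  discrete = TRUE); the gamSim(4) factor-by example data are assumed suitable:
    library(mgcv)
    dat <- gamSim(4, n = 400, scale = 2)  ## includes a 3-level factor 'fac'
    bd <- bam(y ~ fac + s(x0, by = fac) + s(x1, fac, bs = "fs"),
              data = dat, discrete = TRUE)
    summary(bd)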
* bam now works with drop.unused.levels == TRUE when random effects should
have more levels than are present in the data. (Thanks Alec Leh)
* bam chunk.size logic error fix - an error could be triggered if chunk.size
was automatically reset to be larger than the data size.
* uniquecombs can now accept a data frame with some or all factor columns,
as well as purely numeric matrices.
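  A minimal sketch:
    library(mgcv)
    df <- data.frame(x = c(1, 1, 2, 2), f = factor(c("a", "a", "b", "a")))
    xu <- uniquecombs(df)  ## unique rows of a mixed numeric/factor data frame
    attr(xu, "index")      ## maps each original row to its row in 'xu'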
* discrete.mf modified to avoid discretizing a covariate more than once,
and to halt if a model requires the same covariate to be discretized
two different ways (e.g. s(x) + s(x,z)). This affects only
bam(...,discrete=TRUE).
* Some changes to ziP and ziplss families to improve numerical robustness,
and to the ziP help file to suggest appropriate checking. Thanks to Keren
Raiter for reporting problems.
* numerical robustness of extended gam methods (gam.fit4) improved for cases
with many zero or near-zero iterative weights. Handling of zero weights
modified to avoid division by (near) zero. Also tests for poor
scaling of sqrt(abs(w))*z and substitutes computations based on w*z if
detected. The 'newton' routine now also step-halves if the REML score is not finite.
* Sl.setup (used by bam) modified to allow more efficient handling of terms
with multiple diagonal penalties that have no non-zero elements in common, but
possibly have non-zero elements `interleaved' between penalties.
1.8-7
** 'gam' default scale parameter changed to modified Pearson estimator
......
Package: mgcv
Version: 1.8-7
Version: 1.8-8
Author: Simon Wood <simon.wood@r-project.org>
Maintainer: Simon Wood <simon.wood@r-project.org>
Title: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness
......@@ -16,6 +16,6 @@ LazyLoad: yes
ByteCompile: yes
License: GPL (>= 2)
NeedsCompilation: yes
Packaged: 2015-07-22 17:38:22 UTC; sw283
Packaged: 2015-10-24 06:15:25 UTC; sw283
Repository: CRAN
Date/Publication: 2015-07-23 07:05:08
Date/Publication: 2015-10-24 09:17:33
f876a44fccfab3bcf4e8c6a8916f480f *ChangeLog
68b8a0864224d96ff69d755507862f24 *DESCRIPTION
bcb7f62d1e1297b1d685a095b00220aa *ChangeLog
f133d9c24128ba8a977d1674d59f2a23 *DESCRIPTION
eb723b61539feef013de476e68b5c50a *GPL-2
053c3c177572e1570e63329445eacc14 *NAMESPACE
e99642616897453fa1cab582bc73c84f *R/bam.r
b1b86938209b64b673e096da331769eb *NAMESPACE
9cf2059cc5ab5fd2f8a9f4439fec53e5 *R/bam.r
52ea7081e235df890d3e51198bcc3092 *R/coxph.r
209a7a1d10cec589213ffed9efc1d5cc *R/efam.r
9131483dd63ef2b80e8159601cc03922 *R/fast-REML.r
88dbdd15671f0a924bdd9f5f42a89a34 *R/gam.fit3.r
a6208b52c8af1f8779dd0b4b482c3fa7 *R/gam.fit4.r
09790e96aac078861d04d160d852c6d7 *R/efam.r
5a542999fd494e3872c877c6e1dc2706 *R/fast-REML.r
57f8b12b97304227870a4e43a33b5d75 *R/gam.fit3.r
472d6e64f3f403aad4f6550f921a4bba *R/gam.fit4.r
e63276b78a4ff2566af2e651bfc6aa9c *R/gam.sim.r
633c048e36083646c7d409e99cf7b037 *R/gamlss.r
29f3e7f145dabb21b25b6942066fab3a *R/gamlss.r
ff163969e9ad7a38857907bf38a39ec0 *R/gamm.r
3b0d5cac4a59ef1a8cb325b249699476 *R/jagam.r
d5ce48c9ea98c2f428cfb00bcc9fb5ea *R/mgcv.r
479a34c52f6ec8a04e7c66cfcf4bdf58 *R/mgcv.r
59a92561754c8156506af4004f594ac9 *R/misc.r
66c24aed2f8fc9f6bce321794f8aff87 *R/mvam.r
f031621e351c06b12c328e4033e2f097 *R/plots.r
966c267722f1027c53aa5621614333ae *R/smooth.r
68348617c5d3b0e2ea805c4360a8cdd4 *R/soap.r
76192b5b36d8828af6e50013823732b4 *R/plots.r
6069c2b79e5678d8c6ff35b65c6d180c *R/smooth.r
9d47fd13872418b318323699de89dfb4 *R/soap.r
76cc875719bf0ef9eab45ea5bfeccda6 *R/sparse.r
e468195a83fab90da8e760c2c3884bd3 *data/columb.polys.rda
40874e3ced720a596750f499ded8a60a *data/columb.rda
......@@ -39,7 +39,7 @@ e9b0a2e31b130cf2cb38618ec50d1919 *man/Predict.matrix.soap.film.Rd
f12468625253dd3603907de233762fd6 *man/Rrank.Rd
f6cadda5999c35800fd65c08d6812f7b *man/Tweedie.Rd
80f8763baa4987579e2aa56073a9e94e *man/anova.gam.Rd
e49e38bc1741829ae997f8b0b4fcd832 *man/bam.Rd
b57775363a39cd704e37f3b3d06962f6 *man/bam.Rd
71bde8b8caa24a36529ce7e0ac3165d8 *man/bam.update.Rd
a2beb811b1093c5e82ef32d7de1f7d32 *man/cSplineDes.Rd
9b4d616d1b6c4a46ca77d16cded3f806 *man/choose.k.Rd
......@@ -86,13 +86,13 @@ b440c31b77c368b484d9665b2b3e89fb *man/jagam.Rd
19a3f7da03e76d7d2fc0a4e544564c72 *man/magic.Rd
5169af4be5fccf9fa79b6de08e9ea035 *man/magic.post.proc.Rd
59672d8599211ce9cf44252f92110920 *man/mgcv-FAQ.Rd
01926b8c2bda6fc74b51f6ec7f5cc620 *man/mgcv-package.Rd
92499f4c6bc65a0cffd31bd8bc993a9f *man/mgcv-parallel.Rd
38c75ee7bb718c277da5bc649e6886e7 *man/mgcv-package.Rd
9742c9bd1d42aaebe27e2b4218d12672 *man/mgcv-parallel.Rd
00ccf213c31910cd14f1df65a300eb33 *man/model.matrix.gam.Rd
bc9b89db7e7ff246749551c16f5f1f07 *man/mono.con.Rd
d33914a328f645af13f5a42914ca0f35 *man/mroot.Rd
d0c2b73407680aac50a17753346d814f *man/mvn.Rd
9e95f6b9eec476188de830beb9ff3e0b *man/negbin.Rd
61e6a0a7fa986b4b6278c2d3c44c9c71 *man/negbin.Rd
8b766a6ad848b0f1ca469e381ded0169 *man/new.name.Rd
7d8f62e182f7c428cc9d46ddd4d97d43 *man/notExp.Rd
445571cf0b115bf19a2d3474e70f59e7 *man/notExp2.Rd
......@@ -103,10 +103,10 @@ c64b44bc684cbadfe77e7f6dc833ddb4 *man/pcls.Rd
8c0f8575b427f30316b639a326193aeb *man/pdTens.Rd
1721f1b266d9e14827e8226e2cb74a81 *man/pen.edf.Rd
edf57071572275a8443b2f0b66d44424 *man/place.knots.Rd
04a9af3e294bd14ad590aacb6d26332d *man/plot.gam.Rd
b903ebcf31703db156e033fdfa527d73 *man/plot.gam.Rd
c27a6b886929b1dc83bf4b90cae848f9 *man/polys.plot.Rd
085dff270e5255315cb244368faf79df *man/predict.bam.Rd
900647986ab1365d437368cc28b62658 *man/predict.gam.Rd
fe39dcd63a4a01bdac0604389d59bb41 *man/predict.bam.Rd
3d811dd86252174c0108e2b1b75aaf9a *man/predict.gam.Rd
7562cb5ce9a7fbf0cd60124cf4305315 *man/print.gam.Rd
b8ac3ed8fc05bb14c605b996404853cd *man/qq.gam.Rd
f77ca1471881d2f93c74864d076c0a0e *man/rTweedie.Rd
......@@ -116,40 +116,41 @@ f6f1333be2587ffef5970905b13468ea *man/rig.Rd
9f6f46f5c5da080bc82f9aa4685d364a *man/rmvn.Rd
7258dfc0149fff020054117fd2ee6bd8 *man/s.Rd
c438eb3e41cb1f51e72283380145d210 *man/scat.Rd
2199afd400a1821b74a24e2578bc0224 *man/single.index.Rd
9641bb5e2573b5fcbdc303fcf160f3e6 *man/single.index.Rd
6f03e337d54221bc167d531e25af1eea *man/slanczos.Rd
b5a06918956fd10f23285b453c05bdb4 *man/smooth.construct.Rd
2318b61ce2e4770d880abfeaba194577 *man/smooth.construct.Rd
4a689eba97e4fed138dccb8cad13205e *man/smooth.construct.ad.smooth.spec.Rd
76013feaf70d00976bba0154b6f2c946 *man/smooth.construct.cr.smooth.spec.Rd
f5e6d0f5122f61c336827b3615482157 *man/smooth.construct.ds.smooth.spec.Rd
db75c958cbfb561914a3291ab58b9786 *man/smooth.construct.fs.smooth.spec.Rd
3a0545da6f80f00e0b7347f8083a10b3 *man/smooth.construct.gp.smooth.spec.Rd
4aaa84b520992fbc32b0c37f7f63c1dd *man/smooth.construct.mrf.smooth.spec.Rd
abe15377f471a2d8957a59c19eeef0bb *man/smooth.construct.ps.smooth.spec.Rd
6aaff8575f68775ed21930893ea9e03d *man/smooth.construct.re.smooth.spec.Rd
a8d5eb4772641567fee15714f01a4727 *man/smooth.construct.so.smooth.spec.Rd
9c35c35b9583281eaa55b97c9b9fe770 *man/smooth.construct.sos.smooth.spec.Rd
0bfe981f2c3e6ea5b8d5372076ccde53 *man/smooth.construct.sos.smooth.spec.Rd
3cb4e59f915c8d64b90754eaeeb5a86f *man/smooth.construct.t2.smooth.spec.Rd
8672633a1fad8df3cb1f53d7fa883620 *man/smooth.construct.tensor.smooth.spec.Rd
3e6d88ef6a8ab21bd6f120120602dcf6 *man/smooth.construct.tp.smooth.spec.Rd
75a042d74460dd0c6e0c0a95c1f5d3f7 *man/smooth.terms.Rd
8120512d52c741f8c27c2689f69b63d3 *man/smoothCon.Rd
d4083ff900aa69fa07610e0af2a2987b *man/smooth.terms.Rd
8fd3089f8fd1379736ae93a67f4febd6 *man/smoothCon.Rd
b55a396da77559dac553613146633f97 *man/sp.vcov.Rd
83bd8e097711bf5bd0fff09822743d43 *man/spasm.construct.Rd
b9394812e5398ec95787c65c1325a027 *man/step.gam.Rd
f0791d830687d6155efb8a73db787401 *man/summary.gam.Rd
47d22ea4fe93bc0fa0f47bd262979941 *man/t2.Rd
fd71e0a218ad840e6ce4b873f9fa083e *man/t2.Rd
63e2050f9a01b12096d640e87916b78a *man/te.Rd
6eebb6ef90374ee09453d6da6449ed79 *man/tensor.prod.model.matrix.Rd
71add05e5823dcc0509f435ab4dc4627 *man/uniquecombs.Rd
42542181aa314eed22f05907a8546735 *man/uniquecombs.Rd
a16b3a5a4d13c705dcab8d1cd1b3347e *man/vcov.gam.Rd
281e73658c726997196727a99a4a1f9e *man/vis.gam.Rd
2c5f6815e609f2cdb56b0067a183f915 *man/ziP.Rd
1e514afeea034edfbd6ba563561a71de *man/ziP.Rd
ae8388103d8b1be39f55f426b205b576 *man/ziplss.Rd
7bd0744ad8ea562d7a624e066ef3390c *po/R-de.po
0bdfcf98961b0d52b60f806dc1dba77e *po/R-en@quot.po
4e65e93fef4d034a399f90421e8f323a *po/R-fr.po
73cdaf7a5a69f0b7cbfe411cd0c468b6 *po/R-ko.po
7eb472ce4c2d1dc30b4dd1091c0e88df *po/R-mgcv.pot
be84cc1bdb81bb411322266c34a0bf1d *po/R-mgcv.pot
7b07899266c3acf3d2a625850d7cd6ef *po/R-pl.po
382c94188dbc193fca9628287b66d1af *po/de.po
93f72334356fe6f05a64e567efd35c8e *po/en@quot.po
......@@ -160,15 +161,15 @@ dfd4eec9edc7d1ab6354d47b6e2bd42f *po/pl.po
03972284b3400cf82cacd5d2dc4b8cb3 *src/Makevars
e71c1a1624b431fbab0a4c8f151d2a97 *src/coxph.c
48f0c9de0da60d48cf4b306c2fdd039a *src/discrete.c
e4236680abfb9c89c7bf09b441c755c2 *src/gdi.c
5a8e93ae5f4fae722c1304cecc9c94b3 *src/gdi.c
2436f9b328e80370ce2203dbf1dd813c *src/general.h
890105488631271ad8b184aa0524b59f *src/init.c
d95aa7d646833d1e56f2414f32637bb3 *src/init.c
9a5d7cb3cf93cbdfc08353fbd20a270e *src/magic.c
55b92c3ab6a037f9742542c2af8cad56 *src/mat.c
53cde45313485d2d15e6377461dd3f46 *src/mat.c
aae0f298e384952bb8b6b923d40520a8 *src/matrix.c
6b781cbd5b9cfee68ad30bb7ce31ef3a *src/matrix.h
ec2ae157a9e3bedcccc88b053d0a4e0c *src/mgcv.c
39b22a8fa2128fd6b52b3601841b0393 *src/mgcv.h
d437a2cf1548e79c8dc910657fb1cabb *src/mgcv.h
6b90633f745d3b3314ec5a784bde99d0 *src/misc.c
08c1706ffeec4277c484435a0644b0e3 *src/mvn.c
cbe54250deb38aa5f88f8b669a4468cd *src/qp.c
......
......@@ -37,6 +37,7 @@ export(anova.gam, bam, bam.update, betar, cox.ph,concurvity, cSplineDes,
Predict.matrix.pspline.smooth,
Predict.matrix.random.effect,
Predict.matrix.t2.smooth,
Predict.matrix.gp.smooth,
qq.gam,
residuals.gam,rig,rTweedie,rmvn,
Rrank,s,scat,
......@@ -54,6 +55,7 @@ export(anova.gam, bam, bam.update, betar, cox.ph,concurvity, cSplineDes,
smooth.construct.ps.smooth.spec,
smooth.construct.re.smooth.spec,
smooth.construct.mrf.smooth.spec,
smooth.construct.gp.smooth.spec,
smooth.construct.sos.smooth.spec,
smooth.construct.so.smooth.spec,
smooth.construct.sf.smooth.spec,
......@@ -136,6 +138,7 @@ S3method(smooth.construct, cs.smooth.spec)
S3method(smooth.construct,ds.smooth.spec)
S3method(smooth.construct, fs.smooth.spec)
S3method(smooth.construct, mrf.smooth.spec)
S3method(smooth.construct, gp.smooth.spec)
S3method(smooth.construct,ps.smooth.spec)
S3method(smooth.construct, re.smooth.spec)
S3method(smooth.construct,so.smooth.spec)
......@@ -164,6 +167,7 @@ S3method(Predict.matrix,soap.film)
S3method(Predict.matrix,sf)
S3method(Predict.matrix,sw)
S3method(Predict.matrix,sos.smooth)
S3method(Predict.matrix,gp.smooth)
S3method(spasm.construct,cus)
......
## routines for very large dataset generalized additive modelling.
# routines for very large dataset generalized additive modelling.
## (c) Simon N. Wood 2009-2015
......@@ -122,7 +122,8 @@ compress.df <- function(dat,m=NULL) {
} else {
mm <- mm * m
}
xu <- uniquecombs(as.matrix(dat))
#xu <- uniquecombs(as.matrix(dat))
xu <- uniquecombs(dat)
if (nrow(xu)>mm*mf) { ## too many unique rows to use only unique
for (i in 1:d) if (!is.factor(dat[,i])) { ## round the metric variables
xl <- range(dat[,i])
......@@ -131,7 +132,8 @@ compress.df <- function(dat,m=NULL) {
kx <- round((dat[,i]-xl[1])/dx)+1
dat[,i] <- xu[kx] ## rounding the metric variables
}
xu <- uniquecombs(as.matrix(dat))
#xu <- uniquecombs(as.matrix(dat))
xu <- uniquecombs(dat)
}
k <- attr(xu,"index")
## shuffle rows in order to avoid induced dependencies between discretized
......@@ -153,17 +155,32 @@ compress.df <- function(dat,m=NULL) {
xu[ii,] <- xu ## shuffle rows of xu
k <- ii[k] ## correct k index accordingly
## ... finished shuffle
xu <- as.data.frame(xu)
#xu <- as.data.frame(xu)
## set names and levels to match dat...
names(xu) <- names(dat)
for (i in 1:d) if (is.factor(dat[,i])) {
xu[,i] <- as.factor(xu[,i])
levels(xu[,i]) <- levels(dat[,i])
}
#names(xu) <- names(dat)
#for (i in 1:d) if (is.factor(dat[,i])) {
# xu[,i] <- as.factor(xu[,i])
# levels(xu[,i]) <- levels(dat[,i])
#}
k -> attr(xu,"index")
xu
} ## compress.df
check.term <- function(term,rec) {
## utility function for discrete.mf. Checks whether variables in "term"
## have already been discretized, and if so whether this discretization
## can be re-used for the current "term". Stops if term already discretized
## but we can't re-use discretization. Otherwise returns the index of the
## existing k index vector, or 0 if the term is not in the existing list.
ii <- which(rec$vnames%in%term)
if (length(ii)) { ## at least one variable already discretized
if (length(term)==rec$d[min(ii)]) { ## dimensions match previous discretization
if (sum(!(term%in%rec$vnames[ii]))) stop("bam can not discretize with this nesting structure")
else return(rec$ki[min(ii)]) ## all names match previous - return index of previous
} else stop("bam can not discretize with this nesting structure")
} else return(0) ## no match
} ## check.term
discrete.mf <- function(gp,mf,pmf,m=NULL) {
## discretize the covariates for the terms specified in smooth.spec
## id and factor by not allowed. pmf is a model frame for just the
......@@ -171,36 +188,80 @@ discrete.mf <- function(gp,mf,pmf,m=NULL) {
## NOTE: matrix arguments not allowed but not caught yet.
mf0 <- list()
nk <- 0 ## count number of index vectors to avoid too much use of cbind
for (i in 1:length(gp$smooth.spec)) nk <- nk +
for (i in 1:length(gp$smooth.spec)) nk <- nk + as.numeric(gp$smooth.spec[[i]]$by!="NA") +
if (inherits(gp$smooth.spec[[i]],"tensor.smooth.spec")) length(gp$smooth.spec[[i]]$margin) else 1
k <- matrix(0,nrow(mf),nk)
ik <- 0 ## index counter
nr <- rep(0,nk) ## number of rows for term
## structure to record terms already processed...
rec <- list(vnames = rep("",0), ## variable names
ki = rep(0,0), ## index of original index vector var relates to
d = rep(0,0)) ## dimension of terms involving this var
## loop through the terms discretizing the covariates...
for (i in 1:length(gp$smooth.spec)) {
mi <- if (is.null(m)||length(m)==1) m else m[i]
if (!is.null(gp$smooth.spec[[i]]$id)) stop("discretization can not handle smooth ids")
## deal with any by variable (should always go first as basically a 1D marginal)...
if (gp$smooth.spec[[i]]$by!="NA") {
##stop("currently discrete methods do not handle by variables")
termi <- gp$smooth.spec[[i]]$by ## var name
##if (is.factor(mf[termi])) stop("discretization can not handle factor by variables")
ik.prev <- check.term(termi,rec) ## term already discretized?
ik <- ik + 1 ## increment index counter
if (ik.prev==0) { ## new discretization required
mfd <- compress.df(mf[termi],m=mi)
k[,ik] <- attr(mfd,"index")
nr[ik] <- nrow(mfd)
mf0 <- c(mf0,mfd)
## record variable discretization info...
rec$vnames <- c(rec$vnames,termi)
rec$ki <- c(rec$ki,ik)
rec$d <- c(rec$d,1)
} else { ## re-use an earlier discretization...
k[,ik] <- k[,ik.prev]
nr[ik] <- nr[ik.prev]
}
##mf0[[gp$smooth.spec[[i]]$by]] <- rep(1,nr[ik]) ## actual by variables are handled by discrete methods
}
if (inherits(gp$smooth.spec[[i]],"tensor.smooth.spec")) { ## tensor branch
for (j in 1:length(gp$smooth.spec[[i]]$margin)) { ## loop through margins
mfd <- compress.df(mf[gp$smooth.spec[[i]]$margin[[j]]$term],m=mi)
termj <- gp$smooth.spec[[i]]$margin[[j]]$term
ik.prev <- check.term(termj,rec)
ik <- ik + 1
if (ik.prev==0) { ## new discretization required
mfd <- compress.df(mf[termj],m=mi)
k[,ik] <- attr(mfd,"index")
nr[ik] <- nrow(mfd)
mf0 <- c(mf0,mfd)
## record details...
d <- length(termj)
rec$vnames <- c(rec$vnames,termj)
rec$ki <- c(rec$ki,rep(ik,d))
rec$d <- c(rec$d,rep(d,d))
} else { ## re-use an earlier discretization...
k[,ik] <- k[,ik.prev]
nr[ik] <- nr[ik.prev]
}
}
} else { ## not te or ti...
termi <- gp$smooth.spec[[i]]$term
ik.prev <- check.term(termi,rec)
ik <- ik + 1 ## index counter
if (ik.prev==0) { ## new discretization required
mfd <- compress.df(mf[termi],m=mi)
k[,ik] <- attr(mfd,"index")
nr[ik] <- nrow(mfd)
mf0 <- c(mf0,mfd)
mf0 <- c(mf0,mfd)
d <- length(termi)
rec$vnames <- c(rec$vnames,termi)
rec$ki <- c(rec$ki,rep(ik,d))
rec$d <- c(rec$d,rep(d,d))
} else { ## re-use an earlier discretization...
k[,ik] <- k[,ik.prev]
nr[ik] <- nr[ik.prev]
}
} else { ## not te or ti...
mfd <- compress.df(mf[gp$smooth.spec[[i]]$term],m=mi)
ik <- ik + 1
k[,ik] <- attr(mfd,"index")
nr[ik] <- nrow(mfd)
mf0 <- c(mf0,mfd)
}
## deal with any by variable...
if (gp$smooth.spec[[i]]$by!="NA") {
stop("currently discrete methods do not handle by variables")
if (is.factor(mf[gp$smooth.spec[[i]]$by])) stop("discretization can not handle factor by variables")
if (!is.null(gp$smooth.spec[[i]]$id)) stop("discretization can not handle smooth ids")
mf0[[gp$smooth.spec[[i]]$by]] <- rep(1,nr[ik]) ## actual by variables are handled by discrete methods
}
} ## main term loop
## pad mf0 so that all rows are the same length
## padding is necessary if gam.setup is to be used for setup
......@@ -223,6 +284,30 @@ discrete.mf <- function(gp,mf,pmf,m=NULL) {
mf <- mf[1:maxr,]
for (na in names(mf0)) mf[[na]] <- mf0[[na]]
## finally one more pass through, expanding k and nr to deal with replication that
## will occur with factor by variables...
ik <- ncol(k)+1 ## starting index col for this term in k
for (i in length(gp$smooth.spec):1) { ## work down through terms so insertion painless
if (inherits(gp$smooth.spec[[i]],"tensor.smooth.spec")) nd <-
length(gp$smooth.spec[[i]]$margin) else nd <- 1 ## number of indices
ik <- ik - nd ## starting index if no by
if (gp$smooth.spec[[i]]$by!="NA") {
ik <- ik - 1 ## first index
nd <- nd + 1 ## number of indices
byvar <- mf[[gp$smooth.spec[[i]]$by]]
if (is.factor(byvar)) { ## then need to expand nr and index matrix
nex <- length(levels(byvar)) ## number of copies of term indices
if (is.ordered(byvar)) nex <- nex - 1 ## first level dropped
if (nex>0) { ## insert index copies
ii0 <- if (ik>1) 1:(ik-1) else rep(0,0) ## earlier
ii1 <- if (ik+nd-1 < length(nr)) (ik+nd):length(nr) else rep(0,0) ## later
ii <- ik:(ik+nd-1) ## cols for this term
nr <- c(nr[ii0],rep(nr[ii],nex),nr[ii1])
k <- cbind(k[,ii0],k[,rep(ii,nex)],k[,ii1])
}
} ## factor by
} ## existing by
} ## smooth.spec loop
list(mf=mf,k=k,nr=nr)
} ## discrete.mf
......@@ -269,6 +354,7 @@ mini.mf <-function(mf,chunk.size) {
} else if (is.factor(mf[[j]])) { ## factor variable...
## randomly sample one row from each factor level...
find <- apply(X=as.matrix(levels(mf[[j]])),MARGIN=1,FUN=fun,fac=mf[[j]],ind=ind)
find <- find[is.finite(find)] ## in case drop.unused.levels==FALSE, so that there are levels without rows
nf <- length(find)
mf0[(k+1):(k+nf),] <- mf[find,]
k <- k + nf
......@@ -577,6 +663,10 @@ bgam.fit <- function (G, mf, chunk.size, gp ,scale ,gamma,method, coef=NULL,etas
if (!is.null(cl)&&inherits(cl,"cluster")) {
n.threads <- length(cl)
while(nobs/n.threads < ncol(G$X)) n.threads <- n.threads - 1
if (n.threads < length(cl)) {
warning("Too many cluster nodes to use all efficiently")
}
} else n.threads <- 1
if (n.threads>1) { ## set up thread argument lists
......@@ -1066,7 +1156,7 @@ pabapr <- function(arg) {
na.action=arg$na.action)
}
predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,exclude=NULL,
block.size=50000,newdata.guaranteed=FALSE,na.action=na.pass,
cluster=NULL,...) {
## function for prediction from a bam object, possibly in parallel
......@@ -1084,11 +1174,11 @@ predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
if (n < 100*n.threads) n.threads <- 1 ## not worth the overheads
if (n.threads==1) { ## single threaded call
if (missing(newdata)) return(
predict.gam(object,newdata=object$model,type=type,se.fit=se.fit,terms=terms,
predict.gam(object,newdata=object$model,type=type,se.fit=se.fit,terms=terms,exclude=exclude,
block.size=block.size,newdata.guaranteed=newdata.guaranteed,
na.action=na.action,...)
) else return(
predict.gam(object,newdata=newdata,type=type,se.fit=se.fit,terms=terms,
predict.gam(object,newdata=newdata,type=type,se.fit=se.fit,terms=terms,exclude=exclude,
block.size=block.size,newdata.guaranteed=newdata.guaranteed,
na.action=na.action,...))
} else { ## parallel call...
......@@ -1099,7 +1189,7 @@ predict.bam <- function(object,newdata,type="link",se.fit=FALSE,terms=NULL,
for (i in 1:n.threads) {
n0 <- n1+1;n1 <- n1+nt[i]
ind <- n0:n1 ## this thread's data block from mf
arg[[i]] <- list(object=object,type=type,se.fit=se.fit,terms=terms,
arg[[i]] <- list(object=object,type=type,se.fit=se.fit,terms=terms,exclude=exclude,
block.size=block.size,newdata.guaranteed=newdata.guaranteed,
na.action=na.action)
arg[[i]]$object$model <- object$model[1:2,] ## save space
......@@ -1159,8 +1249,11 @@ bam.fit <- function(G,mf,chunk.size,gp,scale,gamma,method,rho=0,
if (n>chunk.size) { ## then use QR accumulation approach
if (!is.null(cl)&&inherits(cl,"cluster")) {
## require(parallel)
n.threads <- length(cl)
while(n/n.threads < ncol(G$X)) n.threads <- n.threads - 1
if (n.threads < length(cl)) {
warning("Too many cluster nodes to use all efficiently")
}
} else n.threads <- 1
G$coefficients <- rep(0,ncol(G$X))
......@@ -1472,12 +1565,32 @@ tero <- function(sm) {
sm
} ## tero
AR.resid <- function(rsd,rho=0,AR.start=NULL) {
## standardised residuals for AR1 model
if (rho==0) return(rsd)
ld <- 1/sqrt(1-rho^2) ## leading diagonal of root inverse correlation
sd <- -rho*ld ## sub diagonal
N <- length(rsd)
## see rwMatrix() for how following are used...
ar.row <- c(1,rep(1:N,rep(2,N))[-c(1,2*N)]) ## index of rows to reweight
ar.weight <- c(1,rep(c(sd,ld),N-1)) ## row weights
ar.stop <- c(1,1:(N-1)*2+1) ## (stop[i-1]+1):stop[i] are the rows to reweight to get ith row
if (!is.null(AR.start)) { ## need to correct the start of new AR sections...
ii <- which(AR.start==TRUE)
if (length(ii)>0) {
if (ii[1]==1) ii <- ii[-1] ## first observation does not need any correction
ar.weight[ii*2-2] <- 0 ## zero sub diagonal
ar.weight[ii*2-1] <- 1 ## set leading diagonal to 1
}
}
rwMatrix(ar.stop,ar.row,ar.weight,rsd)
} ## AR.resid
bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,na.action=na.omit,
offset=NULL,method="fREML",control=list(),scale=0,gamma=1,knots=NULL,sp=NULL,
min.sp=NULL,paraPen=NULL,chunk.size=10000,rho=0,AR.start=NULL,discrete=FALSE,
sparse=FALSE,cluster=NULL,
nthreads=NA,gc.level=1,use.chol=FALSE,samfrac=1,drop.unused.levels=TRUE,G=NULL,fit=TRUE,...)
sparse=FALSE,cluster=NULL,nthreads=NA,gc.level=1,use.chol=FALSE,samfrac=1,
drop.unused.levels=TRUE,G=NULL,fit=TRUE,...)
## Routine to fit an additive model to a large dataset. The model is stated in the formula,
## which is then interpreted to figure out which bits relate to smooth terms and which to
......@@ -1488,6 +1601,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
## 'n.threads' is number of threads to use for non-cluster computation (e.g. combining
## results from cluster nodes). If 'NA' then is set to max(1,length(cluster)).
{ control <- do.call("gam.control",control)
if (control$trace) t3 <- t2 <- t1 <- t0 <- proc.time()
if (is.null(G)) { ## need to set up model!
if (is.character(family))
family <- eval(parse(text = family))
......@@ -1525,15 +1639,17 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
gp <- interpret.gam(formula) # interpret the formula
if (discretize) {
## re-order the tensor terms for maximum efficiency, and
## signal that "re" terms should be constructed with marginals
## signal that "re"/"fs" terms should be constructed with marginals
## also for efficiency
for (i in 1:length(gp$smooth.spec)) {
if (length(gp$smooth.spec)>0) for (i in 1:length(gp$smooth.spec)) {
if (inherits(gp$smooth.spec[[i]],"tensor.smooth.spec"))
gp$smooth.spec[[i]] <- tero(gp$smooth.spec[[i]])
if (inherits(gp$smooth.spec[[i]],"re.smooth.spec")&&gp$smooth.spec[[i]]$dim>1) {
gp$smooth.spec[[i]]$xt <- "tensor"
class(gp$smooth.spec[[i]]) <- c("re.smooth.spec","tensor.smooth.spec")
if (inherits(gp$smooth.spec[[i]],c("re.smooth.spec","fs.smooth.spec"))&&gp$smooth.spec[[i]]$dim>1) {
#gp$smooth.spec[[i]]$xt <- "tensor"
class(gp$smooth.spec[[i]]) <- c(class(gp$smooth.spec[[i]]),"tensor.smooth.spec")
##c("re.smooth.spec","tensor.smooth.spec")
gp$smooth.spec[[i]]$margin <- list()
## only ok for 'fs' with univariate metric variable (caught in 'fs' constructor)...
for (j in 1:gp$smooth.spec[[i]]$dim) gp$smooth.spec[[i]]$margin[[j]] <- list(term=gp$smooth.spec[[i]]$term[j])
}
}
......@@ -1583,19 +1699,40 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
dk <- discrete.mf(gp,mf,pmf,m=discrete)
mf0 <- dk$mf ## padded discretized model frame
sparse.cons <- 0 ## default constraints required for tensor terms
## reset by variables to "NA" for gam.setup call
#by.ind <- rep(0,0); by.name <- rep("",0)
#for (i in 1:length(gp$smooth)) if (gp$smooth.spec[[i]]$by!="NA") {
# by.ind <- c(by.ind,i) ## index of smooth.spec with reset by name
# by.name <- c(by.name,gp$smooth.spec[[i]]$by)
# gp$smooth.spec[[i]]$by <- "NA" ## as if == 1 during setup
# gp$smooth.spec[[i]]$C <- matrix(0,0,0) ## signal no constraint because of by
#}
} else {
mf0 <- mini.mf(mf,chunk.size)
if (sparse) sparse.cons <- 2 else sparse.cons <- -1
}
rm(pmf); ## no further use
G <- gam.setup(gp,pterms=pterms,
if (control$trace) t1 <- proc.time()
reset <- TRUE
while (reset) {
G <- gam.setup(gp,pterms=pterms,
data=mf0,knots=knots,sp=sp,min.sp=min.sp,
H=NULL,absorb.cons=TRUE,sparse.cons=sparse.cons,select=FALSE,
idLinksBases=TRUE,scale.penalty=control$scalePenalty,
paraPen=paraPen)
paraPen=paraPen,apply.by=!discretize)
if (!discretize&&ncol(G$X)>=chunk.size) { ## no point having chunk.size < p
chunk.size <- 4*ncol(G$X)
warning(gettextf("chunk.size < number of coefficients. Reset to %d",chunk.size))
if (chunk.size>=nrow(mf)) { ## no sense splitting up computation
mf0 <- mf ## just use full dataset
} else reset <- FALSE
} else reset <- FALSE
}
if (control$trace) t2 <- proc.time()
if (discretize) {
## reset any by variable names in smooth list (from "NA" back to original)...
#if (length(by.ind)) for (i in 1:length(by.ind)) G$smooth[[by.ind[i]]]$by <- by.name[i]
v <- G$Xd <- list()
## have to extract full parametric model matrix from pterms and mf
G$Xd[[1]] <- model.matrix(G$pterms,mf)
......@@ -1612,16 +1749,33 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
drop <- rep(0,0) ## index of te related columns to drop
for (i in 1:length(G$smooth)) {
ts[kb] <- k
dt[kb] <- length(G$smooth[[i]]$margin)
if (inherits(G$smooth[[i]],"tensor.smooth")) {
## first deal with any by variable (as first marginal of tensor)...
if (G$smooth[[i]]$by!="NA") {
dt[kb] <- 1
by.var <- dk$mf[[G$smooth[[i]]$by]][1:dk$nr[k]]
if (is.factor(by.var)) {
## create dummy by variable...
by.var <- as.numeric(by.var==G$smooth[[i]]$by.level)
}
G$Xd[[k]] <- matrix(by.var,dk$nr[k],1)
k <- k + 1
} else dt[kb] <- 0
## ... by done
if (inherits(G$smooth[[i]],"tensor.smooth")) {
nmar <- length(G$smooth[[i]]$margin)
dt[kb] <- dt[kb] + nmar
if (inherits(G$smooth[[i]],"random.effect")&&!is.null(G$smooth[[i]]$rind)) {
## terms re-ordered for efficiency, so the same has to be done on indices...
rind <- k:(k+dt[kb]-1)
dk$nr[rind] <- dk$nr[k+G$smooth[[i]]$rind-1]
G$kd[,rind] <- G$kd[,k+G$smooth[[i]]$rind-1]
}
for (j in 1:dt[kb]) {
} else if (inherits(G$smooth[[i]],"fs.interaction")&&which(G$smooth[[i]]$fterm==G$smooth[[i]]$term)!=1) {
## have to reverse the terms because tensor representation assumes factor is first
dk$nr[k:(k+1)] <- dk$nr[(k+1):k]
G$kd[,k:(k+1)] <- G$kd[,(k+1):k]
}
for (j in 1:nmar) {
G$Xd[[k]] <- G$smooth[[i]]$margin[[j]]$X[1:dk$nr[k],,drop=FALSE]
k <- k + 1
}
......@@ -1641,9 +1795,9 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
v[[kb]] <- rep(0,0) ##
if (!inherits(qrc,"character")||qrc!="no constraints") warning("unknown tensor constraint type")
}
} else {
} else { ## not a tensor smooth
v[[kb]] <- rep(0,0)
dt[kb] <- 1
dt[kb] <- dt[kb] + 1
G$Xd[[k]] <- G$X[1:dk$nr[k],G$smooth[[i]]$first.para:G$smooth[[i]]$last.para]
k <- k + 1
}
......@@ -1653,11 +1807,11 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
## ... Xd is the list of discretized model matrices, or marginal model matrices
## kd contains indexing vectors, so the ith model matrix or margin is Xd[[i]][kd[i,],]
## ts[i] is the starting matrix in Xd for the ith model matrix, while dt[i] is the number
## of elements of Xd that make it up (1 for a dingleton, more for a tensor).
## of elements of Xd that make it up (1 for a singleton, more for a tensor).
## v is list of Householder vectors encoding constraints and qc the constraint indicator.
G$v <- v;G$ts <- ts;G$dt <- dt;G$qc <- qc
} ## if (discretize)
if (control$trace) t3 <- proc.time()
G$sparse <- sparse
## no advantage to "fREML" with no free smooths...
......@@ -1678,7 +1832,7 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
if (ncol(G$X)>nrow(mf)) stop("Model has more coefficients than data")
if (ncol(G$X) > chunk.size) { ## no sense having chunk.size < p
if (ncol(G$X) > chunk.size && !discretize) { ## no sense having chunk.size < p
chunk.size <- 4*ncol(G$X)
warning(gettextf("chunk.size < number of coefficients. Reset to %d",chunk.size))
}
......@@ -1763,6 +1917,8 @@ bam <- function(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,n
if (gc.level>0) gc()
if (control$trace)