%- maybe also `usage' for other objects documented here.
...
...
\item{dispersion}{ a value for the dispersion parameter: not normally used.}
\item{test}{what sort of test to perform for a multi-model call. One of
\code{"Chisq"}, \code{"F"} or \code{"Cp"}. }
\item{freq}{whether to use frequentist or Bayesian approximations for parametric term
p-values. See \code{\link{summary.gam}} for details.}
\item{p.type}{selects exact test statistic to use for single smooth term p-values. See
\code{\link{summary.gam}} for details.}
\item{digits}{number of digits to use when printing output.}
}
\details{ If more than one fitted model is provided then \code{anova.glm} is
used, with the difference in model degrees of freedom being taken as the difference
in effective degrees of freedom. The p-values resulting from this are only approximate,
and must be used with care. The approximation is most accurate when the comparison
relates to unpenalized terms, or smoothers with a null space of dimension greater than zero.
(Basically we require that the difference terms could be well approximated by unpenalized
terms with degrees of freedom approximately equal to the effective degrees of freedom.) In simulations the
p-values are usually slightly too low. For terms with a zero-dimensional null space
(i.e. those which can be penalized to zero) the approximation is often very poor, and significance
can be greatly overstated: i.e. p-values are often substantially too low. This case applies to random effect terms.
Note also that in the multi-model call to \code{anova.gam}, it is quite possible for a model with more terms to end up with lower effective degrees of freedom, but better fit, than the notionally null model with fewer terms. In such cases it is very rare that it makes sense to perform any sort of test, since there is then no basis on which to accept the notional null model.
If only one model is provided then the significance of each model term
is assessed using Wald tests: see \code{\link{summary.gam}} for details. The p-values
provided here are better justified than in the multi-model case, and have close to the
correct distribution under the null for smooths with a non-zero dimensional null space
(i.e. terms that cannot be penalized to zero). ML or REML smoothing parameter selection
leads to the best results in simulations, as they tend to avoid occasional severe
undersmoothing. In the single model case \code{print.anova.gam} is used as the
printing method.
}
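As a minimal sketch of the two modes of use described above (the data are simulated with mgcv's built-in \code{gamSim}; the object names \code{b0}, \code{b1} and \code{dat} are illustrative only):

```r
library(mgcv)
set.seed(0)
dat <- gamSim(1, n = 200, dist = "normal", scale = 2)  # x0..x3 and y from a known smooth truth
## ML smoothness selection: best p-value behaviour in simulations
b1 <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat, method = "ML")
b0 <- gam(y ~ s(x0) + s(x1), data = dat, method = "ML")
anova(b1)                  # single model: Wald tests for each term
anova(b0, b1, test = "F")  # multi-model: approximate test, use with care
```

Note that the multi-model comparison above involves a smooth with a non-trivial null space, which is where the approximation works best.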
...
...
@@ -54,7 +63,8 @@ which is in fact an object returned from \code{\link{summary.gam}}.
\author{ Simon N. Wood \email{simon.wood@r-project.org} with substantial
improvements by Henric Nilsson.}
\section{WARNING}{ P-values are only approximate. In particular, if models \code{a} and \code{b} differ
only in terms with no un-penalized components then p-values from \code{anova(a,b)} are unreliable,
and usually much too low.
with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with
fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are
available using penalized thin plate regression splines (isotropic) or tensor product splines
(when an isotropic smooth is inappropriate). For an overview of the smooths available see \code{\link{smooth.terms}}.
For more on specifying models see \code{\link{gam.models}}, \code{\link{random.effects}} and \code{\link{linear.functional.terms}}. For more on model selection see \code{\link{gam.selection}}. Do read \code{\link{gam.check}} and \code{\link{choose.k}}.
See \link[gam]{gam} from package \code{gam}, for GAMs via the original Hastie and Tibshirani approach (see details for differences to this implementation).
For very large datasets see \code{\link{bam}}, for mixed GAM see \code{\link{gamm}} and \code{\link{random.effects}}.
}
...
...
Details of the default underlying fitting methods are given in Wood (2011
and 2004). Some alternative methods are discussed in Wood (2000 and 2006).
}
\code{gam()} is not a clone of Trevor Hastie's original (as supplied in S-PLUS or package \link[gam]{gam}). The major
differences are (i) that by default estimation of the
degree of smoothness of model terms is part of model fitting, (ii) a
Bayesian approach to variance estimation is employed that makes for easier
confidence interval calculation (with good coverage probabilities), (iii) that the model
can depend on any (bounded) linear functional of smooth terms, (iv) the parametric part of the model can be penalized,
(v) simple random effects can be incorporated, and
(vi) the facilities for incorporating smooths of more than one variable are
different: specifically there are no \code{lo} smooths, but instead (a) \code{\link{s}}
terms can have more than one argument, implying an isotropic smooth and (b) \code{\link{te}} or \code{\link{t2}} smooths are
provided as an effective means for modelling smooth interactions of any
number of variables via scale invariant tensor product smooths. Splines on the sphere, Duchon splines
and Gaussian Markov Random Fields are also available. See \link[gam]{gam}
from package \code{gam}, for GAMs via the original Hastie and Tibshirani approach.
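Points (vi)(a) and (b) above can be sketched as follows (a minimal example on simulated data; the object names are illustrative only):

```r
library(mgcv)
set.seed(2)
dat <- gamSim(2, n = 300, scale = 0.1)$data  # bivariate smooth truth, covariates x and z
## (a) one s() term with two arguments: an isotropic thin plate regression spline
b.iso <- gam(y ~ s(x, z), data = dat)
## (b) a tensor product smooth: scale invariant, appropriate when x and z
##     are on different scales
b.ten <- gam(y ~ te(x, z), data = dat)
```

The isotropic form is preferable when the covariates are naturally on the same scale (e.g. spatial coordinates); otherwise the tensor product is usually the better choice.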
\value{A vector of reference quantiles for the residual distribution, if these can be computed.}
\details{ Checking a fitted \code{gam} is like checking a fitted \code{glm}, with two main differences. Firstly,
the basis dimensions used for smooth terms need to be checked, to ensure that they are not so small that they force
oversmoothing: the defaults are arbitrary. \code{\link{choose.k}} provides more detail, but the diagnostic tests described below and reported by this function may also help. Secondly, fitting may not always be as robust to violation of the distributional assumptions as would be the case for a regular GLM, so slightly more care may be needed here. In particular, the theory of quasi-likelihood implies that if the mean variance relationship is OK for a GLM, then other departures from the assumed distribution are not problematic: GAMs can sometimes be more sensitive. For example, un-modelled overdispersion will typically lead to overfit, as the smoothness selection criterion tries to reduce the scale parameter to the one specified. Similarly, it is not clear how sensitive REML and ML smoothness selection will be to deviations from the assumed response distribution. For these reasons this routine uses an enhanced residual QQ plot.
This function plots 4 standard diagnostic plots, some smoothing parameter estimation
convergence information and the results of tests which may indicate if the smoothing basis dimension
for a term is too low.
Usually the 4 plots are various residual plots. For the default optimization methods the convergence information is summarized in a readable way, but for other optimization methods, whatever is returned by way of
convergence diagnostics is simply printed.
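A minimal sketch of typical use (simulated data; the names \code{dat} and \code{b} are illustrative only):

```r
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 200, scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
## 4 residual plots, a convergence summary, and basis dimension (k) checks
gam.check(b)
```

If the printed k-index for a term is well below 1 with a low p-value, refitting with a larger \code{k} for that term is worth trying.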
The test of whether the basis dimension for a smooth is adequate is based on computing an estimate of the residual variance
...
than \code{gamm} or \code{gamm4}, when the number of random effects is modest.
To facilitate the use of random effects with \code{gam}, \code{\link{gam.vcomp}} is a utility routine for converting
smoothing parameters to variance components. It also provides confidence intervals, if smoothness estimation is by ML or REML.
Note that treating random effects as smooths does not remove the usual problems associated with testing variance components for equality to zero: see \code{\link{summary.gam}} and \code{\link{anova.gam}}.
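A minimal sketch of a simple random effect fitted this way, with \code{gam.vcomp} used to re-express the smoothing parameter as a variance component (the grouping factor \code{fac} and the other object names are invented for illustration):

```r
library(mgcv)
set.seed(3)
dat <- gamSim(1, n = 200, scale = 2)
dat$fac <- factor(sample(1:10, 200, replace = TRUE))  # hypothetical grouping factor
dat$y <- dat$y + rnorm(10)[dat$fac]                   # add i.i.d. group effects
## "re" basis: the factor enters as a simple random effect
b <- gam(y ~ s(x0) + s(fac, bs = "re"), data = dat, method = "REML")
gam.vcomp(b)  # variance components, with confidence intervals under REML
```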
(this can be calculated efficiently without forming the pseudoinverse explicitly). \eqn{T}{T} is compared to an approximation to
an appropriate mixture of chi-squared distributions with degrees of freedom given by the EDF for the term,
or \eqn{T}{T} is used as a component in an F ratio statistic if the
scale parameter has been estimated.
...
...
approximation varying smoothly between the bounding integer rank approximations, with
biased rounding of the EDF: values less than .05 above the preceding integer are rounded down, while other values are rounded up. Another option (\code{p.type==-1}) uses a statistic of formal rank given by the number of coefficients for the smooth, but with its terms weighted by the eigenvalues of the covariance matrix, so that penalized terms are down-weighted, but the null distribution requires simulation. Other options for \code{p.type} are 2 (naive rounding), 3 (round up), 4 (numerical rank determination): these are poor options for theoretically known reasons, and will generate a warning.
The resulting p-value also has a Bayesian interpretation:
the probability of observing an \eqn{\bf f}{f} less probable than \eqn{\bf 0}{0},
under the approximation for the posterior for \eqn{\bf f}{f} implied by the truncation used in the test statistic.
Note that the distributional approximations for the p-values start to break down below one effective degree of freedom, and p-values are not reported below 0.5 effective degrees of freedom.
Note that for terms with no unpenalized components the Nychka (1988) requirement for smoothing bias to be substantially
less than variance breaks down (see e.g. appendix of Marra and Wood, 2012), and this results in incorrect null distribution
for p-values computed using the above approach. In this case it is necessary to fall back on slightly cruder frequentist approximations
(which may overstate significance a little). The frequentist covariance matrix is used in place of the Bayesian version, and the statistic rank is set to 1 for EDF < 1. In the case of random effects, a further modification is required, since the eigen spectrum of the penalty is then flat and a good unpenalized approximation with rank given by the EDF of the term is not generally available, further breaking the theory used for other smoothers. In this case the rank of the test statistic is set to the full rank of the term, and the p-value relates to testing whether the individual random effects were in fact all zero (despite the estimated posterior modes being those observed).
In simulations the p-values have best behaviour under ML smoothness selection, with REML coming second.
If \code{p.type=5} then the frequentist approximation for p-values of smooth terms described in section
4.8.5 of Wood (2006) is used. The approximation is not great. If \eqn{ {\bf p}_i}{p_i}
is the parameter vector for the ith smooth term, and this term has estimated
covariance matrix \eqn{ {\bf V}_i}{V_i} then the
statistic is \eqn{ {\bf p}_i^\prime {\bf V}_i^{k-} {\bf