Commit 41a93e0b authored by Jerome Benoit

Imported Upstream version 0.999e

parent 87188930
This software is licensed under the GNU GPL v2, with two modifications,
as follows:
--An application hosted on a server and remotely operated by users, such
as a web application or database server, is understood to be distribution
of the software, and therefore all GPL v2 clauses regarding distribution
apply. For example, a web application must include a link for downloading
the application source code.
--You are explicitly granted permission to link this software to other
code licensed under other licenses, such as GPL v3 or the BSD license.
Linking to a differently-licensed code base does not free this code (or
the combination) of the stipulations of the GPLv2 plus the above clause.
@@ -5,12 +5,19 @@ AUTOMAKE_OPTIONS = \
dist-bzip2 \
dist-zip
AM_DISTCHECK_CONFIGURE_FLAGS ?= \
--disable-maintainer-mode \
--enable-extended-tests
AM_CFLAGS = -g -Wall -O3
## Library versioning (C:R:A == current:revision:age)
LIBAPOPHENIA_LT_VERSION = 0:0:0
## 0.999b 0:0:0
## 0.999c 1:0:0
## 0.999e 2:0:0
LIBAPOPHENIA_LT_VERSION = 2:0:0
SUBDIRS = transform model . cmd tests docs eg
SUBDIRS = transform model . cmd eg tests docs
include_HEADERS = apop.h
@@ -57,7 +64,6 @@ libapopkernel_la_SOURCES = \
apop_rake.c \
apop_regression.c \
apop_settings.c \
apop_smoothing.c \
apop_sort.c \
apop_stats.c \
apop_tests.c \
@@ -90,7 +96,6 @@ libapophenia_la_LIBADD = \
$(LIBM)
EXTRA_DIST = \
COPYING2 \
rpm.spec \
apophenia.pc.in \
apophenia.map
@@ -50,4 +50,4 @@ Thanks for your interest. I do hope that Apophenia helps you learn more from you
--BK
PS: Lawyers, please note that COPYING and COPYING2 files can be found in the install/ directory.
PS: Lawyers, please note that a file named COPYING in the install/ directory describes how this package is licensed under GPLv2.
@@ -2,7 +2,7 @@
adaptive rejection metropolis sampling */
/** (C) Wally Gilks; see documentation below for details.
Adaptations for Apophenia (c) 2009 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
Adaptations for Apophenia (c) 2009 by Ben Klemens. Licensed under the GPLv2; see COPYING. */
#include "apop_internal.h"
@@ -74,21 +74,17 @@ Apop_settings_init(apop_arms,
void distract_doxygen_arms(){/*Doxygen gets thrown by the settings macros. This decoy function is a workaround. */}
/** \brief Adaptive rejection metropolis sampling.
This is a function to make random draws from any univariate distribution (more or less).
/** Adaptive rejection Metropolis sampling, to make random draws from a univariate distribution.
The author, Wally Gilks, explains on
http://www.amsta.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html, that
http://www.amsta.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html , that
``ARS works by constructing an envelope function of the log of the target density, which is then used in rejection sampling (see, for example, Ripley, 1987). Whenever a point is rejected by ARS, the envelope is updated to correspond more closely to the true log density, thereby reducing the chance of rejecting subsequent points. Fewer ARS rejection steps implies fewer point-evaluations of the log density.''
\li It accepts only functions with univariate inputs. I.e., it will put a single value in the vector part of a \ref apop_data set, and then evaluate the log likelihood at that point.
\li It accepts only functions with univariate inputs. I.e., it will put a single value into a 1x1 \ref apop_data set, and then evaluate the log likelihood at that point. For multivariate situations, see \ref apop_model_metropolis.
\li It is currently the default for the \ref apop_draw function, so you can just call that if you prefer.
\li It is currently the default for the \ref apop_draw function given a univariate model, so you can just call that if you prefer.
\li There are a great number of parameters, in the \c apop_arms_settings structure. The structure also holds a history of the points tested to date. That means that the system will be more accurate as more draws are made. It also means that if the parameters change, or you use \ref apop_model_copy, you should call <tt>Apop_settings_rm_group(your_model, apop_arms)</tt> to clear the model of points that are not valid for a different situation.
\li See \ref apop_arms_settings for the list of parameters that you may want to set, via a form like <tt>apop_model_add_group(your_model, apop_arms, .model=your_model, .xl=8, .xr =14);</tt>. The \c model element is mandatory; you'll get a run-time complaint if you forget it.
*/
int apop_arms_draw (double *out, gsl_rng *r, apop_model *m){
apop_arms_settings *params = Apop_settings_get_group(m, apop_arms);
@@ -529,8 +525,8 @@ double logshift(double y, double y0){
double perfunc(apop_arms_settings *params, double x){
// to evaluate log density and increment count of evaluations
Staticdef( apop_data *, d , apop_data_alloc(1));
d->vector->data[0] = x;
Staticdef( apop_data *, d , apop_data_alloc(1,1));
d->matrix->data[0] = x;
double y = apop_log_likelihood(d, params->model);
Apop_assert(isfinite(y), "Evaluating the log likelihood of %g returned %g.", x, y);
(params->neval)++; // increment count of function evaluations
/** \file apop_bootstrap.c
Copyright (c) 2006--2007 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
Copyright (c) 2006--2007 by Ben Klemens. Licensed under the GPLv2; see COPYING. */
#include "apop_internal.h"
@@ -15,8 +15,6 @@ Copyright (c) 2006--2007 by Ben Klemens. Licensed under the modified GNU GPL v2
\li If you are confident that your code is debugged and would like a new stream of values every time your program runs (provided your runs are more than a second apart), seed with the time:
\include draw_some_normals.c
\ingroup convenience_fns
*/
gsl_rng *apop_rng_alloc(int seed){
static int first_use = 1;
@@ -32,24 +30,24 @@ gsl_rng *apop_rng_alloc(int seed){
/** Give me a data set and a model, and I'll give you the jackknifed covariance matrix of the model parameters.
The basic algorithm for the jackknife (with many details glossed over): create a sequence of data
The basic algorithm for the jackknife (glossing over the details): create a sequence of data
sets, each with exactly one observation removed, and then produce a new set of parameter estimates
using that slightly shortened data set. Then, find the covariance matrix of the derived parameters.
Jackknife or bootstrap? As a broad rule of thumb, the jackknife works best on models that are closer to linear. The worse a linear approximation does (at the given data), the worse the jackknife approximates the variance.
\li Jackknife or bootstrap? As a broad rule of thumb, the jackknife works best on models
that are closer to linear. The worse a linear approximation does (at the given data),
the worse the jackknife approximates the variance.
Sample usage:
\code
apop_data_show(apop_jackknife_cov(your_data, your_model));
\endcode
\param in The data set. An \ref apop_data set where each row is a single data point.
\param model An \ref apop_model, that will be used internally by \ref apop_estimate.
\exception out->error=='n' \c NULL input data.
\return An \c apop_data set whose matrix element is the estimated covariance matrix of the parameters.
\see apop_bootstrap_cov
*/
For example:
\include jack.c
*/
apop_data * apop_jackknife_cov(apop_data *in, apop_model *model){
Apop_stopif(!in, apop_return_data_error(n), 0, "The data input can't be NULL.");
Get_vmsizes(in); //msize1, msize2, vsize
@@ -98,40 +96,49 @@ apop_data * apop_jackknife_cov(apop_data *in, apop_model *model){
\param model An \ref apop_model, whose \c estimate method will be used here. (No default)
\param iterations How many bootstrap draws should I make? (default: 1,000)
\param rng An RNG that you have initialized, probably with \c apop_rng_alloc. (Default: an RNG from \ref apop_rng_get_thread)
\param keep_boots If 'y', then add a page to the output \ref apop_data set with the statistics calculated for each bootstrap iteration.
\param keep_boots Deprecated; use \c boot_store.
\param boot_store If not \c NULL, put the list of drawn parameter values here, with one parameter set per row. Sample use: <tt>apop_data *boots; apop_bootstrap_cov(data, model, .boot_store=&boots); apop_data_print(boots);</tt>
They are packed via \ref apop_data_pack, so use \ref apop_data_unpack if needed. (Default: 'n')
\code
apop_data *boot_output = apop_bootstrap_cov(your_data, your_model, .keep_boots='y');
apop_data *boot_stats = apop_data_get_page(boot_output, "<bootstrapped statistics>");
Apop_matrix_row(boot_stats->matrix, 27, row_27)
//If the output statistic is not just a vector, you'll need to use apop_data_unpack to put
//it into the right shape. Let's assume for now that it's just a vector:
printf("The statistics calculated on the 28th iteration:\n");
apop_vector_print(row_27);
gsl_vector *row_27 = Apop_rv(boot_stats, 27);
apop_data_print(apop_data_unpack(row_27));
\endcode
\param ignore_nans If \c 'y' and any of the elements in the estimation return \c NaN, then I will throw out that draw and try again. If \c 'n', then I will write that set of statistics to the list, \c NaN and all. I keep count of throw-aways; if there are more than \c iterations elements thrown out, then I throw an error and return with estimates using data I have so far. That is, I assume that \c NaNs are rare edge cases; if they are as common as good data, you might want to rethink how you are using the bootstrap mechanism. (Default: 'n')
\return An \c apop_data set whose matrix element is the estimated covariance matrix of the parameters.
\exception out->error=='n' \c NULL input data.
\exception out->error=='N' \c too many Nans.
\exception out->error=='N' \c too many NaNs.
\li This function uses the \ref designated syntax for inputs.
This example is a sort of demonstration of the Central Limit Theorem. The model is
a simulation, where each call to the estimation routine produces the mean/std dev of
a set of draws from a Uniform Distribution. Because the simulation takes no inputs,
\ref apop_bootstrap_cov simply re-runs the simulation and calculates a sequence of
mean/std dev pairs, and reports the covariance of that generated data set.
\include boot_clt.c
\see apop_jackknife_cov
*/
#ifdef APOP_NO_VARIADIC
apop_data * apop_bootstrap_cov(apop_data * data, apop_model *model, gsl_rng *rng, int iterations, char keep_boots, char ignore_nans){
apop_data * apop_bootstrap_cov(apop_data * data, apop_model *model, gsl_rng *rng, int iterations, char keep_boots, char ignore_nans, apop_data **boot_store){
#else
apop_varad_head(apop_data *, apop_bootstrap_cov) {
apop_data * apop_varad_var(data, NULL);
apop_model *model = varad_in.model;
int apop_varad_var(iterations, 1000);
Apop_stopif(!data, apop_return_data_error(n), 0, "The data input can't be NULL.");
gsl_rng * apop_varad_var(rng, apop_rng_get_thread());
char apop_varad_var(keep_boots, 'n');
apop_data** apop_varad_var(boot_store, NULL);
char apop_varad_var(ignore_nans, 'n');
return apop_bootstrap_cov_base(data, model, rng, iterations, keep_boots, ignore_nans);
return apop_bootstrap_cov_base(data, model, rng, iterations, keep_boots, ignore_nans, boot_store);
}
apop_data * apop_bootstrap_cov_base(apop_data * data, apop_model *model, gsl_rng *rng, int iterations, char keep_boots, char ignore_nans){
apop_data * apop_bootstrap_cov_base(apop_data * data, apop_model *model, gsl_rng *rng, int iterations, char keep_boots, char ignore_nans, apop_data **boot_store){
#endif
Get_vmsizes(data); //vsize, msize1, msize2
apop_model *e = apop_model_copy(model);
@@ -140,11 +147,11 @@ apop_varad_head(apop_data *, apop_bootstrap_cov) {
*summary;
//prevent and infinite regression of covariance calculation.
Apop_model_add_group(e, apop_parts_wanted); //default wants for nothing.
size_t i, nan_draws=0;
apop_name *tmpnames = data->names; //save on some copying below.
data->names = NULL;
size_t i, nan_draws=0;
apop_name *tmpnames = (data && data->names) ? data->names : NULL; //save on some copying below.
if (data && data->names) data->names = NULL;
int height = GSL_MAX(msize1, GSL_MAX(vsize, data->textsize[0]));
int height = GSL_MAX(msize1, GSL_MAX(vsize, (data?(*data->textsize):0)));
for (i=0; i<iterations && nan_draws < iterations; i++){
for (size_t j=0; j< height; j++){ //create the data set
size_t randrow = gsl_rng_uniform_int(rng, height);
@@ -168,7 +175,7 @@ apop_varad_head(apop_data *, apop_bootstrap_cov) {
apop_model_free(est);
gsl_vector_free(estp);
}
data->names = tmpnames;
if(data) data->names = tmpnames;
apop_data_free(subset);
apop_model_free(e);
int set_error=0;
@@ -180,11 +187,11 @@ apop_varad_head(apop_data *, apop_bootstrap_cov) {
1, "I ran into %i NaNs, and so stopped. Returning results based "
"on %zu bootstrap iterations.", iterations, i);
summary = apop_data_covariance(array_of_boots);
gsl_matrix_scale(summary->matrix, 1./i);
if (keep_boots == 'n' || keep_boots == 'N')
if (!boot_store && (keep_boots == 'n' || keep_boots == 'N'))
apop_data_free(array_of_boots);
else
if (keep_boots != 'n' && keep_boots != 'N') //deprecated version
apop_data_add_page(summary, array_of_boots, "<Bootstrapped statistics>");
if (boot_store) *boot_store = array_of_boots;
if (set_error) summary->error = 'N';
return summary;
}
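The jackknife that the apop_jackknife_cov comment in this file describes (drop one observation, re-estimate, then measure the spread of the estimates) can be illustrated without Apophenia or GSL. What follows is only a sketch under stated assumptions, not the library's implementation: the name jackknife_var_of_mean is hypothetical, the statistic is restricted to the sample mean of a plain array, and the (n-1)/n scaling is the usual textbook jackknife factor, which this diff does not confirm for Apophenia.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Apophenia: jackknife estimate of the
   variance of the sample mean of x[0..n-1]. Returns 0 when n < 2, because
   no leave-one-out estimate exists in that case. */
double jackknife_var_of_mean(const double *x, size_t n){
    assert(x != NULL);
    if (n < 2) return 0;

    double total = 0;
    for (size_t i = 0; i < n; i++) total += x[i];

    /* Average of the n leave-one-out means. */
    double loo_mean = 0;
    for (size_t i = 0; i < n; i++) loo_mean += (total - x[i]) / (n - 1);
    loo_mean /= n;

    /* Sum of squared deviations of each leave-one-out mean. */
    double ss = 0;
    for (size_t i = 0; i < n; i++){
        double est = (total - x[i]) / (n - 1);
        ss += (est - loo_mean) * (est - loo_mean);
    }

    /* Textbook jackknife scaling (an assumption here, see above). */
    return ss * (n - 1) / n;
}
```

For the sample mean this reduces to the sample variance divided by n, which gives a quick sanity check on the sketch. A model with several parameters would replace the scalar estimate with a parameter vector and the sum of squares with an outer product.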
/** \file apop_db_mysql.c
This file is included directly into \ref apop_db.c. It is read only if APOP_USE_MYSQL is defined.*/
/* Copyright (c) 2006--2007 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
/* Copyright (c) 2006--2007 by Ben Klemens. Licensed under the GPLv2; see COPYING. */
#include <my_global.h>
#include <my_sys.h>
@@ -79,7 +79,7 @@ static double apop_mysql_table_exists(char const *table, int delme){
static int get_name_row(unsigned int *num_fields, MYSQL_FIELD *fields){
for(size_t i = 0; i < *num_fields; i++)
if (!strcasecmp(fields[i].name, apop_opts.db_name_column)){
if (apop_opts.db_name_column && !strcasecmp(fields[i].name, apop_opts.db_name_column)){
(*num_fields)--;
return i;
}
@@ -153,7 +153,7 @@ static void * process_result_set_chars (MYSQL *conn, MYSQL_RES *res_set) {
passed_name = 1;
continue;
}
apop_text_add(out, i, jj-passed_name, "%s", (row[jj]==NULL)? apop_opts.nan_string : row[jj]);
apop_text_set(out, i, jj-passed_name, "%s", (row[jj]==NULL)? apop_opts.nan_string : row[jj]);
}
}
check_and_clean(;)
@@ -243,7 +243,7 @@ apop_data* apop_mysql_mixed_query(char const *intypes, char const *query){
if (c == 'n' || c =='N')
apop_name_add(out->names, row[j], 'r');
else if (c == 't'|| c=='T')
apop_text_add(out, i, thist++, "%s", (row[j]==NULL)? apop_opts.nan_string : row[j]);
apop_text_set(out, i, thist++, "%s", (row[j]==NULL)? apop_opts.nan_string : row[j]);
else if (c == 'v'|| c=='V'){
double valor = (!row[j] || !strcmp(row[j], "NULL")) ? NAN : atof(row[j]);
gsl_vector_set(out->vector, i, valor);
/** \file apop_db_sqlite.c
This file is included directly into \ref apop_db.c.
Copyright (c) 2006--2007 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2.
Copyright (c) 2006--2007 by Ben Klemens. Licensed under the GPLv2; see COPYING.
*/
#include <sqlite3.h>
#include <string.h>
@@ -9,53 +9,8 @@ Copyright (c) 2006--2007 by Ben Klemens. Licensed under the modified GNU GPL v2
sqlite3 *db=NULL; //There's only one SQLite database handle. Here it is.
/** \page db_moments Database moments (plus pow()!)
SQLite lets users define new functions for use in queries, and Apophenia uses this facility to define a few common functions.
\li <tt>select ran() from table</tt> will produce a new random number between zero and one for every row of the input table, using \c gsl_rng_uniform.
\li The SQL standard includes the <tt>count(x)</tt> and <tt>avg(x)</tt> aggregators,
but statisticians are usually interested in higher moments as well---at least the
variance. Therefore, SQL queries using the Apophenia library may include any of these moments:
\code
select count(x), stddev(x), avg(x), var(x), variance(x), skew(x), kurt(x), kurtosis(x),
std(x), stddev_samp(x), stddev_pop(x), var_samp(x), var_pop(x)
from table
group by whatever
\endcode
<tt>var</tt> and <tt>variance</tt>; <tt>kurt</tt> and <tt>kurtosis</tt> do the same thing. Choose the one that sounds better to you. <tt>var</tt>, <tt>var_samp</tt>, <tt>stddev</tt> and <tt>stddev_samp</tt> give sample variance/standard deviation; <tt>variance</tt>, <tt>var_pop</tt> <tt>std</tt> and <tt>stddev_pop</tt> give population standard deviation. The plethora of variants are for mySQL compatibility.
\li The var/skew/kurtosis functions calculate sample moments, so if you want the population moment, multiply the result by (n-1)/n .
\li Also provided: wrapper functions for standard math library
functions---<tt>sqrt(x)</tt>, <tt>pow(x,y)</tt>, <tt>exp(x)</tt>, <tt>log(x)</tt>,
and trig functions. They call the standard math library function of the same name
to calculate \f$\sqrt{x}\f$, \f$x^y\f$, \f$e^x\f$, \f$\ln(x)\f$, \f$\sin(x)\f$,
\f$\arcsin(x)\f$, et cetera.
\li The <tt>ran()</tt> function calls <tt>gsl_rng_uniform</tt> to produce a uniform
draw between zero and one. It keeps its own <tt>gsl_rng</tt>, which is intialized on
first call using the value of <tt>apop_ots.rng_seed</tt> (which is then incremented,
so the next function to use it will get a different seed).
\code
select sqrt(x), pow(x,0.5), exp(x), log(x),
sin(x), cos(x), tan(x), asin(x), acos(x), atan(x)
from table
\endcode
Here is a test script using many of the above.
\include db_fns.c
Here is some more realistic sample code:
\include normalizations.c
*/
/** \cond doxy_ignore */
typedef struct StdDevCtx StdDevCtx;
struct StdDevCtx {
double avg; /* avg of terms */
@@ -64,6 +19,7 @@ struct StdDevCtx {
double avg4; /* avg of the fourth-power of terms */
int cnt; /* Number of terms counted */
};
/** \endcond */
static void twoStep(sqlite3_context *context, int argc, sqlite3_value **argv){
if (argc<1) return;
@@ -179,7 +135,8 @@ static void powFn(sqlite3_context *context, int argc, sqlite3_value **argv){
static void rngFn(sqlite3_context *context, int argc, sqlite3_value **argv){
Staticdef(gsl_rng *, rng, apop_rng_alloc(apop_opts.rng_seed++));
sqlite3_result_double(context, gsl_rng_uniform(rng));
//sqlite3_result_double(context, gsl_rng_uniform(rng));
sqlite3_result_double(context, gsl_rng_uniform(apop_rng_get_thread(-1)));
}
#define sqfn(name) static void name##Fn(sqlite3_context *context, int argc, sqlite3_value **argv){ \
@@ -215,11 +172,13 @@ static int apop_sqlite_db_open(char const *filename){
return 0;
}
/** \cond doxy_ignore */
typedef struct { //for the apop_query_to_... functions.
int firstcall, namecol;
size_t currentrow;
apop_data *outdata;
} callback_t;
/** \endcond */
//This is the callback for apop_query_to_text.
static int db_to_chars(void *qinfo,int argc, char **argv, char **column){
@@ -230,7 +189,7 @@ static int db_to_chars(void *qinfo,int argc, char **argv, char **column){
if (qi->firstcall){
qi->firstcall = 0;
for(int i=0; i<argc; i++)
if (!strcasecmp(column[i], apop_opts.db_name_column)){
if (apop_opts.db_name_column && !strcasecmp(column[i], apop_opts.db_name_column)){
qi->namecol = i;
break;
}
@@ -243,7 +202,7 @@ static int db_to_chars(void *qinfo,int argc, char **argv, char **column){
apop_name_add(d->names, argv[jj], 'r');
ncshift ++;
} else {
apop_text_add(d, rows, jj-ncshift, (argv[jj]==NULL)? apop_opts.nan_string: argv[jj]);
apop_text_set(d, rows, jj-ncshift, (argv[jj]==NULL)? apop_opts.nan_string: argv[jj]);
//Asprintf(&(d->text[rows][jj-ncshift]), "%s", (argv[jj]==NULL)? "NaN": argv[jj]);
if(addnames)
apop_name_add(d->names, column[jj], 't');
@@ -263,12 +222,14 @@ apop_data * apop_sqlite_query_to_text(char *query){
return qinfo.outdata;
}
/** \cond doxy_ignore */
typedef struct {
apop_data *d;
int intypes[5];//names, vectors, mcols, textcols, weights.
int current, thisrow, error_thrown;
const char *instring;
} apop_qt;
/** \endcond */
static void count_types(apop_qt *in, const char *intypes){
int i = 0;
@@ -7,7 +7,7 @@
R project. Thanks, guys.
Un-R-ifying modifications Copyright (c) 2006--2009 by Ben Klemens.
Licensed under the modified GNU GPL v2; see COPYING and COPYING2.
Licensed under the GPLv2; see COPYING.
R version credits:
fexact.f -- translated by f2c (version 19971204).\\
@@ -1865,7 +1865,6 @@ static double gammds(double *y, double *p, int *ifault) {
/** Convert from an \ref apop_data set to a table of integers.
Not too necessary, but I needed it for the Fisher exact test.
\ingroup conversions
*/
static int *apop_data_to_int_array(apop_data *intab){
int rowct = intab->matrix->size1,
@@ -1883,9 +1882,11 @@ static int *apop_data_to_int_array(apop_data *intab){
"probability of table": Probability of the observed table for fixed marginal totals. <br>
"p value": Table p-value. The probability of a more extreme table,
where `extreme' is in a probabilistic sense.
\exception out->error=='p' Processing error in the test.
\li If there are processing errors, these values will be NaN.
\exception out->error=='p' Processing error in the test.
For example:
\include test_fisher.c
@@ -62,7 +62,7 @@ void apop_gsl_error(char const *reason, char const *file, int line, int gsl_errn
#else
#define OMP_critical(tag)
#define OMP_for(...) for(__VA_ARGS__)
#define OMP_for_reduce(...) for(__VA_ARGS__)
#define OMP_for_reduce(red, ...) for(__VA_ARGS__)
#endif
#include "config.h"
@@ -2,7 +2,7 @@
/** \file apop_linear_constraint.c
\c apop_linear_constraint finds a point that meets a set of linear constraints. This takes a lot of machinery, so it gets its own file.
Copyright (c) 2007, 2009 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2.
Copyright (c) 2007, 2009 by Ben Klemens. Licensed under the GPLv2; see COPYING.
*/
#include "apop_internal.h"
@@ -91,7 +91,7 @@ static void get_candiate(gsl_vector *beta, apop_data *constraint, int current, g
/** This is designed to be called from within the constraint method of your \ref
apop_model. Just write the constraint vector+matrix and this will do the rest.
See the outline page for detailed discussion on setting contrasts.
See \ref constr for detailed discussion.
\param beta The proposed vector about to be tested. No default, must not be \c NULL.
@@ -104,7 +104,7 @@ Allocate and fill the matrix representing these two constraints via:
apop_data *constr = apop_data_falloc((2,2,3), 3, 2, 4, 7,
0, 0, 1, 0);
\endcode
. Default: each elements is greater than zero. E.g., for three parameters:
. Default: each elements is greater than zero. For three parameters this would be equivalent to setting
\code
apop_data *constr = apop_data_falloc((3,3,3), 0, 1, 0, 0,
0, 0, 1, 0,
@@ -113,13 +113,14 @@ apop_data *constr = apop_data_falloc((3,3,3), 0, 1, 0, 0,
\param margin If zero, then this is a >= constraint, otherwise I will return a point this amount within the borders. You could try \c GSL_DBL_EPSILON, which is the smallest value a \c double can hold, or something like 1e-3. Default = 0.
return The penalty = the distance between beta and the closest point that meets the constraints.
\return The penalty: the distance between beta and the closest point that meets the constraints.
If the constraint is met, the penalty is zero.
If the constraint is not met, this \c beta is shifted by \c margin (Euclidean distance) to meet the constraints.
\li If your \ref apop_data is not just a vector, try \ref apop_data_pack to pack it into a vector. This is what \ref apop_maximum_likelihood does.
\li This function uses the \ref designated syntax for inputs.
todo The apop_linear_constraint function doesn't check for odd cases like coplanar constraints.
\li If your \ref apop_data has more structure than a vector, try \ref apop_data_pack to pack it
into a vector. This is what \ref apop_maximum_likelihood does.
\li The function doesn't check for odd cases like coplanar constraints.
\li This function uses the \ref designated syntax for inputs.
*/
#ifdef APOP_NO_VARIADIC
long double apop_linear_constraint(gsl_vector *beta, apop_data * constraint, double margin){
/** \file apop_missing_data.c Some missing data handlers. */
/* Copyright (c) 2007, 2009 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
/* Copyright (c) 2007, 2009 by Ben Klemens. Licensed under the GPLv2; see COPYING. */
#include "apop_internal.h"
#include <regex.h>
@@ -8,14 +8,19 @@
/** If there is an NaN anywhere in the row of data (including the matrix, the vector, the weights, and the text) then delete the row from the data set.
\li If every row has an NaN, then this returns \c NULL.
\li If every row has a NaN, then this returns \c NULL.
\li If \c apop_opts.nan_string is not \c NULL, then I will make case-insensitive comparisons to the text elements to check for bad data as well.
\li If \c inplace = 'y', then I'll free each element of the input data
set and refill it with the pruned elements. I'll still take up (up to)
twice the size of the data set in memory during the function. If
every row has an NaN, then your \c apop_data set will end up with
\c NULL vector, matrix, .... if \c inplace = 'n', then the original data set is left unmolested.
every row has a NaN, then your \c apop_data set will end up with
\c NULL vector, matrix, .... if \c inplace = 'n', then the original data set is
left where it was, though internal elements may be moved.
\li I only look at the first page of data (i.e. the \c more element is ignored).
\li Listwise deletion is often not a statistically valid means of dealing with missing data.
It is typically better to impute the data (preferably multiple times). See \ref
apop_ml_impute for a less-invalid means, or <a href="https://github.com/rodri363/tea">Tea
for survey imputation</a> for heavy-duty survey editing and imputation.
\li This function uses the \ref designated syntax for inputs.
\param d The data, with NaNs
@@ -24,6 +29,7 @@ you sent in and refill with the pruned data. If \c 'n', leave the
set alone and return a new data set. Default=\c 'n'.
\return A (potentially shorter) copy of the data set, without
NaNs. If <tt>inplace=='y'</tt>, a pointer to the input, which was shortened in place. If the entire data set is cleared out, then this will be \c NULL.
\see apop_data_rm_rows
*/
#ifdef APOP_NO_VARIADIC
apop_data * apop_data_listwise_delete(apop_data *d, char inplace){
@@ -129,7 +135,7 @@ necessary data-parameter switching to make that happen.
\param mvn A parametrized \ref apop_model from which you expect the data was derived.
if \c NULL, then I'll use the Multivariate Normal that best fits the data after listwise deletion.
\return An estimated <tt>apop_ml_impute_model</tt>. Also, the data input will be filled in and ready to use.
\return An estimated \ref apop_model. Also, the data input will be filled in and ready to use.
*/
apop_model * apop_ml_impute(apop_data *d, apop_model* mvn){
if (!mvn){
/** \file
Specifying model characteristics and details of estimation methods. */
/* Copyright (c) 2008--2009, 2011, 2013 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
/* Copyright (c) 2008--2009, 2011, 2013 by Ben Klemens. Licensed under the GPLv2; see COPYING. */
#include "apop_internal.h"
static size_t get_settings_ct(apop_model *model){
@@ -77,7 +77,7 @@ void * apop_settings_get_grp(apop_model *m, char *type, char fail){
}
/** Copy a settings group with the given name from the second model to
the first. (i.e., the arguments are in memcpy order).
the first (i.e., the arguments are in memcpy order).
You probably won't need this often---just use \ref apop_model_copy.
/** \file apop_smoothing.c A few smoothing-type functions, like moving averages. */
/* Copyright (c) 2007 by Ben Klemens. Licensed under the modified GNU GPL v2; see COPYING and COPYING2. */
#include "apop_internal.h"
/** Return a new vector that is the moving average of the input vector.
\param v The input vector, unsmoothed
\param bandwidth The number of elements to be smoothed.
*/
gsl_vector *apop_vector_moving_average(gsl_vector *v, size_t bandwidth){
Apop_stopif(!v, return NULL, 0, "You asked me to smooth a NULL vector; returning NULL.\n");
Apop_stopif(!bandwidth, return apop_vector_copy(v), 0, "Bandwidth must be >=1. Returning a copy of original vector with no smoothing.");
int halfspan = bandwidth/2;
gsl_vector *vout = gsl_vector_calloc(v->size - halfspan*2);
for(size_t i=0; i < vout->size; i ++){
double *item = gsl_vector_ptr(vout, i);
for (int j=-halfspan; j < halfspan+1; j ++)
*item += gsl_vector_get(v, j+ i+ halfspan);
*item /= halfspan*2 +1;
}
return vout;
}
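For readers who relied on apop_vector_moving_average, which this commit removes along with apop_smoothing.c, the same centered average is easy to reproduce on a plain array. The sketch below is an assumption-laden stand-in rather than a library function: the name moving_average and its caller-supplied output buffer are hypothetical, and unlike the removed code it returns zero outputs when the input is shorter than one window instead of letting the output length wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C counterpart of the removed function. Writes the
   smoothed values to out, which must hold at least n - 2*(bandwidth/2)
   doubles, and returns how many values were written. */
size_t moving_average(const double *in, size_t n, size_t bandwidth, double *out){
    assert(in != NULL && out != NULL);
    size_t halfspan = bandwidth / 2;
    size_t window = 2 * halfspan + 1;   /* always odd, as in the removed code */
    if (n < window) return 0;           /* too short for even one window */

    size_t out_len = n - 2 * halfspan;
    for (size_t i = 0; i < out_len; i++){
        double sum = 0;
        for (size_t j = 0; j < window; j++) sum += in[i + j];
        out[i] = sum / window;
    }
    return out_len;
}
```

As in the removed function, the effective window is 2*(bandwidth/2)+1 elements, so an even bandwidth b averages b+1 points, and a bandwidth of 0 or 1 yields a copy of the input.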
<p><p>
<div class="tiny">Autogenerated by doxygen on $date.</div></body></html>
</body></html>