Skip to content
Commits on Source (6)
paml (4.9h+dfsg-1) unstable; urgency=medium
* New upstream version
* Point Vcs fields to salsa.debian.org
* Standards-Version: 4.1.4
-- Andreas Tille <tille@debian.org> Fri, 27 Apr 2018 07:20:08 +0200
paml (4.9g+dfsg-3) unstable; urgency=medium
* Really enable arch=all build
......
......@@ -6,9 +6,9 @@ Uploaders: Pjotr Prins <pjotr.debian@thebird.nl>,
Section: science
Priority: optional
Build-Depends: debhelper (>= 11~)
Standards-Version: 4.1.3
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/paml.git
Vcs-Git: https://anonscm.debian.org/git/debian-med/paml.git
Standards-Version: 4.1.4
Vcs-Browser: https://salsa.debian.org/med-team/paml
Vcs-Git: https://salsa.debian.org/med-team/paml.git
Homepage: http://abacus.gene.ucl.ac.uk/software/paml.html
Package: paml
......
......@@ -8,6 +8,46 @@ discussion site
https://groups.google.com/forum/#!forum/pamlsoftware.
Version 4.9h, March 2018
(*) mcmctree: gamma-Dirichlet versus conditional i.i.d. priors for
rates for loci. Since 4.9d, the program and the documentation are
inconsistent about the two priors, and which value (0 or 1) means
which prior. I have now checked the program and the documentation to
make sure that they are consistent:
prior = 0: gamma-Dirichlet (dos Reis 2014). This is the default.
prior = 1: conditional i.i.d. prior (Zhu et al. 2015).
I believe these two are similar especially if the number of loci
(partitions) is large, but no serious comparisons between the two
priors have been published.
Thanks to Adnan Moussalli for pointing out the errors.
(*) codeml. It was discovered that the mechanistic amino acid
substitution model implemented in Yang et al. (1998; see table 3),
specified by seqtype = 2 model = 6, has been broken for a long time,
since version 3.0 (2000) at least. Version 2.0 (1999) seems to be
correct. This means that the model become broken soon since it was
published. I have now fixed this.
This model of amino acid substitution starts from a Markov chain for
codons and then aggregate the states and merge the synonymous codons
into one state (the coded amino acid). This is an approximate
formulation since the process after state aggregation is not Markovian
anymore.
I have now added another codon-based amino acid substitution model
that treats amino acids as ambiguities codons. The model is specified
by seqtype = 2 model = 5. This is an exact formulation.
(*) codeml. The number of categories in the BEB calculation under M2
and M8 is unintentionally set to 4 rather than 10. I have changed
this back to 10. The details of this calculation are in Yang et
al. 2005 MBE.
Version 4.9g, December 2017
......@@ -16,7 +56,7 @@ Version 4.9g, December 2017
messages like "strange: f[ 5] = -0.0587063 very small." This bug was
introduced in version 4.9b and affects versions 4.9b-f. A different
bug was introduced in version 4.9f that causes the log likelihood
function under the site model M8 (NSsites = 8) to calculated
function under the site model M8 (NSsites = 8) to be calculated
incorrectly. These are now fixed.
......
codeml results for fitting codon-based amino acid models.
the data are 12 mt proteins from 7 apes, the small dataset used in yang et al. 1998, table 2 & 3.
* seqfile = mtCDNApri.nuc * sequence data filename
seqfile = mtCDNApri.aa * sequence data filename
treefile = mtCDNApri.trees * tree file name
(((human, (chimpanzee, bonobo)), gorilla), (orangutan, Sumatran), gibbon);
********************
CODONML
Fequal (seqtype = 1 model = 0 CodonFreq = 0)
TREE # 1: (((1, (2, 3)), 4), (5, 6), 7); MP score: 4299
lnL(ntime: 11 np: 13): -31744.953377 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..4 8..12 12..5 12..6 8..7
0.120080 0.058139 0.175107 0.098210 0.072451 0.060123 0.206933 0.224103 0.115862 0.114939 0.408898 7.417478 0.112776
SEs for parameters:
0.009345 0.006777 0.009287 0.007425 0.005568 0.005205 0.010416 0.011610 0.008006 0.007893 0.015467 0.310582 0.003409
Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).
tree length = 1.654845
(((1: 0.175107, (2: 0.072451, 3: 0.060123): 0.098210): 0.058139, 4: 0.206933): 0.120080, (5: 0.115862, 6: 0.114939): 0.224103, 7: 0.408898);
(((human: 0.175107, (chimpanzee: 0.072451, bonobo: 0.060123): 0.098210): 0.058139, gorilla: 0.206933): 0.120080, (orangutan: 0.115862, Sumatran: 0.114939): 0.224103, gibbon: 0.408898);
kappa (ts/tv) = 7.41748
omega (dN/dS) = 0.11278
********************
CODONML
F3x4 (seqtype = 1 model = 0 CodonFreq = 2)
TREE # 1: (((1, (2, 3)), 4), (5, 6), 7); MP score: 4299
check convergence..
lnL(ntime: 11 np: 13): -29967.856102 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..4 8..12 12..5 12..6 8..7
0.261697 0.100524 0.242894 0.120349 0.078171 0.066639 0.286667 0.506947 0.158670 0.142655 0.829442 14.249448 0.041084
SEs for parameters:
0.023672 0.013801 0.014988 0.011874 0.006748 0.006435 0.018655 0.032072 0.013309 0.012734 0.043077 0.638686 0.001380
Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).
tree length = 2.794654
(((1: 0.242894, (2: 0.078171, 3: 0.066639): 0.120349): 0.100524, 4: 0.286667): 0.261697, (5: 0.158670, 6: 0.142655): 0.506947, 7: 0.829442);
(((human: 0.242894, (chimpanzee: 0.078171, bonobo: 0.066639): 0.120349): 0.100524, gorilla: 0.286667): 0.261697, (orangutan: 0.158670, Sumatran: 0.142655): 0.506947, gibbon: 0.829442);
kappa (ts/tv) = 14.24945
omega (dN/dS) = 0.04108
******************************************************
AAML JTT
(seqtype = 2 model = 3)
TREE # 1: (((1, (2, 3)), 4), (5, 6), 7); MP score: 895
lnL(ntime: 11 np: 11): -14717.981418 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..4 8..12 12..5 12..6 8..7
0.025579 0.009654 0.022079 0.012328 0.011335 0.010281 0.026980 0.052187 0.026525 0.019651 0.062350
SEs for parameters:
0.003197 0.002018 0.002777 0.002097 0.001932 0.001865 0.003193 0.004412 0.003071 0.002679 0.004757
tree length = 0.278951
(((1: 0.022079, (2: 0.011335, 3: 0.010281): 0.012328): 0.009654, 4: 0.026980): 0.025579, (5: 0.026525, 6: 0.019651): 0.052187, 7: 0.062350);
(((human: 0.022079, (chimpanzee: 0.011335, bonobo: 0.010281): 0.012328): 0.009654, gorilla: 0.026980): 0.025579, (orangutan: 0.026525, Sumatran: 0.019651): 0.052187, gibbon: 0.062350);
*********************
AA-CODONML, Old model FromCodon
(seqtype = 2 model = 6)
TREE # 1: (((1, (2, 3)), 4), (5, 6), 7); MP score: 895
lnL(ntime: 11 np: 12): -14718.224885 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..4 8..12 12..5 12..6 8..7
0.027189 0.010346 0.023115 0.012595 0.011879 0.010512 0.028689 0.056919 0.026419 0.021490 0.070764 9.156815
SEs for parameters:
0.003186 0.002070 0.002831 0.002126 0.001994 0.001832 0.003110 0.004354 0.003100 0.002754 0.004703 0.710971
tree length = 0.299918
(((1: 0.023115, (2: 0.011879, 3: 0.010512): 0.012595): 0.010346, 4: 0.028689): 0.027189, (5: 0.026419, 6: 0.021490): 0.056919, 7: 0.070764);
(((human: 0.023115, (chimpanzee: 0.011879, bonobo: 0.010512): 0.012595): 0.010346, gorilla: 0.028689): 0.027189, (orangutan: 0.026419, Sumatran: 0.021490): 0.056919, gibbon: 0.070764);
kappa (ts/tv) = 9.15682
*********************
AA-CODONML, new model FromCodon0
(seqtype = 2 model = 5)
TREE # 1: (((1, (2, 3)), 4), (5, 6), 7); MP score: -1
lnL(ntime: 11 np: 13): -14707.663779 +0.000000
8..9 9..10 10..1 10..11 11..2 11..3 9..4 8..12 12..5 12..6 8..7
0.636398 0.245832 0.527920 0.290247 0.270174 0.240427 0.658706 1.320827 0.602244 0.493482 1.621211 9.246897 0.031208
SEs for parameters:
0.321763 0.132591 0.271862 0.153691 0.142901 0.127706 0.332545 0.664754 0.316022 0.256229 0.821216 0.716647 0.016303
Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).
tree length = 6.907467
(((1: 0.527920, (2: 0.270174, 3: 0.240427): 0.290247): 0.245832, 4: 0.658706): 0.636398, (5: 0.602244, 6: 0.493482): 1.320827, 7: 1.621211);
(((human: 0.527920, (chimpanzee: 0.270174, bonobo: 0.240427): 0.290247): 0.245832, gorilla: 0.658706): 0.636398, (orangutan: 0.602244, Sumatran: 0.493482): 1.320827, gibbon: 1.621211);
kappa (ts/tv) = 9.24690
omega (dN/dS) = 0.03121
* seqfile = mtCDNApri.nuc * sequence data filename
seqfile = mtCDNApri.aa * sequence data filename
treefile = mtCDNApri.trees * tree file name
outfile = mlc * main result file name
noisy = 3 * 0,1,2,3,9: how much rubbish on the screen
verbose = 1 * 0: concise; 1: detailed, 2: too much
runmode = 0 * 0: user tree; 1: semi-automatic; 2: automatic
* 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise
seqtype = 2 * 1:codons; 2:AAs; 3:codons-->AAs
CodonFreq = 0 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
* 4:F1x4MG, 5:F3x4MG, 6:FMutSel0, 7:FMutSel
* estFreq = 0
* hkyREV = 0 * 0: HKY-like; 1: GTR(REV)-like
* ndata = 1 * number of data sets or loci
* bootstrap = 0 * generate bootstrap data sets
model = 5
* models for codons:
* 0:one, 1:b, 2:2 or more dN/dS ratios for branches, 6:FromCodon
* models for AAs or codon-translated AAs:
* 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
* 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)
NSsites = 0 * 23 24 25 26 * 23 24 25 26 * 0:one w; 1:NearlyNeutral; 2:PositiveSelection; 3:discrete;
* 4:freqs; 5:gamma; 6:2gamma; 7:beta; 8:beta&w+; 9:beta&gamma;
* 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
* 13:3normal>0;
* 22:M2a_Old(M2a_rel);
* 23:Tgamma; 24:Tinvgamma; 25:Tgamma+1; 26:Tinvgamma+1.
clock = 0 * 0:no clock, 1:global clock; 2:local clock
aaDist = 0 * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
aaRatefile = jones.dat * for aa seqs under model = 3 (empirical+F)
* dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own
icode = 1 * 0:universal code; 1:mammalian mt; 2-10:see below
Mgene = 0
* codon: 0:rates, 1:separate; 2:diff pi, 3:diff kappa, 4:all diff
* AA: 0:rates, 1:separate
* NShmm = 0 * 1: hidden Markov model
fix_kappa = 0 * 1: kappa fixed, 0: kappa to be estimated
kappa = 3 * initial or fixed kappa
fix_omega = 0 * 1: omega or omega_1 fixed, 0: estimate
omega = 1.5 * initial or fIf yoixed omega, for codons or codon-based AAs
fix_alpha = 1 * 0: estimate gamma shape parameter; 1: fix it at alpha
alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
Malpha = 0 * different alphas for genes
ncatG = 10 * # of categories in dG of NSsites models
getSE = 1 * 0: don't want them, 1: want S.E.s of estimates
RateAncestor = 1 * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)
Small_Diff = 1e-8
* cleandata = 1 * remove sites with ambiguity data (1:yes, 0:no)?
* fix_blength = 1 * 0: ignore, -1: random, 1: initial, 2: fixed
* method = 0 * Optimization method 0: simultaneous; 1: one branch a time
* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu., 11: Yang's regularized code
* These codes correspond to transl_table 1 to 11 of GenBank.
7 1
(((human, (chimpanzee, bonobo)), gorilla), (orangutan, Sumatran), gibbon);
// end of file
(((human: 0.254974, (chimpanzee: 0.078779, bonobo: 0.067071): 0.120425): 0.104179, gorilla: 0.302845): 0.289306, (orangutan: 0.168115, Sumatran: 0.141455): 0.573030, gibbon: 0.930806); [unrooted tree, M0 F3x4 branch lengths]
......@@ -83,7 +83,7 @@ struct TREEN {
} *nodes, **gnodes, nodes_t[2 * NS - 1];
/* for sptree.nodes[].fossil: lower, upper, bounds, gamma, inverse-gamma */
/* for stree.nodes[].fossil: lower, upper, bounds, gamma, inverse-gamma */
enum { LOWER_F = 1, UPPER_F, BOUND_F } FOSSIL_FLAGS;
char *fossils[] = { " ", "L", "U", "B" };
......@@ -95,7 +95,7 @@ struct SPECIESTREE {
double age, pfossil[7]; /* lower and upper bounds or alpha & beta */
double *lnrates; /* log rates for loci */
} nodes[2 * NS - 1];
} sptree;
} stree;
/* all trees are binary & rooted, with ancestors unknown. */
struct DATA { /* locus-specific data and tree information */
......@@ -203,7 +203,7 @@ int main(int argc, char *argv[])
error2("oom blengths0");
}
SetMapAmbiguity();
SetMapAmbiguity(com.seqtype, 0);
/* AllPatterns(fout); */
......
This diff is collapsed.
This diff is collapsed.
......@@ -4,7 +4,6 @@
#if (!defined PAML_H)
#define PAML_H
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
......@@ -30,7 +29,7 @@
int ReadSeq (FILE *fout, FILE *fseq, int cleandata, int locus);
int ScanFastaFile(FILE *f, int *ns, int *ls, int *aligned);
void EncodeSeqs (void);
void SetMapAmbiguity(void);
void SetMapAmbiguity(int seqtype, int ModelAA2Codon);
void ReadPatternFreq (FILE* fout, char* fpattf);
int Initialize (FILE *fout);
int MoveCodonSeq (int ns, int ls, char *z[]);
......@@ -242,6 +241,8 @@ int correl (double x[], double y[], int n, double *mx, double *my,
int comparefloat (const void *a, const void *b);
int comparedouble (const void *a, const void *b);
double Eff_IntegratedCorrelationTime (double x[], int n, double *mx, double *varx, double *rho1);
double Eff_IntegratedCorrelationTime2(double x[], int n, int nbatch, double *mx, double *varx);
int HPDinterval(double x[], int n, double HPD[2], double alpha);
int DescriptiveStatistics(FILE *fout, char infile[], int nbin, int propternary, int SkipColumns);
int DescriptiveStatisticsSimple (FILE *fout, char infile[], int SkipColumns);
......@@ -397,6 +398,6 @@ enum {PrBranch=1, PrNodeNum=2, PrLabel=4, PrNodeStr=8, PrAge=16, PrOmega=32} Out
#define FullSeqNames 0 /* 1: numbers at the beginning of sequence name are part of name */
#define pamlVerStr "paml version 4.9f, October 2017"
#define pamlVerStr "paml version 4.9h, March 2018"
#endif
......@@ -84,7 +84,7 @@ int main (int argc, char *argv[])
if ((fout=fopen (com.outf, "w"))==NULL) error2("outfile creation err.");
if((fseq=fopen (com.seqf,"r"))==NULL) error2("No sequence file!");
ReadSeq (NULL, fseq, com.cleandata, 0);
SetMapAmbiguity();
SetMapAmbiguity(com.seqtype, 0);
i=(com.ns*2-1)*sizeof(struct TREEN);
if((nodes=(struct TREEN*)malloc(i))==NULL) error2("oom");
......
......@@ -14,8 +14,8 @@
char BASEs[] = "TCAGUYRMKSWHBVD-N?";
char *EquateBASE[] = { "T","C","A","G", "T", "TC","AG","CA","TG","CG","TA",
"TCA","TCG","CAG","TAG", "TCAG","TCAG","TCAG" };
char CODONs[256][4], AAs[] = "ARNDCQEGHILKMFPSTWYV-*?X";
char CODONs[256][4];
char AAs[] = "ARNDCQEGHILKMFPSTWYV-*?X";
char nChara[256], CharaMap[256][64];
char AA3Str[] = { "AlaArgAsnAspCysGlnGluGlyHisIleLeuLysMetPheProSerThrTrpTyrVal***" };
char BINs[] = "TC";
......@@ -1347,7 +1347,7 @@ void bigexp(double lnx, double *a, double *b)
*a = pow(10, z - (*b));
}
unsigned int z_rndu = 1237, w_rndu = 1237;
unsigned int z_rndu = 666, w_rndu = 1237;
void SetSeed(int seed, int PrintSeed)
{
......@@ -5571,7 +5571,7 @@ double Eff_IntegratedCorrelationTime(double x[], int n, double *mx, double *varx
Note that this destroys x[].
*/
double Tint = 1, rho0 = 0, rho, m = 0, s = 0;
int i, ir;
int i, ir, minNr = 10;
/* if(n<1000) puts("chain too short for calculating Eff? "); */
for (i = 0; i < n; i++) m += x[i];
......@@ -5585,12 +5585,12 @@ double Eff_IntegratedCorrelationTime(double x[], int n, double *mx, double *varx
if (s / (fabs(m) + 1) < 1E-9)
Tint = n;
else {
for (ir = 1; ir < n - 10; ir++) {
for (ir = 1; ir < n - minNr; ir++) {
for (i = 0, rho = 0; i < n - ir; i++)
rho += x[i] * x[i + ir];
rho /= (n - 1.0);
if (ir == 1) *rho1 = rho;
if (ir > 10 && rho + rho0 < 0) break;
if (ir > minNr && rho + rho0 < 0) break;
Tint += rho * 2;
rho0 = rho;
}
......@@ -5599,26 +5599,36 @@ double Eff_IntegratedCorrelationTime(double x[], int n, double *mx, double *varx
}
double Eff_IntegratedCorrelationTime2(double x[], int n, int nbatch, double mx)
double Eff_IntegratedCorrelationTime2 (double x[], int n, int nbatch, double *mx, double *varx)
{
/* This calculates Eff, by using batch means. Tau, the integrated correlation time, is 1/Eff.
mx is input.
This is found to be unreliable.
*/
int lb, i, j;
double mxb, vx = 0, E = 0;
int lenbatch, i, j;
double mxb, vx = 0, E = 0, m = 0, s = 0;
/* if(n<1000) puts("chain too short for calculating Eff? "); */
/* normalize the x vector for numerical stability */
for (i = 0; i < n; i++) m += x[i];
m /= n;
for (i = 0; i < n; i++) x[i] -= m;
for (i = 0; i < n; i++) s += x[i] * x[i];
s = sqrt(s / (n - 1.0));
for (i = 0; i < n; i++) x[i] /= s;
*mx = m; *varx = s*s;
if (n < 1000) puts("chain too short for calculating Eff? ");
lb = n / nbatch;
lenbatch = n / nbatch;
for (i = 0; i < nbatch; i++) {
for (j = 0, mxb = 0; j < lb; j++) {
mxb += x[i*lb + j];
vx += square(x[i*lb + j] - mx);
for (j = 0, mxb = 0; j < lenbatch; j++) {
mxb += x[i*lenbatch + j];
vx += square(x[i*lenbatch + j]);
}
mxb /= lb;
E += square(mxb - mx) / nbatch;
mxb /= lenbatch;
E += mxb*mxb / nbatch;
}
E = (vx / n) / (E*lb);
E = (vx / n) / (E*lenbatch);
return(E);
}
......
This diff is collapsed.
......@@ -110,7 +110,7 @@ int main(int argc, char *argv[])
}
ReadSeq((com.verbose?fout:NULL), fseq, com.cleandata, 0);
SetMapAmbiguity();
SetMapAmbiguity(com.seqtype, 0);
sspace = max2(200000,64*com.ns*sizeof(double));
sspace = max2(sspace,64*64*5*sizeof(double));
......