Skip to content
Commits on Source (3)
bali-phy (3.1.4+dfsg-1) UNRELEASED; urgency=medium
* New upstream version 3.1.4+dfsg
-- Benjamin Redelings <benjamin.redelings@gmail.com> Mon, 14 May 2018 08:40:55 -0400
bali-phy (3.1.2+dfsg-1) unstable; urgency=medium
[ Dylan Aïssi ]
......
all: README.html README.xhtml README.pdf Tutorial.html Tutorial.xhtml Tutorial.pdf Tutorial2.html Tutorial2.xhtml Tutorial2.pdf Tutorial3.html Tutorial3.xhtml Tutorial3.pdf
all: README.html README.xhtml README.pdf Tutorial.html Tutorial.xhtml Tutorial.pdf Tutorial2.html Tutorial2.xhtml Tutorial2.pdf Tutorial3.html Tutorial3.xhtml Tutorial3.pdf Tutorial4.html Tutorial4.xhtml Tutorial4.pdf
clean:
-@rm -f *.html *.xhtml REAMDE.xml *.fo *.pdf *~
......
This diff is collapsed.
......@@ -979,52 +979,41 @@ scale = 1,3:</programlisting>
<section><info><title>Summarizing the output</title></info>
<para>This section is primarily oriented to extracting estimates from output files. See <xref linkend="mixing_and_convergence"/> for methods of determine effective sample sizes, and for checking mixing and convergence.</para>
<para>This section is primarily about extracting estimates from output files. See <xref linkend="mixing_and_convergence"/> for methods of determine effective sample sizes, and for checking mixing and convergence.</para>
<section><info><title>Finding the consensus tree (<filename>C1.trees</filename>)</title></info>
<section><info><title>Finding the majority consensus tree</title></info>
<para>
To compute the majority consensus tree, do the following. (The
program <link xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:href="http://tree.bio.ed.ac.uk/software/figtree/">FigTree</link>
allows you to view the resulting tree file graphically.)
<screen><prompt>%</prompt> trees-consensus C1.trees &gt; <filename>c50.PP.tree</filename></screen>
You can (and should) pool results from different MCMC runs by adding
multiple tree sample files on the command line. The different MCMC
runs should have the same input files and parameters.
<screen><prompt>%</prompt> trees-consensus <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees &gt; <filename>c50.PP.tree</filename></screen>
By default, the first 10% of tree samples are skipped as burn-in. You
can specify the number of samples (e.g. 1000) to skip by adding the
options <userinput>-s1000</userinput> or
<userinput>--skip=1000</userinput>. You may also specify a percentage
of all samples:
<screen><prompt>%</prompt> trees-consensus -s20% C1.trees &gt; <filename>c50.PP.tree</filename></screen>
To discard some samples, keeping (say) every 10th sample, you may add
the options <userinput>-x10</userinput> or
<userinput>--subsample=10</userinput>. This can make the program a
lot faster, at the possible expense of some loss in accuracy.
<screen><prompt>%</prompt> trees-consensus -s20% -x10 C1.trees &gt; <filename>c50.PP.tree</filename></screen>
</para>
<para>By default, the first 10% of tree samples are skipped as burn-in (<userinput>--skip=10%</userinput> or <userinput>-s 10%</userinput>) and every generation is analyzed (<userinput>--subsample=1</userinput> or <userinput>-x 1</userinput>). To discard the first 1000 tree samples and analyze every 10th sample:
<screen><prompt>%</prompt> trees-consensus -s 1000 -x 10 <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees &gt; <filename>c50.PP.tree</filename></screen>
By default, splits are included in the consensus tree if they have a
PP greater than 0.5. You can specify a more stringent level
(e.g. 0.66) by adding the option
<userinput>--consensus-PP=0.66</userinput> as follows:
<screen><prompt>%</prompt> trees-consensus -s20% -x10 --consensus-PP=0.66 C1.trees &gt; <filename>c66.PP.tree</filename></screen>
<screen><prompt>%</prompt> trees-consensus -s20% -x10 --consensus-PP=0.66 <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees &gt; <filename>c66.PP.tree</filename></screen>
You may also make the program write directly to the output file
(e.g. <filename>c66.PP.tree</filename>) by using the more general form
<userinput>--consensus-PP=0.66:c66.PP.tree</userinput>. Leaving off
the "<userinput>:c66.PP.tree</userinput>" part (as we did above) or specifying
"<userinput>:-</userinput>" sends the output to the standard output
(e.g. the terminal, if not redirected).
<screen><prompt>%</prompt> trees-consensus -s20% -x10 C1.trees --consensus-PP=0.66:<filename>c66.PP.tree</filename></screen>
<screen><prompt>%</prompt> trees-consensus -s20% -x10 <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees --consensus-PP=0.66:<filename>c66.PP.tree</filename></screen>
You can supply multiple levels and filenames separated by commas.
This is faster than running the program multiple times with different
consensus levels.
<screen><prompt>%</prompt> trees-consensus -s20% -x10 C1.trees --consensus-PP=0.5:<filename>c50.PP.tree</filename>,0.66:<filename>c66.PP.tree</filename></screen>
<screen><prompt>%</prompt> trees-consensus -s20% -x10 <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees --consensus-PP=0.5:<filename>c50.PP.tree</filename>,0.66:<filename>c66.PP.tree</filename></screen>
Finally, you may use the option <userinput>--consensus=</userinput>
instead of the option <userinput>--consensus-PP=</userinput> if you do
not wish the resulting tree to contain embedded posterior
probabilities on branches, as well as branch lengths.
<screen><prompt>%</prompt> trees-consensus -s20% -x10 C1.trees --consensus=0.5:<filename>c50.PP.tree</filename>,0.66:<filename>c66.PP.tree</filename></screen>
<screen><prompt>%</prompt> trees-consensus -s20% -x10 <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees --consensus=0.5:<filename>c50.PP.tree</filename>,0.66:<filename>c66.PP.tree</filename></screen>
Both the <userinput>--consensus=</userinput> and
<userinput>--consensus-PP=</userinput> options may be given simultaneously.
</para>
......@@ -1032,20 +1021,30 @@ Both the <userinput>--consensus=</userinput> and
<para>
See <userinput>trees-consensus --help</userinput> for a complete list of options.
</para>
</section>
<section><info><title>Finding the M.A.P. topology (<filename>C1.trees</filename>)</title></info>
<section><info><title>Finding the greedy consensus tree</title></info>
<para>
To compute the <emphasis>maximum a posteriori</emphasis> tree topology do:
<screen><prompt>%</prompt> trees-consensus --skip=<replaceable>burnin</replaceable> C1.trees --map-tree=<filename>MAP.tree</filename></screen>
The MAP topology may be used instead of a consensus tree when a fully resolved (e.g. bifurcating) tree is required. However, when the topology has many tips, each topology may be sampled only once, leading to low quality estimates of the MAP topology.
The greedy consensus tree may be used instead of a majority-consensus tree when a fully resolved (e.g. bifurcating) tree is required. When the topology has many tips and each topology may be sampled only once, the greedy consensus should be higher quality than the estimate of the MAP topology. To obtained a fully resolved tree, the greedy consensus strategy starts with the majority consensus and then adds the highest-supported split that does not conflict.</para>
<para>To compute the <emphasis>greedy consensus</emphasis> tree do:
<screen><prompt>%</prompt> trees-consensus --skip=<replaceable>burnin</replaceable> <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees --greedy-consensus=<filename>greedy.tree</filename></screen>
</para>
<para>The program <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://tree.bio.ed.ac.uk/software/figtree/">FigTree</link> allows you to view the consensus tree graphically.
</section>
<section><info><title>Finding the M.A.P. tree</title></info>
<para>
To compute the <emphasis>maximum a posteriori</emphasis> tree do:
<screen><prompt>%</prompt> trees-consensus --skip=<replaceable>burnin</replaceable> <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees --map-tree=<filename>MAP.tree</filename></screen>
When the tree has many tips, each topology may be sampled only once, leading to low quality estimates of the MAP topology. As a result, when you need a bifurcating tree you should probably use the greedy consensus instead.
</para>
</section>
<section><info><title>Checking topology convergence (<filename>C1.trees</filename>)</title></info>
<section><info><title>Checking topology convergence</title></info>
<para>
<screen><prompt>%</prompt> trees-bootstrap <replaceable>dir-1</replaceable>/C1.trees <replaceable>dir-2</replaceable>/C1.trees</screen>
......@@ -1055,20 +1054,18 @@ This command computes the effective sample size for the posterior probability of
</para>
</section>
<section><info><title>Summarizing numerical parameters (<filename>C1.log</filename>)</title></info>
<section><info><title>Summarizing numerical parameters</title></info>
<para>
This command gives a median and confidence interval, ESS, and a stabilization time:
<screen><prompt>%</prompt> statreport C1.log &gt; Report </screen>
This command compares multiple runs to give PSRF and joint ESS values as well:
<screen><prompt>%</prompt> statreport <replaceable>dir-1</replaceable>/C1.log <replaceable>dir-2</replaceable>/C1.log &gt; Report </screen>
The program <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://tree.bio.ed.ac.uk/software/tracer/">Tracer</link> allows you to view the same summaries graphically.</para>
When multiple runs are analyzed, this command gives PSRF and joint ESS values. The program <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://tree.bio.ed.ac.uk/software/tracer/">Tracer</link> allows you to view the same summaries graphically.</para>
<para>See <xref linkend="mixing_and_convergence"/> for more information.
</para>
</section>
<section><info><title>Computing an alignment using Posterior Decoding (<filename>C1.P<replaceable>p</replaceable>.fastas</filename>)</title></info>
<section><info><title>Computing an alignment using Posterior Decoding</title></info>
<para>
<screen><prompt>%</prompt> cut-range --skip=<replaceable>burn-in</replaceable> &lt; C1.P<replaceable>p</replaceable>.fastas | alignment-max &gt; P<replaceable>p</replaceable>-max.fasta</screen>
......@@ -1076,18 +1073,19 @@ You can use the program <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:h
</para>
</section>
<section><info><title>Find the alignment from the maximum a posterior (MAP) point (<filename>C1.MAP</filename>)</title></info>
<!-- section><info><title>Find the alignment from the maximum a posterior (MAP) point (<filename>C1.MAP</filename>)</title></info>
<screen><prompt>%</prompt> alignment-find &lt; C1.MAP &gt; P1-MAP.fasta</screen>
</section>
This only works correctly on single-partition analyses.
</section -->
<section><info><title>Create an Au (Alignment Uncertainty) plot (<filename>C1.P<replaceable>p</replaceable>.fastas</filename>)</title></info>
<section><info><title>Create an Au (Alignment Uncertainty) plot</title></info>
<para>To annotate a specific alignment <replaceable>alignment</replaceable>.fasta, choose a fully resolved tree estimate <replaceable>tree</replaceable>:
<screen><prompt>%</prompt> cut-range --skip=<replaceable>burn-in</replaceable> &lt; C1.P<replaceable>p</replaceable>.fastas | alignment-gild <replaceable>alignment</replaceable>.fasta <replaceable>tree</replaceable> &gt; <replaceable>alignment</replaceable>-AU.prob
<prompt>%</prompt> alignment-draw <replaceable>alignment</replaceable>.fasta --AU <replaceable>alignment</replaceable>-AU.prob &gt; <replaceable>alignment</replaceable>-AU.html</screen>
The majority consensus tree is usually not fully resolved, so we recommend using the MAP topology instead.
The majority consensus tree is usually not fully resolved, so we recommend using the greedy consensus instead.
</para>
</section>
......@@ -1160,21 +1158,14 @@ The majority consensus tree is usually not fully resolved, so we recommend using
</para>
<variablelist>
<varlistentry><term>MAP.fasta</term><listitem>
<para>An estimate of the MAP alignment.
</para></listitem></varlistentry>
<varlistentry><term>P<replaceable>p</replaceable>-max.fasta</term><listitem>
<para>An estimate of the alignment for partition
<replaceable>p</replaceable> using maximum posterior decoding.</para>
</listitem></varlistentry>
<varlistentry><term>MAP-AU.html</term><listitem><para>An AU plot of the MAP alignment (AA/DNA color-cheme).
</para></listitem></varlistentry>
<varlistentry><term>P<replaceable>p</replaceable>-max-AU.html</term><listitem>
<para>An AU plot of the maximum posterior decoding alignment for partition
<replaceable>p</replaceable> (AA/DNA color-cheme).</para>
<replaceable>p</replaceable> (AA/DNA color-scheme).</para>
</listitem></varlistentry>
<!-- varlistentry><term>consensus.fasta</term><listitem>
......@@ -1385,8 +1376,7 @@ add[1.0,2.0] // arguments are a=Double, so result is of type Double</programl
</para>
<para>The basic CTMC models are equ, hky85, tn93, gtr, hky85_sym+x3,
tn93_sym+x3, gtr_sym+x3, jtt, wag, lg08, and yn94. Each of these models is a
way of specifying the exchangeability matrix $S_{ij}$.</para>
tn93_sym+x3, gtr_sym+x3, jtt, wag, lg08, and yn94.</para>
<table frame="none" rowsep="1"><info><title>Substitution Models</title></info>
......@@ -1540,7 +1530,7 @@ add[1.0,2.0] // arguments are a=Double, so result is of type Double</programl
<tbody>
<row>
<entry>
<para><userinput>F</userinput></para>
<para><userinput>f</userinput></para>
<para>Whelan and Goldman (2001)</para>
</entry>
<entry>any</entry>
......@@ -1971,7 +1961,7 @@ The NHX attribute must be applied to the branch, not the node. Therefore it mus
</para>
<para>The posterior probability of positive selection is the posterior mean of the posSelection parameter. This may be computed using the statreport program with the <userinput>--mean</userinput> option.
</para>
<para>In case this probability is extremely close to 1 or 0, you may wish to add the option <userinput>--Rao-Blackwellize BranchSiteTest:posSelection</userinput>. This will report the log-probability of positive selection each iteration. The user may exponentiate the reported values and then average them (using R, for example) in order to compute a more accurate estimate of the posterior probability of positive selection.
<para>In case this probability is extremely close to 1 or 0, you may wish to add the option <userinput>--Rao-Blackwellize branch-site:posSelection</userinput>. This will report the log-probability of positive selection each iteration. The user may exponentiate the reported values and then average them (using R, for example) in order to compute a more accurate estimate of the posterior probability of positive selection.
</para>
</section>
<section><info><title>Substitution model examples</title></info>
......@@ -2027,12 +2017,12 @@ See section <xref linkend="functions"/> for the general syntax. rs07 is the de
<para>Redelings and Suchard (2005)</para>
</entry>
<entry>
<para>$\delta$: the gap-opening probability</para>
<para>$\epsilon$: the gap-extension probability</para>
<para><userinput>logDelta</userinput>: the gap-opening probability</para>
<para><userinput>mean_length</userinput>: the gap-extension probability</para>
</entry>
<entry>
<para>Gap lengths are geometrically distributed with extension probability $\epsilon$.</para>
<para>This indel model is independent of the branch length connecting the ancestor and descent sequences.</para>
<para>Gap lengths are geometrically distributed.</para>
<para>Longer branches <emphasis>do not</emphasis> have more indels.</para>
</entry>
</row>
......@@ -2042,12 +2032,12 @@ See section <xref linkend="functions"/> for the general syntax. rs07 is the de
<para>Redelings and Suchard (2007)</para>
</entry>
<entry>
<para>$\lambda$: the insertion and deletion rate</para>
<para>$\epsilon$: the gap-extension probability</para>
<para><userinput>log_rate</userinput>: The log of the insertion and deletion rate</para>
<para><userinput>mean_length</userinput>: the mean of the indel length</para>
</entry>
<entry>
<para>Gap lengths are geometrically distributed with extension probability $\epsilon$.</para>
<para>The probability of an indel event depends on the branch length in this model.</para>
<para>Gap lengths are geometrically distributed.</para>
<para>Longer branches <emphasis>do</emphasis> have more indels.</para>
</entry>
</row>
......@@ -2083,7 +2073,7 @@ See section <xref linkend="functions"/> for the general syntax. rs07 is the de
diagnose when more samples must be taken.
</para>
<para>Some of the better diagnostics for lack of convergence rely on running at least 4 independent copies of the Markov chain (preferably 10) from different random starting points to see if the sampled posterior distributions for each chain are the same. Unfortunately, when the distributions all seem to be this same, this doesn't <emphasis>prove</emphasis> that they have all converged to the equilibrium distribution. However, if the distributions are different then you can reject either convergence or good mixing.</para>
<para>Some of the better diagnostics for lack of convergence rely on running at least 2 independent copies of the Markov chain (preferably 4-10) from different random starting points to see if the sampled posterior distributions for each chain are the same. Unfortunately, when the distributions all seem to be this same, this doesn't <emphasis>prove</emphasis> that they have all converged to the equilibrium distribution. However, if the distributions are different then you can reject either convergence or good mixing.</para>
<section><info><title>Definition of Convergence</title></info>
......@@ -2650,7 +2640,7 @@ your <command>bali-phy</command> runs on that computer.
<qandaentry>
<question><para>Why does <command>bp-analyze</command> say "Program 'gnuplot' not found. Trace plots will not be generated"?</para></question>
<answer>
<para>This is because you have not installed <application>gnuplot</application>. This is not a fatal error message, it just means that pictures of traces, partition support, and SRQ plots will not be generated automatically. You can still view MCMC traces with <application>Tracer</application>.</para>
<para>This is because you have not installed <application>gnuplot</application>. This is not a fatal error message, it just means that pictures of partition support, and SRQ plots will not be generated automatically.</para>
</answer>
</qandaentry>
......
No preview for this file type
This diff is collapsed.
......@@ -5,7 +5,12 @@ use strict;
my $screen = "";
while(my $line = <>)
{
if ($line =~ s|^\% (.*)$|<prompt>\%</prompt> <userinput>$1</userinput>|)
if ($line =~ s|^\% (.*)//(.*)$|<prompt>\%</prompt> <userinput>$1</userinput>//$2|)
{
$screen = $screen . $line;
next;
}
elsif ($line =~ s|^\% (.*)$|<prompt>\%</prompt> <userinput>$1</userinput>|)
{
$screen = $screen . $line;
next;
......
1. Convert orig.nex -> orig.fasta
I used seaview
2. Remove a sequence from orig.fasta -> cleaned.fasta
// I did this
Remove sequence with seaview -> cleaned.fasta
sed -i 's/\?/N/g' cleaned.fasta
// FIXME! Removes empty columns
alignment-thin --remove Cystopters_fragilis_5950_c8_592 orig.fasta | sed 's/\?/N/g' > cleaned.fasta
// FIXME! alignment-info also empty columns
3. Select exon/intron pieces from alignment with gaps used to separate them.
alignment-cat -c 5-8 -e cleaned.fasta > exon1.fasta
alignment-cat -c 9-453 -e cleaned.fasta > intron1.fasta
alignment-cat -c 454-596 -e cleaned.fasta > exon2.fasta
alignment-cat -c 597-864 -e cleaned.fasta > intron2.fasta
alignment-cat -c 865-948 -e cleaned.fasta > exon3.fasta
alignment-cat -c 949-1328 -e cleaned.fasta > intron3.fasta
alignment-cat -c 1329-1393 -e cleaned.fasta > exon4.fasta
4. Put the pieces back together
alignment-cat exon1.fasta intron1.fasta exon2.fasta intron2.fasta exon3.fasta intron3.fasta exon4.fasta > combined.fasta
5. Get the coding sequence
alignment-cat exon{1,2,3,4}.fasta -c2-295 > coding.fasta
sed -i 's/-/N/g' coding.fasta
5. Alignment with mafft and muscle
mafft combined.fasta > combined-mafft.fasta
muscle -in combined.fasta -out combined-muscle.fasta
6. Compute alignment diff
alignments-diff combined-{mafft,muscle}.fasta > mafft.AU
alignments-diff combined-{muscle,mafft}.fasta > muscle.AU
alignment-draw combined-mafft.fasta --AU mafft.AU --scale=invert --show-ruler --color-scheme=Rainbow+fade[1,0]+contrast > mafft.html
alignment-draw combined-muscle.fasta --AU muscle.AU --scale=invert --show-ruler --color-scheme=Rainbow+fade[1,0]+contrast > muscle.html
7. Compute dual diff
alignments-diff combined-mafft.fasta combined-muscle.fasta --merge --differences-file=mafft-muscle.AU --fill=unknown > mafft-muscle.fasta
alignment-draw mafft-muscle.fasta --AU mafft-muscle.AU --scale=invert --color-scheme=Rainbow+fade[1,0]+contrast > mafft-muscle.html
8. run bali-phy
config.txt:
align = exon1.fasta
align = intron1.fasta
align = exon2.fasta
align = intron2.fasta
align = exon3.fasta
align = intron3.fasta
align = exon4.fasta
smodel = 1,3,5,7:gtr+Rates.free[n=4]
imodel = 1,3,5,7:none
smodel = 2,4,6:gtr+Rates.free[n=4]
imodel = 2,4,6:rs07
9. run bali-phy with codon model on exons
This requires some work, since exon boundaries don't necessarily
fall between codons. However, since we are already fixing the alignment,
we can concatenate the exons to get a sequence without stop codons.
alignment-translate < coding.fasta
We can then use a codon model on the combined exons:
config2.txt
align = intron1.fasta
align = intron2.fasta
align = intron3.fasta
align = coding.fasta
smodel = 1,2,3:gtr+Rates.free[n=4]
imodel = 1,2,3:rs07
smodel = 4:yn94
#smodel = 4:fMutSel0
#smodel = 4:m3[gtr_sym,f3x4]
imodel = 4:none
This diff is collapsed.
align = exon1.fasta
align = intron1.fasta
align = exon2.fasta
align = intron2.fasta
align = exon3.fasta
align = intron3.fasta
align = exon4.fasta
smodel = 1,3,5,7:gtr+Rates.free[n=4]
imodel = 1,3,5,7:none
scale = 1,3,5,7:
imodel = 2,4,6:rs07
smodel = 2,4,6:gtr+Rates.free[n=4]
scale = 2,4,6:
align = intron1.fasta
align = intron2.fasta
align = intron3.fasta
align = coding.fasta
smodel = 1,2,3:gtr+Rates.free[n=4]
imodel = 1,2,3:rs07
#smodel = 4:yn94
#smodel = 4:fMutSel0
smodel = 4:m3[gtr_sym,f3x4]
imodel = 4:none
This diff is collapsed.
This diff is collapsed.
......@@ -13,7 +13,7 @@
{
"arg_name": "alpha",
"arg_type": "Double",
"default_value": "~log_laplace[-6,2]",
"default_value": "~log_laplace[6,2]",
"description": "The shape parameter for the Gamma distribution"
},
{
......
......@@ -2,7 +2,7 @@
"name": "rs05",
"synonyms": ["RS05"],
"result_type": "IndelModel",
"call": "IModel.rs05[logDelta,meanIndelLengthMinus1,tau,tree]",
"call": "IModel.rs05[logDelta,mean_length,tau,tree]",
"title": "Redelings & Suchard (2005) model of insertions and deletions",
"citation":{"type": "article",
"title": "Joint Estimation of Alignment and Phylogeny.",
......@@ -18,9 +18,9 @@
"default_value": "~Laplace[-4,0.707]"
},
{
"arg_name": "meanIndelLengthMinus1",
"arg_name": "mean_length",
"arg_type": "Double",
"default_value": "~Exponential[10]"
"default_value": "~Exponential[10,1]"
},
{
"arg_name": "tau",
......
project('bali-phy', ['cpp','c'],
version: '3.1.1',
version: '3.1.4',
default_options : [
'buildtype=release',
'cpp_std=c++14'
......
......@@ -597,8 +597,8 @@ sub print_model_section
$section .= "<h3 style=\"clear:both\"><a name=\"tree\">Tree (+priors)</a></h3>\n";
# $section .= "<table class=\"backlit2\">\n";
$section .= "<table>\n";
$section .= "<tr><td>topology</td><td>$topology_prior</td></tr>" if (defined($topology_prior));
$section .= "<tr><td>branch lengths</td><td>$branch_prior</td></tr>" if (defined($topology_prior));
$section .= "<tr><td class=\"modelname\">topology</td><td>$topology_prior</td></tr>" if (defined($topology_prior));
$section .= "<tr><td class=\"modelname\">branch lengths</td><td>$branch_prior</td></tr>" if (defined($branch_prior));
$section .= "</table>\n";
$section .= "<h3 style=\"clear:both\"><a name=\"smodel\">Substitution model (+priors) </a></h3>\n";
......@@ -624,7 +624,7 @@ sub section_analysis
#$section .= '<table class="backlit2 center" style="width:100%">'."\n";
$section .= '<table class="backlit2 center">'."\n";
$section .= "<tr><th>chain #</th><th>burnin</th><th>subsample</th><th>Iterations (remaining)</th>";
$section .= "<tr><th>chain #</th><th>burnin</th><th>subsample</th><th>Iterations (after burnin)</th>";
$section .= "<th>command line</th>" if ($commands_differ);
$section .= "<th>subdirectory</th>";
$section .= "<th>directory</th>" if ($directories_differ);
......@@ -1613,7 +1613,7 @@ sub draw_alignments
}
if (! more_recent_than("Results/$alignment-diff.html","Results/$alignment-diff.AU")) {
exec_show("alignment-draw Results/$alignment.fasta --scale=invert --AU Results/$alignment-diff.AU --show-ruler --color-scheme=Rainbow+fade[1,0]+contrast > Results/$alignment-diff.html");
exec_show("alignment-draw Results/$alignment.fasta --scale=identity --AU Results/$alignment-diff.AU --show-ruler --color-scheme=diff[3]+contrast > Results/$alignment-diff.html");
}
}
print "done.\n";
......
......@@ -157,6 +157,56 @@ public:
{ }
};
/// ColorMap which represents uncertainty in terms of Diff colors
struct Diff_ColorMap: public ColorMap
{
int n_blocks;
double start;
double end;
bool gaps_different;
public:
Diff_ColorMap* clone() const {return new Diff_ColorMap(*this);}
RGB bg_color(double x,const string& s) const
{
// color gaps differently? (grey-scale?)
if (gaps_different and ((s == "-") or (s == "---")))
{
double v_uncertain = 1.0;
double v_certain = 0.6;
double value = v_uncertain + 0*(v_certain - v_uncertain);
return HSV(0,0,value).to_RGB();
}
else
{
if (x == 0) return white;
x = (x-1)/(n_blocks-1);
// otherwise use the rainbow
double hue = start + x*(end - start);
double sat = 1.0 - x*0.5;
return HSV(hue,sat,0.95).to_RGB();
}
}
RGB fg_color(double,const string&) const {
return black;
}
Diff_ColorMap(int i,bool g=true)
:n_blocks(i),start(0.15),end(0.55),gaps_different(g)
{ }
Diff_ColorMap(int i,double s,double e,bool g=true)
:n_blocks(i),start(s),end(e),gaps_different(g)
{ }
};
RGB AA_color(char aa) {
if (strchr("GPST",aa))
return orange;
......@@ -301,21 +351,21 @@ public:
};
class ColorScheme: public Object {
Scale scale;
std::function<double(double)> transform;
owned_ptr<ColorMap> color_map;
public:
virtual ColorScheme* clone() const {return new ColorScheme(*this);}
RGB bg_color(double x,const string& s) const {
return color_map->bg_color(scale(x),s);
return color_map->bg_color(transform(x),s);
}
RGB fg_color(double x,const string& s) const {
return color_map->fg_color(scale(x),s);
return color_map->fg_color(transform(x),s);
}
ColorScheme(const ColorMap& CM,const Scale& S)
:scale(S),color_map(CM)
ColorScheme(const ColorMap& CM,const std::function<double(double)>& S)
:transform(S),color_map(CM)
{ }
};
......@@ -444,16 +494,13 @@ bool match2(vector<string>& sstack,const string& s) {
}
Scale get_scale(const variables_map& args) {
Scale scale;
std::function<double(double)> get_scale(const variables_map& args)
{
string scale_name = args["scale"].as<string>();
if (scale_name == "identity") {
scale.f = identity;
scale.min = 0;
scale.max = 1;
if (args.count("min")) scale.min = args["min"].as<double>();
if (args.count("max")) scale.max = args["max"].as<double>();
}
if (scale_name == "identity")
return [](double x){return x;};
Scale scale;
if (scale_name == "invert") {
scale.f = invert;
scale.min = 0;
......@@ -499,6 +546,17 @@ owned_ptr<ColorMap> get_base_color_map(vector<string>& string_stack,bool gaps_di
color_map = claim(new Rainbow_ColorMap(0.7,1,gaps_different));
else if (match(string_stack,"BlueRed",arg))
color_map = claim(new Rainbow_ColorMap(1,0.7,gaps_different));
else if (match(string_stack,"diff",arg))
{
color_map = claim(new Plain_ColorMap);
vector<double> v = split<double>(arg,',');
if (v.size() == 1)
color_map = claim(new Diff_ColorMap((int)v[0], gaps_different));
else if (v.size() == 3)
color_map = claim(new Diff_ColorMap((int)v[0], v[1], v[2], gaps_different));
else
throw myexception()<<"diff: argument should be for the form [int] or [int,double,double]";
}
else if (match(string_stack,"Rainbow",arg))
{
double start = 0.666;
......@@ -577,9 +635,9 @@ owned_ptr<ColorScheme> get_color_scheme(const variables_map& args)
owned_ptr<ColorMap> color_map = get_color_map(args,gaps_different);
assert(color_map);
Scale scale = get_scale(args);
auto transform = get_scale(args);
return ColorScheme(*color_map,scale);
return ColorScheme(*color_map,transform);
}
variables_map parse_cmd_line(int argc,char* argv[])
......
......@@ -88,6 +88,8 @@ variables_map parse_cmd_line(int argc,char* argv[])
("dual","Write out the two aligned alignments separately")
("fill",value<string>()->default_value("gap"),"blank columns filled with: gap or unknown")
("differences-file,d",value<string>(),"Filename to store differences in AU format")
("blocksize",value<int>()->default_value(20),"Width of blocks of same color")
("classes",value<int>()->default_value(3),"Number of groups for non-matching characters")
;
options_description all("All options");
......@@ -328,21 +330,38 @@ int main(int argc,char* argv[])
else
{
// Add 1 column for the ruler
matrix<int> D(A1.length(), A1.n_sequences()+1, 0.0);
auto column_lookup2 = column_lookup(A2);
const int blocksize = args["blocksize"].as<int>();
const int n_types = args["classes"].as<int>();
matrix<int> D(A1.length(), A1.n_sequences()+1, 0);
for(int i=0;i<columns1.size();i++)
{
int c1 = columns1[i];
int c2 = columns2[i];
if (c1 == -1 or c2 == -1) continue;
if (c1 == -1) continue;
bool any_different = false;
bool any_different = (c2 != 1);
for(int k=0;k<A1.n_sequences();k++)
{
if (M1(c1,k) != M2(c2,k)) any_different = true;
// 0. Skip if there's not a letter at A1(c1,k)
if (M1(c1,k) < 0) continue;
// 1. color the letter in A1 according to its position in A2
int c12 = column_lookup2[k][M1(c1,k)];
int block = c12 / blocksize;
int type = block % n_types;
D(c1,k) = 1+type;
if (M1(c1,k) == M2(c2,k) and M1(c1,k) >=0)
D(c1,k) = 1.0;
// 2. mark the letter as a match if its the same one in A2.
if (c2 != -1)
{
if (M1(c1,k) == M2(c2,k))
D(c1,k) = 0;
else
any_different = true;
}
}
// Mark the ruler as unchanged if the whole column is unchanged
......