Skip to content
Commits on Source (13)
2019-11-28 Martin C. Frith <Martin C. Frith>
* doc/last-train.txt, scripts/last-train, test/last-train-test.out:
train: double scale if integer-rounding is bad
[887c0e6339d1] [tip]
2019-11-27 Martin C. Frith <Martin C. Frith>
* scripts/last-train, test/last-train-test.out:
!Make last-train use balanced length probability
[fff603c39404]
* scripts/last-train, test/last-train-test.out:
train: change integer-rounding of gap open costs
[d4e0fcd8cc2b]
2019-11-26 Martin C. Frith <Martin C. Frith>
* src/LastalArguments.cc, src/lastal.cc, test/last-test.out:
!Change lastal default gapless score threshold
[24e288b4947c]
* scripts/last-train:
Refactor
[4a486c301cd5]
* scripts/last-train:
Refactor
[932a5e822d1c]
2019-11-21 Martin C. Frith <Martin C. Frith>
* scripts/last-train:
Refactor
[037daaa2fe00]
* scripts/last-train:
Refactor
[ad8ed4200d8c]
* scripts/last-train:
Refactor
[87e4bbb5b03f]
2019-11-18 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc, src/mcf_simd.hh:
Refactor
[f0bc43d74861]
* src/GappedXdropAligner.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc, src/ScoreMatrix.cc,
src/ScoreMatrix.hh, src/mcf_simd.hh:
Refactor
[f164ba36017b]
* scripts/last-train, src/Alignment.cc, src/Centroid.cc,
src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, test/last-test.out:
Remove always-zero counts from lastal -j7
[43344af70eab]
2019-11-15 Martin C. Frith <Martin C. Frith>
* src/Centroid.cc:
Re-order some probability calculations
[f46e1c98977c]
* doc/lastal.txt, examples/multiMito.maf, scripts/last-train,
src/Alignment.cc, src/Centroid.cc, src/Centroid.hh, test/last-
test.out, test/last-train-test.out, test/maf-swap-test.out:
!Get lastal probabilities from "model A"
[033e6b4b0308]
* src/Centroid.cc:
Re-order probability calculations
[b97eb8aee1f7]
2019-11-14 Martin C. Frith <Martin C. Frith>
* src/Centroid.cc:
Refactor (better variable names)
[54db8ec6938c]
* doc/lastal.txt, src/Alignment.cc, src/Centroid.cc, src/Centroid.hh,
src/LastalArguments.cc:
Remove probabilities for generalized affine gaps
[2e10e14cc66d]
2019-11-12 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAlignerPssm.cc:
Make fastq gapped alignment faster (SIMD)!
[6afae8dec8f0]
* src/GappedXdropAlignerPssm.cc:
Refactor
[168d611f79ab]
* src/GappedXdropAlignerPssm.cc, test/last-test.out, test/last-
test.sh:
Refactor
[bdc3abede08a]
* makefile, src/Centroid.cc, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAlignerPssm.cc, src/makefile, src/mcf_simd.hh:
Make fasta gapped alignment faster (SIMD)!
[2a74b539aecf]
2019-11-11 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAligner.cc:
Refactor
[92f84eaa2c77]
* src/Centroid.cc, src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/makefile,
src/mcf_contiguous_queue.hh:
Refactor (not sure if useful)
[f1192efe0bec]
* src/Alignment.cc, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAlignerPssm.cc, src/GreedyXdropAligner.cc,
src/GreedyXdropAligner.hh:
Refactor
[de3177d4a626]
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerPssm.cc:
Refactor
[b726e939f94c]
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAlignerPssm.cc:
Refactor
[1587261f2469]
* src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc:
Refactor (std::size_t -> size_t)
[0cdc7da1019d]
* src/Centroid.cc, src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner3frame.cc,
src/alp/njn_random.cpp, src/makefile:
Make lastal gapped alignment a bit faster
[5506d2756b47]
* src/LastEvaluer.hh, test/last-test.out, test/last-test.sh:
Fix lastal crash on empty lastdb database
[e5c59d1ed3a6]
2019-10-09 Martin C. Frith <Martin C. Frith>
* data/BL80.mat, data/PAM10.mat, data/PAM30.mat:
Add PAM10 substitution score matrix
[e57e547b8235]
2019-10-08 Martin C. Frith <Martin C. Frith>
* src/lastal.cc, src/lastdb.cc:
Omit non-alphabet letters for DB seq lengths
[7f841cc97f20]
* src/SubsetMinimizerFinder.cc, src/SubsetMinimizerFinder.hh,
src/SubsetSuffixArray.cc, src/lastal.cc:
Refactor
[4011a21f96ad]
2019-10-07 Martin C. Frith <Martin C. Frith>
* src/ScoreMatrix.cc, src/ScoreMatrix.hh, src/lastal.cc,
src/mcf_substitution_matrix_stats.cc,
src/mcf_substitution_matrix_stats.hh:
Allow score matrix with letter frequencies
[50bcc51ca59b]
* src/CyclicSubsetSeed.hh, src/ScoreMatrix.cc, src/ScoreMatrix.hh,
src/ScoreMatrixRow.hh, src/makefile:
Refactor
[ca6613dd1f06]
* src/LastEvaluer.cc, src/LastEvaluer.hh, src/lastal.cc:
Enable stored E-value params for any genetic code
[f757e6ca4edb]
* build/gc-inc.sh, data/gc.prt, doc/lastal.txt, src/GeneticCode.cc,
src/GeneticCode.hh, src/LastalArguments.cc, src/lastal.cc,
src/makefile, test/last-test.out, test/last-test.sh:
Add NCBI genetic codes
[88adecc284af]
* doc/last-evalues.txt, src/LastEvaluer.cc, src/LastEvaluer.hh,
src/LastalArguments.cc, src/LastalArguments.hh, src/lastal.cc,
src/lastdb.cc, test/last-test.out:
Improve E-values for multiple short DB sequences
[49caf55d4d4d]
2019-08-21 Martin C. Frith <Martin C. Frith>
* doc/last-dotplot.txt, scripts/last-dotplot:
dotplot: change satellite-repeat color to purple
[fb43f850eba2] [tip]
[fb43f850eba2]
2019-07-16 Martin C. Frith <Martin C. Frith>
......
#! /bin/sh
# This generates source code from genetic codes.
cat <<EOF
const struct {
const char *name;
const char *text;
} geneticCodes[] = {
EOF
cat "$@" | tr -d '",' |
awk '
$1 == "id" {print "{\"" $2 "\", \"\\"}
$1 == "ncbieaa" {print " AAs = " $2 "\\n\\"}
/-- Base/ {print $2 " = " $3 "\\n\\"}
/-- Base3/ {print "\"},"}
'
echo "};"
# This protein scoring scheme is good at finding somewhat
# short-and-strong similarities. (S Henikoff & JG Henikoff, PNAS 1992
# This protein scoring scheme is good for finding somewhat strong, and
# short, similarities. (S Henikoff & JG Henikoff, PNAS 1992
# 89(22):10915-9).
#nickname BLOSUM80
......
# This protein scoring scheme is good for finding very strong, and
# short, similarities (MO Dayhoff et al. 1978).
#last -a20 -b3
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
# This protein scoring scheme is good for finding short-and-strong
# This protein scoring scheme is good for finding strong, and short,
# similarities (MO Dayhoff et al. 1978).
#last -a13 -b3
......
--**************************************************************************
-- This is the NCBI genetic code table
-- Initial base data set from Andrzej Elzanowski while at PIR International
-- Addition of Eubacterial and Alternative Yeast by J.Ostell at NCBI
-- Base 1-3 of each codon have been added as comments to facilitate
-- readability at the suggestion of Peter Rice, EMBL
-- Later additions by Taxonomy Group staff at NCBI
--
-- Version 4.5
-- Added Cephalodiscidae mitochondrial genetic code 33
--
-- Version 4.4
-- Added GTG as start codon for genetic code 3
-- Added Balanophoraceae plastid genetic code 32
--
-- Version 4.3
-- Change to CTG -> Leu in genetic codes 27, 28, 29, 30
--
-- Version 4.2
-- Added Karyorelict nuclear genetic code 27
-- Added Condylostoma nuclear genetic code 28
-- Added Mesodinium nuclear genetic code 29
-- Added Peritrich nuclear genetic code 30
-- Added Blastocrithidia nuclear genetic code 31
--
-- Version 4.1
-- Added Pachysolen tannophilus nuclear genetic code 26
--
-- Version 4.0
-- Updated version to reflect numerous undocumented changes:
-- Corrected start codons for genetic code 25
-- Name of new genetic code is Candidate Division SR1 and Gracilibacteria
-- Added candidate division SR1 nuclear genetic code 25
-- Added GTG as start codon for genetic code 24
-- Corrected Pterobranchia Mitochondrial genetic code (24)
-- Added genetic code 24, Pterobranchia Mitochondrial
-- Genetic code 11 is now Bacterial, Archaeal and Plant Plastid
-- Fixed capitalization of mitochondrial in codes 22 and 23
-- Added GTG, ATA, and TTG as alternative start codons to code 13
--
-- Version 3.9
-- Code 14 differs from code 9 only by translating UAA to Tyr rather than
-- STOP. A recent study (Telford et al, 2000) has found no evidence that
-- the codon UAA codes for Tyr in the flatworms, but other opinions exist.
-- There are very few GenBank records that are translated with code 14,
-- but a test translation shows that retranslating these records with code
-- 9 can cause premature terminations. Therefore, GenBank will maintain
-- code 14 until further information becomes available.
--
-- Version 3.8
-- Added GTG start to Echinoderm mitochondrial code, code 9
--
-- Version 3.7
-- Added code 23 Thraustochytrium mitochondrial code
-- formerly OGMP code 93
-- submitted by Gertraude Berger, Ph.D.
--
-- Version 3.6
-- Added code 22 TAG-Leu, TCA-stop
-- found in mitochondrial DNA of Scenedesmus obliquus
-- submitted by Gertraude Berger, Ph.D.
-- Organelle Genome Megasequencing Program, Univ Montreal
--
-- Version 3.5
-- Added code 21, Trematode Mitochondrial
-- (as deduced from: Garey & Wolstenholme,1989; Ohama et al, 1990)
-- Added code 16, Chlorophycean Mitochondrial
-- (TAG can translated to Leucine instaed to STOP in chlorophyceans
-- and fungi)
--
-- Version 3.4
-- Added CTG,TTG as allowed alternate start codons in Standard code.
-- Prats et al. 1989, Hann et al. 1992
--
-- Version 3.3 - 10/13/95
-- Added alternate intiation codon ATC to code 5
-- based on complete mitochondrial genome of honeybee
-- Crozier and Crozier (1993)
--
-- Version 3.2 - 6/24/95
-- Code Comments
-- 10 Alternative Ciliate Macronuclear renamed to Euplotid Macro...
-- 15 Blepharisma Macro.. code added
-- 5 Invertebrate Mito.. GTG allowed as alternate initiator
-- 11 Eubacterial renamed to Bacterial as most alternate starts
-- have been found in Archea
--
--
-- Version 3.1 - 1995
-- Updated as per Andrzej Elzanowski at NCBI
-- Complete documentation in NCBI toolkit documentation
-- Note: 2 genetic codes have been deleted
--
-- Old id Use id - Notes
--
-- id 7 id 4 - Kinetoplast code now merged in code id 4
-- id 8 id 1 - all plant chloroplast differences due to RNA edit
--
--
--*************************************************************************
Genetic-code-table ::= {
{
name "Standard" ,
name "SGC0" ,
id 1 ,
ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Vertebrate Mitochondrial" ,
name "SGC1" ,
id 2 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
sncbieaa "----------**--------------------MMMM----------**---M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Yeast Mitochondrial" ,
name "SGC2" ,
id 3 ,
ncbieaa "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**----------------------MM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate
Mitochondrial; Mycoplasma; Spiroplasma" ,
name "SGC3" ,
id 4 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--MM------**-------M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Invertebrate Mitochondrial" ,
name "SGC4" ,
id 5 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",
sncbieaa "---M------**--------------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear" ,
name "SGC5" ,
id 6 ,
ncbieaa "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Echinoderm Mitochondrial; Flatworm Mitochondrial" ,
name "SGC8" ,
id 9 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Euplotid Nuclear" ,
name "SGC9" ,
id 10 ,
ncbieaa "FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Bacterial, Archaeal and Plant Plastid" ,
id 11 ,
ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**--*----M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Alternative Yeast Nuclear" ,
id 12 ,
ncbieaa "FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Ascidian Mitochondrial" ,
id 13 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG",
sncbieaa "---M------**----------------------MM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Alternative Flatworm Mitochondrial" ,
id 14 ,
ncbieaa "FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "-----------*-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Blepharisma Macronuclear" ,
id 15 ,
ncbieaa "FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Chlorophycean Mitochondrial" ,
id 16 ,
ncbieaa "FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Trematode Mitochondrial" ,
id 21 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Scenedesmus obliquus Mitochondrial" ,
id 22 ,
ncbieaa "FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "------*---*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Thraustochytrium Mitochondrial" ,
id 23 ,
ncbieaa "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--*-------**--*-----------------M--M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Pterobranchia Mitochondrial" ,
id 24 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG",
sncbieaa "---M------**-------M---------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Candidate Division SR1 and Gracilibacteria" ,
id 25 ,
ncbieaa "FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Pachysolen tannophilus Nuclear" ,
id 26 ,
ncbieaa "FFLLSSSSYY**CC*WLLLAPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Karyorelict Nuclear" ,
id 27 ,
ncbieaa "FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Condylostoma Nuclear" ,
id 28 ,
ncbieaa "FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Mesodinium Nuclear" ,
id 29 ,
ncbieaa "FFLLSSSSYYYYCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Peritrich Nuclear" ,
id 30 ,
ncbieaa "FFLLSSSSYYEECC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Blastocrithidia Nuclear" ,
id 31 ,
ncbieaa "FFLLSSSSYYEECCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Balanophoraceae Plastid" ,
id 32 ,
ncbieaa "FFLLSSSSYY*WCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------*---*----M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Cephalodiscidae Mitochondrial" ,
id 33 ,
ncbieaa "FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG",
sncbieaa "---M-------*-------M---------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}
}
last-align (1021-1) unstable; urgency=medium
* Team upload.
* New upstream version
* Fiddling with CFLAGS/CXXFLAGS defaults in Makefile
* fixed and minimized 2to3-patches - path for last-train
so there is no FTBFS when generating man pages
* TODO: Need to work on platform dependencies
-- Steffen Moeller <moeller@debian.org> Sat, 26 Oct 2019 17:57:55 +0200
last-align (984-2) unstable; urgency=medium
* Use 2to3 to port to Python3
......
......@@ -3,8 +3,10 @@ Bug-Debian: https://bugs.debian.org/943148
Author: Andreas Tille <tille@debian.org>
Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
--- a/scripts/last-dotplot
+++ b/scripts/last-dotplot
Index: last-align/scripts/last-dotplot
===================================================================
--- last-align.orig/scripts/last-dotplot
+++ last-align/scripts/last-dotplot
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
......@@ -53,40 +55,26 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
def readGenePred(opts, fileName, rangeDict):
for line in myOpen(fileName):
--- a/scripts/last-map-probs
+++ b/scripts/last-map-probs
Index: last-align/scripts/last-map-probs
===================================================================
--- last-align.orig/scripts/last-map-probs
+++ last-align/scripts/last-map-probs
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2010, 2011, 2012, 2014 Martin C. Frith
@@ -7,7 +7,7 @@
# probabilities make the risky assumption that one of the alignments
# reported for each query is correct.
-from __future__ import print_function
+
import gzip
import sys, os, math, optparse, signal
--- a/scripts/last-postmask
+++ b/scripts/last-postmask
Index: last-align/scripts/last-postmask
===================================================================
--- last-align.orig/scripts/last-postmask
+++ last-align/scripts/last-postmask
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2014 Martin C. Frith
@@ -12,7 +12,7 @@
# Limitations: doesn't (yet) handle sequence quality data,
# frameshifts, or generalized affine gaps.
-from __future__ import print_function
+
import gzip
import optparse, os, signal, sys
@@ -37,7 +37,7 @@ def complement(base):
def fastScoreMatrix(rowHeads, colHeads, matrix, deleteCost, insertCost):
......@@ -105,20 +93,18 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
colHeads = fields[1:]
elif nf == len(colHeads) + 2 and len(fields[1]) == 1:
rowHeads.append(fields[1])
--- a/scripts/last-train
+++ b/scripts/last-train
@@ -1,7 +1,7 @@
Index: last-align/scripts/last-train
===================================================================
--- last-align.orig/scripts/last-train
+++ last-align/scripts/last-train
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2015 Martin C. Frith
-from __future__ import print_function
+
import gzip
import math, optparse, os, random, signal, subprocess, sys, tempfile
@@ -215,7 +215,7 @@ def writeMatrixWithLetters(outFile, matr
writeMatrixBody(outFile, prefix, left, body, "%6s")
# References:
@@ -248,7 +248,7 @@ def writeScoreMatrix(outFile, matrix, pr
writeMatrixBody(outFile, prefix, alphabet, matrix, "%6s")
def matProbsFromCounts(counts, opts):
- r = range(len(counts))
......@@ -126,34 +112,26 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
if opts.revsym: # add complement (reverse strand) substitutions
counts = [[counts[i][j] + counts[-1-i][-1-j] for j in r] for i in r]
if opts.matsym: # symmetrize the substitution matrix
--- a/scripts/maf-convert
+++ b/scripts/maf-convert
Index: last-align/scripts/maf-convert
===================================================================
--- last-align.orig/scripts/maf-convert
+++ last-align/scripts/maf-convert
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2010, 2011, 2013, 2014 Martin C. Frith
# Read MAF-format alignments: write them in other formats.
# Seems to work with Python 2.x, x>=6
@@ -6,7 +6,7 @@
# By "MAF" we mean "multiple alignment format" described in the UCSC
# Genome FAQ, not e.g. "MIRA assembly format".
-from __future__ import print_function
+
from itertools import *
import gzip
--- a/scripts/maf-cut
+++ b/scripts/maf-cut
@@ -1,6 +1,6 @@
Index: last-align/scripts/maf-cut
===================================================================
--- last-align.orig/scripts/maf-cut
+++ last-align/scripts/maf-cut
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
+
-from __future__ import print_function
from __future__ import print_function
import gzip
import itertools
@@ -70,12 +70,12 @@ def cutMafRecords(mafLines, alnBeg, alnE
def mafFieldWidths(mafRecords):
......@@ -170,23 +148,16 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
print("%*s %-*s %*s %*s %*s %*s %*s" % tuple(formatParams))
def cutOneMaf(cutRange, mafLines):
--- a/scripts/maf-join
+++ b/scripts/maf-join
Index: last-align/scripts/maf-join
===================================================================
--- last-align.orig/scripts/maf-join
+++ last-align/scripts/maf-join
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2009, 2010, 2011 Martin C. Frith
@@ -10,7 +10,7 @@
# WARNING: Alignment columns with a gap in the top genome are joined
# arbitrarily!!!
-from __future__ import print_function
+
import functools
import sys, os, fileinput, optparse, signal
@@ -229,7 +229,7 @@ def nextWindow(window, theInput, referen
while True:
maf = theInput.peek()
......@@ -196,23 +167,16 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
if not maf.before(referenceMaf): yield maf
except StopIteration: pass
--- a/scripts/maf-swap
+++ b/scripts/maf-swap
Index: last-align/scripts/maf-swap
===================================================================
--- last-align.orig/scripts/maf-swap
+++ last-align/scripts/maf-swap
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Read MAF-format alignments, and write them, after moving the Nth
# sequence to the top in each alignment.
@@ -7,7 +7,7 @@
# flip all the strands. But don't do this if the top sequence is
# translated DNA.
-from __future__ import print_function
+
import gzip
import itertools
@@ -69,12 +69,12 @@ def flippedMafRecords(mafLines):
def sLineFieldWidths(mafLines):
......
Index: last-align/src/makefile
===================================================================
--- last-align.orig/src/makefile
+++ last-align/src/makefile
@@ -1,4 +1,5 @@
-CXXFLAGS = -msse4.1 -O3 -Wall -Wextra -Wcast-qual -Wswitch-enum \
+#CXXFLAGS = -msse4.1
+CXXFLAGS += -O3 -Wall -Wextra -Wcast-qual -Wswitch-enum \
-Wundef -Wcast-align -pedantic -g -std=c++11 -pthread \
-DHAS_CXX_THREADS
# -Wconversion
@@ -10,7 +11,7 @@ CXXFLAGS = -msse4.1 -O3 -Wall -Wextra -W
ALPHABET_CAPACITY = 64
CPPF = -DALPHABET_CAPACITY=$(ALPHABET_CAPACITY) $(CPPFLAGS)
-CFLAGS = -Wall -O2
+CFLAGS ?= -Wall -O2
alpObj = alp/sls_alignment_evaluer.o alp/sls_pvalues.o \
alp/sls_alp_sim.o alp/sls_alp_regression.o alp/sls_alp_data.o \
@@ -66,33 +67,33 @@ all: $(ALL)
indexAllObj4 = $(indexObj0) $(indexObj4)
lastdb: $(indexAllObj4)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(indexAllObj4) -lz
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(indexAllObj4) -lz
indexAllObj8 = $(indexObj0) $(indexObj8)
lastdb8: $(indexAllObj8)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(indexAllObj8) -lz
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(indexAllObj8) -lz
alignAllObj4 = $(alignObj0) $(alignObj4)
lastal: $(alignAllObj4)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(alignAllObj4) -lz
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(alignAllObj4) -lz
alignAllObj8 = $(alignObj0) $(alignObj8)
lastal8: $(alignAllObj8)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(alignAllObj8) -lz
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(alignAllObj8) -lz
splitAllObj4 = $(splitObj0) $(splitObj4)
last-split: $(splitAllObj4)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(splitAllObj4)
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(splitAllObj4)
splitAllObj8 = $(splitObj0) $(splitObj8)
last-split8: $(splitAllObj8)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(splitAllObj8)
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(splitAllObj8)
last-pair-probs: $(PPOBJ)
- $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(PPOBJ) -lz
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(PPOBJ) -lz
last-merge-batches: $(MBOBJ)
- $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(MBOBJ)
+ $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ $(MBOBJ)
.SUFFIXES:
.SUFFIXES: .o .c .cc .cpp .o8
@@ -139,10 +140,10 @@ fix8 = sed 's/.*\.o/& &8/'
depend:
sed '/[m][v]/q' makefile > m
- $(CXX) -MM *.cc | $(fix8) >> m
- $(CC) -MM *.c >> m
- $(CXX) -MM alp/*.cpp | sed 's|.*:|alp/&|' >> m
- $(CXX) -MM -I. split/*.cc | sed 's|.*:|split/&|' | $(fix8) >> m
+ $(CXX) $(CPPFLAGS) -MM *.cc | $(fix8) >> m
+ $(CC) $(CPPFLAGS) -MM *.c >> m
+ $(CXX) $(CPPFLAGS) -MM alp/*.cpp | sed 's|.*:|alp/&|' >> m
+ $(CXX) $(CPPFLAGS) -MM -I. split/*.cc | sed 's|.*:|split/&|' | $(fix8) >> m
mv m makefile
Alignment.o Alignment.o8: Alignment.cc Alignment.hh ScoreMatrixRow.hh SegmentPair.hh \
mcf_gap_costs.hh Alphabet.hh Centroid.hh GappedXdropAligner.hh \
2to3.patch
helpMakefiles.patch
......@@ -11,6 +11,19 @@ Reference:
URL: http://nar.oxfordjournals.org/cgi/content/abstract/38/7/e100
ePrint: http://nar.oxfordjournals.org/cgi/reprint/38/7/e100.pdf
PMID: 20110255
- Author: Martin C Frith
Title: >
How sequence alignment scores correspond to probability models
Journal: Bioinformatics
Volume: btz576
Year: 2019
Month: 7
Day: 22
DOI: 10.1093/bioinformatics/btz576
ePrint: "https://academic.oup.com/bioinformatics/advance-article-pdf/\
doi/10.1093/bioinformatics/btz576/29097431/btz576.pdf"
URL: "https://academic.oup.com/bioinformatics/advance-article/doi/\
10.1093/bioinformatics/btz576/5536873"
Repository: http://last.cbrc.jp/last/
Webservice: http://lastweb.cbrc.jp/
Registry:
......
......@@ -334,7 +334,7 @@ expected number of alignments with greater or equal score, between two
randomly-shuffled sequences of length 1 billion each.</p>
<p><strong>E</strong> is the expected number of alignments with greater or equal
score, between: a random sequence with the same length as the query
sequence, and a random sequence with the same length as the database.</p>
sequence, and random sequences with the same length as the database.</p>
<p>In this example, the two alignments are identical, but their &quot;E&quot;
values are not. That is because the query sequences have different
lengths: human.chrX is longer than human.chrY.</p>
......@@ -375,7 +375,7 @@ are fixed at 0.25.</p>
<h3>Calculation of EG2</h3>
<p>EG2 is calculated from the score like this:</p>
<pre class="literal-block">
EG2 = K * (1 billion) * (1 billion) * exp(-lambda * score)
EG2 = K * exp(-lambda * score) * (1 billion) * (1 billion)
</pre>
<p>The parameters lambda and K are printed in the header of lastal's
output.</p>
......@@ -384,21 +384,30 @@ output.</p>
<h3>Calculation of E</h3>
<p>E is calculated by this formula:</p>
<pre class="literal-block">
E = K * area * exp(-lambda * score) * (number of strands)
E = K * exp(-lambda * score) * strands * area(m, n, score) * N / n,
</pre>
<p>This depends on the number of strands that are searched, either 1 or
2.</p>
<p>The area is approximately (database length) * (query length). However,
if the query (or database) is very short, this formula over-estimates
E. LAST uses a &quot;finite-size correction&quot; to calculate E more
accurately.</p>
<p>where:</p>
<ul>
<li><p class="first"><tt class="docutils literal">strands</tt> is either 1 or 2 (if forward and reverse DNA strands are
searched).</p>
</li>
<li><p class="first"><tt class="docutils literal">m</tt> is the length of the query sequence.</p>
</li>
<li><p class="first"><tt class="docutils literal">n</tt> is the length of the longest sequence in the database.</p>
</li>
<li><p class="first"><tt class="docutils literal">N</tt> is the total length of all database sequences.</p>
</li>
<li><p class="first"><tt class="docutils literal">area(m, n, score)</tt> is <a class="reference external" href="https://doi.org/10.1186/1756-0500-5-286">slightly less than</a> <tt class="docutils literal">m * n</tt>.</p>
</li>
</ul>
</div>
<div class="section" id="effect-of-option-d">
<h3>Effect of option -D</h3>
<p>Option -D indirectly sets the EG2 threshold, via this formula:</p>
<p>Option -D sets the minimum alignment score, by this formula:</p>
<pre class="literal-block">
1 = EG2 * (database length)/1e9 * (option -D)/1e9 * (number of strands)
K * exp(-lambda * score) * strands * D * length(n, score) * N / n ≤ 1,
</pre>
<p>where <tt class="docutils literal">length(n, score)</tt> is slightly less than <tt class="docutils literal">n</tt>.</p>
</div>
<div class="section" id="bit-score">
<h3>Bit score</h3>
......
......@@ -18,7 +18,7 @@ randomly-shuffled sequences of length 1 billion each.
**E** is the expected number of alignments with greater or equal
score, between: a random sequence with the same length as the query
sequence, and a random sequence with the same length as the database.
sequence, and random sequences with the same length as the database.
In this example, the two alignments are identical, but their "E"
values are not. That is because the query sequences have different
......@@ -67,7 +67,7 @@ Calculation of EG2
EG2 is calculated from the score like this::
EG2 = K * (1 billion) * (1 billion) * exp(-lambda * score)
EG2 = K * exp(-lambda * score) * (1 billion) * (1 billion)
The parameters lambda and K are printed in the header of lastal's
output.
......@@ -77,22 +77,26 @@ Calculation of E
E is calculated by this formula::
E = K * area * exp(-lambda * score) * (number of strands)
E = K * exp(-lambda * score) * strands * area(m, n, score) * N / n,
This depends on the number of strands that are searched, either 1 or
2.
where:
The area is approximately (database length) * (query length). However,
if the query (or database) is very short, this formula over-estimates
E. LAST uses a "finite-size correction" to calculate E more
accurately.
* ``strands`` is either 1 or 2 (if forward and reverse DNA strands are
searched).
* ``m`` is the length of the query sequence.
* ``n`` is the length of the longest sequence in the database.
* ``N`` is the total length of all database sequences.
* ``area(m, n, score)`` is `slightly less than
<https://doi.org/10.1186/1756-0500-5-286>`_ ``m * n``.
Effect of option -D
~~~~~~~~~~~~~~~~~~~
Option -D indirectly sets the EG2 threshold, via this formula::
Option -D sets the minimum alignment score, by this formula::
1 = EG2 * (database length)/1e9 * (option -D)/1e9 * (number of strands)
K * exp(-lambda * score) * strands * D * length(n, score) * N / n ≤ 1,
where ``length(n, score)`` is slightly less than ``n``.
Bit score
~~~~~~~~~
......
......@@ -419,8 +419,8 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -4
</div>
<div class="section" id="bl80-or-blosum80">
<h2>BL80 or BLOSUM80</h2>
<p>This protein scoring scheme is good at finding somewhat
short-and-strong similarities. (S Henikoff &amp; JG Henikoff, PNAS 1992
<p>This protein scoring scheme is good for finding somewhat strong, and
short, similarities. (S Henikoff &amp; JG Henikoff, PNAS 1992
89(22):10915-9).
It uses this matrix:</p>
<pre class="literal-block">
......@@ -504,9 +504,44 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -6 -6
<p>It sets these default lastal parameter values:
-a13 -b2</p>
</div>
<div class="section" id="pam10">
<h2>PAM10</h2>
<p>This protein scoring scheme is good for finding very strong, and
short, similarities (MO Dayhoff et al. 1978).
It uses this matrix:</p>
<pre class="literal-block">
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
</pre>
<p>It sets these default lastal parameter values:
-a20 -b3</p>
</div>
<div class="section" id="pam30">
<h2>PAM30</h2>
<p>This protein scoring scheme is good for finding short-and-strong
<p>This protein scoring scheme is good for finding strong, and short,
similarities (MO Dayhoff et al. 1978).
It uses this matrix:</p>
<pre class="literal-block">
......
......@@ -107,8 +107,8 @@ It uses this matrix::
BL80 or BLOSUM80
----------------
This protein scoring scheme is good at finding somewhat
short-and-strong similarities. (S Henikoff & JG Henikoff, PNAS 1992
This protein scoring scheme is good for finding somewhat strong, and
short, similarities. (S Henikoff & JG Henikoff, PNAS 1992
89(22):10915-9).
It uses this matrix::
......@@ -193,10 +193,46 @@ It uses this matrix::
It sets these default lastal parameter values:
-a13 -b2
PAM10
-----
This protein scoring scheme is good for finding very strong, and
short, similarities (MO Dayhoff et al. 1978).
It uses this matrix::
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
It sets these default lastal parameter values:
-a20 -b3
PAM30
-----
This protein scoring scheme is good for finding short-and-strong
This protein scoring scheme is good for finding strong, and short,
similarities (MO Dayhoff et al. 1978).
It uses this matrix::
......
......@@ -445,7 +445,9 @@ alignments: they are described in more detail at <a class="reference external" h
<td>Maximum expected alignments per square giga. (See <a class="reference external" href="last-evalues.html">here</a>.)</td></tr>
<tr><td class="option-group">
<kbd><span class="option">-s <var>NUMBER</var></span></kbd></td>
<td>Which query strand to use: 0=reverse, 1=forward, 2=both.</td></tr>
<td>Which query strand to use: 0=reverse, 1=forward, 2=both.
If specified, this parameter is written in last-train's
output, so it will override lastal's default.</td></tr>
<tr><td class="option-group">
<kbd><span class="option">-S <var>NUMBER</var></span></kbd></td>
<td><p class="first">Specify how to use the substitution score matrix for
......@@ -513,6 +515,36 @@ output, so it will override lastal's default.</p>
</blockquote>
</div>
</div>
<div class="section" id="details">
<h2>Details</h2>
<ul>
<li><p class="first">last-train (and lastal) uses &quot;Model A&quot;, in Figure 5A of <a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>.</p>
</li>
<li><p class="first">last-train (and lastal) converts between path and alignment
parameters as in Supplementary Section 3.1 of <a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>.</p>
</li>
<li><p class="first">last-train converts probabilities to scores using an arbitrary scale
factor. (If all the scores are multiplied by any positive constant,
it makes no difference to alignment.) This scale is determined by
the initial score parameters.</p>
</li>
<li><p class="first">last-train uses parameters with &quot;homogeneous letter probabilities&quot;
and &quot;balanced length probability&quot; (<a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>).</p>
</li>
<li><p class="first">last-train rounds the scores to integers, which makes them slightly
inaccurate. It then finds an adjusted scale factor (without
changing the scores), which makes the integer-rounded scores
correspond to homogeneous letter probabilities and balanced length
probability. It writes this adjusted scale as a &quot;-t&quot; option for
lastal, e.g. &quot;-t4.4363&quot;.</p>
</li>
<li><p class="first">In rare cases, it may be impossible to find such an adjusted scale
factor. If that happens, last-train doubles the original scale (to
reduce the inaccuracy of integer rounding), until the problem goes
away.</p>
</li>
</ul>
</div>
<div class="section" id="bugs">
<h2>Bugs</h2>
<ul>
......
......@@ -82,6 +82,8 @@ Alignment options
-E EG2 Maximum expected alignments per square giga. (See `here
<last-evalues.html>`_.)
-s NUMBER Which query strand to use: 0=reverse, 1=forward, 2=both.
If specified, this parameter is written in last-train's
output, so it will override lastal's default.
-S NUMBER Specify how to use the substitution score matrix for
reverse strands. If you use ``--revsym``, this makes no
difference. "0" means that the matrix is used as-is for
......@@ -133,6 +135,36 @@ Alignment options
If specified, this parameter is written in last-train's
output, so it will override lastal's default.
Details
-------
* last-train (and lastal) uses "Model A", in Figure 5A of btz576_.
* last-train (and lastal) converts between path and alignment
parameters as in Supplementary Section 3.1 of btz576_.
* last-train converts probabilities to scores using an arbitrary scale
factor. (If all the scores are multiplied by any positive constant,
it makes no difference to alignment.) This scale is determined by
the initial score parameters.
* last-train uses parameters with "homogeneous letter probabilities"
and "balanced length probability" (btz576_).
* last-train rounds the scores to integers, which makes them slightly
inaccurate. It then finds an adjusted scale factor (without
changing the scores), which makes the integer-rounded scores
correspond to homogeneous letter probabilities and balanced length
probability. It writes this adjusted scale as a "-t" option for
lastal, e.g. "-t4.4363".
* In rare cases, it may be impossible to find such an adjusted scale
factor. If that happens, last-train doubles the original scale (to
reduce the inaccuracy of integer rounding), until the problem goes
away.
.. _btz576: https://doi.org/10.1093/bioinformatics/btz576
Bugs
----
......
......@@ -769,12 +769,14 @@ this core</p>
<p class="last">Use -w0 to turn this off.</p>
</td></tr>
<tr><td class="option-group">
<kbd><span class="option">-G <var>FILE</var></span></kbd></td>
<td>Use an alternative genetic code in the specified file. For an
example of the format, see vertebrateMito.gc in the examples
directory. By default, the standard genetic code is used. This
<kbd><span class="option">-G <var>GENETIC-CODE</var></span></kbd></td>
<td>Specify the genetic code for translating DNA to protein. This
option has no effect unless DNA-versus-protein alignment is
selected with option -F.</td></tr>
selected with option -F. Codes are specified by numbers
(e.g. 1 = standard, 2 = vertebrate mitochondrial), listed here:
<a class="reference external" href="https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi">https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi</a>. Any
other GENETIC-CODE is assumed to be a file name: for an example
of the format, see vertebrateMito.gc in the examples directory.</td></tr>
<tr><td class="option-group">
<kbd><span class="option">-t <var>TEMPERATURE</var></span></kbd></td>
<td>Parameter for converting between scores and probability ratios.
......@@ -824,9 +826,9 @@ with &quot;c&quot; for each alignment. The first 16 numbers on this line
are the expected counts of matches and mismatches: first the
count of reference As aligned to query As, then the count of
reference As aligned to query Cs, and so on. For proteins there
will be 400 such numbers. The final ten numbers are expected
will be 400 such numbers. The next 5 numbers are expected
counts related to gaps. They are:</p>
<ul>
<ul class="last">
<li><p class="first">The count of matches plus mismatches. (This may exceed the
total of the preceding numbers, if the sequences have non-ACGT
letters.)</p>
......@@ -839,20 +841,6 @@ letters.)</p>
</li>
<li><p class="first">The count of insert opens (= count of insert closes).</p>
</li>
<li><p class="first">The count of adjacent pairs of insertions and deletions.</p>
</li>
</ul>
<p>The final four numbers are always zero, unless you use
generalized affine gap costs. They are:</p>
<ul class="last">
<li><p class="first">The count of unaligned letter pairs.</p>
</li>
<li><p class="first">The count of unaligned letter pair opens (= count of closes).</p>
</li>
<li><p class="first">The count of adjacent pairs of deletions and unaligned letter pairs.</p>
</li>
<li><p class="first">The count of adjacent pairs of insertions and unaligned letter pairs.</p>
</li>
</ul>
</td></tr>
<tr><td class="option-group">
......
......@@ -400,12 +400,14 @@ Miscellaneous options
Use -w0 to turn this off.
-G FILE
Use an alternative genetic code in the specified file. For an
example of the format, see vertebrateMito.gc in the examples
directory. By default, the standard genetic code is used. This
-G GENETIC-CODE
Specify the genetic code for translating DNA to protein. This
option has no effect unless DNA-versus-protein alignment is
selected with option -F.
selected with option -F. Codes are specified by numbers
(e.g. 1 = standard, 2 = vertebrate mitochondrial), listed here:
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. Any
other GENETIC-CODE is assumed to be a file name: for an example
of the format, see vertebrateMito.gc in the examples directory.
-t TEMPERATURE
Parameter for converting between scores and probability ratios.
......@@ -461,7 +463,7 @@ Miscellaneous options
are the expected counts of matches and mismatches: first the
count of reference As aligned to query As, then the count of
reference As aligned to query Cs, and so on. For proteins there
will be 400 such numbers. The final ten numbers are expected
will be 400 such numbers. The next 5 numbers are expected
counts related to gaps. They are:
* The count of matches plus mismatches. (This may exceed the
......@@ -471,15 +473,6 @@ Miscellaneous options
* The count of inserted letters.
* The count of delete opens (= count of delete closes).
* The count of insert opens (= count of insert closes).
* The count of adjacent pairs of insertions and deletions.
The final four numbers are always zero, unless you use
generalized affine gap costs. They are:
* The count of unaligned letter pairs.
* The count of unaligned letter pair opens (= count of closes).
* The count of adjacent pairs of deletions and unaligned letter pairs.
* The count of adjacent pairs of insertions and unaligned letter pairs.
-Q NUMBER
Specify how to read the query sequences::
......
This diff is collapsed.