Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • frediz/last-align
  • med-team/last-align
2 results
Show changes
Commits on Source (2)
2019-11-28 Martin C. Frith <Martin C. Frith>
* doc/last-train.txt, scripts/last-train, test/last-train-test.out:
train: double scale if integer-rounding is bad
[887c0e6339d1] [tip]
2019-11-27 Martin C. Frith <Martin C. Frith>
* scripts/last-train, test/last-train-test.out:
!Make last-train use balanced length probability
[fff603c39404]
* scripts/last-train, test/last-train-test.out:
train: change integer-rounding of gap open costs
[d4e0fcd8cc2b]
2019-11-26 Martin C. Frith <Martin C. Frith>
* src/LastalArguments.cc, src/lastal.cc, test/last-test.out:
!Change lastal default gapless score threshold
[24e288b4947c]
* scripts/last-train:
Refactor
[4a486c301cd5]
* scripts/last-train:
Refactor
[932a5e822d1c]
2019-11-21 Martin C. Frith <Martin C. Frith>
* scripts/last-train:
Refactor
[037daaa2fe00]
* scripts/last-train:
Refactor
[ad8ed4200d8c]
* scripts/last-train:
Refactor
[87e4bbb5b03f]
2019-11-18 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc, src/mcf_simd.hh:
Refactor
[f0bc43d74861]
* src/GappedXdropAligner.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc, src/ScoreMatrix.cc,
src/ScoreMatrix.hh, src/mcf_simd.hh:
Refactor
[f164ba36017b]
* scripts/last-train, src/Alignment.cc, src/Centroid.cc,
src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, test/last-test.out:
Remove always-zero counts from lastal -j7
[43344af70eab]
2019-11-15 Martin C. Frith <Martin C. Frith>
* src/Centroid.cc:
Re-order some probability calculations
[f46e1c98977c]
* doc/lastal.txt, examples/multiMito.maf, scripts/last-train,
src/Alignment.cc, src/Centroid.cc, src/Centroid.hh, test/last-
test.out, test/last-train-test.out, test/maf-swap-test.out:
!Get lastal probabilities from "model A"
[033e6b4b0308]
* src/Centroid.cc:
Re-order probability calculations
[b97eb8aee1f7]
2019-11-14 Martin C. Frith <Martin C. Frith>
* src/Centroid.cc:
Refactor (better variable names)
[54db8ec6938c]
* doc/lastal.txt, src/Alignment.cc, src/Centroid.cc, src/Centroid.hh,
src/LastalArguments.cc:
Remove probabilities for generalized affine gaps
[2e10e14cc66d]
2019-11-12 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAlignerPssm.cc:
Make fastq gapped alignment faster (SIMD)!
[6afae8dec8f0]
* src/GappedXdropAlignerPssm.cc:
Refactor
[168d611f79ab]
* src/GappedXdropAlignerPssm.cc, test/last-test.out, test/last-
test.sh:
Refactor
[bdc3abede08a]
* makefile, src/Centroid.cc, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAlignerPssm.cc, src/makefile, src/mcf_simd.hh:
Make fasta gapped alignment faster (SIMD)!
[2a74b539aecf]
2019-11-11 Martin C. Frith <Martin C. Frith>
* src/GappedXdropAligner.cc:
Refactor
[92f84eaa2c77]
* src/Centroid.cc, src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/makefile,
src/mcf_contiguous_queue.hh:
Refactor (not sure if useful)
[f1192efe0bec]
* src/Alignment.cc, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAlignerPssm.cc, src/GreedyXdropAligner.cc,
src/GreedyXdropAligner.hh:
Refactor
[de3177d4a626]
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerPssm.cc:
Refactor
[b726e939f94c]
* src/GappedXdropAligner.cc, src/GappedXdropAligner.hh,
src/GappedXdropAligner2qual.cc, src/GappedXdropAlignerPssm.cc:
Refactor
[1587261f2469]
* src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner2qual.cc,
src/GappedXdropAligner3frame.cc,
src/GappedXdropAligner3framePssm.cc, src/GappedXdropAlignerInl.hh,
src/GappedXdropAlignerPssm.cc:
Refactor (std::size_t -> size_t)
[0cdc7da1019d]
* src/Centroid.cc, src/Centroid.hh, src/GappedXdropAligner.cc,
src/GappedXdropAligner.hh, src/GappedXdropAligner3frame.cc,
src/alp/njn_random.cpp, src/makefile:
Make lastal gapped alignment a bit faster
[5506d2756b47]
* src/LastEvaluer.hh, test/last-test.out, test/last-test.sh:
Fix lastal crash on empty lastdb database
[e5c59d1ed3a6]
2019-10-09 Martin C. Frith <Martin C. Frith>
* data/BL80.mat, data/PAM10.mat, data/PAM30.mat:
Add PAM10 substitution score matrix
[e57e547b8235]
2019-10-08 Martin C. Frith <Martin C. Frith>
* src/lastal.cc, src/lastdb.cc:
Omit non-alphabet letters for DB seq lengths
[7f841cc97f20]
* src/SubsetMinimizerFinder.cc, src/SubsetMinimizerFinder.hh,
src/SubsetSuffixArray.cc, src/lastal.cc:
Refactor
[4011a21f96ad]
2019-10-07 Martin C. Frith <Martin C. Frith>
* src/ScoreMatrix.cc, src/ScoreMatrix.hh, src/lastal.cc,
src/mcf_substitution_matrix_stats.cc,
src/mcf_substitution_matrix_stats.hh:
Allow score matrix with letter frequencies
[50bcc51ca59b]
* src/CyclicSubsetSeed.hh, src/ScoreMatrix.cc, src/ScoreMatrix.hh,
src/ScoreMatrixRow.hh, src/makefile:
Refactor
[ca6613dd1f06]
* src/LastEvaluer.cc, src/LastEvaluer.hh, src/lastal.cc:
Enable stored E-value params for any genetic code
[f757e6ca4edb]
* build/gc-inc.sh, data/gc.prt, doc/lastal.txt, src/GeneticCode.cc,
src/GeneticCode.hh, src/LastalArguments.cc, src/lastal.cc,
src/makefile, test/last-test.out, test/last-test.sh:
Add NCBI genetic codes
[88adecc284af]
* doc/last-evalues.txt, src/LastEvaluer.cc, src/LastEvaluer.hh,
src/LastalArguments.cc, src/LastalArguments.hh, src/lastal.cc,
src/lastdb.cc, test/last-test.out:
Improve E-values for multiple short DB sequences
[49caf55d4d4d]
2019-08-21 Martin C. Frith <Martin C. Frith> 2019-08-21 Martin C. Frith <Martin C. Frith>
* doc/last-dotplot.txt, scripts/last-dotplot: * doc/last-dotplot.txt, scripts/last-dotplot:
dotplot: change satellite-repeat color to purple dotplot: change satellite-repeat color to purple
[fb43f850eba2] [tip] [fb43f850eba2]
2019-07-16 Martin C. Frith <Martin C. Frith> 2019-07-16 Martin C. Frith <Martin C. Frith>
......
#! /bin/sh
# This generates source code from genetic codes.
cat <<EOF
const struct {
const char *name;
const char *text;
} geneticCodes[] = {
EOF
cat "$@" | tr -d '",' |
awk '
$1 == "id" {print "{\"" $2 "\", \"\\"}
$1 == "ncbieaa" {print " AAs = " $2 "\\n\\"}
/-- Base/ {print $2 " = " $3 "\\n\\"}
/-- Base3/ {print "\"},"}
'
echo "};"
# This protein scoring scheme is good at finding somewhat # This protein scoring scheme is good for finding somewhat strong, and
# short-and-strong similarities. (S Henikoff & JG Henikoff, PNAS 1992 # short, similarities. (S Henikoff & JG Henikoff, PNAS 1992
# 89(22):10915-9). # 89(22):10915-9).
#nickname BLOSUM80 #nickname BLOSUM80
......
# This protein scoring scheme is good for finding very strong, and
# short, similarities (MO Dayhoff et al. 1978).
#last -a20 -b3
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
# This protein scoring scheme is good for finding short-and-strong # This protein scoring scheme is good for finding strong, and short,
# similarities (MO Dayhoff et al. 1978). # similarities (MO Dayhoff et al. 1978).
#last -a13 -b3 #last -a13 -b3
......
--**************************************************************************
-- This is the NCBI genetic code table
-- Initial base data set from Andrzej Elzanowski while at PIR International
-- Addition of Eubacterial and Alternative Yeast by J.Ostell at NCBI
-- Base 1-3 of each codon have been added as comments to facilitate
-- readability at the suggestion of Peter Rice, EMBL
-- Later additions by Taxonomy Group staff at NCBI
--
-- Version 4.5
-- Added Cephalodiscidae mitochondrial genetic code 33
--
-- Version 4.4
-- Added GTG as start codon for genetic code 3
-- Added Balanophoraceae plastid genetic code 32
--
-- Version 4.3
-- Change to CTG -> Leu in genetic codes 27, 28, 29, 30
--
-- Version 4.2
-- Added Karyorelict nuclear genetic code 27
-- Added Condylostoma nuclear genetic code 28
-- Added Mesodinium nuclear genetic code 29
-- Added Peritrich nuclear genetic code 30
-- Added Blastocrithidia nuclear genetic code 31
--
-- Version 4.1
-- Added Pachysolen tannophilus nuclear genetic code 26
--
-- Version 4.0
-- Updated version to reflect numerous undocumented changes:
-- Corrected start codons for genetic code 25
-- Name of new genetic code is Candidate Division SR1 and Gracilibacteria
-- Added candidate division SR1 nuclear genetic code 25
-- Added GTG as start codon for genetic code 24
-- Corrected Pterobranchia Mitochondrial genetic code (24)
-- Added genetic code 24, Pterobranchia Mitochondrial
-- Genetic code 11 is now Bacterial, Archaeal and Plant Plastid
-- Fixed capitalization of mitochondrial in codes 22 and 23
-- Added GTG, ATA, and TTG as alternative start codons to code 13
--
-- Version 3.9
-- Code 14 differs from code 9 only by translating UAA to Tyr rather than
-- STOP. A recent study (Telford et al, 2000) has found no evidence that
-- the codon UAA codes for Tyr in the flatworms, but other opinions exist.
-- There are very few GenBank records that are translated with code 14,
-- but a test translation shows that retranslating these records with code
-- 9 can cause premature terminations. Therefore, GenBank will maintain
-- code 14 until further information becomes available.
--
-- Version 3.8
-- Added GTG start to Echinoderm mitochondrial code, code 9
--
-- Version 3.7
-- Added code 23 Thraustochytrium mitochondrial code
-- formerly OGMP code 93
-- submitted by Gertraude Berger, Ph.D.
--
-- Version 3.6
-- Added code 22 TAG-Leu, TCA-stop
-- found in mitochondrial DNA of Scenedesmus obliquus
-- submitted by Gertraude Berger, Ph.D.
-- Organelle Genome Megasequencing Program, Univ Montreal
--
-- Version 3.5
-- Added code 21, Trematode Mitochondrial
-- (as deduced from: Garey & Wolstenholme,1989; Ohama et al, 1990)
-- Added code 16, Chlorophycean Mitochondrial
-- (TAG can translated to Leucine instaed to STOP in chlorophyceans
-- and fungi)
--
-- Version 3.4
-- Added CTG,TTG as allowed alternate start codons in Standard code.
-- Prats et al. 1989, Hann et al. 1992
--
-- Version 3.3 - 10/13/95
-- Added alternate intiation codon ATC to code 5
-- based on complete mitochondrial genome of honeybee
-- Crozier and Crozier (1993)
--
-- Version 3.2 - 6/24/95
-- Code Comments
-- 10 Alternative Ciliate Macronuclear renamed to Euplotid Macro...
-- 15 Blepharisma Macro.. code added
-- 5 Invertebrate Mito.. GTG allowed as alternate initiator
-- 11 Eubacterial renamed to Bacterial as most alternate starts
-- have been found in Archea
--
--
-- Version 3.1 - 1995
-- Updated as per Andrzej Elzanowski at NCBI
-- Complete documentation in NCBI toolkit documentation
-- Note: 2 genetic codes have been deleted
--
-- Old id Use id - Notes
--
-- id 7 id 4 - Kinetoplast code now merged in code id 4
-- id 8 id 1 - all plant chloroplast differences due to RNA edit
--
--
--*************************************************************************
Genetic-code-table ::= {
{
name "Standard" ,
name "SGC0" ,
id 1 ,
ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Vertebrate Mitochondrial" ,
name "SGC1" ,
id 2 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
sncbieaa "----------**--------------------MMMM----------**---M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Yeast Mitochondrial" ,
name "SGC2" ,
id 3 ,
ncbieaa "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**----------------------MM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate
Mitochondrial; Mycoplasma; Spiroplasma" ,
name "SGC3" ,
id 4 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--MM------**-------M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Invertebrate Mitochondrial" ,
name "SGC4" ,
id 5 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",
sncbieaa "---M------**--------------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear" ,
name "SGC5" ,
id 6 ,
ncbieaa "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Echinoderm Mitochondrial; Flatworm Mitochondrial" ,
name "SGC8" ,
id 9 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Euplotid Nuclear" ,
name "SGC9" ,
id 10 ,
ncbieaa "FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Bacterial, Archaeal and Plant Plastid" ,
id 11 ,
ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**--*----M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Alternative Yeast Nuclear" ,
id 12 ,
ncbieaa "FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Ascidian Mitochondrial" ,
id 13 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG",
sncbieaa "---M------**----------------------MM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
name "Alternative Flatworm Mitochondrial" ,
id 14 ,
ncbieaa "FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "-----------*-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Blepharisma Macronuclear" ,
id 15 ,
ncbieaa "FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Chlorophycean Mitochondrial" ,
id 16 ,
ncbieaa "FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Trematode Mitochondrial" ,
id 21 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Scenedesmus obliquus Mitochondrial" ,
id 22 ,
ncbieaa "FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "------*---*---*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Thraustochytrium Mitochondrial" ,
id 23 ,
ncbieaa "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--*-------**--*-----------------M--M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Pterobranchia Mitochondrial" ,
id 24 ,
ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG",
sncbieaa "---M------**-------M---------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Candidate Division SR1 and Gracilibacteria" ,
id 25 ,
ncbieaa "FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------**-----------------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Pachysolen tannophilus Nuclear" ,
id 26 ,
ncbieaa "FFLLSSSSYY**CC*WLLLAPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*----M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Karyorelict Nuclear" ,
id 27 ,
ncbieaa "FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Condylostoma Nuclear" ,
id 28 ,
ncbieaa "FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**--*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Mesodinium Nuclear" ,
id 29 ,
ncbieaa "FFLLSSSSYYYYCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Peritrich Nuclear" ,
id 30 ,
ncbieaa "FFLLSSSSYYEECC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Blastocrithidia Nuclear" ,
id 31 ,
ncbieaa "FFLLSSSSYYEECCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "----------**-----------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Balanophoraceae Plastid" ,
id 32 ,
ncbieaa "FFLLSSSSYY*WCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M------*---*----M------------MMMM---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
} ,
{
name "Cephalodiscidae Mitochondrial" ,
id 33 ,
ncbieaa "FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG",
sncbieaa "---M-------*-------M---------------M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}
}
...@@ -334,7 +334,7 @@ expected number of alignments with greater or equal score, between two ...@@ -334,7 +334,7 @@ expected number of alignments with greater or equal score, between two
randomly-shuffled sequences of length 1 billion each.</p> randomly-shuffled sequences of length 1 billion each.</p>
<p><strong>E</strong> is the expected number of alignments with greater or equal <p><strong>E</strong> is the expected number of alignments with greater or equal
score, between: a random sequence with the same length as the query score, between: a random sequence with the same length as the query
sequence, and a random sequence with the same length as the database.</p> sequence, and random sequences with the same length as the database.</p>
<p>In this example, the two alignments are identical, but their &quot;E&quot; <p>In this example, the two alignments are identical, but their &quot;E&quot;
values are not. That is because the query sequences have different values are not. That is because the query sequences have different
lengths: human.chrX is longer than human.chrY.</p> lengths: human.chrX is longer than human.chrY.</p>
...@@ -375,7 +375,7 @@ are fixed at 0.25.</p> ...@@ -375,7 +375,7 @@ are fixed at 0.25.</p>
<h3>Calculation of EG2</h3> <h3>Calculation of EG2</h3>
<p>EG2 is calculated from the score like this:</p> <p>EG2 is calculated from the score like this:</p>
<pre class="literal-block"> <pre class="literal-block">
EG2 = K * (1 billion) * (1 billion) * exp(-lambda * score) EG2 = K * exp(-lambda * score) * (1 billion) * (1 billion)
</pre> </pre>
<p>The parameters lambda and K are printed in the header of lastal's <p>The parameters lambda and K are printed in the header of lastal's
output.</p> output.</p>
...@@ -384,21 +384,30 @@ output.</p> ...@@ -384,21 +384,30 @@ output.</p>
<h3>Calculation of E</h3> <h3>Calculation of E</h3>
<p>E is calculated by this formula:</p> <p>E is calculated by this formula:</p>
<pre class="literal-block"> <pre class="literal-block">
E = K * area * exp(-lambda * score) * (number of strands) E = K * exp(-lambda * score) * strands * area(m, n, score) * N / n,
</pre> </pre>
<p>This depends on the number of strands that are searched, either 1 or <p>where:</p>
2.</p> <ul>
<p>The area is approximately (database length) * (query length). However, <li><p class="first"><tt class="docutils literal">strands</tt> is either 1 or 2 (if forward and reverse DNA strands are
if the query (or database) is very short, this formula over-estimates searched).</p>
E. LAST uses a &quot;finite-size correction&quot; to calculate E more </li>
accurately.</p> <li><p class="first"><tt class="docutils literal">m</tt> is the length of the query sequence.</p>
</li>
<li><p class="first"><tt class="docutils literal">n</tt> is the length of the longest sequence in the database.</p>
</li>
<li><p class="first"><tt class="docutils literal">N</tt> is the total length of all database sequences.</p>
</li>
<li><p class="first"><tt class="docutils literal">area(m, n, score)</tt> is <a class="reference external" href="https://doi.org/10.1186/1756-0500-5-286">slightly less than</a> <tt class="docutils literal">m * n</tt>.</p>
</li>
</ul>
</div> </div>
<div class="section" id="effect-of-option-d"> <div class="section" id="effect-of-option-d">
<h3>Effect of option -D</h3> <h3>Effect of option -D</h3>
<p>Option -D indirectly sets the EG2 threshold, via this formula:</p> <p>Option -D sets the minimum alignment score, by this formula:</p>
<pre class="literal-block"> <pre class="literal-block">
1 = EG2 * (database length)/1e9 * (option -D)/1e9 * (number of strands) K * exp(-lambda * score) * strands * D * length(n, score) * N / n ≤ 1,
</pre> </pre>
<p>where <tt class="docutils literal">length(n, score)</tt> is slightly less than <tt class="docutils literal">n</tt>.</p>
</div> </div>
<div class="section" id="bit-score"> <div class="section" id="bit-score">
<h3>Bit score</h3> <h3>Bit score</h3>
......
...@@ -18,7 +18,7 @@ randomly-shuffled sequences of length 1 billion each. ...@@ -18,7 +18,7 @@ randomly-shuffled sequences of length 1 billion each.
**E** is the expected number of alignments with greater or equal **E** is the expected number of alignments with greater or equal
score, between: a random sequence with the same length as the query score, between: a random sequence with the same length as the query
sequence, and a random sequence with the same length as the database. sequence, and random sequences with the same length as the database.
In this example, the two alignments are identical, but their "E" In this example, the two alignments are identical, but their "E"
values are not. That is because the query sequences have different values are not. That is because the query sequences have different
...@@ -67,7 +67,7 @@ Calculation of EG2 ...@@ -67,7 +67,7 @@ Calculation of EG2
EG2 is calculated from the score like this:: EG2 is calculated from the score like this::
EG2 = K * (1 billion) * (1 billion) * exp(-lambda * score) EG2 = K * exp(-lambda * score) * (1 billion) * (1 billion)
The parameters lambda and K are printed in the header of lastal's The parameters lambda and K are printed in the header of lastal's
output. output.
...@@ -77,22 +77,26 @@ Calculation of E ...@@ -77,22 +77,26 @@ Calculation of E
E is calculated by this formula:: E is calculated by this formula::
E = K * area * exp(-lambda * score) * (number of strands) E = K * exp(-lambda * score) * strands * area(m, n, score) * N / n,
This depends on the number of strands that are searched, either 1 or where:
2.
The area is approximately (database length) * (query length). However, * ``strands`` is either 1 or 2 (if forward and reverse DNA strands are
if the query (or database) is very short, this formula over-estimates searched).
E. LAST uses a "finite-size correction" to calculate E more * ``m`` is the length of the query sequence.
accurately. * ``n`` is the length of the longest sequence in the database.
* ``N`` is the total length of all database sequences.
* ``area(m, n, score)`` is `slightly less than
<https://doi.org/10.1186/1756-0500-5-286>`_ ``m * n``.
Effect of option -D Effect of option -D
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Option -D indirectly sets the EG2 threshold, via this formula:: Option -D sets the minimum alignment score, by this formula::
1 = EG2 * (database length)/1e9 * (option -D)/1e9 * (number of strands) K * exp(-lambda * score) * strands * D * length(n, score) * N / n ≤ 1,
where ``length(n, score)`` is slightly less than ``n``.
Bit score Bit score
~~~~~~~~~ ~~~~~~~~~
......
...@@ -419,8 +419,8 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -4 ...@@ -419,8 +419,8 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -4
</div> </div>
<div class="section" id="bl80-or-blosum80"> <div class="section" id="bl80-or-blosum80">
<h2>BL80 or BLOSUM80</h2> <h2>BL80 or BLOSUM80</h2>
<p>This protein scoring scheme is good at finding somewhat <p>This protein scoring scheme is good for finding somewhat strong, and
short-and-strong similarities. (S Henikoff &amp; JG Henikoff, PNAS 1992 short, similarities. (S Henikoff &amp; JG Henikoff, PNAS 1992
89(22):10915-9). 89(22):10915-9).
It uses this matrix:</p> It uses this matrix:</p>
<pre class="literal-block"> <pre class="literal-block">
...@@ -504,9 +504,44 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -6 -6 ...@@ -504,9 +504,44 @@ X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -6 -6
<p>It sets these default lastal parameter values: <p>It sets these default lastal parameter values:
-a13 -b2</p> -a13 -b2</p>
</div> </div>
<div class="section" id="pam10">
<h2>PAM10</h2>
<p>This protein scoring scheme is good for finding very strong, and
short, similarities (MO Dayhoff et al. 1978).
It uses this matrix:</p>
<pre class="literal-block">
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
</pre>
<p>It sets these default lastal parameter values:
-a20 -b3</p>
</div>
<div class="section" id="pam30"> <div class="section" id="pam30">
<h2>PAM30</h2> <h2>PAM30</h2>
<p>This protein scoring scheme is good for finding short-and-strong <p>This protein scoring scheme is good for finding strong, and short,
similarities (MO Dayhoff et al. 1978). similarities (MO Dayhoff et al. 1978).
It uses this matrix:</p> It uses this matrix:</p>
<pre class="literal-block"> <pre class="literal-block">
......
...@@ -107,8 +107,8 @@ It uses this matrix:: ...@@ -107,8 +107,8 @@ It uses this matrix::
BL80 or BLOSUM80 BL80 or BLOSUM80
---------------- ----------------
This protein scoring scheme is good at finding somewhat This protein scoring scheme is good for finding somewhat strong, and
short-and-strong similarities. (S Henikoff & JG Henikoff, PNAS 1992 short, similarities. (S Henikoff & JG Henikoff, PNAS 1992
89(22):10915-9). 89(22):10915-9).
It uses this matrix:: It uses this matrix::
...@@ -193,10 +193,46 @@ It uses this matrix:: ...@@ -193,10 +193,46 @@ It uses this matrix::
It sets these default lastal parameter values: It sets these default lastal parameter values:
-a13 -b2 -a13 -b2
PAM10
-----
This protein scoring scheme is good for finding very strong, and
short, similarities (MO Dayhoff et al. 1978).
It uses this matrix::
A R N D C Q E G H I L K M F P S T W Y V B Z X *
A 7 -10 -7 -6 -10 -7 -5 -4 -11 -8 -9 -10 -8 -12 -4 -3 -3 -20 -11 -5 -6 -6 -6 -23
R -10 9 -9 -17 -11 -4 -15 -13 -4 -8 -12 -2 -7 -12 -7 -6 -10 -5 -14 -11 -11 -7 -9 -23
N -7 -9 9 -1 -17 -7 -5 -6 -2 -8 -10 -4 -15 -12 -9 -2 -5 -11 -7 -12 7 -6 -6 -23
D -6 -17 -1 8 -21 -6 0 -6 -7 -11 -19 -8 -17 -21 -12 -7 -8 -21 -17 -11 7 -1 -9 -23
C -10 -11 -17 -21 10 -20 -20 -13 -10 -9 -21 -20 -20 -19 -11 -6 -11 -22 -7 -9 -18 -20 -13 -23
Q -7 -4 -7 -6 -20 9 -1 -10 -2 -11 -8 -6 -7 -19 -6 -8 -9 -19 -18 -10 -6 7 -8 -23
E -5 -15 -5 0 -20 -1 8 -7 -9 -8 -13 -7 -10 -20 -9 -7 -9 -23 -11 -10 -1 7 -8 -23
G -4 -13 -6 -6 -13 -10 -7 7 -13 -17 -14 -10 -12 -12 -10 -4 -10 -21 -20 -9 -6 -8 -8 -23
H -11 -4 -2 -7 -10 -2 -9 -13 10 -13 -9 -10 -17 -9 -7 -9 -11 -10 -6 -9 -4 -4 -8 -23
I -8 -8 -8 -11 -9 -11 -8 -17 -13 9 -4 -9 -3 -5 -12 -10 -5 -20 -9 -1 -9 -9 -8 -23
L -9 -12 -10 -19 -21 -8 -13 -14 -9 -4 7 -11 -2 -5 -10 -12 -10 -9 -10 -5 -12 -10 -9 -23
K -10 -2 -4 -8 -20 -6 -7 -10 -10 -9 -11 7 -4 -20 -10 -7 -6 -18 -12 -13 -5 -6 -8 -23
M -8 -7 -15 -17 -20 -7 -10 -12 -17 -3 -2 -4 12 -7 -11 -8 -7 -19 -17 -4 -16 -8 -9 -23
F -12 -12 -12 -21 -19 -19 -20 -12 -9 -5 -5 -20 -7 9 -13 -9 -12 -7 -1 -12 -14 -20 -12 -23
P -4 -7 -9 -12 -11 -6 -9 -10 -7 -12 -10 -10 -11 -13 8 -4 -7 -20 -20 -9 -10 -7 -8 -23
S -3 -6 -2 -7 -6 -8 -7 -4 -9 -10 -12 -7 -8 -9 -4 7 -2 -8 -10 -10 -4 -8 -6 -23
T -3 -10 -5 -8 -11 -9 -9 -10 -11 -5 -10 -6 -7 -12 -7 -2 8 -19 -9 -6 -6 -9 -7 -23
W -20 -5 -11 -21 -22 -19 -23 -21 -10 -20 -9 -18 -19 -7 -20 -8 -19 13 -8 -22 -13 -21 -16 -23
Y -11 -14 -7 -17 -7 -18 -11 -20 -6 -9 -10 -12 -17 -1 -20 -10 -9 -8 10 -10 -9 -13 -11 -23
V -5 -11 -12 -11 -9 -10 -10 -9 -9 -1 -5 -13 -4 -12 -9 -10 -6 -22 -10 8 -11 -10 -8 -23
B -6 -11 7 7 -18 -6 -1 -6 -4 -9 -12 -5 -16 -14 -10 -4 -6 -13 -9 -11 7 -3 -8 -23
Z -6 -7 -6 -1 -20 7 7 -8 -4 -9 -10 -6 -8 -20 -7 -8 -9 -21 -13 -10 -3 7 -8 -23
X -6 -9 -6 -9 -13 -8 -8 -8 -8 -8 -9 -8 -9 -12 -8 -6 -7 -16 -11 -8 -8 -8 -8 -23
* -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 -23 1
It sets these default lastal parameter values:
-a20 -b3
PAM30 PAM30
----- -----
This protein scoring scheme is good for finding short-and-strong This protein scoring scheme is good for finding strong, and short,
similarities (MO Dayhoff et al. 1978). similarities (MO Dayhoff et al. 1978).
It uses this matrix:: It uses this matrix::
......
...@@ -445,7 +445,9 @@ alignments: they are described in more detail at <a class="reference external" h ...@@ -445,7 +445,9 @@ alignments: they are described in more detail at <a class="reference external" h
<td>Maximum expected alignments per square giga. (See <a class="reference external" href="last-evalues.html">here</a>.)</td></tr> <td>Maximum expected alignments per square giga. (See <a class="reference external" href="last-evalues.html">here</a>.)</td></tr>
<tr><td class="option-group"> <tr><td class="option-group">
<kbd><span class="option">-s <var>NUMBER</var></span></kbd></td> <kbd><span class="option">-s <var>NUMBER</var></span></kbd></td>
<td>Which query strand to use: 0=reverse, 1=forward, 2=both.</td></tr> <td>Which query strand to use: 0=reverse, 1=forward, 2=both.
If specified, this parameter is written in last-train's
output, so it will override lastal's default.</td></tr>
<tr><td class="option-group"> <tr><td class="option-group">
<kbd><span class="option">-S <var>NUMBER</var></span></kbd></td> <kbd><span class="option">-S <var>NUMBER</var></span></kbd></td>
<td><p class="first">Specify how to use the substitution score matrix for <td><p class="first">Specify how to use the substitution score matrix for
...@@ -513,6 +515,36 @@ output, so it will override lastal's default.</p> ...@@ -513,6 +515,36 @@ output, so it will override lastal's default.</p>
</blockquote> </blockquote>
</div> </div>
</div> </div>
<div class="section" id="details">
<h2>Details</h2>
<ul>
<li><p class="first">last-train (and lastal) uses &quot;Model A&quot;, in Figure 5A of <a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>.</p>
</li>
<li><p class="first">last-train (and lastal) converts between path and alignment
parameters as in Supplementary Section 3.1 of <a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>.</p>
</li>
<li><p class="first">last-train converts probabilities to scores using an arbitrary scale
factor. (If all the scores are multiplied by any positive constant,
it makes no difference to alignment.) This scale is determined by
the initial score parameters.</p>
</li>
<li><p class="first">last-train uses parameters with &quot;homogeneous letter probabilities&quot;
and &quot;balanced length probability&quot; (<a class="reference external" href="https://doi.org/10.1093/bioinformatics/btz576">btz576</a>).</p>
</li>
<li><p class="first">last-train rounds the scores to integers, which makes them slightly
inaccurate. It then finds an adjusted scale factor (without
changing the scores), which makes the integer-rounded scores
correspond to homogeneous letter probabilities and balanced length
probability. It writes this adjusted scale as a &quot;-t&quot; option for
lastal, e.g. &quot;-t4.4363&quot;.</p>
</li>
<li><p class="first">In rare cases, it may be impossible to find such an adjusted scale
factor. If that happens, last-train doubles the original scale (to
reduce the inaccuracy of integer rounding), until the problem goes
away.</p>
</li>
</ul>
</div>
<div class="section" id="bugs"> <div class="section" id="bugs">
<h2>Bugs</h2> <h2>Bugs</h2>
<ul> <ul>
......
...@@ -82,6 +82,8 @@ Alignment options ...@@ -82,6 +82,8 @@ Alignment options
-E EG2 Maximum expected alignments per square giga. (See `here -E EG2 Maximum expected alignments per square giga. (See `here
<last-evalues.html>`_.) <last-evalues.html>`_.)
-s NUMBER Which query strand to use: 0=reverse, 1=forward, 2=both. -s NUMBER Which query strand to use: 0=reverse, 1=forward, 2=both.
If specified, this parameter is written in last-train's
output, so it will override lastal's default.
-S NUMBER Specify how to use the substitution score matrix for -S NUMBER Specify how to use the substitution score matrix for
reverse strands. If you use ``--revsym``, this makes no reverse strands. If you use ``--revsym``, this makes no
difference. "0" means that the matrix is used as-is for difference. "0" means that the matrix is used as-is for
...@@ -133,6 +135,36 @@ Alignment options ...@@ -133,6 +135,36 @@ Alignment options
If specified, this parameter is written in last-train's If specified, this parameter is written in last-train's
output, so it will override lastal's default. output, so it will override lastal's default.
Details
-------
* last-train (and lastal) uses "Model A", in Figure 5A of btz576_.
* last-train (and lastal) converts between path and alignment
parameters as in Supplementary Section 3.1 of btz576_.
* last-train converts probabilities to scores using an arbitrary scale
factor. (If all the scores are multiplied by any positive constant,
it makes no difference to alignment.) This scale is determined by
the initial score parameters.
* last-train uses parameters with "homogeneous letter probabilities"
and "balanced length probability" (btz576_).
* last-train rounds the scores to integers, which makes them slightly
inaccurate. It then finds an adjusted scale factor (without
changing the scores), which makes the integer-rounded scores
correspond to homogeneous letter probabilities and balanced length
probability. It writes this adjusted scale as a "-t" option for
lastal, e.g. "-t4.4363".
* In rare cases, it may be impossible to find such an adjusted scale
factor. If that happens, last-train doubles the original scale (to
reduce the inaccuracy of integer rounding), until the problem goes
away.
.. _btz576: https://doi.org/10.1093/bioinformatics/btz576
Bugs Bugs
---- ----
......
...@@ -769,12 +769,14 @@ this core</p> ...@@ -769,12 +769,14 @@ this core</p>
<p class="last">Use -w0 to turn this off.</p> <p class="last">Use -w0 to turn this off.</p>
</td></tr> </td></tr>
<tr><td class="option-group"> <tr><td class="option-group">
<kbd><span class="option">-G <var>FILE</var></span></kbd></td> <kbd><span class="option">-G <var>GENETIC-CODE</var></span></kbd></td>
<td>Use an alternative genetic code in the specified file. For an <td>Specify the genetic code for translating DNA to protein. This
example of the format, see vertebrateMito.gc in the examples
directory. By default, the standard genetic code is used. This
option has no effect unless DNA-versus-protein alignment is option has no effect unless DNA-versus-protein alignment is
selected with option -F.</td></tr> selected with option -F. Codes are specified by numbers
(e.g. 1 = standard, 2 = vertebrate mitochondrial), listed here:
<a class="reference external" href="https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi">https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi</a>. Any
other GENETIC-CODE is assumed to be a file name: for an example
of the format, see vertebrateMito.gc in the examples directory.</td></tr>
<tr><td class="option-group"> <tr><td class="option-group">
<kbd><span class="option">-t <var>TEMPERATURE</var></span></kbd></td> <kbd><span class="option">-t <var>TEMPERATURE</var></span></kbd></td>
<td>Parameter for converting between scores and probability ratios. <td>Parameter for converting between scores and probability ratios.
...@@ -824,9 +826,9 @@ with &quot;c&quot; for each alignment. The first 16 numbers on this line ...@@ -824,9 +826,9 @@ with &quot;c&quot; for each alignment. The first 16 numbers on this line
are the expected counts of matches and mismatches: first the are the expected counts of matches and mismatches: first the
count of reference As aligned to query As, then the count of count of reference As aligned to query As, then the count of
reference As aligned to query Cs, and so on. For proteins there reference As aligned to query Cs, and so on. For proteins there
will be 400 such numbers. The final ten numbers are expected will be 400 such numbers. The next 5 numbers are expected
counts related to gaps. They are:</p> counts related to gaps. They are:</p>
<ul> <ul class="last">
<li><p class="first">The count of matches plus mismatches. (This may exceed the <li><p class="first">The count of matches plus mismatches. (This may exceed the
total of the preceding numbers, if the sequences have non-ACGT total of the preceding numbers, if the sequences have non-ACGT
letters.)</p> letters.)</p>
...@@ -839,20 +841,6 @@ letters.)</p> ...@@ -839,20 +841,6 @@ letters.)</p>
</li> </li>
<li><p class="first">The count of insert opens (= count of insert closes).</p> <li><p class="first">The count of insert opens (= count of insert closes).</p>
</li> </li>
<li><p class="first">The count of adjacent pairs of insertions and deletions.</p>
</li>
</ul>
<p>The final four numbers are always zero, unless you use
generalized affine gap costs. They are:</p>
<ul class="last">
<li><p class="first">The count of unaligned letter pairs.</p>
</li>
<li><p class="first">The count of unaligned letter pair opens (= count of closes).</p>
</li>
<li><p class="first">The count of adjacent pairs of deletions and unaligned letter pairs.</p>
</li>
<li><p class="first">The count of adjacent pairs of insertions and unaligned letter pairs.</p>
</li>
</ul> </ul>
</td></tr> </td></tr>
<tr><td class="option-group"> <tr><td class="option-group">
......
...@@ -400,12 +400,14 @@ Miscellaneous options ...@@ -400,12 +400,14 @@ Miscellaneous options
Use -w0 to turn this off. Use -w0 to turn this off.
-G FILE -G GENETIC-CODE
Use an alternative genetic code in the specified file. For an Specify the genetic code for translating DNA to protein. This
example of the format, see vertebrateMito.gc in the examples
directory. By default, the standard genetic code is used. This
option has no effect unless DNA-versus-protein alignment is option has no effect unless DNA-versus-protein alignment is
selected with option -F. selected with option -F. Codes are specified by numbers
(e.g. 1 = standard, 2 = vertebrate mitochondrial), listed here:
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. Any
other GENETIC-CODE is assumed to be a file name: for an example
of the format, see vertebrateMito.gc in the examples directory.
-t TEMPERATURE -t TEMPERATURE
Parameter for converting between scores and probability ratios. Parameter for converting between scores and probability ratios.
...@@ -461,7 +463,7 @@ Miscellaneous options ...@@ -461,7 +463,7 @@ Miscellaneous options
are the expected counts of matches and mismatches: first the are the expected counts of matches and mismatches: first the
count of reference As aligned to query As, then the count of count of reference As aligned to query As, then the count of
reference As aligned to query Cs, and so on. For proteins there reference As aligned to query Cs, and so on. For proteins there
will be 400 such numbers. The final ten numbers are expected will be 400 such numbers. The next 5 numbers are expected
counts related to gaps. They are: counts related to gaps. They are:
* The count of matches plus mismatches. (This may exceed the * The count of matches plus mismatches. (This may exceed the
...@@ -471,15 +473,6 @@ Miscellaneous options ...@@ -471,15 +473,6 @@ Miscellaneous options
* The count of inserted letters. * The count of inserted letters.
* The count of delete opens (= count of delete closes). * The count of delete opens (= count of delete closes).
* The count of insert opens (= count of insert closes). * The count of insert opens (= count of insert closes).
* The count of adjacent pairs of insertions and deletions.
The final four numbers are always zero, unless you use
generalized affine gap costs. They are:
* The count of unaligned letter pairs.
* The count of unaligned letter pair opens (= count of closes).
* The count of adjacent pairs of deletions and unaligned letter pairs.
* The count of adjacent pairs of insertions and unaligned letter pairs.
-Q NUMBER -Q NUMBER
Specify how to read the query sequences:: Specify how to read the query sequences::
......
This diff is collapsed.
CXXFLAGS = -O3 -std=c++11 -pthread -DHAS_CXX_THREADS CXXFLAGS = -msse4.1 -O3 -std=c++11 -pthread -DHAS_CXX_THREADS
all: all:
@cd src && $(MAKE) CXXFLAGS="$(CXXFLAGS)" @cd src && $(MAKE) CXXFLAGS="$(CXXFLAGS)"
......
#! /usr/bin/env python #! /usr/bin/env python
# Copyright 2015 Martin C. Frith # Copyright 2015 Martin C. Frith
# References:
# [Fri19] How sequence alignment scores correspond to probability models,
# MC Frith, Bioinformatics, 2019.
from __future__ import print_function from __future__ import print_function
import gzip import gzip
import math, optparse, os, random, signal, subprocess, sys, tempfile import math
import optparse
import os
import random
import signal
import subprocess
import sys
import tempfile
def myOpen(fileName): # faster than fileinput def myOpen(fileName): # faster than fileinput
if fileName == "-": if fileName == "-":
...@@ -13,6 +24,38 @@ def myOpen(fileName): # faster than fileinput ...@@ -13,6 +24,38 @@ def myOpen(fileName): # faster than fileinput
return gzip.open(fileName, "rt") # xxx dubious for Python2 return gzip.open(fileName, "rt") # xxx dubious for Python2
return open(fileName) return open(fileName)
def rootOfIncreasingFunction(func, lowerBound, upperBound, *args):
# Find x such that func(x, *args) == 0
while True:
mid = 0.5 * (lowerBound + upperBound)
if mid <= lowerBound or mid >= upperBound:
return mid
if func(mid, *args) < 0:
lowerBound = mid
else:
upperBound = mid
def homogeneousLetterFreqs(scale, matScores):
# Solve the simultaneous equations in Section 2.1 of [Fri19]
expMat = [[math.exp(j / scale) for j in i] for i in matScores]
m = [row[:] + [1.0] for row in expMat] # augmented matrix
n = len(expMat)
for k in range(n):
iMax = k
for i in range(k, n):
if abs(m[i][k]) > abs(m[iMax][k]):
iMax = i
if iMax > k:
m[k], m[iMax] = m[iMax], m[k]
if abs(m[k][k]) <= 0:
raise ArithmeticError("singular matrix")
for i in range(n):
if i != k:
mul = m[i][k] / m[k][k]
for j in range(k + 1, n + 1):
m[i][j] -= m[k][j] * mul
return [m[k][n] / m[k][k] for k in range(n)]
def randomSample(things, sampleSize): def randomSample(things, sampleSize):
"""Randomly get sampleSize things (or all if fewer).""" """Randomly get sampleSize things (or all if fewer)."""
reservoir = [] # "reservoir sampling" algorithm reservoir = [] # "reservoir sampling" algorithm
...@@ -133,6 +176,7 @@ def scaleFromHeader(lines): ...@@ -133,6 +176,7 @@ def scaleFromHeader(lines):
raise Exception("couldn't read the scale") raise Exception("couldn't read the scale")
def countsFromLastOutput(lines, opts): def countsFromLastOutput(lines, opts):
numTransitionCounts = 5
matrix = [] matrix = []
# use +1 pseudocounts as a kludge to mitigate numerical problems: # use +1 pseudocounts as a kludge to mitigate numerical problems:
matches = 1.0 matches = 1.0
...@@ -146,11 +190,12 @@ def countsFromLastOutput(lines, opts): ...@@ -146,11 +190,12 @@ def countsFromLastOutput(lines, opts):
strand = line.split()[4] # slow? strand = line.split()[4] # slow?
if line[0] == "c": if line[0] == "c":
c = [float(i) for i in line.split()[1:]] c = [float(i) for i in line.split()[1:]]
transCounts = c[-numTransitionCounts:]
if not matrix: if not matrix:
matrixSize = int(math.sqrt(len(c) - 10)) matrixSize = int(math.sqrt(len(c) - numTransitionCounts))
matrix = [[1.0] * matrixSize for i in range(matrixSize)] matrix = [[1.0] * matrixSize for i in range(matrixSize)]
identities = sum(c[i * matrixSize + i] for i in range(matrixSize)) identities = sum(c[i * matrixSize + i] for i in range(matrixSize))
alignmentLength = c[-10] + c[-9] + c[-8] alignmentLength = transCounts[0] + transCounts[1] + transCounts[2]
if 100 * identities > opts.pid * alignmentLength: continue if 100 * identities > opts.pid * alignmentLength: continue
for i in range(matrixSize): for i in range(matrixSize):
for j in range(matrixSize): for j in range(matrixSize):
...@@ -158,11 +203,11 @@ def countsFromLastOutput(lines, opts): ...@@ -158,11 +203,11 @@ def countsFromLastOutput(lines, opts):
matrix[i][j] += c[i * matrixSize + j] matrix[i][j] += c[i * matrixSize + j]
else: else:
matrix[-1-i][-1-j] += c[i * matrixSize + j] matrix[-1-i][-1-j] += c[i * matrixSize + j]
matches += c[-10] matches += transCounts[0]
deletes += c[-9] deletes += transCounts[1]
inserts += c[-8] inserts += transCounts[2]
delOpens += c[-7] delOpens += transCounts[3]
insOpens += c[-6] insOpens += transCounts[4]
alignments += 1 alignments += 1
gapCounts = matches, deletes, inserts, delOpens, insOpens, alignments gapCounts = matches, deletes, inserts, delOpens, insOpens, alignments
return matrix, gapCounts return matrix, gapCounts
...@@ -180,10 +225,6 @@ def guessAlphabet(matrixSize): ...@@ -180,10 +225,6 @@ def guessAlphabet(matrixSize):
if matrixSize == 20: return "ACDEFGHIKLMNPQRSTVWY" if matrixSize == 20: return "ACDEFGHIKLMNPQRSTVWY"
raise Exception("can't handle unusual alphabets") raise Exception("can't handle unusual alphabets")
def matrixWithLetters(matrix):
alphabet = guessAlphabet(len(matrix))
return [alphabet] + [[a] + i for a, i in zip(alphabet, matrix)]
def writeMatrixHead(outFile, prefix, alphabet, formatString): def writeMatrixHead(outFile, prefix, alphabet, formatString):
writeWords(outFile, [prefix + " "] + [formatString % k for k in alphabet]) writeWords(outFile, [prefix + " "] + [formatString % k for k in alphabet])
...@@ -206,14 +247,6 @@ def writeScoreMatrix(outFile, matrix, prefix): ...@@ -206,14 +247,6 @@ def writeScoreMatrix(outFile, matrix, prefix):
writeMatrixHead(outFile, prefix, alphabet, "%6s") writeMatrixHead(outFile, prefix, alphabet, "%6s")
writeMatrixBody(outFile, prefix, alphabet, matrix, "%6s") writeMatrixBody(outFile, prefix, alphabet, matrix, "%6s")
def writeMatrixWithLetters(outFile, matrix, prefix):
head = matrix[0]
tail = matrix[1:]
left = [i[0] for i in tail]
body = [i[1:] for i in tail]
writeMatrixHead(outFile, prefix, head, "%6s")
writeMatrixBody(outFile, prefix, left, body, "%6s")
def matProbsFromCounts(counts, opts): def matProbsFromCounts(counts, opts):
r = range(len(counts)) r = range(len(counts))
if opts.revsym: # add complement (reverse strand) substitutions if opts.revsym: # add complement (reverse strand) substitutions
...@@ -237,12 +270,27 @@ def matProbsFromCounts(counts, opts): ...@@ -237,12 +270,27 @@ def matProbsFromCounts(counts, opts):
return probs return probs
def gapProbsFromCounts(counts, opts): def probImbalance(endProb, matchProb, firstDelProb, delExtendProb,
firstInsProb, insExtendProb):
# (RHS - LHS) of Equation (12) in [Fri19]
d = firstDelProb / (endProb - delExtendProb)
i = firstInsProb / (endProb - insExtendProb)
return 1 - matchProb / (endProb * endProb) - d - i
def balancedEndProb(*args):
matchProb, firstDelProb, delExtendProb, firstInsProb, insExtendProb = args
lowerBound = max(delExtendProb, insExtendProb)
upperBound = 1.0
r = rootOfIncreasingFunction(probImbalance, lowerBound, upperBound, *args)
return r
def gapRatiosFromCounts(counts, opts):
matches, deletes, inserts, delOpens, insOpens, alignments = counts matches, deletes, inserts, delOpens, insOpens, alignments = counts
if not alignments: raise Exception("no alignments") if not alignments: raise Exception("no alignments")
gaps = deletes + inserts gaps = deletes + inserts
gapOpens = delOpens + insOpens gapOpens = delOpens + insOpens
denominator = matches + gapOpens + (alignments + 1) # +1 pseudocount denominator = matches + gapOpens + (alignments + 1) # +1 pseudocount
matchProb = matches / denominator
if opts.gapsym: if opts.gapsym:
delOpenProb = gapOpens / denominator / 2 delOpenProb = gapOpens / denominator / 2
insOpenProb = gapOpens / denominator / 2 insOpenProb = gapOpens / denominator / 2
...@@ -262,6 +310,7 @@ def gapProbsFromCounts(counts, opts): ...@@ -262,6 +310,7 @@ def gapProbsFromCounts(counts, opts):
print("# alignments:", alignments) print("# alignments:", alignments)
print("# mean delete size: %g" % (deletes / delOpens)) print("# mean delete size: %g" % (deletes / delOpens))
print("# mean insert size: %g" % (inserts / insOpens)) print("# mean insert size: %g" % (inserts / insOpens))
print("# matchProb: %g" % matchProb)
print("# delOpenProb: %g" % delOpenProb) print("# delOpenProb: %g" % delOpenProb)
print("# insOpenProb: %g" % insOpenProb) print("# insOpenProb: %g" % insOpenProb)
print("# delExtendProb: %g" % delExtendProb) print("# delExtendProb: %g" % delExtendProb)
...@@ -269,54 +318,109 @@ def gapProbsFromCounts(counts, opts): ...@@ -269,54 +318,109 @@ def gapProbsFromCounts(counts, opts):
print() print()
delCloseProb = 1 - delExtendProb delCloseProb = 1 - delExtendProb
insCloseProb = 1 - insExtendProb
firstDelProb = delOpenProb * delCloseProb firstDelProb = delOpenProb * delCloseProb
insCloseProb = 1 - insExtendProb
firstInsProb = insOpenProb * insCloseProb firstInsProb = insOpenProb * insCloseProb
# If we define "an alignment" to mean "a set of indistinguishable endProb = balancedEndProb(matchProb, firstDelProb, delExtendProb,
# paths", then: firstInsProb, insExtendProb)
#delExtendProb += firstDelProb # probably, endProb is negligibly less than 1
#insExtendProb += firstInsProb
# Else, this ensures gap existence cost >= 0: matchRatio = matchProb / (endProb * endProb)
delExtendProb = max(delExtendProb, firstDelProb)
insExtendProb = max(insExtendProb, firstInsProb) firstDelRatio = firstDelProb / endProb
delExtendRatio = delExtendProb / endProb
delExistProb = firstDelProb / delExtendProb delRatios = firstDelRatio, delExtendRatio
insExistProb = firstInsProb / insExtendProb
firstInsRatio = firstInsProb / endProb
return delExistProb, insExistProb, delExtendProb, insExtendProb insExtendRatio = insExtendProb / endProb
insRatios = firstInsRatio, insExtendRatio
def scoreFromLetterProbs(scale, pairProb, prob1, prob2):
probRatio = pairProb / (prob1 * prob2) return matchRatio, delRatios, insRatios
return scoreFromProb(scale, probRatio)
def scoreFromLetterProbs(scale, matchRatio, pairProb, rowProb, colProb):
def matScoresFromProbs(scale, probs): # Equation (4) in [Fri19]
rowProbs = [sum(i) for i in probs] probRatio = pairProb / (rowProb * colProb)
colProbs = [sum(i) for i in zip(*probs)] return scoreFromProb(scale, matchRatio * probRatio)
return [[scoreFromLetterProbs(scale, j, x, y) for j, y in zip(i, colProbs)]
for i, x in zip(probs, rowProbs)] def matScoresFromProbs(scale, matchRatio, matProbs, rowProbs, colProbs):
r = range(len(matProbs))
def gapCostsFromProbs(scale, probs): return [[scoreFromLetterProbs(scale, matchRatio, matProbs[i][j],
delExistProb, insExistProb, delExtendProb, insExtendProb = probs rowProbs[i], colProbs[j]) for j in r]
delExistCost = costFromProb(scale, delExistProb) for i in r]
insExistCost = costFromProb(scale, insExistProb)
delExtendCost = costFromProb(scale, delExtendProb) def gapCostsFromProbRatios(scale, firstGapRatio, gapExtendRatio):
insExtendCost = costFromProb(scale, insExtendProb) # The next addition gets the alignment parameter from the path
if delExtendCost == 0: delExtendCost = 1 # parameters, as in Supplementary section 3.1 of [Fri19]:
if insExtendCost == 0: insExtendCost = 1 gapExtendRatio += firstGapRatio
return delExistCost, insExistCost, delExtendCost, insExtendCost firstGapCost = max(costFromProb(scale, firstGapRatio), 1)
gapExtendCost = max(costFromProb(scale, gapExtendRatio), 1)
return firstGapCost, gapExtendCost
def imbalanceFromGap(scale, firstGapCost, gapExtendCost):
firstGapRatio = math.exp(-firstGapCost / scale)
gapExtendRatio = math.exp(-gapExtendCost / scale)
# The next subtraction gets the path parameter from the alignment
# parameters, as in Supplementary section 3.1 of [Fri19]:
gapExtendRatio -= firstGapRatio
return firstGapRatio / (1 - gapExtendRatio)
def scoreImbalance(scale, matScores, delCosts, insCosts):
# C' - 1, where C' is defined in Equation (13) of [Fri19]
d = imbalanceFromGap(scale, *delCosts)
i = imbalanceFromGap(scale, *insCosts)
return 1 / sum(homogeneousLetterFreqs(scale, matScores)) + d + i - 1
def negScoreImbalance(*args):
return -scoreImbalance(*args)
def balancedScale(nearScale, *args):
# Find a scale, near nearScale, with balanced length probability
bump = 1.000001
value = scoreImbalance(nearScale, *args)
if abs(value) <= 0:
return nearScale
oldLower = oldUpper = nearScale
while True: # xxx
newLower = oldLower / bump
lowerValue = scoreImbalance(newLower, *args)
if (lowerValue < 0) != (value < 0):
func = scoreImbalance if value > 0 else negScoreImbalance
return rootOfIncreasingFunction(func, newLower, oldLower, *args)
oldLower = newLower
newUpper = oldUpper * bump
upperValue = scoreImbalance(newUpper, *args)
if (upperValue < 0) != (value < 0):
func = scoreImbalance if value < 0 else negScoreImbalance
return rootOfIncreasingFunction(func, oldUpper, newUpper, *args)
oldUpper = newUpper
def scoresAndScale(originalScale, matParams, delRatios, insRatios):
while True:
matScores = matScoresFromProbs(originalScale, *matParams)
delCosts = gapCostsFromProbRatios(originalScale, *delRatios)
insCosts = gapCostsFromProbRatios(originalScale, *insRatios)
scale = balancedScale(originalScale, matScores, delCosts, insCosts)
rowFreqs = homogeneousLetterFreqs(scale, zip(*matScores))
colFreqs = homogeneousLetterFreqs(scale, matScores)
if all(i >= 0 for i in rowFreqs + colFreqs):
return matScores, delCosts, insCosts, scale
print("# the integer-rounded scores are too inaccurate: "
"doubling the scale")
originalScale *= 2
def writeGapCosts(gapCosts, out): def writeGapCosts(gapCosts, out):
delExistCost, insExistCost, delExtendCost, insExtendCost = gapCosts firstDelCost, delExtendCost, firstInsCost, insExtendCost = gapCosts
print("#last -a", delExistCost, file=out) print("#last -a", firstDelCost - delExtendCost, file=out)
print("#last -A", insExistCost, file=out) print("#last -A", firstInsCost - insExtendCost, file=out)
print("#last -b", delExtendCost, file=out) print("#last -b", delExtendCost, file=out)
print("#last -B", insExtendCost, file=out) print("#last -B", insExtendCost, file=out)
def printGapCosts(gapCosts): def printGapCosts(gapCosts):
delExistCost, insExistCost, delExtendCost, insExtendCost = gapCosts firstDelCost, delExtendCost, firstInsCost, insExtendCost = gapCosts
print("# delExistCost:", delExistCost) print("# delExistCost:", firstDelCost - delExtendCost)
print("# insExistCost:", insExistCost) print("# insExistCost:", firstInsCost - insExtendCost)
print("# delExtendCost:", delExtendCost) print("# delExtendCost:", delExtendCost)
print("# insExtendCost:", insExtendCost) print("# insExtendCost:", insExtendCost)
print() print()
...@@ -388,26 +492,29 @@ def doTraining(opts, args): ...@@ -388,26 +492,29 @@ def doTraining(opts, args):
print() print()
sys.stdout.flush() sys.stdout.flush()
matCounts, gapCounts = countsFromLastOutput(q.stdout, opts) matCounts, gapCounts = countsFromLastOutput(q.stdout, opts)
gapProbs = gapProbsFromCounts(gapCounts, opts) matchRatio, delRatios, insRatios = gapRatiosFromCounts(gapCounts, opts)
gapCosts = gapCostsFromProbs(internalScale, gapProbs)
printGapCosts(gapCosts)
matProbs = matProbsFromCounts(matCounts, opts) matProbs = matProbsFromCounts(matCounts, opts)
matScores = matScoresFromProbs(internalScale, matProbs) rowProbs = [sum(i) for i in matProbs]
colProbs = [sum(i) for i in zip(*matProbs)]
matParams = matchRatio, matProbs, rowProbs, colProbs
sas = scoresAndScale(internalScale, matParams, delRatios, insRatios)
matScores, delCosts, insCosts, scale = sas
printGapCosts(delCosts + insCosts)
print("# score matrix " print("# score matrix "
"(query letters = columns, reference letters = rows):") "(query letters = columns, reference letters = rows):")
writeScoreMatrix(sys.stdout, matScores, "# ") writeScoreMatrix(sys.stdout, matScores, "# ")
print() print()
parameters = gapCosts, matScores parameters = delCosts, insCosts, matScores
if parameters in oldParameters: break if parameters in oldParameters: break
oldParameters.append(parameters) oldParameters.append(parameters)
internalMatrix = matrixWithLetters(matScores)
x = fixedLastalArgs(opts, lastalProgName) x = fixedLastalArgs(opts, lastalProgName)
x.append("-t{0:.6}".format(scale))
x.append("-p-") x.append("-p-")
x += args x += args
p = subprocess.Popen(x, stdin=subprocess.PIPE, stdout=subprocess.PIPE, p = subprocess.Popen(x, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
universal_newlines=True) universal_newlines=True)
writeGapCosts(gapCosts, p.stdin) writeGapCosts(delCosts + insCosts, p.stdin)
writeMatrixWithLetters(p.stdin, internalMatrix, "") writeScoreMatrix(p.stdin, matScores, "")
p.stdin.close() p.stdin.close()
# in python2.6, the next line must come after p.stdin.close() # in python2.6, the next line must come after p.stdin.close()
q = subprocess.Popen(y, stdin=p.stdout, stdout=subprocess.PIPE, q = subprocess.Popen(y, stdin=p.stdout, stdout=subprocess.PIPE,
...@@ -416,17 +523,17 @@ def doTraining(opts, args): ...@@ -416,17 +523,17 @@ def doTraining(opts, args):
q = subprocess.Popen(z, stdin=q.stdout, stdout=subprocess.PIPE, q = subprocess.Popen(z, stdin=q.stdout, stdout=subprocess.PIPE,
universal_newlines=True) universal_newlines=True)
sas = scoresAndScale(externalScale, matParams, delRatios, insRatios)
matScores, delCosts, insCosts, scale = sas
if opts.X: print("#last -X", opts.X) if opts.X: print("#last -X", opts.X)
if opts.Q: print("#last -Q", opts.Q) if opts.Q: print("#last -Q", opts.Q)
gapCosts = gapCostsFromProbs(externalScale, gapProbs) print("#last -t{0:.6}".format(scale))
writeGapCosts(gapCosts, sys.stdout) writeGapCosts(delCosts + insCosts, sys.stdout)
if opts.s: print("#last -s", opts.s) if opts.s: print("#last -s", opts.s)
if opts.S: print("#last -S", opts.S) if opts.S: print("#last -S", opts.S)
matScores = matScoresFromProbs(externalScale, matProbs)
externalMatrix = matrixWithLetters(matScores)
print("# score matrix " print("# score matrix "
"(query letters = columns, reference letters = rows):") "(query letters = columns, reference letters = rows):")
writeMatrixWithLetters(sys.stdout, externalMatrix, "") writeScoreMatrix(sys.stdout, matScores, "")
def lastTrain(opts, args): def lastTrain(opts, args):
if opts.sample_number: if opts.sample_number:
......
...@@ -35,19 +35,10 @@ static void addExpectedCounts( double* expectedCounts, ...@@ -35,19 +35,10 @@ static void addExpectedCounts( double* expectedCounts,
double* transitionCounts = &expectedCounts[ numEmissionCounts ]; double* transitionCounts = &expectedCounts[ numEmissionCounts ];
transitionCounts[0] += ec.toMatch; transitionCounts[0] += ec.toMatch;
transitionCounts[1] += ec.DD + ec.MD + ec.PD; // deleted letter count transitionCounts[1] += ec.delNext + ec.delInit; // deleted letter count
transitionCounts[2] += ec.II + ec.MI + ec.DI + ec.PI; // ins. letter count transitionCounts[2] += ec.insNext + ec.insInit; // inserted letter count
transitionCounts[3] += ec.MD + ec.PD; // deletion open/close count transitionCounts[3] += ec.delInit; // deletion open/close count
transitionCounts[4] += ec.MI + ec.DI + ec.PI; // insertion open/close count transitionCounts[4] += ec.insInit; // insertion open/close count
transitionCounts[5] += ec.DI; // adjacent insertion & deletion count
transitionCounts[7] += ec.PP + ec.MP; // unaligned letter pair count
transitionCounts[6] += ec.MP; // pair-gap open/close count
transitionCounts[8] += ec.PD;
transitionCounts[9] += ec.PI;
// MD = DM + DI - PD + DQ
// MI = IM - DI - PI + IQ
// PM = MP - PD - PI - PQ
// DM + IM + PM = MD + MI + MP - DQ - IQ - PQ
} }
static void countSeedMatches( double* expectedCounts, static void countSeedMatches( double* expectedCounts,
...@@ -83,7 +74,7 @@ void Alignment::makeXdrop( Centroid& centroid, ...@@ -83,7 +74,7 @@ void Alignment::makeXdrop( Centroid& centroid,
if( outputType == 7 ){ if( outputType == 7 ){
assert( seed.size > 0 ); // makes things easier to understand assert( seed.size > 0 ); // makes things easier to understand
const int numEmissionCounts = alph.size * alph.size; const int numEmissionCounts = alph.size * alph.size;
const int numTransitionCounts = 10; const int numTransitionCounts = 5;
std::vector<double>& expectedCounts = extras.expectedCounts; std::vector<double>& expectedCounts = extras.expectedCounts;
expectedCounts.resize( numEmissionCounts + numTransitionCounts ); expectedCounts.resize( numEmissionCounts + numTransitionCounts );
countSeedMatches( &expectedCounts[0], countSeedMatches( &expectedCounts[0],
...@@ -328,6 +319,11 @@ void Alignment::extend( std::vector< SegmentPair >& chunks, ...@@ -328,6 +319,11 @@ void Alignment::extend( std::vector< SegmentPair >& chunks,
return; return;
} }
if (!isForward) {
--start1;
--start2;
}
int extensionScore = int extensionScore =
isGreedy ? greedyAligner.align( seq1 + start1, seq2 + start2, isGreedy ? greedyAligner.align( seq1 + start1, seq2 + start2,
isForward, sm, maxDrop, alph.size ) isForward, sm, maxDrop, alph.size )
...@@ -371,10 +367,6 @@ void Alignment::extend( std::vector< SegmentPair >& chunks, ...@@ -371,10 +367,6 @@ void Alignment::extend( std::vector< SegmentPair >& chunks,
if( outputType > 3 ){ // calculate match probabilities if( outputType > 3 ){ // calculate match probabilities
assert( !isGreedy ); assert( !isGreedy );
assert( !sm2qual ); assert( !sm2qual );
if (!isForward) {
--start1;
--start2;
}
centroid.doForwardBackwardAlgorithm(seq1, seq2, start1, start2, isForward, centroid.doForwardBackwardAlgorithm(seq1, seq2, start1, start2, isForward,
gap, globality); gap, globality);
......
This diff is collapsed.
...@@ -7,7 +7,6 @@ ...@@ -7,7 +7,6 @@
#include "mcf_gap_costs.hh" #include "mcf_gap_costs.hh"
#include "SegmentPair.hh" #include "SegmentPair.hh"
#include "OneQualityScoreMatrix.hh" #include "OneQualityScoreMatrix.hh"
#include <stddef.h> // size_t
#include <vector> #include <vector>
#include <iostream> // for debug #include <iostream> // for debug
...@@ -17,10 +16,10 @@ namespace cbrc{ ...@@ -17,10 +16,10 @@ namespace cbrc{
public: public:
double emit[scoreMatrixRowSize][scoreMatrixRowSize]; double emit[scoreMatrixRowSize][scoreMatrixRowSize];
double toMatch; double toMatch;
double MD, MP, MI; double delInit;
double DD, DI; double delNext;
double PP, PD, PI; double insInit;
double II; double insNext;
public: public:
ExpectedCount (); ExpectedCount ();
}; };
...@@ -57,7 +56,7 @@ namespace cbrc{ ...@@ -57,7 +56,7 @@ namespace cbrc{
void doForwardBackwardAlgorithm(const uchar* seq1, const uchar* seq2, void doForwardBackwardAlgorithm(const uchar* seq1, const uchar* seq2,
size_t start1, size_t start2, size_t start1, size_t start2,
bool isExtendFwd, const mcf::GapCosts& gap, bool isExtendFwd, const GapCosts& gap,
int globality) { int globality) {
seq1 += start1; seq1 += start1;
seq2 += start2; seq2 += start2;
...@@ -98,7 +97,7 @@ namespace cbrc{ ...@@ -98,7 +97,7 @@ namespace cbrc{
// Added by MH (2008/10/10) : compute expected counts for transitions and emissions // Added by MH (2008/10/10) : compute expected counts for transitions and emissions
void computeExpectedCounts(const uchar* seq1, const uchar* seq2, void computeExpectedCounts(const uchar* seq1, const uchar* seq2,
size_t start1, size_t start2, bool isExtendFwd, size_t start1, size_t start2, bool isExtendFwd,
const mcf::GapCosts& gap, unsigned alphabetSize, const GapCosts& gap, unsigned alphabetSize,
ExpectedCount& count) const; ExpectedCount& count) const;
private: private:
...@@ -119,12 +118,10 @@ namespace cbrc{ ...@@ -119,12 +118,10 @@ namespace cbrc{
dvec_t fM; // f^M(i,j) dvec_t fM; // f^M(i,j)
dvec_t fD; // f^D(i,j) Ix dvec_t fD; // f^D(i,j) Ix
dvec_t fI; // f^I(i,j) Iy dvec_t fI; // f^I(i,j) Iy
dvec_t fP; // f^P(i,j)
dvec_t bM; // b^M(i,j) dvec_t bM; // b^M(i,j)
dvec_t bD; // b^D(i,j) dvec_t bD; // b^D(i,j)
dvec_t bI; // b^I(i,j) dvec_t bI; // b^I(i,j)
dvec_t bP; // b^P(i,j)
dvec_t mD; dvec_t mD;
dvec_t mI; dvec_t mI;
...@@ -141,11 +138,11 @@ namespace cbrc{ ...@@ -141,11 +138,11 @@ namespace cbrc{
void forward(const uchar* seq1, const uchar* seq2, void forward(const uchar* seq1, const uchar* seq2,
const ExpMatrixRow* pssm, bool isExtendFwd, const ExpMatrixRow* pssm, bool isExtendFwd,
const mcf::GapCosts& gap, int globality); const GapCosts& gap, int globality);
void backward(const uchar* seq1, const uchar* seq2, void backward(const uchar* seq1, const uchar* seq2,
const ExpMatrixRow* pssm, bool isExtendFwd, const ExpMatrixRow* pssm, bool isExtendFwd,
const mcf::GapCosts& gap, int globality); const GapCosts& gap, int globality);
void initForwardMatrix(); void initForwardMatrix();
void initBackwardMatrix(); void initBackwardMatrix();
...@@ -157,12 +154,6 @@ namespace cbrc{ ...@@ -157,12 +154,6 @@ namespace cbrc{
bestPos1 = cur; bestPos1 = cur;
} }
} }
// start of the x-drop region (i.e. number of skipped seq1 letters
// before the x-drop region) for this antidiagonal
size_t seq1start( size_t antidiagonal ) const {
return xa.scoreEndIndex( antidiagonal ) - xa.scoreOrigin( antidiagonal );
}
}; };
} }
......