Commits on Source (7)
blasr (5.3.2+dfsg-1) unstable; urgency=medium
The following tools are not built by the default build process any more:
loadPulses
pls2fasta
samFilter
samtoh5
samtom4
sdpMatcher
toAfg
Please contact the Debian Maintainer if you really need these.
-- Andreas Tille <tille@debian.org> Mon, 28 Jan 2019 16:07:16 +0100
blasr (5.3-1) unstable; urgency=medium
The blasr command line interface has changed. Long options are now
......
* Test the get-orig-source script
* Update manpages
* run test suite for bax2bam and bam2bax
.TH BLASR "1" "July 2015" "blasr 3ca7fe8" "User Commands"
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH BLASR "1" "January 2019" "blasr 5.3.2" "User Commands"
.SH NAME
blasr \- Map SMRT Sequences to a reference genome.
blasr \- Map SMRT Sequences to a reference genome
.SH SYNOPSIS
.P
.B blasr
.I reads.bam
.I genome.fasta \fB\-bam \-out\fI out.bam
reads.bam genome.fasta \fB\-\-bam\fR \fB\-\-out\fR out.bam
.P
.B blasr
.I reads.fasta
.I genome.fasta
reads.fasta genome.fasta
.P
.B blasr
.I reads.fasta
.I genome.fasta \fB\-sa\fI genome.fasta.sa
reads.fasta genome.fasta \fB\-\-sa\fR genome.fasta.sa
.P
.B blasr
.I reads.bax.h5
.I genome.fasta \fR[\fB\-sa \fIgenome.fasta.sa\fR]
reads.bax.h5 genome.fasta [\-\-sa genome.fasta.sa]
.P
.B blasr
.I reads.bax.h5
.I genome.fasta \fB\-sa\fI genome.fasta.sa \fB\-maxScore\fR \-100 \fB\-minMatch\fR 15 ...
reads.bax.h5 genome.fasta \fB\-\-sa\fR genome.fasta.sa \fB\-\-maxScore\fR 100 \fB\-\-minMatch\fR 15 ...
.P
.B blasr
.I reads.bax.h5
.I genome.fasta \fB\-sa\fI genome.fasta.sa \fB\-nproc\fR 24 \fB\-out\fI alignment.out\fR ...
reads.bax.h5 genome.fasta \fB\-\-sa\fR genome.fasta.sa \fB\-\-nproc\fR 24 \fB\-\-out\fR alignment.out ...
.SH DESCRIPTION
.P
\fBblasr\fR is a read mapping program that maps reads to positions
blasr is a read mapping program that maps reads to positions
in a genome by clustering short exact matches between the read and
the genome, and scoring clusters using alignment. The matches are
generated by searching all suffixes of a read against the genome
......@@ -42,21 +35,19 @@ precomputed suffix array index on the reference sequence is
specified.
.P
Although reads may be input in FASTA format, the recommended input is
PacBio BAM files because these contain qualtiy value
PacBio BAM files because these contain quality value
information that is used in the alignment and produces higher quality
variant detection.
Although alignments can be output in various formats, the recommended
output format is PacBio BAM.
Support for bax.h5 and plx.h5 files will be \fBDEPRECATED\fR.
Support for region tables for h5 files will be \fBDEPRECATED\fR.
Support to bax.h5 and plx.h5 files will be DEPRECATED.
Support to region tables for h5 files will be DEPRECATED.
.P
When suffix array index of a genome is not specified, the suffix array is
built before producing alignment. This may be prohibitively slow
when the genome is large (e.g. Human). It is best to precompute the
suffix array of a genome using the program
.BR sawriter (1),
and then specify the suffix array on the command line using
\fB\-sa\fR genome.fa.sa.
suffix array of a genome using the program sawriter, and then specify
the suffix array on the command line using \fB\-sa\fR genome.fa.sa.
.P
The optional parameters are roughly divided into three categories:
control over anchoring, alignment scoring, and output.
......@@ -75,362 +66,5 @@ in the human genome.
.P
For small genomes such as bacterial genomes or BACs, the default parameters
are sufficient for maximal sensitivity and good speed.
.SH OPTIONS
.TP
.B Input Files
.RS
.TP
.B Reads
.RS
.TP
.I reads.bam
A PacBio BAM file of reads.
This is the preferred input to \fBblasr\fR because rich quality
value (insertion, deletion, and substitution quality values) information is
maintained. The extra quality information improves variant detection and mapping speed.
.TP
.I reads.fasta
A multi\-fasta file of reads, though any fasta file is valid input
.TP
.IR reads.bax.h5 | reads.plx.h5
the old \fBDEPRECATED\fR output format of SMRT reads.
.TP
.I input.fofn
File of file names
.RE
.TP
\fB\-sa\fI suffixArrayFile
Use the suffix array 'sa' for detecting matches between the reads and the
reference.
The suffix array has been prepared by the \fBsawriter\fR(1) program.
.TP
\fB\-ctab\fI tab
A table of tuple counts used to estimate match significance.
This is generated by the program 'printTupleCountTable'.
While it is quick to generate on the fly, if there are many invocations of
\fBblasr\fR, it is useful to precompute the ctab.
.TP
\fB\-regionTable\fI table\fR (\fBDEPRECATED\fR)
Read in a read-region table in HDF format for masking portions of reads.
This may be a single table if there is just one input file,
or a fofn. When a region table is specified, any region table inside
the reads.plx.h5 or reads.bax.h5 files is ignored.
.RE
.B (DEPRECATED) Options for modifying reads.
.RS
.P
There is ancillary information about substrings of reads
that is stored in a 'region table' for each read file. Because
HDF is used, the region table may be part of the .bax.h5 or .plx.h5 file,
or a separate file. A contiguously read substring from the template
is a subread, and any read may contain multiple subreads. The boundaries
of the subreads may be inferred from the region table either directly or
by definition of adapter boundaries. Typically region tables also
contain information for the location of the high and low quality regions of
reads. Reads produced by spurious reads from empty ZMWs have a high
quality start coordinate equal to high quality end, making no usable read.
.TP
\fB\-useccs\fR
Align the circular consensus sequence (ccs), then report alignments
of the ccs subreads to the window that the ccs was mapped to.
Only alignments of the subreads are reported.
.TP
\fB\-useccsall\fR
Similar to \fB\-useccs\fR, except all subreads are aligned, rather than just
the subreads used to call the ccs. This will include reads that only
cover part of the template.
.TP
\fB\-useccsdenovo\fR
Align the circular consensus, and report only the alignment of the ccs
sequence.
.TP
\fB\-noSplitSubreads\fR (false)
Do not split subreads at adapters. This is typically only
useful when the genome is an unrolled version of a known template, and
contains template-adapter-reverse_template sequence.
.TP
\fB\-ignoreRegions\fR (false)
Ignore any information in the region table.
.TP
\fB\-ignoreHQRegions\fR (false)
Ignore any hq regions in the region table.
.RE
.B Alignments To Report
.RS
.TP
\fB\-bestn\fI n \fR(10)
Report the top \fIn\fR alignments.
.TP
\fB\-hitPolicy\fR (all)
Specify a policy to treat multiple hits from [all, allbest, random, randombest, leftmost]
.RS
.TP
.I all
report all alignments.
.TP
.I allbest
report all equally top scoring alignments.
.TP
.I random
report a random alignment.
.TP
.I randombest
report a random alignment from multiple equally top scoring alignments.
.TP
.I leftmost
report an alignment which has the best alignment score and has the smallest mapping coordinate in any reference.
.RE
.TP
\fB\-placeRepeatsRandomly\fR (false)
\fBDEPRECATED!\fR If true, equivalent to \fB\-hitPolicy\fI randombest\fR.
.TP
\fB\-randomSeed\fR (0)
Seed for random number generator. By default (0), use current time as seed.
.TP
\fB\-noSortRefinedAlignments\fR (false)
Once candidate alignments are generated and scored via sparse dynamic
programming, they are rescored using local alignment that accounts
for different error profiles.
Resorting based on the local alignment may change the order the hits are returned.
.TP
\fB\-allowAdjacentIndels\fR
When specified, adjacent insertion or deletions are allowed. Otherwise, adjacent
insertion and deletions are merged into one operation. Using quality values
to guide pairwise alignments may dictate that the higher probability alignment
contains adjacent insertions or deletions.
Current tools such as GATK do not permit this and so they are not reported by
default.
.RE
.B Output Formats and Files
.RS
.TP
\fB\-out\fI out \fR(terminal)
Write output to \fIout\fR.
.TP
\fB\-sam\fR
Write output in SAM format.
.TP
\fB\-m\fI t
If not printing SAM, modify the output of the alignment.
.TP
When \fIt\fR is:
.RS
.TP
0
Print blast like output with |'s connecting matched nucleotides.
.TP
1
Print only a summary: score and pos.
.TP
2
Print in Compare.xml format.
.TP
3
Print in vulgar format (\fBDEPRECATED\fR).
.TP
4
Print a longer tabular version of the alignment.
.TP
5
Print in a machine\-parsable format that is read by compareSequences.py.
.RE
.TP
\fB\-header\fR
Print a header as the first line of the output file describing the contents of each column.
.TP
\fB\-titleTable\fI tab \fR(NULL)
Construct a table of reference sequence titles.
The reference sequences are enumerated by row, 0,1,...
The reference index is printed in alignment results rather than the full
reference name.
This makes output concise, particularly when very verbose titles exist in
reference names.
.TP
\fB\-unaligned\fI file
Output reads that are not aligned to \fIfile\fR
.TP
.IR \fB\-clipping\fI \0[ none | hard | subread | soft ] \0\fR(none)
.IP
Use no/hard/subread/soft clipping, ONLY for SAM/BAM output.
.TP
\fB\-printSAMQV\fR (false)
Print quality values to SAM output.
.TP
\fB\-cigarUseSeqMatch\fR (false)
CIGAR strings in SAM/BAM output use '=' and 'X' to represent sequence match and mismatch instead of 'M'.
.RE
.B Options for anchoring alignment regions.
.RS
.P
This will have the greatest effect on speed and sensitivity.
.TP
\fB\-minMatch\fI m \fR(12)
Minimum seed length.
Higher minMatch will speed up alignment, but decrease sensitivity.
.TP
\fB\-maxMatch\fI l \fR(inf)
Stop mapping a read to the genome when the lcp length reaches \fIl\fR.
This is useful when the query is part of the reference, for example when
constructing pairwise alignments for de novo assembly.
.TP
\fB\-maxLCPLength\fI l \fR(inf)
The same as \fB\-maxMatch\fR.
.TP
\fB\-maxAnchorsPerPosition\fI m \fR(10000)
Do not add anchors from a position if it matches to more than \fIm\fR locations in the target.
.TP
\fB\-advanceExactMatches\fI E \fR(0)
Another trick for speeding up alignments with match \- E fewer anchors.
Rather than finding anchors between the read and the genome at every position
in the read, when an anchor is found at position i in a read of length L, the
next position in a read to find an anchor is at i+L\-E.
Use this when aligning already assembled contigs.
.TP
\fB\-nCandidates\fI n \fR(10)
Keep up to \fIn\fR candidates for the best alignment.
A large value of n will slow mapping because the slower dynamic programming
steps are applied to more clusters of anchors which can be a rate limiting
step when reads are very long.
.TP
\fB\-concordant\fR (false)
Map all subreads of a zmw (hole) to where the longest full pass subread of
the zmw aligned to. This requires the region table and hq regions.
This option only works when reads are in base or pulse h5 format.
.TP
\fB\-concordantTemplate\fR (mediansubread)
Select a full pass subread of a zmw as template for concordant mapping.
longestsubread - use the longest full pass subread
mediansubread - use the median length full pass subread
typicalsubread - use the second longest full pass subread if length of
the longest full pass subread is an outlier
.TP
\fB\-fastMaxInterval\fR (false)
Fast search maximum increasing intervals as alignment candidates. The search
is not as exhaustive as the default, but is much faster.
.TP
\fB\-aggressiveIntervalCut\fR (false)
Aggressively filter out non-promising alignment candidates, if there
exists at least one promising candidate. If this option is turned on,
\fBblasr\fR is likely to ignore short alignments of ALU elements.
.TP
\fB\-fastSDP\fR (false)
Use a fast heuristic algorithm to speed up sparse dynamic programming.
.RE
.B Options for Refining Hits
.RS
.TP
\fB\-sdpTupleSize\fI K \fR(11)
Use matches of length \fIK\fR to speed dynamic programming alignments.
This controls
accuracy of assigning gaps in pairwise alignments once a mapping has been found,
rather than mapping sensitivity itself.
.TP
\fB\-scoreMatrix\fI score matrix string
Specify an alternative score matrix for scoring fasta reads.
The matrix is in the format
.TS
;
L L L L L L .
A C G T N
A a b c d e
C f g h i j
G k l m n o
T p q r s t
N u v w x y
.TE
The values a...y should be input as a quoted space separated
string: "a b c ... y". Lower scores are better, so matches should be less
than mismatches e.g. a,g,m,s = \-5 (match), mismatch = 6.
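The quoted string described above can be generated rather than typed by hand. The following is a minimal sketch, not part of blasr: the helper name build_score_matrix is hypothetical, and it assumes that every cell involving N (including N against N) is scored as a mismatch, which the text above does not specify.

```shell
#!/bin/sh
# Hypothetical helper: print the 25 values of a 5x5 matrix over A C G T N
# in row-major order (a b c ... y), separated by single spaces.
# Assumption: only A/A, C/C, G/G and T/T count as matches; every cell
# that involves N, including N/N, receives the mismatch score.
build_score_matrix() {
    match=$1
    mismatch=$2
    out=""
    for row in 0 1 2 3 4; do
        for col in 0 1 2 3 4; do
            if [ "$row" -eq "$col" ] && [ "$row" -lt 4 ]; then
                value=$match
            else
                value=$mismatch
            fi
            # Append with a separating space, except before the first value.
            out="${out:+$out }$value"
        done
    done
    printf '%s\n' "$out"
}

# Example values from the text: match = -5, mismatch = 6.
build_score_matrix -5 6
```

The printed line is meant to be passed as one quoted argument to the score matrix option.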
.TP
\fB\-affineOpen\fI value\fR (10)
Set the penalty for opening an affine alignment.
.TP
\fB\-affineExtend\fI a \fR(0)
Change affine (extension) gap penalty. Lower value allows more gaps.
.RE
.B Options for overlap/dynamic programming alignments and pairwise overlap for de novo assembly.
.RS
.TP
\fB\-useQuality\fR (false)
Use substitution/insertion/deletion/merge quality values to score gap and
mismatch penalties in pairwise alignments. Because the insertion and deletion
rates are much higher than substitution, this will make many alignments
favor an insertion/deletion over a substitution. Naive consensus calling methods
will then often miss substitution polymorphisms. This option should be
used when calling consensus using the Quiver method. Furthermore, when
not using quality values to score alignments, there will be a lower consensus
accuracy in homopolymer regions.
.TP
\fB\-affineAlign\fR (false)
Refine alignment using affine guided align.
.RE
.B Options for filtering reads and alignments
.RS
.TP
\fB\-minReadLength\fI l\fR (50)
Skip reads that have a full length less than \fIl\fR. Subreads may be shorter.
.TP
\fB\-minSubreadLength \fIl \fR(0)
Do not align subreads of length less than \fIl\fR.
.TP
\fB\-minRawSubreadScore \fIm \fR(0)
Do not align subreads whose quality score in region table is less than \fIm\fR
(quality scores should be in range [0, 1000]).
.TP
\fB\-maxScore\fI m \fR(\-200)
Maximum score to output (high is bad, negative good).
.TP
\fB\-minAlnLength\fR
(0) Report alignments only if their lengths are greater than minAlnLength.
.HP
\fB\-minPctSimilarity\fR
(0) Report alignments only if their percentage similarity is greater than minPctSimilarity.
.TP
\fB\-minPctAccuracy\fR
(0) Report alignments only if their percentage accuracy is greater than minPctAccuracy.
.RE
.B Options for parallel alignment
.RS
.TP
\fB\-nproc\fI N \fR(1)
Align using \fIN\fR processes. All large data structures such as the suffix array and tuple count table are shared.
.TP
\fB\-start\fI S \fR(0)
Index of the first read to begin aligning.
This is useful when multiple instances are running on the same data,
for example when on a multi-rack cluster.
.TP
\fB\-stride\fI S \fR(1)
Align one read every \fIS\fR reads.
.RE
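The start and stride options can be combined to split one input across several independent instances. The sketch below only illustrates the index arithmetic. It assumes zero-based read indices and that an instance visits start, start+stride, start+2*stride and so on, which is a reading of the option text above rather than confirmed behaviour. The function name reads_for_instance is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the read indices one instance would align,
# given its start index, the stride, and the total number of reads.
# Assumption: indices are zero-based and advance by exactly one stride.
reads_for_instance() {
    start=$1
    stride=$2
    total=$3
    i=$start
    out=""
    while [ "$i" -lt "$total" ]; do
        out="${out:+$out }$i"
        i=$((i + stride))
    done
    printf '%s\n' "$out"
}

# Four instances over ten reads: instance k uses start k and stride 4.
for k in 0 1 2 3; do
    reads_for_instance "$k" 4 10
done
```

Under these assumptions, giving instance k of N a start of k and a stride of N covers every read exactly once.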
.B Options for subsampling reads.
.RS
.TP
\fB\-subsample\fR (0)
Proportion of reads to randomly subsample (expressed as a decimal) and align.
.TP
\fB\-holeNumbers\fI LIST
When specified, only align reads whose ZMW hole numbers are in \fILIST\fR.
\fILIST\fR is a comma-delimited string of ranges, such as '1,2,3,10\-13'.
This option only works when reads are in bam, bax.h5 or plx.h5 format.
.RE
.TP
\fB\-h\fR
Print help information.
.SH CITATION
To cite BLASR, please use: Chaisson M.J., and Tesler G., Mapping
single molecule sequencing reads using Basic Local Alignment with
Successive Refinement (BLASR): Theory and Application, BMC
Bioinformatics 2012, 13:238.
.SH BUGS
Please report any bugs to \fIhttps://github.com/PacificBiosciences/blasr/issues\fR.
.SH SEE ALSO
.BR loadPulses (1)
.BR pls2fasta (1)
.BR samFilter (1)
.BR samtoh5 (1)
.BR samtom4 (1)
.BR sawriter (1)
.BR sdpMatcher (1)
.BR toAfg (1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
blasr (5.3.2-1) UNRELEASED; urgency=medium
blasr (5.3.2+dfsg-1) unstable; urgency=medium
* Team upload
......@@ -13,11 +13,13 @@ blasr (5.3.2-1) UNRELEASED; urgency=medium
* Build system switched from cmake to meson
* Versioned Build-Depends: libblasr-dev
* Cleanup d/rules
* Update manpages
  * Mention tools that are not built any more in NEWS.Debian
[ Jelmer Vernooij ]
* Use secure copyright file specification URI.
-- Andreas Tille <tille@debian.org> Tue, 21 Aug 2018 16:37:31 +0200
-- Andreas Tille <tille@debian.org> Mon, 28 Jan 2019 16:10:36 +0100
blasr (5.3+0-2) unstable; urgency=low
......
......@@ -2,6 +2,9 @@ Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: blasr
Upstream-Contact: Pacific Biosciences <devnet@pacificbiosciences.com>
Source: https://github.com/PacificBiosciences/blasr
Files-Excluded: */Darwin
*/Linux
*/win32
Files: *
Copyright: 2011-2016 Pacific Biosciences of California, Inc.
......
#!/bin/sh
MANDIR=debian
mkdir -p $MANDIR
VERSION=`dpkg-parsechangelog | awk '/^Version:/ {print $2}' | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'`
NAME=`grep "^Description:" debian/control | sed 's/^Description: *//' | head -n1`
PROGNAME=`grep "^Package:" debian/control | sed 's/^Package: *//' | head -n1`
AUTHOR=".SH AUTHOR\nThis manpage was written by $DEBFULLNAME for the Debian distribution and
can be used for any other usage of the program.
"
# If program name is different from package name or title should be
# different from package short description change this here
progname=${PROGNAME}
help2man --no-info --no-discard-stderr \
--name="Map SMRT Sequences to a reference genome" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
progname=sawriter
help2man --no-info --no-discard-stderr --help-option=" " \
--name="generate suffix arrays for nucleotide sequences" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
echo "$MANDIR/*.1" > debian/manpages
cat <<EOT
Please enhance the help2man output.
The following web page might be helpful in doing so:
http://liw.fi/manpages/
EOT
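The VERSION line in the script above reduces a Debian package version to the upstream version by stripping the epoch, the Debian revision and a trailing dfsg repack suffix. A minimal sketch of that same sed pipeline applied to a literal string, so it can be tried without dpkg-parsechangelog (the function name upstream_version is hypothetical):

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expressions used in the script:
#   s/^[0-9]*://     drop a leading epoch such as "1:"
#   s/-.*//          drop everything from the first hyphen (Debian revision)
#   s/[+~]dfsg$//    drop a trailing "+dfsg" or "~dfsg" repack suffix
upstream_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'
}

upstream_version '5.3.2+dfsg-1'
upstream_version '5.3+0-2'
```

For the changelog entries shown in this diff, 5.3.2+dfsg-1 becomes 5.3.2 and 5.3+0-2 becomes 5.3+0, which is the string handed to help2man as the version.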
debian/blasr.1
debian/loadPulses.1
debian/pls2fasta.1
debian/samFilter.1
debian/samtoh5.1
debian/samtom4.1
debian/sawriter.1
debian/sdpMatcher.1
debian/toAfg.1
debian/*.1
Author: Andreas Tille <tille@debian.org>
Last-Update: Mon, 28 Jan 2019 15:56:26 +0100
Description: Add some missing libraries to linker
--- a/meson.build
+++ b/meson.build
@@ -51,15 +51,19 @@ blasr_thread_dep = dependency('threads',
......@@ -23,13 +27,3 @@
########################
# sources + executable #
@@ -110,7 +114,8 @@ blasr_main = executable(
install : true,
dependencies : blasr_deps,
link_with : blasr_static_impl,
- cpp_args : [blasr_warning_flags, '-DUSE_PBBAM=1', '-DCMAKE_BUILD=1'])
+ cpp_args : [blasr_warning_flags, '-DUSE_PBBAM=1', '-DCMAKE_BUILD=1'],
+)
blasr_utils_sawriter = executable(
'sawriter', files([
.TH SAWRITER "1" "July 2015" "sawriter 3ca7fe8" "User Commands"
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH SAWRITER "1" "January 2019" "sawriter 5.3.2" "User Commands"
.SH NAME
sawriter \- generate suffix arrays for nucleotide sequences
.SH SYNOPSIS
.B sawriter
.I saOut \0fastaIn
.RI [ fastaIn2
.IR fastaIn3 \0...]
.RB [ \-blt
.IR p ]
.RB [ \-4bit ]
.RB [ \-larsson | \-mamy | \-kark | \-mafe | \-welter ]
saOut fastaIn [fastaIn2 fastaIn3 ...] [\-blt p] [\-larsson] [\-4bit] [\-manmy] [\-kar]
.P
.B sawriter
.I fastaIn \fR(writes to fastaIn.sa)
fastaIn (writes to fastIn.sa).
.SH OPTIONS
.TP
.BI \-blt \0p
Build a lookup table on prefixes of length \fIp\fR. This speeds
\fB\-blt\fR p Build a lookup table on prefixes of length 'p'. This speeds
.IP
up lookups considerably (more than the LCP table), but misses matches
less than \fIp\fR when searching.
less than p when searching.
.TP
.B \-4bit
\fB\-4bit\fR
Read in (one) fasta file as a compressed sequence file.
.TP
.B Methods
.RS
.B \-larsson
\fB\-larsson\fR
(default) Uses the method of Larsson and Sadakane to build the array.
.TP
.B \-mamy
\fB\-mamy\fR
Uses the method of MAnber and MYers to build the array (slower than larsson,
.IP
and produces the same result. This is mainly for double checking
the correctness of larsson).
.TP
.B \-kark
Use Karkkainen DS3 method for building the suffix array.
This will probably be slower than larsson, but takes only an extra
N/(sqrt 3) extra space.
\fB\-kark\fR
Use Karkkainen DS3 method for building the suffix array. This will probably be more
slow than larsson, but takes only an extra N/(sqrt 3) extra space.
.TP
.B \-mafe
\fB\-mafe\fR
(disabled for now!) Use the lightweight construction algorithm from Manzini and Ferragina
.TP
.B \-welter
Use lightweight (sort of light) suffix array construction.
This is a bit more slow than normal larsson.
.RE
\fB\-welter\fR
Use lightweight (sort of light) suffix array construction. This is a bit more slow than
normal larsson.
.TP
.BI \-welterweight \0N
use a difference cover of size \fIN\fR for building the suffix array.
\fB\-welterweight\fR N use a difference cover of size N for building the suffix array.
Valid values are 7,32,64,111, and 2281.
.SH SEE ALSO
.BR blasr (1)
.BR loadPulses (1)
.BR pls2fasta (1)
.BR samFilter (1)
.BR samtoh5 (1)
.BR samtom4 (1)
.BR sdpMatcher (1)
.BR toAfg (1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
version=4
https://github.com/PacificBiosciences/blasr/releases .*/archive/@ANY_VERSION@@ARCHIVE_EXT@
opts="repacksuffix=+dfsg,dversionmangle=auto,repack,compression=xz" \
https://github.com/PacificBiosciences/blasr/releases .*/archive/v?@ANY_VERSION@@ARCHIVE_EXT@