Commits on Source (13)
Copyright (c) 2011-2013, Pacific Biosciences of California, Inc.
Copyright (c) 2016, Pacific Biosciences of California, Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Pacific Biosciences nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE
GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC
BIOSCIENCES AND ITS CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR ITS CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC BIOSCIENCES AND ITS
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR
ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
......@@ -17,13 +17,20 @@ develop:
test:
# Unit tests
find tests/unit -name "*.py" | xargs nosetests
#find tests/unit -name "*.py" | xargs nosetests
nosetests --verbose tests/unit/*.py
# End-to-end tests
@echo pbalign cram tests require blasr installed.
find tests/cram -name "*.t" | xargs cram
h5test:
# Tests for pre-3.0 smrtanalysis when default file formats are *.h5
@echo pbalign h5 tests require blasr, samtoh5, loadPulses, samFilter, etc. to be installed.
nosetests --verbose tests/unit_h5/*.py
find tests/cram_h5 -name "*.t" | xargs cram -v
doc:
sphinx-apidoc -T -f -o doc src/ && cd doc && make html
sphinx-apidoc -T -f -o doc pbalign/ && cd doc && make html
docs: doc
......
###pbalign maps PacBio reads to reference sequences.###
pbalign maps PacBio reads to reference sequences.
Want to know how to install and run pbalign?
**Q: Want to know how to install and run pbalign?**
A: Please refer to [pbalign readme document](https://github.com/PacificBiosciences/pbalign/blob/master/doc/howto.rst)
**Q: 'pbalign.py' does not work?**
A: The main script has been changed from 'pbalign.py' to 'pbalign'. Please use 'pbalign' instead.
**Q: Can pbalign handle large datasets with many movies?**
A: pbalign is not designed to handle large datasets. To align many movies to a reference, follow a [divide-and-conquer approach](https://github.com/PacificBiosciences/pbalign/wiki/Tutorial:-How-to-divide-and-conquer-large-datasets-using-pbalign).
Please refer to https://github.com/PacificBiosciences/pbalign/blob/master/doc/howto.rst
pbalign (0.3.0-1) unstable; urgency=medium
* New upstream release (corresponds to smrtanalysis-4.0.0 tag)
* Readability improvements and clean-ups in d/control and d/rules
* Update email address and copyright year
* Bump Standards-Version to 3.9.8
* Use encrypted protocols for VCS URLs
* Add build-dependency on pbcommand
* Drop patch applied upstream
* Don't install LICENSES file
* Set blasr minimum version
-- Afif Elghraoui <afif@debian.org> Sun, 15 Jan 2017 15:49:13 -0800
pbalign (0.2.0-1~bpo8+1) jessie-backports; urgency=medium
* Rebuild for jessie-backports.
......
......@@ -2,31 +2,36 @@ Source: pbalign
Section: python
Priority: optional
Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
Uploaders: Afif Elghraoui <afif@ghraoui.name>
Build-Depends: debhelper (>= 9),
Uploaders: Afif Elghraoui <afif@debian.org>
Build-Depends:
debhelper (>= 9),
dh-python,
python-all,
python-setuptools,
python-pbcore (>= 0.8.5),
python-pbcommand (>= 0.2.0),
python-sphinx,
help2man
Standards-Version: 3.9.6
help2man,
Standards-Version: 3.9.8
Homepage: https://github.com/PacificBiosciences/pbalign
Vcs-Git: git://anonscm.debian.org/debian-med/pbalign.git
Vcs-Browser: http://anonscm.debian.org/cgit/debian-med/pbalign.git
Vcs-Git: https://anonscm.debian.org/git/debian-med/pbalign.git
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/pbalign.git
Package: pbalign
Architecture: all
Depends: ${misc:Depends},
Depends:
${misc:Depends},
${python:Depends},
python-pbalign,
python-pkg-resources,
blasr
Recommends: python-pbh5tools,
hdf5-tools
Suggests: bowtie2,
blasr (>= 5.3+0),
Recommends:
python-pbh5tools,
hdf5-tools,
Suggests:
bowtie2,
gmap,
pbalign-doc
pbalign-doc,
Description: map Pacific Biosciences reads to reference DNA sequences
pbalign aligns PacBio reads to reference sequences, filters aligned
reads according to user-specific filtering criteria, and converts the
......@@ -37,14 +42,17 @@ Description: map Pacific Biosciences reads to reference DNA sequences
Package: python-pbalign
Architecture: all
Depends: ${misc:Depends},
Depends:
${misc:Depends},
${python:Depends},
blasr
Recommends: python-pbh5tools,
hdf5-tools
Suggests: bowtie2,
blasr (>= 5.3+0),
Recommends:
python-pbh5tools,
hdf5-tools,
Suggests:
bowtie2,
gmap,
pbalign-doc
pbalign-doc,
Description: map Pacific Biosciences reads to reference DNA sequences (Python2)
pbalign aligns PacBio reads to reference sequences, filters aligned
reads according to user-specific filtering criteria, and converts the
......@@ -57,9 +65,10 @@ Description: map Pacific Biosciences reads to reference DNA sequences (Python2)
Package: pbalign-doc
Section: doc
Architecture: all
Depends: ${misc:Depends},
Depends:
${misc:Depends},
libjs-jquery,
libjs-underscore
libjs-underscore,
Description: documentation for pbalign
pbalign aligns PacBio reads to reference sequences, filters aligned
reads according to user-specific filtering criteria, and converts the
......
......@@ -8,7 +8,7 @@ Copyright: 2011-2014 Pacific Biosciences of California, Inc.
License: PacBio-BSD-3-Clause
Files: debian/*
Copyright: 2015 Afif Elghraoui <afif@ghraoui.name>
Copyright: 2015-2016 Afif Elghraoui <afif@debian.org>
License: PacBio-BSD-3-Clause
License: PacBio-BSD-3-Clause
......
Description: Fix source folder name in Makefile
The sphinx-apidoc call targets a nonexistent src/ folder
Author: Afif Elghraoui <afif@ghraoui.name>
Forwarded: https://github.com/PacificBiosciences/pbalign/pull/12
Last-Update: 2015-09-05
--- python-pbalign.orig/Makefile
+++ python-pbalign/Makefile
@@ -23,7 +23,7 @@
find tests/cram -name "*.t" | xargs cram
doc:
- sphinx-apidoc -T -f -o doc src/ && cd doc && make html
+ sphinx-apidoc -T -f -o doc pbalign/ && cd doc && make html
docs: doc
#!/usr/bin/make -f
#DH_VERBOSE = 1
DPKG_EXPORT_BUILDFLAGS = 1
export LC_ALL=C.UTF-8
include /usr/share/dpkg/default.mk
export PYBUILD_NAME = pbalign
......@@ -13,7 +12,7 @@ HELP2MAN = help2man --no-info --version-string $(DEB_VERSION_UPSTREAM)
MANDIR = $(CURDIR)/debian/$(DEB_SOURCE)/usr/share/man/man1
%:
LC_ALL=C.UTF-8 dh $@ --with=python2 --buildsystem=pybuild
dh $@ --with=python2 --buildsystem=pybuild
override_dh_auto_build:
dh_auto_build
......
......@@ -37,8 +37,7 @@ required,
The following software is optionally required if ``--forQuiver`` option
will be used to convert the output Compare HDF5 file to be compatible
with Quiver.
- ``pbh5tools.cmph5tools``, a PacBio Bioinformatics tools that manipulates
Compare HDF5 files.
- ``pbh5tools.cmph5tools``, a PacBio Bioinformatics tools that manipulates Compare HDF5 files.
- ``h5repack``, a HDF5 tool to compress and repack HDF5 files.
The default aligner that pbalign uses is ``blasr``. If you want to use
......@@ -140,9 +139,9 @@ Before installing pbcore, you may need to install numpy and h5py from ::
, or if you have root permission on Ubuntu, do ::
$ git install numpy
$ pip install numpy
$ sudo apt-get install libhdf5-serial-dev
$ git install h5py
$ pip install h5py
To install pbcore, execute ::
......
......@@ -34,7 +34,7 @@
from __future__ import absolute_import
_changelist = "$Change: 141024 $"
_changelist = "$Change: 173392 $"
def _get_changelist(perforce_str):
......@@ -60,7 +60,7 @@ def get_dir():
"""Return lib directory."""
return op.dirname(op.realpath(__file__))
VERSION = (0, 2, 0, get_changelist())
VERSION = (0, 3, 0)
def get_version():
......
......@@ -38,6 +38,7 @@ from copy import copy
from pbalign.options import importDefaultOptions
from pbalign.utils.tempfileutil import TempFileManager
from pbalign.service import Service
from pbalign.utils.fileutil import getFileFormat, FILE_FORMATS
class AlignService (Service):
......@@ -184,8 +185,11 @@ class AlignService (Service):
self._tempFileManager,
self._fileNames.isWithinRepository)
outFormat = getFileFormat(self._fileNames.outputFileName)
suffix = ".bam" if (outFormat == FILE_FORMATS.BAM or
outFormat == FILE_FORMATS.XML) else ".sam"
self._fileNames.alignerSamOut = self._tempFileManager.\
RegisterNewTmpFile(suffix=".sam")
RegisterNewTmpFile(suffix=suffix)
# Generate and execute cmd.
try:
......
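The hunk above picks the temporary aligner output suffix from the format of the final output file. A minimal sketch of that rule follows, assuming a hypothetical extension-based `guess_format` helper in place of pbalign's `getFileFormat`, whose actual behavior is not shown in this diff:

```python
import os


def guess_format(path):
    """Hypothetical stand-in for pbalign.utils.fileutil.getFileFormat.

    The real helper is not shown in this diff and may recognize more formats.
    """
    ext = os.path.splitext(path)[1].lower()
    return {".bam": "BAM", ".xml": "XML", ".sam": "SAM"}.get(ext, "UNKNOWN")


def aligner_tmp_suffix(output_path):
    """Return '.bam' for BAM or dataset XML output, otherwise '.sam'."""
    return ".bam" if guess_format(output_path) in ("BAM", "XML") else ".sam"
```

Under this assumption, BAM and XML outputs let the aligner write BAM directly, while any other output keeps the previous SAM intermediate.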
......@@ -35,7 +35,7 @@
from __future__ import absolute_import
from pbalign.alignservice.align import AlignService
from pbalign.utils.fileutil import FILE_FORMATS, real_upath
from pbalign.utils.fileutil import FILE_FORMATS, real_upath, getFileFormat
import logging
......@@ -96,54 +96,60 @@ class BlasrService(AlignService):
ignoredBinaryOptions = ['-m', '-out', '-V']
ignoredUnitaryOptions = ['-h', '--help', '--version',
'-v', '-vv', '-sam']
'-v', '-vv', '--sam', '--bam']
items = self.__parseAlgorithmOptionItems(options.algorithmOptions)
i = 0
try:
while i < len(items):
infoMsg, errMsg, item = "", "", items[i]
if item == "-sa":
if item == "--sa":
val = real_upath(items[i+1])
if fileNames.sawriterFileName != val:
infoMsg = "Over write sa file with {0}".format(val)
fileNames.sawriterFileName = val
elif item == "-regionTable":
elif item == "--regionTable":
val = real_upath(items[i+1])
if fileNames.regionTable != val:
infoMsg = "Over write regionTable with {0}.\n"\
.format(val)
fileNames.regionTable = val
elif item == "-bestn":
elif item == "--bestn":
val = int(items[i+1])
if options.maxHits is not None and \
int(options.maxHits) != val:
errMsg = "blasr -bestn specified within " + \
errMsg = "blasr --bestn specified within " + \
"--algorithmOptions is equivalent to " + \
"--maxHits. Conflicting values of " + \
"--algorithmOptions '-bestn' and " +\
"--algorithmOptions '--bestn' and " +\
"--maxHits have been found."
else:
options.maxHits = val
elif item == "-minMatch":
elif item == "--minMatch":
val = int(items[i+1])
if options.minAnchorSize is not None and \
int(options.minAnchorSize) != val:
errMsg = "blasr -minMatch specified within " + \
errMsg = "blasr --minMatch specified within " + \
"--algorithmOptions is equivalent to " + \
"--minAnchorSize. Conflicting values " + \
"of --algorithmOptions '-minMatch' and " + \
"of --algorithmOptions '--minMatch' and " + \
"--minAnchorSize have been found."
else:
options.minAnchorSize = val
elif item == "-nproc":
elif item == "--maxMatch":
val = int(items[i+1])
if options.maxMatch is not None and \
int(options.maxMatch) != val:
infoMsg = "Override maxMatch with {n}.".format(n=val)
options.maxMatch = val
elif item == "--nproc":
val = int(items[i+1])
# The number of threads is not critical.
if options.nproc is None or \
int(options.nproc) != val:
infoMsg = "Over write nproc with {n}.".format(n=val)
options.nproc = val
elif item == "-noSplitSubreads":
elif item == "--noSplitSubreads":
if not options.noSplitSubreads:
infoMsg = "Over write noSplitSubreads with True."
logging.info(self.name +
......@@ -151,22 +157,25 @@ class BlasrService(AlignService):
options.noSplitSubreads = True
del items[i]
continue
elif item == "-concordant":
elif item == "--concordant":
if not options.concordant:
infoMsg = "Overwrite concordant with True."
logging.info(self.name +
": Resolve algorithmOptions. " + infoMsg)
options.concordant = True
del items[i]
elif "-useccs" in item: # -useccs, -useccsall, -useccsdenovo
val = item.lstrip('-')
elif "--useccs" in item: # --useccs, --useccsall, --useccsdenovo
# lstrip takes a set of characters, so '-' strips both leading dashes.
val = item.lstrip('-')
if options.useccs != val and options.useccs is not None:
errMsg = "Found conflicting options in " + \
"--algorithmOptions '{v}' \nand --useccs={u}"\
.format(v=item, u=options.useccs)
else:
options.useccs = val
elif item == "-seed" or item == "-randomSeed":
elif item == "--unaligned":
val = str(items[i+1])
options.unaligned = val
elif item == "--seed" or item == "--randomSeed":
val = int(items[i+1])
if options.seed is None or int(options.seed) != val:
infoMsg = "Overwrite random seed with {0}.".format(val)
......@@ -195,6 +204,15 @@ class BlasrService(AlignService):
logging.error(errMsg + str(e))
raise ValueError(errMsg + str(e))
# Existing suffix array always uses match size 8.
# When the BLASR search option --minMatch is less than 8, the suffix array needs
# to be created on the fly.
if (options.minAnchorSize is not None and options.minAnchorSize != "" and
int(options.minAnchorSize) < 8):
logging.warning("Suffix array must be recreated on the fly when " +
"minMatch < 8, which may take a long time.")
fileNames.sawriterFileName = None
# Update algorithmOptions when resolve is done
options.algorithmOptions = " ".join(items)
return options
......@@ -211,57 +229,84 @@ class BlasrService(AlignService):
Output:
a command-line string which can be used in bash.
"""
cmdStr = "blasr {queryFile} {targetFile} -sam -out {outFile} ".format(
cmdStr = "blasr {queryFile} {targetFile} --out {outFile} ".format(
queryFile=fileNames.queryFileName,
targetFile=fileNames.targetFileName,
outFile=fileNames.alignerSamOut)
if getFileFormat(fileNames.alignerSamOut) == FILE_FORMATS.BAM:
cmdStr += " --bam "
else:
cmdStr += " --sam "
if ((fileNames.sawriterFileName is not None) and
(fileNames.sawriterFileName != "")):
cmdStr += " -sa {sawriter} ".format(
cmdStr += " --sa {sawriter} ".format(
sawriter=fileNames.sawriterFileName)
if ((fileNames.regionTable != "") and
(fileNames.regionTable is not None)):
cmdStr += " -regionTable {regionTable} ".format(
cmdStr += " --regionTable {regionTable} ".format(
regionTable=fileNames.regionTable)
if options.maxHits is not None and options.maxHits != "":
cmdStr += " -bestn {n}".format(n=options.maxHits)
cmdStr += " --bestn {n}".format(n=options.maxHits)
if (options.minAnchorSize is not None and
options.minAnchorSize != ""):
cmdStr += " -minMatch {0} ".format(options.minAnchorSize)
cmdStr += " --minMatch {0} ".format(options.minAnchorSize)
if (options.maxMatch is not None and options.maxMatch != ""):
cmdStr += " --maxMatch {0} ".format(options.maxMatch)
if options.nproc is not None and options.nproc != "":
cmdStr += " -nproc {0} ".format(options.nproc)
cmdStr += " --nproc {0} ".format(options.nproc)
# Specify filter criteria and hit policy.
if options.minLength is not None:
cmdStr += " -minSubreadLength {n} -minReadLength {n} ".\
cmdStr += " --minSubreadLength {n} --minAlnLength {n} ".\
format(n=options.minLength)
if options.maxDivergence is not None:
maxDivergence = int(options.maxDivergence if options.maxDivergence
> 1.0 else (options.maxDivergence * 100))
cmdStr += " --minPctSimilarity {0}".format(100 - maxDivergence)
if options.minAccuracy is not None:
minAccuracy = int(options.minAccuracy if options.minAccuracy > 1.0
else (options.minAccuracy * 100))
cmdStr += " --minPctAccuracy {0}".format(minAccuracy)
if options.scoreCutoff is not None:
cmdStr += " --maxScore {0}".format(options.scoreCutoff)
cmdStr += " --hitPolicy {0} ".format(options.hitPolicy)
if options.noSplitSubreads:
cmdStr += " -noSplitSubreads "
cmdStr += " --noSplitSubreads "
if options.concordant:
cmdStr += " -concordant "
cmdStr += " --concordant "
if options.seed is not None and options.seed != 0:
cmdStr += " -randomSeed {0} ".format(options.seed)
cmdStr += " --randomSeed {0} ".format(options.seed)
if options.hitPolicy == "randombest":
cmdStr += " -placeRepeatsRandomly "
#if options.hitPolicy == "randombest":
# cmdStr += " --placeRepeatsRandomly "
if options.useccs is not None and options.useccs != "":
cmdStr += " -{0} ".format(options.useccs)
cmdStr += " --{0} ".format(options.useccs)
# When input is a FASTA file, blasr -clipping = soft
if fileNames.inputFileFormat == FILE_FORMATS.FASTA:
cmdStr += " -clipping soft "
cmdStr += " --clipping soft "
if options.algorithmOptions is not None:
cmdStr += " {0} ".format(options.algorithmOptions)
if options.unaligned is not None:
cmdStr += " --unaligned {f} --noPrintUnalignedSeqs".format(f=options.unaligned)
return cmdStr
def _preProcess(self, inputFileName, referenceFile=None,
......
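The new filtering flags in this hunk accept `maxDivergence` and `minAccuracy` either as fractions (at most 1.0) or as percentages, and convert both to integer percents before passing them to blasr. The sketch below mirrors that conversion; the function names are hypothetical and only the flag names `--minPctSimilarity` and `--minPctAccuracy` come from the diff:

```python
def to_percent(value):
    """Mirror the diff: values <= 1.0 are fractions, larger values are percents."""
    return int(value if value > 1.0 else value * 100)


def similarity_and_accuracy_flags(max_divergence=None, min_accuracy=None):
    """Build the blasr filter flags for the options that are set."""
    flags = []
    if max_divergence is not None:
        # Divergence is the complement of similarity.
        flags += ["--minPctSimilarity", str(100 - to_percent(max_divergence))]
    if min_accuracy is not None:
        flags += ["--minPctAccuracy", str(to_percent(min_accuracy))]
    return " ".join(flags)
```

Because `int()` truncates rather than rounds, a fraction whose product with 100 is not exactly representable in floating point can land one below the intended percent. Rounding before the conversion would be one way to avoid that, if it matters for a given threshold.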
......@@ -29,75 +29,109 @@
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
###############################################################################
"""Define LoadPulseService class, which calls loadPulses to load PacBio
pulse metrics to a cmp.h5 file. Five metrics, including DeletionQV,
DeletionTag, InsertionQV, MergeQV, and SubstitutionQV, are loaded by default
unless --metrics is specified."""
"""This script defines BamPostService, which
* calls 'samtools sort' to sort out.bam, and
* calls 'samtools index' to make out.bai index, and
* calls 'pbindex' to make out.pbi index file.
"""
# Author: Yuan Li
from __future__ import absolute_import
import logging
from pbalign.service import Service
from pbalign.utils.progutil import Execute
class LoadPulsesService(Service):
"""
LoadPulsesService calls loadPulses to load PacBio pulse information to a
cmp.h5 file.
"""
class BamPostService(Service):
"""Sort a BAM file, then build its BAM index and PacBio index."""
@property
def name(self):
"""Name of LoadPulsesService."""
return "LoadPulsesService"
"""Name of bam post service."""
return "BamPostService"
@property
def progName(self):
"""Program to call."""
return "loadPulses"
# Quiver only uses the following five metrics.
def __init__(self, basFofnFile, cmpFile, options):
"""Initialize a LoadPulsesService object.
Input:
basFofnFile: the input BASE.H5 (or fofn) files
cmpFile : an input CMP.H5 file
options : pbalign options
"""
self.basFofnFile = basFofnFile
self.cmpFile = cmpFile
self.options = options
return "samtools"
@property
def cmd(self):
"""String of a command-line to execute."""
return self._toCmd(self.basFofnFile, self.cmpFile)
def _toCmd(self, basFofnFile, cmpFile):
"""Generate a loadPulses command line.
Input:
basFofnFile: a BAX/PLX.H5 (or fofn) file with pulses
cmpFile : an input CMP.H5 file
Output:
a command-line string
return ""
def __init__(self, filenames):
"""Initialize a BamPostService object.
Input - unsortedBamFile: a filtered, unsorted bam file
refFasta : a reference fasta file
Output - sortedBamFile: sorted BAM file
outBaiFile: index BAI file
"""
cmdStr = self.progName + \
" {basFofnFile} {cmpFile} ".format(
basFofnFile=basFofnFile, cmpFile=cmpFile)
metrics = self.options.metrics.replace(" ", "")
cmdStr += " -metrics {metrics} ".format(metrics=metrics)
if self.options.byread:
cmdStr += " -byread "
return cmdStr
self.refFasta = filenames.targetFileName
# filtered, unsorted bam file.
self.unsortedBamFile = filenames.filteredSam
self.outBamFile = filenames.outBamFileName
self.outBaiFile = filenames.outBaiFileName
self.outPbiFile = filenames.outPbiFileName
def _sortbam(self, unsortedBamFile, sortedBamFile):
"""Sort unsortedBamFile and output sortedBamFile."""
if not sortedBamFile.endswith(".bam"):
raise ValueError("sorted bam file name %s must end with .bam" %
sortedBamFile)
sortedPrefix = sortedBamFile[0:-4]
cmd = 'samtools --version'
_samtoolsversion = ["0","1","19"]
try:
_out, _code, _msg = Execute(self.name, cmd)
if "samtools" in _out[0]:
_samtoolsversion = str(_out[0][8:]).strip().split('.')
else:
pass
except Exception:
pass
_stvmajor = int(_samtoolsversion[0])
if _stvmajor >= 1:
cmd = 'samtools sort -m 4G -o {sortedBamFile} {unsortedBamFile}'.format(
sortedBamFile=sortedBamFile, unsortedBamFile=unsortedBamFile)
else:
cmd = 'samtools sort -m 4G {unsortedBamFile} {prefix}'.format(
unsortedBamFile=unsortedBamFile, prefix=sortedPrefix)
Execute(self.name, cmd)
def _makebai(self, sortedBamFile, outBaiFile):
"""Build *.bai index file."""
cmd = 'samtools --version'
_samtoolsversion = ["0","1","19"]
try:
_out, _code, _msg = Execute(self.name, cmd)
if "samtools" in _out[0]:
_samtoolsversion = str(_out[0][8:]).strip().split('.')
else:
pass
except Exception:
pass
_stvmajor = int(_samtoolsversion[0])
_stvminor = int(_samtoolsversion[1])
if _stvmajor == 1 and _stvminor == 2:
# only for 1.2
cmd = "samtools index {sortedBamFile}".format(
sortedBamFile=sortedBamFile)
else:
cmd = "samtools index {sortedBamFile} {outBaiFile}".format(
sortedBamFile=sortedBamFile, outBaiFile=outBaiFile)
Execute(self.name, cmd)
def _makepbi(self, sortedBamFile):
"""Generate *.pbi PacBio BAM index."""
cmd = "pbindex %s" % sortedBamFile
Execute(self.name, cmd)
def run(self):
"""Run the loadPulses service."""
logging.info(self.name + ": Load pulses using {progName}.".
format(progName=self.progName))
return self._execute()
""" Run the BAM post-processing service. """
logging.info(self.name + ": Sort and build index for a bam file.")
self._sortbam(unsortedBamFile=self.unsortedBamFile,
sortedBamFile=self.outBamFile)
self._makebai(sortedBamFile=self.outBamFile,
outBaiFile=self.outBaiFile)
self._makepbi(sortedBamFile=self.outBamFile)
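BamPostService chooses between two `samtools sort` command forms based on the first line of `samtools --version`, slicing the string at a fixed offset. The sketch below shows the same decision with a regular expression instead of the slice. The helper names are hypothetical; the fallback version and both command strings are taken from the diff:

```python
import re


def parse_samtools_version(first_line, default=(0, 1, 19)):
    """Extract (major, minor, patch) from a line such as 'samtools 1.3.1'."""
    match = re.match(r"samtools\s+(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        # Same fallback as the diff: assume a pre-1.0 samtools.
        return default
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def sort_command(version, unsorted_bam, sorted_bam):
    """Return the sort command line matching the detected major version."""
    if version[0] >= 1:
        return "samtools sort -m 4G -o {0} {1}".format(sorted_bam, unsorted_bam)
    # Older samtools takes an output prefix instead of an output file name.
    prefix = sorted_bam[:-len(".bam")]
    return "samtools sort -m 4G {0} {1}".format(unsorted_bam, prefix)
```

Parsing once and passing the tuple to both the sort and index steps would also avoid running `samtools --version` twice, as `_sortbam` and `_makebai` each do in the diff.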
"""
VERY thin wrapper on top of pbalign to provide a tool contract-driven task
specific to CCS reads.
"""
import sys
from pbcommand.models import FileTypes
from pbalign import pbalignrunner
import pbalign.options
class Constants(pbalign.options.Constants):
TOOL_ID = "pbalign.tasks.pbalign_ccs"
DRIVER_EXE = "python -m pbalign.ccs --resolved-tool-contract"
INPUT_FILE_TYPE = FileTypes.DS_CCS
OUTPUT_FILE_TYPE = FileTypes.DS_ALIGN_CCS
# some modified defaults
ALGORITHM_OPTIONS_DEFAULT = "--minMatch 12 --bestn 10 --minPctSimilarity 70.0"
def get_parser():
return pbalign.options.get_contract_parser(Constants, ccs_mode=True)
def main(argv=sys.argv):
return pbalignrunner.main(
argv=argv,
get_parser_func=get_parser,
contract_runner_func=pbalignrunner.resolved_tool_contract_runner_ccs)
if __name__ == "__main__":
sys.exit(main())
......@@ -37,8 +37,7 @@ in an input SAM file according to filtering criteria."""
from __future__ import absolute_import
import logging
from pbalign.service import Service
from pbalign.utils.fileutil import isExist
from pbalign.utils.fileutil import getFileFormat, FILE_FORMATS, isExist
class FilterService(Service):
""" Call samFilter to filter low quality hits and apply multiple hits
......@@ -54,22 +53,22 @@ class FilterService(Service):
return "samFilter"
def __init__(self, inSamFile, refFile, outSamFile,
alnServiceName, scoreSign, options,
alignerName, scoreSign, options,
adapterGffFile=None):
"""Initialize a FilterService object.
Input:
inSamFile: an input SAM file
inSamFile: an input SAM/BAM file
refFile : the reference FASTA file
outSAM : an output SAM file
outSAM : an output SAM/BAM file
alignerName: the name of the aligner
scoreSign: score sign of the aligner, can be -1 or 1
options : pbalign options
adapterGffFile: a GFF file storing all the adapters
"""
self.inSamFile = inSamFile
self.inSamFile = inSamFile # sam|bam
self.refFile = refFile
self.outSamFile = outSamFile
self.alnServiceName = alnServiceName
self.outSamFile = outSamFile # sam|bam
self.alignerName = alignerName
self.scoreSign = scoreSign
self.options = options
self.adapterGffFile = adapterGffFile
......@@ -78,23 +77,32 @@ class FilterService(Service):
def cmd(self):
"""String of a command-line to execute."""
return self._toCmd(self.inSamFile, self.refFile,
self.outSamFile, self.alnServiceName,
self.outSamFile, self.alignerName,
self.scoreSign, self.options,
self.adapterGffFile)
def _toCmd(self, inSamFile, refFile, outSamFile,
alnServiceName, scoreSign, options, adapterGffFile):
alignerName, scoreSign, options, adapterGffFile):
""" Generate a samFilter command line from options.
Input:
inSamFile : the input SAM file
refFile : the reference FASTA file
outSamFile: the output SAM file
alnServiceName: aligner service name
alignerName: aligner service name
scoreSign : score sign, can be -1 or 1
options : argument options
Output:
a command-line string
"""
# blasr supports in-line alignment filtering,
# no need to call samFilter at all.
if alignerName == "blasr" and \
not self.options.filterAdapterOnly:
cmdStr = "rm -f {outFile} && ln -s {inFile} {outFile}".format(
inFile=inSamFile, outFile=outSamFile)
return cmdStr
# if aligner is not blasr, call samFilter instead
cmdStr = self.progName + \
" {inSamFile} {refFile} {outSamFile} ".format(
inSamFile=inSamFile,
......@@ -121,7 +129,7 @@ class FilterService(Service):
cmdStr += " -scoreSign {0}".format(scoreSign)
else:
logging.error("{0}'s score sign is neither 1 nor -1.".format(
alnServiceName))
alignerName))
if options.scoreCutoff is not None:
cmdStr += " -scoreCutoff {0}".format(options.scoreCutoff)
......@@ -133,6 +141,7 @@ class FilterService(Service):
isExist(adapterGffFile):
cmdStr += " -filterAdapterOnly {gffFile}".format(
gffFile=adapterGffFile)
return cmdStr
def run(self):
......
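When blasr has already filtered alignments in-line, the FilterService hunk above skips samFilter and emits `rm -f {outFile} && ln -s {inFile} {outFile}` instead. As an illustration only, the same effect can be sketched in pure Python; `link_output` and `demo` are hypothetical names, and the diff itself keeps the shell command:

```python
import os
import tempfile


def link_output(in_file, out_file):
    """Replace out_file with a symlink to in_file (like rm -f && ln -s)."""
    # lexists also catches a dangling symlink left by an earlier run.
    if os.path.lexists(out_file):
        os.remove(out_file)
    os.symlink(in_file, out_file)


def demo():
    """Create a throwaway input and a stale output, relink, and read back."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in.sam")
    dst = os.path.join(workdir, "out.sam")
    with open(src, "w") as handle:
        handle.write("@HD\n")
    with open(dst, "w") as handle:
        handle.write("stale\n")
    link_output(src, dst)
    with open(dst) as handle:
        return os.path.islink(dst), handle.read()
```

Either way, the "filtered" file is only an alias of the aligner output, so removing the aligner's temporary file before downstream steps finish would leave a dangling link.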
#!/usr/bin/env python
###############################################################################
# Copyright (c) 2011-2013, Pacific Biosciences of California, Inc.
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Pacific Biosciences nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
# THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC BIOSCIENCES AND ITS
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR
# ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
###############################################################################
# Author: Yuan Li
"""Initialization."""
from __future__ import absolute_import
#!/usr/bin/env python
###############################################################################
# Copyright (c) 2011-2013, Pacific Biosciences of California, Inc.
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Pacific Biosciences nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
# THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC BIOSCIENCES AND ITS
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR
# ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
###############################################################################
"""This script defines ForQuiverService, which post-processes a cmp.h5 file so
that it can be used by quiver directly. ForQuiverService sorts the file, loads
pulse information to it and finally repacks it."""
# Author: Yuan Li
from __future__ import absolute_import
import logging
from pbalign.forquiverservice.sort import SortService
from pbalign.forquiverservice.loadpulses import LoadPulsesService
from pbalign.forquiverservice.loadchemistry import LoadChemistryService
from pbalign.forquiverservice.repack import RepackService
class ForQuiverService(object):
"""
Uses SortService, LoadPulsesService, LoadChemistryService,
RepackService to post process a cmp.h5 file so that the file can
be used by quiver directly.
"""
@property
def name(self):
"""Name of ForQuiverService."""
return "ForQuiverService"
def __init__(self, fileNames, options):
"""Initialize a ForQuiverService object.
Input:
fileNames : pbalign file names
options : pbalign options
"""
self.fileNames = fileNames
self.options = options
self._loadpulsesService = LoadPulsesService(
self.fileNames.pulseFileName,
self.fileNames.outputFileName,
self.options)
self._loadchemistryService = LoadChemistryService(
self.fileNames.pulseFileName,
self.fileNames.outputFileName,
self.options)
self._sortService = SortService(
self.fileNames.outputFileName,
self.options)
self._repackService = RepackService(
self.fileNames.outputFileName,
self.fileNames.outputFileName + ".TMP")
def run(self):
""" Run the ForQuiver service."""
logging.info(self.name + ": Sort.")
self._sortService.checkAvailability()
self._sortService.run()
logging.info(self.name + ": LoadPulses.")
self._loadpulsesService.checkAvailability()
self._loadpulsesService.run()
logging.info(self.name + ": LoadChemistry.")
self._loadchemistryService.checkAvailability()
self._loadchemistryService.run()
logging.info(self.name + ": Repack.")
self._repackService.checkAvailability()
self._repackService.run()
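The removed `ForQuiverService.run` repeats the same three calls (log, check availability, run) for each of its four services. A compact sketch of that pattern is shown below, using a hypothetical `Step` class in place of the real pbalign `Service` subclasses, which need external tools to run:

```python
import logging


class Step(object):
    """Hypothetical stand-in for a pbalign Service subclass."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def checkAvailability(self):
        # The real services verify that their external program is installed.
        self.log.append(self.name + ":check")

    def run(self):
        self.log.append(self.name + ":run")


def run_pipeline(steps):
    """Check and run each step in order, as ForQuiverService.run does."""
    for step in steps:
        logging.info("ForQuiverService: %s.", step.name)
        step.checkAvailability()
        step.run()
```

The order matters in the original: sorting comes first, then pulse and chemistry loading, and repacking last.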
#!/usr/bin/env python
###############################################################################
# Copyright (c) 2011-2013, Pacific Biosciences of California, Inc.
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Pacific Biosciences nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
# THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC BIOSCIENCES AND ITS
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR
# ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
###############################################################################
from __future__ import absolute_import
import logging
from pbalign.service import Service
class LoadChemistryService(Service):
@property
def name(self):
return "LoadChemistryService"
@property
def progName(self):
return "loadChemistry.py"
# Quiver only uses the following five metrics.
def __init__(self, basFofnFile, cmpFile, options):
"""
Input:
basFofnFile: the input BASE.H5 (or fofn) files
cmpFile : an input CMP.H5 file
options : pbalign options
"""
self.basFofnFile = basFofnFile
self.cmpFile = cmpFile
self.options = options
@property
def cmd(self):
"""String of a command-line to execute."""
return self._toCmd(self.basFofnFile, self.cmpFile)
def _toCmd(self, basFofnFile, cmpFile):
"""
Generate a loadChemistry command line.
"""
cmdStr = self.progName + \
" {basFofnFile} {cmpFile} ".format(
basFofnFile=basFofnFile, cmpFile=cmpFile)
return cmdStr
def run(self):
"""Run the loadChemistry service."""
logging.info(self.name + ": Load chemistry using {progName}.".
format(progName=self.progName))
return self._execute()