Commits on Source (6)
sudo: false
dist: xenial
language: python
cache:
directories:
- "$HOME/.cache/pip"
python:
- 3.4.3
- 3.5
- 3.6
matrix:
include:
- python: 3.7
dist: xenial
sudo: true
- "3.4.5"
- "3.5"
- "3.6"
- "3.7"
- "3.8"
install:
- pip install --upgrade pip wheel
- pip install Cython
......
# Changes
v1.1.22 (dev)
-------------
v1.1.24 (2019.11.27)
* Fix #87 - Python 3.8 compatibility, time.clock removed
v1.1.23 (2019.11.22)
* Set minimum Python version to 3.4.5
* Fixed #86 - no trimming performed for single-end BAM file
v1.1.22 (2019.05.20)
* Documentation fixes (#79)
* Fix for index error when running `detect` command (#80)
v1.1.21 (2018.11.24)
--------------------
* Added long options for paired adapter parameters.
v1.1.20 (2018.11.21)
--------------------
* Fix #74: Make pysam open SAM/BAM files with check_sq=False by default
* Fixed setup.py to correctly include README for PyPI.
v1.1.19 (2018.05.16)
--------------------
* Fix #68: Error when using insert aligner with adapters of different lengths
v1.1.18 (2018.03.16)
--------------------
* Added two new tools to the benchmarks:
* fastp
* Cutadapt
......@@ -27,78 +36,78 @@ v1.1.18 (2018.03.16)
* Fix #64: InsertAligner not respecting match_adapter_wildcards and match_read_wildcards options.
v1.1.17 (2018.01.13)
--------------------
* Fix #51: Reads of different lengths not error corrected.
v1.1.16 (2018.01.07)
--------------------
* Fix for #57: LegacyReport stops on adapter with no trimmed reads, and LegacyReport errors when histogram data is None. Thanks to @cokelaer and @pyMyt1!
* Fix for #58: NextSeqTrimmer not trimming from both ends. Thanks to @pkMyt1!
v1.1.15 (2017.09.28)
--------------------
* Fix for #41: Error when using progress bar.
* Fix for #42: Discordance between Cutadapt and Atropos in number of expected events.
* Added '--alphabet' option. Set to 'dna' to validate input sequences against the allowed DNA characters (A/C/G/T/N). This fixes #43 and partially fixes #44.
* Fixed #44: Uncaught errors not being logged.
v1.1.14 (2017.09.19)
--------------------
* Fix for #39: miRNA option error (thanks to @mottodora)
* Fix for #37: fixes overflow error when computing RandomMatchProbability on long reads (>150 bp)
v1.1.13 (2017.09.14)
--------------------
* Fix for #38: Atropos fails with MultiCore error when using OrderPreservingWriterResultsHandler (thanks to @cshenanigans)
v1.1.12 (2017.08.15)
--------------------
* Expose --min-frequency and --min-contaminant-match-frac options to 'detect -d heuristic' command.
* Expose --min-kmer-match-frac option to 'detect -d known' command.
* Fixed #35: using incorrect metric to determine match fraction in 'detect -d known' command.
v1.1.11 (2017.08.15)
--------------------
* Fixed #34: JSON report output not working with SRA streaming.
v1.1.10 (2017.08.09)
--------------------
* Improve debugging messages
v1.1.9 (2017.08.01)
-------------------
* Fix #30: failure when using --preserve-order option
v1.1.8 (2017.07.10)
-------------------
* Add --config option for specifying options in a config file.
* Fix for #29: allow paired-end quality and N trimming without adapter trimming.
* Removed twine register command from make release
v1.1.7 (2017.06.01)
-------------------
* Stream reads directly from an SRA accession for any atropos command using the
-sra option.
* Add detect option to specify the bases that signify when the sequencer has
read past the end of a fragment.
v1.1.6 (2017.05.30)
-------------------
* Add FASTA output for detect command, and enable json, yaml, and pickle output for all commands.
v1.1.5 (2017.05.18)
-------------------
* Major update to the documentation.
* Fixed error messages in multi-threaded mode.
* Fixed bug when generating reports for runs involving error correction.
v1.1.4 (2017.05.02)
-------------------
* Exposed option to set PRNG seed when subsampling reads.
* Fixed issue #14: 'detect' and 'error' commands were broken. This involved rewriting those commands to use the same pipeline and reporting frameworks as the 'trim' and 'qc' commands.
v1.1.3 (2017.05.01)
-------------------
* Updated Dockerfile to use smaller, Alpine-based image.
* Added Docker image for v1.1.2 to Docker Hub.
* Updated Travis config to automatically build Docker images for each release.
......@@ -107,7 +116,6 @@ v1.1.3 (2017.05.01)
* Fixed #13: unnecessary differences in summary output between Cutadapt and Atropos.
v1.1.2 (2017.04.12)
-------------------
* New 'qc' command computes read-level statistics.
* The 'trim' command can also compute read-level statistics pre- and/or post-trimming using the new '--stats' option.
......@@ -125,7 +133,6 @@ v1.1.2 (2017.04.12)
* Ported some recent enhancements over from Cutadapt.
v1.0.23 (2016.12.07)
--------------------
* Identified a subtle bug having to do with insufficient memory in multi-threaded mode. The main thread appears to hang waiting for the next read from the input file. This appears to occur only under a strictly regulated memory cap, such as in a cluster environment. This bug is not fixed, but I added the following:
* Set the default batch size based on the queue sizes
......@@ -133,36 +140,30 @@ v1.0.23 (2016.12.07)
* Bug fixes
v1.0.22 (2016.12.02)
--------------------
* Abstracted the ErrorEstimator class to enable alternate implementations.
* Added a new ShadowRegressionErrorEstimator that uses the ShadowRegression R package (Wang et al.) to more accurately estimate sequencing error rate. This requires that R and the [ShadowRegression package](http://bcb.dfci.harvard.edu/~vwang/shadowRegression.html) and its dependencies be installed -- MASS and ReadCount, which in turn depend on a bunch of Bioconductor packages. At some point, this dependency will go away when I reimplement the method in pure python.
* The error command now reports the longest matching read fragment, which is usually a closer match for the actual adapter sequence than the longest matching k-mer.
v1.0.21 (2016.11.23)
--------------------
* Bugfixes
v1.0.20 (2016.11.22)
--------------------
* Changed the order of trimming operations - OverwriteReadModifier is now after read and quality trimming.
* Refactored the main Atropos interface to improve testability.
* Added more unit tests.
v1.0.19 (2016.11.21)
--------------------
* Fixed a major bug in OverwriteReadModifier, and in the unit tests for paired-end trimmers.
v1.0.18 (2016.11.20)
--------------------
* Added OverwriteReadModifier, a paired-end modifier that overwrites one read end with the other if the mean quality over the first N bases (where N is user-specified) of one is below a threshold value and the mean quality of the other is above a second threshold. This dramatically improves the number of high-quality read mappings in data sets where there are systematic problems with one read-end.
v1.0.17 (2016.11.18)
--------------------
* Perform error correction when insert match fails but adapter matches are complementary
* Improvements to handling of cached adapter lists
......@@ -171,7 +172,6 @@ v1.0.17 (2016.11.18)
* Many
v1.0.16 (2016.09.20)
-------------------
* Migrate to Versioneer for version management.
* Enable stderr as a valid output using the '\_' shortcut.
......@@ -182,14 +182,12 @@ v1.0.16 (2016.09.20)
* When InsertAdapterCutter.symmetric is True and mismatch_action is not None, insert match fails, at least one adapter match succeeds, and the adapter matches (if there are two) are complementary, then the reads are treated as overlapping and error correction is performed. This leads to substantial improvements when one read is of good quality while the other is of poor quality.
v1.0.15 (2016.09.14)
--------------------
* Fixed missing import bug in 'detect' command.
* Added estimate of fraction of contaminated reads to output of 'detect' command.
* Optionally cache list of known contaminants rather than re-download it every time.
v1.0.14 (2016.09.13)
--------------------
* Implemented \_align.MultiAligner, which returns all matches that satisfy the overlap and error thresholds. align.InsertAligner now uses MultiAligner for insert matching, and tests all matches in decreasing size order until it finds one with adapter matches (if any).
* Major improvements to the accuracy of the 'detect' command.
......@@ -201,62 +199,52 @@ v1.0.14 (2016.09.13)
* Several other bugfixes.
v1.0.13 (2016.08.31)
--------------------
* Add options to specify max error rates for insert and adapter matching within insert aligner.
* Add new command to estimate empirical error rate in data set from base qualities.
v1.0.12 (2016.08.30)
--------------------
* Add ability to correct errors during insert-match adapter trimming.
* Implement additional adapter-detection algorithms.
* Fix bug where default output file is force-created in parallel-write mode
v1.0.11 (2016.08.24)
--------------------
* Clarify and fix issues with bisulfite trimming. Notably, rrbs and non-directional are now allowed independently or in combination.
v1.0.10 (2016.08.23)
--------------------
* Introduced new 'detect' command for automatically detecting adapter sequences.
* Options are now required to specify input files.
* Major updates to documentation.
v1.0.9 (2016.08.22)
-------------------
* Bugfix release
v1.0.8 (2016.08.19)
-------------------
* Reverted previously introduced (and no longer necessary) dependency on bitarray.
* Switched the insert aligner back to the default implementation, as the one that ignores indels is not any faster.
v1.0.7 (2016.08.18)
-------------------
* Re-engineered modifiers.py (and all dependent code) to enable use of modifiers that simultaneously edit both reads in a pair.
* Add --op-order option to enable the user to specify the order of the first four trimming operations.
* Implemented insert-based alignment for paired-end adapter trimming. This is currently experimental. Benchmarking against SeqPurge and Skewer using simulated reads showed that the method Cutadapt uses to align adapters, while optimal for single-end reads, is much less sensitive and specific than the insert match algorithms used by SeqPurge and Skewer. Our algorithm is similar to the one used by SeqPurge but leverages the dynamic programming model of Cutadapt.
v1.0.6 (2016.08.08)
-------------------
* Based on tests, worker compression is faster than writer compression when more than 8 threads are available, so set this to be the default.
v1.0.5 (2016.08.06)
-------------------
* Internal code reorganization - compression code moved to separate module
* Eliminated the --worker-compression option in favor of --compression (whose value is either 'worker' or 'writer')
* More documentation improvements
v1.0.3 (2016.08.05)
-------------------
* Significant performance improvements:
* Start an extra worker once the main process is finished loading reads
......@@ -267,12 +255,11 @@ v1.0.3 (2016.08.05)
* Eliminated the --parallel-environment option
v1.0.1 (2016.08.04)
-------------------
* Fix documentation bugs associated with migration from optparse to argparse
v1.0 (2016.07.29)
-----------------
* Initial release (forked from cutadapt 1.10)
* Re-wrote much of filters.py and modifiers.py to separate modifying/filtering from file writing.
* File writing is now managed by a separate class (seqio.Writers)
......
tests = tests
module = atropos
#pytestops = "--full-trace"
#pytestops = "-v -s"
#pytestops = --full-trace
#pytestops = -v -s
repo = jdidion/$(module)
desc = Release $(version)
BUILD = python setup.py build_ext -i && python setup.py install $(installargs)
TEST = py.test $(pytestops) $(tests)
BUILD =
TEST =
all:
$(BUILD)
$(TEST)
all: clean install test
install:
$(BUILD)
build:
python setup.py build_ext -i
python setup.py sdist bdist_wheel
install: clean build
python setup.py install $(installargs)
test:
$(TEST)
py.test $(pytestops) $(tests)
docs:
make -C doc html
......@@ -48,21 +50,14 @@ docker:
docker login -u jdidion && \
docker push $(repo)
release:
$(clean)
# tag
tag:
git tag $(version)
# build
$(BUILD)
$(TEST)
python setup.py sdist bdist_wheel
# release
python setup.py sdist upload -r pypi
release: clean tag install test
twine upload dist/*
git push origin --tags
$(github_release)
$(docker)
github_release:
# github release
curl -v -i -X POST \
-H "Content-Type:application/json" \
-H "Authorization: token $(token)" \
......
......@@ -26,8 +26,7 @@ Atropos is available from [pypi](https://pypi.python.org/pypi/atropos) and can b
First install dependencies:
* Required
* Python 3.3+ (python 2.x is NOT supported)
- note: we have identified a possible bug in python 3.4.2 that causes random segmentation faults. We think this mainly affects unit testing (and thus specifically test on 3.4.3). If you encounter this bug, we recommend upgrading to a newer python version.
* Python 3.4.5+ (python 2.x is NOT supported)
* Cython 0.25.2+ (`pip install Cython`)
* Maybe python libraries
* pytest (for running unit tests)
......@@ -197,6 +196,7 @@ The citation for the original Cutadapt paper is:
* Scythe is an interesting new trimmer. Depending on how the benchmarks look in the forthcoming paper, we will add it to the list of tools we compare against Atropos, and perhaps implement their Bayesian approach for adapter match.
* Experiment with replacing the multicore implementation with an asyncio-based implementation (using ProcessPoolExecutor and uvloop).
* Automatic adaptive tuning of queue sizes to maximize the balance between memory usage and latency.
* FastProNGS has some nice visualizations that could be included, rather than relying on MultiQC: https://github.com/Megagenomics/FastProNGS
While we consider the command-line interface to be stable, the internal code organization of Atropos is likely to change. At this time, we recommend against interfacing directly with Atropos as a library (or, if you do, being prepared for your code to break). The internal code organization will be stabilized as of version 2.0, which is planned for sometime in 2017.
......
......@@ -23,8 +23,8 @@ def get_keywords():
# setup.py/versioneer.py will grep for the variable names, so they must
# each be defined on a line of their own. _version.py will just call
# get_keywords().
git_refnames = " (tag: 1.1.22)"
git_full = "2b15c778f0ccf1d0fb753e4334fa6dc0048a9ee6"
git_refnames = " (HEAD -> 1.1, tag: 1.1.24)"
git_full = "9281be92f0e52a14085841344a509f7808efcfe1"
keywords = {"refnames": git_refnames, "full": git_full}
return keywords
......
......@@ -174,50 +174,6 @@ MatchInfo = namedtuple("MatchInfo", (
"seq_after", "adapter_name", "qual_before", "qual_adapter", "qual_after",
"is_front", "asize", "rsize_adapter", "rsize_total"))
# Alternative semi-global alignment (
# http://www.bioinf.uni-freiburg.de/Lehre/Courses/2013_SS/V_Bioinformatik_1/lecture4.pdf)
# strategies designed to improve insert matching of paired-end reads.
#
# Note: these are currently just prototype implementations. They will need to be optimized
# using numpy and/or re-written in cython.
#
# 1. SeqPurge algorithm: insert match algorithm that performs thresholded exhaustive
# comparison to minimize probability of incorrect alignment. Relies on the fact that
# overlapping reads share alleles and indels (i.e. no gaps are required) (in C++).
# https://github.com/imgag/ngs-bits/tree/master/src/SeqPurge.
# * Speed up sequence comparison:
# * Between adapters and overhangs:
# * http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080745/pdf/btu177.pdf
# * Between reads:
# * http://bioinformatics.oxfordjournals.org/content/30/14/2000.abstract
# 2. Skewer algorithm: bit-masked k-difference matching (in C++).
# https://github.com/relipmoc/skewer
# 3. Quality-aware overlap alignment (in Haskell).
# https://hackage.haskell.org/package/bio-0.5.3/docs/Bio-Alignment-QAlign.html
# 4. FOGSAA, modified for semi-global alignment.
# http://www.nature.com/articles/srep01746
# http://www.isical.ac.in/~bioinfo_miu/FOGSAA.7z
# 5. EDLIB: edit distance-based alignment
# https://github.com/Martinsos/edlib
# 6. Phred-adjusted ML for error probability:
# https://biosails.github.io/pheniqs/glossary.html#phred_adjusted_maximum_likelihood_decoding
# 7. Adaptive banded alignment: https://github.com/ocxtal/libgaba
# Also think about different sequence encodings that might enable faster alignment
# https://github.com/hammerlab/kerseq/blob/master/kerseq/sequence_encoding.py
# 8. https://github.com/yamada-kd/nepal
# 9. The SeqAn C++ library implements several alignment algorithms:
# http://www.sciencedirect.com/science/article/pii/S0168165617315420
# 10. Could we treat paired end read + adapter alignment as an MSA problem?
# 11. Look at alignment-free tools for pairwise sequence comparison:
# * http://www.combio.pl/alfree/tools/
# * http://www.combio.pl/alfree
# * http://bioinformatics.org.au/tools/decaf+py/
# 12. https://github.com/sdu-hpcl/BGSA
# 13. Train a NN to approximate the pairwise alignment distance
# https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty887/5140215?redirectedFrom=fulltext
# 14. Nucl2vec: https://github.com/prakharg24/Nucl2vec (local alignment only - could
# it be adapted to semi-global?)
# * Can re-implement in SpaCy? https://spacy.io/usage/vectors-similarity
class InsertAligner(object):
"""Implementation of an insert matching algorithm.
......
......@@ -298,7 +298,8 @@ def print_trim_report(summary, outfile):
total = summary["total_record_count"]
if total == 0:
_print(
_print_error = Printer(outfile)
_print_error(
"No reads processed! Either your input file is empty or you "
"used the wrong -f/--format parameter.")
return
......
......@@ -11,9 +11,9 @@ from atropos.io import STDOUT, xopen
from atropos.io.compression import splitext_compressed
from atropos.util import Summarizable, truncate_string, ALPHABETS
SINGLE = 0
READ1 = 1
READ2 = 2
SINGLE = READ1
PAIRED = 1|2
class FormatError(AtroposError):
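
The hunk above replaces `SINGLE = 0` with `SINGLE = READ1` and adds `PAIRED = 1|2`, which reads like a switch to bit flags. The sketch below illustrates why that matters, assuming the constants are tested with bitwise AND; how Atropos actually uses them is not shown in this diff, and `includes_read` is a hypothetical helper.

```python
# Constants as they appear after the change.
READ1 = 1
READ2 = 2
SINGLE = READ1          # single-end input is treated as "read 1 only"
PAIRED = READ1 | READ2  # both bits set, i.e. 3


def includes_read(mode, read):
    """Return True if the bit for `read` is set in `mode`.

    Hypothetical helper for illustration; not part of Atropos.
    """
    return bool(mode & read)


# With the old value SINGLE = 0, a bit test against READ1 could never match.
OLD_SINGLE = 0
old_matches_read1 = includes_read(OLD_SINGLE, READ1)  # False
new_matches_read1 = includes_read(SINGLE, READ1)      # True
```

Under this reading, a check written for "read 1 is present" covers both single-end and paired-end input without a special case for `SINGLE`.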
......@@ -1045,16 +1045,16 @@ def get_format(
elif qualities is False:
# Same, but we know that we want to write reads without qualities.
file_format = 'fasta'
if file_format is None:
else:
raise UnknownFileType("Could not determine file type.")
file_format = file_format.lower()
if file_format == 'fastq' and qualities is False:
raise ValueError(
"Output format cannot be FASTQ since no quality values are "
"available.")
file_format = file_format.lower()
if file_format == 'fasta':
if colorspace:
return ColorspaceFastaFormat(line_length)
......
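
The `get_format` hunk above appears to move `file_format.lower()` ahead of the `'fastq'` comparison, so that a mixed-case value such as `"FASTQ"` is caught by the quality check. A minimal sketch of that ordering follows; `normalize_format` is a hypothetical name, and `ValueError` stands in for the `UnknownFileType` exception, which is not defined in this diff.

```python
def normalize_format(file_format, qualities=None):
    """Lower-case the format name before any comparison is made.

    Hypothetical stand-in for the tail of get_format().
    """
    if file_format is None:
        # The real code raises UnknownFileType here.
        raise ValueError("Could not determine file type.")
    # Normalize first, so the checks below see a canonical value.
    file_format = file_format.lower()
    if file_format == 'fastq' and qualities is False:
        raise ValueError(
            "Output format cannot be FASTQ since no quality values are "
            "available.")
    return file_format
```

If the comparison ran before the normalization, `normalize_format("FASTQ", qualities=False)` would pass the check silently and return `'fastq'`.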
......@@ -224,7 +224,7 @@ class Timestamp(object):
"""
def __init__(self):
self.dtime = datetime.now()
self.clock = time.clock()
self.process_time = time.process_time()
def timestamp(self):
"""Returns the unix timestamp.
......@@ -248,7 +248,7 @@ class Timestamp(object):
"""
return dict(
wallclock=max(minval, self.timestamp() - other.timestamp()),
cpu=max(minval, self.clock - other.clock))
cpu=max(minval, self.process_time - other.process_time))
class Timing(Summarizable):
"""Context manager that maintains timing information using
......
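
The change above corresponds to the changelog entry for #87: `time.clock()` was removed in Python 3.8, and `time.process_time()` (available since Python 3.3) is used in its place. The following is a self-contained sketch of the patched pattern; the method name `elapsed_since` and the default `minval` are assumptions, since the diff does not show the enclosing method signature.

```python
import time
from datetime import datetime


class Timestamp:
    """Records wall-clock time and CPU time at creation."""

    def __init__(self):
        self.dtime = datetime.now()
        # process_time() counts CPU time of the current process and
        # excludes time spent sleeping.
        self.process_time = time.process_time()

    def timestamp(self):
        """Returns the unix timestamp."""
        return self.dtime.timestamp()

    def elapsed_since(self, other, minval=0.01):
        """Hypothetical name; returns wall-clock and CPU durations."""
        return dict(
            wallclock=max(minval, self.timestamp() - other.timestamp()),
            cpu=max(minval, self.process_time - other.process_time))


start = Timestamp()
sum(i * i for i in range(100000))  # some CPU-bound work
stop = Timestamp()
elapsed = stop.elapsed_since(start)
```

Both values are clamped to `minval`, so very short intervals never report zero.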
atropos (1.1.24+dfsg-1) unstable; urgency=medium
* Team upload.
* New upstream version
* Trim trailing whitespace.
-- Steffen Moeller <moeller@debian.org> Sat, 30 Nov 2019 13:00:31 +0100
atropos (1.1.22+dfsg-1) unstable; urgency=medium
* Team upload.
......
......@@ -18,7 +18,8 @@ Files: *
Copyright: 2010-2016 Marcel Martin <marcel.martin@scilifelab.se>
2016-2019 John P Didion, and are subject to
License: Expat
Comment: Atropos began as a fork of Cutadapt [1]. Specifically, it is a fork of commit
Comment:
Atropos began as a fork of Cutadapt [1]. Specifically, it is a fork of commit
2f3cc0717aa9ff1e0326ea6bcb36b712950d4999 on June 22, 2016
.
All additions and non-trivial modifications (which can be discovered by comparing
......@@ -35,7 +36,8 @@ Comment: Atropos began as a fork of Cutadapt [1]. Specifically, it is a fork of
[2] https://github.com/jdidion/atropos
[3] https://github.com/marcelm/cutadapt/tree/2f3cc0717aa9ff1e0326ea6bcb36b712950d4999
[4] https://creativecommons.org/publicdomain/zero/1.0/
Comment: On Debian systems the full text of the CC0 license can be found at
.
On Debian systems the full text of the CC0 license can be found at
/usr/share/common-licenses/CC0-1.0
Files: bin/_preamble.py
......
......@@ -25,4 +25,3 @@ override_dh_auto_clean:
rm -rf .pytest_cache
rm -rf atropos.egg-info
rm -f atropos/align/_align.c atropos/commands/trim/_qualtrim.c atropos/io/_seqio.c
......@@ -6,6 +6,7 @@ Cython is run when
* or the pre-generated C sources are out of date,
* or when --cython is given on the command line.
"""
import codecs
import os.path
import sys
......@@ -24,13 +25,6 @@ if sys.version_info < (3, 3):
sys.exit(1)
with open(
os.path.join(os.path.abspath(os.path.dirname(__file__)), "README.md"),
encoding="utf-8"
) as f:
long_description = f.read()
def out_of_date(_extensions):
"""
Check whether any pyx source is newer than the corresponding generated
......@@ -145,9 +139,16 @@ setup(
author_email="john.didion@nih.gov",
url="https://atropos.readthedocs.org/",
description="trim adapters from high-throughput sequencing reads",
long_description=long_description,
long_description=codecs.open(
os.path.join(
os.path.dirname(os.path.realpath(__file__)),
"README.md"
),
"rb",
"utf-8"
).read(),
long_description_content_type="text/markdown",
license="Original Cutadapt code is under MIT license; improvements and additions are in the Public Domain",
license="MIT",
ext_modules=extensions,
packages=find_packages(),
scripts=["bin/atropos"],
......@@ -176,7 +177,6 @@ setup(
"License :: Public Domain",
"Natural Language :: English",
"Programming Language :: Cython",
"Programming Language :: Python :: 3.3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6"
......
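
The `setup.py` hunk above replaces a `with open(..., encoding="utf-8")` block with an inline `codecs.open(..., "rb", "utf-8").read()`. The sketch below, which is an illustration and not the upstream code, shows that both forms return decoded text for a UTF-8 file; the helper names and the temporary `README.md` are assumptions made for the demonstration.

```python
import codecs
import os
import tempfile


def read_text_builtin(path):
    """Read a UTF-8 file with the built-in open()."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def read_text_codecs(path):
    """Read a UTF-8 file with codecs.open(), as in the patched setup.py.

    The file is opened in binary mode and decoded by the codec, so
    read() still returns str. The context manager closes the handle,
    which the inline form in setup.py does not do explicitly.
    """
    with codecs.open(path, "rb", "utf-8") as handle:
        return handle.read()


# Write a throwaway README so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
readme = os.path.join(tmp_dir, "README.md")
with open(readme, "w", encoding="utf-8", newline="\n") as handle:
    handle.write("# Atropos\n")

text_builtin = read_text_builtin(readme)
text_codecs = read_text_codecs(readme)
```

One difference worth keeping in mind is that `codecs.open()` does not translate line endings, whereas the built-in `open()` in text mode does.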