Commits on Source (7)
......@@ -2,6 +2,71 @@
Changes
=======
v1.18 (2018-09-07)
------------------
Features
~~~~~~~~
* Close :issue:`327`: Maximum and minimum lengths can now be specified
separately for R1 and R2 with ``-m LENGTH1:LENGTH2``. One of the
lengths can be omitted, in which case only the length of the other
read is checked (as in ``-m 17:`` or ``-m :17``).
* Close :issue:`322`: Use ``-j 0`` to auto-detect how many cores to run on.
This should even work correctly on cluster systems when Cutadapt runs as
a batch job to which fewer cores than exist on the machine have been
assigned. Note that the number of threads used by ``pigz`` cannot be
controlled at the moment, see :issue:`290`.
* Close :issue:`225`: Allow setting the maximum error rate and minimum overlap
length per adapter. A new :ref:`syntax for adapter-specific
parameters <trimming-parameters>` was added for this. Example:
``-a "ADAPTER;min_overlap=5"``.
* Close :issue:`152`: Using the new syntax for adapter-specific parameters,
it is now possible to allow partial matches of a 3' adapter at the 5' end
(and partial matches of a 5' adapter at the 3' end) by specifying the
``anywhere`` parameter (as in ``-a "ADAPTER;anywhere"``).
* Allow ``--pair-filter=first`` in addition to ``both`` and ``any``. If
used, a read pair is discarded if the filtering criterion applies to R1;
R2 is ignored.
* Close :issue:`112`: Implement a ``--report=minimal`` option for printing
a succinct two-line report in tab-separated value (tsv) format. Thanks
to :user:`jvolkening` for coming up with an initial patch!
Bug fixes
~~~~~~~~~
* Fix :issue:`128`: The “Reads written” figure in the report incorrectly
included both trimmed and untrimmed reads if ``--untrimmed-output`` was used.
Other
~~~~~
* The options ``--no-trim`` and ``--mask-adapter`` should now be written as
``--action=none`` and ``--action=mask``, respectively. The old options still work.
* This is the last release to support :ref:`colorspace data <colorspace>`.
* This is the last release to support Python 2.
v1.17 (2018-08-20)
------------------
* Close :issue:`53`: Implement adapters :ref:`that disallow internal matches <non-internal>`.
This is a bit like anchoring, but less strict: The adapter sequence
can appear at different lengths, but must always be at one of the ends.
Use ``-a ADAPTERX`` (with a literal ``X``) to disallow internal matches
for a 3' adapter. Use ``-g XADAPTER`` to disallow for a 5' adapter.
* :user:`klugem` contributed PR :issue:`299`: The ``--length`` option (and its
alias ``-l``) can now be used with negative lengths, which will remove bases
from the beginning of the read instead of from the end.
* Close :issue:`107`: Add a ``--discard-casava`` option to remove reads
that did not pass CASAVA filtering (this is possibly relevant only for
older datasets).
* Fix :issue:`318`: Cutadapt should now be installable with Python 3.7.
* Running Cutadapt under Python 3.3 is no longer supported (Python 2.7 or
3.4+ is needed).
* Planned change: One of the next Cutadapt versions will drop support for
Python 2 entirely, requiring Python 3.
v1.16 (2018-02-21)
------------------
......
Metadata-Version: 1.1
Name: cutadapt
Version: 1.16
Version: 1.18
Summary: trim adapters from high-throughput sequencing reads
Home-page: https://cutadapt.readthedocs.io/
Author: Marcel Martin
Author-email: marcel.martin@scilifelab.se
License: MIT
Description-Content-Type: UNKNOWN
Description: .. image:: https://travis-ci.org/marcelm/cutadapt.svg?branch=master
:target: https://travis-ci.org/marcelm/cutadapt
......
python-cutadapt (1.18-1) unstable; urgency=medium
* New upstream version
* Standards-Version: 4.2.1
* Remove ancient X-Python*-Version fields
-- Andreas Tille <tille@debian.org> Thu, 13 Sep 2018 08:52:00 +0200
python-cutadapt (1.16-2) unstable; urgency=medium
* Testsuite: autopkgtest-pkg-python
......
......@@ -21,12 +21,10 @@ Build-Depends: debhelper (>= 11~),
python3-nose,
python3-xopen (>= 0.3.2),
cython3
Standards-Version: 4.1.4
Standards-Version: 4.2.1
Vcs-Browser: https://salsa.debian.org/med-team/python-cutadapt
Vcs-Git: https://salsa.debian.org/med-team/python-cutadapt.git
Homepage: http://pypi.python.org/pypi/cutadapt
X-Python-Version: >= 2.7
X-Python3-Version: >= 3.4
Package: python-cutadapt
Architecture: any
......
......@@ -15,5 +15,5 @@ Subject: cython_version
-MIN_CYTHON_VERSION = '0.24'
+MIN_CYTHON_VERSION = '0.23.2'
if sys.version_info < (2, 7):
sys.stdout.write("At least Python 2.7 is required.\n")
vi = sys.version_info
if (vi[0] == 2 and vi[1] < 7) or (vi[0] == 3 and vi[1] < 4):
=================
Algorithm details
=================
.. _adapter-alignment-algorithm:
Adapter alignment algorithm
===========================
Since the publication of the `EMBnet journal application note about
cutadapt <http://dx.doi.org/10.14806/ej.17.1.200>`_, the alignment algorithm
used for finding adapters has changed significantly. An overview of this new
algorithm is given in this section. An even more detailed description is
available in Chapter 2 of my PhD thesis `Algorithms and tools for the analysis
of high-throughput DNA sequencing data <http://hdl.handle.net/2003/31824>`_.
The algorithm is based on *semiglobal alignment*, also called *free-shift*,
*ends-free* or *overlap* alignment. In a regular (global) alignment, the
two sequences are compared from end to end and all differences occurring over
that length are counted. In semiglobal alignment, the sequences are allowed to
freely shift relative to each other and differences are only penalized in the
overlapping region between them::
FANTASTIC
ELEFANT
The prefix ``ELE`` and the suffix ``ASTIC`` do not have a counterpart in the
respective other row, but this is not counted as an error. The overlap ``FANT``
has a length of four characters.
Traditionally, *alignment scores* are used to find an optimal overlap alignment:
This means that the scoring function assigns a positive value to matches,
while mismatches, insertions and deletions get negative values. The optimal
alignment is then the one that has the maximal total score. Usage of scores
has the disadvantage that they are not at all intuitive: What does a total score
of *x* mean? Is that good or bad? How should a threshold be chosen in order to
avoid finding alignments with too many errors?
For cutadapt, the adapter alignment algorithm uses *unit costs* instead.
This means that mismatches, insertions and deletions are each counted as one error, which
is easier to understand and allows specifying a single parameter for the
algorithm (the maximum error rate) in order to describe how many errors are
acceptable.
There is a problem with this: When using costs instead of scores, we would like
to minimize the total costs in order to find an optimal alignment. But then the
best alignment would always be the one in which the two sequences do not overlap
at all! This would be correct, but meaningless for the purpose of finding an
adapter sequence.
The optimization criteria are therefore a bit different. The basic idea is to
consider the alignment optimal that maximizes the overlap between the two
sequences, as long as the allowed error rate is not exceeded.
Conceptually, the procedure is as follows:
1. Consider all possible overlaps between the two sequences and compute an
alignment for each, minimizing the total number of errors in each one.
2. Keep only those alignments that do not exceed the specified maximum error
rate.
3. Then, keep only those alignments that have a maximal number of matches
(that is, there is no alignment with more matches).
4. If there are multiple alignments with the same number of matches, then keep
only those that have the smallest error rate.
5. If there are still multiple candidates left, choose the alignment that starts
at the leftmost position within the read.
In Step 1, the different adapter types are taken into account: Only those
overlaps that are allowed by the adapter type are considered.
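The selection steps above can be illustrated with a small sketch. This is not Cutadapt's implementation: it is a simplified illustration that only counts mismatches (no insertions or deletions), ignores adapter types, and assumes that the error rate is the number of errors divided by the length of the overlap. The function name ``best_overlap`` and the returned tuple layout are hypothetical.

```python
def best_overlap(read, adapter, max_error_rate):
    """Pick an overlap between read and adapter following steps 1-5.

    Simplification (assumption): only mismatches are counted, so each
    shift of the adapter relative to the read yields exactly one
    candidate alignment.
    Returns (shift, matches, errors, overlap_length) or None.
    """
    candidates = []
    # Step 1: consider every shift that gives an overlap of at least 1.
    # A negative shift means the adapter starts before the read does.
    for shift in range(-(len(adapter) - 1), len(read)):
        start = max(0, shift)
        stop = min(len(read), shift + len(adapter))
        length = stop - start
        matches = sum(
            read[i] == adapter[i - shift] for i in range(start, stop)
        )
        errors = length - matches
        # Step 2: discard overlaps that exceed the maximum error rate.
        if errors <= max_error_rate * length:
            candidates.append((shift, matches, errors, length))
    if not candidates:
        return None
    # Steps 3-5: most matches, then lowest error rate, then leftmost.
    return min(
        candidates,
        key=lambda c: (-c[1], float(c[2]) / c[3], c[0]),
    )


print(best_overlap("FANTASTIC", "ELEFANT", 0.1))
```

For the ``FANTASTIC``/``ELEFANT`` example, the chosen overlap is ``FANT``: the adapter is shifted three positions to the left of the read, with four matches and no errors.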
.. _quality-trimming-algorithm:
Quality trimming algorithm
--------------------------
The trimming algorithm implemented in cutadapt is the same as the one used by
BWA, but applied to both
ends of the read in turn (if requested). That is: Subtract the given cutoff
from all qualities; compute partial sums from all indices to the end of the
sequence; cut the sequence at the index at which the sum is minimal. If both
ends are to be trimmed, repeat this for the other end.
The basic idea is to remove all bases starting from the end of the read whose
quality is smaller than the given threshold. This is refined a bit by allowing
some good-quality bases among the bad-quality ones. In the following example,
we assume that the 3' end is to be quality-trimmed.
Assume you use a threshold of 10 and have these quality values:
42, 40, 26, 27, 8, 7, 11, 4, 2, 3
Subtracting the threshold gives:
32, 30, 16, 17, -2, -3, 1, -6, -8, -7
Then sum up the numbers, starting from the end (partial sums). Stop early if
the sum is greater than zero:
(70), (38), 8, -8, -25, -23, -20, -21, -15, -7
The numbers in parentheses are not computed (because 8 is greater than zero),
but shown here for completeness. The position of the minimum (-25) is used as
the trimming position. Therefore, the read is trimmed to the first four bases,
which have quality values 42, 40, 26, 27.
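The worked example can be reproduced with a short sketch. It follows the description above for the 3' end only and is meant as an illustration, not as Cutadapt's actual code; the function name ``quality_trim_index`` is hypothetical, and the qualities are assumed to be given as a list of integers.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Bases at positions >= the returned index are removed.
    """
    total = 0
    min_total = 0
    cut = len(qualities)  # default: trim nothing
    # Walk from the 3' end towards the 5' end, building partial sums
    # of (quality - cutoff).
    for i in reversed(range(len(qualities))):
        total += qualities[i] - cutoff
        if total > 0:
            # Stop early: the sum has become greater than zero.
            break
        if total < min_total:
            # New minimum: remember this position as the cut point.
            min_total = total
            cut = i
    return cut


qualities = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
cut = quality_trim_index(qualities, 10)
print(cut, qualities[:cut])
```

With the quality values from the example and a threshold of 10, the minimum partial sum (-25) is reached at index 4, so the first four values are kept.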
......@@ -4,8 +4,7 @@ Developing
The `Cutadapt source code is on GitHub <https://github.com/marcelm/cutadapt/>`_.
Cutadapt is written in Python with some extension modules that are written
in Cython. Cutadapt uses a single code base that is compatible with both
Python 2 and 3. Python 2.7 is the minimum supported Python version. With
relatively little effort, compatibility with Python 2.6 could be restored.
Python 2 and 3. Python 2.7 is the minimum supported Python version.
Development installation
......@@ -63,6 +62,54 @@ Yes, there are inconsistencies in the current code base since it’s a few years
Making a release
----------------
Since version 1.17, Travis CI is used to automatically deploy a new Cutadapt release
(both as an sdist and as wheels) whenever a new tag is pushed to the Git repository.
Cutadapt uses `versioneer <https://github.com/warner/python-versioneer>`_ to automatically manage
version numbers. This means that the version is not stored in the source code but derived from
the most recent Git tag. The following procedure can be used to bump the version and make a new
release.
#. Update ``CHANGES.rst`` (version number and list of changes)
#. Ensure you have no uncommitted changes in the working copy.
#. Run a ``git pull``.
#. Run ``tox``, ensuring all tests pass.
#. Tag the current commit with the version number (there must be a ``v`` prefix)::
git tag v0.1
To release a development version, use a ``dev`` version number such as ``v1.17.dev1``.
Users will not automatically get these unless they use ``pip install --pre``.
#. Push the tag::
git push --tags
#. Wait for Travis to finish and to deploy to PyPI.
#. Update the `bioconda recipe <https://github.com/bioconda/bioconda-recipes/blob/master/recipes/cutadapt/meta.yaml>`_.
It is probably easiest to edit the recipe via the web interface and send in a
pull request. Ensure that the list of dependencies (the ``requirements:``
section in the recipe) is in sync with the ``setup.py`` file.
Since this is just a version bump, the pull request does not need a
review by other bioconda developers. As soon as the tests pass and if you
have the proper permissions, it can be merged directly.
Releases to bioconda still need to be made manually.
Making a release manually
-------------------------
.. note::
This section is outdated, see the previous section!
If this is the first time you attempt to upload a distribution to PyPI, create a
configuration file named ``.pypirc`` in your home directory with the following
contents::
......
......@@ -22,7 +22,6 @@ improvements.
- allow to remove not the adapter itself, but the sequence before or after it
- instead of trimming, convert adapter to lowercase
- warn when given adapter sequence contains non-IUPAC characters
- try multithreading again, this time use os.pipe() or 0mq
- extensible file type detection
- the --times setting should be an attribute of Adapter
......@@ -34,7 +33,6 @@ Backwards-incompatible changes
- Possibly drop wildcard-file support, extend info-file instead
- Drop "legacy mode"
- For non-anchored 5' adapters, find rightmost match
- Move ``scripts/cutadapt.py`` to ``__main__.py``
Specifying adapters
......
......@@ -10,6 +10,7 @@ Table of contents
installation
guide
colorspace
algorithms
recipes
ideas
develop
......
......@@ -45,7 +45,7 @@ Dependencies
Cutadapt installation requires this software to be installed:
* Python 2.7 or at least Python 3.3
* Python 2.7 or at least Python 3.4
* Possibly a C compiler. For Linux, cutadapt packages are provided as
so-called “wheels” (``.whl`` files) which come pre-compiled.
......
......@@ -5,18 +5,6 @@ Recipes (FAQ)
This section gives answers to frequently asked questions. It shows you how to
get cutadapt to do what you want it to do!
.. _avoid-internal-adapter-matches:
Avoid internal adapter matches
------------------------------
To force matches to be at the end of the read and thus avoiding internal
adapter matches, append a few ``X`` characters to the adapter sequence, like
this: ``-a TACGGCATXXX``. The ``X`` is counted as a mismatch and will force the
match to be at the end. Just make sure that there are more ``X`` characters than
the length of the adapter times the error rate. This is not the same as an
anchored 3' adapter since partial matches are still allowed.
Remove more than one adapter
----------------------------
......
......@@ -9,5 +9,4 @@ parentdir_prefix = cutadapt-
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
......@@ -12,8 +12,9 @@ import versioneer
MIN_CYTHON_VERSION = '0.24'
if sys.version_info < (2, 7):
sys.stdout.write("At least Python 2.7 is required.\n")
vi = sys.version_info
if (vi[0] == 2 and vi[1] < 7) or (vi[0] == 3 and vi[1] < 4):
sys.stdout.write('Minimum supported Python versions are 2.7 and 3.4.\n')
sys.exit(1)
......@@ -110,8 +111,11 @@ setup(
ext_modules=extensions,
package_dir={'': 'src'},
packages=find_packages('src'),
install_requires=['xopen>=0.3.2'],
entry_points={'console_scripts': ['cutadapt = cutadapt.__main__:main']},
install_requires=['xopen>=0.3.2'],
extras_require = {
'dev': ['Cython', 'pytest', 'pytest-timeout', 'nose', 'sphinx', 'sphinx_issues'],
},
classifiers=[
"Development Status :: 5 - Production/Stable",
"Environment :: Console",
......
Metadata-Version: 1.1
Name: cutadapt
Version: 1.16
Version: 1.18
Summary: trim adapters from high-throughput sequencing reads
Home-page: https://cutadapt.readthedocs.io/
Author: Marcel Martin
Author-email: marcel.martin@scilifelab.se
License: MIT
Description-Content-Type: UNKNOWN
Description: .. image:: https://travis-ci.org/marcelm/cutadapt.svg?branch=master
:target: https://travis-ci.org/marcelm/cutadapt
......
......@@ -7,6 +7,7 @@ setup.cfg
setup.py
versioneer.py
doc/Makefile
doc/algorithms.rst
doc/changes.rst
doc/colorspace.rst
doc/conf.py
......@@ -35,6 +36,7 @@ src/cutadapt/pipeline.py
src/cutadapt/qualtrim.py
src/cutadapt/report.py
src/cutadapt/seqio.py
src/cutadapt/utils.py
src/cutadapt.egg-info/PKG-INFO
src/cutadapt.egg-info/SOURCES.txt
src/cutadapt.egg-info/dependency_links.txt
......@@ -54,11 +56,13 @@ tests/test_trim.py
tests/utils.py
tests/cut/454.fa
tests/cut/SRR2040271_1.fastq
tests/cut/adapterx.fasta
tests/cut/anchored-back.fasta
tests/cut/anchored.fasta
tests/cut/anchored_no_indels.fasta
tests/cut/anchored_no_indels_wildcard.fasta
tests/cut/anywhere_repeat.fastq
tests/cut/casava.fastq
tests/cut/demultiplexed.first.1.fastq
tests/cut/demultiplexed.first.2.fastq
tests/cut/demultiplexed.second.1.fastq
......@@ -100,6 +104,8 @@ tests/cut/overlapa.fa
tests/cut/overlapb.fa
tests/cut/paired-filterboth.1.fastq
tests/cut/paired-filterboth.2.fastq
tests/cut/paired-filterfirst.1.fastq
tests/cut/paired-filterfirst.2.fastq
tests/cut/paired-m27.1.fastq
tests/cut/paired-m27.2.fastq
tests/cut/paired-onlyA.1.fastq
......@@ -125,6 +131,7 @@ tests/cut/polya.fasta
tests/cut/rest.fa
tests/cut/restfront.fa
tests/cut/s_1_sequence.txt
tests/cut/shortened-negative.fastq
tests/cut/shortened.fastq
tests/cut/small-no-trim.fasta
tests/cut/small.fasta
......@@ -159,6 +166,7 @@ tests/cut/wildcard.fa
tests/cut/wildcardN.fa
tests/cut/wildcard_adapter.fa
tests/cut/wildcard_adapter_anywhere.fa
tests/cut/xadapter.fasta
tests/data/454.fa
tests/data/E3M.fasta
tests/data/E3M.qual
......@@ -168,6 +176,7 @@ tests/data/anchored-back.fasta
tests/data/anchored.fasta
tests/data/anchored_no_indels.fasta
tests/data/anywhere_repeat.fastq
tests/data/casava.fastq
tests/data/dos.fastq
tests/data/empty.fastq
tests/data/example.fa
......@@ -216,7 +225,9 @@ tests/data/tooshort.noprimer.fa
tests/data/trimN3.fasta
tests/data/trimN5.fasta
tests/data/twoadapters.fasta
tests/data/underscore_fastq.gz
tests/data/wildcard.fa
tests/data/wildcardN.fa
tests/data/wildcard_adapter.fa
tests/data/withplus.fastq
tests/data/xadapterx.fasta
\ No newline at end of file
xopen>=0.3.2
[dev]
Cython
pytest
pytest-timeout
nose
sphinx
sphinx_issues
# coding: utf-8
from __future__ import print_function, division, absolute_import
import sys
from ._version import get_versions
__version__ = get_versions()['version']
del get_versions
def check_importability(): # pragma: no cover
try:
import cutadapt._align
except ImportError as e:
if 'undefined symbol' in str(e):
print("""
ERROR: A required extension module could not be imported because it is
incompatible with your system. A quick fix is to recompile the extension
modules with the following command:
{0} setup.py build_ext -i
See the documentation for alternative ways of installing the program.
The original error message follows.
""".format(sys.executable))
raise