......@@ -15,17 +15,20 @@ addons:
packages:
- zlib1g-dev
- libbz2-dev
- python3
- python3-pip
before_install:
- pip install --user cpp-coveralls
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then python3 -m pip install --user cpp-coveralls; fi
install: true
script:
- make
- make test COVERAGE=yes
- make validate
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then make test COVERAGE=yes; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then make test COVERAGE=no; fi
- make regression
- PATH=$PATH:$PWD/build/ make -C examples
after_success:
- coveralls --exclude tests --exclude googletest-release-1.8.0 --gcov-options '\-lp'
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then python3 -m coveralls --exclude tests --gcov-options '\-lp'; fi
.\" Man page generated from reStructuredText.
.
.TH "ADAPTERREMOVAL" "1" "Jan 22, 2019" "2.2.3" "AdapterRemoval"
.SH NAME
AdapterRemoval \- Fast short-read adapter trimming and processing
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.sp
\fBAdapterRemoval\fP [\fIoptions\fP…] \-\-file1 <\fIfilenames\fP> [\-\-file2 <\fIfilenames\fP>]
.SH DESCRIPTION
.sp
\fBAdapterRemoval\fP removes residual adapter sequences from single\-end (SE) or paired\-end (PE) FASTQ reads, optionally trimming Ns and low\-quality bases and/or collapsing overlapping paired\-end mates into one read. Low\-quality reads are filtered based on the resulting length and the number of ambiguous nucleotides (‘N’) present following trimming. These operations may be combined with simultaneous demultiplexing using 5’ barcode sequences. Alternatively, \fBAdapterRemoval\fP may attempt to reconstruct a consensus adapter sequence from paired\-end data, in order to allow the identification of the adapter sequences originally used.
.sp
If you use this program, please cite the paper:
.INDENT 0.0
.INDENT 3.5
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88
.sp
\fI\%http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104\-016\-1900\-2\fP
.UNINDENT
.UNINDENT
.sp
For detailed documentation, please see
.INDENT 0.0
.INDENT 3.5
\fI\%http://adapterremoval.readthedocs.io/en/v2.2.3/\fP
.UNINDENT
.UNINDENT
.SH OPTIONS
.INDENT 0.0
.TP
.B \-\-help
Display summary of command\-line options.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-version
Print the version string.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-file1 filename [filenames...]
Read FASTQ reads from one or more files, either uncompressed, bzip2 compressed, or gzip compressed. This contains either the single\-end (SE) reads or, if paired\-end, the mate 1 reads. If running in paired\-end mode, both \fB\-\-file1\fP and \fB\-\-file2\fP must be set. See the primary documentation for a list of supported formats.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-file2 filename [filenames...]
Read one or more FASTQ files containing mate 2 reads for a paired\-end run. If specified, \fB\-\-file1\fP must also be set.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-identify\-adapters
Attempt to build a consensus adapter sequence from fully overlapping pairs of paired\-end reads. The minimum overlap is controlled by \fB\-\-minalignmentlength\fP\&. The result will be compared with the values set using \fB\-\-adapter1\fP and \fB\-\-adapter2\fP\&. No trimming is performed in this mode. Default is off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-threads n
Maximum number of threads. Defaults to 1.
.UNINDENT
.SS FASTQ options
.INDENT 0.0
.TP
.B \-\-qualitybase base
The Phred quality scores encoding used in input reads \- either ‘64’ for Phred+64 (Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. Default is 33.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-qualitybase\-output base
The base of the quality score for reads written by AdapterRemoval \- either ‘64’ for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. However, note that quality scores are represented using Phred scores internally, and conversion to and from Solexa scores therefore results in a loss of information. The default corresponds to the value given for \fB\-\-qualitybase\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-qualitymax base
Specifies the maximum Phred score expected in input files, and used when writing output files. Possible values are 0 to 93 for Phred+33 encoded files, and 0 to 62 for Phred+64 encoded files. Defaults to 41.
.UNINDENT
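The relationship between \fB\-\-qualitybase\fP, \fB\-\-qualitybase\-output\fP and \fB\-\-qualitymax\fP can be sketched in Python. The helpers decode_phred and encode_phred are hypothetical and handle only the Phred+33 and Phred+64 encodings. Solexa scores are not covered. The sketch also rejects scores above the chosen maximum, which is an assumption about how out\-of\-range input should be treated.

```python
from typing import List


def decode_phred(qualities: str, offset: int = 33, max_score: int = 41) -> List[int]:
    """Decode a FASTQ quality string into Phred scores.

    offset is 33 for Phred+33 or 64 for Phred+64 (cf. --qualitybase);
    max_score mirrors the --qualitymax default of 41.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(char) - offset for char in qualities]
    for score in scores:
        # Assumption: out-of-range scores are treated as an input error.
        if not 0 <= score <= max_score:
            raise ValueError(f"Phred score {score} outside 0..{max_score}")
    return scores


def encode_phred(scores: List[int], offset: int = 33) -> str:
    """Encode Phred scores using the given offset (cf. --qualitybase-output)."""
    return "".join(chr(score + offset) for score in scores)


# The same scores, read as Phred+64 and re-encoded as Phred+33:
scores = decode_phred("hhB", offset=64)
print(scores)                           # [40, 40, 2]
print(encode_phred(scores, offset=33))  # II#
```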
.INDENT 0.0
.TP
.B \-\-mate\-separator separator
Character separating the mate number (1 or 2) from the read name in FASTQ records. Defaults to ‘/’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved
Enables \fB\-\-interleaved\-input\fP and \fB\-\-interleaved\-output\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved\-input
If set, input is expected to be an interleaved FASTQ file specified using \fB\-\-file1\fP, in which pairs of reads are written one after the other (e.g. read1/1, read1/2, read2/1, read2/2, etc.).
.UNINDENT
.INDENT 0.0
.TP
.B \-\-interleaved\-output
Write paired\-end reads to a single file, interleaving mate 1 and mate 2 reads. By default, this file is named \fBbasename.paired.truncated\fP, but this may be changed using the \fB\-\-output1\fP option.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-combined\-output
Write all reads into the files specified by \fB\-\-output1\fP and \fB\-\-output2\fP\&. The sequences of reads discarded due to quality filters or read merging are replaced with a single ‘N’ with Phred score 0. This option can be combined with \fB\-\-interleaved\-output\fP to write PE reads to a single output file specified with \fB\-\-output1\fP\&.
.UNINDENT
.SS Output file options
.INDENT 0.0
.TP
.B \-\-basename filename
Prefix used for naming output files, unless these names have been overridden using the corresponding command\-line option (see below).
.UNINDENT
.INDENT 0.0
.TP
.B \-\-settings file
Output file containing information on the parameters used in the run as well as overall statistics on the reads after trimming. Default filename is ‘basename.settings’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output1 file
Output file containing trimmed mate 1 reads. Default filename is ‘basename.pair1.truncated’ for paired\-end reads, ‘basename.truncated’ for single\-end reads, and ‘basename.paired.truncated’ for interleaved paired\-end reads.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output2 file
Output file containing trimmed mate 2 reads when \fB\-\-interleaved\-output\fP is not enabled. Default filename is ‘basename.pair2.truncated’ in paired\-end mode.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-singleton file
Output file containing paired reads for which the mate has been discarded. Default filename is ‘basename.singleton.truncated’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-outputcollapsed file
If \fB\-\-collapse\fP is set, contains overlapping mate\-pairs which have been merged into a single read (PE mode) or reads for which the adapter was identified by a minimum overlap, indicating that the entire template molecule is present (SE mode). This does not include reads which have subsequently been trimmed due to low\-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-outputcollapsedtruncated file
Collapsed reads (see \fB\-\-outputcollapsed\fP) which were trimmed due to the presence of low\-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed.truncated’.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-discarded file
Contains reads discarded due to the \fB\-\-minlength\fP, \fB\-\-maxlength\fP or \fB\-\-maxns\fP options. Default filename is ‘basename.discarded’.
.UNINDENT
.SS Output compression options
.INDENT 0.0
.TP
.B \-\-gzip
If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression level specified using \fB\-\-gzip\-level\fP\&. The extension “.gz” is added to files for which no filename was given on the command\-line. Defaults to off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-gzip\-level level
Determines the compression level used when gzip’ing FASTQ files. Must be a value in the range 0 to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-bzip2
If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression level specified using \fB\-\-bzip2\-level\fP\&. The extension “.bz2” is added to files for which no filename was given on the command\-line. Defaults to off.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-bzip2\-level level
Determines the compression level used when bzip2’ing FASTQ files. Must be a value in the range 1 to 9, with 9 being the best compression. Defaults to 9.
.UNINDENT
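The default filenames listed in the output file and output compression sections can be combined into one lookup, shown here as a Python sketch. The function default_output_names is hypothetical. Two rules are assumptions of this sketch rather than statements from this manual: that \fB\-\-gzip\fP and \fB\-\-bzip2\fP are mutually exclusive, and that singleton files are only produced in paired\-end mode. The default basename ‘your_output’ follows the README.

```python
from typing import Dict


def default_output_names(basename: str = "your_output", paired: bool = True,
                         interleaved: bool = False, collapse: bool = False,
                         gzip: bool = False, bzip2: bool = False) -> Dict[str, str]:
    """Map output options to their default filenames for one run."""
    if gzip and bzip2:
        raise ValueError("assumption: choose at most one of gzip and bzip2")
    # Compression extensions apply only to FASTQ outputs, not the settings file.
    extension = ".gz" if gzip else ".bz2" if bzip2 else ""
    fastq = {"discarded": f"{basename}.discarded"}
    if not paired:
        fastq["output1"] = f"{basename}.truncated"
    elif interleaved:
        fastq["output1"] = f"{basename}.paired.truncated"
    else:
        fastq["output1"] = f"{basename}.pair1.truncated"
        fastq["output2"] = f"{basename}.pair2.truncated"
    if paired:
        # Assumption: singleton output only exists for paired-end runs.
        fastq["singleton"] = f"{basename}.singleton.truncated"
    if collapse:
        fastq["outputcollapsed"] = f"{basename}.collapsed"
        fastq["outputcollapsedtruncated"] = f"{basename}.collapsed.truncated"
    names = {"settings": f"{basename}.settings"}
    names.update({key: name + extension for key, name in fastq.items()})
    return names


print(default_output_names("run1", gzip=True)["output1"])  # run1.pair1.truncated.gz
```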
.SS FASTQ trimming options
.INDENT 0.0
.TP
.B \-\-adapter1 adapter
Adapter sequence expected to be found in mate 1 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-adapter2 adapter
Adapter sequence expected to be found in mate 2 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-adapter\-list filename
Read one or more adapter sequences from a table. The first two columns (separated by whitespace) of each line in the file are expected to correspond to values passed to \fB\-\-adapter1\fP and \fB\-\-adapter2\fP\&. In single\-end mode, only column one is required. Lines starting with ‘#’ are ignored. When multiple rows are found in the table, AdapterRemoval will try each adapter (pair), and select the best aligning adapters for each FASTQ read processed.
.UNINDENT
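A Python sketch of reading an \fB\-\-adapter\-list\fP table as described above. The function name is hypothetical, and ignoring blank lines is an assumption. The example sequences are the default adapters truncated for brevity, plus one made\-up pair.

```python
from typing import Iterable, List, Tuple


def parse_adapter_table(lines: Iterable[str], paired: bool = True) -> List[Tuple[str, ...]]:
    """Parse an --adapter-list style table into adapter tuples.

    Columns are whitespace separated; only the first column is required in
    single-end mode. Comment lines are skipped, as are blank lines (the
    latter is an assumption of this sketch).
    """
    needed = 2 if paired else 1
    adapters = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        columns = stripped.split()
        if len(columns) < needed:
            raise ValueError(f"line {lineno}: expected at least {needed} column(s)")
        # Extra columns beyond the first one or two are ignored.
        adapters.append(tuple(columns[:needed]))
    return adapters


table = [
    "# adapter1  adapter2",
    "AGATCGGAAGAGCACACGTC  AGATCGGAAGAGCGTCGTGT",
    "ACGTACGTACGT  TGCATGCATGCA",
]
print(parse_adapter_table(table))
```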
.INDENT 0.0
.TP
.B \-\-minadapteroverlap length
In single\-end mode, reads are only trimmed if the overlap between read and the adapter is at least \fIlength\fP bases long, not counting ambiguous nucleotides (N); this is independent of \fB\-\-minalignmentlength\fP when using \fB\-\-collapse\fP, allowing a conservative selection of putative complete inserts in single\-end mode, while ensuring that all possible adapter contamination is trimmed. The default is 0.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-mm mismatchrate
The fraction of mismatches allowed in the aligned region. If the value is less than 1, then the value is used directly. If \fImismatchrate\fP is greater than 1, the rate is set to 1 / \fImismatchrate\fP\&. The default setting is 3 when trimming adapters, corresponding to a maximum mismatch rate of 1/3, and 10 when using \fB\-\-identify\-adapters\fP\&.
.UNINDENT
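A minimal Python sketch of how an \fB\-\-mm\fP value maps to a mismatch rate, following the rule above. The second helper, within_mismatch_rate, is hypothetical. It compares a plain mismatch fraction with that rate and does not reproduce AdapterRemoval's actual alignment scoring.

```python
def effective_mismatch_rate(mismatchrate: float) -> float:
    """Translate an --mm value into a maximum fraction of mismatches.

    Values below 1 are used as-is; larger values n are read as 1/n.
    A value of exactly 1 yields 1.0 either way.
    """
    if mismatchrate < 0:
        raise ValueError("mismatch rate must be non-negative")
    if mismatchrate < 1:
        return mismatchrate
    return 1.0 / mismatchrate


# Hypothetical helper: a simplified acceptance test for an alignment,
# not AdapterRemoval's exact scoring rule.
def within_mismatch_rate(mismatches: int, aligned_length: int, mismatchrate: float) -> bool:
    if aligned_length <= 0:
        return False
    return mismatches / aligned_length <= effective_mismatch_rate(mismatchrate)


print(effective_mismatch_rate(3))    # 0.3333333333333333
print(effective_mismatch_rate(10))   # 0.1
print(effective_mismatch_rate(0.2))  # 0.2
```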
.INDENT 0.0
.TP
.B \-\-shift n
To allow for missing bases in the 5’ end of the read, the program can let the alignment slip \fB\-\-shift\fP bases in the 5’ end. This corresponds to starting the alignment maximum \fB\-\-shift\fP nucleotides into read2 (for paired\-end) or the adapter (for single\-end). The default is 2.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trim5p n [n]
Trim the 5’ end of reads by a fixed amount after removing adapters, but before carrying out quality based trimming. Specify one value to trim mate 1 and mate 2 reads by the same amount, or two values separated by a space to trim each mate by a different amount. Off by default.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trim3p n [n]
Trim the 3’ end of reads by a fixed amount. See \fB\-\-trim5p\fP\&.
.UNINDENT
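The following Python sketch shows fixed\-length trimming when each option is given one or two values. The helpers are hypothetical. The ordering relative to adapter removal and quality trimming is not modelled. Over\-trimming yields an empty read here, which the default \fB\-\-minlength\fP of 15 would then discard.

```python
from typing import Sequence, Tuple


def per_mate_amounts(values: Sequence[int]) -> Tuple[int, int]:
    """Expand the one or two values given to --trim5p/--trim3p into (mate 1, mate 2)."""
    if len(values) == 1:
        return values[0], values[0]
    if len(values) == 2:
        return values[0], values[1]
    raise ValueError("expected one or two values")


def trim_fixed(sequence: str, qualities: str, trim5p: int, trim3p: int) -> Tuple[str, str]:
    """Remove a fixed number of bases from each end of a read.

    The end index is clamped at 0 so that over-trimming yields an empty read
    instead of wrapping around via Python's negative indices.
    """
    end = max(0, len(sequence) - trim3p)
    return sequence[trim5p:end], qualities[trim5p:end]


mate1_5p, mate2_5p = per_mate_amounts([2, 3])  # like --trim5p 2 3
print(trim_fixed("ACGTACGT", "IIIIIIII", mate1_5p, 1))  # ('GTACG', 'IIIII')
print(trim_fixed("TTGGCCAA", "IIIIIIII", mate2_5p, 1))  # ('GCCA', 'IIII')
```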
.INDENT 0.0
.TP
.B \-\-trimns
Trim consecutive Ns from the 5’ and 3’ termini. If quality trimming is also enabled (\fB\-\-trimqualities\fP), then stretches of mixed low\-quality bases and/or Ns are trimmed.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-maxns n
Discard reads containing more than \fIn\fP ambiguous bases (‘N’) after trimming. Default is 1000.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trimqualities
Trim consecutive stretches of low quality bases (threshold set by \fB\-\-minquality\fP) from the 5’ and 3’ termini. If trimming of Ns is also enabled (\fB\-\-trimns\fP), then stretches of mixed low\-quality bases and Ns are trimmed.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-trimwindows window_size
Trim low quality bases using a sliding window based approach inspired by \fBsickle\fP with the given window size. See the “Window based quality trimming” section of the manual page for a description of this algorithm.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minquality minimum
Set the threshold for trimming low quality bases using \fB\-\-trimqualities\fP and \fB\-\-trimwindows\fP\&. Default is 2.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minlength length
Reads shorter than this length are discarded following trimming. Defaults to 15.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-maxlength length
Reads longer than this length are discarded following trimming. Defaults to 4294967295.
.UNINDENT
.SS FASTQ merging options
.INDENT 0.0
.TP
.B \-\-collapse
In paired\-end mode, merge overlapping mates into a single read and recalculate the quality scores. In single\-end mode, attempt to identify templates for which the entire sequence is available. In both cases, complete “collapsed” reads are written with an ‘M_’ name prefix, and “collapsed” reads which are trimmed due to quality settings are written with an ‘MT_’ name prefix. The overlap needs to be at least \fB\-\-minalignmentlength\fP nucleotides, with a maximum number of mismatches determined by \fB\-\-mm\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-minalignmentlength length
The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one, when collapsing paired\-end reads, or when attempting to identify complete template sequences in single\-end mode. Default is 11.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-seed seed
When collapsing reads at positions where the two reads differ, and the qualities of the bases are identical, AdapterRemoval will select a random base. This option specifies the seed used for the random number generator used by AdapterRemoval. This value is also written to the settings file. Note that setting the seed is not reliable in multithreaded mode, since the order of operations is non\-deterministic.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-deterministic
Enable deterministic mode. This currently only affects \fB\-\-collapse\fP: differing overlapping bases with equal quality are set to ‘N’ with quality 0, instead of being randomly sampled.
.UNINDENT
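A per\-position Python sketch of the tie\-breaking rule described for \fB\-\-seed\fP and \fB\-\-deterministic\fP. Only the tie case (differing bases with equal quality) follows this manual. The other rules are simplifying assumptions: the higher\-quality base wins other mismatches, and the quality values returned are placeholders. AdapterRemoval recalculates merged quality scores in a way not described here. The function merge_position is hypothetical.

```python
import random
from typing import Optional, Tuple


def merge_position(base1: str, qual1: int, base2: str, qual2: int,
                   deterministic: bool = False,
                   rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Pick the consensus for one overlapping position of two mates.

    base2/qual2 are assumed to come from mate 2 after reverse complementing.
    """
    if base1 == base2:
        # Simplification: AdapterRemoval recalculates this quality; we keep the max.
        return base1, max(qual1, qual2)
    if qual1 != qual2:
        # Assumption: the higher-quality base wins when the bases differ.
        return (base1, qual1) if qual1 > qual2 else (base2, qual2)
    if deterministic:
        # Tie rule documented for --deterministic: N with quality 0.
        return "N", 0
    # Default tie rule: choose one of the two bases at random (seedable, cf. --seed).
    rng = rng if rng is not None else random.Random()
    return rng.choice([(base1, qual1), (base2, qual2)])


print(merge_position("A", 30, "C", 30, deterministic=True))  # ('N', 0)
print(merge_position("A", 30, "C", 20))                      # ('A', 30)
```

Passing a seeded random.Random instance makes the random tie\-breaks reproducible in a single thread, which mirrors the purpose of \fB\-\-seed\fP.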
.SS FASTQ demultiplexing options
.INDENT 0.0
.TP
.B \-\-barcode\-list filename
Perform demultiplexing using a table of one or two fixed\-length barcodes for SE or PE reads. The table is expected to contain 2 or 3 columns, the first of which represents the name of a given sample, and the second and third of which represent the mate 1 and (optionally) the mate 2 barcode sequence. For a detailed description, see the “Demultiplexing” section of the online documentation.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm n
Maximum number of mismatches allowed when counting mismatches in both the mate 1 and the mate 2 barcode for paired reads.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm\-r1 n
Maximum number of mismatches allowed for the mate 1 barcode; if not set, this value is equal to the \fB\-\-barcode\-mm\fP value; cannot be higher than the \fB\-\-barcode\-mm\fP value.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-barcode\-mm\-r2 n
Maximum number of mismatches allowed for the mate 2 barcode; if not set, this value is equal to the \fB\-\-barcode\-mm\fP value; cannot be higher than the \fB\-\-barcode\-mm\fP value.
.UNINDENT
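A Python sketch of barcode assignment using the table format and mismatch limits described above. Barcodes are compared against the 5’ end of each read, following the description of demultiplexing. The helper names are hypothetical. Several details are assumptions not specified in this manual: barcode positions missing from a short read count as mismatches, ‘#’ lines in the table are ignored, and reads matching zero or several samples are left unassigned.

```python
from typing import List, Optional, Tuple

# (sample name, mate 1 barcode, mate 2 barcode or "" for single barcodes)
BarcodeRow = Tuple[str, str, str]


def parse_barcode_table(lines: List[str]) -> List[BarcodeRow]:
    """Parse a 2- or 3-column --barcode-list style table."""
    rows = []
    for line in lines:
        columns = line.split()
        if not columns or columns[0].startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        if len(columns) not in (2, 3):
            raise ValueError(f"expected 2 or 3 columns: {line!r}")
        barcode2 = columns[2] if len(columns) == 3 else ""
        rows.append((columns[0], columns[1], barcode2))
    return rows


def count_mismatches(read: str, barcode: str) -> int:
    """Hamming distance between a barcode and the 5' end of a read.

    Barcode positions beyond the end of a short read count as mismatches.
    """
    prefix = read[:len(barcode)]
    differing = sum(1 for a, b in zip(prefix, barcode) if a != b)
    return differing + (len(barcode) - len(prefix))


def assign_sample(rows: List[BarcodeRow], read1: str, read2: str = "",
                  barcode_mm: int = 0, mm_r1: Optional[int] = None,
                  mm_r2: Optional[int] = None) -> Optional[str]:
    """Return the matching sample name, or None if zero or several rows match."""
    mm_r1 = barcode_mm if mm_r1 is None else mm_r1
    mm_r2 = barcode_mm if mm_r2 is None else mm_r2
    if mm_r1 > barcode_mm or mm_r2 > barcode_mm:
        raise ValueError("per-mate limits cannot exceed barcode_mm")
    hits = []
    for sample, barcode1, barcode2 in rows:
        mm1 = count_mismatches(read1, barcode1)
        mm2 = count_mismatches(read2, barcode2) if barcode2 else 0
        if mm1 <= mm_r1 and mm2 <= mm_r2 and mm1 + mm2 <= barcode_mm:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None


rows = parse_barcode_table(["sample_a ACGT TTTT", "sample_b GGGG CCCC"])
print(assign_sample(rows, "ACGAAAAA", "TTTTAAAA"))                # None
print(assign_sample(rows, "ACGAAAAA", "TTTTAAAA", barcode_mm=1))  # sample_a
```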
.INDENT 0.0
.TP
.B \-\-demultiplex\-only
Only carry out demultiplexing using the list of barcodes supplied with \fB\-\-barcode\-list\fP\&. No other processing is done.
.UNINDENT
.SH WINDOW BASED QUALITY TRIMMING
.sp
As of v2.2.2, AdapterRemoval implements a sliding window based approach to quality based base\-trimming inspired by \fBsickle\fP\&. If \fBwindow_size\fP is greater than or equal to 1, that number is used as the window size for all reads. If \fBwindow_size\fP is a number greater than or equal to 0 and less than 1, then that number is multiplied by the length of individual reads to determine the window size. If the window length is zero or is greater than the current read length, then the read length is used instead.
.sp
Reads are trimmed as follows for a given window size:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP 1. 3
The new 5’ end is determined by locating the first window where both the average quality and the quality of the first base in the window are greater than \fB\-\-minquality\fP\&.
.IP 2. 3
The new 3’ end is located by sliding the first window right, until the average quality becomes less than or equal to \fB\-\-minquality\fP\&. The new 3’ end is placed at the last base in that window where the quality is greater than or equal to \fB\-\-minquality\fP\&.
.IP 3. 3
If no 5’ position could be determined, the read is discarded.
.UNINDENT
.UNINDENT
.UNINDENT
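The three steps above can be sketched in Python as follows. This sketch is a reading of the description in this section, not AdapterRemoval's C++ implementation. Two choices are assumptions and are marked in the code: how fractional window sizes are rounded, and where the 3’ end is placed when no base in the failing window reaches \fB\-\-minquality\fP\&. Positions are 0\-based and the returned range is half\-open.

```python
from typing import List, Optional, Tuple


def window_length(window_size: float, read_length: int) -> int:
    """Resolve window_size (absolute if >= 1, else a fraction of the read)."""
    if window_size < 0:
        raise ValueError("window_size must be >= 0")
    if window_size >= 1:
        length = int(window_size)
    else:
        length = int(window_size * read_length)  # assumption: round down
    if length == 0 or length > read_length:
        length = read_length
    return length


def trim_windows(quals: List[int], window_size: float,
                 minquality: int) -> Optional[Tuple[int, int]]:
    """Return the half-open range (start, end) of bases to keep, or None to discard."""
    n = len(quals)
    if n == 0:
        return None
    w = window_length(window_size, n)

    def mean(pos: int) -> float:
        return sum(quals[pos:pos + w]) / w

    # Step 1: first window whose mean and first base both exceed minquality.
    start = next((pos for pos in range(n - w + 1)
                  if mean(pos) > minquality and quals[pos] > minquality), None)
    if start is None:
        return None  # Step 3: no 5' position, so the read is discarded.

    # Step 2: slide right until the window mean drops to minquality or below.
    for pos in range(start, n - w + 1):
        if mean(pos) <= minquality:
            # Keep up to the last base in this window with quality >= minquality.
            for i in range(pos + w - 1, pos - 1, -1):
                if quals[i] >= minquality:
                    return start, i + 1
            return start, pos  # assumption: no such base, cut before the window
    return start, n  # the mean never dropped, so keep the rest of the read


quals = [30, 30, 30, 30, 12, 5, 5, 5]
print(trim_windows(quals, window_size=4, minquality=10))  # (0, 5)
```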
.SH EXIT STATUS
.sp
AdapterRemoval exits with status 0 if the program ran successfully, and with a non\-zero exit code if any errors were encountered. Do not use the output from AdapterRemoval if the program returned a non\-zero exit code!
.SH REPORTING BUGS
.sp
Please report any bugs using the AdapterRemoval issue\-tracker:
.sp
\fI\%https://github.com/MikkelSchubert/adapterremoval/issues\fP
.SH LICENSE
.sp
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
.sp
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.sp
You should have received a copy of the GNU General Public License
along with this program. If not, see <\fI\%http://www.gnu.org/licenses/\fP>.
.SH AUTHOR
Mikkel Schubert; Stinus Lindgreen
.SH COPYRIGHT
2017, Mikkel Schubert; Stinus Lindgreen
.\" Generated by docutils manpage writer.
.
### Version 2.2.3 - 2019-01-22
* Added support for trimming reads by a fixed amount: --trim5p N --trim3p N.
Different values may be given for each mate: --trim5p N1 N2. Trimming is
carried out after adapters have been removed and reads have been collapsed,
if enabled, but before quality trimming (Ns and low qualities).
* Added option for deterministic read merging (--collapse-deterministic). In
this mode AdapterRemoval will set a merged base to 'N' with quality 0 if
the corresponding bases on the two mates differ, and if both have the same
quality score. The default behavior is to select one of the two bases at
random.
* Fixed reporting of line numbers in error messages.
* Added conda installation instructions, courtesy of Maxime Borry (maxibor).
* Fixed reading mate 2 adapters specified via --adapter-list. Adapters would
be used in the reverse orientation compared to --adapter2. Courtesy of
Karolis (KarolisM).
* Fixed various typos and improved help/error messages.
### Version 2.2.2 - 2017-07-17
......
......@@ -86,21 +86,21 @@ OBJS := ${LIBOBJS} $(BDIR)/main.o
DFILES := $(OBJS:.o=.deps)
.PHONY: all install clean test clean_tests static validate validation
.PHONY: all install clean test clean_tests static regression docs
all: build/$(PROG) build/$(PROG).1
all: build/$(PROG)
everything: all static test validation
everything: all static test regression docs
# Clean
clean: clean_tests
clean: clean_tests clean_docs
@echo $(COLOR_GREEN)"Cleaning ..."$(COLOR_END)
$(QUIET) rm -f build/$(PROG) build/$(PROG).1 build/$(LIBNAME).a
$(QUIET) rm -rvf build/validation
$(QUIET) rm -f build/$(PROG) build/$(LIBNAME).a
$(QUIET) rm -rvf build/regression
$(QUIET) rm -rvf $(BDIR)
# Install
install: build/$(PROG) build/$(PROG).1
install: build/$(PROG)
@echo $(COLOR_GREEN)"Installing AdapterRemoval .."$(COLOR_END)
@echo $(COLOR_GREEN)" .. binary into ${PREFIX}/bin/"$(COLOR_END)
$(QUIET) mkdir -p ${PREFIX}/bin/
......@@ -109,7 +109,7 @@ install: build/$(PROG) build/$(PROG).1
@echo $(COLOR_GREEN)" .. man-page into ${PREFIX}/share/man/man1/"$(COLOR_END)
$(QUIET) mkdir -p ${PREFIX}/share/man/man1/
$(QUIET) mv -f build/$(PROG).1 ${PREFIX}/share/man/man1/
$(QUIET) cp -a $(PROG).1 ${PREFIX}/share/man/man1/
$(QUIET) chmod a+r ${PREFIX}/share/man/man1/$(PROG).1
@echo $(COLOR_GREEN)" .. README into ${PREFIX}/share/adapterremoval/"$(COLOR_END)
......@@ -141,11 +141,6 @@ build/$(LIBNAME).a: $(LIBOBJS)
@echo $(COLOR_GREEN)"Linking static library $@"$(COLOR_END)
$(AR) rcs build/$(LIBNAME).a $(LIBOBJS)
build/%.1: %.pod
@echo $(COLOR_GREEN)"Constructing man-page $@ from $<"$(COLOR_END)
$(QUIET) mkdir -p $(BDIR)
$(QUIET) pod2man $< > $@
# Automatic header depencencies
-include $(DFILES)
......@@ -154,11 +149,12 @@ build/%.1: %.pod
# Unit testing
#
TEST_DIR := build/tests
TEST_OBJS := $(TEST_DIR)/alignment.o \
TEST_OBJS := $(TEST_DIR)/main_test.o \
$(TEST_DIR)/debug.o \
$(TEST_DIR)/alignment.o \
$(TEST_DIR)/alignment_test.o \
$(TEST_DIR)/argparse.o \
$(TEST_DIR)/argparse_test.o \
$(TEST_DIR)/debug.o \
$(TEST_DIR)/fastq.o \
$(TEST_DIR)/fastq_test.o \
$(TEST_DIR)/fastq_enc.o \
......@@ -167,30 +163,21 @@ TEST_OBJS := $(TEST_DIR)/alignment.o \
$(TEST_DIR)/strutils_test.o
TEST_DEPS := $(TEST_OBJS:.o=.deps)
GTEST_DIR := googletest-release-1.8.0/googletest
GTEST_OBJS := $(TEST_DIR)/gtest-all.o $(TEST_DIR)/gtest_main.o
GTEST_LIB := $(TEST_DIR)/libgtest.a
TEST_CXXFLAGS := -isystem $(GTEST_DIR)/include -I$(GTEST_DIR) -Isrc -DAR_TEST_BUILD -g
GTEST_CXXFLAGS := $(TEST_CXXFLAGS)
TEST_CXXFLAGS := -Isrc -DAR_TEST_BUILD -g
test: $(TEST_DIR)/main
@echo $(COLOR_GREEN)"Running tests"$(COLOR_END)
$(QUIET) $< --gtest_print_time=0 --gtest_shuffle
@echo $(COLOR_GREEN)"Running unit tests"$(COLOR_END)
$(QUIET) $(TEST_DIR)/main
clean_tests:
@echo $(COLOR_GREEN)"Cleaning tests ..."$(COLOR_END)
$(QUIET) rm -rvf $(TEST_DIR)
$(TEST_DIR)/main: $(GTEST_LIB) $(TEST_OBJS)
$(TEST_DIR)/main: $(TEST_OBJS)
@echo $(COLOR_GREEN)"Linking executable $@"$(COLOR_END)
$(QUIET) $(CXX) $(CXXFLAGS) ${LIBRARIES} $^ -o $@
$(TEST_DIR)/libgtest.a: $(GTEST_OBJS)
@echo $(COLOR_GREEN)"Linking GTest library $@"$(COLOR_END)
$(QUIET) ar -rv $@ $^
$(TEST_DIR)/%.o: tests/%.cpp
$(TEST_DIR)/%.o: tests/unit/%.cpp
@echo $(COLOR_CYAN)"Building $@ from $<"$(COLOR_END)
$(QUIET) mkdir -p $(TEST_DIR)
$(QUIET) $(CXX) $(CXXFLAGS) $(TEST_CXXFLAGS) -c -o $@ $<
......@@ -202,47 +189,34 @@ $(TEST_DIR)/%.o: src/%.cpp
$(QUIET) $(CXX) $(CXXFLAGS) $(TEST_CXXFLAGS) -c -o $@ $<
$(QUIET) $(CXX) $(CXXFLAGS) $(TEST_CXXFLAGS) -w -MM -MT $@ -MF $(@:.o=.deps) $<
$(TEST_DIR)/gtest%.o: $(GTEST_DIR)/src/gtest%.cc
@echo $(COLOR_CYAN)"Building $@ from $<"$(COLOR_END)
$(QUIET) mkdir -p $(TEST_DIR)
$(QUIET) $(CXX) $(GTEST_CXXFLAGS) -c $< -o $@
.PRECIOUS: $(GTEST_DIR)/src/gtest%.cc
$(GTEST_DIR)/src/gtest%.cc: googletest-release-1.8.0.zip
$(QUIET) if ! test -e "$@"; \
then \
echo $(COLOR_CYAN)"Unpacking Google Test library"$(COLOR_END); \
unzip -qo googletest-release-1.8.0.zip; \
fi
googletest-release-1.8.0.zip:
ifneq ("$(shell which wget)", "")
@echo $(COLOR_CYAN)"Fetching Google Test library using wget"$(COLOR_END)
$(QUIET) wget -q https://github.com/google/googletest/archive/release-1.8.0.zip -O googletest-release-1.8.0.zip
else ifneq ("$(shell which curl)", "")
@echo $(COLOR_CYAN)"Fetching Google Test library using curl"$(COLOR_END)
$(QUIET) curl -L https://github.com/google/googletest/archive/release-1.8.0.zip -o googletest-release-1.8.0.zip
else
@echo $(COLOR_YELLOW)"To run tests, first download and unpack GoogleTest 1.8.0 in this folder:"$(COLOR_END)
@echo $(COLOR_YELLOW)" $$ wget https://github.com/google/googletest/archive/release-1.8.0.zip -O googletest-release-1.8.0.zip"$(COLOR_END)
@echo $(COLOR_YELLOW)" $$ unzip googletest-release-1.8.0.zip"$(COLOR_END)
@exit 1
endif
#
# Validation
#
VALIDATION_BDIR=build/validation
VALIDATION_SDIR=validation
VALIDATION_BDIR=./build/regression
VALIDATION_SDIR=./tests/regression
validate: build/$(PROG)
@echo $(COLOR_CYAN)"Validating AdapterRemoval results"$(COLOR_END)
regression: build/$(PROG)
@echo $(COLOR_GREEN)"Running regression tests"$(COLOR_END)
@mkdir -p $(VALIDATION_BDIR)
@./validation/run $(VALIDATION_BDIR) $(VALIDATION_SDIR)
validation: validate
@$(VALIDATION_SDIR)/run $(VALIDATION_BDIR) $(VALIDATION_SDIR)
# Automatic header dependencies for tests
-include $(TEST_DEPS)
#
# Documentation
#
SPHINXOPTS = -n -q
SPHINXBUILD = sphinx-build
docs:
$(QUIET) @$(SPHINXBUILD) -M html docs build/docs $(SPHINXOPTS)
$(QUIET) @$(SPHINXBUILD) -M man docs build/docs $(SPHINXOPTS)
$(QUIET) cp -v "build/docs/man/AdapterRemoval.1" .
clean_docs:
@echo $(COLOR_GREEN)"Cleaning documentation ..."$(COLOR_END)
$(QUIET) rm -rvf build/docs
......@@ -29,6 +29,16 @@ AdapterRemoval was originally published in Lindgreen 2012:
## Installation
### Installation with [Conda](https://conda.io/docs/)
If you have Conda [installed on your system](https://conda.io/miniconda.html):
```
conda install -c maxibor adapterremoval2
```
### Manual installation
To install, first download and unpack the newest release from GitHub:
$ wget -O adapterremoval-2.1.7.tar.gz https://github.com/MikkelSchubert/adapterremoval/archive/v2.1.7.tar.gz
......@@ -166,7 +176,7 @@ Note that in the case of paired-end adapters, AdapterRemoval considers only the
If we did not know the adapter sequences for the 'reads\_*.fq' files, AdapterRemoval may be used to generate a consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise alignments of the reads, provided that the data set contains only a single adapter sequence (not counting differences in index sequences).
In the following example, the identified adapters corresponds to the default adapter sequences with a poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown under each of the consensus sequences represented the phred-encoded fraction of bases identical to the consensus base, with adapter 1 containing the index CACCTA:
In the following example, the identified adapters corresponds to the default adapter sequences with a poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown under each of the consensus sequences represent the Phred-encoded fraction of bases that differ from the consensus base, such that a high Phred score indicates a strong consensus. In the examples below, adapter 1 is observed to contain the index CACCTA:
$ AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq
......
build/AdapterRemoval.1
AdapterRemoval.1
adapterremoval (2.2.3-1) unstable; urgency=medium
* New upstream version
* debhelper 12
* Standards-Version: 4.3.0
* Respect DEB_BUILD_OPTIONS in override_dh_auto_test target
-- Andreas Tille <tille@debian.org> Sun, 27 Jan 2019 12:29:03 +0100
adapterremoval (2.2.2-2) unstable; urgency=medium
[ Steffen Moeller ]
......
......@@ -4,12 +4,12 @@ Uploaders: Andreas Tille <tille@debian.org>,
Kevin Murray <kdmfoss@gmail.com>
Section: science
Priority: optional
Build-Depends: debhelper (>= 11~),
Build-Depends: debhelper (>= 12~),
zlib1g-dev,
libbz2-dev,
libgtest-dev,
python-markdown
Standards-Version: 4.2.1
Standards-Version: 4.3.0
Vcs-Browser: https://salsa.debian.org/med-team/adapterremoval
Vcs-Git: https://salsa.debian.org/med-team/adapterremoval.git
Homepage: https://github.com/MikkelSchubert/adapterremoval
......
......@@ -14,8 +14,10 @@ override_dh_auto_build:
markdown_py -f README.html README.md
override_dh_auto_test:
ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
# dh_auto_test
echo "*********** Needs adapting to libgtest-dev. ***************"
endif
override_dh_auto_install:
make PREFIX=$(CURDIR)/debian/$(DEB_SOURCE)/usr
......
AdapterRemoval v2
=================
.. image:: https://img.shields.io/travis/MikkelSchubert/adapterremoval/master.svg
:target: https://travis-ci.org/MikkelSchubert/adapterremoval
:alt: Travis-CI
.. image:: https://img.shields.io/coveralls/MikkelSchubert/adapterremoval.svg
:target: https://coveralls.io/github/MikkelSchubert/adapterremoval
:alt: Coveralls
AdapterRemoval searches for and removes adapter sequences from High-Throughput
Sequencing (HTS) data and (optionally) trims low quality bases from the 3' end
of reads following adapter removal. AdapterRemoval can analyze both single end
and paired end data, and can be used to merge overlapping paired-ended reads
into (longer) consensus sequences. Additionally, AdapterRemoval can construct a
consensus adapter sequence for paired-ended reads, if this information is
not available.
If you use AdapterRemoval v2, then please cite the paper
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter
trimming, identification, and read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2
AdapterRemoval was originally published in Lindgreen 2012:
Lindgreen (2012): AdapterRemoval: Easy Cleaning of Next Generation
Sequencing Reads, BMC Research Notes, 5:337
http://www.biomedcentral.com/1756-0500/5/337/
Overview of major features
==========================
- Trimming of adapter sequences from single-end and paired-end FASTQ reads.
- Trimming of multiple, different adapters or adapter pairs.
- Demultiplexing of single or double indexed reads, with or without trimming
of adapter sequences.
- Reconstruction of adapter sequences from paired-end reads, by the pairwise
alignment of reads in the absence of a known adapter sequence.
- Merging of overlapping read-pairs into higher-quality consensus sequences.
- Multi-threading of all operations for increased throughput.
- Reading and writing of gzip and bzip2 compressed files.
- Reading and writing of interleaved FASTQ files.
Installation
============
For detailed installation instructions, please see the
`Installation <https://adapterremoval.readthedocs.io/en/latest/installation.html>`_
section of the online documentation.
Installing AdapterRemoval using `Conda <https://conda.io/docs/>`_
-----------------------------------------------------------------
If you have Conda `installed on your system <https://conda.io/miniconda.html>`_,
you may install AdapterRemoval using one of several unofficial recipes::
conda install -c bioconda adapterremoval
conda install -c maxibor adapterremoval2
Installing AdapterRemoval on Debian based systems
-------------------------------------------------
Users of Debian Stretch or Ubuntu Artful Aardvark may install AdapterRemoval
using Apt::
sudo apt-get install adapterremoval adapterremoval-examples
Building AdapterRemoval from scratch
------------------------------------
Compiling AdapterRemoval from scratch requires a C++11 compliant compiler, and
that the zlib and bz2lib headers are installed::
make
sudo make install
Getting started
===============
To run AdapterRemoval, specify the location of pair 1 and (optionally) pair 2
FASTQ files using the --file1 and --file2 command-line options::
AdapterRemoval --file1 myreads_1.fastq.gz --file2 myreads_2.fastq.gz
By default, AdapterRemoval will save the trimmed reads in the current working
directory, using filenames starting with 'your_output'. See the
`Input and Output
<https://adapterremoval.readthedocs.io/en/latest/input_and_output.html>`_
files section for more information about files generated by AdapterRemoval.
More examples of common usage may be found in the
`Examples <https://adapterremoval.readthedocs.io/en/latest/examples.html>`_
section of the online documentation.
Documentation
-------------
For a detailed description of program usage, please refer to the
`online documentation <https://adapterremoval.readthedocs.io/>`_.
# -*- coding: utf-8 -*-
#
# AdapterRemoval documentation build configuration file, created by
# sphinx-quickstart on Sun Sep 17 15:00:32 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'AdapterRemoval'
copyright = u'2017, Mikkel Schubert; Stinus Lindgreen'
author = u'Mikkel Schubert; Stinus Lindgreen'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'2.2.3'
# The full version, including alpha/beta/rc tags.
release = u'2.2.3'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path.
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
'**': [
'about.html',
'navigation.html',
'relations.html', # needs 'show_related': True theme option to display
'searchbox.html',
'donate.html',
]
}
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'AdapterRemovaldoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'AdapterRemoval.tex', u'AdapterRemoval Documentation',
author, 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('manpage', 'AdapterRemoval', u'Fast short-read adapter trimming and processing',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'AdapterRemoval', u'AdapterRemoval Documentation',
author, 'AdapterRemoval', 'Fast short-read adapter trimming and processing.',
'Miscellaneous'),
]
.. highlight:: Bash
Example usage
=============
The following examples all make use of the data included in the 'examples' folder.
Trimming single-end reads
-------------------------
The following command removes adapters from the file *reads_1.fq*, trims both Ns and low-quality bases from the reads, and gzip-compresses the resulting files. The ``--basename`` option is used to specify the prefix for output files::
AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip
Since ``--gzip`` and ``--basename`` are specified, the trimmed FASTQ reads are written to *output_single.truncated.gz*, the discarded FASTQ reads are written to *output_single.discarded.gz*, and settings and summary statistics are written to *output_single.settings*.
Note that, by default, AdapterRemoval does not require a minimum number of bases to overlap the adapter sequence before reads are trimmed. This may result in an excess of very short (1 - 3 bp) 3' fragments being falsely identified as adapter sequences and trimmed. This behavior may be changed using the ``--minadapteroverlap`` option, which specifies the minimum number of bases (excluding Ns) that must be aligned to the adapter before trimming is carried out. For example, use *--minadapteroverlap 3* to require an overlap of at least 3 bp.
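To see why very short overlaps cause false positives, consider a deliberately simplified model. This model is an assumption and is not AdapterRemoval's actual alignment scoring: bases are uniformly random, and a 3' end is trimmed only if its last k bases exactly equal the first k bases of the adapter. The function names below are hypothetical.

```python
def spurious_match_probability(k: int) -> float:
    """Chance that k uniformly random bases exactly equal a fixed k-bp adapter prefix.

    Simplified model (assumption): A, C, G and T each occur with probability 1/4,
    and a match requires all k bases to be identical (no mismatches, no Ns).
    """
    return 0.25 ** k


def expected_spurious_trims(n_reads: int, k: int) -> float:
    """Expected number of reads whose last k bases match the adapter by chance."""
    return n_reads * spurious_match_probability(k)


for k in (1, 2, 3, 5, 8):
    print(f"overlap of {k} bp: p = {spurious_match_probability(k):.6f}, "
          f"about {expected_spurious_trims(1_000_000, k):,.0f} per million reads")
```

Under this model, a 1 bp overlap matches a quarter of all reads by chance, and each additional required base cuts that rate four-fold. The real aligner also tolerates some mismatches, so observed rates will differ.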
Trimming paired-end reads
-------------------------
The following command removes adapters from paired-end reads, where the mate 1 and mate 2 reads are kept in files *reads_1.fq* and *reads_2.fq*, respectively. The reads are trimmed for both Ns and low-quality bases, and overlapping reads (at least 11 nucleotides, by default) are merged (collapsed)::
AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse
This command generates the files *output_paired.pair1.truncated* and *output_paired.pair2.truncated*, which contain trimmed pairs of reads which were not collapsed, *output_paired.singleton.truncated* containing reads where one mate was discarded, *output_paired.collapsed* containing merged reads, and *output_paired.collapsed.truncated* containing merged reads that have been trimmed due to the ``--trimns`` or ``--trimqualities`` options. Finally, the *output_paired.discarded* and *output_paired.settings* files correspond to those of the single-end run.
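The idea behind collapsing can be sketched with a simplified model. This model is an assumption; AdapterRemoval's own implementation aligns reads while allowing mismatches and builds quality-aware consensus bases. In the model, mate 2 is reverse-complemented (adapters are assumed to be already removed). The longest suffix of mate 1 that exactly equals a prefix of the reverse-complemented mate 2 is then found, and the two are merged if that overlap is at least 11 bp. The function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_pair(mate1: str, mate2: str, min_overlap: int = 11) -> Optional[str]:
    """Merge two adapter-free mates whose ends overlap exactly (simplified sketch)."""
    mate2_rc = reverse_complement(mate2)
    lowest = max(min_overlap, 1)  # mate1[-0:] would be the whole read
    # Try the longest possible overlap first.
    for overlap in range(min(len(mate1), len(mate2_rc)), lowest - 1, -1):
        if mate1[-overlap:] == mate2_rc[:overlap]:
            return mate1 + mate2_rc[overlap:]
    return None  # overlap too short: the reads stay a pair


insert = "ACGGTCATTGACCTAGGCATTCAGC"           # 25 bp template
mate1 = insert[:20]                             # 20 bp read from the 5' end
mate2 = reverse_complement(insert)[:20]         # 20 bp read from the other strand
print(collapse_pair(mate1, mate2) == insert)    # the mates overlap by 15 bp
```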
Multiple input FASTQ files
--------------------------
More than one input file may be specified for mate 1 and mate 2 reads. This is accomplished simply by listing more than one file after the ``--file1`` and the ``--file2`` options.
For single-end reads::
AdapterRemoval --file1 reads_1a.fq reads_1b.fq reads_1c.fq
And for paired-end reads::
AdapterRemoval --file1 reads_1a.fq reads_1b.fq reads_1c.fq --file2 reads_2a.fq reads_2b.fq reads_2c.fq
AdapterRemoval will process these files as if they had been concatenated into a single file or pair of files prior to invoking AdapterRemoval. For paired reads, the files must be specified in the same order for ``--file1`` and ``--file2``.
Interleaved FASTQ reads
-----------------------
AdapterRemoval is able to read and write paired-end reads stored in a single, so-called interleaved FASTQ file (one pair at a time, first mate 1, then mate 2). This is accomplished by specifying the location of the file using ``--file1`` and *also* setting the ``--interleaved`` command-line option::
AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved
Other than taking just a single input file, this mode operates almost exactly like paired-end trimming (as described above). The only difference is that paired reads are not written to a 'pair1' and a 'pair2' file, but to a single, interleaved file named 'paired'. The location of this file is controlled using the ``--output1`` option. Reading or writing of interleaved FASTQ files, but not both, can be enabled by specifying either the ``--interleaved-input`` or the ``--interleaved-output`` option; both are enabled by the ``--interleaved`` option.
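The interleaved layout (one pair at a time, mate 1 first, then mate 2) can be illustrated with a short sketch. It assumes plain four-line FASTQ records without line wrapping, and the helper names are hypothetical.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file with exactly four lines per record."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError("truncated or malformed FASTQ record")
        yield lines[0][1:], lines[1], lines[3]


def read_interleaved(handle: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Group consecutive records into (mate 1, mate 2) pairs."""
    records = read_fastq(handle)
    for mate1 in records:
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("interleaved file ends with an unpaired mate 1 read")
        yield mate1, mate2


example = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read1/2\nTTGA\n+\nHHHH\n")
for mate1, mate2 in read_interleaved(example):
    print(mate1[0], mate2[0])
```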
Combining FASTQ output
----------------------
By default, AdapterRemoval will create one output file for each mate, one file for discarded reads, (in PE mode) one file for paired reads where one mate has been discarded, and (optionally) two files for collapsed reads. Alternatively, these files may be combined using the ``--combined-output`` option, in which case all output is directed to the mate 1 and (in PE mode) to the mate 2 file. In cases where reads are discarded due to trimming or due to being collapsed into a single sequence, the sequence and quality scores of the discarded read are replaced with a single 'N' with base quality 0. This option may be combined with ``--interleaved`` / ``--interleaved-output`` to write a single, interleaved file in paired-end mode.
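Because a base quality of 0 is written as ``'!'`` (ASCII 33) in Phred+33, the placeholder for a discarded read should look roughly like the record produced below. This sketch assumes that the read header is kept unchanged; the helper name is hypothetical.

```python
def placeholder_record(header: str, quality_base: int = 33) -> str:
    """FASTQ record whose sequence is a single 'N' with base quality 0."""
    quality_char = chr(quality_base + 0)  # Phred score 0 in the given encoding
    return f"@{header}\nN\n+\n{quality_char}\n"


print(placeholder_record("read1/2"), end="")
```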
Different quality score encodings
---------------------------------
By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33 encoded, meaning that the error probabilities are encoded as (char)('!' - 10 * log10(p)). Most data will be encoded using Phred+33, but Phred+64 and 'Solexa' encoded quality scores are also supported. These are selected by specifying the ``--qualitybase`` command-line option (specifying either '33', '64', or 'solexa')::
AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename output_phred_64
By default, reads are written using the *same* encoding as the input. If a different encoding is desired, this may be accomplished using the ``--qualitybase-output`` option::
AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename output_phred_33
Note furthermore that AdapterRemoval by default only expects quality scores in the range 0 - 41 (or -5 to 41 in the case of Solexa encoded scores). If input data using a different maximum quality score is to be processed, or if the desired maximum quality score of collapsed reads is greater than 41, then this limit may be increased using the ``--qualitymax`` option::
AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapse --basename output_collapsed_q50
For a detailed overview of Phred encoding schemes currently and previously in use, see e.g. the Wikipedia article on the subject:
https://en.wikipedia.org/wiki/FASTQ_format#Encoding
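The encodings above can be made concrete with a small sketch (the helper names are hypothetical). Phred scores are Q = -10 log10(p). The Solexa scale instead uses Q = -10 log10(p / (1 - p)), which is why Solexa scores can drop below zero.

```python
import math


def phred_from_error(p: float) -> int:
    """Phred score Q = -10 * log10(p), rounded with Python's round()."""
    return round(-10 * math.log10(p))


def encode_quality(q: int, offset: int = 33) -> str:
    """Quality character for score q: offset 33 (Phred+33) or 64 (Phred+64)."""
    return chr(offset + q)


def decode_quality(char: str, offset: int = 33) -> int:
    return ord(char) - offset


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score, -10 * log10(p / (1 - p)), to -10 * log10(p)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


q = phred_from_error(0.001)  # an error rate of 1 in 1,000
print(q, encode_quality(q), encode_quality(q, offset=64))
```

An error probability of 0.001 gives Q30, which is written as ``'?'`` in Phred+33 and ``'^'`` in Phred+64. A Solexa score of -5 corresponds to a Phred score of roughly 1.2, and the two scales converge at high scores. This convergence is why converting to and from Solexa loses information mainly at the low end.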
Trimming paired-end reads with multiple adapter pairs
-----------------------------------------------------
It is possible to trim data that contains multiple adapter pairs by providing a one- or two-column table of possible adapter combinations (for single-end and paired-end trimming, respectively; see e.g. examples/adapters.txt)::
cat adapters.txt
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA
This table is then specified using the ``--adapter-list`` option::
AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt
The resulting .settings file contains an overview of how frequently each adapter (pair) was used.
Note that in the case of paired-end adapters, AdapterRemoval considers only the combinations of adapters specified in the table, one combination per row. For single-end trimming, only the first column of the table file is required, and the list may therefore take the form of a file containing one sequence per line.
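The way rows map to adapter combinations can be sketched as follows. The sketch assumes whitespace-separated columns and that blank lines may be skipped; ``parse_adapter_table`` is a hypothetical name, not AdapterRemoval's parser.

```python
from typing import List, Optional, Tuple


def parse_adapter_table(text: str, paired: bool) -> List[Tuple[str, Optional[str]]]:
    """Return one (adapter1, adapter2) combination per row of an adapter table.

    Sketch only: for single-end data only the first column is used and
    adapter2 is None.
    """
    combinations = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        columns = line.split()
        if not columns:
            continue  # assumption: blank lines are ignored
        if paired and len(columns) < 2:
            raise ValueError(f"line {line_no}: expected an adapter for each mate")
        combinations.append((columns[0], columns[1] if paired else None))
    return combinations


# Adapter prefixes from examples/adapters.txt, shortened for brevity.
table = "AAACTTGCTC GATCGGGAGT\nCTAATTTGCC TACCGTGAAA\n"
for adapter1, adapter2 in parse_adapter_table(table, paired=True):
    print(adapter1, adapter2)
```

Because each row is a single combination, a table with n rows yields n candidate pairs rather than every possible pairing of mate 1 and mate 2 adapters.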
Identifying adapter sequences from paired-ended reads
-----------------------------------------------------
If we did not know the adapter sequences for the ``reads_*.fq`` files, AdapterRemoval may be used to generate a consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise alignments of the reads, provided that the data set contains only a single adapter sequence (not counting differences in index sequences).
In the following example, the identified adapters correspond to the default adapter sequences, followed by a poly-A tail resulting from sequencing past the end of the insert and adapter sequences. It is not necessary to specify this tail when using the ``--adapter1`` or ``--adapter2`` command-line options. The characters shown under each of the consensus sequences represent the Phred-encoded fraction of bases identical to the consensus base, with adapter 1 containing the index CACCTA::
AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq
Attemping to identify adapter sequences ...
Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
Found 394 overlapping pairs ...
Of which 119 contained adapter sequence(s) ...
Printing adapter sequences, including poly-A tails:
--adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
Consensus: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
Quality: 55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$
Top 5 most common 9-bp 5'-kmers:
1: AGATCGGAA = 96.00% (96)
2: AGATGGGAA = 1.00% (1)
3: AGCTCGGAA = 1.00% (1)
4: AGAGCGAAA = 1.00% (1)
5: AGATCGGGA = 1.00% (1)
--adapter2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Consensus: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Quality: 525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$
Top 5 most common 9-bp 5'-kmers:
1: AGATCGGAA = 100.00% (100)
No files are generated from running the adapter identification step.
The consensus sequences inferred are compared to those specified using the ``--adapter1`` and ``--adapter2`` command-line options, or with the default values for these if no values have been given (as in this case). Pipes (|) indicate matches between the provided sequences and the consensus sequence, and asterisks (*) indicate the presence of unspecified bases (Ns).
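The comparison line can be reproduced with a short sketch. Matches are drawn as '|' and Ns in the provided sequence as '*', as described above. The character used for a mismatch is not documented here, so the space used below is an assumption, as is the function name.

```python
def comparison_line(provided: str, consensus: str) -> str:
    """Compare a user-provided adapter with the inferred consensus, base by base."""
    symbols = []
    for expected, observed in zip(provided, consensus):  # stops at the shorter one
        if expected == "N":
            symbols.append("*")  # unspecified base, e.g. the index
        elif expected == observed:
            symbols.append("|")
        else:
            symbols.append(" ")  # assumption: symbol used for a mismatch
    return "".join(symbols)


adapter1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
consensus = ("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG"
             "AAAAAAAAAAAAAAAAAAAAAAAA")
print(adapter1)
print(comparison_line(adapter1, consensus))
```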
Demultiplexing and adapter-trimming
-----------------------------------
As of version 2.1, AdapterRemoval supports simultaneous demultiplexing and adapter trimming. Demultiplexing is carried out using a simple comparison between the specified barcode (a sequence of A, C, G, and T) and the first N bases of the mate 1 read, where N is the length of the barcode. Demultiplexing of double-indexed sequences is also supported, in which case two barcodes must be specified for each sample. The first barcode is then compared to the first N_1 bases of the mate 1 read, and the second barcode is compared to the first N_2 bases of the mate 2 read. By default, this comparison requires a perfect match. Reads identified as containing a specific barcode (or pair of barcodes) are then trimmed using adapter sequences that include the barcode(s) as necessary. Reads for which no barcode (or pair of barcodes) matched are written to a separate file, or pair of files for paired-end reads.
Demultiplexing is enabled by creating a table of barcodes. The first column specifies the sample name (using the characters a-z, A-Z, 0-9, or _). The second and (optional) third columns specify the barcode sequences expected at the 5' termini of mate 1 and mate 2 reads, respectively.
For example, a table of barcodes from a double-indexed run might be as follows (see examples/barcodes.txt)::
cat barcodes.txt
sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC
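Reading such a table can be sketched as follows. The sketch assumes whitespace-separated columns, and ``parse_barcode_table`` is a hypothetical name. Sample names are checked against the character set given above and barcodes against A, C, G, and T.

```python
import re
from typing import Dict, Optional, Tuple

SAMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")
BARCODE = re.compile(r"[ACGT]+")


def parse_barcode_table(text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each sample name to (mate 1 barcode, mate 2 barcode or None)."""
    samples: Dict[str, Tuple[str, Optional[str]]] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        columns = line.split()
        if not columns:
            continue  # assumption: blank lines are skipped
        if len(columns) not in (2, 3):
            raise ValueError(f"line {line_no}: expected 2 or 3 columns")
        name, *barcodes = columns
        if not SAMPLE_NAME.fullmatch(name):
            raise ValueError(f"line {line_no}: invalid sample name {name!r}")
        for barcode in barcodes:
            if not BARCODE.fullmatch(barcode):
                raise ValueError(f"line {line_no}: invalid barcode {barcode!r}")
        if name in samples:  # assumption: each sample name may appear only once
            raise ValueError(f"line {line_no}: duplicate sample {name!r}")
        samples[name] = (barcodes[0], barcodes[1] if len(barcodes) == 2 else None)
    return samples


table = """sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC
"""
print(parse_barcode_table(table)["sample_2"])
```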
In the case of single-end reads, only the first two columns are required. AdapterRemoval is invoked with the ``--barcode-list`` option, specifying the path to this table::
AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_demux --barcode-list barcodes.txt
This generates a set of output files for each sample specified in the barcode table, using the basename (``--basename``) as the prefix, followed by a dot and the sample name, followed by a dot and the default name for a given file type. For example, the output files for sample_2 would be
* output_demux.sample_2.discarded
* output_demux.sample_2.pair1.truncated
* output_demux.sample_2.pair2.truncated
* output_demux.sample_2.settings
* output_demux.sample_2.singleton.truncated
The settings files generated for each sample summarize the reads for that sample only. In addition, a basename.settings file is generated, which summarizes the number and proportion of reads identified as belonging to each sample.
The maximum number of mismatches allowed when comparing barcodes is controlled using the options ``--barcode-mm``, ``--barcode-mm-r1``, and ``--barcode-mm-r2``. These specify the maximum total number of mismatches, and the maximum number of mismatches for the mate 1 and mate 2 barcodes, respectively. Thus, if mm_1(i) and mm_2(i) represent the number of mismatches observed for barcode pair i for a given pair of reads, these options require that
1. mm_1(i) <= ``--barcode-mm-r1``
2. mm_2(i) <= ``--barcode-mm-r2``
3. mm_1(i) + mm_2(i) <= ``--barcode-mm``
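The three conditions can be expressed as a small sketch (hypothetical function names). Mismatches are counted as a simple Hamming distance over the first N bases of each read, where N is the barcode length, as described above. Single-end data is modelled with an empty mate 2 barcode.

```python
def count_mismatches(barcode: str, read: str) -> int:
    """Mismatches between a barcode and the first len(barcode) bases of a read.

    Assumption: bases missing from a read shorter than the barcode count as mismatches.
    """
    mismatches = sum(b != r for b, r in zip(barcode, read))
    return mismatches + max(0, len(barcode) - len(read))


def barcodes_match(barcode1: str, read1: str, barcode2: str = "", read2: str = "", *,
                   max_mm: int, max_mm_r1: int, max_mm_r2: int) -> bool:
    """Apply the three limits listed above to a single candidate barcode pair."""
    mm_1 = count_mismatches(barcode1, read1)
    mm_2 = count_mismatches(barcode2, read2)  # 0 when barcode2 is empty (single-end)
    return mm_1 <= max_mm_r1 and mm_2 <= max_mm_r2 and mm_1 + mm_2 <= max_mm


# sample_1 from the table above; mate 1 carries one mismatch (A -> T at base 7).
print(barcodes_match("ATGCGGA", "ATGCGGTCCTG", "TGAATCT", "TGAATCTGGA",
                     max_mm=1, max_mm_r1=1, max_mm_r2=0))
```

How AdapterRemoval chooses between several barcode pairs that all satisfy these limits is not covered here, so the sketch only tests one candidate pair.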
Demultiplexing mode
-------------------
As of version 2.2, AdapterRemoval can also be used to demultiplex reads without carrying out adapter trimming. This is accomplished by specifying the ``--demultiplex-only`` option::
AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only
Options listed under "TRIMMING SETTINGS" (see *AdapterRemoval --help*) do not apply to this mode, but compression (``--gzip``, ``--bzip2``), multi-threading (``--threads``), interleaving (``--interleaved``, etc.) and other such options may be used in conjunction with ``--demultiplex-only``.
AdapterRemoval will generate a *.settings* file for each sample listed in the ``--barcode-list`` file, along with the adapter sequences that should be used when trimming reads for a given sample. These adapters correspond to the adapters that were specified when running AdapterRemoval in demultiplexing mode, with the barcode prefixed as appropriate. An underscore marks the position at which the barcode ends and the adapter begins.
It is important to use these updated adapter sequences when trimming the demultiplexed reads, to avoid the inclusion of barcode sequences in reads extending past the 3' termini of the DNA template sequence.
.. highlight:: Bash
Getting started
===============
To run AdapterRemoval on single-end FASTQ data, simply specify the location of the FASTQ file(s) using the ``--file1`` command-line option::
AdapterRemoval --file1 myreads_1.fastq.gz
To run AdapterRemoval on paired-end FASTQ data, specify the location of the mate 1 and mate 2 FASTQ files using the ``--file1`` and ``--file2`` command-line options::
AdapterRemoval --file1 myreads_1.fastq.gz --file2 myreads_2.fastq.gz
The files may be uncompressed, gzip-compressed, or bzip2-compressed. When run in this manner, AdapterRemoval will save the trimmed reads in the current working directory, using filenames starting with 'your_output'. This behavior may be changed using the ``--basename`` option, or using specific options for each output file. See the :doc:`input_and_output` section for more information about files generated by AdapterRemoval.
More examples of common usage may be found in the :doc:`examples` section of the documentation.
A note on specifying adapters
-----------------------------
AdapterRemoval relies on the user specifying the adapter sequences to be trimmed, using the ``--adapter1`` and ``--adapter2`` command-line options. By default, AdapterRemoval is set up to trim Illumina TruSeq adapters, corresponding to the following command-line options::
--adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
--adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
It is therefore extremely important to specify the correct adapter sequences when running AdapterRemoval on a dataset that does not make use of these adapters. Failure to do so will result in the wrong sequences being trimmed, and actual adapter sequences being left in the resulting "trimmed" reads.
Adapter sequences are specified in the read orientation when using the ``--adapter1`` and ``--adapter2`` command-line options, directly corresponding to the sequence that is observed in the FASTQ files produced by the base-calling software. If we were processing data generated using the above TruSeq adapters, then we would therefore expect to find those sequences as-is in our FASTQ files (assuming that the read lengths are sufficiently long and that insert sizes are sufficiently short), typically followed by a low-quality A-tail::
$ grep "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC
$ grep "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA
The ambiguous bases representing the mate 1 barcode (the six Ns) have been replaced by single-character wildcards (dots) in the above grep commands, corresponding to how AdapterRemoval itself treats such characters.
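The same search can be scripted by translating each N in an adapter into the regular-expression wildcard ``.``, mirroring the grep commands above. This is a sketch with hypothetical helper names. Like grep, it requires the complete adapter to be present with no mismatches, so it is a sanity check rather than a model of how AdapterRemoval aligns and trims.

```python
import re
from typing import Iterable, List


def adapter_pattern(adapter: str):
    """Compile a regex in which each N in the adapter matches any single character."""
    return re.compile("".join("." if base == "N" else re.escape(base) for base in adapter))


def reads_with_adapter(sequences: Iterable[str], adapter: str) -> List[str]:
    pattern = adapter_pattern(adapter)
    return [seq for seq in sequences if pattern.search(seq)]


adapter1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
reads = [
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT",
    "ACGTTGCAACGTTGCAACGTTGCA",
]
print(len(reads_with_adapter(reads, adapter1)))
```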
For paired-end data, the ``--identify-adapters`` mode may be used to verify the choice of adapters, by attempting to reconstruct the adapter sequence directly from the FASTQ reads. See the :doc:`examples` section for a demonstration of this functionality.
AdapterRemoval
==============
AdapterRemoval searches for and removes remnant adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low-quality bases from the 3' end of reads following adapter removal. AdapterRemoval can analyze both single-end and paired-end data, and can be used to merge overlapping paired-end reads into (longer) consensus sequences. Additionally, AdapterRemoval can construct a consensus adapter sequence for paired-end reads, if this information is not available.
If you use AdapterRemoval v2, then please cite the paper:
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2
AdapterRemoval was originally published in Lindgreen 2012:
Lindgreen (2012): AdapterRemoval: Easy Cleaning of Next Generation Sequencing Reads, BMC Research Notes, 5:337
http://www.biomedcentral.com/1756-0500/5/337/
.. toctree::
:maxdepth: 2
:caption: Contents:
installation
getting_started
examples
manpage
misc
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Input and output files
======================
Output files
------------
.. highlight:: Bash
Installation
============
Installing on Debian based systems
----------------------------------
Debian users on Stretch, Buster, or Sid, or using Jessie-backports, as well as Ubuntu users on Zesty or Artful, may install AdapterRemoval using apt::
apt-get install adapterremoval
For other distributions, or to get the latest version of AdapterRemoval, please see the `Installing from sources`_ section below.
Installing on OSX
-----------------
MacOSX users may install AdapterRemoval using Homebrew::
brew install homebrew/science/adapterremoval
Please see the Homebrew website for instructions on how to install and use Homebrew:
https://brew.sh/
Installing from sources
-----------------------
Installing AdapterRemoval from sources requires the presence of libz and bz2 headers. On Debian based systems, these may be installed as follows::
sudo apt-get install zlib1g-dev libbz2-dev
In addition, a C++11-compatible compiler and basic build tools are required. On Debian based systems, these may be installed as follows::
sudo apt-get install build-essential
To compile AdapterRemoval, first download and unpack the newest release from GitHub, and then run the 'make' command::
wget -O adapterremoval-2.2.3.tar.gz https://github.com/MikkelSchubert/adapterremoval/archive/v2.2.3.tar.gz
tar xvzf adapterremoval-2.2.3.tar.gz
cd adapterremoval-2.2.3
make
The resulting 'AdapterRemoval' executable is located in the 'build' subdirectory, and can be run as-is. It is also possible to perform a system-wide installation of the AdapterRemoval executable, man-page, and examples using the following command::
sudo make install
AdapterRemoval manpage
======================
Synopsis
--------
**AdapterRemoval** [*options*...] --file1 <*filenames*> [--file2 <*filenames*>]
Description
-----------
:program:`AdapterRemoval` removes residual adapter sequences from single-end (SE) or paired-end (PE) FASTQ reads, optionally trimming Ns and low-quality bases and/or collapsing overlapping paired-end mates into one read. Low-quality reads are filtered based on the resulting length and the number of ambiguous nucleotides ('N') present following trimming. These operations may be combined with simultaneous demultiplexing using 5' barcode sequences. Alternatively, ``AdapterRemoval`` may attempt to reconstruct a consensus adapter sequence from paired-end data, in order to allow the identification of the adapter sequences originally used.
If you use this program, please cite the paper:
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2
For detailed documentation, please see
http://adapterremoval.readthedocs.io/en/v2.2.3/
Options
-------
.. program:: AdapterRemoval
.. option:: --help
Display summary of command-line options.
.. option:: --version
Print the version string.
.. option:: --file1 filename [filenames...]
Read FASTQ reads from one or more files, either uncompressed, bzip2 compressed, or gzip compressed. This contains either the single-end (SE) reads or, if paired-end, the mate 1 reads. If running in paired-end mode, both ``--file1`` and ``--file2`` must be set. See the primary documentation for a list of supported formats.
.. option:: --file2 filename [filenames...]
Read one or more FASTQ files containing mate 2 reads for a paired-end run. If specified, ``--file1`` must also be set.
.. option:: --identify-adapters
Attempt to build a consensus adapter sequence from fully overlapping pairs of paired-end reads. The minimum overlap is controlled by ``--minalignmentlength``. The result will be compared with the values set using ``--adapter1`` and ``--adapter2``. No trimming is performed in this mode. Default is off.
.. option:: --threads n
Maximum number of threads. Defaults to 1.
FASTQ options
~~~~~~~~~~~~~
.. option:: --qualitybase base
The Phred quality score encoding used in input reads: either '64' for Phred+64 (Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value 'solexa' may be used to specify reads with Solexa encoded scores. Default is 33.
.. option:: --qualitybase-output base
The base of the quality score for reads written by AdapterRemoval: either '64' for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value 'solexa' may be used to specify reads with Solexa encoded scores. However, note that quality scores are represented using Phred scores internally, and conversion to and from Solexa scores therefore results in a loss of information. The default corresponds to the value given for ``--qualitybase``.
.. option:: --qualitymax base
Specifies the maximum Phred score expected in input files, and used when writing output files. Possible values are 0 to 93 for Phred+33 encoded files, and 0 to 62 for Phred+64 encoded files. Defaults to 41.
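The two upper limits coincide with the end of the printable ASCII range at ``'~'`` (code 126). The following check shows the arithmetic; it is our illustration of where the limits plausibly come from, not code taken from AdapterRemoval.

```python
def max_encodable_score(offset: int) -> int:
    """Highest score whose character is still printable ASCII ('~', code 126)."""
    return ord("~") - offset


for name, offset in (("Phred+33", 33), ("Phred+64", 64)):
    top = max_encodable_score(offset)
    print(f"{name}: 0-{top}, written as {chr(offset)!r} to {chr(offset + top)!r}")
```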
.. option:: --mate-separator separator
Character separating the mate number (1 or 2) from the read name in FASTQ records. Defaults to '/'.
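A sketch of how a mate number might be split off a read name using this separator. The helper is hypothetical and assumes that the read name is the first whitespace-separated token of the header (without the leading '@') and that the mate number follows the last separator.

```python
from typing import Optional, Tuple


def split_mate_number(header: str, separator: str = "/") -> Tuple[str, Optional[int]]:
    """Return (read name, mate number) for headers such as 'read1/2 extra'."""
    name = header.split(maxsplit=1)[0] if header.strip() else ""
    stem, sep, suffix = name.rpartition(separator)
    if sep and suffix in ("1", "2"):
        return stem, int(suffix)
    return name, None  # no mate number recognized


print(split_mate_number("read1/2 extra"))
```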
.. option:: --interleaved
Enables ``--interleaved-input`` and ``--interleaved-output``.
.. option:: --interleaved-input
If set, input is expected to be an interleaved FASTQ file specified using ``--file1``, in which pairs of reads are written one after the other (e.g. read1/1, read1/2, read2/1, read2/2, etc.).
.. option:: --interleaved-output
Write paired-end reads to a single file, interleaving mate 1 and mate 2 reads. By default, this file is named ``basename.paired.truncated``, but this may be changed using the ``--output1`` option.
.. option:: --combined-output
Write all reads into the files specified by ``--output1`` and ``--output2``. The sequences of reads discarded due to quality filters or read merging are replaced with a single 'N' with Phred score 0. This option can be combined with ``--interleaved-output`` to write PE reads to a single output file specified with ``--output1``.
Output file options
~~~~~~~~~~~~~~~~~~~
.. option:: --basename filename
Prefix used for naming output files, unless these names have been overridden using the corresponding command-line option (see below).
.. option:: --settings file
Output file containing information on the parameters used in the run as well as overall statistics on the reads after trimming. Default filename is 'basename.settings'.
.. option:: --output1 file
Output file containing trimmed mate 1 reads. Default filename is 'basename.pair1.truncated' for paired-end reads, 'basename.truncated' for single-end reads, and 'basename.paired.truncated' for interleaved paired-end reads.
.. option:: --output2 file
Output file containing trimmed mate 2 reads when ``--interleaved-output`` is not enabled. Default filename is 'basename.pair2.truncated' in paired-end mode.
.. option:: --singleton file
Output file containing paired reads whose mate has been discarded. Default filename is 'basename.singleton.truncated'.
.. option:: --outputcollapsed file
If ``--collapse`` is set, contains overlapping mate pairs which have been merged into a single read (PE mode), or reads for which the adapter was identified by a minimum overlap, indicating that the entire template molecule is present (SE mode). This does not include reads that have subsequently been trimmed due to low-quality or ambiguous nucleotides. Default filename is 'basename.collapsed'.
.. option:: --outputcollapsedtruncated file
Collapsed reads (see ``--outputcollapsed``) which were trimmed due to the presence of low-quality or ambiguous nucleotides. Default filename is 'basename.collapsed.truncated'.
.. option:: --discarded file
Contains reads discarded due to the --minlength, --maxlength or --maxns options. Default filename is 'basename.discarded'.
Output compression options
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. option:: --gzip
If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression level specified using ``--gzip-level``. The extension ".gz" is added to files for which no filename was given on the command-line. Defaults to off.
.. option:: --gzip-level level
Determines the compression level used when gzip'ing FASTQ files. Must be a value in the range 0 to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.
.. option:: --bzip2
If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression level specified using ``--bzip2-level``. The extension ".bz2" is added to files for which no filename was given on the command-line. Defaults to off.
.. option:: --bzip2-level level
Determines the compression level used when bzip2'ing FASTQ files. Must be a value in the range 1 to 9, with 9 being the best compression. Defaults to 9.
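The default filenames listed under the output file options, combined with the compression extensions above, can be summarised as follows. This is a minimal sketch with a hypothetical function; it assumes at most one of ``--gzip`` and ``--bzip2`` is used, that singleton files are only written in paired-end mode, and it only models defaults (explicitly named files do not get an extension added).

```python
from typing import Dict, Optional


def default_output_names(basename: str, paired: bool,
                         interleaved_output: bool = False,
                         collapse: bool = False,
                         compression: Optional[str] = None) -> Dict[str, str]:
    """Map option names to their default filenames for one run."""
    ext = {None: "", "gzip": ".gz", "bzip2": ".bz2"}[compression]
    # The settings file is not a FASTQ file, so it is never compressed.
    names = {"settings": basename + ".settings"}
    fastq = {}
    if not paired:
        fastq["output1"] = ".truncated"
    elif interleaved_output:
        fastq["output1"] = ".paired.truncated"
    else:
        fastq["output1"] = ".pair1.truncated"
        fastq["output2"] = ".pair2.truncated"
    if paired:
        fastq["singleton"] = ".singleton.truncated"
    if collapse:
        fastq["outputcollapsed"] = ".collapsed"
        fastq["outputcollapsedtruncated"] = ".collapsed.truncated"
    fastq["discarded"] = ".discarded"
    for option, suffix in fastq.items():
        names[option] = basename + suffix + ext
    return names
```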
FASTQ trimming options
~~~~~~~~~~~~~~~~~~~~~~
.. option:: --adapter1 adapter
Adapter sequence expected to be found in mate 1 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the "Adapters" section of the online documentation. Default is AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG.
.. option:: --adapter2 adapter
Adapter sequence expected to be found in mate 2 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the "Adapters" section of the online documentation. Default is AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT.
.. option:: --adapter-list filename
Read one or more adapter sequences from a table. The first two columns (separated by whitespace) of each line in the file are expected to correspond to values passed to --adapter1 and --adapter2. In single-end mode, only column one is required. Lines starting with '#' are ignored. When multiple rows are found in the table, AdapterRemoval will try each adapter (pair), and select the best aligning adapters for each FASTQ read processed.
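A minimal Python sketch of reading such a table, assuming that blank lines may also be skipped and that any columns after the second are ignored; the function name is hypothetical and this is not AdapterRemoval's parser.

```python
from typing import List, Tuple


def parse_adapter_list(text: str, paired: bool = True) -> List[Tuple[str, str]]:
    """Parse an adapter table into (adapter1, adapter2) tuples.

    Columns are whitespace-separated and lines starting with '#' are
    skipped. In single-end mode adapter2 may be absent (returned as '').
    """
    adapters = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split()
        if paired and len(columns) < 2:
            raise ValueError(f"line {lineno}: expected two adapter columns")
        adapter2 = columns[1] if len(columns) > 1 else ""
        adapters.append((columns[0], adapter2))
    return adapters
```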
.. option:: --minadapteroverlap length
In single-end mode, reads are only trimmed if the overlap between the read and the adapter is at least this many bases long, not counting ambiguous nucleotides (N). This threshold is independent of ``--minalignmentlength`` (used with ``--collapse``), allowing a conservative selection of putative complete inserts in single-end mode while ensuring that all possible adapter contamination is trimmed. The default is 0.
.. option:: --mm mismatchrate
The fraction of mismatches allowed in the aligned region. If the value is less than 1, it is used directly as the mismatch rate; if it is greater than 1, the rate is set to 1 / value. The default setting is 3 when trimming adapters, corresponding to a maximum mismatch rate of 1/3, and 10 when using ``--identify-adapters``.
.. option:: --shift n
To allow for missing bases at the 5' end of the read, the program can let the alignment slip up to ``--shift`` bases at the 5' end. This corresponds to starting the alignment up to ``--shift`` nucleotides into mate 2 (for paired-end reads) or into the adapter (for single-end reads). The default is 2.
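The conversion from an ``--mm`` value to a mismatch rate can be written as a one-line rule. This hypothetical helper is a minimal sketch; how a value of exactly 1 is treated is not stated above, and the sketch assumes it is used directly.

```python
def mismatch_rate(mm: float) -> float:
    """Convert an --mm value into the allowed fraction of mismatches.

    Values above 1 are interpreted as 1/mm; other values are used as-is
    (treating exactly 1 as a rate of 1.0 is an assumption).
    """
    if mm < 0:
        raise ValueError("--mm must be non-negative")
    return 1.0 / mm if mm > 1 else mm
```

With the default of 3, this gives a rate of 1/3, i.e. at most one mismatch per three aligned bases.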
.. option:: --trim5p n [n]
Trim the 5' end of reads by a fixed amount after removing adapters, but before carrying out quality-based trimming. Specify one value to trim mate 1 and mate 2 reads by the same amount, or two values separated by a space to trim each mate by a different amount. Off by default.
.. option:: --trim3p n [n]
Trim the 3' end of reads by a fixed amount. See ``--trim5p``.
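A minimal sketch of fixed-length trimming as described for ``--trim5p`` and ``--trim3p``; the helper names are hypothetical, and reads shorter than the combined trim are assumed to become empty.

```python
from typing import Sequence, Tuple


def per_mate_amounts(values: Sequence[int]) -> Tuple[int, int]:
    """One value trims both mates equally; two values are (mate 1, mate 2)."""
    if len(values) == 1:
        return values[0], values[0]
    if len(values) == 2:
        return values[0], values[1]
    raise ValueError("expected one or two values")


def trim_fixed(seq: str, qual: str, trim5p: int, trim3p: int) -> Tuple[str, str]:
    """Remove a fixed number of bases from the 5' and 3' ends of one read."""
    end = max(trim5p, len(seq) - trim3p)  # never let end fall before start
    return seq[trim5p:end], qual[trim5p:end]
```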
.. option:: --trimns
Trim consecutive Ns from the 5' and 3' termini. If quality trimming is also enabled (``--trimqualities``), then stretches of mixed low-quality bases and/or Ns are trimmed.
.. option:: --maxns n
Discard reads containing more than ``--maxns`` ambiguous bases ('N') after trimming. Default is 1000.
.. option:: --trimqualities
Trim consecutive stretches of low quality bases (threshold set by ``--minquality``) from the 5' and 3' termini. If trimming of Ns is also enabled (``--trimns``), then stretches of mixed low-quality bases and Ns are trimmed.
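A minimal Python sketch of terminal trimming with ``--trimns`` and/or ``--trimqualities``. It assumes that a base counts as low quality when its Phred score is at or below ``--minquality``; the function is hypothetical and works on already decoded scores. Because a base is trimmable if it is an N *or* low quality, enabling both options trims mixed stretches, as described above.

```python
from typing import List, Tuple


def trim_termini(seq: str, quals: List[int], trim_ns: bool,
                 trim_qualities: bool, minquality: int = 2) -> Tuple[int, int]:
    """Return the half-open range (start, end) left after trimming both ends."""

    def trimmable(i: int) -> bool:
        # Assumption: "low quality" means a score <= minquality.
        return (trim_ns and seq[i] in "Nn") or (trim_qualities and quals[i] <= minquality)

    start, end = 0, len(seq)
    while start < end and trimmable(start):
        start += 1
    while end > start and trimmable(end - 1):
        end -= 1
    return start, end
```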
.. option:: --trimwindows window_size
Trim low quality bases using a sliding window based approach inspired by :program:`sickle` with the given window size. See the "Window based quality trimming" section of the manual page for a description of this algorithm.
.. option:: --minquality minimum
Set the threshold for trimming low quality bases using ``--trimqualities`` and ``--trimwindows``. Default is 2.
.. option:: --minlength length
Reads shorter than this length are discarded following trimming. Defaults to 15.
.. option:: --maxlength length
Reads longer than this length are discarded following trimming. Defaults to 4294967295.
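The length and ambiguity filters above can be combined into a single check applied after trimming. This is a minimal sketch with a hypothetical helper; reads failing it correspond to those written to the ``--discarded`` file.

```python
def keep_read(seq: str, minlength: int = 15, maxlength: int = 4294967295,
              maxns: int = 1000) -> bool:
    """Apply --minlength, --maxlength and --maxns to a trimmed sequence."""
    if len(seq) < minlength or len(seq) > maxlength:
        return False
    return seq.upper().count("N") <= maxns
```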
FASTQ merging options
~~~~~~~~~~~~~~~~~~~~~
.. option:: --collapse
In paired-end mode, merge overlapping mates into a single read and recalculate the quality scores. In single-end mode, attempt to identify templates for which the entire sequence is available. In both cases, complete "collapsed" reads are written with a 'M\_' name prefix, and "collapsed" reads which are trimmed due to quality settings are written with a 'MT\_' name prefix. The overlap needs to be at least ``--minalignmentlength`` nucleotides, with a maximum number of mismatches determined by ``--mm``.
.. option:: --minalignmentlength length
The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one, when collapsing paired-end reads, or when attempting to identify complete template sequences in single-end mode. Default is 11.
.. option:: --seed seed
When collapsing reads at positions where the two reads differ and the qualities of the bases are identical, AdapterRemoval will select a random base. This option specifies the seed used for the random number generator used by AdapterRemoval; the value is also written to the settings file. Note that setting the seed is not reliable in multithreaded mode, since the order of operations is non-deterministic.
.. option:: --deterministic
Enable deterministic mode. This currently only affects ``--collapse``: differing overlapping bases with equal quality are set to N with quality 0, instead of being randomly sampled.
FASTQ demultiplexing options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. option:: --barcode-list filename
Perform demultiplexing using a table of one or two fixed-length barcodes for SE or PE reads. The table is expected to contain 2 or 3 columns, the first of which represents the name of a given sample, and the second and third of which represent the mate 1 and (optionally) the mate 2 barcode sequence. For a detailed description, see the "Demultiplexing" section of the online documentation.
.. option:: --barcode-mm n
Maximum number of mismatches allowed when counting mismatches in both the mate 1 and the mate 2 barcode for paired reads.
.. option:: --barcode-mm-r1 n
Maximum number of mismatches allowed for the mate 1 barcode; if not set, this value is equal to the ``--barcode-mm`` value; cannot be higher than the ``--barcode-mm`` value.
.. option:: --barcode-mm-r2 n
Maximum number of mismatches allowed for the mate 2 barcode; if not set, this value is equal to the ``--barcode-mm`` value; cannot be higher than the ``--barcode-mm`` value.
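A minimal sketch of how the three mismatch limits could combine for one sample, assuming mismatches are counted position by position against the first bases of each read (5' barcodes). The helpers are hypothetical; choosing the best sample when several barcodes match is not shown.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def barcodes_match(read1: str, read2: str, barcode1: str, barcode2: str,
                   barcode_mm: int, barcode_mm_r1: Optional[int] = None,
                   barcode_mm_r2: Optional[int] = None) -> bool:
    """Check whether a read (pair) matches one sample's 5' barcodes.

    For single-end reads, pass empty strings for read2 and barcode2.
    """
    # Per-mate limits default to --barcode-mm and may not exceed it.
    mm_r1 = barcode_mm if barcode_mm_r1 is None else barcode_mm_r1
    mm_r2 = barcode_mm if barcode_mm_r2 is None else barcode_mm_r2
    if mm_r1 > barcode_mm or mm_r2 > barcode_mm:
        raise ValueError("per-mate limits cannot exceed --barcode-mm")
    if len(read1) < len(barcode1) or len(read2) < len(barcode2):
        return False  # read too short to contain the barcode
    mismatches_r1 = hamming(read1[:len(barcode1)], barcode1)
    mismatches_r2 = hamming(read2[:len(barcode2)], barcode2)
    return (mismatches_r1 <= mm_r1 and mismatches_r2 <= mm_r2
            and mismatches_r1 + mismatches_r2 <= barcode_mm)
```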
.. option:: --demultiplex-only
Only carry out demultiplexing using the list of barcodes supplied with --barcode-list. No other processing is done.
Window based quality trimming
-----------------------------
As of v2.2.2, AdapterRemoval implements a sliding-window approach to quality-based trimming, inspired by ``sickle``. If ``window_size`` is greater than or equal to 1, that number is used as the window size for all reads. If ``window_size`` is a number greater than or equal to 0 and less than 1, then that number is multiplied by the length of individual reads to determine the window size. If the window length is zero or is greater than the current read length, then the read length is used instead.
Reads are trimmed as follows for a given window size:
1. The new 5' end is determined by locating the first window where both the average quality and the quality of the first base in the window are greater than ``--minquality``.
2. The new 3' end is located by sliding the first window to the right until the average quality becomes less than or equal to ``--minquality``. The new 3' end is placed at the last base in that window where the quality is greater than or equal to ``--minquality``.
3. If no 5' position could be determined, the read is discarded.
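A minimal Python sketch of the window-size rules and the three steps above, under a literal reading of them: quality scores are already decoded to integers, and "the last base in that window" in step 2 is taken to be the right-most base meeting the threshold (if there is none, only the bases before that window are kept). This illustrates the description; it is not AdapterRemoval's C++ implementation, which may differ in edge cases.

```python
from typing import List, Optional, Tuple


def resolve_window_size(window_size: float, read_length: int) -> int:
    """Window length for one read, following the rules described above."""
    if window_size >= 1:
        length = int(window_size)
    else:
        # Sizes in [0, 1) are fractions of the read length.
        length = int(window_size * read_length)
    if length == 0 or length > read_length:
        length = read_length
    return length


def trim_windowed(quals: List[int], window_size: float,
                  minquality: int) -> Optional[Tuple[int, int]]:
    """Return the half-open range (start, end) of bases to keep, or None
    if the read is discarded. `quals` holds decoded Phred scores."""
    n = len(quals)
    if n == 0:
        return None
    w = resolve_window_size(window_size, n)

    def mean(i: int) -> float:
        return sum(quals[i:i + w]) / w

    # Step 1: first window whose mean and first base both exceed minquality.
    start = next((i for i in range(n - w + 1)
                  if mean(i) > minquality and quals[i] > minquality), None)
    if start is None:
        return None  # Step 3: no 5' position found, discard the read.

    # Step 2: slide right until the mean is <= minquality, or until the
    # window reaches the 3' end of the read.
    pos = start
    while mean(pos) > minquality and pos + w < n:
        pos += 1

    # New 3' end: last base in that window with quality >= minquality.
    end = pos
    for i in range(pos + w - 1, pos - 1, -1):
        if quals[i] >= minquality:
            end = i + 1
            break
    return start, end
```

For clarity the sketch recomputes each window mean from scratch; keeping a running sum as the window slides would avoid the repeated work.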
Exit status
-----------
AdapterRemoval exits with status 0 if the program ran successfully, and with a non-zero exit code if any errors were encountered. Do not use the output from AdapterRemoval if the program returned a non-zero exit code!
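When AdapterRemoval is run from a pipeline, its exit status can be checked before any output is used. The following is a minimal Python sketch using the standard ``subprocess`` module; the ``run_checked`` helper is hypothetical.

```python
import subprocess
from typing import Sequence


def run_checked(command: Sequence[str]) -> None:
    """Run a command, raising RuntimeError on a non-zero exit status."""
    result = subprocess.run(list(command))
    if result.returncode != 0:
        # Output files from a failed run may be incomplete; do not use them.
        raise RuntimeError(f"{command[0]} exited with status {result.returncode}")
```

For example, ``run_checked(["AdapterRemoval", "--file1", "input_1.fastq", "--file2", "input_2.fastq", "--basename", "sample"])`` stops a pipeline before any output from a failed run is used; the filenames are placeholders.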
Reporting bugs
--------------
Please report any bugs using the AdapterRemoval issue-tracker:
https://github.com/MikkelSchubert/adapterremoval/issues
License
-------
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Miscellaneous
=============
Window-based quality trimming
-----------------------------
As of v2.2.2, AdapterRemoval implements a sliding-window approach to quality-based trimming, inspired by `sickle`_. If ``window_size`` is greater than or equal to 1, that number is used as the window size for all reads. If ``window_size`` is a number greater than or equal to 0 and less than 1, then that number is multiplied by the length of individual reads to determine the window size. If the window length is zero or is greater than the current read length, then the read length is used instead.
Reads are trimmed as follows for a given window size:
1. The new 5' end is determined by locating the first window where both the average quality and the quality of the first base in the window are greater than ``--minquality``.
2. The new 3' end is located by sliding the first window to the right until the average quality becomes less than or equal to ``--minquality``. The new 3' end is placed at the last base in that window where the quality is greater than or equal to ``--minquality``.
3. If no 5' position could be determined, the read is discarded.
Migrating from AdapterRemoval v1.x
----------------------------------
Command-line options mostly behave the same between AdapterRemoval v1 and AdapterRemoval v2, and scripts written with AdapterRemoval v1.x in mind should work with AdapterRemoval v2.x. Notable exceptions are the ``--pcr1`` and ``--pcr2`` options, which have been replaced by the ``--adapter1`` and ``--adapter2`` options described above. While the ``--pcr`` options are still supported for backwards compatibility, they should not be used going forward.
The difference between ``--adapter2`` and ``--pcr2`` is that ``--adapter2`` expects the mate 2 adapter sequence to be specified in the read orientation as described above, while ``--pcr2`` expects the sequence in the same orientation as the mate 1 sequence, that is, the reverse complement of the sequence observed in the mate 2 reads.
Using the common 13 bp Illumina adapter sequence (AGATCGGAAGAGC) as an example, this is how the options would be used in AdapterRemoval v2.x::

    AdapterRemoval --adapter1 AGATCGGAAGAGC --adapter2 AGATCGGAAGAGC ...

And in AdapterRemoval v1.x::

    AdapterRemoval --pcr1 AGATCGGAAGAGC --pcr2 GCTCTTCCGATCT ...

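Converting between the two conventions is a reverse complement. A minimal Python sketch with hypothetical helper names; IUPAC ambiguity codes other than N are not handled.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T and N only)."""
    return sequence.translate(COMPLEMENT)[::-1]


def pcr2_from_adapter2(adapter2: str) -> str:
    """v1.x --pcr2 value equivalent to a v2.x --adapter2 value."""
    return reverse_complement(adapter2)
```

Applied to the 13 bp Illumina adapter, ``pcr2_from_adapter2("AGATCGGAAGAGC")`` yields ``GCTCTTCCGATCT``, the mate 2 value used with v1.x.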
.. _sickle: https://github.com/najoshi/sickle