Commits on Source (24)
......@@ -428,6 +428,8 @@ SPAdes assembly:
--depth_filter DEPTH_FILTER Filter out contigs lower than this fraction of the chromosomal
depth, if doing so does not result in graph dead ends (default:
0.25)
+--largest_component Only keep the largest connected component of the assembly graph
+(default: keep all connected components)
--spades_tmp_dir SPADES_TMP_DIR
Specify SPAdes temporary directory using the SPAdes --tmp-dir
option (default: make a temporary directory in the output
......
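The `--depth_filter` help text above describes a relative depth cutoff. The following is a minimal sketch of that idea, not Unicycler's implementation: the helper names and the `(length, depth)` tuple layout are hypothetical, a plain median stands in for whatever `get_median_read_depth` really computes, and the dead-end check mentioned in the help text is left out.

```python
from statistics import median


def depth_cutoff(contigs, relative_depth_cutoff=0.25, top_n=10):
    """Return the absolute depth cutoff for a list of (length, depth) tuples.

    Assumption: the reference depth is the plain median depth of the
    top_n longest contigs; the real tool may weight this differently.
    """
    longest = sorted(contigs, key=lambda c: c[0], reverse=True)[:top_n]
    return median(depth for _, depth in longest) * relative_depth_cutoff


def low_depth_contigs(contigs, relative_depth_cutoff=0.25):
    """List the contigs whose depth falls below the cutoff.

    The dead-end condition from the help text is deliberately ignored here.
    """
    cutoff = depth_cutoff(contigs, relative_depth_cutoff)
    return [c for c in contigs if c[1] < cutoff]


if __name__ == "__main__":
    # Hypothetical contigs as (length in bp, read depth).
    example = [(5000, 40.0), (3000, 42.0), (200, 5.0), (150, 38.0)]
    print(depth_cutoff(example))       # 9.75
    print(low_depth_contigs(example))  # [(200, 5.0)]
```

With the default fraction of 0.25, only the contig at depth 5.0 falls below the cutoff of 9.75 in this made-up data set.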
unicycler (0.4.8+dfsg-1~bpo9+1) stretch-backports-sloppy; urgency=medium
* Rebuild for stretch-backports-sloppy.
-- Andreas Tille <tille@debian.org> Wed, 27 Nov 2019 15:44:46 +0100
unicycler (0.4.8+dfsg-1) unstable; urgency=medium
[ Michael R. Crusoe ]
* Inherit and use LDFLAGS and CPPFLAGS
* Mark unicycler-data as Multi-Arch: foreign, as recommended by the
Multiarch hinter.
[ Andreas Tille ]
* New upstream version
* debhelper-compat 12
* Standards-Version: 4.4.1
* Set upstream metadata fields: Repository.
* (Build-)Depends: bcftools, miniasm
* Versioned (Build-)Depends of Python3 enabled spades
-- Andreas Tille <tille@debian.org> Mon, 18 Nov 2019 16:48:24 +0100
unicycler (0.4.7+dfsg-2) unstable; urgency=medium
[ Andreas Tille ]
* Add manpages
[ Liubov Chuprikova ]
* Add autopkgtest
* Split data files and docs in unicycler-data
-- Liubov Chuprikova <chuprikovalv@gmail.com> Wed, 24 Oct 2018 09:04:28 +0200
unicycler (0.4.7+dfsg-1~bpo9+1) stretch-backports; urgency=medium
* Rebuild for stretch-backports.
......
......@@ -4,20 +4,22 @@ Uploaders: Andreas Tille <tille@debian.org>,
Liubov Chuprikova <chuprikovalv@gmail.com>
Section: science
Priority: optional
-Build-Depends: debhelper (>= 11~),
+Build-Depends: debhelper-compat (= 12),
dh-python,
python3-all,
python3-setuptools,
default-jdk,
+bcftools,
bowtie2,
+miniasm,
ncbi-blast+,
pilon,
racon,
samtools,
-spades,
+spades (>= 3.13.1),
libseqan2-dev,
zlib1g-dev
-Standards-Version: 4.2.1
+Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/unicycler
Vcs-Git: https://salsa.debian.org/med-team/unicycler.git
Homepage: https://github.com/rrwick/Unicycler
......@@ -29,12 +31,14 @@ Depends: ${python3:Depends},
${misc:Depends},
python3-setuptools,
default-jre,
+bcftools,
bowtie2,
+miniasm,
ncbi-blast+,
pilon,
racon,
samtools,
-spades
+spades (>= 3.13.1)
Recommends: unicycler-data
Description: hybrid assembly pipeline for bacterial genomes
Unicycler is an assembly pipeline for bacterial genomes. It can assemble
......@@ -45,6 +49,7 @@ Description: hybrid assembly pipeline for bacterial genomes
Package: unicycler-data
Architecture: all
+Multi-Arch: foreign
Depends: ${misc:Depends}
Description: hybrid assembly pipeline for bacterial genomes (data package)
Unicycler is an assembly pipeline for bacterial genomes. It can assemble
......
#!/bin/sh
MANDIR=debian/mans
mkdir -p $MANDIR
VERSION=`dpkg-parsechangelog | awk '/^Version:/ {print $2}' | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'`
NAME=`grep "^Description:" debian/control | sed 's/^Description: *//' | head -n1`
PROGNAME=`grep "^Package:" debian/control | sed 's/^Package: *//' | head -n1`
AUTHOR=".SH AUTHOR\nThis manpage was written by $DEBFULLNAME for the Debian distribution and
can be used for any other usage of the program.
"
# If program name is different from package name or title should be
# different from package short description change this here
progname=${PROGNAME}
help2man --no-info --no-discard-stderr \
--name="assembly pipeline for bacterial genomes" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
progname=unicycler_align
help2man --no-info --no-discard-stderr \
--name="sensitive semi-global long read aligner" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
progname=unicycler_check
help2man --no-info --no-discard-stderr \
--name="long read assembly checker" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
progname=unicycler_polish
help2man --no-info --no-discard-stderr \
--name="Unicycler polish - hybrid assembly polishing" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
progname=unicycler_scrub
help2man --no-info --no-discard-stderr \
--name="read trimming, chimera detection and misassembly detection" \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
echo "$MANDIR/*.1" > debian/manpages
cat <<EOT
Please enhance the help2man output.
The following web page might be helpful in doing so:
http://liw.fi/manpages/
EOT
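The `VERSION=` line in the script above reduces a Debian version string to the upstream version with three `sed` substitutions. As an illustration only, the same three steps can be written in Python; the function name `upstream_version` is hypothetical and not part of the packaging.

```python
import re


def upstream_version(debian_version):
    """Mirror the three sed expressions used in the script above."""
    # s/^[0-9]*://  : drop an optional epoch such as "1:"
    version = re.sub(r"^[0-9]*:", "", debian_version, count=1)
    # s/-.*//       : drop everything from the first hyphen (the Debian revision)
    version = re.sub(r"-.*", "", version, count=1)
    # s/[+~]dfsg$// : drop a trailing "+dfsg" or "~dfsg" repack suffix
    version = re.sub(r"[+~]dfsg$", "", version, count=1)
    return version


if __name__ == "__main__":
    print(upstream_version("0.4.8+dfsg-1~bpo9+1"))  # 0.4.8
    print(upstream_version("1:2.0~dfsg-3"))         # 2.0
```

The first call uses the version from the changelog in this diff; the second is a made-up version that exercises the epoch and `~dfsg` cases.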
debian/mans/*.1
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH UNICYCLER "1" "October 2018" "unicycler 0.4.7" "User Commands"
.SH NAME
unicycler \- assembly pipeline for bacterial genomes
.SH SYNOPSIS
.B unicycler
[\-h] [\-\-help_all] [\-\-version] [\-1 SHORT1] [\-2 SHORT2]
[\-s UNPAIRED] [\-l LONG] \fB\-o\fR OUT [\-\-verbosity VERBOSITY]
[\-\-min_fasta_length MIN_FASTA_LENGTH] [\-\-keep KEEP]
[\-t THREADS] [\-\-mode {conservative,normal,bold}]
[\-\-linear_seqs LINEAR_SEQS] [\-\-vcf]
.SH DESCRIPTION
Unicycler is an assembly pipeline for bacterial genomes. It can assemble
Illumina-only read sets where it functions as a SPAdes-optimiser. It can
also assemble long-read-only sets (PacBio or Nanopore) where it runs a
miniasm+Racon pipeline. For the best possible assemblies, give it both
Illumina reads and long reads, and it will conduct a hybrid assembly.
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
Show this help message and exit
.TP
\fB\-\-help_all\fR
Show a help message with all program options
.TP
\fB\-\-version\fR
Show Unicycler's version number
.SS Input
.TP
\fB\-1\fR SHORT1, \fB\-\-short1\fR SHORT1
FASTQ file of first short reads in each pair
(required)
.TP
\fB\-2\fR SHORT2, \fB\-\-short2\fR SHORT2
FASTQ file of second short reads in each pair
(required)
.TP
\fB\-s\fR UNPAIRED, \fB\-\-unpaired\fR UNPAIRED
FASTQ file of unpaired short reads (optional)
.TP
\fB\-l\fR LONG, \fB\-\-long\fR LONG
FASTQ or FASTA file of long reads (optional)
.SS Output
.TP
\fB\-o\fR OUT, \fB\-\-out\fR OUT
Output directory (required)
.TP
\fB\-\-verbosity\fR VERBOSITY
Level of stdout and log file information (default: 1)
.IP
0 = no stdout,
.IP
1 = basic progress indicators,
.IP
2 = extra info,
.IP
3 = debugging info
.TP
\fB\-\-min_fasta_length\fR MIN_FASTA_LENGTH
Exclude contigs from the FASTA file which are
shorter than this length (default: 100)
.TP
\fB\-\-keep\fR KEEP
Level of file retention (default: 1)
.IP
0 = only keep final files: assembly (FASTA,GFA and log),
.IP
1 = also save graphs at main checkpoints,
.IP
2 = also keep SAM (enables fast rerun in different mode),
.IP
3 = keep all temp files and save all graphs (for debugging)
.TP
\fB\-\-vcf\fR
Produce a VCF by mapping the short reads to the
final assembly (experimental, default: do not
produce a vcf file)
.SS Other
.TP
\fB\-t\fR THREADS, \fB\-\-threads\fR THREADS
Number of threads used (default: 4)
.TP
\fB\-\-mode\fR {conservative,normal,bold}
Bridging mode (default: normal)
.IP
conservative = smaller contigs, lowest misassembly rate
.IP
normal = moderate contig size and misassembly rate
.IP
bold = longest contigs, higher misassembly rate
.TP
\fB\-\-linear_seqs\fR LINEAR_SEQS
The expected number of linear (i.e. non\-circular)
sequences in the underlying sequence (default: 0)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH UNICYCLER_ALIGN "1" "October 2018" "unicycler_align 0.4.7" "User Commands"
.SH NAME
unicycler_align \- sensitive semi-global long read aligner
.SH SYNOPSIS
.B unicycler_align
[\-h] \fB\-\-ref\fR REF \fB\-\-reads\fR READS \fB\-\-sam\fR SAM
[\-\-contamination CONTAMINATION] [\-\-scores SCORES]
[\-\-low_score LOW_SCORE] [\-\-keep_bad]
[\-\-sensitivity SENSITIVITY] [\-\-threads THREADS]
[\-\-verbosity VERBOSITY] [\-\-min_len MIN_LEN]
[\-\-allowed_overlap ALLOWED_OVERLAP]
.SH DESCRIPTION
Unicycler align \- a sensitive semi\-global long read aligner
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-\-ref\fR REF
FASTA file containing one or more reference
sequences
.TP
\fB\-\-reads\fR READS
FASTQ or FASTA file of long reads
.TP
\fB\-\-sam\fR SAM
SAM file of resulting alignments
.TP
\fB\-\-contamination\fR CONTAMINATION
FASTA file of known contamination in long reads
.TP
\fB\-\-scores\fR SCORES
Comma\-delimited string of alignment scores: match,
mismatch, gap open, gap extend (default: 3,\-6,\-5,\-2)
.TP
\fB\-\-low_score\fR LOW_SCORE
Score threshold \- alignments below this are
considered poor (default: set threshold
automatically)
.TP
\fB\-\-keep_bad\fR
Include alignments in the results even if they are
below the low score threshold (default: low\-scoring
alignments are discarded)
.TP
\fB\-\-sensitivity\fR SENSITIVITY
A number from 0 (least sensitive) to 3 (most
sensitive) (default: 0)
.TP
\fB\-\-threads\fR THREADS
Number of threads used (default: number of CPUs, up
to 8)
.TP
\fB\-\-verbosity\fR VERBOSITY
Level of stdout information (0 to 4) (default: 1)
.TP
\fB\-\-min_len\fR MIN_LEN
Minimum alignment length (bp) \- exclude alignments
shorter than this length (default: 100)
.TP
\fB\-\-allowed_overlap\fR ALLOWED_OVERLAP
Allow this much overlap between alignments in a
single read (default: 100)
.SH SEE ALSO
unicycler(1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH UNICYCLER_CHECK "1" "October 2018" "unicycler_check 0.4.7" "User Commands"
.SH NAME
unicycler_check \- long read assembly checker
.SH SYNOPSIS
.B unicycler_check
[\-h] \fB\-\-sam\fR SAM \fB\-\-ref\fR REF \fB\-\-reads\fR READS
[\-\-min_len MIN_LEN]
[\-\-error_window_size ERROR_WINDOW_SIZE]
[\-\-depth_window_size DEPTH_WINDOW_SIZE]
[\-\-error_rate_threshold ERROR_RATE_THRESHOLD]
[\-\-depth_p_val DEPTH_P_VAL]
[\-\-window_tables WINDOW_TABLES]
[\-\-base_tables BASE_TABLES] [\-\-html HTML]
[\-\-threads THREADS] [\-\-verbosity VERBOSITY]
.SH DESCRIPTION
Long read assembly checker
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-\-sam\fR SAM
Input SAM file of alignments (if this file doesn't
exist, the alignment will be performed with results
saved to this file \- you can use the aligner
arguments with this script)
.TP
\fB\-\-ref\fR REF
FASTA file containing one or more reference
sequences
.TP
\fB\-\-reads\fR READS
FASTQ file of long reads
.TP
\fB\-\-min_len\fR MIN_LEN
Minimum alignment length (bp) \- exclude alignments
shorter than this length (default: 100)
.TP
\fB\-\-error_window_size\fR ERROR_WINDOW_SIZE
Window size for error summaries (default: 100)
.TP
\fB\-\-depth_window_size\fR DEPTH_WINDOW_SIZE
Window size for depth summaries (default: 100)
.TP
\fB\-\-error_rate_threshold\fR ERROR_RATE_THRESHOLD
Threshold for high error rates, expressed as the
fraction between the mean error rate and the random
alignment error rate (default: 0.3)
.TP
\fB\-\-depth_p_val\fR DEPTH_P_VAL
P\-value for low/high depth thresholds (default:
0.001)
.TP
\fB\-\-window_tables\fR WINDOW_TABLES
Path and/or prefix for table files summarising
reference errors for reference windows (default: do
not save window tables)
.TP
\fB\-\-base_tables\fR BASE_TABLES
Path and/or prefix for table files summarising
reference errors at each base (default: do not save
base tables)
.TP
\fB\-\-html\fR HTML
Path for HTML report (default: do not save HTML
report)
.TP
\fB\-\-threads\fR THREADS
Number of CPU threads used to align (default: the
number of available CPUs)
.TP
\fB\-\-verbosity\fR VERBOSITY
Level of stdout information (0 to 2) (default: 1)
.SH SEE ALSO
unicycler(1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH UNICYCLER_POLISH "1" "October 2018" "unicycler_polish 0.4.7" "User Commands"
.SH NAME
unicycler_polish \- Unicycler polish - hybrid assembly polishing
.SH SYNOPSIS
.B unicycler_polish
[\-h] \fB\-a\fR ASSEMBLY [\-1 SHORT1] [\-2 SHORT2]
[\-\-pb_bax PB_BAX [PB_BAX ...]] [\-\-pb_bam PB_BAM]
[\-\-pb_fasta PB_FASTA] [\-\-long_reads LONG_READS]
[\-\-no_fix_local] [\-\-min_insert MIN_INSERT]
[\-\-max_insert MAX_INSERT]
[\-\-min_align_length MIN_ALIGN_LENGTH]
[\-\-homopolymer HOMOPOLYMER] [\-\-large LARGE]
[\-\-illumina_alt ILLUMINA_ALT]
[\-\-freebayes_qual_cutoff FREEBAYES_QUAL_CUTOFF]
[\-\-threads THREADS] [\-\-verbosity VERBOSITY]
[\-\-samtools SAMTOOLS] [\-\-bowtie2 BOWTIE2]
[\-\-minimap2 MINIMAP2] [\-\-freebayes FREEBAYES]
[\-\-pitchfork PITCHFORK] [\-\-bax2bam BAX2BAM]
[\-\-pbalign PBALIGN] [\-\-arrow ARROW] [\-\-pilon PILON]
[\-\-java JAVA] [\-\-ale ALE] [\-\-racon RACON]
[\-\-minimap MINIMAP] [\-\-nucmer NUCMER]
[\-\-showsnps SHOWSNPS]
.SH DESCRIPTION
Unicycler polish \- hybrid assembly polishing
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.SS Assembly
.TP
\fB\-a\fR ASSEMBLY, \fB\-\-assembly\fR ASSEMBLY
Input assembly to be polished
.SS Short reads
.IP
To polish with short reads (using Pilon), provide two FASTQ files of
paired\-end reads
.TP
\fB\-1\fR SHORT1, \fB\-\-short1\fR SHORT1
FASTQ file of short reads (first reads in each pair)
.TP
\fB\-2\fR SHORT2, \fB\-\-short2\fR SHORT2
FASTQ file of short reads (second reads in each
pair)
.SS PacBio reads
.IP
To polish with PacBio reads (using Arrow), provide one of the following
.TP
\fB\-\-pb_bax\fR PB_BAX [PB_BAX ...]
PacBio raw bax.h5 read files
.TP
\fB\-\-pb_bam\fR PB_BAM
PacBio BAM read file
.TP
\fB\-\-pb_fasta\fR PB_FASTA
FASTA file of PacBio reads
.SS Generic long reads
.IP
To polish with generic long reads, provide the following
.TP
\fB\-\-long_reads\fR LONG_READS
FASTQ/FASTA file of long reads
.SS Polishing settings
Various settings for polishing behaviour (defaults should work well in
most cases)
.TP
\fB\-\-no_fix_local\fR
do not fix local misassemblies (default: False)
.TP
\fB\-\-min_insert\fR MIN_INSERT
minimum valid short read insert size (default: auto)
.TP
\fB\-\-max_insert\fR MAX_INSERT
maximum valid short read insert size (default: auto)
.TP
\fB\-\-min_align_length\fR MIN_ALIGN_LENGTH
Minimum long read alignment length (default: 1000)
.TP
\fB\-\-homopolymer\fR HOMOPOLYMER
Long read polish changes to a homopolymer of this
length or greater will be ignored (default: 4)
.TP
\fB\-\-large\fR LARGE
Variants of this size or greater will be assessed as
large variants (default: 10)
.TP
\fB\-\-illumina_alt\fR ILLUMINA_ALT
When assessing long read changes with short read
alignments, a variant will only be applied if the
alternative occurrences in the short read alignments
exceed this percentage (default: 5)
.TP
\fB\-\-freebayes_qual_cutoff\fR FREEBAYES_QUAL_CUTOFF
Reject Pilon substitutions from long reads if the
FreeBayes quality is less than this value (default:
10.0)
.SS Other settings
.TP
\fB\-\-threads\fR THREADS
CPU threads to use in alignment and consensus
(default: number of CPUs)
.TP
\fB\-\-verbosity\fR VERBOSITY
Level of stdout information (0 to 3, default: 2)
0 = no stdout, 1 = basic progress indicators,
2 = extra info, 3 = debugging info
.SS Tool locations
If these required tools are not available in your PATH variable, specify
their location here (depending on which input reads are used, some of
these tools may not be required)
.TP
\fB\-\-samtools\fR SAMTOOLS
path to samtools executable (default: samtools)
.TP
\fB\-\-bowtie2\fR BOWTIE2
path to bowtie2 executable (default: bowtie2)
.TP
\fB\-\-minimap2\fR MINIMAP2
path to minimap2 executable (default: minimap2)
.TP
\fB\-\-freebayes\fR FREEBAYES
path to freebayes executable (default: freebayes)
.TP
\fB\-\-pitchfork\fR PITCHFORK
Path to Pitchfork installation of PacBio tools
(should contain bin and lib directories) (default: )
.TP
\fB\-\-bax2bam\fR BAX2BAM
path to bax2bam executable (default: bax2bam)
.TP
\fB\-\-pbalign\fR PBALIGN
path to pbalign executable (default: pbalign)
.TP
\fB\-\-arrow\fR ARROW
path to arrow executable (default: arrow)
.TP
\fB\-\-pilon\fR PILON
path to pilon jar file (default: pilon*.jar)
.TP
\fB\-\-java\fR JAVA
path to java executable (default: java)
.TP
\fB\-\-ale\fR ALE
path to ALE executable (default: ALE)
.TP
\fB\-\-racon\fR RACON
path to racon executable (default: racon)
.TP
\fB\-\-minimap\fR MINIMAP
path to minimap executable (default: minimap)
.TP
\fB\-\-nucmer\fR NUCMER
path to nucmer executable (default: nucmer)
.TP
\fB\-\-showsnps\fR SHOWSNPS
path to show\-snps executable (default: show\-snps)
.SH SEE ALSO
unicycler(1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH UNICYCLER_SCRUB "1" "October 2018" "unicycler_scrub 0.4.7" "User Commands"
.SH NAME
unicycler_scrub \- read trimming, chimera detection and misassembly detection
.SH SYNOPSIS
.B unicycler_scrub
[\-h] \fB\-i\fR INPUT \fB\-o\fR OUT [\-r READS] [\-\-trim TRIM]
[\-\-split SPLIT] [\-\-min_split_size MIN_SPLIT_SIZE]
[\-\-discard_chimeras] [\-t THREADS] [\-\-keep_paf]
[\-\-parameters PARAMETERS] [\-\-verbosity VERBOSITY]
.SH DESCRIPTION
Unicycler\-scrub \- read trimming, chimera detection and misassembly detection
.SH OPTIONS
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-i\fR INPUT, \fB\-\-input\fR INPUT
These are the reads or assembly to be scrubbed (can
be FASTA or FASTQ format)
.TP
\fB\-o\fR OUT, \fB\-\-out\fR OUT
The scrubbed reads or assembly will be saved to this
file (will have the same format as the \fB\-\-input\fR file
format) or use "none" to not produce an output file
.TP
\fB\-r\fR READS, \fB\-\-reads\fR READS
These are the reads used to scrub \fB\-\-input\fR (can be
FASTA or FASTQ format) (default: same file as
\fB\-\-input\fR)
.TP
\fB\-\-trim\fR TRIM
The aggressiveness with which the input will be
trimmed (0 to 100, where 0 is no trimming and 100 is
very aggressive trimming) (default: 50)
.TP
\fB\-\-split\fR SPLIT
The aggressiveness with which the input will be
split (0 to 100, where 0 is no splitting and 100 is
very aggressive splitting) (default: 50)
.TP
\fB\-\-min_split_size\fR MIN_SPLIT_SIZE
Parts of split sequences will only be outputted if
they are at least this big (default: 1000)
.TP
\fB\-\-discard_chimeras\fR
If used, chimeric sequences will be discarded
instead of split (default: False)
.TP
\fB\-t\fR THREADS, \fB\-\-threads\fR THREADS
Number of threads used (default: 4)
.TP
\fB\-\-keep_paf\fR
Save the alignments to file (makes repeated runs
faster because alignments can be loaded from file)
(default: False)
.TP
\fB\-\-parameters\fR PARAMETERS
Low\-level parameters (for debugging use only)
(default: )
.TP
\fB\-\-verbosity\fR VERBOSITY
Level of stdout information (default: 1)
.IP
0 = no stdout,
.IP
1 = basic progress indicators,
.IP
2 = extra info,
.IP
3 = debugging info
.SH SEE ALSO
unicycler(1)
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
From: Michael R. Crusoe <michael.crusoe@gmail.com>
Subject: Inherit and use LDFLAGS and CPPFLAGS
--- unicycler.orig/Makefile
+++ unicycler/Makefile
@@ -66,7 +66,7 @@
# These flags are required for the build to work.
FLAGS = -std=c++14 -Iunicycler/include -fPIC
-LDFLAGS = -shared -lz
+LDFLAGS += -shared -lz
# Platform-specific stuff (for Seqan)
@@ -115,4 +115,4 @@
$(RM) $(TARGET)
%.o: %.cpp $(HEADERS)
- $(CXX) $(FLAGS) $(CXXFLAGS) -c -o $@ $<
+ $(CXX) $(CPPFLAGS) $(FLAGS) $(CXXFLAGS) -c -o $@ $<
spades.patch
# bowtie.patch
install_wo_extra_steps.patch
append_flags
Tests: run-unit-test
Depends: @
Depends: @, @builddeps@
Restrictions: allow-stderr
#!/bin/bash
set -e
pkg=#PACKAGENAME#
pkg=unicycler
if [ "$AUTOPKGTEST_TMP" = "" ] ; then
AUTOPKGTEST_TMP=`mktemp -d /tmp/${pkg}-test.XXXXXX`
trap "rm -rf $AUTOPKGTEST_TMP" 0 INT QUIT ABRT PIPE TERM
fi
cp -a /usr/share/doc/${pkg}/examples/* $AUTOPKGTEST_TMP
if [ -d /usr/share/${pkg}-data/sample_data ] ; then
cp -a /usr/share/${pkg}-data/sample_data/* $AUTOPKGTEST_TMP
else
echo "Please install package unicycler-data to run this script"
exit 1
fi
cd $AUTOPKGTEST_TMP
#do_stuff_to_test_package#
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -o illumina_assembly
#unicycler -l long_reads_high_depth.fastq.gz -o long_read_assembly
# This command fails with the following error:
# Assembling contigs and long reads with miniasm
# ...
# Assembling reads with miniasm... empty result
# Error: miniasm assembly failed
# It might be that the reads have not enough depth. See issue:
# https://github.com/rrwick/Unicycler/issues/38
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads_low_depth.fastq.gz -o hybrid_assembly
sample_data/README.md
sample_data/download_links
sample_data /usr/share/unicycler-data
sample_data/reference.fasta /usr/share/unicycler-data/sample_data
sample_data/*.fastq.gz /usr/share/unicycler-data/sample_data
debian/tests/run-unit-test
debian/README.test
Reference:
-- Author: Ryan R. Wick and Louise M. Judd and Claire L. Gorrie and Kathryn E. Holt
-Title: "Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads"
+- Author: >
+Ryan R. Wick and Louise M. Judd and Claire L. Gorrie and Kathryn
+E. Holt
+Title: >
+Unicycler: Resolving bacterial genome assemblies from short and long
+sequencing reads
Journal: PLOS Computational Biology
Year: 2017
Volume: 13
......@@ -8,10 +12,12 @@ Reference:
Pages: e1005595
DOI: 10.1371/journal.pcbi.1005595
PMID: 28594827
-URL: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005595
-eprint: http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005595&type=printable
-- Author: Ryan R. Wick and Louise M. Judd and Claire L. Gorrie and Kathryn E. Holt
-Title: Completing bacterial genome assemblies with multiplex MinION sequencing
+URL: "http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005595"
+eprint: "http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005595&type=printable"
+- Author: >
+Ryan R. Wick and Louise M. Judd and Claire L. Gorrie and Kathryn E. Holt
+Title: >
+Completing bacterial genome assemblies with multiplex MinION sequencing
Journal: Microbial Genomics
Year: 2017
Volume: 3
......@@ -19,5 +25,15 @@ Reference:
Pages: e000132
DOI: 10.1099/mgen.0.000132
PMID: 29177090
-URL: http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000132
-eprint: http://mgen.microbiologyresearch.org/deliver/fulltext/mgen/3/10/mgen000132.pdf?itemId=/content/journal/mgen/10.1099/mgen.0.000132&mimeType=pdf&isFastTrackArticle=
+URL: "http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000132"
+eprint: "http://mgen.microbiologyresearch.org/deliver/fulltext/mgen/3/10/mgen000132.pdf?itemId=/content/journal/mgen/10.1099/mgen.0.000132&mimeType=pdf&isFastTrackArticle="
+Registry:
+- Name: OMICtools
+Entry: OMICS_14591
+- Name: bio.tools
+Entry: unicycler
+- Name: conda:bioconda
+Entry: unicycler
+- Name: SciCrunch
+Entry: NA
+Repository: https://github.com/rrwick/Unicycler.git
......@@ -423,6 +423,7 @@ class AssemblyGraph(object):
3) deleting the segment would not create any dead ends
"""
segment_nums_to_remove = []
+total_length_removed = 0
ten_longest_contigs = sorted(self.segments.values(), reverse=True,
key=lambda x: x.get_length())[:10]
whole_graph_cutoff = self.get_median_read_depth(ten_longest_contigs) * relative_depth_cutoff
......@@ -437,7 +438,9 @@ class AssemblyGraph(object):
self.all_segments_below_depth(component, whole_graph_cutoff) or \
self.dead_end_change_if_deleted(seg_num) <= 0:
segment_nums_to_remove.append(seg_num)
+total_length_removed += segment.get_length()
self.remove_segments(segment_nums_to_remove)
+return len(segment_nums_to_remove), total_length_removed
def filter_homopolymer_loops(self):
"""
......@@ -455,6 +458,28 @@ class AssemblyGraph(object):
log.log('Removed homopolymer loops:', 3)
log.log_number_list(segment_nums_to_remove, 3)
+    def choose_largest_component(self):
+        """
+        Special logic: throw out all of the graph's connected components except for the largest one.
+        """
+        largest_component_length = None
+        connected_components = self.get_connected_components()
+        for component_nums in connected_components:
+            component_segments = [self.segments[x] for x in component_nums]
+            component_length = sum(x.get_length() for x in component_segments)
+            if largest_component_length is None or component_length > largest_component_length:
+                largest_component_length = component_length
+        segment_nums_to_remove = []
+        for component_nums in connected_components:
+            component_segments = [self.segments[x] for x in component_nums]
+            component_length = sum(x.get_length() for x in component_segments)
+            if component_length < largest_component_length:
+                segment_nums_to_remove += component_nums
+        self.remove_segments(segment_nums_to_remove)
+        if segment_nums_to_remove:
+            log.log('\nRemoved not-largest components:', 3)
+            log.log_number_list(segment_nums_to_remove, 3)
def remove_segments(self, nums_to_remove):
"""
This function deletes all segments in the nums_to_remove list, along with their links. It
......@@ -923,7 +948,7 @@ class AssemblyGraph(object):
dead_ends += 1
return potential_dead_ends - dead_ends
-def clean(self, read_depth_filter):
+def clean(self, read_depth_filter, largest_component):
"""
This function does various graph repairs, filters and normalisations to make it a bit
nicer.
......@@ -931,9 +956,12 @@ class AssemblyGraph(object):
log.log('Repair multi way junctions ' + get_dim_timestamp(), 3)
self.repair_multi_way_junctions()
log.log('Filter by read depth ' + get_dim_timestamp(), 3)
-self.filter_by_read_depth(read_depth_filter)
+removed_count, removed_length = self.filter_by_read_depth(read_depth_filter)
log.log('Filter homopolymer loops ' + get_dim_timestamp(), 3)
self.filter_homopolymer_loops()
+if largest_component:
+    log.log('Keep largest component ' + get_dim_timestamp(), 3)
+    self.choose_largest_component()
log.log('Merge all possible ' + get_dim_timestamp(), 3)
self.merge_all_possible(None, 2)
log.log('Normalise read depths ' + get_dim_timestamp(), 3)
......@@ -943,6 +971,7 @@ class AssemblyGraph(object):
log.log('Sort link order ' + get_dim_timestamp(), 3)
self.sort_link_order()
log.log('Graph cleaning finished ' + get_dim_timestamp(), 3)
+return removed_count, removed_length
def final_clean(self):
"""
......
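The `choose_largest_component` method added in this diff keeps only the connected component with the greatest total segment length. Below is a self-contained sketch of the same idea on a toy graph. It is an illustration under stated assumptions, not the upstream code: the function names, the `segment_lengths` dict and the `links` list of undirected pairs are all hypothetical stand-ins for the `AssemblyGraph` internals.

```python
from collections import defaultdict, deque


def connected_components(segment_lengths, links):
    """Group segment numbers into connected components with a breadth-first search."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in sorted(segment_lengths):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


def segments_outside_largest(segment_lengths, links):
    """Return the segment numbers that are not in a largest component.

    As in the diff, components that tie for the largest total length are all kept.
    """
    components = connected_components(segment_lengths, links)
    totals = [sum(segment_lengths[s] for s in comp) for comp in components]
    largest = max(totals, default=0)
    to_remove = []
    for comp, total in zip(components, totals):
        if total < largest:
            to_remove += comp
    return to_remove


if __name__ == "__main__":
    # Hypothetical graph: segments 1-2 form one component, 3-4 another.
    lengths = {1: 5000, 2: 3000, 3: 400, 4: 250}
    print(segments_outside_largest(lengths, [(1, 2), (3, 4)]))  # [3, 4]
```

Unlike the method in the diff, this sketch computes each component's total length once and reuses it in the second pass; the selection rule (strictly smaller than the maximum is removed) is the same.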