Commits on Source (14)
......@@ -19,6 +19,8 @@ Alternatively, you can also build the latest unreleased from github:
cd canu/src
make -j <number of threads>
The unreleased tip has not undergone the same testing as a release and so may have unknown bugs or issues generating sub-optimal assemblies. We recommend the release version for most users.
## Learn:
The [quick start](http://canu.readthedocs.io/en/latest/quick-start.html) will get you assembling quickly, while the [tutorial](http://canu.readthedocs.io/en/latest/tutorial.html) explains things in more detail.
......
......@@ -35,6 +35,7 @@ $stoppingCommits{"1ef335952342ef06ad1651a888f09c312f54dab8"} = 1; # 18 MAY 20
$stoppingCommits{"bbbdcd063560e5f86006ee6b8b96d2d7b80bb750"} = 1; # 21 NOV 2016
$stoppingCommits{"64459fe33f97f6d23fe036ba1395743d0cdd03e4"} = 1; # 17 APR 2017
$stoppingCommits{"9e9bd674b705f89817b07ff30067210c2d180f42"} = 1; # 14 AUG 2017
$stoppingCommits{"0fff8a511fd7d74081d94ff9e0f6c0351650ae2e"} = 1; # 27 FEB 2018 - v1.7
open(F, "< logs") or die "Failed to open 'logs': $!\n";
......
This diff is collapsed.
......@@ -286,6 +286,7 @@ foreach my $file (@filesToProcess) {
next if ($file =~ m/libfalcon/);
next if ($file =~ m/libNDFalcon/);
next if ($file =~ m/libbacktrace/);
next if ($file =~ m/libsnappy/);
next if ($file =~ m/qsort_mt.c$/);
......
#!/bin/bash
selfdir="$(dirname $(realpath ${BASH_SOURCE[0]}))"
exec "${selfdir}/../lib/canu/bin/canu" "$@"
canu (1.7+dfsg-1) UNRELEASED; urgency=medium
* New upstream version 1.7+dfsg
* Update handling of mhap bundled copy
* Refresh patches
* Use a wrapper script for main program
* move gnuplot to Depends from Recommends
* Bump mhap minimum version
* Bump Standards-Version to 4.1.3
-- Afif Elghraoui <afif@debian.org> Sat, 10 Mar 2018 23:44:12 -0500
canu (1.6+dfsg-2) unstable; urgency=medium
* Team upload
......
......@@ -9,8 +9,8 @@ Build-Depends:
libmeryl-dev,
# For File::Path
libfilesys-df-perl,
mhap (>= 2.1)
Standards-Version: 4.1.1
mhap (>= 2.1.3)
Standards-Version: 4.1.3
Homepage: http://canu.readthedocs.org/en/latest/
Vcs-Git: https://anonscm.debian.org/git/debian-med/canu.git
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/canu.git
......@@ -22,8 +22,8 @@ Depends:
${misc:Depends},
${perl:Depends},
libfilesys-df-perl,
mhap (>= 2.1),
Recommends: gnuplot
mhap (>= 2.1.3),
gnuplot,
Suggests:
pbgenomicconsensus,
nanopolish,
......
......@@ -3,7 +3,7 @@ Upstream-Name: canu
Source: https://github.com/marbl/canu
Files-Excluded:
kmer
src/mhap/*.tar
src/mhap/mhap-*.jar
src/utgcns/libboost
Files: *
......
*-*/bin usr
*-*/lib usr
*-*/share usr
*-*/bin usr/lib/canu/
*-*/lib usr/lib/canu/
debian/bin/canu /usr/bin/
/usr/lib/canu/canu /usr/bin/canu
Description: Use Debian-packaged MHAP
Author: Afif Elghraoui <afif@debian.org>
Forwarded: not-needed
--- a/src/main.mk
+++ b/src/main.mk
@@ -193,7 +193,6 @@ SUBMAKEFILES := stores/gatekeeperCreate.
\
overlapInCore/liboverlap/prefixEditDistance-matchLimitGenerate.mk \
\
- mhap/mhap.mk \
mhap/mhapConvert.mk \
\
minimap/mmapConvert.mk \
Description: don't expect bundled MHAP
the jar file has been removed from the Debian source.
Author: Afif Elghraoui <afif@debian.org>
Forwarded: not-needed
Last-Update: 2018-03-10
--- canu.orig/src/Makefile
+++ canu/src/Makefile
@@ -615,7 +615,6 @@
${TARGET_DIR}/bin/canu \
${TARGET_DIR}/bin/trioCanu \
${TARGET_DIR}/bin/canu.defaults \
- ${TARGET_DIR}/share/java/classes/mhap-2.1.3.jar \
${TARGET_DIR}/lib/site_perl/canu/Consensus.pm \
${TARGET_DIR}/lib/site_perl/canu/CorrectReads.pm \
${TARGET_DIR}/lib/site_perl/canu/HaplotypeReads.pm \
......@@ -3,9 +3,9 @@ last-Update: Sat, 02 Sep 2017 15:30:21 +0200
Bug-Debian: https://bugs.debian.org/871390
Description: Fix gcc-7 error (violation of format-security)
--- a/src/merTrim/merTrim.C
+++ b/src/merTrim/merTrim.C
@@ -1782,7 +1782,7 @@ mertrimComputation::dump(char *label) {
--- canu.orig/src/merTrim/merTrim.C
+++ canu/src/merTrim/merTrim.C
@@ -1790,7 +1790,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (ORI)\n");
......@@ -14,7 +14,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1792,7 +1792,7 @@ mertrimComputation::dump(char *label) {
@@ -1800,7 +1800,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (SEQ)\n");
......@@ -23,7 +23,7 @@ Description: Fix gcc-7 error (violation of format-security)
if (corrSeq && verifySeq) {
uint32 i=0;
@@ -1813,7 +1813,7 @@ mertrimComputation::dump(char *label) {
@@ -1821,7 +1821,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (VAL)\n");
......@@ -32,7 +32,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1823,7 +1823,7 @@ mertrimComputation::dump(char *label) {
@@ -1831,7 +1831,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (VAL)\n");
......@@ -41,7 +41,7 @@ Description: Fix gcc-7 error (violation of format-security)
}
logPos = 0;
@@ -1834,7 +1834,7 @@ mertrimComputation::dump(char *label) {
@@ -1842,7 +1842,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (QLT)\n");
......@@ -50,7 +50,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1844,7 +1844,7 @@ mertrimComputation::dump(char *label) {
@@ -1852,7 +1852,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (COVERAGE)\n");
......@@ -59,7 +59,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1854,7 +1854,7 @@ mertrimComputation::dump(char *label) {
@@ -1862,7 +1862,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (CORRECTIONS)\n");
......@@ -68,7 +68,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1864,7 +1864,7 @@ mertrimComputation::dump(char *label) {
@@ -1872,7 +1872,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (DISCONNECTION)\n");
......@@ -77,7 +77,7 @@ Description: Fix gcc-7 error (violation of format-security)
logPos = 0;
for (uint32 i=0; i<seqLen; i++) {
@@ -1874,7 +1874,7 @@ mertrimComputation::dump(char *label) {
@@ -1882,7 +1882,7 @@
if (i+1 == clrEnd) { logLine[logPos++] = ']'; logLine[logPos++] = '-'; }
}
strcpy(logLine + logPos, " (ADAPTER)\n");
......
Description: Adjust paths to program executables
Adjust paths so that canu finds its libexec programs in /usr/lib/canu/
Author: Afif Elghraoui <afif@debian.org>
Forwarded: not-needed
--- canu.orig/src/pipelines/canu.pl
+++ canu/src/pipelines/canu.pl
@@ -41,9 +41,9 @@
use FindBin;
use Cwd qw(getcwd abs_path);
-use lib "$FindBin::RealBin/lib";
-use lib "$FindBin::RealBin/lib/canu/lib/perl5";
-use lib "$FindBin::RealBin/lib/canu/lib64/perl5";
+use lib "$FindBin::RealBin/../lib/canu";
+use lib "$FindBin::RealBin/../lib/canu/lib/perl5";
+use lib "$FindBin::RealBin/../lib/canu/lib64/perl5";
use File::Path 2.08 qw(make_path remove_tree);
external-deps.patch
relative-paths.patch
use-debian-mhap-at-runtime.patch
gcc-7_format-security.patch
external-mhap.patch
......@@ -2,23 +2,23 @@ Description: Use mhap jar from /usr/share/java
Author: Afif Elghraoui <afif@debian.org>
Forwarded: not-needed
Last-Update: 2016-03-20
--- a/src/pipelines/canu/OverlapMhap.pm
+++ b/src/pipelines/canu/OverlapMhap.pm
@@ -378,7 +378,7 @@ sub mhapConfigure ($$$) {
--- canu.orig/src/pipelines/canu/OverlapMhap.pm
+++ canu/src/pipelines/canu/OverlapMhap.pm
@@ -364,7 +364,7 @@
print F "cd ./blocks\n";
print F "\n";
print F "$javaPath -d64 -server -Xmx", $javaMemory, "m \\\n";
- print F " -jar $cygA \$bin/mhap-" . getGlobal("${tag}MhapVersion") . ".jar $cygB \\\n";
+ print F " -jar $cygA \$bin/mhap.jar $cygB \\\n";
- print F " -jar $cygA \$bin/../share/java/classes/mhap-" . getGlobal("${tag}MhapVersion") . ".jar $cygB \\\n";
+ print F " -jar $cygA /usr/share/java/mhap.jar $cygB \\\n";
print F " --repeat-weight 0.9 --repeat-idf-scale 10 -k $merSize \\\n";
print F " --supress-noise 2 \\\n" if (defined(getGlobal("${tag}MhapFilterUnique")) && getGlobal("${tag}MhapFilterUnique") == 1);
print F " --no-tf \\\n" if (defined(getGlobal("${tag}MhapNoTf")) && getGlobal("${tag}MhapNoTf") == 1);
@@ -478,7 +478,7 @@ sub mhapConfigure ($$$) {
@@ -464,7 +464,7 @@
print F "\n";
print F "if [ ! -e ./results/\$qry.mhap ] ; then\n";
print F " $javaPath -d64 -server -Xmx", $javaMemory, "m \\\n";
- print F " -jar $cygA \$bin/mhap-" . getGlobal("${tag}MhapVersion") . ".jar $cygB \\\n";
+ print F " -jar $cygA \$bin/mhap.jar $cygB \\\n";
- print F " -jar $cygA \$bin/../share/java/classes/mhap-" . getGlobal("${tag}MhapVersion") . ".jar $cygB \\\n";
+ print F " -jar $cygA /usr/share/java/mhap.jar $cygB \\\n";
print F " --repeat-weight 0.9 --repeat-idf-scale 10 -k $merSize \\\n";
print F " --supress-noise 2 \\\n" if (defined(getGlobal("${tag}MhapFilterUnique")) && getGlobal("${tag}MhapFilterUnique") == 1);
print F " --no-tf \\\n" if (defined(getGlobal("${tag}MhapNoTf")) && getGlobal("${tag}MhapNoTf") == 1);
......@@ -15,9 +15,6 @@ export DEB_BUILD_MAINT_OPTIONS=hardening=+all
override_dh_auto_build:
dh_auto_build
builddir=*-*/; \
mkdir -p lib/canu share/perl5 && mv lib share $$builddir; \
mv $$builddir/bin/lib/canu $$builddir/share/perl5; \
mv $$builddir/bin/* $$builddir/lib/canu; \
find $$builddir \
-name OverlapMhap.pm \
-exec sed -i 's#\(\s*my \$$javaPath = \).*#\1 "/usr/lib/jvm/java-8-openjdk-$(DEB_HOST_ARCH)/bin/java";#' {} +
......@@ -8,3 +8,11 @@ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#definition-li
http://rest-sphinx-memo.readthedocs.io/en/latest/ReST.html
- A very useful page with examples.
`italics`
*italics*
**bold**
``red-code``
......@@ -55,9 +55,9 @@ copyright = u'2015, Adam Phillippy, Sergey Koren, Brian Walenz'
# built documents.
#
# The short X.Y version.
version = '1.6'
version = '1.7'
# The full version, including alpha/beta/rc tags.
release = '1.6'
release = '1.7'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
......
......@@ -13,7 +13,7 @@ What resources does Canu require for a bacterial genome assembly? A mammalian as
-------------------------------------
Canu will detect available resources and configure itself to run efficiently using those
resources. It will request resources, for example, the number of compute threads to use, based
on the ``genomeSize`` being assembled. It will fail to even start if it feels there are
on the genome size being assembled. It will fail to even start if it feels there are
insufficient resources available.
A typical bacterial genome can be assembled with 8GB memory in a few CPU hours - around an hour
......@@ -39,44 +39,71 @@ How do I run Canu on my SLURM / SGE / PBS / LSF / Torque system?
To disable grid support and run only on the local machine, specify ``useGrid=false``
It is possible to limit the number of grid jobs running at the same time, but this isn't
directly supported by Canu. The various :ref:`gridOptions <grid-options>` parameters
can pass grid-specific parameters to the submit commands used; see
`Issue #756 <https://github.com/marbl/canu/issues/756>`_ for Slurm and SGE examples.
My run stopped with the error ``'Failed to submit batch jobs'``
-------------------------------------
The grid you run on must allow compute nodes to submit jobs. This means that if you are on a compute host, ``qsub/bsub/sbatch/etc`` must be available and working. You can test this by starting an interactive compute session and running the submit command manually (e.g. ``qsub`` on SGE, ``bsub`` on LSF, ``sbatch`` on SLURM).
The grid you run on must allow compute nodes to submit jobs. This means that if you are on a
compute host, ``qsub/bsub/sbatch/etc`` must be available and working. You can test this by
starting an interactive compute session and running the submit command manually (e.g. ``qsub``
on SGE, ``bsub`` on LSF, ``sbatch`` on SLURM).
If this is not the case, Canu **WILL NOT** work on your grid. You must then set ``useGrid=false`` and run on a single machine. Alternatively, you can run Canu with ``useGrid=remote`` which will stop at every submit command, list what should be submitted. You then submit these jobs manually, wait for them to complete, and run the Canu command again. This is a manual process but currently the only workaround for grids without submit support on the compute nodes.
If this is not the case, Canu **WILL NOT** work on your grid. You must then set
``useGrid=false`` and run on a single machine. Alternatively, you can run Canu with
``useGrid=remote``, which will stop at every submit command and list what should be submitted. You
then submit these jobs manually, wait for them to complete, and run the Canu command again. This
is a manual process but currently the only workaround for grids without submit support on the
compute nodes.
What parameters should I use for my reads?
-------------------------------------
Canu is designed to be universal on a large range of PacBio (C2, P4-C2, P5-C3, P6-C4) and Oxford Nanopore
(R6 through R9) data. Assembly quality and/or efficiency can be enhanced for specific datatypes:
Canu is designed to be universal on a large range of PacBio (C2, P4-C2, P5-C3, P6-C4) and Oxford
Nanopore (R6 through R9) data. Assembly quality and/or efficiency can be enhanced for specific
datatypes:
**Nanopore R7 1D** and **Low Identity Reads**
With R7 1D sequencing data, and generally for any raw reads lower than 80% identity, five to
ten rounds of error correction are helpful. To run just the correction phase, use options
``-correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high``. Use the output of
the previous run (in ``asm.correctedReads.fasta.gz``) as input to the next round.
ten rounds of error correction are helpful::
canu -p r1 -d r1 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw your_reads.fasta
canu -p r2 -d r2 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r1/r1.correctedReads.fasta.gz
canu -p r3 -d r3 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r2/r2.correctedReads.fasta.gz
canu -p r4 -d r4 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r3/r3.correctedReads.fasta.gz
canu -p r5 -d r5 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r4/r4.correctedReads.fasta.gz
Then assemble the output of the last round, allowing up to 30% difference in overlaps::
Once corrected, assemble with ``-nanopore-corrected <your data> correctedErrorRate=0.3 utgGraphDeviation=50``
canu -p asm -d asm correctedErrorRate=0.3 utgGraphDeviation=50 -nanopore-corrected r5/r5.correctedReads.fasta.gz
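The correction rounds above follow a fixed pattern: each round reads the ``correctedReads.fasta.gz`` written by the previous one. A minimal sketch that generates those command lines, assuming the ``r<N>`` naming used above; the helper name ``correction_commands`` is illustrative and not part of Canu. It only builds the command strings and does not run anything.

```python
# Options repeated on every correction round, copied from the commands above.
CORRECT_OPTS = "corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high"


def correction_commands(reads, rounds=5):
    """Return canu command lines for `rounds` correction passes plus the final assembly."""
    commands = []
    current = reads
    for i in range(1, rounds + 1):
        name = f"r{i}"  # hypothetical prefix/directory naming, as in the example above
        commands.append(
            f"canu -p {name} -d {name} -correct {CORRECT_OPTS} -nanopore-raw {current}"
        )
        # The next round consumes this round's corrected reads.
        current = f"{name}/{name}.correctedReads.fasta.gz"
    commands.append(
        "canu -p asm -d asm correctedErrorRate=0.3 utgGraphDeviation=50 "
        f"-nanopore-corrected {current}"
    )
    return commands


if __name__ == "__main__":
    for line in correction_commands("your_reads.fasta"):
        print(line)
```

Printing the commands first makes it easy to review them before pasting them into a shell or job script.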
**Nanopore R7 2D** and **Nanopore R9 1D**
Increase the maximum allowed difference in overlaps from the default of 4.5% to 7.5% with
``correctedErrorRate=0.075``
The defaults were designed with these datasets in mind so they should work. Having very high
coverage or very long Nanopore reads can slow down the assembly significantly. You can try the
``overlapper=mhap utgReAlign=true`` option which is much faster but may produce less
contiguous assemblies on large genomes.
**Nanopore R9 2D** and **PacBio P6**
Slightly decrease the maximum allowed difference in overlaps from the default of 4.5% to 4.0%
with ``correctedErrorRate=0.040``
Slightly decrease the maximum allowed difference in overlaps from the default of 14.4% to 12.0%
with ``correctedErrorRate=0.120``
**Early PacBio Sequel**
Based on exactly one publicly released *A. thaliana* `dataset
<http://www.pacb.com/blog/sequel-system-data-release-arabidopsis-dataset-genome-assembly/>`_,
slightly decrease the maximum allowed difference from the default of 4.5% to 4.0% with
``correctedErrorRate=0.040 corMhapSensitivity=normal``. For recent Sequel data, the defaults
are appropriate.
seem to be appropriate.
**Nanopore R9 large genomes**
Due to some systematic errors, the identity estimate used by Canu for correction can be an over-estimate of true error, inflating runtime. For recent large genomes (>1gbp) we've used ``'corMhapOptions=--threshold 0.8 --num-hashes 512 --ordered-sketch-size 1000 --ordered-kmer-size 14'``. This can be used with 30x or more of coverage, below that the defaults are OK.
Due to some systematic errors, the identity estimate used by Canu for correction can be an
over-estimate of true error, inflating runtime. For recent large genomes (>1gbp) with more
than 30x coverage, we've used ``'corMhapOptions=--threshold 0.8 --num-hashes
512 --ordered-sketch-size 1000 --ordered-kmer-size 14'``. This is not needed below 30x
coverage.
My assembly continuity is not good, how can I improve it?
......@@ -161,7 +188,7 @@ What parameters can I tweak?
divergence, you'd end up collapsing the variations. We've used the following parameters
for polyploid populations (PacBio data):
``corOutCoverage=200 correctedErrorRate=0.040 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"``
``corOutCoverage=200 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"``
This will output more corrected reads (than the default 40x). The latter option will be
more conservative at picking the error rate to use for the assembly to try to maintain
......@@ -180,17 +207,28 @@ What parameters can I tweak?
chromosome (and probably some reads from other chromosomes). When assembling, overlaps
well outside the observed error rate distribution are discarded.
For metagenomes:
The basic idea is to use all data for assembly rather than just the longest as default. The
parameters we've used recently are:
``corOutCoverage=10000 corMhapSensitivity=high corMinCoverage=0 redMemory=32 oeaMemory=32 batMemory=200``
For low coverage:
- For less than 30X coverage, increase the allowed difference in overlaps from 4.5% to 7.5%
(or more) with ``correctedErrorRate=0.075``, to adjust for inferior read correction. Canu
will automatically reduce ``corMinCoverage`` to zero to correct as many reads as possible.
- For less than 30X coverage, increase the allowed difference in overlaps by a few percent
(from 4.5% to 8.5% (or more) with ``correctedErrorRate=0.105`` for PacBio and from 14.4% to
16% (or more) with ``correctedErrorRate=0.16`` for Nanopore), to adjust for inferior read
correction. Canu will automatically reduce ``corMinCoverage`` to zero to correct as many
reads as possible.
For high coverage:
- For more than 60X coverage, decrease the allowed difference in overlaps from 4.5% to 4.0%
with ``correctedErrorRate=0.040``, so that only the better corrected reads are used. This is
primarily an optimization for speed and generally does not change assembly continuity.
- For more than 60X coverage, decrease the allowed difference in overlaps (from 4.5% to 4.0%
with ``correctedErrorRate=0.040`` for PacBio, from 14.4% to 12% with
``correctedErrorRate=0.12`` for Nanopore), so that only the better corrected reads are used.
This is primarily an optimization for speed and generally does not change assembly
continuity.
My asm.contigs.fasta is empty, why?
......@@ -200,24 +238,26 @@ My asm.contigs.fasta is empty, why?
output, unitigs are the primary output split at alternate paths,
and unassembled are the leftover pieces.
The :ref:`contigFilter` parameter sets several parameters that control how small or low coverage
initial contigs are handled. By default, initial contigs with more than 50% of the length at
less than 5X coverage will be classified as 'unassembled' and removed from the assembly, that
is, ``contigFilter="2 0 1.0 0.5 5"``. The filtering can be disabled by changing the last number
from '5' to '0' (meaning, filter if 50% is less than 0X coverage).
The :ref:`contigFilter <contigFilter>` parameter sets several parameters that control how small
or low coverage initial contigs are handled. By default, initial contigs with more than 50% of
the length at less than 3X coverage will be classified as 'unassembled' and removed from the
assembly, that is, ``contigFilter="2 0 1.0 0.5 3"``. The filtering can be disabled by changing
the last number from '3' to '0' (meaning, filter if 50% of the contig is less than 0X coverage).
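The low-coverage rule described above can be illustrated with a short sketch. This is not Canu's implementation: it models only the last two ``contigFilter`` fields (the 0.5 fraction and the 3X depth), takes a hypothetical list of per-position depths as input, and ignores the first three fields entirely.

```python
def is_low_coverage(depths, low_cov_fraction=0.5, low_cov_depth=3):
    """Return True if more than `low_cov_fraction` of positions are below `low_cov_depth`.

    `depths` is a hypothetical list of per-position read depths for one contig.
    """
    if not depths:
        return False
    below = sum(1 for d in depths if d < low_cov_depth)
    return below / len(depths) > low_cov_fraction


if __name__ == "__main__":
    print(is_low_coverage([1, 1, 1, 5]))                   # mostly below 3X
    print(is_low_coverage([1, 1, 1, 5], low_cov_depth=0))  # depth 0 disables the rule
```

With the depth set to 0 no position can fall below the threshold, which is why changing the last number to '0' turns the filter off.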
Why is my assembly missing my favorite short plasmid?
-------------------------------------
Only the longest 40X of data (based on the specified genome size) is used for
correction. Datasets with uneven coverage or small plasmids can fail to generate enough
corrected reads to give enough coverage for assembly, resulting in gaps in the genome or even no
reads for small plasmids. Set ``corOutCoverage=1000`` (or any value greater than your total input
coverage) to correct all input data.
In Canu v1.6 and earlier only the longest 40X of data (based on the specified genome size) is
used for correction. Datasets with uneven coverage or small plasmids can fail to generate
enough corrected reads to give enough coverage for assembly, resulting in gaps in the genome or
even no reads for small plasmids. Set ``corOutCoverage=1000`` (or any value greater than your
total input coverage) to correct all input data.
An alternate approach is to correct all reads (``-correct corOutCoverage=1000``) then assemble
40X of reads picked at random from the ``<prefix>.correctedReads.fasta.gz`` output.
More recent Canu versions dynamically select poorly represented sequences to avoid missing short
plasmids so this should no longer happen.
Why do I get less corrected read data than I asked for?
-------------------------------------
......@@ -229,8 +269,29 @@ Why do I get less corrected read data than I asked for?
What is the minimum coverage required to run Canu?
-------------------------------------
For eukaryotic genomes, coverage more than 20X is enough to outperform current hybrid methods.
For eukaryotic genomes, coverage more than 20X is enough to outperform current hybrid
methods. Below that, you will likely not assemble the full genome.
My circular element is duplicated/has overlap?
-------------------------------------
This is expected for any circular elements. They can overlap by up to a read length due to how
Canu constructs contigs. Canu provides an alignment string in the GFA output which can be
converted to an alignment to identify the trimming points.
An alternative is to run MUMmer to get self-alignments on the contig and use those trim
points. For example, assuming the circular element is in ``tig00000099.fa``, run::
nucmer -maxmatch -nosimplify tig00000099.fa tig00000099.fa
show-coords -lrcTH out.delta
to find the end overlaps in the tig. The output would be something like::
1 1895 48502 50400 1895 1899 99.37 50400 50400 3.76 3.77 tig00000001 tig00000001
48502 50400 1 1895 1899 1895 99.37 50400 50400 3.77 3.76 tig00000001 tig00000001
This means trimming to positions 1 to 48502. There is also an alternate `writeup
<https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Circularizing-and-trimming>`_.
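Reading the trim point off the ``show-coords`` output can be scripted. A minimal sketch, assuming the column order seen in the example rows above (start and end on the first copy, start and end on the second copy, and the sequence lengths in the eighth and ninth columns); the function name ``trim_range`` is illustrative. It looks for the row that aligns the contig start to the contig end and reports the range to keep, following the "1 to 48502" reading given above.

```python
def trim_range(coords_lines):
    """Return the (start, end) range to keep, or None if no end overlap is found.

    `coords_lines` are rows from `show-coords -lrcTH` on a contig aligned to itself.
    """
    for line in coords_lines:
        fields = line.split()
        if len(fields) < 13:
            continue  # skip blank or malformed rows
        s1, e1, s2, e2 = (int(x) for x in fields[:4])
        contig_len = int(fields[8])
        # Row of interest: the contig start (S1 == 1) matches the contig end (E2 == length).
        if s1 == 1 and e2 == contig_len and s2 > e1:
            return (1, s2)
    return None


if __name__ == "__main__":
    rows = [
        "1 1895 48502 50400 1895 1899 99.37 50400 50400 3.76 3.77 tig00000001 tig00000001",
        "48502 50400 1 1895 1899 1895 99.37 50400 50400 3.77 3.76 tig00000001 tig00000001",
    ]
    print(trim_range(rows))
```

The returned coordinates should be checked against the alignment before trimming, since a repeat inside the contig can also produce self-alignments.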
My genome is AT (or GC) rich, do I need to adjust parameters? What about highly repetitive genomes?
-------------------------------------
......@@ -250,12 +311,9 @@ How can I send data to you?
FTP to ftp://ftp.cbcb.umd.edu/incoming/sergek. This is a write-only location that only the Canu
developers can see.
Here is a quick walk-through using a command-line ftp client (should be available on most Linux and OSX installations). Say we want to transfer a file named ``reads.fastq``. First, run ``ftp ftp.cbcb.umd.edu``, specify ``anonymous`` as the user name and hit return for password (blank). Then:
.. code-block::
cd incoming/sergek
put reads.fastq
quit
Here is a quick walk-through using a command-line ftp client (should be available on most Linux
and OSX installations). Say we want to transfer a file named ``reads.fastq``. First, run ``ftp
ftp.cbcb.umd.edu``, specify ``anonymous`` as the user name and hit return for password
(blank). Then ``cd incoming/sergek``, ``put reads.fastq``, and ``quit``.
That's it, you won't be able to see the file but we can download it.