BasedOnStyle: Google
BreakBeforeBraces: Mozilla
AllowShortLoopsOnASingleLine: false
AccessModifierOffset: -4
BreakConstructorInitializersBeforeComma: true
ColumnLimit: 100
IndentWidth: 4
PointerAlignment: Left
TabWidth: 4
ReflowComments: false # protect ASCII art in comments
KeepEmptyLinesAtTheStartOfBlocks: true
# UNANIMITY - CHANGELOG
## [3.1.0]
### Changed
- Per-ZMW timings are enabled by default in DIAGNOSTICS mode, or available via the hidden
  option `--zmwTimings`. Output is stored in the BAM tag `ms`
## [3.0.0]
### Refactored
- MultiMolecularIntegrator renamed to just Integrator
- MonoMolecularIntegrator removed, all integrators now accept multiple molecules
- VirtualTemplate removed, as without MonoMolecular it is no longer needed
- MutatedTemplate added as a View object over some const template
- Template::Mutate() now returns a MutatedTemplate instead of modifying the Template
- Template was promoted from a member of Recursor to a member of EvaluatorImpl
- Recursor refactored to take a template as an argument in most functions
- Existing model files updated to match the new parent Recursor class
- s/PB_CHEMISTRY_BUNDLE_DIR/SMRT_CHEMISTRY_BUNDLE_DIR/g
## [2.1.0]
### Added
- Use pbcopper's q-gram index for sparse alignment
- Replaced seqan MSA in ChimeraLabeler
- support loading bundle models from PB_CHEMISTRY_BUNDLE_DIR
environment variable
## [2.0.4]
### Added
- Add pbcopper's ToolContract, summary is no longer a second output file
- Differentiate between .xml and .bam output type
- Enforce .pbi generation
## [2.0.3]
### Added
- Switch from cpp-optparse to pbcopper, use pbcopper's CLI parsing
## [2.0.2]
### Added
- Fix index errors in the Hirschberg aligner
- Added a cleaner interface for AddRead/GetTemplate
## [2.0.1]
### Added
- Add new ReleaseWithAssert CMAKE_BUILD_TYPE
- Bump version (to cc2 + ccs)
- Unify CCS and CC2 versioning under unanimity
- Cleanup python/swig generation
- Cleanup version handling
## [0.0.1]
### Added
- Unify code base, refactor directory structure
- Add pbccs, ConsensusCore2, pbsparse, and pbchimera
- Code coverage report
- Initial framework including pbbam, htslib, pbcopper
##############################################
# CMake build script for the UNANIMITY library
##############################################
cmake_minimum_required(VERSION 3.2)
cmake_policy(SET CMP0048 NEW)
project(UNANIMITY VERSION 3.0.0 LANGUAGES CXX C)
set(ROOT_PROJECT_NAME ${PROJECT_NAME} CACHE STRING "root project name")
# Build type
IF(NOT CMAKE_BUILD_TYPE)
SET(CMAKE_BUILD_TYPE Release CACHE STRING "Choose the type of build, options are: Debug Release Profile RelWithDebInfo ReleaseWithAssert" FORCE)
ENDIF(NOT CMAKE_BUILD_TYPE)
# Build-time options
option(UNY_build_bin "Build binaries." ON)
option(UNY_build_tests "Build UNANIMITY's unit tests." ON)
option(UNY_build_chimera "Build UNANIMITY's stand-alone chimera labeler." OFF)
option(UNY_build_sim "Build UNANIMITY's (sub)read simulator." OFF)
option(UNY_inc_coverage "Include UNANIMITY's coverage script." OFF)
option(UNY_use_ccache "Build UNANIMITY using ccache, if available." ON)
# Main project paths
set(UNY_RootDir ${UNANIMITY_SOURCE_DIR})
set(UNY_IncludeDir ${UNY_RootDir}/include)
set(UNY_SourceDir ${UNY_RootDir}/src)
set(UNY_SwigDir ${UNY_RootDir}/swig)
set(UNY_TestsDir ${UNY_RootDir}/tests)
set(UNY_ThirdPartyDir ${UNY_RootDir}/third-party)
# Project configuration
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_LIST_DIR}/cmake ${CMAKE_MODULE_PATH})
# Fixed order, do not sort or shuffle
include(uny-ccache)
include(uny-releasewithassert)
include(uny-dependencies)
include(uny-compilerflags)
include(uny-gitsha1)
include(uny-config)
# Build library
add_subdirectory(${UNY_SourceDir})
# Build tests
if(UNY_build_tests)
add_subdirectory(${UNY_TestsDir})
endif()
# Swig
if (PYTHON_SWIG)
add_subdirectory(${UNY_SwigDir})
endif()
Copyright (c) 2011-2019, Pacific Biosciences of California, Inc.
All rights reserved.
<p align="center">
<img src="img/ccs.png" alt="CCS logo" width="250px"/>
<img src="doc/img/unanimity.png" alt="unanimity logo"/>
</p>
<h1 align="center">CCS</h1>
<p align="center">Generate Highly Accurate Single-Molecule Consensus Reads</p>
<h1 align="center">Unanimity</h1>
<p align="center">C++ library and its applications to generate and process accurate consensus sequences</p>
***
_ccs_ takes multiple (sub)reads of the same SMRTbell molecule and combines
them using a statistical model to produce one highly accurate consensus sequence,
also called HiFi or CCS read, with base quality values.
This tool powers the _Circular Consensus Sequencing_ workflow in SMRT Link.
## Availability
Latest `ccs` can be installed via bioconda package `pbccs`.
Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
for information on Installation, Support, License, Copyright, and Disclaimer.
## Latest Version
Version **4.0.0**: [Full changelog here](#changelog)
## Schematic Workflow
<p align="center"><img width="600px" src="img/ccs-workflow.png"/></p>
## Execution
**Input**: Subreads from a single movie in PacBio BAM format (`.subreads.bam`).
**Output**: Consensus reads in a format inferred from the file extension:
unaligned BAM (`.bam`); bgzipped FASTQ (`.fastq.gz`);
or SMRT Link XML (`.consensusreadset.xml`) which also generates a corresponding
BAM file.
Run on a full movie:
    ccs movie.subreads.bam movie.ccs.bam
## [Circular Consensus Calling](doc/PBCCS.md)
Parallelize by using `--chunk`.
See [how-to chunk](#how-can-i-parallelize-on-multiple-servers).
`ccs` takes multiple reads of the same SMRTbell sequence and combines
them, employing a statistical model, to produce one high quality consensus sequence.
More information available [here](doc/PBCCS.md).
## FAQ
### What impacts the number and quality of CCS reads that are generated?
The longer the polymerase read, the more readouts (passes) of the SMRTbell
are produced and consequently the more evidence is accumulated per molecule.
This increase in evidence translates into higher consensus accuracy, as
depicted in the following plot:
<p align="center"><img width="600px" src="img/ccs-acc.png"/></p>
### How is the number of passes computed?
Each CCS read is annotated with an `np` tag that contains the number of
full-length subreads used for polishing.
Since the first version of _ccs_, the number of passes has only accounted for
full-length subreads. In version 3.3.0, windowing was added, which
takes the minimum number of full-length subreads across all windows.
Starting with version 4.0.0, the minimum has been replaced with the mode to get a
better representation across all windows.
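To illustrate the difference between the pre-4.0.0 minimum and the current mode, here is a minimal sketch with made-up per-window counts (not real data):

```python
# Hypothetical number of full-length subreads covering each window of a draft.
from statistics import mode

window_passes = [12, 12, 11, 12, 3, 12]  # one low-coverage window, e.g. a local dropout

np_before_4_0_0 = min(window_passes)   # minimum across windows -> 3
np_since_4_0_0  = mode(window_passes)  # mode across windows    -> 12
```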
### Which and in what order are filters applied?
_ccs_ exposes the following filters on input subreads, draft consensus,
and final output consensus:
    Input Filter Options:
      --min-passes INT   Minimum number of full-length subreads required to generate CCS for a ZMW. [3]
      --min-snr FLOAT    Minimum SNR of subreads to use for generating CCS [2.5]

    Draft Filter Options:
      --min-length INT   Minimum draft length before polishing. [10]
      --max-length INT   Maximum draft length before polishing. [50000]

    Output Filter Options:
      --min-rq FLOAT     Minimum predicted accuracy in [0, 1]. [0.99]
The data flow of how each ZMW gets processed and filtered (a minimal sketch follows the list):
1. Remove subreads with lengths <50% or >200% of the median subread length.
2. Remove subreads with SNR below `--min-snr`.
3. Stop if the number of full-length subreads is fewer than `--min-passes`.
4. Generate the draft sequence and stop if the draft length is outside the `--min-length` and `--max-length` bounds.
5. Polish the consensus sequence and only emit a CCS read if the predicted accuracy is at least `--min-rq`.
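The following is a minimal, illustrative sketch of that flow; the actual implementation is in C++, and the helper names (`generate_draft`, `polish`) and record attributes used here are hypothetical:

```python
from statistics import median

def filter_zmw(subreads, min_snr=2.5, min_passes=3, min_length=10, max_length=50000):
    """Illustrative per-ZMW filter flow; returns a CCS read or a failure reason."""
    med = median(len(s.seq) for s in subreads)
    # (1) drop subreads shorter than 50% or longer than 200% of the median length
    usable = [s for s in subreads if 0.5 * med <= len(s.seq) <= 2.0 * med]
    if not usable:
        return "No usable subreads"
    # (2) drop subreads below the SNR threshold
    usable = [s for s in usable if s.snr >= min_snr]
    if not usable:
        return "Below SNR threshold"
    # (3) require enough full-length passes
    if sum(1 for s in usable if s.is_full_length) < min_passes:
        return "Lacking full passes"
    # (4) generate a draft and check its length
    draft = generate_draft(usable)                  # hypothetical helper
    if not (min_length <= len(draft) <= max_length):
        return "Draft outside --min-length/--max-length"
    # (5) polish and emit only if the predicted accuracy passes --min-rq
    ccs_read, predicted_rq = polish(draft, usable)  # hypothetical helper
    return ccs_read if predicted_rq >= 0.99 else "CCS below minimum RQ"
```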
### How do I read the ccs_report.txt file?
The `ccs_report.txt` file summarizes (B) how many ZMWs generated CCS reads and
(C) how many failed CCS generation, broken down by cause. For (C), each ZMW
contributes to exactly one failure reason; percentages are with respect to (C).
The comments on the right refer to the filters explained in the FAQ above.
    ZMWs input               (A)  : 4779
    ZMWs generating CCS      (B)  : 1875 (39.23%)
    ZMWs filtered            (C)  : 2904 (60.77%)

    Exclusive ZMW counts for (C):
    No usable subreads       : 66 (2.27%)    <- All subreads were filtered in (1)
    Below SNR threshold      : 54 (1.86%)    <- All subreads were filtered in (2)
    Lacking full passes      : 2779 (95.70%) <- Fewer than --min-passes full-length subreads (3)
    Heteroduplexes           : 0 (0.00%)     <- Single-strand artifacts
    Min coverage violation   : 0 (0.00%)     <- ZMW is damaged on one strand and can't be polished reliably
    Draft generation error   : 5 (0.17%)     <- Subreads don't agree to generate a draft sequence
    Draft above --max-length : 0 (0.00%)     <- Draft sequence is longer than --max-length (4)
    Draft below --min-length : 0 (0.00%)     <- Draft sequence is shorter than --min-length (4)
    Lacking usable subreads  : 0 (0.00%)     <- Too many subreads were dropped while polishing
    CCS did not converge     : 0 (0.00%)     <- Draft has too many errors that can't be polished in time
    CCS below minimum RQ     : 0 (0.00%)     <- Predicted accuracy is below --min-rq (5)
    Unknown error            : 0 (0.00%)     <- Rare implementation errors
### What is the definition of a heteroduplex?
In general, a heteroduplex occurs whenever bases on one strand of the SMRTbell are
not the reverse complement of the other strand, even something as small as a single
`A` paired with a `G`. _ccs_ polishes such a position to one of the two bases and
reflects the ambiguity in the base QV. However, when one strand has more than `20`
additional bases that the other strand does not have, _ccs_ cannot converge to a
consensus sequence; it then removes the ZMW and increases the heteroduplex counter
in the `ccs_report.txt` file.
### How can I parallelize on multiple servers?
Parallelize by chunking. Since _ccs_ v4.0.0, direct chunking via `--chunk`
is possible. For this, the `.subreads.bam` file must be accompanied by a
`.pbi` file. To generate the index `subreads.bam.pbi`, use
`pbindex`, which can be installed with `conda install pbbam`:

    pbindex movie.subreads.bam

An example workflow, in which all ccs invocations can run simultaneously:

    ccs movie.subreads.bam movie.ccs.1.bam --chunk 1/10 -j <THREADS>
    ccs movie.subreads.bam movie.ccs.2.bam --chunk 2/10 -j <THREADS>
    ...
    ccs movie.subreads.bam movie.ccs.10.bam --chunk 10/10 -j <THREADS>

Merge the chunks with `pbmerge` and index with `pbindex`:

    pbmerge -o movie.ccs.bam movie.ccs.*.bam
    pbindex movie.ccs.bam

or use `samtools`:

    samtools merge -@8 movie.ccs.bam movie.ccs.*.bam
### What happened to unanimity?
Unanimity lives on as a PacBio internal library to generate consensus sequences.
Customer-facing documentation will be limited to _ccs_ distributed via bioconda.
### [Help! I am getting "Unsupported chemistries found: (...)"!](#model-data)
### Where is the source code?
We stopped mirroring code changes to GitHub in March 2018.
Instead, we provide binaries on bioconda to ease end-user experience.
If your project relies on outdated unanimity source code,
please use [this commit](https://github.com/PacificBiosciences/unanimity/tree/6f11a13e1472b8c00337ba8c5e94bf83bdab31d6).
Similar to [pbbam](https://github.com/PacificBiosciences/pbbam), Unanimity's consensus
models require chemistry-dependent parameters. As part of ongoing development efforts,
we might need to introduce new part numbers to identify novel reagents and/or SMRT Cells.
If your version of Unanimity significantly predates the chemistry you used to
generate your data, Unanimity will not be able to handle that data.
In such cases, download the latest version of the model parameters
and place them in a subdirectory of `${SMRT_CHEMISTRY_BUNDLE_DIR}`:

```sh
cd <some persistent dir, preferably the same as used for pbbam>
export SMRT_CHEMISTRY_BUNDLE_DIR="${PWD}"
mkdir -p arrow
cp /some/download/dir/model.json arrow/
```

This will cause Unanimity to try to load models from all files in `${SMRT_CHEMISTRY_BUNDLE_DIR}/arrow`
with a `.json` suffix.

### Help! I am getting "Unsupported ..."!
If you encounter the error `Unsupported chemistries found: (...)` or
`unsupported sequencing chemistry combination`, your _ccs_ binary does not
support the sequencing chemistry kit used, from here on referred to as "chemistry".
This may be because we removed support for an older chemistry, or because your binary
predates the release of the chemistry used.
This is unlikely to happen with _ccs_ from SMRT Link installations, as SMRT Link
is able to automatically update and install new chemistries.
Thus, the easiest solution is to always use the _ccs_ from the SMRT Link version that
shipped with the release of the sequencing chemistry kit.

**Old chemistries:**
With _ccs_ 4.0.0, we have removed support for the last RSII chemistry `P6-C4`.
The only option is to downgrade _ccs_ with `conda install pbccs==3.4`.
**New chemistries:**
It might happen that your _ccs_ version predates the sequencing chemistry kit.
There is an easy fix: install the latest version of _ccs_ with `conda update --all`.
If you are an early access user, follow the [monkey patch tutorial](#monkey-patch-ccs-to-support-additional-sequencing-chemistry-kits).
### Monkey patch _ccs_ to support additional sequencing chemistry kits
Please create a directory that is used to inject new chemistry information
into _ccs_:
```sh
mkdir -p /path/to/persistent/dir/
cd /path/to/persistent/dir/
export SMRT_CHEMISTRY_BUNDLE_DIR="${PWD}"
mkdir -p arrow
```
Execute the following step-by-step instructions to fix the error you are observing,
and afterwards proceed using _ccs_ as you normally would. Additional chemistry
information is automatically loaded from the directory pointed to by the
`${SMRT_CHEMISTRY_BUNDLE_DIR}` environment variable.
#### Error: "unsupported sequencing chemistry combination"
Please download the latest out-of-band `chemistry.xml`:
```sh
wget https://raw.githubusercontent.com/PacificBiosciences/pbcore/develop/pbcore/chemistry/resources/mapping.xml -O "${SMRT_CHEMISTRY_BUNDLE_DIR}"/chemistry.xml
```
#### Error: "Unsupported chemistries found: (...)"
Please get the latest consensus model `.json` from PacBio and
copy it to:
```sh
cp /some/download/dir/model.json "${SMRT_CHEMISTRY_BUNDLE_DIR}"/arrow/
```
### How fast is CCS?
We tested CCS runtime using 1,000 ZMWs per length bin with exactly 10 passes.
<img width="600px" src="img/runtime.png"/>
#### How does that translate into time to result per SMRT Cell?
We measured time to result for Sequel I and II CCS sequencing collections
on a PacBio recommended HPC, according to the
[Sequel II System Compute Requirements](https://www.pacb.com/wp-content/uploads/SMRT_Link_Installation_v701.pdf),
with 192 physical or 384 hyper-threaded cores.
1) Sequel I: 15 kb insert size, 30-hours movie, 37 GB raw yield, 2.3 GB CCS UMY
2) Sequel II: 15 kb insert size, 30-hours movie, 340 GB raw yield, 24 GB CCS UMY
CCS version | Sequel I | Sequel II
:-: | :-: | :-:
≤3.0.0 | 1 day | >1 week
3.4.1 | 3 hours | >1 day
≥4.0.0 | **40 minutes** | **6 hours**
#### How is CCS speed affected by raw base yield?
Raw base yield is the sum of all polymerase read lengths.
A polymerase read consists of all subreads concatenated
with SMRTbell adapters in between.
Raw base yield can be increased with
1) higher percentage of single loaded ZMWs and
2) longer movie times that lead to longer polymerase read lengths.
Since its first version, _ccs_ has scaled linearly in (1) the number of single-loaded
ZMWs per SMRT Cell.
Starting with version 3.3.0, _ccs_ scaled linearly in (2) the polymerase read length,
and with version 4.0.0 _ccs_ scales sublinearly.
#### What changed in each version?
CCS version | O(insert size) | O(#passes)
:-: | :-: | :-:
≤3.0.0 | quadratic | linear
3.4.1 | **linear** | linear
≥4.0.0 | linear | **sublinear**
#### How can version 4.0.0 be sublinear in the number of passes?
With the introduction of new heuristics, individual draft bases can skip
polishing if they are of sufficient quality.
The more passes a ZMW has, the fewer bases need additional polishing.
### What heuristics are used?
The following heuristics are enabled:
- determine which bases need polishing,
- remove ZMWs with single-strand artifacts such as heteroduplexes,
- remove large insertions that are likely due to sequencing errors,
- on-the-fly model caching with reduced SNR resolution,
- an adaptive windowing strategy with a target window size of 22 bp and ±2 bp overlaps, avoiding breaks in simple repeats (homopolymers up to 4-mer repeats); see the sketch after this list.
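A rough sketch of the windowing idea; the window size, overlap handling, and repeat check below are illustrative assumptions, not the actual implementation:

```python
def adaptive_windows(draft, target=22, overlap=2, max_unit=4):
    """Split a draft into ~target bp windows with +-overlap bp overlaps,
    nudging boundaries so they do not fall inside a simple repeat
    (homopolymers up to max_unit-mer repeats). Illustrative only."""
    def splits_repeat(seq, pos):
        # True if pos sits inside a tandem repeat with unit size <= max_unit
        for unit in range(1, max_unit + 1):
            if pos - unit >= 0 and pos + unit <= len(seq) and \
               seq[pos - unit:pos] == seq[pos:pos + unit]:
                return True
        return False

    windows, start = [], 0
    while start < len(draft):
        end = min(start + target, len(draft))
        # push the boundary forward until it no longer splits a simple repeat
        while end < len(draft) and splits_repeat(draft, end):
            end += 1
        windows.append((max(0, start - overlap), min(len(draft), end + overlap)))
        start = end
    return windows
```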
### Does speed impact quality and yield?
Yes, it does. The ~35x speed improvement from version 3.1.0 to 4.0.0,
which reduces CPU time from >60,000 to <2,000 core hours, comes from
heuristics and algorithmic changes that lead to slightly lower yield and
accuracy when run head to head on the same data set. Internal tests show
that _ccs_ 4.0.0 introduces no regressions in CCS-only Structural Variant
calling and has minimal impact on SNV and indel calling in DeepVariant.
In contrast, lower DNA quality has a bigger impact on quality and yield.
### Can I tune _ccs_ to get improved results?
No, we optimized _ccs_ such that there is a good balance between speed and
output quality.
### Can I produce one consensus sequence for each strand of a molecule?
Yes, please use `--by-strand`. Make sure that you have sufficient coverage,
as `--min-passes` is applied per strand in this case. For each strand, _ccs_
generates one consensus read that has to pass all filters.
The read name suffix indicates the strand. Example:

    m64011_190714_120746/14/ccs/rev
    m64011_190714_120746/35/ccs/fwd
### Is there a progress report?
Yes. With `--log-level INFO`, _ccs_ provides status to `stderr` every
`--refresh-rate` seconds (default 30):

    #ZMWs, #CCS, #CPM, #CMT, ETA: 2689522, 1056330, 2806, 29.2, 4h 52m
In detail:
* `#ZMWs`, number of ZMWs processed
* `#CCS`, number of CCS reads generated
* `#CPM`, number of CCS reads generated per minute
* `#CMT`, number of CCS reads generated per minute per thread
* `ETA`, estimated processing time left
If there is no `.pbi` file present, ETA will be omitted.
## Licenses
PacBio® tool _ccs_, distributed via Bioconda, is licensed under
[BSD-3-Clause-Clear](https://spdx.org/licenses/BSD-3-Clause-Clear.html)
and statically links GNU C Library v2.29 licensed under [LGPL](https://spdx.org/licenses/LGPL-2.1-only.html).
Per LGPL 2.1 subsection 6c, you are entitled to request the complete
machine-readable work that uses glibc in object code.
## Changelog
* **4.0.0**:
  * SMRT Link v8.0 release candidate
  * Speed improvements
  * Removed support for legacy python Genomic Consensus, please use `conda install pbgcpp`
  * New command-line interface
  * New report file
* **3.4.1**:
  * Released with SMRT Link v7.0
  * Log used chemistry model to INFO level
* **3.4.0**:
  * Fixes to unpolished mode for IsoSeq
  * Improve runtime when `--minPredictedAccuracy` has been increased
* **3.3.0**:
  * Add a windowing approach to reduce computational complexity from quadratic to linear
  * Improve multi-threading framework to increase throughput
  * Enhance XML output, propagate `CollectionMetadata`
  * Includes latest chemistry parameters
* **3.1.0**:
  * Add `--maxPoaCoverage` to decrease runtime for unpolished output, special parameter for IsoSeq workflow
  * Chemistry parameters for SMRT Link v6.0
## License
[PacBio open source license](LICENSE)
## DISCLAIMER
THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
#!/bin/bash
grep -v submodule scripts/ci/build.sh|bash -vex
machine:
python:
version: 2.7.9
dependencies:
cache_directories:
- "_deps/cmake-3.3.0-Linux-x86_64"
- "_deps/boost_1_60_0"
- "_deps/swig-3.0.8"
pre:
- curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
- sudo apt-get install git-lfs=1.1.0
- if [ ! -d _deps ] ; then mkdir _deps ; fi # Create a directory for dependencies, These are static, cache them.
- pushd _deps ; if [ ! -d cmake-3.3.0-Linux-x86_64 ] ; then wget --no-check-certificate https://www.cmake.org/files/v3.3/cmake-3.3.0-Linux-x86_64.tar.gz ; tar xzf cmake-3.3.0-Linux-x86_64.tar.gz ; fi
- pushd _deps ; if [ ! -d boost_1_60_0 ] ; then wget https://downloads.sourceforge.net/project/boost/boost/1.60.0/boost_1_60_0.tar.bz2 ; tar xjf boost_1_60_0.tar.bz2 ; fi
- pushd _deps ; if [ ! -f swig-3.0.8/bin/swig ] ; then rm -fr swig-3.0.8* ; mkdir dl ; pushd dl ; wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.8/swig-3.0.8.tar.gz ; tar xzf swig-3.0.8.tar.gz ; pushd swig-3.0.8 ; ./configure --prefix $(readlink -f ../../swig-3.0.8) ; make ; make install ; fi
- pushd _deps ; git clone https://github.com/PacificBiosciences/PacBioTestData.git
- pip install --upgrade pip
- pip install numpy cython h5py pysam cram nose jsonschema avro
- pip install --upgrade --no-deps git+https://github.com/PacificBiosciences/pbcommand.git
- pip install --upgrade --no-deps git+https://github.com/PacificBiosciences/pbcore.git
- pushd _deps/PacBioTestData ; git lfs pull && make python
- mkdir _rev_deps # Create a directory for reverse-dependencies, ie things that depend on us. These are not static, do not cache them.
# Build ConsensusCore
- pushd _deps ; git clone https://github.com/PacificBiosciences/ConsensusCore.git
- pushd _deps/ConsensusCore ; python setup.py install --boost=$(readlink -f ../../_deps/boost_1_60_0) --swig=$(readlink -f ../../_deps/swig-3.0.8/bin/swig)
- pushd _rev_deps ; git clone https://github.com/PacificBiosciences/GenomicConsensus.git
- git submodule update --init --remote
override:
- CMAKE_BUILD_TYPE=ReleaseWithAssert CMAKE_COMMAND=$(readlink -f _deps/cmake-3.3.0-Linux-x86_64/bin/cmake) Boost_INCLUDE_DIRS=$(readlink -f _deps/boost_1_60_0) SWIG_COMMAND=$(readlink -f _deps/swig-3.0.8/bin/swig) VERBOSE=1 pip install --verbose --upgrade --no-deps .
- python -c "import ConsensusCore2 ; print ConsensusCore2.__version__"
- pushd _rev_deps/GenomicConsensus ; pip install --upgrade --no-deps --verbose .
- pushd _rev_deps/GenomicConsensus ; make check # Test GC
test:
pre:
- mkdir _build
- pushd _build ; $(readlink -f ../_deps/cmake-3.3.0-Linux-x86_64/bin/cmake) -DBoost_INCLUDE_DIRS=$(readlink -f ../_deps/boost_1_60_0) -DCMAKE_BUILD_TYPE=ReleaseWithAssert ..
override:
- pushd _build ; make
- pushd _build ; make check
if(UNY_use_ccache)
find_program(CCACHE_FOUND ccache)
if(CCACHE_FOUND)
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE ccache)
set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK ccache)
endif()
endif()
# Copyright (c) 2012 - 2015, Lars Bilke
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#
#
# 2012-01-31, Lars Bilke
# - Enable Code Coverage
#
# 2013-09-17, Joakim Söderberg
# - Added support for Clang.
# - Some additional usage instructions.
#
# USAGE:
# 0. (Mac only) If you use Xcode 5.1 make sure to patch geninfo as described here:
# http://stackoverflow.com/a/22404544/80480
#
# 1. Copy this file into your cmake modules path.
#
# 2. Add the following line to your CMakeLists.txt:
# INCLUDE(CodeCoverage)
#
# 3. Set compiler flags to turn off optimization and enable coverage:
# SET(CMAKE_CXX_FLAGS "-g -O0 -fprofile-arcs -ftest-coverage")
# SET(CMAKE_C_FLAGS "-g -O0 -fprofile-arcs -ftest-coverage")
#
# 4. Use the function SETUP_TARGET_FOR_COVERAGE to create a custom make target
# which runs your test executable and produces a lcov code coverage report:
# Example:
# SETUP_TARGET_FOR_COVERAGE(
# my_coverage_target # Name for custom target.
# test_driver # Name of the test driver executable that runs the tests.
# # NOTE! This should always have a ZERO as exit code
# # otherwise the coverage generation will not complete.
# coverage # Name of output directory.
# )
#
# 5. Build a Debug build:
# cmake -DCMAKE_BUILD_TYPE=Debug ..
# make
# make my_coverage_target
#
#
# Check prereqs
FIND_PROGRAM( GCOV_PATH gcov )
FIND_PROGRAM( LCOV_PATH lcov )
FIND_PROGRAM( GENHTML_PATH genhtml )
FIND_PROGRAM( GCOVR_PATH gcovr PATHS ${CMAKE_SOURCE_DIR}/tests)
IF(NOT GCOV_PATH)
MESSAGE(FATAL_ERROR "gcov not found! Aborting...")
ENDIF() # NOT GCOV_PATH
IF(NOT CMAKE_COMPILER_IS_GNUCXX)
# Clang version 3.0.0 and greater now supports gcov as well.
IF(NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
MESSAGE(FATAL_ERROR "Compiler is not GNU gcc! Aborting...")
ENDIF()
ENDIF() # NOT CMAKE_COMPILER_IS_GNUCXX
SET(CMAKE_CXX_FLAGS_COVERAGE
"-g -O0 --coverage -fprofile-arcs -ftest-coverage"
CACHE STRING "Flags used by the C++ compiler during coverage builds."
FORCE )
SET(CMAKE_C_FLAGS_COVERAGE
"-g -O0 --coverage -fprofile-arcs -ftest-coverage"
CACHE STRING "Flags used by the C compiler during coverage builds."
FORCE )
SET(CMAKE_EXE_LINKER_FLAGS_COVERAGE
""
CACHE STRING "Flags used for linking binaries during coverage builds."
FORCE )
SET(CMAKE_SHARED_LINKER_FLAGS_COVERAGE
""
CACHE STRING "Flags used by the shared libraries linker during coverage builds."
FORCE )
MARK_AS_ADVANCED(
CMAKE_CXX_FLAGS_COVERAGE
CMAKE_C_FLAGS_COVERAGE
CMAKE_EXE_LINKER_FLAGS_COVERAGE
CMAKE_SHARED_LINKER_FLAGS_COVERAGE )
IF ( NOT (CMAKE_BUILD_TYPE STREQUAL "Debug" OR CMAKE_BUILD_TYPE STREQUAL "Coverage"))
MESSAGE( WARNING "Code coverage results with an optimized (non-Debug) build may be misleading" )
ENDIF() # NOT CMAKE_BUILD_TYPE STREQUAL "Debug"
# Param _targetname The name of the new custom make target
# Param _testrunner The name of the target which runs the tests.
# MUST return ZERO always, even on errors.
# If not, no coverage report will be created!
# Param _outputname lcov output is generated as _outputname.info
# HTML report is generated in _outputname/index.html
# Optional fourth parameter is passed as arguments to _testrunner
# Pass them in list form, e.g.: "-j;2" for -j 2
FUNCTION(SETUP_TARGET_FOR_COVERAGE _targetname _testrunner _outputname)
IF(NOT LCOV_PATH)
MESSAGE(FATAL_ERROR "lcov not found! Aborting...")
ENDIF() # NOT LCOV_PATH
IF(NOT GENHTML_PATH)
MESSAGE(FATAL_ERROR "genhtml not found! Aborting...")
ENDIF() # NOT GENHTML_PATH
SET(coverage_info "${CMAKE_BINARY_DIR}/${_outputname}.info")
SET(coverage_cleaned "${coverage_info}.cleaned")
SEPARATE_ARGUMENTS(test_command UNIX_COMMAND "${_testrunner}")
# Setup target
ADD_CUSTOM_TARGET(${_targetname}
# Cleanup lcov
${LCOV_PATH} --directory . --zerocounters
# Run tests
COMMAND ${test_command} ${ARGV3}
# Capturing lcov counters and generating report
COMMAND ${LCOV_PATH} --directory . --capture --output-file ${coverage_info}
COMMAND ${LCOV_PATH} --remove ${coverage_info} 'tests/*' '/usr/*' 'third-party/*' --output-file ${coverage_cleaned}
COMMAND ${GENHTML_PATH} -o ${_outputname} ${coverage_cleaned}
COMMAND ${CMAKE_COMMAND} -E remove ${coverage_info} ${coverage_cleaned}
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Resetting code coverage counters to zero.\nProcessing code coverage counters and generating report."
)
# Show info where to find the report
ADD_CUSTOM_COMMAND(TARGET ${_targetname} POST_BUILD
COMMAND ;
COMMENT "Open ./${_outputname}/index.html in your browser to view the coverage report."
)
ENDFUNCTION() # SETUP_TARGET_FOR_COVERAGE
# Param _targetname The name of the new custom make target
# Param _testrunner The name of the target which runs the tests
# Param _outputname cobertura output is generated as _outputname.xml
# Optional fourth parameter is passed as arguments to _testrunner
# Pass them in list form, e.g.: "-j;2" for -j 2
FUNCTION(SETUP_TARGET_FOR_COVERAGE_COBERTURA _targetname _testrunner _outputname)
IF(NOT GCOVR_PATH)
MESSAGE(FATAL_ERROR "gcovr not found! Aborting...")
ENDIF() # NOT GCOVR_PATH
ADD_CUSTOM_TARGET(${_targetname}
# Run tests
${_testrunner} ${ARGV3}
# Running gcovr
COMMAND ${GCOVR_PATH} -x -r ${CMAKE_SOURCE_DIR} -e '${CMAKE_SOURCE_DIR}/tests/' -o ${_outputname}.xml
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Running gcovr to produce Cobertura code coverage report."
)
# Show info where to find the report
ADD_CUSTOM_COMMAND(TARGET ${_targetname} POST_BUILD
COMMAND ;
COMMENT "Cobertura code coverage report saved in ${_outputname}.xml."
)
ENDFUNCTION() # SETUP_TARGET_FOR_COVERAGE_COBERTURA
include(CheckCXXCompilerFlag)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# shared CXX flags for all source code & tests
set(UNY_FLAGS "-Wall -Wextra -Wno-unused-parameter -Wno-unused-variable")
# gperftools support
if(CMAKE_BUILD_TYPE STREQUAL "Debug" AND APPLE)
set(UNY_LINKER_FLAGS "${UNY_LINKER_FLAGS} -Wl,-no_pie")
endif(CMAKE_BUILD_TYPE STREQUAL "Debug" AND APPLE)
# static linking
IF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
set(UNY_LINKER_FLAGS "${UNY_LINKER_FLAGS} -static-libgcc -static-libstdc++")
ENDIF()
# NOTE: quash clang warnings w/ Boost
check_cxx_compiler_flag("-Wno-unused-local-typedefs" HAS_NO_UNUSED_LOCAL_TYPEDEFS)
if(HAS_NO_UNUSED_LOCAL_TYPEDEFS)
set(UNY_FLAGS "${UNY_FLAGS} -Wno-unused-local-typedefs")
endif()
# Cannot use this until pbbam complies
# if (CMAKE_COMPILER_IS_GNUCXX)
# set(UNY_FLAGS "${UNY_FLAGS} -Werror=suggest-override")
# endif()
# Coverage settings
if (UNY_inc_coverage)
set(UNY_COV_FLAGS "${UNY_FLAGS} -fprofile-arcs -ftest-coverage")
endif()
# Extra testing that will lead to longer compilation times!
if (SANITIZE)
# AddressSanitizer is a fast memory error detector
set(UNY_SANITY_FLAGS "${UNY_SANITY_FLAGS} -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls")
# Clang Thread Safety Analysis is a C++ language extension which warns about
# potential race conditions in code.
set(UNY_SANITY_FLAGS "${UNY_SANITY_FLAGS} -Wthread-safety")
# ThreadSanitizer is a tool that detects data races
set(UNY_SANITY_FLAGS "${UNY_SANITY_FLAGS} -fsanitize=thread")
# MemorySanitizer is a detector of uninitialized reads.
set(UNY_SANITY_FLAGS "${UNY_SANITY_FLAGS} -fsanitize=memory")
# UndefinedBehaviorSanitizer is a fast undefined behavior detector.
set(UNY_SANITY_FLAGS "${UNY_SANITY_FLAGS} -fsanitize=undefined")
endif()
# shared CXX flags for src & tests
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${UNY_FLAGS}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} ${UNY_COV_FLAGS} ${UNY_SANITY_FLAGS}")
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${UNY_LINKER_FLAGS}")
# Config generation
find_git_sha1(UNANIMITY_GIT_SHA1)
file (STRINGS "${UNY_RootDir}/CHANGELOG.md" UNANIMITY_CHANGELOG)
configure_file(
${UNY_SourceDir}/UnanimityGitHash.cpp.in
${CMAKE_BINARY_DIR}/generated/UnanimityGitHash.cpp
)
configure_file(
${UNY_SourceDir}/UnanimityVersion.cpp.in
${CMAKE_BINARY_DIR}/generated/UnanimityVersion.cpp
)
# External libraries
# Get static libraries
SET(CMAKE_FIND_LIBRARY_SUFFIXES .a ${CMAKE_FIND_LIBRARY_SUFFIXES})
# Boost
if(NOT Boost_INCLUDE_DIRS)
find_package(Boost REQUIRED)
endif()
# pbcopper
if (NOT pbcopper_INCLUDE_DIRS OR
NOT pbcopper_LIBRARIES)
if (PYTHON_SWIG)
set(pbcopper_build_shared OFF CACHE INTERNAL "" FORCE)
endif()
set(pbcopper_build_tests OFF CACHE INTERNAL "" FORCE)
set(pbcopper_build_docs OFF CACHE INTERNAL "" FORCE)
set(pbcopper_build_examples OFF CACHE INTERNAL "" FORCE)
add_subdirectory(${UNY_ThirdPartyDir}/pbcopper external/pbcopper/build)
endif()
# only require if NOT called from pip install
if (NOT PYTHON_SWIG)
# Threads
if (NOT Threads)
find_package(Threads REQUIRED)
endif()
# ZLIB
if (NOT ZLIB_INCLUDE_DIRS OR NOT ZLIB_LIBRARIES)
find_package(PkgConfig REQUIRED)
pkg_check_modules(ZLIB zlib)
else()
set(ZLIB_LDFLAGS ${ZLIB_LIBRARIES})
endif()
# pbbam
if (NOT PacBioBAM_INCLUDE_DIRS OR
NOT PacBioBAM_LIBRARIES)
set(PacBioBAM_build_docs OFF CACHE INTERNAL "" FORCE)
set(PacBioBAM_build_tests OFF CACHE INTERNAL "" FORCE)
set(PacBioBAM_build_tools OFF CACHE INTERNAL "" FORCE)
add_subdirectory(${UNY_ThirdPartyDir}/pbbam external/pbbam/build)
endif()
# cpp-optparse sources
if (NOT CPPOPTPARSE_CPP)
set(CPPOPTPARSE_CPP ${UNY_ThirdPartyDir}/cpp-optparse/OptionParser.cpp CACHE INTERNAL "" FORCE)
endif()
if (NOT CPPOPTPARSE_IncludeDir)
set(CPPOPTPARSE_IncludeDir ${UNY_ThirdPartyDir}/cpp-optparse CACHE INTERNAL "" FORCE)
endif()
# seqan headers
if (NOT SEQAN_INCLUDE_DIRS)
set(SEQAN_INCLUDE_DIRS ${UNY_ThirdPartyDir}/seqan/include CACHE INTERNAL "" FORCE)
endif()
# Complete-Striped-Smith-Waterman-Library
set(ssw_INCLUDE_DIRS ${UNY_ThirdPartyDir}/cssw)
endif()
if (__find_python_inc)
return()
endif()
set(__find_python_inc YES)
function(find_python_inc _PYTHON_INC)
# find the executable
if (NOT PYTHON_EXECUTABLE)
find_package(PythonInterp REQUIRED)
endif()
# find the include directory
execute_process(COMMAND "${PYTHON_EXECUTABLE}" "-c"
"from __future__ import print_function; import distutils.sysconfig; print(distutils.sysconfig.get_python_inc(), end='')"
RESULT_VARIABLE PYTHON_INCLUDE_SUCCESS
OUTPUT_VARIABLE PYTHON_INCLUDE_DIRS)
# check for success
if (NOT PYTHON_INCLUDE_SUCCESS EQUAL 0 OR
NOT PYTHON_INCLUDE_DIRS)
message(FATAL_ERROR "${_PYTHON_INC} needs to be set manually")
endif()
# set the output variables
set(${_PYTHON_INC} "${PYTHON_INCLUDE_DIRS}" PARENT_SCOPE)
endfunction()
function(find_numpy_inc _NUMPY_INC)
if (NOT PYTHON_EXECUTABLE)
find_package(PythonInterp REQUIRED)
endif()
execute_process(COMMAND "${PYTHON_EXECUTABLE}" "-c"
"from __future__ import print_function; import numpy; print(numpy.get_include(), end='')"
RESULT_VARIABLE NUMPY_INCLUDE_SUCCESS
OUTPUT_VARIABLE NUMPY_INCLUDE_DIRS)
if (NOT NUMPY_INCLUDE_SUCCESS EQUAL 0 OR
NOT NUMPY_INCLUDE_DIRS)
message(FATAL_ERROR "NUMPY_INCLUDE_DIRS needs to be set manually")
endif()
set(${_NUMPY_INC} "${NUMPY_INCLUDE_DIRS}" PARENT_SCOPE)
endfunction()
if(__find_git_sha1)
return()
endif()
set(__find_git_sha1 YES)
function(find_git_sha1 _GIT_SHA1)
find_package(Git QUIET REQUIRED)
execute_process(COMMAND
"${GIT_EXECUTABLE}" "describe" "--always" "--dirty=-dirty"
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}"
RESULT_VARIABLE res
OUTPUT_VARIABLE out
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE)
if (NOT res EQUAL 0)
message(FATAL_ERROR "Could not determine git sha1 via `git describe --always --dirty=-dirty`")
endif()
set(${_GIT_SHA1} "${out}" PARENT_SCOPE)
endfunction()
string(REGEX REPLACE "[/-][dD][^/-]*NDEBUG" "" CMAKE_C_FLAGS_RELEASEWITHASSERT_INIT "${CMAKE_C_FLAGS_RELEASE_INIT}")
string(REGEX REPLACE "[/-][dD][^/-]*NDEBUG" "" CMAKE_CXX_FLAGS_RELEASEWITHASSERT_INIT "${CMAKE_CXX_FLAGS_RELEASE_INIT}")
set(CMAKE_C_FLAGS_RELEASEWITHASSERT "${CMAKE_C_FLAGS_RELEASEWITHASSERT_INIT}" CACHE STRING "C flags for release with assert builds.")
set(CMAKE_CXX_FLAGS_RELEASEWITHASSERT "${CMAKE_CXX_FLAGS_RELEASEWITHASSERT_INIT}" CACHE STRING "C++ flags for release with assert builds.")
set(CMAKE_EXE_LINKER_FLAGS_RELEASEWITHASSERT "" CACHE STRING "Linker flags for release with assert builds.")
mark_as_advanced(CMAKE_CXX_FLAGS_RELEASEWITHASSERT CMAKE_C_FLAGS_RELEASEWITHASSERT)
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " applehelp to make an Apple Help Book"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " epub3 to make an epub3"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
@echo " coverage to run coverage check of the documentation (if enabled)"
@echo " dummy to check syntax errors of document sources"
.PHONY: clean
clean:
rm -rf $(BUILDDIR)/*
.PHONY: html
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
.PHONY: dirhtml
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
.PHONY: singlehtml
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
.PHONY: pickle
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
.PHONY: json
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
.PHONY: htmlhelp
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
.PHONY: qthelp
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/ConsensusCore2DesignandImplementation.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/ConsensusCore2DesignandImplementation.qhc"
.PHONY: applehelp
applehelp:
$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
@echo
@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
@echo "N.B. You won't be able to view it unless you put it in" \
"~/Library/Documentation/Help or install it in your application" \
"bundle."
.PHONY: devhelp
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/ConsensusCore2DesignandImplementation"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/ConsensusCore2DesignandImplementation"
@echo "# devhelp"
.PHONY: epub
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
.PHONY: epub3
epub3:
$(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3
@echo
@echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3."
.PHONY: latex
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
.PHONY: latexpdf
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
.PHONY: latexpdfja
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
.PHONY: text
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
.PHONY: man
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
.PHONY: texinfo
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
.PHONY: info
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
.PHONY: gettext
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
.PHONY: changes
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
.PHONY: linkcheck
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
.PHONY: doctest
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
.PHONY: coverage
coverage:
$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
@echo "Testing of coverage in the sources finished, look at the " \
"results in $(BUILDDIR)/coverage/python.txt."
.PHONY: xml
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
.PHONY: pseudoxml
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
.PHONY: dummy
dummy:
$(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy
@echo
@echo "Build finished. Dummy builder generates no files."
.. _zscore-math:
"Z-Score": when is a read completely garbage?
---------------------------------------------
Garbage-In-Garbage-Out: It is not always useful to estimate fine-scale
distinctions when looking at examples that have excessively high
noise. Estimation is likely to be improved by simply filtering away
all examples that are irrecoverably "broken", as long as you still have
a sufficient sample size on which to make estimates.
With "single-molecule-weirdness" such as long bursty inserts,
individual reads (or subsections of them) might contain very high
levels of noise. The current record for bursty insert is about 5.7kb
of inserted sequence that has nothing to do with the reference
template being sequenced. This can throw off alignments and consensus
models (CCS2) that do not explicitly model these bursty insert
behaviours.
While it would be best to solve these behaviors upstream at the
chemical/polymerase level, we must have defenses in place to at least
identify when these undesirable behaviors present themselves so we
might at least try to filter them away before they wreak havoc on
estimates.
Z-Score Motivation
------------------
The idea of the Z-Score is to compute the expected mean and variance
of the log likelihood (LL) of sequences output by a given HMM. Thus,
when presented with a sequence that has a certain log probability, we
can reason about how far removed it is from "normal", such that outliers can be
filtered away. This Z-Score metric has shown some power at enabling
us to filter out truly aberrant reads from consensus calling.
The Arrow model is a left-right "profile" HMM; for each
template position, the model will arrive at either the match or delete
state corresponding to that template position, after sojourning through
the preceding branch/stick states some number of times. A useful
reduction of the model collapses the Branch and Stick states into an
"E" ("Extra") state, and the Match/Delete states into an "N" ("Next")
state. The following figure shows what the states look like
corresponding to each template position.
.. figure:: img/ZScore-HMM-figures.jpeg
(a) shows the states and edges corresponding to a single template
position in the Arrow model; (b) shows a simplification where
we have collapsed the states that advance in the template
(Match and Delete become "N", for "Next") and those that remain in
place (Branch and Stick become "E", for "Extra").
For each template position, then, we can calculate the expectation and
variance of the loglikelihood of sojourning through the "Extra" states
and finally arriving at a "Next" state (and performing all the
associated emissions). The expectation of LL over all states is then
the sum of the expectation over each state. We make independence
assumptions about the loglikelihood contributions from each state,
thus enabling the variance to be decomposed as a simple sum as well.
Z-Score parameters computation
------------------------------
*(Note that we use the nonstandard abbreviation* :math:`\log^k x = (\log x)^k` *in the following.)*
.. math::
\newcommand{\E}{\textrm{E}}
\newcommand{\Var}{\textrm{Var}}
\newcommand{\Cov}{\textrm{Cov}}
\newcommand{\ahrule}{\shortintertext{\rule{\textwidth}{0.5pt}}}
\newcommand{\phantomeq}{\phantom{{}={}}}
*Definitions:*
.. math::
\begin{aligned}
L &= \textrm{likelihood} \\
E\textrm{(xtra)} &= B\textrm{(ranch or cognate extra)} + S\textrm{(tick or non-cognate extra)} \\
N\textrm{(ext)} &= M\textrm{(atch or incorporation)} + D\textrm{(eletion)} \\
\mathbb{O} &= \textrm{set of emission outcomes} \\
p_B &= \textrm{probability of a cognate extra} \\
p_S &= \textrm{probability of a non-cognate extra} \\
p_M &= \textrm{probability of an incorporation} \\
p_D &= \textrm{probability of a deletion} \\
p_E &= p_B + p_S \\
p_N &= p_M + p_D \\
p_N + p_E &= p_B + p_S + p_M + p_D = 1
\end{aligned}
-------
*Identities:*
.. math::
\begin{aligned}
\sum_{k=0}^{\infty} p^k &= \frac{1}{1-p}, \textrm{ given } |p| < 1 \\
\sum_{k=0}^{\infty} k p^k &= \frac{p}{(1-p)^2}, \textrm{ given } |p| < 1 \\
\sum_{k=0}^{\infty} k^2 p^k &= \frac{p(1+p)}{(1-p)^3}, \textrm{ given } |p| < 1 \\
\end{aligned}
-------
*Expected loglikelihood contribution from each state type:*
.. math::
\begin{aligned}
\E[\log L] &= \textrm{expected log-likelihood of the model} \\
\E[\log B] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|B) \log \Pr(o|B) \\ % expected log-likelihood of cognate extra emission
\E[\log S] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|S) \log \Pr(o|S) \\ % expected log-likelihood of non-cognate extra emission
\E[\log M] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|M) \log \Pr(o|M) \\ % expected log-likelihood of incorporation emission
\E[\log E] &= \frac{p_B}{p_B + p_S} (\log p_B + \E[\log B]) + \frac{p_S}{p_B + p_S} (\log p_S + \E[\log S]) \\
\E[\log N] &= \frac{p_M}{p_M + p_D} (\log p_M + \E[\log M]) + \frac{p_D}{p_M + p_D} \log p_D
\end{aligned}
-------
*Second moments*
.. math::
\begin{aligned}
\E[\log^2 L] &= \textrm{expected squared log-likelihood of the model} \\
\E[\log^2 B] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|B) \log^2 \Pr(o|B) \\ % expected squared log-likelihood of cognate extra emission
\E[\log^2 S] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|S) \log^2 \Pr(o|S) \\ % expected squared log-likelihood of non-cognate extra emission
\E[\log^2 M] &= \textstyle{\sum_{o \in \mathbb{O}}} \Pr(o|M) \log^2 \Pr(o|M) \\ % expected squared log-likelihood of incorporation emission
\E[\log^2 E] &= \frac{p_B}{p_B + p_S} (\log^2 p_B + 2 \cdot \E[\log B] \log p_B + \E[\log^2 B]) \mathrel{+} \\
&\phantomeq \frac{p_S}{p_B + p_S} (\log^2 p_S + 2 \cdot \E[\log S] \log p_S + \E[\log^2 S]) \\
\E[\log^2 N] &= \frac{p_M}{p_M + p_D} (\log^2 p_M + 2 \cdot \E[\log M] \log p_M + \E[\log^2 M]) + \frac{p_D}{p_M + p_D} \log^2 p_D
\end{aligned}
-------
*Expected loglikelihood:*
.. math::
\begin{align}
\E[\log L] &= p_N \sum_{k=0}^{\infty} p_E^k \E[\log E^k N] \\
&= p_N \sum_{k=0}^{\infty} p_E^k (k \E[\log E] + \E[\log N]) \\
&= p_N \E[\log E] \sum_{k=0}^{\infty} k p_E^k + p_N \E[\log N] \sum_{k=0}^{\infty} p_E^k \\
&= p_N \E[\log E] \frac{p_E}{p_N^2} + p_N \E[\log N] \frac{1}{p_N} \\
&= \frac{p_E}{p_N} \E[\log E] + \E[\log N]
\end{align}
---------
*Variance of loglikelihood*
.. math::
\begin{align}
\Var(\log L) &= \E[\log^2 L] - (\E[\log L])^2 \\
&= p_N \sum_{k=0}^{\infty} p_E^k \E[(\log E^k N)^2] - (\E[\log L])^2 \\
&= p_N \sum_{k=0}^{\infty} p_E^k (k \E[\log E] + \E[\log N])^2 - (\E[\log L])^2 \\
&= p_N \sum_{k=0}^{\infty} p_E^k k^2 \E[\log^2 E] \mathrel{+} \\
&\phantomeq 2 p_N \sum_{k=0}^{\infty} p_E^k k \E[\log E] \E[\log N] \mathrel{+} \\
&\phantomeq p_N \sum_{k=0}^{\infty} p_E^k \E[\log^2 N] \mathrel{-} \\
&\phantomeq (\E[\log L])^2 \\
&= p_N \E[\log^2 E] \frac{p_E (1+p_E)}{p_N^3} \mathrel{+} \\
&\phantomeq 2 p_N \E[\log E] \E[\log N] \frac{p_E}{p_N^2} \mathrel{+} \\
&\phantomeq p_N \E[\log^2 N] \frac{1}{p_N} \mathrel{-} \\
&\phantomeq (\E[\log L])^2 \\
&= \frac{p_E (1 + p_E)}{p_N^2} \E[\log^2 E] + 2 \frac{p_E}{p_N} \E[\log E] \E[\log N] + \E[\log^2 N] - (\E[\log L])^2
\end{align}
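The following is a minimal numerical sketch of these formulas in Python. The
transition and emission probabilities are hypothetical placeholders, not actual
Arrow model parameters; the intent is only to show how the per-position moments
combine into a Z-Score under the independence assumptions above.

.. code-block:: python

    import math

    def position_moments(p_B, p_S, p_M, p_D, emit_B, emit_S, emit_M):
        """Mean and variance of the LL contribution of one template position.
        emit_* map emission outcomes to probabilities (hypothetical values)."""
        p_E, p_N = p_B + p_S, p_M + p_D
        e_log  = lambda emit: sum(p * math.log(p) for p in emit.values())
        e_log2 = lambda emit: sum(p * math.log(p) ** 2 for p in emit.values())
        E_logE = (p_B / p_E) * (math.log(p_B) + e_log(emit_B)) \
               + (p_S / p_E) * (math.log(p_S) + e_log(emit_S))
        E_logN = (p_M / p_N) * (math.log(p_M) + e_log(emit_M)) \
               + (p_D / p_N) * math.log(p_D)
        E_log2E = (p_B / p_E) * (math.log(p_B) ** 2 + 2 * e_log(emit_B) * math.log(p_B) + e_log2(emit_B)) \
                + (p_S / p_E) * (math.log(p_S) ** 2 + 2 * e_log(emit_S) * math.log(p_S) + e_log2(emit_S))
        E_log2N = (p_M / p_N) * (math.log(p_M) ** 2 + 2 * e_log(emit_M) * math.log(p_M) + e_log2(emit_M)) \
                + (p_D / p_N) * math.log(p_D) ** 2
        mean = (p_E / p_N) * E_logE + E_logN
        var = (p_E * (1 + p_E) / p_N ** 2) * E_log2E \
            + 2 * (p_E / p_N) * E_logE * E_logN + E_log2N - mean ** 2
        return mean, var

    def zscore(observed_log_likelihood, moments_per_position):
        # Independence assumptions let per-position means and variances add up.
        mean = sum(m for m, _ in moments_per_position)
        var = sum(v for _, v in moments_per_position)
        return (observed_log_likelihood - mean) / math.sqrt(var)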
Z-Score Shortcomings
--------------------
Unfortunately, a small region of deviance may not be noticed in a long
read, because bursty errors occur in localized regions. The overall number
of errors, if they were randomly distributed across the read, might be
within what would be expected normally; the fact that they are all
localized is what makes the read abnormal.
An HMM can identify these localized bursts. The Viterbi path assigns
each match/delete state to a position in the read
:math:`(ref_i \rightarrow read_j \mbox{ with } prob_i)`. Because the HMM defines a
regular language, we know that if :math:`ref_i` derives the string starting
at :math:`read_j` with :math:`prob_i` and :math:`ref_{i+1}` derives
with :math:`prob_{i+1}`, then :math:`ref_i` derives its portion with
probability :math:`prob_i / prob_{i+1}` (or the difference in log
probability if using log probabilities). This is the part of the HMM
that accounts for a single reference base. We can use the same Z-Score
ideas to determine outliers. If the substate HMM derives 4 or fewer
bases 99.999% of the time, and in the Viterbi path a derivation of
200 bases is observed, then we can conclude this is an outlier bursty
insert between this and the next reference base. (Similar ideas apply
to forward / backward / posterior probabilities.)
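A rough sketch of the burst-flagging idea described above; the alignment
representation and the threshold are illustrative assumptions, not the actual
Unanimity implementation.

.. code-block:: python

    def flag_bursty_inserts(read_pos_per_ref, max_extra_bases=4):
        """read_pos_per_ref[i] is the read position the Viterbi path assigns to
        template position i (hypothetical representation). Flag template gaps
        where far more read bases are consumed than the substate HMM would
        plausibly derive (more than max_extra_bases extra bases)."""
        bursts = []
        for i in range(len(read_pos_per_ref) - 1):
            consumed = read_pos_per_ref[i + 1] - read_pos_per_ref[i]
            extras = consumed - 1  # one base is the advance itself
            if extras > max_extra_bases:
                bursts.append((i, extras))
        return bursts

    # e.g. flag_bursty_inserts([0, 1, 2, 203, 204]) -> [(2, 200)]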
# -*- coding: utf-8 -*-
#
# ConsensusCore2: Design and Implementation documentation build configuration file, created by
# sphinx-quickstart on Wed Jul 27 09:09:44 2016.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.todo',
'sphinx.ext.mathjax',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The encoding of source files.
#
# source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'ConsensusCore2: Design and Implementation'
copyright = u'2016, David Alexander, Nigel Delaney, Lance Hepler, Armin Töpfer'
author = u'David Alexander, Nigel Delaney, Lance Hepler, Armin Töpfer'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'0.1'
# The full version, including alpha/beta/rc tags.
release = u'0.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#
# today = ''
#
# Else, today_fmt is used as the format for a strftime call.
#
# today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# The reST default role (used for this markup: `text`) to use for all
# documents.
#
# default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#
# add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#
# show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
# The name for this set of Sphinx documents.
# "<project> v<release> documentation" by default.
#
# html_title = u'ConsensusCore2: Design and Implementation v0.1'
# A shorter title for the navigation bar. Default is the same as html_title.
#
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#
# html_logo = None
# The name of an image file (relative to this directory) to use as a favicon of
# the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
#
# html_extra_path = []
# If not None, a 'Last updated on:' timestamp is inserted at every page
# bottom, using the given strftime format.
# The empty string is equivalent to '%b %d, %Y'.
#
# html_last_updated_fmt = None
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#
# html_additional_pages = {}
# If false, no module index is generated.
#
# html_domain_indices = True
# If false, no index is generated.
#
# html_use_index = True
# If true, the index is split into individual pages for each letter.
#
# html_split_index = False
# If true, links to the reST sources are added to the pages.
#
# html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Language to be used for generating the HTML full-text search index.
# Sphinx supports the following languages:
# 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
# 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr', 'zh'
#
# html_search_language = 'en'
# A dictionary with options for the search language support, empty by default.
# 'ja' uses this config value.
# 'zh' user can custom change `jieba` dictionary path.
#
# html_search_options = {'type': 'default'}
# The name of a javascript file (relative to the configuration directory) that
# implements a search results scorer. If empty, the default will be used.
#
# html_search_scorer = 'scorer.js'
# Output file base name for HTML help builder.
htmlhelp_basename = 'ConsensusCore2DesignandImplementationdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'ConsensusCore2DesignandImplementation.tex', u'ConsensusCore2: Design and Implementation Documentation',
u'David Alexander, Nigel Delaney, Lance Hepler, Armin Töpfer', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#
# latex_use_parts = False
# If true, show page references after internal links.
#
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
#
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
#
# latex_appendices = []
# If false, no module index is generated.
#
# latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'consensuscore2designandimplementation', u'ConsensusCore2: Design and Implementation Documentation',
[author], 1)
]
# If true, show URL addresses after external links.
#
# man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'ConsensusCore2DesignandImplementation', u'ConsensusCore2: Design and Implementation Documentation',
author, 'ConsensusCore2DesignandImplementation', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#
# texinfo_appendices = []
# If false, no module index is generated.
#
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#
# texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
#
# texinfo_no_detailmenu = False