Commits on Source (7)
## v1.5.0 - 2018-11-12
### Added
- Add a configurable option to allow overlapping pairs to be used as evidence (MANTA-1398)
- The option is available in the configuration file configManta.py.ini
### Changed
- Change SV candidate contig aligners to improve precision (MANTA-1396)
- Change contig aligners such that variant occurrences are more heavily penalized.
- Fix multi-junction nomination (MANTA-1430)
- Complex events with more than two junctions are no longer nominated as a group
- Fix the problem of duplicate detection of the same SV candidate
- Add index to ensure uniqueness of evidence bam filenames (MANTA-1431)
- This avoids name conflicts between evidence bams when input bam files have the same name but are located in different directories.
- Change filters for easy interpretation of multi-sample germline variant vcf (MANTA-1343)
- Add record-level filter 'SampleFT' when no sample passes all sample level filters
- Add sample-level filter 'HomRef' for homozygous reference calls
- Sample-level filters are no longer applied at the record level, even if they apply to all samples
- Change representation of inversions in the VCF output (MANTA-1385)
- Intrachromosomal translocations with inverted breakpoints are now reported as two breakend (BND) records.
- Previously they were reported in the VCF using the inversion (INV) allele type.
### Fixed
- Fix a bug in stats generation with short reference sequences (MANTA-1459/[#143])
- Fix the evidence significance test in the multi-sample calling mode (MANTA-1294)
- This issue previously caused spurious false negatives during the multi-sample calling mode. The incidence rate of the problem tended to increase with sample count.
## v1.4.0 - 2018-04-25
This is a major bugfix update from v1.3.2, featuring improved precision and VCF representation, in addition to minor user-friendly improvements.
......
......@@ -31,8 +31,8 @@ indels for germline and cancer sequencing applications. *Bioinformatics*,
...and the corresponding [open-access pre-print][preprint].
[bpaper]:https://dx.doi.org/10.1093/bioinformatics/btv710
[preprint]:http://dx.doi.org/10.1101/024232
[bpaper]:https://doi.org/10.1093/bioinformatics/btv710
[preprint]:https://doi.org/10.1101/024232
License
......
manta (1.4.0+dfsg-1) UNRELEASED; urgency=medium
manta (1.5.0+dfsg-1) UNRELEASED; urgency=medium
* Initial release (Closes: #861664)
TODO: Check why runMantaWorkflowDemo is failing
--> https://github.com/Illumina/manta/issues/77
* debian/upstream/metadata: added ref to OMICtools
* New upstream version
* debhelper 11
* Point Vcs fields to salsa.debian.org
* Standards-Version: 4.1.4
-- Andreas Tille <tille@debian.org> Thu, 03 May 2018 15:27:38 +0200
-- Andreas Tille <tille@debian.org> Wed, 24 Apr 2019 16:59:01 +0200
......@@ -3,7 +3,7 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.
Uploaders: Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 11~),
Build-Depends: debhelper (>= 12~),
cmake,
dh-python,
libboost-date-time-dev,
......@@ -17,10 +17,10 @@ Build-Depends: debhelper (>= 11~),
libboost-test-dev,
zlib1g-dev,
python-all-dev,
python-pyflow (>= 1.1.20),
python-pyflow,
libhts-dev (>= 1.7),
samtools
Standards-Version: 4.1.4
Standards-Version: 4.3.0
Vcs-Browser: https://salsa.debian.org/med-team/manta
Vcs-Git: https://salsa.debian.org/med-team/manta.git
Homepage: https://github.com/Illumina/manta
......
......@@ -6,13 +6,13 @@ Files-Excluded: redist/*.bz2
Files: *
Copyright: 2013-2016 Illumina, Inc.
License: GPL-v3+
License: GPL-3+
Files: debian/*
Copyright: 2016 Andreas Tille <tille@debian.org>
License: GPL-v3+
Copyright: 2016-2019 Andreas Tille <tille@debian.org>
License: GPL-3+
License: GPL-v3+
License: GPL-3+
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
......
......@@ -3,8 +3,8 @@
# DH_VERBOSE := 1
export LC_ALL=C.UTF-8
DEBPKGNAME := $(shell dpkg-parsechangelog | awk '/^Source:/ {print $$2}')
demopkg := $(DEBPKGNAME)-demo
include /usr/share/dpkg/default.mk
demopkg := $(DEB_SOURCE)-demo
export DEB_BUILD_MAINT_OPTIONS=hardening=+all
......@@ -14,7 +14,7 @@ CMAKE_EXTRA_FLAGS += -DCMAKE_BUILD_TYPE=Release
dh $@ --buildsystem=cmake --with python2
override_dh_auto_configure:
DESTDIR=$(CURDIR)/debian/$(DEBPKGNAME) dh_auto_configure -- $(CMAKE_EXTRA_FLAGS)
DESTDIR=$(CURDIR)/debian/$(DEB_SOURCE) dh_auto_configure -- $(CMAKE_EXTRA_FLAGS)
override_dh_auto_install:
dh_auto_install
......@@ -24,12 +24,12 @@ override_dh_auto_install:
override_dh_install:
dh_install
for py in $(CURDIR)/debian/$(DEBPKGNAME)/usr/lib/$(DEBPKGNAME)/*.py ; do \
mv $$py $(CURDIR)/debian/$(DEBPKGNAME)/usr/share/$(DEBPKGNAME)/ ; \
ln -s ../../share/$(DEBPKGNAME)/`basename $$py` $(CURDIR)/debian/$(DEBPKGNAME)/usr/lib/$(DEBPKGNAME) ; \
for py in $(CURDIR)/debian/$(DEB_SOURCE)/usr/lib/$(DEB_SOURCE)/*.py ; do \
mv $$py $(CURDIR)/debian/$(DEB_SOURCE)/usr/share/$(DEB_SOURCE)/ ; \
ln -s ../../share/$(DEB_SOURCE)/`basename $$py` $(CURDIR)/debian/$(DEB_SOURCE)/usr/lib/$(DEB_SOURCE) ; \
done
mkdir -p $(CURDIR)/debian/$(DEBPKGNAME)/etc/$(DEBPKGNAME)
mv $(CURDIR)/debian/$(DEBPKGNAME)/usr/bin/configManta.py.ini $(CURDIR)/debian/$(DEBPKGNAME)/etc/$(DEBPKGNAME)/configManta.ini
for py in $(CURDIR)/debian/$(DEBPKGNAME)/usr/bin/*.py ; do \
mv $$py $(CURDIR)/debian/$(DEBPKGNAME)/usr/share/manta/ ; \
mkdir -p $(CURDIR)/debian/$(DEB_SOURCE)/etc/$(DEB_SOURCE)
mv $(CURDIR)/debian/$(DEB_SOURCE)/usr/bin/configManta.py.ini $(CURDIR)/debian/$(DEB_SOURCE)/etc/$(DEB_SOURCE)/configManta.ini
for py in $(CURDIR)/debian/$(DEB_SOURCE)/usr/bin/*.py ; do \
mv $$py $(CURDIR)/debian/$(DEB_SOURCE)/usr/share/manta/ ; \
done
......@@ -240,7 +240,7 @@ prior to merging the branch.
longer, for instance by starting all major bullet points with an imperative verb.
## Branching and release tagging guidelines
### Branching and release tagging guidelines
All features and bugfixes are developed on separate branches. Branch names should contain the corresponding JIRA ticket
id or contain the key "github${issueNumber}' to refer to the corresponding issue on github.com. After code
......
......@@ -350,7 +350,7 @@ For a given word size $k$, a word list is made of all $k$-mers present in the in
Finally, a greedy procedure is applied to select the constructed contigs in the order of the number of effective supporting reads and contig length. An effective supporting read cannot be a pseudo read, nor can it support any previously selected contig. The selection process is repeated until no remaining contig has the minimum number of effective supporting reads (default 2), or the maximum number of assembled contigs (default 10) is reached.
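The greedy contig selection described above can be sketched as follows. This is a minimal illustration, not Manta's implementation: the `ContigCandidate` type and `selectContigs` function are hypothetical, reads are identified by name, and ties in effective support are assumed to go to the longer contig.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical contig record: supporting read names plus the subset that are pseudo reads.
struct ContigCandidate
{
    std::string seq;
    std::set<std::string> supportReads;  // all reads supporting this contig
    std::set<std::string> pseudoReads;   // pseudo reads never count as effective support
};

// Count supporting reads that are neither pseudo reads nor used by an already selected contig.
static std::size_t
effectiveSupport(const ContigCandidate& c, const std::set<std::string>& usedReads)
{
    std::size_t count = 0;
    for (const auto& r : c.supportReads)
    {
        if (c.pseudoReads.count(r) || usedReads.count(r)) continue;
        ++count;
    }
    return count;
}

// Repeatedly pick the contig with the most effective support (longer contig wins ties),
// recomputing support after each pick, until none reaches minSupport or maxContigs are chosen.
std::vector<std::size_t>
selectContigs(const std::vector<ContigCandidate>& contigs,
              const std::size_t minSupport = 2, const std::size_t maxContigs = 10)
{
    std::vector<std::size_t> selected;
    std::vector<bool> taken(contigs.size(), false);
    std::set<std::string> usedReads;
    while (selected.size() < maxContigs)
    {
        bool found = false;
        std::size_t best = 0, bestSupport = 0;
        for (std::size_t i = 0; i < contigs.size(); ++i)
        {
            if (taken[i]) continue;
            const std::size_t s = effectiveSupport(contigs[i], usedReads);
            if (s < minSupport) continue;
            if (!found || s > bestSupport ||
                (s == bestSupport && contigs[i].seq.size() > contigs[best].seq.size()))
            {
                found = true;
                best = i;
                bestSupport = s;
            }
        }
        if (!found) break;
        taken[best] = true;
        selected.push_back(best);
        usedReads.insert(contigs[best].supportReads.begin(), contigs[best].supportReads.end());
    }
    return selected;
}
```

Because support is recomputed after every pick, a contig whose reads largely overlap an already selected contig drops below the threshold even if its raw read count was high.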
\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}
\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}.
\begin{figure}[!tpb]
\centerline{
......@@ -362,15 +362,15 @@ Finally, a greedy procedure is applied to select the constructed contigs in the
\label{fig:jumpstate}
\end{figure}
The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -24 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -100 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria, the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
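As a rough sketch of the flank criteria and jump scoring above, the snippet below scores gap-free flanking segments with the quoted match and mismatch values and checks the 30-base and 75% thresholds. `FlankSegment`, `isFlankAcceptable` and `jumpPathScore` are hypothetical names, and gaps within the flanks are ignored for simplicity.

```cpp
#include <cassert>

// Scores quoted in the text for each reference segment, and the jump-state open score.
constexpr int kMatch = 2;
constexpr int kMismatch = -8;
constexpr int kJumpOpen = -100;  // current value in the text; an earlier revision used -24

// Hypothetical gap-free summary of one flanking alignment segment.
struct FlankSegment
{
    int matches;
    int mismatches;
    int length() const { return matches + mismatches; }
    int score() const { return matches * kMatch + mismatches * kMismatch; }
};

// Flank criterion: at least minLength bases, and a score of at least minFrac
// of the perfect-match score (kMatch per base) for that segment.
bool isFlankAcceptable(const FlankSegment& f, const int minLength = 30, const double minFrac = 0.75)
{
    if (f.length() < minLength) return false;
    const int perfect = f.length() * kMatch;
    return f.score() >= minFrac * perfect;
}

// Score of a path segment 1 -> jump -> segment 2. The jump extend score is 0,
// so the jump contributes only its open score regardless of breakend distance.
int jumpPathScore(const FlankSegment& left, const FlankSegment& right)
{
    return left.score() + kJumpOpen + right.score();
}
```

For example, a 37-base flank with two mismatches scores 54, below 75% of its perfect score of 74, so it is rejected, while the same length with one mismatch (score 64) passes.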
\subsubsection{Contig alignment for complex region candidates}
Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the alignment procedure for complex region contigs, which are checked against two aligners optimized for large and small indels respectively.
Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the indel aligner that handles both small and large indels.
A contig is first aligned with the large indel aligner and only checked for small indels if no large indels are found. The structure of the large indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -18, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -24 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states. Variants are only reported from the large indel aligner if an insertion of at least 80 bases or a deletion of at least 200 bases is found. The flanking alignment quality criteria described above for large SVs are also applied to filter out noisy alignments. To reduce false positive calls in repetitive regions, an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region.
The indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -24, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -100 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states.
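One way to see why the extra 'large' gap states matter is to compare the two gap models directly. The sketch below assumes that a gap of length L scores `open + L * extend` (how the first gap base is charged is an assumption here) and finds the shortest gap for which the large-gap state scores strictly better than the standard one. `GapScores` and `largeGapCrossover` are hypothetical helpers; match and mismatch scores are ignored.

```cpp
#include <cassert>

// Hypothetical affine gap model: a gap of length L scores open + L * extend.
struct GapScores
{
    int open;
    int extend;
    int score(const int length) const { return open + length * extend; }
};

// Smallest gap length for which the 'large' gap state scores strictly better than
// the standard gap state; returns -1 if no such length exists up to maxLength.
int largeGapCrossover(const GapScores& standard, const GapScores& large, const int maxLength = 100000)
{
    for (int len = 1; len <= maxLength; ++len)
    {
        if (large.score(len) > standard.score(len)) return len;
    }
    return -1;
}
```

Under this assumption, the current values (standard -24/-1 against large -100/0) make the large-gap state preferable from 77 bases upward, whereas the older values (-18/-1 against -24/0) switched over at only 7 bases.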
If the large indel aligner fails to identify a candidate meeting the size and quality criteria above, the contig is used to search for smaller indels, this time using a conventional affine gap aligner with parameters: (2,-8,-12,0) for match, mismatch, gap open, gap extend. All indels larger than the minimum indel size are identified. For each indel, the flanking contig alignment quality and uniqueness checks described above are applied to filter likely false positives, and any remaining cases become small indel candidates.
All indels larger than the minimum indel size are identified by the indel aligner. For each indel, the flanking alignment quality criteria described above for large SVs are also applied to filter out noisy alignments. To further reduce false positive calls in repetitive regions, an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region. Among contigs meeting all quality criteria, the ones with 'large' gaps are prioritized during contig selection. If more than one contig has 'large' gaps, or if no contig has a 'large' gap, the contig with the highest alignment score is selected.
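The contig selection rule at the end of the paragraph above can be expressed as a two-tier comparison. This sketch assumes each contig has already been reduced to an alignment score, a flag recording whether a 'large' gap state was used, and a flag for the flank quality and uniqueness filters; `ContigAlignmentSummary` and `selectContig` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-contig result after running the indel aligner and all quality filters.
struct ContigAlignmentSummary
{
    int score;
    bool hasLargeGap;    // alignment path used a 'large' insertion or deletion state
    bool passesFilters;  // flank quality and local uniqueness checks
};

// Returns the index of the selected contig, or -1 if no contig passes the filters.
// Contigs with a large gap win over those without; within a tier the highest score wins.
int selectContig(const std::vector<ContigAlignmentSummary>& contigs)
{
    int best = -1;
    for (std::size_t i = 0; i < contigs.size(); ++i)
    {
        const ContigAlignmentSummary& c = contigs[i];
        if (!c.passesFilters) continue;
        if (best < 0)
        {
            best = static_cast<int>(i);
            continue;
        }
        const ContigAlignmentSummary& b = contigs[best];
        if (c.hasLargeGap != b.hasLargeGap)
        {
            if (c.hasLargeGap) best = static_cast<int>(i);
        }
        else if (c.score > b.score)
        {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Note that a contig with a large gap is chosen even if a filtered-in contig without one has a higher score, which matches the prioritization stated in the text.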
\subsubsection{Large Insertions}
......
......@@ -14,7 +14,7 @@ Manta User Guide
* [Input requirements](#input-requirements)
* [Outputs](#outputs)
* [Structural Variant predictions](#structural-variant-predictions)
* [Manta VCF reporting format](#manta-vcf-reporting-format)
* [Manta VCF interpretation](#manta-vcf-interpretation)
* [VCF Sample Names](#vcf-sample-names)
* [Small indels](#small-indels)
* [Insertions with incomplete insert sequence assembly](#insertions-with-incomplete-insert-sequence-assembly)
......@@ -22,8 +22,9 @@ Manta User Guide
* [VCF INFO Fields](#vcf-info-fields)
* [VCF FORMAT Fields](#vcf-format-fields)
* [VCF FILTER Fields](#vcf-filter-fields)
* [How to interpret VCF filters?](#how-to-interpret-vcf-filters)
* [What do the values in Manta's VCF ID field mean?](#what-do-the-values-in-mantas-vcf-id-field-mean)
* [Interpretation of VCF filters](#interpretation-of-vcf-filters)
* [Interpretation of Manta's INFO/EVENT field](#interpretation-of-mantas-infoevent-field)
* [Details of Manta's VCF ID field](#details-of-mantas-vcf-id-field)
* [Converting Manta VCF to BEDPE format](#converting-manta-vcf-to-bedpe-format)
* [Statistics](#statistics)
* [Runtime hardware requirements](#runtime-hardware-requirements)
......@@ -289,7 +290,7 @@ For tumor-only analysis, Manta will produce an additional VCF:
counts for each allele (2) a subset of the filters from the scored tumor-normal model
are applied to the single tumor case to improve precision.
### Manta VCF reporting format
### Manta VCF interpretation
Manta VCF output follows the VCF 4.1 spec for describing structural
variants. It uses standard field names wherever possible. All custom
......@@ -336,13 +337,17 @@ chr1 11830208 MantaINS:1577:0:0:0:3:0 T <INS> 999 PASS
#### Inversions
Inversions are reported as a single inverted sequence junction. As described in the [VCF INFO Fields](#vcf-info-fields) below, the INV3 tag indicates inversion breakends open 3' of the reported location, whereas the INV5 tag indicates inversion breakends open 5' of the reported location. More specifically, in the inversion examples illustrated at https://software.broadinstitute.org/software/igv/interpreting_pair_orientations, the INV5 tag corresponds to the IGV "RR"/dark blue reads, and the INV3 tag corresponds to the IGV "LL"/light blue reads.
This format is used because single inverted junctions are often identified as part of a complex SV in real data, whereas simple reciprocal inversions are uncommon outside of simulated data. For a simple reciprocal inversion, both INV3 and INV5 junctions are expected to be reported, and they shall share the same `EVENT` INFO tag. The following is an example of a simple reciprocal inversion:
Inversions are reported as breakends by default. For a simple reciprocal inversion, four breakends will be reported, and they shall share the same `EVENT` INFO tag. The following is an example of a simple reciprocal inversion:
```
chr1 17124941 MantaBND:1445:0:1:1:3:0:0 T [chr1:234919886[T 999 PASS SVTYPE=BND;MATEID=MantaBND:1445:0:1:1:3:0:1;CIPOS=0,1;HOMLEN=1;HOMSEQ=T;INV5;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=254;BND_DEPTH=107;MATE_BND_DEPTH=100 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:65,8:15,51
chr1 17124948 MantaBND:1445:0:1:0:0:0:0 T T]chr1:234919824] 999 PASS SVTYPE=BND;MATEID=MantaBND:1445:0:1:0:0:0:1;INV3;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=109;MATE_BND_DEPTH=83 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:60,2:0,46
chr1 234919824 MantaBND:1445:0:1:0:0:0:1 G G]chr1:17124948] 999 PASS SVTYPE=BND;MATEID=MantaBND:1445:0:1:0:0:0:0;INV3;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=83;MATE_BND_DEPTH=109 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:60,2:0,46
chr1 234919885 MantaBND:1445:0:1:1:3:0:1 A [chr1:17124942[A 999 PASS SVTYPE=BND;MATEID=MantaBND:1445:0:1:1:3:0:0;CIPOS=0,1;HOMLEN=1;HOMSEQ=A;INV5;EVENT=MantaBND:1445:0:1:0:0:0:0;JUNCTION_QUAL=254;BND_DEPTH=100;MATE_BND_DEPTH=107 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:65,8:15,51
```
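To work with breakend records like these programmatically, the ALT field can be classified into the four breakend forms of the VCF 4.1 specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The INV3/INV5 mapping in `inversionTag` is inferred from the flags on the four example records (`t]p]` records carry INV3, `[p[t` records carry INV5); it is an assumption, not a description of `convertInversion.py`. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The four VCF 4.1 breakend ALT forms (t = REF base, p = mate position).
enum class BndForm { TOpenBracket /* t[p[ */, TCloseBracket /* t]p] */,
                     CloseBracketT /* ]p]t */, OpenBracketT /* [p[t */, Invalid };

struct BndAlt
{
    BndForm form = BndForm::Invalid;
    std::string mateChrom;
    long matePos = 0;
};

// Parse a breakend ALT such as "[chr1:234919886[T" or "T]chr1:234919824]".
BndAlt parseBndAlt(const std::string& alt)
{
    BndAlt out;
    const std::size_t open = alt.find_first_of("[]");
    if (open == std::string::npos) return out;  // symbolic ALT such as <INV>
    const char bracket = alt[open];
    const std::size_t close = alt.find(bracket, open + 1);
    if (close == std::string::npos) return out;
    const std::string matePart = alt.substr(open + 1, close - open - 1);  // "chrom:pos"
    const std::size_t colon = matePart.rfind(':');  // rfind: contig names may contain ':'
    if (colon == std::string::npos) return out;
    out.mateChrom = matePart.substr(0, colon);
    out.matePos = std::stol(matePart.substr(colon + 1));
    const bool bracketFirst = (open == 0);
    if (bracket == '[') out.form = bracketFirst ? BndForm::OpenBracketT : BndForm::TOpenBracket;
    else out.form = bracketFirst ? BndForm::CloseBracketT : BndForm::TCloseBracket;
    return out;
}

// Inversion label for an intrachromosomal breakend, following the example records above.
std::string inversionTag(const std::string& chrom, const BndAlt& bnd)
{
    if (bnd.mateChrom != chrom) return "";
    if (bnd.form == BndForm::TCloseBracket) return "INV3";
    if (bnd.form == BndForm::OpenBracketT) return "INV5";
    return "";
}
```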
A supplementary script, provided as `$MANTA_INSTALL_FOLDER/libexec/convertInversion.py`, can be applied to Manta's output VCF files to reformat inversions into single inverted sequence junctions, which was the format used in Manta versions <= 1.4.0. Two INFO tags are introduced for this format: the INV3 tag indicates inversion breakends open 3' of the reported location, whereas the INV5 tag indicates inversion breakends open 5' of the reported location. More specifically, in the inversion examples illustrated at https://software.broadinstitute.org/software/igv/interpreting_pair_orientations, the INV5 tag corresponds to the IGV "RR"/dark blue reads, and the INV3 tag corresponds to the IGV "LL"/light blue reads. This format was informative because single inverted junctions are often identified as part of a complex SV in real data, whereas simple reciprocal inversions are uncommon outside of simulated data. For a simple reciprocal inversion, both INV3 and INV5 junctions are expected to be reported, and they share the same `EVENT` INFO tag. The following is the converted format of the above example of a simple reciprocal inversion:
```
chr1 17124940 MantaINV:3630:0:1:1:4:0 C <INV> 999 PASS END=234919885;SVTYPE=INV;SVLEN=217794945;INV5;EVENT=MantaINV:3630:0:1:0:0:0;JUNCTION_QUAL=999; GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:61,4:24,43
chr1 17124943 MantaINV:3630:0:1:0:0:0 T <INV> 999 PASS END=234919824;SVTYPE=INV;SVLEN=217794881;INV3;EVENT=MantaINV:3630:0:1:0:0:0;JUNCTION_QUAL=999; GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:52,3:8,29
chr1 17124940 MantaINV:1445:0:1:1:3:0 C <INV> 999 PASS END=234919885;SVTYPE=INV;SVLEN=217794945;CIPOS=0,1;CIEND=-1,0;HOMLEN=1;HOMSEQ=T;EVENT=MantaINV:1445:0:1:0:0:0;JUNCTION_QUAL=254;INV5 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:65,8:15,51
chr1 17124948 MantaINV:1445:0:1:0:0:0 T <INV> 999 PASS END=234919824;SVTYPE=INV;SVLEN=217794876;EVENT=MantaINV:1445:0:1:0:0:0;JUNCTION_QUAL=999;INV3 GT:FT:GQ:PL:PR:SR 0/1:PASS:999:999,0,999:60,2:0,46
```
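The coordinate relationship between the breakend records and the converted `<INV>` records can be checked with a little arithmetic. Judging from the example (not from the script's source), an INV3 breakend keeps its POS and mate position as POS and END, while an INV5 breakend shifts both one base to the left; SVLEN is END minus POS in both cases. The converted INV5 record's REF base (C) also differs from the breakend's (T), so a real conversion must look up the reference base at the new position, which this sketch does not do. `InvCoords` and `toInvCoords` are hypothetical.

```cpp
#include <cassert>

// Coordinates of a converted <INV> record (hypothetical helper type).
struct InvCoords
{
    long pos;
    long end;
    long svlen;
};

// Convert the lower-coordinate breakend of an intrachromosomal inversion pair.
// bndPos is that record's POS, matePos the position written inside its ALT.
// Assumption inferred from the example: INV5 ("[p[t") breakends shift both
// coordinates left by one base; INV3 ("t]p]") breakends are used unchanged.
InvCoords toInvCoords(const long bndPos, const long matePos, const bool isInv5)
{
    const long shift = isInv5 ? 1 : 0;
    InvCoords c{bndPos - shift, matePos - shift, 0};
    c.svlen = c.end - c.pos;
    return c;
}
```

Applied to the example, the INV3 breakend at 17124948 with mate 234919824 gives SVLEN 217794876, and the INV5 breakend at 17124941 with mate 234919886 gives POS 17124940, END 234919885 and SVLEN 217794945, matching the converted records.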
#### VCF INFO Fields
......@@ -364,8 +369,6 @@ SVINSLEN | Length of insertion
SVINSSEQ | Sequence of insertion
LEFT_SVINSSEQ | Known left side of insertion for an insertion of unknown length
RIGHT_SVINSSEQ | Known right side of insertion for an insertion of unknown length
INV3 | Flag indicating that inversion breakends open 3' of reported location
INV5 | Flag indicating that inversion breakends open 5' of reported location
BND_DEPTH | Read depth at local translocation breakend
MATE_BND_DEPTH | Read depth at remote translocation mate breakend
JUNCTION_QUAL | If the SV junction is part of an EVENT (ie. a multi-adjacency variant), this field provides the QUAL value for the adjacency in question only
......@@ -387,25 +390,41 @@ SR | Number of split-reads which strongly (Q30) support the REF or ALT alleles
#### VCF FILTER Fields
ID | Description
--- | ---
MinQUAL | QUAL score is less than 20
MinGQ | GQ score is less than 15 (filter applied at sample level and record level if all samples are filtered)
MinSomaticScore | SOMATICSCORE is less than 30
Ploidy | For DEL & DUP variants, the genotypes of overlapping variants (with similar size) are inconsistent with diploid expectation
MaxDepth | Depth is greater than 3x the median chromosome depth near one or both variant breakends
MaxMQ0Frac | For a small variant (<1000 bases), the fraction of reads in all samples with MAPQ0 around either breakend exceeds 0.4
NoPairSupport | For variants significantly larger than the paired read fragment size, no paired reads support the alternate allele in any sample
ID | Level | Description
--- | --- | ---
MinQUAL | Record | QUAL score is less than 20
MinGQ | Sample | GQ score is less than 15
MinSomaticScore | Record | SOMATICSCORE is less than 30
Ploidy | Record | For DEL & DUP variants, the genotypes of overlapping variants (with similar size) are inconsistent with diploid expectation
MaxDepth | Record | Depth is greater than 3x the median chromosome depth near one or both variant breakends
MaxMQ0Frac | Record | For a small variant (<1000 bases), the fraction of reads in all samples with MAPQ0 around either breakend exceeds 0.4
NoPairSupport | Record | For variants significantly larger than the paired read fragment size, no paired reads support the alternate allele in any sample
SampleFT | Record | No sample passes all the sample-level filters
HomRef | Sample | Homozygous reference call
#### Interpretation of VCF filters
As described above, there are two levels of filters: record level (FILTER) and sample level (FORMAT/FT). Record-level filters are generally independent of sample-level filters. However, if none of the samples passes all sample-level filters, the 'SampleFT' filter will be applied at the record level.
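The record-level 'SampleFT' rule, together with the guide's definition of a sample-specific passing variant (record FILTER passed, sample FT passed, genotype not "0/0"), can be sketched as two small predicates. This assumes FORMAT/FT is either `PASS` or a list of failed filters and that genotypes are written unphased; `SampleCall`, `needsSampleFT` and `isSamplePassing` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of one sample in a record: its FORMAT/FT and FORMAT/GT values.
struct SampleCall
{
    std::string ft;  // "PASS" or failed filters such as "MinGQ;HomRef"
    std::string gt;  // e.g. "0/1", "0/0"
};

// Record-level 'SampleFT': applied when no sample passes all sample-level filters.
bool needsSampleFT(const std::vector<SampleCall>& samples)
{
    for (const SampleCall& s : samples)
    {
        if (s.ft == "PASS") return false;
    }
    return true;
}

// Sample-specific passing variant: record FILTER is PASS, sample FT is PASS,
// and the genotype is not homozygous reference.
bool isSamplePassing(const std::string& recordFilter, const SampleCall& s)
{
    return recordFilter == "PASS" && s.ft == "PASS" && s.gt != "0/0";
}
```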
#### How to interpret VCF filters?
#### Interpretation of Manta's INFO/EVENT field
As described above, there are two levels of filters: record level (FILTER) and sample level (FORMAT/FT). Record-level filters are generally independent of sample-level filters. However, if none of the samples passes a given sample-level filter, that filter will be copied to the record level (e.g. MinGQ).
Some structural variants reported in the VCF, such as translocations, represent a single novel sequence junction in the
sample. Manta uses the `INFO/EVENT` field to indicate that two or more such junctions are hypothesized to occur
together as part of a single variant event. All individual variant records belonging to the same event will share
the same `INFO/EVENT` string. Note that although such an inference could be applied after SV calling by analyzing
the relative distance and orientation of the called variant breakpoints,
Manta incorporates this event mechanism into the calling process to increase sensitivity towards such larger-scale
events. Given that at least one junction in the event has already passed standard variant candidacy thresholds,
sensitivity is improved by lowering the evidence thresholds for additional junctions which occur in a pattern
consistent with a multi-junction event (such as a reciprocal translocation pair).
A sample-specific passing variant must have a passing record-level FILTER, a passing sample-level FORMAT/FT, and a sample-level FORMAT/GT that is not "0/0" (homozygous reference).
Note that although this mechanism could generalize to events including an arbitrary number of junctions,
it is currently limited to two junctions. Thus, at present it is most useful for identifying and improving sensitivity
towards reciprocal translocation pairs.
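Because all records belonging to one event share the same `INFO/EVENT` string, grouping them needs only that string as a key. A minimal sketch, assuming the INFO column has already been split out of each VCF line; `infoValue` and `groupByEvent` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Value of KEY=VALUE in a semicolon-separated INFO string, or "" if the key is absent.
std::string infoValue(const std::string& info, const std::string& key)
{
    std::size_t start = 0;
    while (start <= info.size())
    {
        std::size_t stop = info.find(';', start);
        if (stop == std::string::npos) stop = info.size();
        const std::string field = info.substr(start, stop - start);
        if (field.compare(0, key.size() + 1, key + "=") == 0) return field.substr(key.size() + 1);
        start = stop + 1;
    }
    return "";
}

// Group record indices by INFO/EVENT; records without an EVENT key are left out.
std::map<std::string, std::vector<std::size_t>>
groupByEvent(const std::vector<std::string>& infoFields)
{
    std::map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < infoFields.size(); ++i)
    {
        const std::string event = infoValue(infoFields[i], "EVENT");
        if (!event.empty()) groups[event].push_back(i);
    }
    return groups;
}
```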
#### What do the values in Manta's VCF ID field mean?
#### Details of Manta's VCF ID field
The VCF ID or 'identifer' field can be used for annotation, or in the case of BND ('breakend') records for translocations, the ID value is used to link breakend mates or partners.
The VCF ID or 'identifier' field can be used for annotation, or in the case of BND ('breakend') records for translocations, the ID value is used to link breakend mates or partners.
An example Manta VCF ID is "MantaINS:1577:0:0:0:3:0". The value provided in this field reflects the SV association graph edge(s) from which the SV or indel was discovered. The ID value provided by Manta is primarily intended for internal use by Manta developers. The value is guaranteed to be unique within any VCF file produced by Manta, and these ID values are used to link associated breakend records using the standard VCF `MATEID` key. The structure of this ID may change in the future; it is safe to use the entire value as a unique key, but parsing this value may lead to incompatibilities with future updates.
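Following the advice above to treat the ID as an opaque key, breakend mates can be linked through `MATEID` with a lookup table and no parsing of the ID itself. `BndRecord` and `linkMates` are hypothetical names, and the records are assumed to come from a single Manta VCF, where IDs are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal hypothetical view of a BND record: its ID column and INFO/MATEID value.
struct BndRecord
{
    std::string id;
    std::string mateId;
};

// For each record, the index of its mate record, or -1 if the mate is not present.
// IDs are compared as whole strings only.
std::vector<long> linkMates(const std::vector<BndRecord>& records)
{
    std::unordered_map<std::string, long> indexById;
    for (std::size_t i = 0; i < records.size(); ++i) indexById[records[i].id] = static_cast<long>(i);
    std::vector<long> mate(records.size(), -1);
    for (std::size_t i = 0; i < records.size(); ++i)
    {
        const auto it = indexById.find(records[i].mateId);
        if (it != indexById.end()) mate[i] = it->second;
    }
    return mate;
}
```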
......@@ -588,7 +607,7 @@ Using the `--generateEvidenceBam` option, Manta can be configured to generate ba
It is recommended to use this option together with the `--region` option, so that the analysis is limited to relatively small genomic regions for debugging purposes.
The evidence bam files are provided in `${MANTA_ANALYSIS_PATH}/results/evidence`, with a naming format `evidence.*.bam`.
The evidence bam files are provided in `${MANTA_ANALYSIS_PATH}/results/evidence`, with a naming format `evidence_*.*.bam`.
There is one such file for each input bam of the analysis, containing evidence reads of the candidate SVs identified from that input bam.
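The per-input index added for evidence bam names (changelog entry MANTA-1431 above) can be illustrated as follows. The exact scheme here, `evidence_<input index>.<bam basename>.bam`, is an assumption chosen to fit the documented `evidence_*.*.bam` pattern, and `evidenceBamName` is hypothetical; it shows why two inputs that are both named `sample.bam` no longer collide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical evidence bam name: evidence_<inputIndex>.<input basename without .bam>.bam
std::string evidenceBamName(const std::size_t inputIndex, const std::string& inputBamPath)
{
    const std::size_t slash = inputBamPath.find_last_of('/');
    std::string base = (slash == std::string::npos) ? inputBamPath : inputBamPath.substr(slash + 1);
    const std::string ext = ".bam";
    if (base.size() >= ext.size() && base.compare(base.size() - ext.size(), ext.size(), ext) == 0)
    {
        base.erase(base.size() - ext.size());
    }
    return "evidence_" + std::to_string(inputIndex) + "." + base + ".bam";
}
```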
Each read in an evidence bam keeps all information from the original bam, and also contains a customized tag in the format `ZM:Z:${MANTA_SV_ID_1}|${EVIDENCE_TYPE},${MANTA_SV_ID_2}|${EVIDENCE_TYPE}`. For example, `ZM:Z:MantaINV:5:0:1:0:0:0|PR|SRM,MantaDEL:5:1:2:0:0:0|SR`.
* One read can have more than one of the three evidence types: PR for paired reads, SR for split reads, and SRM for split read mates.
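Since Manta SV IDs themselves contain `:`, a `ZM` tag value is safest to split only on `,` (between SVs) and `|` (between an ID and its evidence types). A minimal sketch, assuming the caller has already extracted the tag's string value without the `ZM:Z:` prefix; `parseZmTag` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse a ZM tag value such as "MantaINV:5:0:1:0:0:0|PR|SRM,MantaDEL:5:1:2:0:0:0|SR"
// into SV ID -> evidence types (PR, SR, SRM).
std::map<std::string, std::vector<std::string>> parseZmTag(const std::string& value)
{
    std::map<std::string, std::vector<std::string>> evidence;
    std::istringstream entries(value);
    std::string entry;
    while (std::getline(entries, entry, ','))  // one entry per SV
    {
        std::istringstream fields(entry);
        std::string svId, type;
        if (!std::getline(fields, svId, '|') || svId.empty()) continue;
        std::vector<std::string>& types = evidence[svId];
        while (std::getline(fields, type, '|')) types.push_back(type);
    }
    return evidence;
}
```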
......@@ -719,13 +738,13 @@ together if a more accurate filter is required. The status of a call's `IMPRECIS
of its reliability.
For example, in the unpaired tumor analysis output below, the records could be filtered to only include those with
`SAMPLE/PR[1] >= 15 || SAMPLE/SR[1] >= 15`. This would remove the inversion record, because the paired-read count
for the inversion allele is 13 and the split-read count is not known. The two translocation breakends would not be
`SAMPLE/PR[1] >= 15 || SAMPLE/SR[1] >= 15`. This would remove the deletion record, because the paired-read count
for the deletion allele is 13 and the split-read count is not known. The two translocation breakends would not be
filtered because they have 15 and 19 split-read counts, respectively, supporting the breakend allele:
```
11 94975747 MantaBND:0:2:3:0:0:0:1 G G]8:107653520] . PASS SVTYPE=BND;MATEID=MantaBND:0:2:3:0:0:0:0;CIPOS=0,2;HOMLEN=2;HOMSEQ=TT;BND_DEPTH=216;MATE_BND_DEPTH=735 PR:SR 722,9:463,15
11 94975753 MantaINV:0:1:2:0:0:0 T <INV> . PASS END=94987865;SVTYPE=INV;SVLEN=12112;IMPRECISE;CIPOS=-156,156;CIEND=-150,150;INV3 PR 161,13
11 94975753 MantaDEL:0:1:2:0:0:0 T <DEL> . PASS END=94987865;SVTYPE=DEL;SVLEN=12112;IMPRECISE;CIPOS=-156,156;CIEND=-150,150 PR 161,13
11 94987872 MantaBND:0:0:1:0:0:0:0 T T[8:107653411[ . PASS SVTYPE=BND;MATEID=MantaBND:0:0:1:0:0:0:1;BND_DEPTH=171;MATE_BND_DEPTH=830 PR:SR 489,4:520,19
```
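The example expression `SAMPLE/PR[1] >= 15 || SAMPLE/SR[1] >= 15` can be evaluated per record from the FORMAT and sample columns. In this sketch a key missing from FORMAT (such as SR on the imprecise deletion) counts as not supporting, and index 1 of each `ref,alt` pair is taken as the alternate-allele count; `split`, `altCount` and `passesSupportFilter` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split s on a single-character delimiter.
static std::vector<std::string> split(const std::string& s, const char delim)
{
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string item;
    while (std::getline(in, item, delim)) out.push_back(item);
    return out;
}

// Alternate-allele count (second value of "ref,alt") for FORMAT key 'key',
// or -1 if the key is absent from this record.
int altCount(const std::string& format, const std::string& sample, const std::string& key)
{
    const std::vector<std::string> keys = split(format, ':');
    const std::vector<std::string> values = split(sample, ':');
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i)
    {
        if (keys[i] != key) continue;
        const std::vector<std::string> counts = split(values[i], ',');
        return (counts.size() > 1) ? std::stoi(counts[1]) : -1;
    }
    return -1;
}

// SAMPLE/PR[1] >= minCount || SAMPLE/SR[1] >= minCount; a missing count never passes.
bool passesSupportFilter(const std::string& format, const std::string& sample, const int minCount = 15)
{
    return altCount(format, sample, "PR") >= minCount || altCount(format, sample, "SR") >= minCount;
}
```

On the three records above, this keeps both translocation breakends (split-read counts 15 and 19) and drops the deletion (paired-read count 13, no SR field).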
......
......@@ -23,6 +23,8 @@
#include <cassert>
//#define DEBUG_ALN
#ifdef DEBUG_ALN
#include "blt_util/log.hh"
#include <iostream>
......
......@@ -43,10 +43,12 @@ struct AlignmentResult
clear()
{
score = 0;
isJumped = false;
align.clear();
}
ScoreType score;
bool isJumped; ///< whether alignment path includes jump state(s) while backtracking
Alignment align;
};
......
......@@ -21,6 +21,8 @@
#include <cassert>
//#define DEBUG_ALN
#if defined(DEBUG_ALN) || defined(DEBUG_ALN_MATRIX)
#include "blt_util/log.hh"
#endif
......@@ -145,6 +147,16 @@ backTraceAlignment(
{
assert(false && "Unknown align state");
}
// check if the alignment path includes JUMP or JUMPINS states
if ((btrace.state==AlignState::JUMP) || (btrace.state==AlignState::JUMPINS))
{
result.isJumped = true;
#ifdef DEBUG_ALN
log_os << "isJumped is set true" << "\n";
#endif
}
btrace.state=nextState;
ps.length++;
}
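The new `isJumped` check amounts to asking whether any state on the backtrace path is a jump state, which is what lets the large-SV aligner keep only alignments that transition through the jump state. The standalone sketch below mirrors that check; the reduced `AlignState` enum and `pathIncludesJump` are hypothetical and not Manta's definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of alignment states; JUMP and JUMPINS mirror the names in the diff.
enum class AlignState { MATCH, DELETE, INSERT, JUMP, JUMPINS };

// True if any state visited while backtracking is JUMP or JUMPINS.
bool pathIncludesJump(const std::vector<AlignState>& backtracePath)
{
    for (const AlignState s : backtracePath)
    {
        if (s == AlignState::JUMP || s == AlignState::JUMPINS) return true;
    }
    return false;
}
```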
......
......@@ -61,7 +61,7 @@ checkStandardizeUsageFile(
const char* fileLabel)
{
std::string errorMsg;
if ( checkStandardizeInputFile(filename, fileLabel, errorMsg))
if (checkAndStandardizeRequiredInputFilePath(filename, fileLabel, errorMsg))
{
usage(os,prog,visible,errorMsg.c_str());
}
......
......@@ -50,7 +50,7 @@ struct EdgeRetrieverJumpBin : public EdgeRetriever
const unsigned binIndex);
bool
next();
next() override;
private:
void
......
......@@ -19,6 +19,7 @@
/// \file
/// \author Chris Saunders
/// \author Naoki Nariai
///
#include "FatSVCandidate.hh"
......@@ -35,10 +36,14 @@ operator<<(
os << static_cast<SVCandidate>(svc);
for (unsigned eIndex(0); eIndex<SVEvidenceType::SIZE; ++eIndex)
{
os << "Index count for Etype: " << SVEvidenceType::label(eIndex)
<< " bp1: " << svc.bp1EvidenceIndex[eIndex].size()
<< " bp2: " << svc.bp2EvidenceIndex[eIndex].size()
os << "Index count for Etype: " << SVEvidenceType::label(eIndex);
for (unsigned bamIndex(0); bamIndex<svc.bp1EvidenceIndex[eIndex].size(); ++bamIndex)
{
os << "Bam index: " << bamIndex
<< " bp1: " << svc.bp1EvidenceIndex[eIndex][bamIndex].size()
<< " bp2: " << svc.bp2EvidenceIndex[eIndex][bamIndex].size()
<< "\n";
}
}
return os;
}
......@@ -19,6 +19,7 @@
/// \file
/// \author Chris Saunders
/// \author Naoki Nariai
///
#pragma once
......@@ -47,7 +48,9 @@ appendVec(
/// an SV candidate with additional details pertaining to input read evidence which is useful for filtration
/// \brief An SV candidate with additional details pertaining to input read evidence
///
/// The extra read evidence provided in this version of SV candidate is useful for filtration
///
struct FatSVCandidate : public SVCandidate
{
......@@ -58,9 +61,15 @@ struct FatSVCandidate : public SVCandidate
{}
explicit
FatSVCandidate(const SVCandidate& copy)
FatSVCandidate(const SVCandidate& copy, const unsigned bamCount)
: base_t(copy)
{}
{
for (unsigned evidenceTypeIndex(0); evidenceTypeIndex<SVEvidenceType::SIZE; ++evidenceTypeIndex)
{
bp1EvidenceIndex[evidenceTypeIndex].resize(bamCount);
bp2EvidenceIndex[evidenceTypeIndex].resize(bamCount);
}
}
FatSVCandidate(const FatSVCandidate&) = default;
FatSVCandidate& operator=(const FatSVCandidate&) = default;
......@@ -74,37 +83,24 @@ struct FatSVCandidate : public SVCandidate
if (! base_t::merge(rhs, isExpandRegion)) return false;
for (unsigned evidenceTypeIndex(0); evidenceTypeIndex<SVEvidenceType::SIZE; ++evidenceTypeIndex)
{
appendVec(bp1EvidenceIndex[evidenceTypeIndex],rhs.bp1EvidenceIndex[evidenceTypeIndex]);
appendVec(bp2EvidenceIndex[evidenceTypeIndex],rhs.bp2EvidenceIndex[evidenceTypeIndex]);
for (unsigned bamIndex(0); bamIndex<bp1EvidenceIndex[evidenceTypeIndex].size(); ++bamIndex)
{
appendVec(bp1EvidenceIndex[evidenceTypeIndex][bamIndex],
rhs.bp1EvidenceIndex[evidenceTypeIndex][bamIndex]);
appendVec(bp2EvidenceIndex[evidenceTypeIndex][bamIndex],
rhs.bp2EvidenceIndex[evidenceTypeIndex][bamIndex]);
}
}
return true;
}
#if 0
bool
merge(const SVCandidate& rhs)
{
if (! base_t::merge(rhs)) return false;
return true;
}
#endif
#if 0
void
clear()
{
base_t::clear();
for (auto& evi : bp1EvidenceIndex) evi.clear();
for (auto& evi : bp2EvidenceIndex) evi.clear();
}
#endif
/// a 2d array type to track breakpoint evidence, the first dimension is evidence type
/// and the inner dimension is a vector with size equal to the number of (confident-mapping) observations.
/// For each observation the inner-dimension value provides the index of the read used as an observation, which
/// can be used to estimate signal density vs. all reads.
typedef std::array<std::vector<double>,SVEvidenceType::SIZE> evidenceIndex_t;
/// a 3d array type to track breakpoint evidence.
/// The first dimension is evidence type,
/// the second dimension is bam index with size equal to the number of input bams, and
/// the third dimension is evidence read with size equal to the number of confident-mapping observations.
/// For each observation the value provides the index of the read used as an observation,
/// which can be used to estimate signal density vs. all reads.
typedef std::array<std::vector< std::vector<double> >,SVEvidenceType::SIZE> evidenceIndex_t;
evidenceIndex_t bp1EvidenceIndex;
evidenceIndex_t bp2EvidenceIndex;
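The change from a 2D to a 3D evidence index (evidence type, then input bam, then observations) can be illustrated in isolation. This sketch mirrors the new constructor's per-bam `resize` and the per-bam append in `merge()`; `EvidenceIndex`, `makeEvidenceIndex`, `mergeEvidence` and the evidence-type count of 3 are placeholders, not Manta's `SVEvidenceType` or `FatSVCandidate`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for SVEvidenceType::SIZE (number of evidence types).
constexpr std::size_t kEvidenceTypeCount = 3;

// evidence type -> input bam -> read indices of confident-mapping observations
typedef std::array<std::vector<std::vector<double>>, kEvidenceTypeCount> EvidenceIndex;

// Size the per-bam dimension of every evidence type, as the new constructor does.
EvidenceIndex makeEvidenceIndex(const unsigned bamCount)
{
    EvidenceIndex index;
    for (auto& perBam : index) perBam.resize(bamCount);
    return index;
}

// Append rhs observations into lhs bam by bam, mirroring the new merge() loop.
// Both indices are assumed to have been created with the same bamCount.
void mergeEvidence(EvidenceIndex& lhs, const EvidenceIndex& rhs)
{
    for (std::size_t t = 0; t < kEvidenceTypeCount; ++t)
    {
        assert(lhs[t].size() == rhs[t].size());
        for (std::size_t bam = 0; bam < lhs[t].size(); ++bam)
        {
            lhs[t][bam].insert(lhs[t][bam].end(), rhs[t][bam].begin(), rhs[t][bam].end());
        }
    }
}
```

Keeping observations separated by input bam is what allows evidence to be attributed to the correct sample, for example when writing one evidence bam per input.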
......
......@@ -56,7 +56,7 @@ checkStandardizeUsageFile(
const char* fileLabel)
{
std::string errorMsg;
if ( checkStandardizeInputFile(filename, fileLabel, errorMsg))
if (checkAndStandardizeRequiredInputFilePath(filename, fileLabel, errorMsg))
{
usage(os,prog,visible,errorMsg.c_str());
}
......
......@@ -76,7 +76,7 @@ struct GSCOptions
unsigned minCandidateSpanningCount = 3; ///< how many spanning evidence observations are required to become a candidate?
unsigned minScoredVariantSize = 51; ///< min size for scoring and scored output following candidate generation
unsigned minScoredVariantSize = 50; ///< min size for scoring and scored output following candidate generation
bool isOutputContig = false; ///< if true, an assembled contig is written in VCF
};
......