STAR 2.6.0a 2018/04/23
======================
Major new features:
-------------------
* Merging and mapping of overlapping paired-end reads with new options --peOverlapNbasesMin and --peOverlapMMp. The development of this algorithm was supported by Illumina, Inc. Many thanks to June Snedecor, Xiao Chen, and Felix Schlesinger for their extensive help in developing this feature.
* --varVCFfile option to input variant VCF file.
* New SAM attributes in the --outSAMattributes, vG, vA, and vW to report variants overlapping alignments.
* --waspOutputMode option for filtering allele specific alignments. This is a re-implementation of the original WASP algorithm by Bryce van de Geijn, Graham McVicker, Yoav Gilad & Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015), https://www.nature.com/articles/nmeth.3582 . Many thanks to Bryce van de Geijn for fruitful discussions.
* Detection of multimapping chimeras, with new options --chimMultimapNmax, --chimMultimapScoreRange and --chimNonchimScoreDropMin . Many thanks to Brian Haas for testing and feedback.
Minor new features:
-------------------
* --alignInsertionFlush option which defines how to flush ambiguous insertion positions: None: old method, insertions are not flushed; Right: insertions are flushed to the right.
* --outSAMtlen option to select the calculation method for the TLEN field in the SAM/BAM files.
* --outBAMsortingBinsN option to control the number of sorting bins. Increasing this number reduces the amount of RAM required for sorting.
STAR 2.5.4b 2018/02/09
======================
......
STAR 2.5
STAR 2.6
========
Spliced Transcripts Alignment to a Reference
© Alexander Dobin, 2009-2016
© Alexander Dobin, 2009-2018
https://www.ncbi.nlm.nih.gov/pubmed/23104886
AUTHOR/SUPPORT
......@@ -9,11 +9,16 @@ AUTHOR/SUPPORT
Alex Dobin, dobin@cshl.edu
https://groups.google.com/d/forum/rna-star
HARDWARE/SOFTWARE REQUIREMENTS
==============================
* x86-64 compatible processors
* 64 bit Linux or Mac OS X
MANUAL
======
https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
[RELEASEnotes](RELEASEnotes.md) contains detailed information about the latest major release
[RELEASEnotes](https://github.com/alexdobin/STAR/blob/master/RELEASEnotes.md) contains detailed information about the latest major release
DIRECTORY CONTENTS
==================
......@@ -21,44 +26,36 @@ DIRECTORY CONTENTS
* bin: pre-compiled executables for Linux and Mac OS X
* doc: documentation
* extras: miscellaneous files and scripts
* STAR-Fusion: fusion detection developed by Brian Haas, see https://github.com/STAR-Fusion/STAR-Fusion for details.
To populate this submodule, clone STAR with `git clone --recursive https://github.com/alexdobin/STAR`
* STAR-Fusion-x.x.x: latest release of the STAR-Fusion
COMPILING FROM SOURCE
=====================
To compile STAR from source, you must first download the latest [release](release) and uncompress it and then build it.
Linux
-----
Download the latest [release](https://github.com/alexdobin/STAR/releases) and uncompress it
--------------------------------------------------------
```bash
# Get latest STAR source from releases
wget https://github.com/alexdobin/STAR/archive/2.5.3a.tar.gz
tar -xzf 2.5.3a.tar.gz
cd STAR-2.5.3a
wget https://github.com/alexdobin/STAR/archive/2.6.0a.tar.gz
tar -xzf 2.6.0a.tar.gz
cd STAR-2.6.0a
# Alternatively, get STAR source using git
git clone https://github.com/alexdobin/STAR.git
cd STAR/source
# Build STAR
make STAR
# To include STAR-Fusion
git submodule update --init --recursive
Compile under Linux
-------------------
# If you have a TeX environment, you may like to build the documentation
make manual
```bash
# Compile
cd STAR/source
make STAR
```
Mac OS X
--------
Compile under Mac OS X
----------------------
```bash
# Build STAR
# Compile
cd source
make STARforMacStatic
```
......@@ -84,44 +81,15 @@ make LDFLAGSextra=-flto CXXFLAGSextra="-flto -march=native"
```
Developers
==========
STAR developers with write access to https://github.com/alexdobin/STAR can update the `STAR-Fusion`
submodule to a specific tag by following these steps:
```bash
git clone --recursive https://github.com/alexdobin/STAR.git
cd STAR
# or:
#
# git clone https://github.com/alexdobin/STAR.git
# cd STAR
# git submodule update --init --recursive
# checkout a specific tag for the submodule
cd STAR-Fusion
git checkout v0.3.1
# Commit the change
cd ../
git add STAR-Fusion
git commit -m "Updated STAR-Fusion to v0.3.1"
# Push the change to GitHub
git push
```
HARDWARE/SOFTWARE REQUIREMENTS
==============================
* x86-64 compatible processors
* 64 bit Linux or Mac OS X
* 30GB of RAM for human genome
LIMITATIONS
===========
This release was tested with the default parameters for human and mouse genomes.
Mammal genomes require at least 16GB of RAM, ideally 32GB.
Please contact the author for a list of recommended parameters for much larger or much smaller genomes.
FUNDING
=======
The development of STAR is supported by the National Human Genome Research Institute of
the National Institutes of Health under Award Number R01HG009318.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
STAR 2.6.0a 2018/04/23
======================
Major new features:
-------------------
**1. Merging and mapping of overlapping paired-end reads.**
This feature improves mapping accuracy for paired-end libraries with short insert sizes, where many reads have overlapping mates. Importantly, it allows detection of chimeric junctions in the overlap region.
STAR will search for an overlap between mates larger than or equal to --peOverlapNbasesMin bases with the proportion of mismatches in the overlap area not exceeding --peOverlapMMp .
If the overlap is found, STAR will merge the mates and attempt to map the resulting (single-end) sequence.
If requested, the chimeric detection will be performed on the merged-mate sequence, thus allowing chimeric detection in the overlap region.
If the score of this alignment is higher than the original one, or if a chimeric alignment is found, STAR will report the merged-mate alignment instead of the original one.
In the output, the merged-mate alignment will be converted back to paired-end format.
The development of this algorithm was supported by Illumina, Inc.
Many thanks to June Snedecor, Xiao Chen, and Felix Schlesinger for their extensive help in developing this feature.
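The overlap search described above can be illustrated with a minimal sketch. This is not STAR's implementation: the function names are hypothetical, the assumption that mate 2 is reverse-complemented before comparison and that the longest qualifying overlap wins is ours, and only the two thresholds (--peOverlapNbasesMin and --peOverlapMMp) come from the text.

```python
# Hypothetical sketch of mate-overlap detection and merging; not STAR's code.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA sequence written with A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_mate_overlap(mate1, mate2_rc, n_bases_min, mm_p):
    """Return the longest overlap between the end of mate1 and the start of
    mate2_rc that is at least n_bases_min long and whose mismatch proportion
    does not exceed mm_p. Returns 0 if no such overlap exists."""
    if n_bases_min < 1:
        return 0  # assumption: merging needs a positive minimum overlap
    for ovl in range(min(len(mate1), len(mate2_rc)), n_bases_min - 1, -1):
        mismatches = sum(a != b for a, b in zip(mate1[-ovl:], mate2_rc[:ovl]))
        if mismatches <= mm_p * ovl:
            return ovl
    return 0


def merge_mates(mate1, mate2, n_bases_min, mm_p=0.1):
    """Merge two mates into one single-end sequence, or return None."""
    mate2_rc = revcomp(mate2)
    ovl = find_mate_overlap(mate1, mate2_rc, n_bases_min, mm_p)
    if ovl == 0:
        return None
    return mate1 + mate2_rc[ovl:]


if __name__ == "__main__":
    fragment = "ACGTACGTTTGACCAGGCTA"   # made-up short insert
    mate1 = fragment[:15]               # forward read
    mate2 = revcomp(fragment[5:])       # reverse read from the other end
    print(merge_mates(mate1, mate2, n_bases_min=5))
```

In this toy case the two 15-base mates share a 10-base overlap, so the merged sequence reproduces the 20-base fragment.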
**2. Detection of personal variants overlapping alignments.**
Option --varVCFfile /path/to/vcf/file is used to input a VCF file with personal variants. Only single nucleotide variants (SNVs) are supported at the moment.
Each variant is expected to have a genotype with two alleles.
To output variants that overlap alignments, vG and vA have to be added to the --outSAMattributes list.
SAM attribute vG outputs the genomic coordinate of the variant, allowing for identification of the variant.
SAM attribute vA outputs which allele is detected in the read: 1 or 2 means a match to one of the genotype alleles, 3 means no match to the genotype.
**3. WASP filtering of allele specific alignments.**
This is a re-implementation of the original WASP algorithm by Bryce van de Geijn, Graham McVicker, Yoav Gilad & Jonathan K Pritchard. Please cite the original [WASP paper: Nature Methods 12, 1061–1063 (2015)](https://www.nature.com/articles/nmeth.3582).
WASP filtering is activated with --waspOutputMode SAMtag, which will add the vW tag to the SAM output:
vW:i:1 means the alignment passed WASP filtering, while all other values mean it did not pass.
Many thanks to Bryce van de Geijn for fruitful discussions.
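As a rough illustration of how downstream code might use the vW tag, the sketch below classifies a SAM record by that tag. The function name and the example record are made up, and reporting records without a vW tag as a separate "untagged" class is our assumption; only the meaning of vW:i:1 versus other values comes from the text above.

```python
# Hypothetical helper for reading the vW tag from a SAM record; not part of STAR.
def wasp_status(sam_line):
    """Classify one SAM alignment line as 'passed', 'failed' or 'untagged'."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM fields are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in fields[11:]:
        if tag.startswith("vW:i:"):
            return "passed" if tag == "vW:i:1" else "failed"
    return "untagged"  # assumption: no vW tag present on this record


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "\t".join([
        "read1", "0", "chr1", "100", "255", "10M", "*", "0", "0",
        "ACGTACGTAC", "IIIIIIIIII", "NH:i:1", "vW:i:1",
    ])
    print(wasp_status(record))
```

Keeping "untagged" separate from "failed" leaves the decision about such records to the caller.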
**4. Detection of multimapping chimeras.**
The previous STAR chimeric detection algorithm only detected uniquely mapping chimeras, which reduced its sensitivity in some cases.
The new algorithm can detect and output multimapping chimeras. Presently, only output into Chimeric.out.junction is supported.
This algorithm is activated with a >0 value in --chimMultimapNmax, which defines the maximum number of chimeric multi-alignments.
The --chimMultimapScoreRange (=1 by default) parameter defines the score range for multi-mapping chimeras below the best chimeric score, similar to the --outFilterMultimapScoreRange parameter for normal alignments.
The --chimNonchimScoreDropMin (=20 by default) defines the threshold triggering chimeric detection: the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value.
Many thanks to Brian Haas for testing and feedback.
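The score-range idea behind --chimMultimapScoreRange can be sketched as a simple filter. This is an illustration, not STAR's code: the function name is hypothetical, treating the boundary as inclusive is our assumption, and the --chimMultimapNmax limit is deliberately not modeled.

```python
# Hypothetical sketch of a score-range filter for multimapping chimeras.
def within_score_range(scores, score_range=1):
    """Keep the chimeric alignment scores that lie no more than score_range
    below the best score (boundary assumed inclusive)."""
    if not scores:
        return []
    best = max(scores)
    return [s for s in scores if s >= best - score_range]


if __name__ == "__main__":
    # Made-up scores for candidate chimeric alignments of one read.
    print(within_score_range([98, 97, 95, 98], score_range=1))
```

With the default range of 1, only candidates scoring at the best value or one point below it are retained.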
Minor new features:
-------------------
* --outSAMtlen 1/2 option to select the calculation method for the TLEN field in the SAM/BAM files:
1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate
2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends
* --alignInsertionFlush option which defines how to flush ambiguous insertion positions: None: old method, insertions are not flushed; Right: insertions are flushed to the right.
* --outBAMsortingBinsN option to control the number of sorting bins. Increasing this number reduces the amount of RAM required for sorting.
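The two --outSAMtlen methods listed above can be compared with a small sketch. It is not taken from STAR: the `Mate` type and function name are hypothetical, coordinates are assumed to be 1-based and inclusive (hence the +1), and the mates are assumed to lie on opposite strands.

```python
# Hypothetical sketch of the two TLEN calculation methods; not STAR's code.
from typing import NamedTuple


class Mate(NamedTuple):
    start: int   # leftmost aligned base (assumed 1-based)
    end: int     # rightmost aligned base (assumed inclusive)
    strand: str  # "+" or "-"


def tlen_pair(m1, m2, method):
    """Return the signed TLEN values for (m1, m2) under method 1 or 2."""
    if {m1.strand, m2.strand} != {"+", "-"}:
        raise ValueError("sketch assumes mates on opposite strands")
    if method == 1:
        # Leftmost base of the (+)strand mate to rightmost base of the (-)mate;
        # the (+)strand mate gets the positive sign.
        plus, minus = (m1, m2) if m1.strand == "+" else (m2, m1)
        length = minus.end - plus.start + 1
        positive = plus
    elif method == 2:
        # Leftmost base of any mate to rightmost base of any mate;
        # the mate with the leftmost base gets the positive sign.
        length = max(m1.end, m2.end) - min(m1.start, m2.start) + 1
        positive = m1 if m1.start <= m2.start else m2
    else:
        raise ValueError("method must be 1 or 2")
    return (length if m1 is positive else -length,
            length if m2 is positive else -length)


if __name__ == "__main__":
    # Overlapping mates with protruding ends (made-up coordinates).
    plus_mate = Mate(105, 204, "+")
    minus_mate = Mate(100, 180, "-")
    print(tlen_pair(plus_mate, minus_mate, method=1))  # (76, -76)
    print(tlen_pair(plus_mate, minus_mate, method=2))  # (-105, 105)
```

For mates without protruding ends the two methods give the same values, which is why the difference only shows up in cases like the one printed here.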
STAR 2.5.0a 2015/11/06
======================
......
.TH STAR "1" "January 2018" "STAR 2.5.4a" "User Commands"
.TH STAR "1" "April 2018" "STAR 2.6.0a" "User Commands"
.SH NAME
STAR \- ultrafast universal RNA-seq aligner
.SH DESCRIPTION
......@@ -22,5 +22,4 @@ of the STAR mapping strategy.
\fB[options]... --genomeDir REFERENCE --readFilesIn R1.fq R2.fq\fR
.SH SEE ALSO
This manpage is only a placeholder. You can get extensive information
by using
see /usr/share/doc/rna-star/STARmanual.pdf
by using /usr/share/doc/rna-star/STARmanual.pdf.
rna-star (2.6.0a+dfsg-1) unstable; urgency=medium
[ Sascha Steinbiss ]
* New upstream release.
* Remove absent files from Files-Excluded.
* Update copyright date.
* Drop patch applied by upstream.
* Adjust VCS entries.
* Use debhelper 11.
* Bump Standards-Version.
* Update version number in man page.
[ Steffen Moeller ]
* Add SciCrunch IDs.
-- Sascha Steinbiss <satta@debian.org> Thu, 26 Apr 2018 10:14:15 +0200
rna-star (2.5.4b+dfsg-1) unstable; urgency=medium
* New upstream release.
......
......@@ -5,13 +5,13 @@ Uploaders: Steffen Moeller <moeller@debian.org>,
Sascha Steinbiss <satta@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 9),
Build-Depends: debhelper (>= 11),
libhts-dev,
vim-common,
zlib1g-dev
Standards-Version: 3.9.8
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/rna-star.git
Vcs-Git: https://anonscm.debian.org/git/debian-med/rna-star.git
Standards-Version: 4.1.4
Vcs-Browser: https://salsa.debian.org/med-team/rna-star
Vcs-Git: https://salsa.debian.org/med-team/rna-star.git
Homepage: https://github.com/alexdobin/STAR/
Package: rna-star
......
......@@ -4,12 +4,10 @@ Upstream-Contact: Alexander Dobin <dobin@cshl.edu>
Source: https://github.com/alexdobin/STAR/releases
Files-Excluded: bin/*
source/htslib
source/Mac_Include
.gitignore
*/gencode.v19.annotation.gtf.exons.gz
Files: *
Copyright: 2009-2015 Alexander Dobin <dobin@cshl.edu>
Copyright: 2009-2018 Alexander Dobin <dobin@cshl.edu>
License: GPL-3+
Files: source/bam_cat.c
......
......@@ -8,7 +8,7 @@ Description: Use Debian packaged htslib
CXX ?= g++
# pre-defined flags
-LDFLAGS_shared := -pthread -Lhtslib -Bstatic -lhts -Bdynamic -lz -lrt
-LDFLAGS_shared := -pthread -Lhtslib -Bstatic -lhts -Bdynamic -lz
-LDFLAGS_static := -static -static-libgcc -pthread -Lhtslib -lhts -lz
+LDFLAGS_shared := -pthread -Bstatic -lhts -Bdynamic $(LDFLAGS_add)
+LDFLAGS_static := -static -static-libgcc -pthread -lhts -lz
......@@ -28,7 +28,7 @@ Description: Use Debian packaged htslib
CFLAGS := -O3 -pipe -Wall -Wextra $(CFLAGS)
@@ -54,10 +54,10 @@
@@ -60,10 +60,10 @@
%.o : %.cpp
......@@ -41,7 +41,7 @@ Description: Use Debian packaged htslib
all: STAR
@@ -68,12 +68,10 @@
@@ -74,12 +74,10 @@
.PHONY: CLEAN
CLEAN:
rm -f *.o STAR Depend.list
......@@ -54,7 +54,7 @@ Description: Use Debian packaged htslib
.PHONY: install
install:
@@ -84,7 +82,7 @@
@@ -90,7 +88,7 @@
ifneq ($(MAKECMDGOALS),CLEAN)
ifneq ($(MAKECMDGOALS),STARforMac)
ifneq ($(MAKECMDGOALS),STARforMacGDB)
......@@ -63,7 +63,7 @@ Description: Use Debian packaged htslib
echo $(SOURCES)
'rm' -f ./Depend.list
$(CXX) $(CXXFLAGS_common) -MM $^ >> Depend.list
@@ -95,11 +93,6 @@
@@ -101,11 +99,6 @@
endif
endif
......@@ -133,8 +133,8 @@ Description: Use Debian packaged htslib
// kstring_t strK;
--- a/source/STAR.cpp
+++ b/source/STAR.cpp
@@ -27,7 +27,7 @@
#include "sjdbInsertJunctions.h"
@@ -28,7 +28,7 @@
#include "Variation.h"
#include "bam_cat.h"
-#include "htslib/htslib/sam.h"
......
donotuse_own_htslib.patch
mips_shm_noreserve.patch
reproducible.patch
spelling.patch
Description: spelling
Author: Sascha Steinbiss <satta@debian.org>
Forwarded: https://github.com/alexdobin/STAR/pull/369
Last-Update: 2018-01-26
--- a/extras/doc-latex/STARmanual.tex
+++ b/extras/doc-latex/STARmanual.tex
@@ -72,7 +72,7 @@
}
\paragraph{Mac OS X.\newline}
-Current versions of Mac OS X Xcode are shipped with Clang replacing the standard gcc compiler. Presently, standard Clang does not support OpenMP which creates problems for STAR compilation. One option to avoid this problem is to install gcc (preferrably using \code{homebrew} package manager). Another option is to add OpenMP functionality to Clang.
+Current versions of Mac OS X Xcode are shipped with Clang replacing the standard gcc compiler. Presently, standard Clang does not support OpenMP which creates problems for STAR compilation. One option to avoid this problem is to install gcc (preferably using \code{homebrew} package manager). Another option is to add OpenMP functionality to Clang.
\subsection{Basic workflow.}
Basic STAR workflow consists of 2 steps:
--- a/extras/doc-latex/parametersDefault.tex
+++ b/extras/doc-latex/parametersDefault.tex
@@ -8,7 +8,7 @@
\begin{optTable}
\optName{sysShell}
\optValue{-}
- \optLine{string: path to the shell binary, preferrably bash, e.g. /bin/bash.}
+ \optLine{string: path to the shell binary, preferably bash, e.g. /bin/bash.}
\begin{optOptTable}
\optOpt{-} \optOptLine{the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.}
\end{optOptTable}
@@ -344,7 +344,7 @@
\begin{optTable}
\optName{bamRemoveDuplicatesType}
\optValue{-}
- \optLine{string: mark duplicates in the BAM file, for now only works with (i) sorted BAM feeded with inputBAMfile, and (ii) for paired-end alignments only}
+ \optLine{string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed with inputBAMfile, and (ii) for paired-end alignments only}
\begin{optOptTable}
\optOpt{-} \optOptLine{no duplicate removal/marking}
\optOpt{UniqueIdentical} \optOptLine{mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical}
@@ -635,7 +635,7 @@
\optLine{int{\textgreater}=0: minimum total (summed) score of the chimeric segments}
\optName{chimScoreDropMax}
\optValue{20}
- \optLine{int{\textgreater}=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segements) from the read length}
+ \optLine{int{\textgreater}=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length}
\optName{chimScoreSeparation}
\optValue{10}
\optLine{int{\textgreater}=0: minimum difference (separation) between the best chimeric score and the next one}
--- a/source/parametersDefault
+++ b/source/parametersDefault
@@ -10,7 +10,7 @@
### System
sysShell -
- string: path to the shell binary, preferrably bash, e.g. /bin/bash.
+ string: path to the shell binary, preferably bash, e.g. /bin/bash.
- ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.
### Run Parameters
@@ -309,7 +309,7 @@
### BAM processing
bamRemoveDuplicatesType -
- string: mark duplicates in the BAM file, for now only works with (i) sorted BAM feeded with inputBAMfile, and (ii) for paired-end alignments only
+ string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed with inputBAMfile, and (ii) for paired-end alignments only
- ... no duplicate removal/marking
UniqueIdentical ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
UniqueIdenticalNotMulti ... mark duplicate unique mappers but not multimappers.
@@ -565,7 +565,7 @@
int>=0: minimum total (summed) score of the chimeric segments
chimScoreDropMax 20
- int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segements) from the read length
+ int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length
chimScoreSeparation 10
int>=0: minimum difference (separation) between the best chimeric score and the next one
--- a/source/Genome.cpp
+++ b/source/Genome.cpp
@@ -516,7 +516,7 @@
P->winBinNbits = (uint) floor( log2( max( max(4LLU,P->alignIntronMax), (P->alignMatesGapMax==0 ? 1000LLU : P->alignMatesGapMax) ) /4 ) + 0.5);
P->winBinNbits = max( P->winBinNbits, (uint) floor(log2(P->nGenome/40000+1)+0.5) );
//ISSUE - to be fixed in STAR3: if alignIntronMax>0 but alignMatesGapMax==0, winBinNbits will be defined by alignIntronMax
- P->inOut->logMain << "To accomodate alignIntronMax="<<P->alignIntronMax<<" redefined winBinNbits="<< P->winBinNbits <<endl;
+ P->inOut->logMain << "To accommodate alignIntronMax="<<P->alignIntronMax<<" redefined winBinNbits="<< P->winBinNbits <<endl;
};
if (P->winBinNbits > P->genomeChrBinNbits) {
@@ -531,7 +531,7 @@
//redefine winFlankNbins,winAnchorDistNbins
P->winFlankNbins=max(P->alignIntronMax,P->alignMatesGapMax)/(1LLU<<P->winBinNbits)+1;
P->winAnchorDistNbins=2*P->winFlankNbins;
- P->inOut->logMain << "To accomodate alignIntronMax="<<P->alignIntronMax<<" and alignMatesGapMax="<<P->alignMatesGapMax<<\
+ P->inOut->logMain << "To accommodate alignIntronMax="<<P->alignIntronMax<<" and alignMatesGapMax="<<P->alignMatesGapMax<<\
", redefined winFlankNbins="<<P->winFlankNbins<<" and winAnchorDistNbins="<<P->winAnchorDistNbins<<endl;
};
......@@ -34,7 +34,7 @@
\newcommand{\sechyperref}[1]{\hyperref[#1]{Section \ref{#1}. \nameref{#1}}}
\title{STAR manual 2.5.4b}
\title{STAR manual 2.6.0a}
\author{Alexander Dobin\\
dobin@cshl.edu}
\maketitle
......@@ -72,7 +72,7 @@ STAR is compiled with gcc c++ compiler and depends only on standard gcc librarie
}
\paragraph{Mac OS X.\newline}
Current versions of Mac OS X Xcode are shipped with Clang replacing the standard gcc compiler. Presently, standard Clang does not support OpenMP which creates problems for STAR compilation. One option to avoid this problem is to install gcc (preferrably using \code{homebrew} package manager). Another option is to add OpenMP functionality to Clang.
Current versions of Mac OS X Xcode are shipped with Clang replacing the standard gcc compiler. Presently, standard Clang does not support OpenMP which creates problems for STAR compilation. One option to avoid this problem is to install gcc (preferably using \code{homebrew} package manager). Another option is to add OpenMP functionality to Clang.
\subsection{Basic workflow.}
Basic STAR workflow consists of 2 steps:
......@@ -153,7 +153,7 @@ Note, that the \opt{sjdbFileChrStartEnd} file can contain duplicate (identical)
For small genomes, the parameter \opt{genomeSAindexNbases} \textbf{must} be scaled down, with a typical value of \code{min(14, log2(GenomeLength)/2 - 1)}. For example, for a 1~megaBase genome, this is equal to 9, for a 100~kiloBase genome, this is equal to 7.
\subsubsection{Genome with a large number of references.}
If you are using a genome with a large (\textgreater 5,000) number of references (chromosomes/scaffolds), you may need to reduce the \opt{genomeChrBinNbits} to reduce RAM consumption. The following scaling is recommended: \opt{genomeChrBinNbits} = \code{min(18, log2(GenomeLength/NumberOfReferences))}. For example, for a 3~gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
If you are using a genome with a large (\textgreater 5,000) number of references (chromosomes/scaffolds), you may need to reduce the \opt{genomeChrBinNbits} to reduce RAM consumption. The following scaling is recommended: \opt{genomeChrBinNbits} = \code{min(18,log2[max(GenomeLength/NumberOfReferences,ReadLength)])}. For example, for a 3~gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
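The two scaling rules quoted above can be checked numerically with a short sketch. The function names are hypothetical, and rounding to the nearest integer is our assumption, chosen because it reproduces the worked values in the text (9, 7 and 15); the read length of 100 in the example is also made up.

```python
# Hypothetical helpers evaluating the recommended scaling formulas.
import math


def genome_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length, n_references, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))),
    rounded to the nearest integer."""
    per_reference = genome_length / n_references
    return min(18, round(math.log2(max(per_reference, read_length))))


if __name__ == "__main__":
    print(genome_sa_index_nbases(1_000_000))   # 9
    print(genome_sa_index_nbases(100_000))     # 7
    print(genome_chr_bin_nbits(3_000_000_000, 100_000, read_length=100))  # 15
```

The resulting numbers would be passed to STAR as the values of the corresponding genome-generation parameters.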
\section{Running mapping jobs.}\label{Running_mapping_jobs}
\subsection{Basic options.}
......@@ -449,6 +449,48 @@ This is the original 2-pass method which involves genome re-generation step in-b
\item Run the 2nd pass mapping for all samples with the new genome index.
\end{enumerate}
\section{Merging and mapping of overlapping paired-end reads.}
This feature improves mapping accuracy for paired-end libraries with short insert sizes, where many reads have overlapping mates. Importantly, it allows detection of chimeric junctions in the overlap region.
STAR will search for an overlap between mates larger than or equal to \opt{peOverlapNbasesMin} bases with the proportion of mismatches in the overlap area not exceeding \opt{peOverlapMMp}.
If the overlap is found, STAR will merge the mates and attempt to map the resulting (single-end) sequence.
If requested, the chimeric detection will be performed on the merged-mate sequence, thus allowing chimeric detection in the overlap region.
If the score of this alignment is higher than the original one, or if a chimeric alignment is found, STAR will report the merged-mate alignment instead of the original one.
In the output, the merged-mate alignment will be converted back to paired-end format.
The development of this algorithm was supported by Illumina, Inc.
Many thanks to June Snedecor, Xiao Chen, and Felix Schlesinger for their extensive help in developing this feature.
\section{Detection of personal variants overlapping alignments.}
Option \opt{varVCFfile} \optvr{/path/to/vcf/file} is used to input a VCF file with personal variants. Only single nucleotide variants (SNVs) are supported at the moment.
Each variant is expected to have a genotype with two alleles.
To output variants that overlap alignments, vG and vA have to be added to the \opt{outSAMattributes} list.
SAM attribute vG outputs the genomic coordinate of the variant, allowing for identification of the variant.
SAM attribute vA outputs which allele is detected in the read: $1$ or $2$ means a match to one of the genotype alleles, $3$ means no match to the genotype.
\section{WASP filtering of allele specific alignments.}
This is a re-implementation of the original WASP algorithm by Bryce van de Geijn, Graham McVicker, Yoav Gilad and Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015) \url{https://www.nature.com/articles/nmeth.3582}.
WASP filtering is activated with \opt{waspOutputMode} \optv{SAMtag}, which will add the \optv{vW} tag to the SAM output:
\optv{vW:i:1} means the alignment passed WASP filtering, and all other values mean it did not pass:
\optv{vW:i:2} - multi-mapping read
\optv{vW:i:3} - variant base in the read is N (non-ACGT)
\optv{vW:i:4} - remapped read did not map
\optv{vW:i:5} - remapped read multi-maps
\optv{vW:i:6} - remapped read maps to a different locus
\optv{vW:i:7} - read overlaps too many variants
\section{Detection of multimapping chimeras.}
The previous STAR chimeric detection algorithm only detected uniquely mapping chimeras, which reduced its sensitivity in some cases.
The new algorithm can detect and output multimapping chimeras. Presently, only output into Chimeric.out.junction is supported.
This algorithm is activated with a $>0$ value in \optv{chimMultimapNmax}, which defines the maximum number of chimeric multi-alignments.
The \optv{chimMultimapScoreRange} ($=1$ by default) parameter defines the score range for multi-mapping chimeras below the best chimeric score, similar to the \optv{outFilterMultimapScoreRange} parameter for normal alignments.
The \optv{chimNonchimScoreDropMin} ($=20$ by default) defines the threshold triggering chimeric detection: the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value.
\section{Description of all options.}\label{Description_of_all_options}
For each STAR version, the most up-to-date information about all STAR parameters can be found in the \code{parametersDefault} file in the STAR source directory. The parameters in the \code{parametersDefault}, as well as in the descriptions below, are grouped by function:
......
......@@ -8,7 +8,7 @@
\begin{optTable}
\optName{sysShell}
\optValue{-}
\optLine{string: path to the shell binary, preferrably bash, e.g. /bin/bash.}
\optLine{string: path to the shell binary, preferably bash, e.g. /bin/bash.}
\begin{optOptTable}
\optOpt{-} \optOptLine{the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.}
\end{optOptTable}
......@@ -17,13 +17,7 @@
\begin{optTable}
\optName{runMode}
\optValue{alignReads}
\optLine{string: type of the run:}
\begin{optOptTable}
\optOpt{alignReads} \optOptLine{map reads}
\optOpt{genomeGenerate} \optOptLine{generate genome files}
\optOpt{inputAlignmentsFromBAM} \optOptLine{input alignments from BAM. Presently only works with --outWigType and --bamRemoveDuplicates.}
\optOpt{liftOver} \optOptLine{lift-over of GTF files (--sjdbGTFfile) between genome assemblies using chain file(s) from --genomeChainFiles.}
\end{optOptTable}
\optLine{string: type of the run.}
\optName{runThreadN}
\optValue{1}
\optLine{int: number of threads to run STAR}
......@@ -116,6 +110,12 @@
\optOpt{All} \optOptLine{all files including big Genome, SA and SAindex - this will create a complete genome directory}
\end{optOptTable}
\end{optTable}
\optSection{Variation parameters}\label{Variation_parameters}
\begin{optTable}
\optName{varVCFfile}
\optValue{-}
\optLine{string: path to the VCF file that contains variation data.}
\end{optTable}
\optSection{Input Files}\label{Input_Files}
\begin{optTable}
\optName{inputBAMfile}
......@@ -272,10 +272,18 @@
\optValue{Standard}
\optLine{string: a string of desired SAM attributes, in the order desired for the output SAM}
\begin{optOptTable}
\optOpt{NH HI AS nM NM MD jM jI XS ch} \optOptLine{any combination in any order}
\optOpt{Standard} \optOptLine{NH HI AS nM}
\optOpt{All} \optOptLine{NH HI AS nM NM MD jM jI ch}
\optOpt{NH HI AS nM NM MD jM jI XS MC ch} \optOptLine{any combination in any order}
\optOpt{None} \optOptLine{no attributes}
\optOpt{Standard} \optOptLine{NH HI AS nM}
\optOpt{All} \optOptLine{NH HI AS nM NM MD jM jI MC ch}
\optOpt{vA} \optOptLine{variant allele}
\optOpt{vG} \optOptLine{genomic coordinate of the variant overlapped by the read}
\optOpt{vW} \optOptLine{0/1 - alignment does not pass / passes WASP filtering. Requires --waspOutputMode SAMtag .}
\end{optOptTable}
\optLine{Unsupported/undocumented:}
\begin{optOptTable}
\optOpt{rB} \optOptLine{alignment block read/genomic coordinates}
\optOpt{vR} \optOptLine{read coordinate of the variant}
\end{optOptTable}
\optName{outSAMattrIHstart}
\optValue{1}
......@@ -348,18 +356,28 @@
\begin{optOptTable}
\optOpt{-1} \optOptLine{all alignments (up to --outFilterMultimapNmax) will be output}
\end{optOptTable}
\optName{outSAMtlen}
\optValue{1}
\optLine{int: calculation method for the TLEN field in the SAM/BAM files}
\begin{optOptTable}
\optOpt{1} \optOptLine{leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate}
\optOpt{2} \optOptLine{leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends}
\end{optOptTable}
\optName{outBAMcompression}
\optValue{1}
\optLine{int: -1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression}
\optName{outBAMsortingThreadN}
\optValue{0}
\optLine{int: {\textgreater}=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).}
\optName{outBAMsortingBinsN}
\optValue{50}
\optLine{int: {\textgreater}0: number of genome bins for coordinate-sorting}
\end{optTable}
\optSection{BAM processing}\label{BAM_processing}
\begin{optTable}
\optName{bamRemoveDuplicatesType}
\optValue{-}
\optLine{string: mark duplicates in the BAM file, for now only works with (i) sorted BAM feeded with inputBAMfile, and (ii) for paired-end alignments only}
\optLine{string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed with inputBAMfile, and (ii) for paired-end alignments only}
\begin{optOptTable}
\optOpt{-} \optOptLine{no duplicate removal/marking}
\optOpt{UniqueIdentical} \optOptLine{mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical}
......@@ -424,22 +442,22 @@
\optLine{int: alignment will be output only if it has no more mismatches than this value.}
\optName{outFilterMismatchNoverLmax}
\optValue{0.3}
\optLine{float: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.}
\optLine{real: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.}
\optName{outFilterMismatchNoverReadLmax}
\optValue{1.0}
\optLine{float: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.}
\optLine{real: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.}
\optName{outFilterScoreMin}
\optValue{0}
\optLine{int: alignment will be output only if its score is higher than or equal to this value.}
\optName{outFilterScoreMinOverLread}
\optValue{0.66}
\optLine{float: same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for paired-end reads)}
\optLine{real: same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for paired-end reads)}
\optName{outFilterMatchNmin}
\optValue{0}
\optLine{int: alignment will be output only if the number of matched bases is higher than or equal to this value.}
\optName{outFilterMatchNminOverLread}
\optValue{0.66}
\optLine{float: same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).}
\optLine{real: same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).}
\optName{outFilterIntronMotifs}
\optValue{None}
\optLine{string: filter alignment using their motifs}
......@@ -527,7 +545,7 @@
\optLine{int{\textgreater}0: defines the search start point through the read - the read is split into pieces no longer than this value}
\optName{seedSearchStartLmaxOverLread}
\optValue{1.0}
\optLine{float: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)}
\optLine{real: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)}
\optName{seedSearchLmax}
\optValue{0}
\optLine{int{\textgreater}=0: defines the maximum length of the seeds, if =0 max seed length is infinite}
......@@ -570,7 +588,7 @@
\optLine{int{\textgreater}0: minimum mapped length for a read mate that is spliced}
\optName{alignSplicedMateMapLminOverLmate}
\optValue{0.66}
\optLine{float{\textgreater}0: alignSplicedMateMapLmin normalized to mate length}
\optLine{real{\textgreater}0: alignSplicedMateMapLmin normalized to mate length}
\optName{alignWindowsPerReadNmax}
\optValue{10000}
\optLine{int{\textgreater}0: max number of windows per read}
......@@ -605,6 +623,22 @@
\optOpt{Yes} \optOptLine{allow}
\optOpt{No} \optOptLine{prohibit, useful for compatibility with Cufflinks}
\end{optOptTable}
\optName{alignInsertionFlush}
\optValue{None}
\optLine{string: how to flush ambiguous insertion positions}
\begin{optOptTable}
\optOpt{None} \optOptLine{insertions are not flushed}
\optOpt{Right} \optOptLine{insertions are flushed to the right}
\end{optOptTable}
\end{optTable}
\optSection{Paired-End reads: presently unsupported/undocumented}\label{Paired-End_reads:_presently_unsupported/undocumented}
\begin{optTable}
\optName{peOverlapNbasesMin}
\optValue{0}
\optLine{int{\textgreater}=0: minimum number of overlap bases to trigger mates merging and realignment}
\optName{peOverlapMMp}
\optValue{0.1}
\optLine{real, {\textgreater}=0 {\&} {\textless}1: maximum proportion of mismatched bases in the overlap area}
\end{optTable}
\optSection{Windows, Anchors, Binning}\label{Windows,_Anchors,_Binning}
\begin{optTable}
......@@ -622,7 +656,7 @@
\optLine{int{\textgreater}0: log2(winFlank), where winFlank is the size of the left and right flanking regions for each window}
\optName{winReadCoverageRelativeMin}
\optValue{0.5}
\optLine{float{\textgreater}=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.}
\optLine{real{\textgreater}=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.}
\optName{winReadCoverageBasesMin}
\optValue{0}
\optLine{int{\textgreater}0: minimum number of bases covered by the seeds in a window, for STARlong algorithm only.}
......@@ -630,16 +664,13 @@
\optSection{Chimeric Alignments}\label{Chimeric_Alignments}
\begin{optTable}
\optName{chimOutType}
\optValue{SeparateSAMold}
\optValue{Junctions}
\optLine{string(s): type of chimeric output}
\optLine{1st word:}
\begin{optOptTable}
\optOpt{Junctions} \optOptLine{Chimeric.out.junction}
\optOpt{SeparateSAMold} \optOptLine{output old SAM into separate Chimeric.out.sam file}
\optOpt{WithinBAM} \optOptLine{output into main aligned BAM files (Aligned.*.bam)}
\end{optOptTable}
\optLine{2nd word:}
\begin{optOptTable}
\optOpt{WithinBAM HardClip} \optOptLine{hard-clipping in the CIGAR for supplemental chimeric alignments (defaultif no 2nd word is present)}
\optOpt{WithinBAM HardClip} \optOptLine{(default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present)}
\optOpt{WithinBAM SoftClip} \optOptLine{soft-clipping in the CIGAR for supplemental chimeric alignments}
\end{optOptTable}
\optName{chimSegmentMin}
......@@ -650,7 +681,7 @@
\optLine{int{\textgreater}=0: minimum total (summed) score of the chimeric segments}
\optName{chimScoreDropMax}
\optValue{20}
\optLine{int{\textgreater}=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segements) from the read length}
\optLine{int{\textgreater}=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length}
\optName{chimScoreSeparation}
\optValue{10}
\optLine{int{\textgreater}=0: minimum difference (separation) between the best chimeric score and the next one}
......@@ -673,6 +704,18 @@
\optName{chimMainSegmentMultNmax}
\optValue{10}
\optLine{int{\textgreater}=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.}
\optName{chimMultimapNmax}
\optValue{0}
\optLine{int{\textgreater}=0: maximum number of chimeric multi-alignments}
\begin{optOptTable}
\optOpt{0} \optOptLine{use the old scheme for chimeric detection which only considered unique alignments}
\end{optOptTable}
\optName{chimMultimapScoreRange}
\optValue{1}
\optLine{int{\textgreater}=0: the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax {\textgreater} 1}
\optName{chimNonchimScoreDropMin}
\optValue{20}
\optLine{int{\textgreater}=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value}
\end{optTable}
\optSection{Quantification of Annotations}\label{Quantification_of_Annotations}
\begin{optTable}
......@@ -708,3 +751,12 @@
\optValue{-1}
\optLine{int: number of reads to process for the 1st step. Use very large number (or default -1) to map all reads in the first step.}
\end{optTable}
\optSection{WASP parameters}\label{WASP_parameters}
\begin{optTable}
\optName{waspOutputMode}
\optValue{None}
\optLine{string: WASP allele-specific output type. This is a re-implementation of the original WASP mappability filtering by Bryce van de Geijn, Graham McVicker, Yoav Gilad {\&} Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015), https://www.nature.com/articles/nmeth.3582 .}
\begin{optOptTable}
\optOpt{SAMtag} \optOptLine{add WASP tags to the alignments that pass WASP filtering}
\end{optOptTable}
\end{optTable}
......@@ -3,7 +3,7 @@
#include "serviceFuns.cpp"
#include "BAMfunctions.h"
void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters *P) {
void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen) {
if (binS==0) return; //nothing to do for empty bins
//allocate arrays
......@@ -30,7 +30,7 @@ void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, st
ostringstream errOut;
errOut << "EXITING because of FATAL ERROR: number of bytes expected from the BAM bin does not agree with the actual size on disk: ";
errOut << binS <<" "<< bamInBytes <<" "<< iBin <<"\n";
exitWithError(errOut.str(),std::cerr, P->inOut->logMain, 1, *P);
exitWithError(errOut.str(),std::cerr, P.inOut->logMain, 1, P);
};
//extract coordinates
......@@ -48,8 +48,8 @@ void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, st
qsort((void*) startPos, binN, sizeof(uint)*3, funCompareArrays<uint,3>);
BGZF *bgzfBin;
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P->outBAMcompression)).c_str());
outBAMwriteHeader(bgzfBin,P->samHeaderSortedCoord,P->chrNameAll,P->chrLengthAll);
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P.outBAMcompression)).c_str());
outBAMwriteHeader(bgzfBin,P.samHeaderSortedCoord,mapGen.chrNameAll,mapGen.chrLengthAll);
//send ordered aligns to bgzf one-by-one
for (uint ia=0;ia<binN;ia++) {
char* ib=bamIn+startPos[ia*3+2];
......
......@@ -2,8 +2,10 @@
#define CODE_BAMbinSortByCoordinate
#include "IncludeDefine.h"
#include "Parameters.h"
#include "Genome.h"
#include SAMTOOLS_BGZF_H
void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters *P);
void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen);
#endif
\ No newline at end of file
......@@ -2,21 +2,21 @@
#include "ErrorWarning.h"
#include "BAMfunctions.h"
void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, BGZF *bgzfBAM, Parameters *P) {
void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen) {
BGZF *bgzfBin;
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P->outBAMcompression)).c_str());
outBAMwriteHeader(bgzfBin,P->samHeaderSortedCoord,P->chrNameAll,P->chrLengthAll);
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P.outBAMcompression)).c_str());
outBAMwriteHeader(bgzfBin,P.samHeaderSortedCoord,mapGen.chrNameAll,mapGen.chrLengthAll);
vector<string> bamInFile;
std::map <uint,uint> startPos;
for (uint it=0; it<nThreads; it++) {//initialize
for (uint it=0; it<nThreads; it++) {//files from all threads, and BySJout
bamInFile.push_back(dirBAMsort+to_string(it)+"/"+to_string((uint) iBin));
bamInFile.push_back(dirBAMsort+to_string(it)+"/"+to_string((uint) iBin)+".BySJout");
};
vector<uint32> bamSize(bamInFile.size(),0);
vector<uint32> bamSize(bamInFile.size(),0);//record sizes
//allocate arrays
char **bamIn=new char* [bamInFile.size()];
......@@ -25,13 +25,13 @@ void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, BGZF *bgz
for (uint it=0; it<bamInFile.size(); it++) {//initialize
bamIn[it] = new char [BAMoutput_oneAlignMaxBytes];
bamInStream[it].open(bamInFile.at(it).c_str());
bamInStream[it].open(bamInFile.at(it).c_str());//open all files
bamInStream[it].read(bamIn[it],sizeof(int32));//read record size
bamInStream[it].read(bamIn[it],sizeof(int32));//read BAM record size
if (bamInStream[it].good()) {
bamSize[it]=((*(uint32*)bamIn[it])+sizeof(int32));
bamSize[it]=((*(uint32*)bamIn[it])+sizeof(int32));//true record size +=4 (4 bytes for uint-iRead)
bamInStream[it].read(bamIn[it]+sizeof(int32),bamSize.at(it)-sizeof(int32)+sizeof(uint));//read the rest of the record, including last uint = iRead
startPos[*(uint*)(bamIn[it]+bamSize.at(it))]=it;
startPos[*(uint*)(bamIn[it]+bamSize.at(it))]=it;//startPos[iRead]=it : record the order of the files to output
} else {//nothing to do here, file is empty, do not record it
};
};
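Reviewer note: the initialization loop above reads the first record of every per-thread file and keys it by `iRead` in a `std::map`, so the map's ordering decides which file is output next. The sketch below illustrates that idea on in-memory data only. `mergeByReadIndex` and the vectors standing in for files are hypothetical and not part of STAR, and it assumes each `iRead` value occurs in at most one file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the per-thread temporary files:
// each inner vector holds read indices (iRead) in increasing order.
std::vector<std::uint64_t> mergeByReadIndex(const std::vector<std::vector<std::uint64_t>> &files) {
    std::map<std::uint64_t, std::size_t> head;       // iRead of current record -> file index
    std::vector<std::size_t> next(files.size(), 0);  // next unread record in each file

    for (std::size_t f = 0; f < files.size(); ++f) {
        if (!files[f].empty()) {                     // empty files are simply not recorded
            head[files[f][0]] = f;
            next[f] = 1;
        }
    }

    std::vector<std::uint64_t> out;
    while (!head.empty()) {
        auto it = head.begin();                      // smallest iRead among the current records
        const std::size_t f = it->second;
        out.push_back(it->first);
        head.erase(it);
        if (next[f] < files[f].size()) {             // refill from the file just consumed
            head[files[f][next[f]]] = f;
            ++next[f];
        }
    }
    return out;
}
```

With inputs `{0,3,5}`, `{1,2}`, `{}` and `{4}` the merged order is 0 through 5, which is the original read order.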
......
......@@ -2,8 +2,10 @@
#define CODE_BAMbinSortUnmapped
#include "IncludeDefine.h"
#include "Parameters.h"
#include "Genome.h"
#include SAMTOOLS_BGZF_H
void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, BGZF *bgzfBAM, Parameters *P);
void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen);
#endif
......@@ -91,7 +91,7 @@ void outBAMwriteHeader (BGZF* fp, const string &samh, const vector <string> &chr
template <class TintType>
TintType bamAttributeInt(const char *bamAux, const char *attrName) {//not tested!!!
char *attrStart=strstr(bamAux,attrName);
const char *attrStart=strstr(bamAux,attrName);
if (attrStart==NULL) return (TintType) -1;
switch (attrStart[2]) {
case ('c'):
......
......@@ -5,18 +5,16 @@
#include "serviceFuns.cpp"
#include "ThreadControl.h"
BAMoutput::BAMoutput (int iChunk, string tmpDir, Parameters *Pin) {//allocate bam array
BAMoutput::BAMoutput (int iChunk, string tmpDir, Parameters &Pin) : P(Pin){//allocate bam array
P=Pin;
nBins=P->outBAMcoordNbins;
binSize=P->chunkOutBAMsizeBytes/nBins;
nBins=P.outBAMcoordNbins;
binSize=P.chunkOutBAMsizeBytes/nBins;
bamArraySize=binSize*nBins;
bamArray = new char [bamArraySize];
bamDir=tmpDir+to_string((uint) iChunk);//local directory for this thread (iChunk)
mkdir(bamDir.c_str(),P->runDirPerm);
mkdir(bamDir.c_str(),P.runDirPerm);
binStart=new char* [nBins];
binBytes=new uint64 [nBins];
binStream=new ofstream* [nBins];
......@@ -34,11 +32,9 @@ BAMoutput::BAMoutput (int iChunk, string tmpDir, Parameters *Pin) {//allocate ba
nBins=1;//start with one bin to estimate genomic bin sizes
};
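Reviewer note: the constructor change from `Parameters *Pin` with `P=Pin;` to `Parameters &Pin` with `: P(Pin)` suggests that `P` is now a reference member. A reference member has to be bound in the member initializer list because it cannot be assigned later. A minimal sketch with hypothetical stand-in types (`Params` and `Output` are not STAR classes):

```cpp
#include <cassert>

struct Params { int nBins; };   // hypothetical stand-in for Parameters

class Output {
public:
    // The reference member must be bound in the initializer list;
    // writing "P = pIn;" in the body would not compile for a reference member.
    explicit Output(Params &pIn) : P(pIn) {
        nBins = P.nBins;        // member access uses '.' instead of '->'
    }
    int bins() const { return nBins; }

    Params &P;                  // refers to the caller's object, no copy is made
private:
    int nBins;
};
```

Because `P` aliases the caller's object, later changes to that object are visible through `P`, just as they were through the pointer.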
BAMoutput::BAMoutput (BGZF *bgzfBAMin, Parameters *Pin) {//allocate BAM array with one bin, streamed directly into bgzf file
P=Pin;
BAMoutput::BAMoutput (BGZF *bgzfBAMin, Parameters &Pin) : P(Pin){//allocate BAM array with one bin, streamed directly into bgzf file
bamArraySize=P->chunkOutBAMsizeBytes;
bamArraySize=P.chunkOutBAMsizeBytes;
bamArray = new char [bamArraySize];
binBytes1=0;
bgzfBAM=bgzfBAMin;
......@@ -90,9 +86,9 @@ void BAMoutput::coordOneAlign (char *bamIn, uint bamSize, uint iRead) {
bamIn32=(uint32*) bamIn;
alignG=( ((uint) bamIn32[1]) << 32 ) | ( (uint)bamIn32[2] );
if (bamIn32[1] == ((uint32) -1) ) {//unmapped
iBin=P->outBAMcoordNbins-1;
iBin=P.outBAMcoordNbins-1;
} else if (nBins>1) {//bin starts have already been determined
iBin=binarySearch1a <uint64> (alignG, P->outBAMsortingBinStart, (int32) (nBins-1));
iBin=binarySearch1a <uint64> (alignG, P.outBAMsortingBinStart, (int32) (nBins-1));
};
};
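Reviewer note: `alignG` packs the two 32-bit words following the record size into one 64-bit key, and the unmapped test checks the upper word against `(uint32) -1`. The sketch below shows the same packing and the matching unpacking used later in the log output. The helper names are hypothetical, and reading the two words as reference index and position is an assumption based on the BAM record layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of STAR.
// Pack a reference index (upper 32 bits) and a position (lower 32 bits) into one sort key.
inline std::uint64_t packCoord(std::uint32_t refId, std::uint32_t pos) {
    return (static_cast<std::uint64_t>(refId) << 32) | static_cast<std::uint64_t>(pos);
}

// Upper word: same as key>>32 in the log output.
inline std::uint32_t coordRefId(std::uint64_t key) {
    return static_cast<std::uint32_t>(key >> 32);
}

// Lower word: same as (key<<32)>>32 in the log output.
inline std::uint32_t coordPos(std::uint64_t key) {
    return static_cast<std::uint32_t>((key << 32) >> 32);
}

// Unmapped records carry -1 in the upper word.
inline bool isUnmapped(std::uint64_t key) {
    return coordRefId(key) == static_cast<std::uint32_t>(-1);
}
```

Comparing packed keys as plain unsigned integers orders records by reference first and position second, and unmapped records sort after all mapped ones.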
......@@ -104,7 +100,7 @@ void BAMoutput::coordOneAlign (char *bamIn, uint bamSize, uint iRead) {
//write buffer is filled
if (binBytes[iBin]+bamSize+sizeof(uint) > ( (iBin>0 || nBins>1) ? binSize : binSize1) ) {//write out this buffer
if ( nBins>1 || iBin==(P->outBAMcoordNbins-1) ) {//normal writing, bins have already been determined
if ( nBins>1 || iBin==(P.outBAMcoordNbins-1) ) {//normal writing, bins have already been determined
binStream[iBin]->write(binStart[iBin],binBytes[iBin]);
binBytes[iBin]=0;//rewind the buffer
} else {//the first chunk of reads was written in one bin, need to determine bin sizes, and re-distribute reads into bins
......@@ -125,11 +121,11 @@ void BAMoutput::coordOneAlign (char *bamIn, uint bamSize, uint iRead) {
};
void BAMoutput::coordBins() {//define genomic starts for bins
nBins=P->outBAMcoordNbins;//this is the true number of bins
nBins=P.outBAMcoordNbins;//this is the true number of bins
//mutex here
if (P->runThreadN>1) pthread_mutex_lock(&g_threadChunks.mutexBAMsortBins);
if (P->outBAMsortingBinStart[0]!=0) {//it's set to 0 only after the bin sizes are determined
if (P.runThreadN>1) pthread_mutex_lock(&g_threadChunks.mutexBAMsortBins);
if (P.outBAMsortingBinStart[0]!=0) {//it's set to 0 only after the bin sizes are determined
//extract coordinates and sort
uint *startPos = new uint [binTotalN[0]];//array of aligns start positions
for (uint ib=0,ia=0;ia<binTotalN[0];ia++) {
......@@ -140,19 +136,19 @@ void BAMoutput::coordBins() {//define genomic starts for bins
qsort((void*) startPos, binTotalN[0], sizeof(uint), funCompareUint1);
//determine genomic starts of the bins
P->inOut->logMain << "BAM sorting: "<<binTotalN[0]<< " mapped reads\n";
P->inOut->logMain << "BAM sorting bins genomic start loci:\n";
P.inOut->logMain << "BAM sorting: "<<binTotalN[0]<< " mapped reads\n";
P.inOut->logMain << "BAM sorting bins genomic start loci:\n";
P->outBAMsortingBinStart[0]=0;
P.outBAMsortingBinStart[0]=0;
for (uint32 ib=1; ib<(nBins-1); ib++) {
P->outBAMsortingBinStart[ib]=startPos[binTotalN[0]/(nBins-1)*ib];
P->inOut->logMain << ib <<"\t"<< (P->outBAMsortingBinStart[ib]>>32) << "\t" << ((P->outBAMsortingBinStart[ib]<<32)>>32) <<endl;
P.outBAMsortingBinStart[ib]=startPos[binTotalN[0]/(nBins-1)*ib];
P.inOut->logMain << ib <<"\t"<< (P.outBAMsortingBinStart[ib]>>32) << "\t" << ((P.outBAMsortingBinStart[ib]<<32)>>32) <<endl;
//how to deal with equal boundaries???
};
delete [] startPos;
};
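Reviewer note: the block above sorts the start positions of the first chunk and takes evenly spaced elements as bin boundaries, leaving the last bin for unmapped reads. A minimal sketch of that quantile step follows. `binStartsFromSample` and `findBin` are hypothetical names, and `findBin` uses `std::upper_bound` as a stand-in for `binarySearch1a`, whose exact semantics are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: derive start coordinates for the nBins-1 mapped bins
// from a sample of alignment start keys (the last of nBins bins is for unmapped reads).
std::vector<std::uint64_t> binStartsFromSample(std::vector<std::uint64_t> starts, std::uint32_t nBins) {
    assert(nBins >= 2 && !starts.empty());
    std::sort(starts.begin(), starts.end());
    std::vector<std::uint64_t> binStart(nBins - 1, 0);      // binStart[0] stays 0
    const std::size_t step = starts.size() / (nBins - 1);   // integer division first, as in the diff
    for (std::uint32_t ib = 1; ib < nBins - 1; ++ib) {
        binStart[ib] = starts[step * ib];
    }
    return binStart;
}

// Hypothetical lookup: index of the last bin whose start is <= key.
std::uint32_t findBin(const std::vector<std::uint64_t> &binStart, std::uint64_t key) {
    auto it = std::upper_bound(binStart.begin(), binStart.end(), key);
    return static_cast<std::uint32_t>(it - binStart.begin()) - 1;
}
```

For six sampled starts 10 to 60 in steps of 10 and `nBins=4`, the mapped bins start at 0, 30 and 50. Equal boundaries can still occur when many alignments share one start, which is the open question flagged in the source comment.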
//mutex here
if (P->runThreadN>1) pthread_mutex_unlock(&g_threadChunks.mutexBAMsortBins);
if (P.runThreadN>1) pthread_mutex_unlock(&g_threadChunks.mutexBAMsortBins);
//re-allocate binStart
uint binTotalNold=binTotalN[0];
......@@ -186,7 +182,7 @@ void BAMoutput::coordFlush () {//flush all alignments
};
void BAMoutput::coordUnmappedPrepareBySJout () {//flush all alignments
uint iBin=P->outBAMcoordNbins-1;
uint iBin=P.outBAMcoordNbins-1;
binStream[iBin]->write(binStart[iBin],binBytes[iBin]);
binStream[iBin]->flush();
binBytes[iBin]=0;//rewind the buffer
......