Skip to content
Commits on Source (3)
STAR 2.7.2b 2019/08/29
======================
Bug fixes in chimeric detection, contributed by Meng Xiao He (@mengxiao)
* Fix memory leak in handling chimeric multimappers: #721
* Ensure chimeric alignment score requirements are consistently checked: #722,#723.
STAR 2.7.2a 2019/08/13
======================
* Chimeric read reporting now requires that the chimeric read alignment score higher than the alternative non-chimeric alignment to the reference genome. The Chimeric.out.junction file now includes the scores of the chimeric alignments and non-chimeric alternative alignments, in addition to the PEmerged bool attribute. (bhaas, Aug 2019)
* Fixed the problem with ALT=* in STAR-WASP.
* Implemented extras/scripts/soloBasicCellFilter.awk script to perform basic filtering of the STARsolo count matrices.
STAR 2.7.1a 2019/05/15
======================
**This version requires re-generation of the genome indexes**
......
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at dobin@cshl.edu. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
# Contributing to STAR
The following is a set of guidelines for contributing to STAR hosted in the GitHub: https://github.com/alexdobin/STAR/.
These are mostly guidelines, not rules.
Use your best judgment, and feel free to propose changes to this document in a pull request.
#### Table Of Contents
[Code of Conduct](#code-of-conduct)
[How Can I Contribute?](#how-can-i-contribute)
* [Ask a question](#ask-a-question)
* [Reporting Bugs](#reporting-bugs)
* [Suggesting Enhancements](#suggesting-enhancements)
* [Pull Requests](#pull-requests)
## Code of Conduct
This project and everyone participating in it is governed by the [Code of Conduct](CODE_OF_CONDUCT.md).
By participating, you are expected to uphold this code.
## How Can I Contribute?
### Ask a question
Please do not file an issue to ask a question.
The GitHub issue tracker is intended for bug reports and feature requests.
We have an official discussion forum where the community chimes in with helpful advice if you have questions.
* [STAR discussion forum](https://groups.google.com/forum/#!forum/rna-star)
### Reporting Bugs
This section guides you through submitting a bug report for STAR.
Following these guidelines helps maintainers and the community understand your report,
reproduce the behavior, and find related reports.
If you find a **Closed** issue that seems like it is the same thing that you're experiencing,
open a new issue and include a link to the original issue in the body of your new one.
#### Before Submitting A Bug Report
* You might be able to find the cause of the problem and fix things yourself.
Most importantly, check if you can reproduce the problem in the latest version of STAR
and if the problem happens when you run with mostly default parameters.
* Check the Log.out file for ERROR/WARNING/SOLUTION messages.
* Perform a through STAR GitHub issues to see if the problem has already been reported.
If it has **and the issue is still open**, add a comment to the existing issue instead of opening a new one.
#### How Do I Submit A (Good) Bug Report?
Bugs are tracked as [GitHub issues](https://guides.github.com/features/issues/) on https://github.com/alexdobin/STAR/issues.
Explain the problem and include additional details to help maintainers reproduce the problem:
* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps which reproduce the problem** in as many details as possible. For example, start by explaining how you run STAR, e.g. which command exactly you used in the terminal. When listing steps, **don't just say what you did, but explain how you did it.
* **Log.out**. Attach the Log.out file generated in the failed run. This file contains a lot of useful debugging information and is a starting point
* **Provide specific examples to demonstrate the steps**. Include links to files or GitHub projects, or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
* **Explain which behavior you expected to see instead and why.**
* **System information**. In many cases, the problems are associated with the hardware configurations. Provide a brief description of the CPU(s), RAM, storage.
* **Can you reliably reproduce the issue?** If not, provide details about how often the problem happens and under which conditions it normally happens.
Include details about your configuration and environment:
### Suggesting Enhancements
This section guides you through submitting an enhancement suggestion for Atom, including completely new features and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion and find related suggestions.
#### Before Submitting An Enhancement Suggestion
* **Check the STAR manual and Release Notes** — you might discover that the enhancement is already available. Most importantly, check if you're using the latest version of STAR.
* **Perform a cursory search** to see if the enhancement has already been suggested. If it has, add a comment to the existing issue instead of opening a new one.
#### How Do I Submit A (Good) Enhancement Suggestion?
Feature requests and enhancement suggestions are tracked as [GitHub issues](https://guides.github.com/features/issues/) on https://github.com/alexdobin/STAR/issues.
* **Use a clear and descriptive title** for the issue to identify the suggestion.
* **Provide a step-by-step description of the suggested enhancement** in as many details as possible.
* **Provide specific examples to demonstrate the steps**. Include copy/pasteable snippets which you use in those examples, as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the current behavior** and **explain which behavior you expected to see instead** and why.
* **Explain why this enhancement would be useful** to many STAR users.
* **List some other tools where this enhancement exists.**
### Pull Requests
Please read the guides on creating good pull requests (PR) and follow these guidelines:
* **Use a clear and descriptive title** for the pull request.
* **State the purpose of the PR**: is it a bug-fix, new feature implementation, documentation improvement, or cosmetic change? Why is it important?
* **Explain the expected changes in STAR behavior**. Make sure that the default STAR behavior does not change.
* **Provide detailed code documentation and commit messages**.
Adopted from https://github.com/atom/atom/blob/master/CONTRIBUTING.md
STAR
Spliced Transcripts Alignment to a Reference
© Alexander Dobin, 2009-2015
MIT License
STAR is a free open source software distributed under GPLv3
Copyright (c) 2019 Alexander Dobin
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
For details, see <http://www.gnu.org/licenses/>.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
......@@ -35,9 +35,9 @@ Download the latest [release from](https://github.com/alexdobin/STAR/releases) a
```bash
# Get latest STAR source from releases
wget https://github.com/alexdobin/STAR/archive/2.7.1a.tar.gz
tar -xzf 2.7.1a.tar.gz
cd STAR-2.7.1a
wget https://github.com/alexdobin/STAR/archive/2.7.2b.tar.gz
tar -xzf 2.7.2b.tar.gz
cd STAR-2.7.2b
# Alternatively, get STAR source using git
git clone https://github.com/alexdobin/STAR.git
......
STAR 2.7.2a 2019/08/13
======================
Chimeric read alignment reporting updates
-----------------------------------------
The chimeric read alignment reporting has been changed to improve on the specificity of chimeric read detection.
Only those chimeric read aligments with alignment scores that exceed the corresponding score of the non-chimeric
alignment to the reference genome are now reported.
The Chimeric.junction.out file formatting has been updated to include column headers and the alignment scores for
both chimeric alignments and the non-chimeric alternative. See the latest STAR manual for full details.
STAR 2.7.0c 2019/02/05
======================
......
theme: jekyll-theme-slate
\ No newline at end of file
version=3
# Versions ending with a letter are patches, not early releases
opts="repacksuffix=+dfsg,dversionmangle=s/\+dfsg//g,repack,compression=xz" \
https://github.com/alexdobin/STAR/releases .*/archive/(?:STAR_?)?(\d[\d.a-z]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
No preview for this file type
STARsolo: mapping, demultiplexing and gene quantification for single cell RNA-seq
---------------------------------------------------------------------------------
First released in STAR 2.7.0a (Jan 23 2019)
STARsolo is a turnkey solution for analyzing droplet single cell RNA sequencing data (e.g. 10X Genomics Chromium System) built directly into STAR code.
STARsolo inputs the raw FASTQ reads files, and performs the following operations
* error correction and demultiplexing of cell barcodes using user-input whitelist
* mapping the reads to the reference genome using the standard STAR spliced read alignment algorithm
* error correction and collapsing (deduplication) of Unique Molecular Identifiers (UMIa)
* quantification of per-cell gene expression by counting the number of reads per gene
STARsolo output is designed to be a drop-in replacement for 10X CellRanger gene quantification output.
It follows CellRanger logic for cell barcode whitelisting and UMI deduplication, and produces nearly identical gene counts in the same format.
At the same time STARsolo is ~10 times faster than the CellRanger.
The STAR solo algorithm is turned on with:
```
--soloType Droplet
```
Presently, the cell barcode whitelist has to be provided with:
```
--soloCBwhitelist /path/to/cell/barcode/whitelist
```
The 10X Chromium whitelist file can be found inside the CellRanger distribution,
e.g. [10X-whitelist](https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist-).
Please make sure that the whitelist is compatible with the specific version of the 10X chemistry (V1,V2,V3 etc).
Importantly, in the --readFilesIn option, the 1st file has to be cDNA read, and the 2nd file has to be the barcode (cell+UMI) read, i.e.
```
--readFilesIn cDNAfragmentSequence.fastq.gz CellBarcodeUMIsequence.fastq.gz
```
Important: the genome index has to be re-generated with the latest 2.7.0x release.
Other parameters that control STARsolo output are listed below. Note that default parameters are compatible with 10X Chromium V2 protocol.
```
soloCBstart 1
int>0: cell barcode start base
soloCBlen 16
int>0: cell barcode length
soloUMIstart 17
int>0: UMI start base
soloUMIlen 10
int>0: UMI length
soloStrand Forward
string: strandedness of the solo libraries:
Unstranded ... no strand information
Forward ... read strand same as the original RNA molecule
Reverse ... read strand opposite to the original RNA molecule
soloFeatures Gene
string(s) genomic features for which the UMI counts per Cell Barcode are collected
Gene ... genes: reads match the gene transcript
SJ ... splice junctions: reported in SJ.out.tab
soloUMIdedup 1MM_All
string(s) type of UMI deduplication (collapsing) algorithm
1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once)
1MM_Directional ... follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017).
1MM_NotCollapsed ... UMIs with 1 mismatch distance to others are not collapsed (i.e. all counted)
soloOutFileNames Solo.out/ genes.tsv barcodes.tsv matrix.mtx matrixSJ.mtx
string(s) file names for STARsolo output
1st word ... file name prefix
2nd word ... barcode sequences
3rd word ... gene IDs and names
4th word ... cell/gene counts matrix
5th word ... cell/splice junction counts matrix
```
......@@ -34,7 +34,7 @@
\newcommand{\sechyperref}[1]{\hyperref[#1]{Section \ref{#1}. \nameref{#1}}}
\title{STAR manual 2.7.1a}
\title{STAR manual 2.7.2b}
\author{Alexander Dobin\\
dobin@cshl.edu}
\maketitle
......@@ -364,24 +364,35 @@ SINATRA-0006:3:3:6387:5665#0 23632554 47M29S 133729451 47S29M40p76M
The first 9 columns give information about the chimeric junction:
\begin{itemize}[leftmargin=1in]
\item[column 1:] chromosome of the donor
\item[column 2:] first base of the intron of the donor (1-based)
\item[column 3:] strand of the donor
\item[column 4:] chromosome of the acceptor
\item[column 5:] first base of the intron of the acceptor (1-based)
\item[column 6:] strand of the acceptor
\item[column 7:] junction type: -1=encompassing junction (between the mates), 1=GT/AG, 2=CT/AC
\item[column 8:] repeat length to the left of the junction
\item[column 9:] repeat length to the right of the junction
\item[column 1:] \textbf{chr\_donorA} : chromosome of the donor
\item[column 2:] \textbf{brkpt\_donorA} : first base of the intron of the donor (1-based)
\item[column 3:] \textbf{strand\_donorA} : strand of the donor
\item[column 4:] \textbf{chr\_acceptorB} : chromosome of the acceptor
\item[column 5:] \textbf{brkpt\_acceptorB} : first base of the intron of the acceptor (1-based)
\item[column 6:] \textbf{strand\_acceptorB} : strand of the acceptor
\item[column 7:] \textbf{junction\_type} : -1=encompassing junction (between the mates), 1=GT/AG, 2=CT/AC
\item[column 8:] \textbf{repeat\_left\_lenA} : repeat length to the left of the junction
\item[column 9:] \textbf{repeat\_right\_lenB} : repeat length to the right of the junction
\end{itemize}
Columns 10-14 describe the alignments of the two chimeric segments, it is SAM like. Alignments are given with respect to the (+) strand
\begin{itemize}[leftmargin=1in]
\item[column 10:] read name
\item[column 11:] first base of the first segment (on the + strand)
\item[column 12:] CIGAR of the first segment
\item[column 13:] first base of the second segment
\item[column 14:] CIGAR of the second segment
\item[column 10:] \textbf{read\_name} : name of the RNA-seq fragment
\item[column 11:] \textbf{start\_alnA} : first base of the first segment (on the + strand)
\item[column 12:] \textbf{cigar\_alnA} : CIGAR of the first segment
\item[column 13:] \textbf{start\_alnB} : first base of the second segment
\item[column 14:] \textbf{cigar\_alnB} : CIGAR of the second segment
\end{itemize}
Columns 15-20 provide alignment score information and relevant metadata. These columns are only output for multimapping chimeriuc algorithm \opt{chimMultimapNmax} \optv{>0}.
\begin{itemize}[leftmargin=1in]
\item[column 15:] \textbf{num\_chim\_aln} : number of sufficiently scoring chimeric alignments reported for this RNA-seq fragment.
\item[column 16:] \textbf{max\_poss\_aln\_score} : maximum possible alignment score for this fragment's read(s).
\item[column 17:] \textbf{non\_chim\_aln\_score} : best non-chimeric alignment score
\item[column 18:] \textbf{this\_chim\_aln\_score} : score for this individual chimeric alignment
\item[column 19:] \textbf{bestall\_chim\_aln\_score} : the highest chimeric alignment score encountered for this RNA-seq fragment among the \textbf{num\_chim\_aln} reported chimeric alignments.
\item[column 20:] \textbf{PEmerged\_bool} : boolean indicating that overlapping PE reads were first merged into a single contiguous sequence before alignment.
\item[column 21:] \textbf{readgrp} : read group assignment for the read as indicated in the BAM file
\end{itemize}
Unlike standard SAM, both mates are recorded in one line here. The gap of length \code{L} between the mates is marked by the \code{p} in the CIGAR string.
......@@ -416,8 +427,12 @@ Note that line 4 and 6 here are BCR/ABL fusions. You would need to filter these
\section{Output in transcript coordinates.}
With \opt{quantMode} \optv{TranscriptomeSAM} option STAR will output alignments translated into transcript coordinates in the \ofilen{Aligned.toTranscriptome.out.bam} file (in addition to alignments in genomic coordinates in \ofilen{Aligned.*.sam/bam} files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress. For example, RSEM command line would look as follows: \codelines{rsem-calculate-expression ... --bam Aligned.toTranscriptome.out.bam /path/to/RSEM/reference RSEM}.
Note, that STAR first aligns reads to entire genome, and only then searches for concordance between alignments and transcripts. I believe this approach might offer certain advantages compared to the alignment to transcriptome only, by not forcing the alignments to annotated transcripts.
With \opt{quantMode} \optv{TranscriptomeSAM} option STAR will output alignments translated into transcript coordinates in the \ofilen{Aligned.toTranscriptome.out.bam} file (in addition to alignments in genomic coordinates in \ofilen{Aligned.*.sam/bam} files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress. For example, RSEM command line would look as follows: \codelines{rsem-calculate-expression ... --bam Aligned.toTranscriptome.out.bam /path/to/RSEM/reference RSEM}
Note, that STAR first aligns reads to entire genome, and only then searches for concordance between alignments and transcripts.This approach offers certain advantages compared to the alignment to transcriptome only, by not forcing the alignments to annotated transcripts. Note that \opt{outFilterMultimapNmax} filter only applies to genomic alignments. If an alignment passes this filter, it is converted to all possible transcriptomic alignments and all of them are output.
By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use \opt{quantTranscriptomeBan} \optv{Singleend} to allow insertions, deletions ans soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
......@@ -506,7 +521,7 @@ Previous STAR chimeric detection algorithm only detected uniquely mapping chimer
The new algorithm can detect and output multimapping chimeras. Presently, the only output into Chimeric.out.junction is supported.
This algorithm is activated with $>0$ value in \optv{chimMultimapNmax}, which defines the maximum number of chimeric multi-alignments.
The \optv{chimMultimapScoreRange} ($=1$ by default) parameter defines the score range for multi-mapping chimeras below the best chimeric score, similar to the \optv{outFilterMultimapScoreRange} parameter for normal alignments.
The \optv{chimNonchimScoreDropMin} ($=20$ by default) defines the threshold triggering chimeric detection: the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value.
The \optv{chimNonchimScoreDropMin} ($=20$ by default) defines the threshold triggering chimeric detection: the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value.
\section{STARsolo: mapping, demultiplexing and gene quantification for single cell RNA-seq}
......
......@@ -725,7 +725,7 @@
\optLine{int{\textgreater}=0: the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax {\textgreater} 1}
\optName{chimNonchimScoreDropMin}
\optValue{20}
\optLine{int{\textgreater}=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be smaller than this value}
\optLine{int{\textgreater}=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value}
\optName{chimOutJunctionFormat}
\optValue{0}
\optLine{int: formatting type for the Chimeric.out.junction file}
......
......@@ -2,7 +2,7 @@ FROM debian:stretch-slim
MAINTAINER dobin@cshl.edu
ARG STAR_VERSION=2.7.1a
ARG STAR_VERSION=2.7.2b
ENV PACKAGES gcc g++ make wget zlib1g-dev unzip
......
......@@ -12,6 +12,7 @@ function cigarGenomicDist(cig)
};
BEGIN {
endTol=5;
OFS="\t";
};
{
if ( $7>=0 && $1==$4 && $3==$6 && (($3=="-" && $5>$2 && $5-$2<1000000) || ($3=="+" && $2>$5 && $2-$5<1000000)) )
......@@ -20,7 +21,7 @@ if ( $7>=0 && $1==$4 && $3==$6 && (($3=="-" && $5>$2 && $5-$2<1000000) || ($3=="
#print $11,$11+cigarGenomicDist($12),$13,$13+cigarGenomicDist($14);
if ( ($3=="+" && $11+endTol>$5 && $13+cigarGenomicDist($14)-endTol<=$2) \
|| ($3=="-" && $13+endTol>$2 && $11+cigarGenomicDist($12)-endTol<=$5) ) {
print $1,($3=="+"?$5:$2),($3=="+"?$2:$5),($3=="+"?"-":"+"),($7==0?0:3-$7),$8,$9,(NF>=15 ? $NF:1);
print $1,($3=="+"?$5:$2),($3=="+"?$2:$5),($3=="+"?"-":"+"),($7==0?0:3-$7),$8,$9,(NF>=15 ? $15:1);
};
};
};
# usage awk -v exactCells=... -v maxCells=... -v maxPercentile=... -v maxMinRatio=... -f soloBasicCellFilter.awk
BEGIN {
# default values - if variables were not defined by the user
if (exactCells==0)
exactCells=0;
if (maxCells==0)
maxCells=3000;
if (maxPercentile==0)
maxPercentile=0.99;
if (maxMinRatio==0)
maxMinRatio=10;
print "Parameters: " "exactCells=" exactCells, "maxCells=" maxCells, "maxPercentile=", maxPercentile, "maxMinRatio=", maxMinRatio;
nHeaderLines=3;
fOutCB=ARGV[1] ".filtered";
fOutMat=ARGV[2] ".filtered";
}
{
if (ARGIND==1) {# read barcodes
CB[NR]=$1;
} else if (FNR<=nHeaderLines) {
A[FNR]=$0;
if (FNR==nHeaderLines) {
nGenes=$1;
};
} else {
cellG[FNR]=$1;
cellI[FNR]=$2;
cellN[FNR]=$3;
cellTot[$2]+=$3;
nLines=FNR;
};
}
END {
asort(cellTot,cellTotSort);
nMax=cellTotSort[-int((1-maxPercentile)*maxCells)+length(cellTot)];
nMin=nMax/maxMinRatio;
if (exactCells>0)
nMin=length(cellTot)<exactCells ? cellTotSort[1] : cellTotSort[length(cellTot)-exactCells];
nCell=0;
for (ii=1; ii<=length(CB); ii++) {
if (cellTot[ii]>=nMin) {
print CB[ii] > fOutCB;
nCell++;
cellInew[ii]=nCell;
print nCell,cellTot[ii] > fOutCB ".counts";
};
};
if (exactCells==0) {
print "maxUMIperCell=" cellTotSort[length(cellTotSort)]+0,"Robust maxUMIperCel=" nMax,"minUMIperCell=" nMin, "Filtered N cells=", nCell;
} else {
print "total N cells=" length(cellTot), "exactCells=" exactCells, "minUMIperCell=" nMin;
};
nMat=0;
for (ii=nHeaderLines+1; ii<=nLines; ii++) {
if (cellTot[cellI[ii]]>=nMin) {
nMat++;
};
};
for (ii=1;ii<nHeaderLines;ii++) {
print A[ii] > fOutMat;
};
print nGenes,nCell+0,nMat+0 > fOutMat;
for (ii=nHeaderLines+1; ii<=nLines; ii++) {
if (cellTot[cellI[ii]]>=nMin) {
print cellG[ii],cellInew[cellI[ii]],cellN[ii] > fOutMat;
};
};
};
......@@ -23,8 +23,8 @@ class ChimericAlign
int chimMotif, chimStr, chimScore;
ChimericAlign(ChimericSegment &seg1in, ChimericSegment &seg2in, int chimScoreIn, Genome &genomeIn, ReadAlign *RAin); //allocate
void chimericJunctionOutput(fstream &outStream, uint chimN);
void chimericStitching(char *genSeq, char *readSeq);
void chimericJunctionOutput(fstream &outStream, uint chimN, int maxNonChimAlignScore, bool PEmerged_flag, int chimScoreBest, int maxPossibleAlignScore);
void chimericStitching(char *genSeq, char **Read1);
bool chimericCheck();
bool stitchingDone;
......
#include "ChimericAlign.h"
#include "ReadAlign.h"
void ChimericAlign::chimericJunctionOutput(fstream &outStream, uint chimN)
void ChimericAlign::chimericJunctionOutput(fstream &outStream, uint chimN, int maxNonChimAlignScore, bool PEmerged_flag, int chimScoreBest, int maxPossibleAlignScore)
{
outStream << mapGen.chrName[al1->Chr] <<"\t"<< chimJ1 - mapGen.chrStart[al1->Chr]+1 <<"\t"<< (al1->Str==0 ? "+":"-") \
<<"\t"<< mapGen.chrName[al2->Chr] <<"\t"<< chimJ2 - mapGen.chrStart[al2->Chr]+1 <<"\t"<< (al2->Str==0 ? "+":"-") \
<<"\t"<< chimMotif <<"\t"<< chimRepeat1 <<"\t"<< chimRepeat2 <<"\t"<< al1->readName+1 \
<<"\t"<< al1->exons[0][EX_G] - mapGen.chrStart[al1->Chr]+1 <<"\t"<< al1->generateCigarP() \
<<"\t"<< al2->exons[0][EX_G] - mapGen.chrStart[al2->Chr]+1 <<"\t"<< al2->generateCigarP() <<"\t"<< chimN;
<<"\t"<< al2->exons[0][EX_G] - mapGen.chrStart[al2->Chr]+1 <<"\t"<< al2->generateCigarP()
<<"\t"<< chimN // number of multimapping chimeric alignments for this read.
<< "\t" << maxPossibleAlignScore // the maximum possible alignment score (currently the sum of the (paired) read lengths)
<< "\t" << maxNonChimAlignScore // trBest - the best alignment score from a non-chimeric alignment of this read to the ref genome.
<< "\t" << chimScore // current chimeric alignment score
<< "\t" << chimScoreBest // best chimeric score among multimapping chimeric alignments.
<< "\t" << PEmerged_flag; // boolean indicating paired reads were merged into a single read before alignment & chimer detection.
if (P.outSAMattrPresent.RG)
outStream <<"\t"<< P.outSAMattrRG.at(RA->readFilesIndex);
if (P.pSolo.type>0)
......
#include "ChimericAlign.h"
void ChimericAlign::chimericStitching(char *genSeq, char *readSeq) {
void ChimericAlign::chimericStitching(char *genSeq, char **Read1) {
if (stitchingDone)
return;
stitchingDone=true;
char *readSeq=Read1[0]; //only direct read sequence is used - reverse complemented if necessary in the algorithm
al1=new Transcript(*al1);
al2=new Transcript(*al2);
......@@ -168,9 +170,12 @@ void ChimericAlign::chimericStitching(char *genSeq, char *readSeq) {
chimRepeat1=jR;
};//chimeric junction is within a mate
if (chimMotif>=0 && (a1.exons[ex1][EX_L]<P.pCh.junctionOverhangMin+chimRepeat1 || a2.exons[ex2][EX_L]<P.pCh.junctionOverhangMin+chimRepeat2) ) {
if (chimMotif>=0 && (a1.exons[ex1][EX_L]<P.pCh.junctionOverhangMin || a2.exons[ex2][EX_L]<P.pCh.junctionOverhangMin) ) {
//filter out cases where linear junctions that are very close to chimeric junction
chimScore=0;
return;
};
//re-calculate chimScore for adjusted transcripts
chimScore=a1.alignScore(Read1,genSeq,P) + a2.alignScore(Read1,genSeq,P) + (chimMotif==0 ? P.pCh.scoreJunctionNonGTAG : 0);
};
......@@ -26,7 +26,7 @@ class ChimericDetection {
int chimScoreBest;
ChimericDetection(Parameters &Pin, Transcript ***trAll, uint *nWinTr, char** Read1in, Genome &genomeIn, fstream *ostreamChimJunctionIn, ReadAlign *RA);
bool chimericDetectionMult(uint nWin, uint *readLengthIn);
bool chimericDetectionMult(uint nWin, uint *readLengthIn, int maxNonChimAlignScore, bool PEmerged_flag);
fstream *ostreamChimJunction;
};
......
......@@ -19,9 +19,7 @@ int chimericAlignScore (ChimericSegment & seg1, ChimericSegment & seg2)
};
/////////////////////////////////////////////////////////////
bool ChimericDetection::chimericDetectionMult(uint nW, uint *readLength) {
chimRecord=false;
bool ChimericDetection::chimericDetectionMult(uint nW, uint *readLength, int maxNonChimAlignScore, bool PEmerged_flag) {
// for (uint ii=0;ii<chimAligns.size();ii++) {//deallocate aligns
// if (chimAligns.at(ii).stitchingDone) {//al1,al2 were allocated
......@@ -40,6 +38,13 @@ bool ChimericDetection::chimericDetectionMult(uint nW, uint *readLength) {
chimAligns.clear();
chimScoreBest=0;
int maxPossibleAlignScore = (int)(readLength[0]+readLength[1]);
int minScoreToConsider = P.pCh.scoreMin;
if (maxNonChimAlignScore >= minScoreToConsider)
minScoreToConsider = maxNonChimAlignScore + 1;
if ((maxPossibleAlignScore - P.pCh.scoreDropMax) > minScoreToConsider)
minScoreToConsider = maxPossibleAlignScore - P.pCh.scoreDropMax;
for (uint iW1=0; iW1<nW; iW1++) {//cycle windows
for (uint iA1=0; iA1<nWinTr[iW1]; iA1++) {//cycle aligns in the window
......@@ -62,57 +67,54 @@ bool ChimericDetection::chimericDetectionMult(uint nW, uint *readLength) {
int chimScore=chimericAlignScore(seg1,seg2);
if (chimScore>0)
{//candidate chimera
if (chimScore >= minScoreToConsider) {//candidate chimera
ChimericAlign chAl(seg1, seg2, chimScore, outGen, RA);
if (!chAl.chimericCheck())
continue; //check chimeric alignment
if (chimScore>=chimScoreBest-(int)P.pCh.multimapScoreRange)
//re-calculated chimScoreBest includes non-canonical penalty, so the re-calculated score is lower, in some cases it goes to 0 if some checks are not passed
chAl.chimericStitching(outGen.G, Read1);
// rescore after stitching.
if (chAl.chimScore >= minScoreToConsider) { // survived stitching.
chimAligns.push_back(chAl);//add this chimeric alignment
if ( chimScore > chimScoreBest && chimScore >= P.pCh.scoreMin && chimScore >= (int)(readLength[0]+readLength[1]) - P.pCh.scoreDropMax ) {
chimAligns.back().chimericStitching(outGen.G, Read1[0]);
if (chimAligns.back().chimScore > chimScoreBest)
if (chimAligns.back().chimScore > chimScoreBest) {
chimScoreBest=chimAligns.back().chimScore;
if ((chimScoreBest - (int)P.pCh.multimapScoreRange) > minScoreToConsider)
// best score increased, so subsequent alignment candidates must score higher
minScoreToConsider = chimScoreBest - (int)P.pCh.multimapScoreRange;
}
} // endif stitched chimera survived.
else {
// al1, al2 allocated during stitching
delete chAl.al1;
delete chAl.al2;
};
};
}; // endif meets chim score criteria
};//cycle over window2 aligns
};//cycle over window2
};//cycle over window1 aligns
};//cycle over window1
if (chimScoreBest==0)
return chimRecord;
return false;
chimN=0;
for (auto cAit=chimAligns.begin(); cAit<chimAligns.end(); cAit++) {//scan all chimeras, find the number within score range
if (cAit->chimScore >= chimScoreBest - (int)P.pCh.multimapScoreRange)
for (auto cAit=chimAligns.begin(); cAit<chimAligns.end(); cAit++) {
//scan all chimeras, find the number within score range
if (cAit->chimScore >= minScoreToConsider)
++chimN;
};
if (chimN > 2*P.pCh.multimapNmax) //too many loci (considering 2* more candidates for stitching below)
return chimRecord;
chimN=0;
for (auto cAit=chimAligns.begin(); cAit<chimAligns.end(); cAit++) {//re-scan all chimeras: stitch and re-check the score
if (cAit->chimScore >= chimScoreBest-(int)P.pCh.multimapScoreRange) {
cAit->chimericStitching(outGen.G, Read1[0]);
if (cAit->chimScore >= chimScoreBest - (int)P.pCh.multimapScoreRange)
++chimN;
};
};
if (chimN > P.pCh.multimapNmax) //too many loci
return chimRecord;
return false;
for (auto cAit=chimAligns.begin(); cAit<chimAligns.end(); cAit++) {//output chimeras within score range
if (cAit->chimScore >= chimScoreBest-(int)P.pCh.multimapScoreRange)
cAit->chimericJunctionOutput(*ostreamChimJunction, chimN);
if (cAit->chimScore >= minScoreToConsider)
cAit->chimericJunctionOutput(*ostreamChimJunction, chimN, maxNonChimAlignScore, PEmerged_flag, chimScoreBest, maxPossibleAlignScore);
};
if (chimN>0)
chimRecord=true;
return chimRecord;
return chimN > 0;
};//END