Commits on Source (10)
......@@ -39,3 +39,4 @@ m4/*m4
*.plist
tests/*log
snakemake/.snakemake/
2017-06-21 Changes in 2.1.3.1 Andreas Wilm <wilma@gis.a-star.edu.sg>
* Fixed bug introduced last minute in 2.1.3 that creates segfault if call is used without -o
2017-06-21 Changes in 2.1.3 Andreas Wilm <wilma@gis.a-star.edu.sg>
* Maintenance release before major rewrite
* Added Python3 support
* Added best practices snakemake workflow
* Little easier on memory in high coverage situations
* Added --force-overwrite option to 'call'
2015-05-19 Changes in 2.1.2 Andreas Wilm <wilma@gis.a-star.edu.sg>
* 'indelqual' now allows reading BAM from stdin
* Fixed bug in 'call' which resulted in negative phred quality
filter, when pvalue alpha was above 1 and number of tests was low
* 'indelqual' dindel now deletes BI/BD before inserting
* remove unnecessary dependency on kaln.h (not present in samtools
1.2)
* 'uniq' now closing output vcf filehandle on error, thus always
writing at least a header (reported by DNANexus)
* Added HRUN info field to output vcf
* Fixed calling of indel consvars
* Removed options (and use of) cons-as-ref and skip-n. now reference
is always used by default to call against and n's are always
skipped. also means the consensus variants (CONSVAR) concept
disappeared
* Set DEFAULT_MIN_PLP_IDQ to zero
* Caught yet another variant of the reference sequence name
mismatch problem
* 'viterbi': memory allocation now mainly dynamic.
fixes observed segfault on pacbio reads (unclear why though)
* Low AF false positive multi-allelic 1bp indel adjacent to
poly-AT now filtered by default.
* 'indelqual': added support for adding uniform insertion and
deletion qualities (instead of just indel qualities)
* indel calling: fixed index violation while accessing pdi[u] in
idaq happening while processing pacbio reads. added bound check as
hack (idaq() mostly illumina specific anyway)
* Removed MAX_READ_LEN globally
* 'call': added special case for SB test: if ref is entirely missing
and we have alts on only one strand fisher's exact test will
return 0, which is most certainly not what we want. setting to
INT_MAX instead
* vcfset: only-[type] now correctly dealt with in vcf2 on top of vcf1.
* vcfset: fixed bug which matched vars even if they only overlapped
partially (now also checking position instead of relying on the
tabix iterator)
* Reference sequences now converted to uppercase after fetching to be safe.
This also addresses the "AQ-bug" where low AQ values were reported
because of a lower-case reference
* 'pparallel': made bed reading function standard conform and more
fault tolerant. e.g. now allowing browser and track lines
* 'somatic': now also producing germline indels
* 'somatic': min cov lowered to 7
* 'somatic': normal stringent now has separate parameters, i.e. independent of
tumor stringent (set to fdr 1%)
* 'somatic': sq ignore normal now ignoring indels and snvs
* 'somatic': Added support for multiple ignore vcf files
* 'filter': corrected wrong info about default sb mtc method
* Changed MQ0 prob to a generic 0.5
* Fixed bug in fdr application in filter: previous versions
called more tests significant than actually true. Fixed by setting
all values to not significant right after calling fdr()
* 'somatic': added --min-cov option
Changes in versions before 2.1.2: See http://csb5.github.io/lofreq/blog/
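The 2.1.2 entry about a negative phred quality filter can be illustrated with a small sketch. This is not LoFreq's code (which is written in C); the function name `bonferroni_phred_threshold` and the clamping strategy are assumptions made only to show why a corrected alpha above 1 produces a negative phred value and how a clamp avoids it.

```python
import math


def bonferroni_phred_threshold(alpha: float, num_tests: int) -> int:
    """Phred-scaled cutoff for a Bonferroni-corrected significance level.

    Hypothetical helper: the corrected level is alpha / num_tests, and
    phred = -10 * log10(p). If the corrected level exceeds 1, the phred
    value would be negative, so the level is clamped to 1 first.
    """
    if alpha <= 0 or num_tests < 1:
        raise ValueError("alpha must be > 0 and num_tests >= 1")
    corrected = min(alpha / num_tests, 1.0)
    return max(0, int(round(-10.0 * math.log10(corrected))))


if __name__ == "__main__":
    print(bonferroni_phred_threshold(0.01, 1))  # ordinary case
    print(bonferroni_phred_threshold(5.0, 2))   # alpha above 1, few tests
```

Without the clamp, an alpha of 5.0 with 2 tests gives a corrected level of 2.5 and a phred value of about -4, which matches the failure mode the changelog entry describes.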
......@@ -12,7 +12,7 @@ Licenses external libraries (part of the statically compiled binary):
The MIT License (MIT)
Copyright (c) 2013,2014 Genome Institute of Singapore
Copyright (c) 2013-2017 Genome Institute of Singapore
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
......
......@@ -22,10 +22,10 @@ The source hosted here on github is mainly for developers!
You will need:
- a C compiler (e.g. gcc or clang)
- a Python 2.7 interpreter
- a Python 2.7 or Python 3 interpreter
- zlib developer files
- a compiled version of [samtools (>=1.1)](http://sourceforge.net/projects/samtools/files/samtools/1.1/samtools-1.1.tar.bz2/download)
- a compiled version of htslib (>= 1.1; use the one that comes bundled with samtools!)
- a compiled version of [samtools 1.1](http://sourceforge.net/projects/samtools/files/samtools/1.1/samtools-1.1.tar.bz2/download)
- a compiled version of htslib 1.1 (use the one that comes bundled with samtools!)
### Compilation
......
......@@ -5,7 +5,7 @@ AC_PREREQ(2.63)
# 2.64 which allows to define a URL as well
# 2.68 seems to have updated ax_pthread
AC_INIT([LoFreq_Star], [2.1.2], [wilma@gis.a-star.edu.sg])
AC_INIT([LoFreq_Star], [2.1.3.1], [wilma@gis.a-star.edu.sg])
# The AC_INIT macro can take any source file as an argument. It just
# checks that the file is there, which should, in turn, mean that the
......
lofreq (2.1.2+ds-1) UNRELEASED; urgency=low
lofreq (2.1.3.1+dfsg-1) UNRELEASED; urgency=low
* Initial release (Closes: #808895)
* debian/upstream/metadata: Added registry references
(Steffen Moeller)
-- Afif Elghraoui <afif@ghraoui.name> Wed, 23 Dec 2015 11:35:17 -0800
-- Afif Elghraoui <afif@ghraoui.name> Mon, 29 Oct 2018 09:08:23 +0100
Source: lofreq
Maintainer: Debian-Med-Packaging Team <debian-med-packaging@lists.alioth.debian.org>
Uploaders: Afif Elghraoui <afif@ghraoui.name>
Section: science
Priority: optional
Maintainer: Debian-Med-Packaging Team <debian-med-packaging@lists.alioth.debian.org>
Uploaders:
Afif Elghraoui <afif@ghraoui.name>,
Build-Depends:
debhelper (>= 9),
dh-autoreconf,
Build-Depends: debhelper (>= 11~),
python-all,
zlib1g-dev,
uthash-dev,
libhts-dev,
Standards-Version: 3.9.6
libhts-dev
Standards-Version: 4.2.1
Vcs-Browser: https://salsa.debian.org/med-team/lofreq
Vcs-Git: https://salsa.debian.org/med-team/lofreq.git
Homepage: http://csb5.github.io/lofreq/
Vcs-Git: git://anonscm.debian.org/debian-med/lofreq.git
Vcs-Browser: http://anonscm.debian.org/cgit/debian-med/lofreq.git
Package: lofreq
Architecture: any
Depends:
${shlibs:Depends},
Depends: ${shlibs:Depends},
${misc:Depends}
Description: sensitive variant calling from sequencing data
LoFreq* (i.e. LoFreq version 2) is a fast and sensitive
......
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: lofreq
Upstream-Contact: <wilma@gis.a-star.edu.sg>
Source: https://github.com/CSB5/lofreq
......
......@@ -11,4 +11,4 @@ export HTSLIB=/usr/include
#export SAMTOOLS=/usr/lib/python2.7/dist-packages/pysam/include/samtools
%:
dh $@ --parallel --with autoreconf
dh $@
version=4
opts="filenamemangle=s%(?:.*?)?v?(\d[\d.]*)\.tar\.gz%lofreq-$1.tar.gz%" \
https://github.com/CSB5/lofreq/tags \
(?:.*?/)?v?(\d[\d.]*)\.tar\.gz debian uupdate
opts="repacksuffix=+dfsg,dversionmangle=auto,repack,compression=xz" \
https://github.com/CSB5/lofreq/releases .*/archive/v?@ANY_VERSION@@ARCHIVE_EXT@
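The `filenamemangle` rule in the earlier version=4 watch line is a Perl substitution that renames a tag tarball to `lofreq-<version>.tar.gz`. A minimal sketch of the same substitution in Python follows; the function name `mangle_filename` is hypothetical, and Perl's `$1` becomes `\1` in Python. It covers only that one rule, not the `@ANY_VERSION@` / `@ARCHIVE_EXT@` placeholders of the newer line.

```python
import re

# Pattern transcribed from the filenamemangle rule in the watch file.
MANGLE = re.compile(r"(?:.*?)?v?(\d[\d.]*)\.tar\.gz")


def mangle_filename(name: str) -> str:
    """Apply the substitution once, as a Perl s%...%...% without /g would."""
    return MANGLE.sub(r"lofreq-\1.tar.gz", name, count=1)


if __name__ == "__main__":
    for candidate in ("v2.1.3.1.tar.gz", "archive/v2.1.2.tar.gz"):
        print(candidate, "->", mangle_filename(candidate))
```

The leading lazy group swallows any path prefix, so both a bare tag name and a path ending in the tag name map to the same style of local file name.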
......@@ -2,7 +2,8 @@ To create a new distribution:
- Make sure tests work test/run_all.sh
- Update version in configure.ac
- Update top-level README
- Update top-level README and changelog
- autoreconf
- run 'make dist' to compile a tarball
- Either
- Upload source and update the websites with info on new usage/bug-fixes/new function
......@@ -11,13 +12,10 @@ To create a new distribution:
- ./configure --enable-static
- make
- compile against static libz if necessary, check with ldd ./src/lofreq/lofreq (or otool -L)
- find src -name \*.[choa] -or -name Makefile\* | xargs rm;
- rm -rf src//tools/build/
- find src -name .deps | xargs rm -rf;
- find src -name .libs | xargs rm -rf;
- cd .. and pack
- bash binary_installer -p somewhere and pack
- Commit your changes
- Tag this version (e.g. git tag -a v0.3.1 -m 'my version 0.3.1'),
then git push and git push origin --tags
- use binary_installer for binary distributions
"""A best-practices variant calling implementation with LoFreq (loosely
based on https://github.com/gis-rpd/pipelines). Starts with short
reads and finishes with a bgzipped vcf file. The workflow is kept
simple, i.e. no tricks are applied to speed the analysis up
(splitting fastq, running viterbi by chrom etc.).
# Input: config-file with following fields:
- bool mark_short_splits: for bwa mem -M
- string bed: for bed-file limiting analysis to certain regions
- int optional 'maxdepth': to limit per-site coverage in analysis
- dict 'samples': sample names as keys and one fastq-pair each as value
- string reference: reference fasta file
- string outdir: where to save output
# Pre-installed programs:
- lofreq 2.1.2
- bwa (with mem support e.g. 0.7.12)
- samtools >= 1.3
Notes:
- If missing, the workflow will try to index your reference with
samtools and bwa. This can lead to race conditions so is best
done in advance.
"""
import os

shell.executable("/bin/bash")
shell.prefix("set -euo pipefail;")

rule all:
    input:
        expand(os.path.join(config['outdir'], "{sample}/{sample}.bwamem.lofreq.vcf.gz"),
               sample=config['samples'])

rule bwa_index:
    input:
        "{prefix}.{suffix}"
    output:
        "{prefix}.{suffix,(fasta|fa)}.pac",
        "{prefix}.{suffix,(fasta|fa)}.bwt",
        "{prefix}.{suffix,(fasta|fa)}.sa"
    log:
        "{prefix}.{suffix,(fasta|fa)}.index.log"
    shell:
        "bwa index {input} >& {log};"

rule samtools_faidx:
    input:
        "{prefix}.{suffix}"
    output:
        "{prefix}.{suffix,(fasta|fa)}.fai",
    log:
        "{prefix}.{suffix,(fasta|fa)}.index.log"
    shell:
        "samtools faidx {input} >& {log};"

rule samtools_index:
    input:
        "{prefix}.bam"
    output:
        "{prefix}.bam.bai",
    log:
        "{prefix}.bam.bai.log"
    shell:
        "samtools index {input} >& {log};"

rule bwamem_align:
    input:
        reffa = config['reference'],
        bwaindex = config['reference'] + ".bwt",
        fastqs = lambda wc: config['samples'][wc.sample]
    output:
        bam = '{prefix}/{sample}.bwamem.bam'
    log:
        '{prefix}/{sample}.bwamem.bam.log'
    params:
        mark_short_splits = "-M" if config['mark_short_splits'] else "",
    message:
        'Aligning PE reads, fixing mate information and converting to sorted BAM'
    threads:
        8
    shell:
        "{{ bwa mem {params.mark_short_splits} -t {threads}"
        " {input.reffa} {input.fastqs} |"
        " samtools fixmate - - |"
        " samtools sort -o {output.bam} -T {output.bam}.tmp -; }} >& {log}"

rule lofreq_bam_processing:
    """Runs BAM through full LoFreq preprocessing pipeline,
    i.e. viterbi, alnqual, indelqual, followed by sort (required by
    viterbi).
    WARNING: running this on unsorted input files will be inefficient
    because of constant reloading of the reference
    """
    input:
        bam = '{prefix}.bam',
        reffa = config['reference'],
        reffai = config['reference'] + ".fai"
    output:
        bam = '{prefix}.lofreq.bam'
    log:
        '{prefix}.lofreq.log'
    message:
        "Preprocessing BAMs with LoFreq"
    threads:
        1
    shell:
        "{{ lofreq viterbi -f {input.reffa} {input.bam} | "
        " lofreq alnqual -u - {input.reffa} | "
        " lofreq indelqual --dindel -f {input.reffa} - | "
        " samtools sort -o {output.bam} -T {output.bam}.tmp -; }} >& {log}"

rule lofreq_call:
    input:
        bam = '{prefix}.bam',
        bai = '{prefix}.bam.bai',
        reffa = config['reference'],
        refidx = config['reference'] + ".fai",
    output:
        vcf = '{prefix}.vcf.gz'
    log:
        '{prefix}.vcf.log'
    message:
        "Calling variants with LoFreq"
    threads:
        8
    params:
        maxdepth = config.get('maxdepth', 10000),
        bed_arg = "-l {}".format(config['bed']) if config['bed'] else ""
    shell:
        "lofreq call-parallel --pp-threads {threads} --call-indels"
        " {params.bed_arg} -f {input.reffa} -o {output.vcf}"
        " -d {params.maxdepth} {input.bam} >& {log}"
# sample definition with one fastq pair per sample
samples:
  isolate1:
    - data/sample1_R1.fastq.gz
    - data/sample1_R2.fastq.gz
  isolate2:
    - data/sample2.fastq.gz
outdir: out/
reference:
  data/ref.fa
# optional: regions. leave blank if none
bed: data/regions.bed
# optional: max coverage (see snakefile for default)
maxdepth: 10000
# mark short split hits as secondary in BWA MEM
mark_short_splits: true
snakemake -T -p --dryrun --configfile cfg.yaml -s Snakefile
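The config fields listed in the Snakefile docstring could be checked before starting a run. The sketch below is a hypothetical pre-flight helper, not part of the workflow: `check_config` and `REQUIRED_FIELDS` are made-up names, the dict mirrors the example config above instead of parsing YAML, and it assumes that "one fastq-pair" means exactly two files per sample.

```python
# Hypothetical pre-flight check; field names follow the Snakefile docstring.
REQUIRED_FIELDS = {
    "samples": dict,
    "reference": str,
    "outdir": str,
    "mark_short_splits": bool,
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in cfg:
            problems.append(f"missing field: {key}")
        elif not isinstance(cfg[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")

    # The Snakefile reads config['bed'] directly, so the key must exist,
    # but it may be left blank (None after YAML parsing).
    if "bed" not in cfg:
        problems.append("missing field: bed (leave blank if unused)")
    elif cfg["bed"] is not None and not isinstance(cfg["bed"], str):
        problems.append("bed: expected str or blank")

    # maxdepth is optional; the Snakefile falls back to 10000.
    maxdepth = cfg.get("maxdepth", 10000)
    if isinstance(maxdepth, bool) or not isinstance(maxdepth, int) or maxdepth < 1:
        problems.append("maxdepth: expected a positive int")

    # Assumption: a "fastq pair" is a list of exactly two file names.
    samples = cfg.get("samples")
    if isinstance(samples, dict):
        for name, fastqs in samples.items():
            count = len(fastqs) if isinstance(fastqs, list) else 0
            if count != 2:
                problems.append(
                    f"sample {name}: expected 2 fastq files, found {count}")
    return problems


# Mirrors the example config shown above.
example = {
    "samples": {
        "isolate1": ["data/sample1_R1.fastq.gz", "data/sample1_R2.fastq.gz"],
        "isolate2": ["data/sample2.fastq.gz"],
    },
    "outdir": "out/",
    "reference": "data/ref.fa",
    "bed": "data/regions.bed",
    "maxdepth": 10000,
    "mark_short_splits": True,
}

if __name__ == "__main__":
    for problem in check_config(example):
        print(problem)
```

Under the two-files assumption, the example config is flagged only for `isolate2`, which lists a single fastq file.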