Skip to content
Commits on Source (4)
2019-09-30 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.4
* Features added
Github Actions is used together with Travis CI for testing and
deployment.
* Bugs fixed
PR #322:
1) #318 Random score in bdgdiff output. It turns out the sum_v is
not initialized as 0 before adding. Potential bugs are fixed in
other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in setup.py script is removed. And place
'cythonzie' call to the correct position.
3) A typo is fixed in Github Actions script.
2019-09-19 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.3.3
......
# INSTALL Guide For MACS
Time-stamp: <2019-09-20 15:08:35 taoliu>
Time-stamp: <2019-09-30 14:47:45 taoliu>
Please check the following instructions to complete your installation.
## Prerequisites
Python version must be equal to *2.7* to run MACS. I recommend
using the version *2.7.15*.
Python version must be equal to *2.7* to run MACS.
[Numpy](http://www.scipy.org/Download) (>=1.15) are required to run MACS v2.
[Numpy](http://www.scipy.org/Download) (>=1.16) are required to run MACS v2.
GCC is required to compile `.c` codes in MACS v2 package, and python
header files are needed. If you are using Mac OSX, I recommend you
install Xcode; if you are using Linux, you need to make sure
`python-dev` is installed.
[Cython](http://cython.org/) (>=0.25) is required to generate `.c`
[Cython](http://cython.org/) (>=0.29) is required to generate `.c`
files from `.pyx` files using `setup.py` script.
## Easy installation through PyPI
......@@ -30,7 +29,7 @@ to install them again.
Then under command line, type `pip install MACS2`. PyPI will
install Numpy and Cython automatically if they are absent.
To upgrade MACS2, type `pip install -U MACS2`. It will check
To upgrade MACS2, type `pip install --upgrade MACS2`. It will check
currently installed MACS2, compare the version with the one on PyPI
repository, download and install newer version while necessary.
......
Metadata-Version: 2.1
Name: MACS2
Version: 2.1.3.3
Version: 2.1.4
Summary: Model Based Analysis for ChIP-Seq data
Home-page: http://github.com/taoliu/MACS/
Author: Tao Liu
Author-email: vladimir.liu@gmail.com
License: UNKNOWN
Description: # Recent Changes for MACS (2.1.3.3)
Description: # Recent Changes for MACS (2.1.4)
### 2.1.3.3
### 2.1.4
* Features added
Github Actions is used together with Travis CI for testing and
deployment.
* Bugs fixed (PR #322)
1) #318 Random score in `bdgdiff` output. It turns out the sum_v
is not initialized as 0 before adding. Potential bugs are fixed in
other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in `setup.py` script is removed. And
place 'cythonzie' call to the correct position.
3) A typo is fixed in Github Actions script.
### 2.1.3
* Features added
1) Support Docker auto-deploy. PR #309
......@@ -43,44 +61,7 @@ Description: # Recent Changes for MACS (2.1.3.3)
3) Rename COPYING to LICENSE.
### 2.1.2
* New features
1) Added missing BEDPE support. And enable the support for BAMPE
and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
subcommands. When format is BAMPE or BEDPE, The 'pileup' command
will pile up the whole fragment defined by mapping locations of
the left end and right end of each read pair. Thank @purcaro
2) Added options to callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" option is replaced with
"--scale-to large".
4) The randsample option "-t" has been replaced with "-i".
* Bug fixes
1) Fixed memory issue related to #122 and #146
2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
3) Fixed a bug while setting commandline qvalue cutoff.
4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
5) Fixed the calculation of average fragment length for paired-end
data. Thank @jsh58
6) Fixed bugs caused by khash while computing p/q-value and log
likelihood ratios. Thank @jsh58
7) More spelling tweaks in source code. Thank @mr-c
# README for MACS2 (2.1.3.3)
# README for MACS2 (2.1.4)
## Introduction
......@@ -268,8 +249,19 @@ Description: # Recent Changes for MACS (2.1.3.3)
##### `--min-length`, `--max-gap`
These two options can be used to fine-tune the peak calling behavior by specifying the minimum length of a called peak and the maximum allowed gap between two nearby regions to be merged. In another word, a called peak has to be longer than *min-length*, and if the distance between two nearby peaks is smaller than *max-gap* then they will be merged as one. If they are not set, MACS2 will set the DEFAULT value for *min-length* as the predicted fragment size d, and the DEFAULT value for *max-gap* as the detected read length. Note, if you set a *min-length* value smaller than the fragment size, it may have NO effect on the result. For BROAD peak calling, try to set a
large value such as 500bps. You can also use '--cutoff-analysis' option with default setting, and check the column 'avelpeak' under different cutoff values to decide a reasonable *min-length* value.
These two options can be used to fine-tune the peak calling behavior
by specifying the minimum length of a called peak and the maximum
allowed gap between two nearby regions to be merged. In another word,
a called peak has to be longer than *min-length*, and if the distance
between two nearby peaks is smaller than *max-gap* then they will be
merged as one. If they are not set, MACS2 will set the DEFAULT value
for *min-length* as the predicted fragment size d, and the DEFAULT
value for *max-gap* as the detected read length. Note, if you set a
*min-length* value smaller than the fragment size, it may have NO
effect on the result. For BROAD peak calling, try to set a large value
such as 500bps. You can also use '--cutoff-analysis' option with
default setting, and check the column 'avelpeak' under different
cutoff values to decide a reasonable *min-length* value.
##### `--nolambda`
......@@ -405,21 +397,33 @@ Description: # Recent Changes for MACS (2.1.3.3)
- fold enrichment for this peak summit against random Poisson distribution with local lambda,
- -log10(qvalue) at peak summit
Coordinates in XLS is 1-based which is different with BED format. When `--broad` is enabled for broad peak calling, the pileup, pvalue, qvalue, and fold change in the XLS file will be the mean value across the entire peak region, since peak summit won't be called in broad peak calling mode.
Coordinates in XLS is 1-based which is different with BED
format. When `--broad` is enabled for broad peak calling, the
pileup, pvalue, qvalue, and fold change in the XLS file will be the
mean value across the entire peak region, since peak summit won't
be called in broad peak calling mode.
2. `NAME_peaks.narrowPeak` is BED6+4 format file which contains the
peak locations together with peak summit, pvalue and qvalue. You
can load it to UCSC genome browser. Definition of some specific
columns are:
- 5th: integer score for display. It's calculated as `int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on whether `-p` (pvalue) or `-q` (qvalue) is used as score cutoff. Please note that currently this value might be out of the [0-1000] range defined in [UCSC Encode narrowPeak format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by using the following 1-liner awk: `awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 5th: integer score for display. It's calculated as
`int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on
whether `-p` (pvalue) or `-q` (qvalue) is used as score
cutoff. Please note that currently this value might be out of the
[0-1000] range defined in [UCSC Encode narrowPeak
format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You
can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by
using the following 1-liner awk: `awk -v OFS="\t"
'{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 7th: fold-change at peak summit
- 8th: -log10pvalue at peak summit
- 9th: -log10qvalue at peak summit
- 10th: relative summit position to peak start
The file can be loaded directly to UCSC genome browser. Remove the beginning track line if you want to
analyze it by other tools.
The file can be loaded directly to UCSC genome browser. Remove the
beginning track line if you want to analyze it by other tools.
3. `NAME_summits.bed` is in BED format, which contains the peak
summits locations for every peaks. The 5th column in this file is
......@@ -474,7 +478,8 @@ Description: # Recent Changes for MACS (2.1.3.3)
## Tips of fine-tuning peak calling
There are several subcommands within MACSv2 package to fine-tune or customize your analysis:
There are several subcommands within MACSv2 package to fine-tune or
customize your analysis:
1. `bdgcmp` can be used on `*_treat_pileup.bdg` and
`*_control_lambda.bdg` or bedGraph files from other resources
......@@ -494,8 +499,9 @@ Description: # Recent Changes for MACS (2.1.3.3)
according to parameter settings for minimum length, maximum gap
and cutoff.
4. You can combine subcommands to do a step-by-step peak
calling. Read detail at [MACS2 wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
4. You can combine subcommands to do a step-by-step peak calling. Read
detail at [MACS2
wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
......
......@@ -2,7 +2,9 @@ ChangeLog
INSTALL.md
LICENSE
MANIFEST.in
PKG-INFO
README.md
setup.cfg
setup.py
MACS2/Constants.py
MACS2/OptValidator.py
......@@ -51,7 +53,6 @@ MACS2/IO/BedGraph.c
MACS2/IO/BedGraph.pyx
MACS2/IO/BedGraphIO.c
MACS2/IO/BedGraphIO.pyx
MACS2/IO/BinKeeper.py
MACS2/IO/CallPeakUnit.c
MACS2/IO/CallPeakUnit.pyx
MACS2/IO/FixWidthTrack.c
......@@ -67,3 +68,23 @@ MACS2/IO/ScoreTrack.pyx
MACS2/IO/__init__.py
MACS2/data/__init__.py
bin/macs2
debian/README.test
debian/changelog
debian/control
debian/copyright
debian/docs
debian/lintian-overrides
debian/macs.examples
debian/manpages
debian/rules
debian/watch
debian/examples/aluY.chr1.bed.gz
debian/examples/gerp.chr1.bed.gz
debian/patches/series
debian/patches/spelling
debian/source/format
debian/source/include-binaries
debian/source/options
debian/tests/control
debian/tests/run-unit-test
debian/upstream/metadata
\ No newline at end of file
numpy>=1.15
cython>=0.25
numpy>=1.16
MACS_VERSION = "2.1.3.3"
MACS_VERSION = "2.1.4"
#MACSDIFF_VERSION = "1.0.4 20110212 (tag:alpha)"
FILTERDUP_VERSION = "1.0.0 20140616"
RANDSAMPLE_VERSION = "1.0.0 20120703"
......
# Time-stamp: <2019-09-20 11:38:14 taoliu>
"""Module Description: BinKeeper for Wiggle-like tracks. (obsolete)
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
# ------------------------------------
# python modules
# ------------------------------------
import sys
import re
from bisect import insort,bisect_left,bisect_right,insort_right
from array import array
# ------------------------------------
# constants
# ------------------------------------
# to determine the byte size
if array('H',[1]).itemsize == 2:
BYTE2 = 'H'
else:
raise Exception("BYTE2 type cannot be determined!")
if array('I',[1]).itemsize == 4:
BYTE4 = 'I'
elif array('L',[1]).itemsize == 4:
BYTE4 = 'L'
else:
raise Exception("BYTE4 type cannot be determined!")
if array('f',[1]).itemsize == 4:
FBYTE4 = 'f'
elif array('d',[1]).itemsize == 4:
FBYTE4 = 'd'
else:
raise Exception("BYTE4 type cannot be determined!")
# ------------------------------------
# Misc functions
# ------------------------------------
# ------------------------------------
# Classes
# ------------------------------------
class BinKeeperI:
"""BinKeeper keeps point data from a chromosome in a bin list.
Example:
>>> from taolib.CoreLib.Parser import WiggleIO
>>> w = WiggleIO('sample.wig')
>>> bk = w.build_binKeeper()
>>> bk['chrI'].pp2v(1000,2000) # to extract values in chrI:1000..2000
"""
def __init__ (self,binsize=8000,chromosomesize=1e9):
"""Initializer.
Parameters:
binsize : size of bin in Basepair
chromosomesize : size of chromosome, default is 1G
"""
self.binsize = binsize
self.binnumber = int(chromosomesize/self.binsize)+1
self.cage = []
a = self.cage.append
for i in xrange(self.binnumber):
a([array(BYTE4,[]),array(FBYTE4,[])])
def add ( self, p, value ):
"""Add a position into BinKeeper.
Note: position must be sorted before adding. Otherwise, pp2v
and pp2p will not work.
"""
bin = p/self.binsize
self.cage[bin][0].append(p)
self.cage[bin][1].append(value)
def p2bin (self, p ):
"""Return the bin index for a position.
"""
return p/self.binsize
def p2cage (self, p):
"""Return the bin containing the position.
"""
return self.cage[p/self.binsize]
def __pp2cages (self, p1, p2):
assert p1<=p2
bin1 = self.p2bin(p1)
bin2 = self.p2bin(p2)+1
t = [array(BYTE4,[]),array(FBYTE4,[])]
for i in xrange(bin1,bin2):
t[0].extend(self.cage[i][0])
t[1].extend(self.cage[i][1])
return t
def pp2p (self, p1, p2):
"""Give the position list between two given positions.
Parameters:
p1 : start position
p2 : end position
Return Value:
list of positions between p1 and p2.
"""
(ps,vs) = self.__pp2cages(p1,p2)
p1_in_cages = bisect_left(ps,p1)
p2_in_cages = bisect_right(ps,p2)
return ps[p1_in_cages:p2_in_cages]
def pp2v (self, p1, p2):
"""Give the value list between two given positions.
Parameters:
p1 : start position
p2 : end position
Return Value:
list of values whose positions are between p1 and p2.
"""
(ps,vs) = self.__pp2cages(p1,p2)
p1_in_cages = bisect_left(ps,p1)
p2_in_cages = bisect_right(ps,p2)
return vs[p1_in_cages:p2_in_cages]
def pp2pv (self, p1, p2):
"""Give the (position,value) list between two given positions.
Parameters:
p1 : start position
p2 : end position
Return Value:
list of (position,value) between p1 and p2.
"""
(ps,vs) = self.__pp2cages(p1,p2)
p1_in_cages = bisect_left(ps,p1)
p2_in_cages = bisect_right(ps,p2)
return zip(ps[p1_in_cages:p2_in_cages],vs[p1_in_cages:p2_in_cages])
class BinKeeperII:
"""BinKeeperII keeps non-overlapping interval data from a chromosome in a bin list.
This is especially designed for bedGraph type data.
"""
def __init__ (self,binsize=8000,chromosomesize=1e9):
"""Initializer.
Parameters:
binsize : size of bin in Basepair
chromosomesize : size of chromosome, default is 1G
"""
self.binsize = binsize
self.binnumber = int(chromosomesize/self.binsize)+1
self.cage = []
a = self.cage.append
for i in xrange(self.binnumber):
a([array(BYTE4,[]),array(BYTE4,[]),array(FBYTE4,[])])
def add ( self, startp, endp, value ):
"""Add an interval data into BinKeeper.
Note: position must be sorted before adding. Otherwise, pp2v
and pp2p will not work.
"""
startbin = startp/self.binsize
endbin = endp/self.binsize
if startbin == endbin:
# some intervals may only be within a bin
j = bisect.bisect_left(self.cage[startbin][0],startp)
self.cage[startbin][0].insert(j,startp)
self.cage[startbin][1].insert(j,endp)
self.cage[startbin][2].insert(j,value)
else:
# some intervals may cover the end of bins
# first bin
j = bisect.bisect_left(self.cage[startbin][0],startp)
self.cage[startbin][0].insert(j,startp)
self.cage[startbin][1].insert(j,(startbin+1)*self.binsize)
self.cage[startbin][2].insert(j,value)
# other bins fully covered
for i in xrange(startbin+1,endbin):
p = i*self.binsize
j = bisect.bisect_left(self.cage[startbin][0],p)
self.cage[startbin][0].insert(j,p)
self.cage[startbin][1].insert(j,(i+1)*self.binsize)
self.cage[startbin][2].insert(j,value)
insort_right(self.cage[i][0],i*self.binsize)
insort_right(self.cage[i][1],(i+1)*self.binsize)
insort_right(self.cage[i][2],value)
# last bin -- the start of this bin should be covered
insort_right(self.cage[endbin][0],endbin*self.binsize)
insort_right(self.cage[endbin][1],endp)
insort_right(self.cage[endbin][2],value)
def p2bin (self, p ):
"""Given a position, return the bin index for a position.
"""
return p/self.binsize
def p2cage (self, p):
"""Given a position, return the bin containing the position.
"""
return self.cage[p/self.binsize]
def pp2cages (self, p1, p2):
"""Given an interval, return the bins containing this interval.
"""
assert p1<=p2
bin1 = self.p2bin(p1)
bin2 = self.p2bin(p2)
t = [array(BYTE4,[]),array(BYTE4,[]),array(FBYTE4,[])]
for i in xrange(bin1,bin2+1):
t[0].extend(self.cage[i][0])
t[1].extend(self.cage[i][1])
t[2].extend(self.cage[i][2])
return t
def pp2intervals (self, p1, p2):
"""Given an interval, return the intervals list between two given positions.
Parameters:
p1 : start position
p2 : end position
Return Value:
A list of intervals start and end positions (tuple) between p1 and p2.
* Remember, I assume all intervals saved in this BinKeeperII
are not overlapping, so if there is some overlap, this
function will not work as expected.
"""
(startposs,endposs,vs) = self.pp2cages(p1,p2)
p1_in_cages = bisect_left(startposs,p1)
p2_in_cages = bisect_right(endposs,p2)
output_startpos_list = startposs[p1_in_cages:p2_in_cages]
output_endpos_list = endposs[p1_in_cages:p2_in_cages]
# check if the bin (p1_in_cages-1) covers p1
if p1 < endposs[p1_in_cages-1]:
# add this interval
output_startpos_list = array(BYTE4,[p1,])+output_startpos_list
output_endpos_list = array(BYTE4,[endposs[p1_in_cages-1],])+output_endpos_list
# check if the bin (p2_in_cages+1) covers p2
if p2 > startposs[p2_in_cages+1]:
# add this interval
output_startpos_list = array(BYTE4,[startposs[p2_in_cages+1],])+output_startpos_list
output_endpos_list = array(BYTE4,[p2,])+output_endpos_list
return zip(output_startpos_list,output_endpos_list)
def pp2pvs (self, p1, p2):
"""Given an interval, return the values list between two given positions.
Parameters:
p1 : start position
p2 : end position
Return Value:
A list of start, end positions, values (tuple) between p1 and
p2. Each value represents the value in an interval. Remember
the interval length and positions are lost in the output.
* Remember, I assume all intervals saved in this BinKeeperII
are not overlapping, so if there is some overlap, this
function will not work as expected.
"""
(startposs,endposs,vs) = self.pp2cages(p1,p2)
p1_in_cages = bisect_left(startposs,p1)
p2_in_cages = bisect_right(endposs,p2)
output_startpos_list = startposs[p1_in_cages:p2_in_cages]
output_endpos_list = endposs[p1_in_cages:p2_in_cages]
output_value_list = vs[p1_in_cages:p2_in_cages]
# print p1_in_cages,p2_in_cages
# print vs
print output_startpos_list
print output_endpos_list
print output_value_list
# check if the bin (p1_in_cages-1) covers p1
if p1_in_cages-1 >= 0 and p1 < self.cage[p1_in_cages-1][1]:
# add this interval
output_startpos_list = array(BYTE4,[p1,])+output_startpos_list
output_endpos_list = array(BYTE4,[self.cage[p1_in_cages-1][1],])+output_endpos_list
output_value_list = array(BYTE4,[self.cage[p1_in_cages-1][2],])+output_value_list
# check if the bin (p2_in_cages+1) covers p2
#print p2_in_cages+1,len(self.cage)
#print p2, self.cage[p2_in_cages+1][0]
if p2_in_cages+1 < len(self.cage) and p2 > self.cage[p2_in_cages+1][0]:
# add this interval
output_startpos_list = output_startpos_list+array(BYTE4,[self.cage[p2_in_cages+1][0],])
output_endpos_list = output_endpos_list+array(BYTE4,[p2,])
output_value_list = output_value_list+array(BYTE4,[self.cage[p2_in_cages+1][2],])
print output_startpos_list
print output_endpos_list
print output_value_list
return zip(output_startpos_list,output_endpos_list,output_value_list)
This diff is collapsed.
# Time-stamp: <2019-09-20 11:38:30 taoliu>
# Time-stamp: <2019-09-30 13:05:38 taoliu>
"""Module for Calculate Scores.
......@@ -202,11 +202,12 @@ cdef float median_from_value_length ( np.ndarray value, list length ):
int32_t l_half, c, tmp_l
float tmp_v
c = 0
tmp = sorted(zip( value, length ))
l = sum( length )/2
l_half = sum( length )/2
for (tmp_v, tmp_l) in tmp:
c += tmp_l
if c > l:
if c > l_half:
return tmp_v
cdef float mean_from_value_length ( np.ndarray value, list length ):
......@@ -215,7 +216,7 @@ cdef float mean_from_value_length ( np.ndarray value, list length ):
"""
cdef:
list tmp
int32_t tmp_l
int32_t tmp_l, l
float tmp_v, sum_v
sum_v = 0
......
This diff is collapsed.
# Time-stamp: <2019-09-20 11:40:07 taoliu>
# Time-stamp: <2019-09-30 13:02:11 taoliu>
"""Module for Feature IO classes.
......@@ -17,8 +17,6 @@ from copy import copy
from cpython cimport bool
#from scipy.stats import chi2
from MACS2.Signal import maxima, enforce_valleys, enforce_peakyness
cimport cython
......@@ -49,7 +47,7 @@ LOG10_E = 0.43429448190325176
pscore_dict = dict()
cdef inline double get_pscore ( int observed, double expectation ):
cdef double get_pscore ( int observed, double expectation ):
"""Get p-value score from Poisson test. First check existing
table, if failed, call poisson_cdf function, then store the result
in table.
......@@ -67,7 +65,7 @@ cdef inline double get_pscore ( int observed, double expectation ):
asym_logLR_dict = dict()
cdef inline double logLR_asym ( double x, double y ):
cdef double logLR_asym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
chromatin bias ). Set minus sign for depletion.
......@@ -91,7 +89,7 @@ cdef inline double logLR_asym ( double x, double y ):
sym_logLR_dict = dict()
cdef inline double logLR_sym ( double x, double y ):
cdef double logLR_sym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
another enriched ). Set minus sign for H0>H1.
......@@ -113,12 +111,12 @@ cdef inline double logLR_sym ( double x, double y ):
sym_logLR_dict[ ( x, y ) ] = s
return s
cdef inline double get_logFE ( float x, float y ):
cdef double get_logFE ( float x, float y ):
""" return 100* log10 fold enrichment with +1 pseudocount.
"""
return log10( x/y )
cdef inline float get_subtraction ( float x, float y):
cdef float get_subtraction ( float x, float y):
""" return subtraction.
"""
return x - y
......@@ -131,11 +129,12 @@ cdef float median_from_value_length ( np.ndarray value, list length ):
int32_t l_half, c, tmp_l
float tmp_v
c = 0
tmp = sorted(zip( value, length ))
l = sum( length )/2
l_half = sum( length )/2
for (tmp_v, tmp_l) in tmp:
c += tmp_l
if c > l:
if c > l_half:
return tmp_v
cdef float mean_from_value_length ( np.ndarray value, list length ):
......@@ -143,9 +142,10 @@ cdef float mean_from_value_length ( np.ndarray value, list length ):
"""
cdef:
list tmp
int32_t tmp_l
int32_t tmp_l, l
float tmp_v, sum_v
sum_v = 0
tmp = zip( value, length )
l = sum( length )
......@@ -154,7 +154,6 @@ cdef float mean_from_value_length ( np.ndarray value, list length ):
return sum_v / l
# ------------------------------------
# Classes
# ------------------------------------
......@@ -1706,6 +1705,8 @@ cdef class TwoConditionScores:
"""This function try to find regions within which, scores
are continuously higher than a given cutoff.
For bdgdiff.
This function is NOT using sliding-windows. Instead, any
regions in bedGraph above certain cutoff will be detected,
then merged if the gap between nearby two regions are below
......@@ -1815,9 +1816,11 @@ cdef class TwoConditionScores:
"""
cdef:
int32_t tmp_s, tmp_e
int32_t l = 0
int32_t l
float tmp_v, sum_v
sum_v = 0
l = 0
for (tmp_s, tmp_e, tmp_v) in peakcontent:
sum_v += tmp_v * ( tmp_e - tmp_s )
l += tmp_e - tmp_s
......
# Time-stamp: <2019-09-20 11:30:12 taoliu>
# Time-stamp: <2019-09-25 14:40:58 taoliu>
"""Description: Naive call differential peaks from 4 bedGraph tracks for scores.
......@@ -89,8 +89,6 @@ def run( options ):
info("Write peaks...")
ofiles = []
name_prefix = []
if options.ofile:
ofiles = map( lambda x: os.path.join( options.outdir, x ), options.ofile )
name_prefix = options.ofile
......
Metadata-Version: 2.1
Name: MACS2
Version: 2.1.3.3
Version: 2.1.4
Summary: Model Based Analysis for ChIP-Seq data
Home-page: http://github.com/taoliu/MACS/
Author: Tao Liu
Author-email: vladimir.liu@gmail.com
License: UNKNOWN
Description: # Recent Changes for MACS (2.1.3.3)
Description: # Recent Changes for MACS (2.1.4)
### 2.1.3.3
### 2.1.4
* Features added
Github Actions is used together with Travis CI for testing and
deployment.
* Bugs fixed (PR #322)
1) #318 Random score in `bdgdiff` output. It turns out the sum_v
is not initialized as 0 before adding. Potential bugs are fixed in
other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in `setup.py` script is removed. And
place 'cythonzie' call to the correct position.
3) A typo is fixed in Github Actions script.
### 2.1.3
* Features added
1) Support Docker auto-deploy. PR #309
......@@ -43,44 +61,7 @@ Description: # Recent Changes for MACS (2.1.3.3)
3) Rename COPYING to LICENSE.
### 2.1.2
* New features
1) Added missing BEDPE support. And enable the support for BAMPE
and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
subcommands. When format is BAMPE or BEDPE, The 'pileup' command
will pile up the whole fragment defined by mapping locations of
the left end and right end of each read pair. Thank @purcaro
2) Added options to callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" option is replaced with
"--scale-to large".
4) The randsample option "-t" has been replaced with "-i".
* Bug fixes
1) Fixed memory issue related to #122 and #146
2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
3) Fixed a bug while setting commandline qvalue cutoff.
4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
5) Fixed the calculation of average fragment length for paired-end
data. Thank @jsh58
6) Fixed bugs caused by khash while computing p/q-value and log
likelihood ratios. Thank @jsh58
7) More spelling tweaks in source code. Thank @mr-c
# README for MACS2 (2.1.3.3)
# README for MACS2 (2.1.4)
## Introduction
......@@ -268,8 +249,19 @@ Description: # Recent Changes for MACS (2.1.3.3)
##### `--min-length`, `--max-gap`
These two options can be used to fine-tune the peak calling behavior by specifying the minimum length of a called peak and the maximum allowed gap between two nearby regions to be merged. In another word, a called peak has to be longer than *min-length*, and if the distance between two nearby peaks is smaller than *max-gap* then they will be merged as one. If they are not set, MACS2 will set the DEFAULT value for *min-length* as the predicted fragment size d, and the DEFAULT value for *max-gap* as the detected read length. Note, if you set a *min-length* value smaller than the fragment size, it may have NO effect on the result. For BROAD peak calling, try to set a
large value such as 500bps. You can also use '--cutoff-analysis' option with default setting, and check the column 'avelpeak' under different cutoff values to decide a reasonable *min-length* value.
These two options can be used to fine-tune the peak calling behavior
by specifying the minimum length of a called peak and the maximum
allowed gap between two nearby regions to be merged. In another word,
a called peak has to be longer than *min-length*, and if the distance
between two nearby peaks is smaller than *max-gap* then they will be
merged as one. If they are not set, MACS2 will set the DEFAULT value
for *min-length* as the predicted fragment size d, and the DEFAULT
value for *max-gap* as the detected read length. Note, if you set a
*min-length* value smaller than the fragment size, it may have NO
effect on the result. For BROAD peak calling, try to set a large value
such as 500bps. You can also use '--cutoff-analysis' option with
default setting, and check the column 'avelpeak' under different
cutoff values to decide a reasonable *min-length* value.
##### `--nolambda`
......@@ -405,21 +397,33 @@ Description: # Recent Changes for MACS (2.1.3.3)
- fold enrichment for this peak summit against random Poisson distribution with local lambda,
- -log10(qvalue) at peak summit
Coordinates in XLS is 1-based which is different with BED format. When `--broad` is enabled for broad peak calling, the pileup, pvalue, qvalue, and fold change in the XLS file will be the mean value across the entire peak region, since peak summit won't be called in broad peak calling mode.
Coordinates in XLS is 1-based which is different with BED
format. When `--broad` is enabled for broad peak calling, the
pileup, pvalue, qvalue, and fold change in the XLS file will be the
mean value across the entire peak region, since peak summit won't
be called in broad peak calling mode.
2. `NAME_peaks.narrowPeak` is BED6+4 format file which contains the
peak locations together with peak summit, pvalue and qvalue. You
can load it to UCSC genome browser. Definition of some specific
columns are:
- 5th: integer score for display. It's calculated as `int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on whether `-p` (pvalue) or `-q` (qvalue) is used as score cutoff. Please note that currently this value might be out of the [0-1000] range defined in [UCSC Encode narrowPeak format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by using the following 1-liner awk: `awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 5th: integer score for display. It's calculated as
`int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on
whether `-p` (pvalue) or `-q` (qvalue) is used as score
cutoff. Please note that currently this value might be out of the
[0-1000] range defined in [UCSC Encode narrowPeak
format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You
can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by
using the following 1-liner awk: `awk -v OFS="\t"
'{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 7th: fold-change at peak summit
- 8th: -log10pvalue at peak summit
- 9th: -log10qvalue at peak summit
- 10th: relative summit position to peak start
The file can be loaded directly to UCSC genome browser. Remove the beginning track line if you want to
analyze it by other tools.
The file can be loaded directly to UCSC genome browser. Remove the
beginning track line if you want to analyze it by other tools.
3. `NAME_summits.bed` is in BED format, which contains the peak
summits locations for every peaks. The 5th column in this file is
......@@ -474,7 +478,8 @@ Description: # Recent Changes for MACS (2.1.3.3)
## Tips of fine-tuning peak calling
There are several subcommands within MACSv2 package to fine-tune or customize your analysis:
There are several subcommands within MACSv2 package to fine-tune or
customize your analysis:
1. `bdgcmp` can be used on `*_treat_pileup.bdg` and
`*_control_lambda.bdg` or bedGraph files from other resources
......@@ -494,8 +499,9 @@ Description: # Recent Changes for MACS (2.1.3.3)
according to parameter settings for minimum length, maximum gap
and cutoff.
4. You can combine subcommands to do a step-by-step peak
calling. Read detail at [MACS2 wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
4. You can combine subcommands to do a step-by-step peak calling. Read
detail at [MACS2
wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
......
# Recent Changes for MACS (2.1.3.3)
# Recent Changes for MACS (2.1.4)
### 2.1.3.3
### 2.1.4
* Features added
Github Actions is used together with Travis CI for testing and
deployment.
* Bugs fixed (PR #322)
1) #318 Random score in `bdgdiff` output. It turns out the sum_v
is not initialized as 0 before adding. Potential bugs are fixed in
other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in `setup.py` script is removed. And
place 'cythonzie' call to the correct position.
3) A typo is fixed in Github Actions script.
### 2.1.3
* Features added
1) Support Docker auto-deploy. PR #309
......@@ -35,44 +53,7 @@
3) Rename COPYING to LICENSE.
### 2.1.2
* New features
1) Added missing BEDPE support. And enable the support for BAMPE
and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
subcommands. When format is BAMPE or BEDPE, The 'pileup' command
will pile up the whole fragment defined by mapping locations of
the left end and right end of each read pair. Thank @purcaro
2) Added options to callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" option is replaced with
"--scale-to large".
4) The randsample option "-t" has been replaced with "-i".
* Bug fixes
1) Fixed memory issue related to #122 and #146
2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
3) Fixed a bug while setting commandline qvalue cutoff.
4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
5) Fixed the calculation of average fragment length for paired-end
data. Thank @jsh58
6) Fixed bugs caused by khash while computing p/q-value and log
likelihood ratios. Thank @jsh58
7) More spelling tweaks in source code. Thank @mr-c
# README for MACS2 (2.1.3.3)
# README for MACS2 (2.1.4)
## Introduction
......@@ -260,8 +241,19 @@ of qvalue.
##### `--min-length`, `--max-gap`
These two options can be used to fine-tune the peak calling behavior by specifying the minimum length of a called peak and the maximum allowed gap between two nearby regions to be merged. In another word, a called peak has to be longer than *min-length*, and if the distance between two nearby peaks is smaller than *max-gap* then they will be merged as one. If they are not set, MACS2 will set the DEFAULT value for *min-length* as the predicted fragment size d, and the DEFAULT value for *max-gap* as the detected read length. Note, if you set a *min-length* value smaller than the fragment size, it may have NO effect on the result. For BROAD peak calling, try to set a
large value such as 500bps. You can also use '--cutoff-analysis' option with default setting, and check the column 'avelpeak' under different cutoff values to decide a reasonable *min-length* value.
These two options can be used to fine-tune the peak calling behavior
by specifying the minimum length of a called peak and the maximum
allowed gap between two nearby regions to be merged. In another word,
a called peak has to be longer than *min-length*, and if the distance
between two nearby peaks is smaller than *max-gap* then they will be
merged as one. If they are not set, MACS2 will set the DEFAULT value
for *min-length* as the predicted fragment size d, and the DEFAULT
value for *max-gap* as the detected read length. Note, if you set a
*min-length* value smaller than the fragment size, it may have NO
effect on the result. For BROAD peak calling, try to set a large value
such as 500bps. You can also use '--cutoff-analysis' option with
default setting, and check the column 'avelpeak' under different
cutoff values to decide a reasonable *min-length* value.
##### `--nolambda`
......@@ -397,21 +389,33 @@ for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE *
- fold enrichment for this peak summit against random Poisson distribution with local lambda,
- -log10(qvalue) at peak summit
Coordinates in XLS is 1-based which is different with BED format. When `--broad` is enabled for broad peak calling, the pileup, pvalue, qvalue, and fold change in the XLS file will be the mean value across the entire peak region, since peak summit won't be called in broad peak calling mode.
Coordinates in XLS is 1-based which is different with BED
format. When `--broad` is enabled for broad peak calling, the
pileup, pvalue, qvalue, and fold change in the XLS file will be the
mean value across the entire peak region, since peak summit won't
be called in broad peak calling mode.
2. `NAME_peaks.narrowPeak` is BED6+4 format file which contains the
peak locations together with peak summit, pvalue and qvalue. You
can load it to UCSC genome browser. Definition of some specific
columns are:
- 5th: integer score for display. It's calculated as `int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on whether `-p` (pvalue) or `-q` (qvalue) is used as score cutoff. Please note that currently this value might be out of the [0-1000] range defined in [UCSC Encode narrowPeak format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by using the following 1-liner awk: `awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 5th: integer score for display. It's calculated as
`int(-10*log10pvalue)` or `int(-10*log10qvalue)` depending on
whether `-p` (pvalue) or `-q` (qvalue) is used as score
cutoff. Please note that currently this value might be out of the
[0-1000] range defined in [UCSC Encode narrowPeak
format](https://genome.ucsc.edu/FAQ/FAQformat.html#format12). You
can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by
using the following 1-liner awk: `awk -v OFS="\t"
'{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak`
- 7th: fold-change at peak summit
- 8th: -log10pvalue at peak summit
- 9th: -log10qvalue at peak summit
- 10th: relative summit position to peak start
The file can be loaded directly to UCSC genome browser. Remove the beginning track line if you want to
analyze it by other tools.
The file can be loaded directly to UCSC genome browser. Remove the
beginning track line if you want to analyze it by other tools.
3. `NAME_summits.bed` is in BED format, which contains the peak
summits locations for every peaks. The 5th column in this file is
......@@ -466,7 +470,8 @@ for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE *
## Tips of fine-tuning peak calling
There are several subcommands within MACSv2 package to fine-tune or customize your analysis:
There are several subcommands within MACSv2 package to fine-tune or
customize your analysis:
1. `bdgcmp` can be used on `*_treat_pileup.bdg` and
`*_control_lambda.bdg` or bedGraph files from other resources
......@@ -486,5 +491,6 @@ There are several subcommands within MACSv2 package to fine-tune or customize yo
according to parameter settings for minimum length, maximum gap
and cutoff.
4. You can combine subcommands to do a step-by-step peak
calling. Read detail at [MACS2 wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
4. You can combine subcommands to do a step-by-step peak calling. Read
detail at [MACS2
wikipage](https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands)
macs (2.1.4-1) unstable; urgency=medium
* New upstream version
* Add patch to fix spelling typo
-- Michael R. Crusoe <michael.crusoe@gmail.com> Tue, 01 Oct 2019 12:35:55 +0200
macs (2.1.3.3-1) unstable; urgency=medium
[ Jelmer Vernooij ]
......
From: Michael R. Crusoe <michael.crusoe@gmail.com>
Subject: typo fix
--- macs.orig/MACS2/IO/CallPeakUnit.pyx
+++ macs/MACS2/IO/CallPeakUnit.pyx
@@ -419,7 +419,7 @@
self.cutoff_analysis_filename = cutoff_analysis_filename
cpdef destroy ( self ):
- """Remove temparary files for pileup values of each chromosome.
+ """Remove temporary files for pileup values of each chromosome.
Note: This function MUST be called if the class object won't
be used anymore.
#!/usr/bin/env python
# Time-stamp: <2019-09-20 15:08:11 taoliu>
# Time-stamp: <2019-09-30 17:27:35 taoliu>
"""Description:
......@@ -10,7 +10,6 @@ under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
import os
import sys
from setuptools import setup, Extension
......@@ -22,13 +21,13 @@ try:
from Cython.Build import cythonize
has_cython = True
# Note: if this script is run under pip, cython should be installed already due to requirements.txt setting.
except:
except ImportError:
has_cython = False
try:
from numpy import get_include as numpy_get_include
numpy_include_dir = [numpy_get_include()]
except:
except ImportError:
numpy_include_dir = []
......@@ -57,6 +56,7 @@ def main():
Extension("MACS2.hashtable", ["MACS2/hashtable.pyx"], include_dirs=["MACS2/",numpy_get_include()], extra_compile_args=extra_c_args),
Extension("MACS2.Statistics", ["MACS2/Statistics.pyx"], libraries=["m"], include_dirs=["MACS2/",numpy_get_include()], extra_compile_args=extra_c_args),
]
ext_modules = cythonize(ext_modules, language_level=2)
else:
ext_modules = [Extension("MACS2.Prob", ["MACS2/Prob.c"], libraries=["m"], include_dirs=numpy_include_dir, extra_compile_args=extra_c_args ),
Extension("MACS2.IO.Parser",["MACS2/IO/Parser.c"], include_dirs=numpy_include_dir, extra_compile_args=extra_c_args),
......@@ -79,7 +79,7 @@ def main():
long_description = fh.read()
setup(name="MACS2",
version="2.1.3.3",
version="2.1.4",
description="Model Based Analysis for ChIP-Seq data",
long_description = long_description,
long_description_content_type="text/markdown",
......@@ -88,9 +88,7 @@ def main():
url='http://github.com/taoliu/MACS/',
package_dir={'MACS2' : 'MACS2'},
packages=['MACS2', 'MACS2.IO'],
#package_data={'MACS2': ['data/*.dat']},
scripts=['bin/macs2',
],
scripts=['bin/macs2', ],
classifiers=[
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
......@@ -104,11 +102,10 @@ def main():
'Programming Language :: Cython',
],
install_requires=[
'numpy>=1.15',
'cython>=0.25',
'numpy>=1.16',
],
cmdclass = command_classes,
ext_modules = cythonize(ext_modules, language_level="2")
ext_modules = ext_modules
)
if __name__ == '__main__':
......