2018-10-17 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.2
* New features
1) Added missing BEDPE support, and enabled support for the BAMPE
and BEDPE formats in the 'pileup', 'filterdup' and 'randsample'
subcommands. When the format is BAMPE or BEDPE, the 'pileup' command
will pile up the whole fragment defined by the mapping locations of
the left and right ends of each read pair. Thank @purcaro
2) Added options to callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" has been replaced with
"--scale-to large".
4) The randsample option "-t" has been replaced with "-i".
* Bug fixes
1) Fixed a memory issue related to #122 and #146.
2) Fixed a bug caused by a typo. Related to #249. Thank @shengqh
3) Fixed a bug in setting the command-line q-value cutoff.
4) Better described the 5th column of narrowPeak output. Thank @alexbarrera
5) Fixed the calculation of average fragment length for paired-end
data. Thank @jsh58
6) Fixed bugs caused by khash while computing p/q-value and log
likelihood ratios. Thank @jsh58
7) More spelling tweaks in source code. Thank @mr-c
2016-03-09 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.1 20160309
......
======================
INSTALL Guide For MACS
======================
Time-stamp: <2014-06-17 15:27:24 Tao Liu>
# INSTALL Guide For MACS
Time-stamp: <2018-10-17 16:18:48 Tao Liu>
Please check the following instructions to complete your installation.
Prerequisites
=============
## Prerequisites
Python version must be equal to *2.7* to run MACS. I recommend
using the version *2.7.2*.
using the version *2.7.9*.
Numpy_ (>=1.6) is required to run MACS v2.
[Numpy](http://www.scipy.org/Download) (>=1.6) is required to run MACS v2.
GCC is required to compile ``.c`` codes in the MACS v2 package, and Python
GCC is required to compile `.c` codes in the MACS v2 package, and Python
header files are needed. If you are using Mac OSX, I recommend you
install Xcode; if you are using Linux, you need to make sure
``python-dev`` is installed.
`python-dev` is installed.
Cython_ (>=0.18) is required *only if* you want to regenerate ``.c``
files from ``.pyx`` files using ``setup_w_cython.py`` script.
[Cython](http://cython.org/) (>=0.18) is required *only if* you want to regenerate `.c`
files from `.pyx` files using `setup_w_cython.py` script.
.. _Numpy: http://www.scipy.org/Download
.. _Cython: http://cython.org/
Easy installation through PyPI
==============================
## Easy installation through PyPI
The easiest way to install MACS2 is through the PyPI system. Get pip_ if
it's not available on your system. *Note*: if you have already
installed numpy and scipy system-wide, you can use ```virtualenv
--system-site-packages``` to let your virtual Python environment have
installed numpy and scipy system-wide, you can use `virtualenv
--system-site-packages` to let your virtual Python environment have
access to system-wide numpy and scipy libraries so that you don't need
to install them again.
Then, on the command line, type ```pip install MACS2```. PyPI will
Then, on the command line, type `pip install MACS2`. PyPI will
install Numpy and Scipy automatically if they are absent.
To upgrade MACS2, type ```pip install -U MACS2```. It will check
To upgrade MACS2, type `pip install -U MACS2`. It will check
the currently installed MACS2, compare the version with the one in the PyPI
repository, and download and install the newer version when necessary.
......@@ -46,19 +39,16 @@ already have a workable Scipy and Numpy, and when 'pip install -U
MACS2', pip downloads the newest Scipy and Numpy but is unable to compile and
install them. This will cause the whole installation to fail. You can pass the
'--no-deps' option to pip to let it skip all dependencies. Type
```pip install -U --no-deps MACS2```.
.. _pip: http://www.pip-installer.org/en/latest/installing.html
`pip install -U --no-deps MACS2`.
Install from source
===================
## Install from source
MACS uses Python's distutils tools for source installations. To
install a source distribution of MACS, unpack the distribution tarball
and open up a command terminal. Go to the directory where you unpacked
MACS, and simply run the install script::
MACS, and simply run the install script:
$ python setup.py install
`$ python setup.py install`
By default, the script will install the python library and executable
codes globally, which means you need to be root or an administrator of
......@@ -66,66 +56,63 @@ the machine so as to complete the installation. Please contact the
administrator of that machine if you want their help. If you need to
provide a nonstandard install prefix, or any other nonstandard
options, you can pass a number of command line options to the install
script. Use the --help option to see a brief list of available options::
script. Use the --help option to see a brief list of available options:
$ python setup.py --help
`$ python setup.py --help`
For example, if I want to install everything under my own HOME
directory, use this command::
directory, use this command:
$ python setup.py install --prefix /home/taoliu/
`$ python setup.py install --prefix /home/taoliu/`
If you want to re-generate ``.c`` files from ``.pyx`` files, you need
to install Cython first, then use ``setup_w_cython.py`` script to
replace ``setup.py`` script in the previous commands, such as::
If you want to re-generate `.c` files from `.pyx` files, you need
to install Cython first, then use `setup_w_cython.py` script to
replace `setup.py` script in the previous commands, such as:
$ python setup_w_cython.py install
`$ python setup_w_cython.py install`
or::
or:
$ python setup_w_cython.py install --prefix /home/taoliu/
`$ python setup_w_cython.py install --prefix /home/taoliu/`
Configure environment variables
===============================
## Configure environment variables
After running the setup script, you might need to add the install
location to your ``PYTHONPATH`` and ``PATH`` environment variables. The
location to your `PYTHONPATH` and `PATH` environment variables. The
process for doing this varies on each platform, but the general
concept is the same across platforms.
PYTHONPATH
~~~~~~~~~~
### PYTHONPATH
To set up your ``PYTHONPATH`` environment variable, you'll need to add the
value ``PREFIX/lib/pythonX.Y/site-packages`` to your existing
``PYTHONPATH``. In this value, X.Y stands for the major–minor version of
To set up your `PYTHONPATH` environment variable, you'll need to add the
value `PREFIX/lib/pythonX.Y/site-packages` to your existing
`PYTHONPATH`. In this value, X.Y stands for the major–minor version of
Python you are using (such as 2.7; you can find this with
``sys.version[:3]`` from a Python command line). ``PREFIX`` is the install
`sys.version[:3]` from a Python command line). `PREFIX` is the install
prefix where you installed MACS. If you did not specify a prefix on
the command line, MACS will be installed using Python's sys.prefix
value.
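As a quick sanity check, the snippet below shows one way to compute that site-packages path for the Python you are running. This is only an illustrative helper (the function name `site_packages_dir` is not part of MACS); `PREFIX` is whatever you passed to `--prefix`, or Python's own `sys.prefix` by default.

```python
import os
import sys

def site_packages_dir(prefix=sys.prefix):
    # PREFIX/lib/pythonX.Y/site-packages, where X.Y comes from sys.version[:3]
    return os.path.join(prefix, "lib", "python" + sys.version[:3], "site-packages")

print(site_packages_dir())                # default: Python's own prefix
print(site_packages_dir("/home/taoliu"))  # matches the --prefix example above
```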
On Linux, using bash, I include the new value in my ``PYTHONPATH`` by
adding this line to my ``~/.bashrc``::
On Linux, using bash, I include the new value in my `PYTHONPATH` by
adding this line to my `~/.bashrc`::
$ export PYTHONPATH=/home/taoliu/lib/python2.7/site-packages:$PYTHONPATH
`$ export PYTHONPATH=/home/taoliu/lib/python2.7/site-packages:$PYTHONPATH`
On Windows, you need to open up the System Properties dialog and
locate the tab labeled Environment. Add your value to the ``PYTHONPATH``
variable, or create a new ``PYTHONPATH`` variable if there isn't one
locate the tab labeled Environment. Add your value to the `PYTHONPATH`
variable, or create a new `PYTHONPATH` variable if there isn't one
already.
PATH
~~~~
### PATH
Just like your ``PYTHONPATH``, you'll also need to add a new value to your
Just like your `PYTHONPATH`, you'll also need to add a new value to your
PATH environment variable so that you can use the MACS command line
directly. Unlike the ``PYTHONPATH`` value, however, this time you'll need
to add ``PREFIX/bin`` to your PATH environment variable. The process for
updating this is the same as described above for the ``PYTHONPATH``
directly. Unlike the `PYTHONPATH` value, however, this time you'll need
to add `PREFIX/bin` to your PATH environment variable. The process for
updating this is the same as described above for the `PYTHONPATH`
variable::
$ export PATH=/home/taoliu/bin:$PATH
`$ export PATH=/home/taoliu/bin:$PATH`
--
Tao Liu <vladimir.liu@gmail.com>
......
COPYING
ChangeLog
INSTALL.rst
INSTALL.md
MANIFEST.in
README.rst
README.md
setup.py
setup_w_cython.py
MACS2/Constants.py
......@@ -14,8 +14,6 @@ MACS2/PeakModel.c
MACS2/PeakModel.pyx
MACS2/Pileup.c
MACS2/Pileup.pyx
MACS2/Poisson.c
MACS2/PosVal.pyx
MACS2/Prob.c
MACS2/Prob.pyx
MACS2/Signal.c
......@@ -62,16 +60,11 @@ MACS2/IO/FixWidthTrack.pyx
MACS2/IO/PairedEndTrack.c
MACS2/IO/PairedEndTrack.pyx
MACS2/IO/Parser.c
MACS2/IO/Parser.h
MACS2/IO/Parser.pyx
MACS2/IO/PeakIO.c
MACS2/IO/PeakIO.pyx
MACS2/IO/ScoreTrack.c
MACS2/IO/ScoreTrack.pyx
MACS2/IO/__init__.py
MACS2/IO/test_processing.py
MACS2/IO/test_threading.py
MACS2/data/__init__.py
MACS2/data/g0.01.dat
MACS2/data/g0.05.dat
bin/macs2
\ No newline at end of file
MACS_VERSION = "2.1.1.20160309"
MACS_VERSION = "2.1.2"
#MACSDIFF_VERSION = "1.0.4 20110212 (tag:alpha)"
FILTERDUP_VERSION = "1.0.0 20140616"
RANDSAMPLE_VERSION = "1.0.0 20120703"
......
# Time-stamp: <2016-02-15 15:23:38 Tao Liu>
# Time-stamp: <2016-05-19 10:22:22 Tao Liu>
"""Module for Calculate Scores.
......@@ -85,6 +85,15 @@ def do_nothing(*args, **kwargs):
LOG10_E = 0.43429448190325176
cdef void clean_up_ndarray ( np.ndarray x ):
# clean numpy ndarray in two steps
cdef:
long i
i = x.shape[0] / 2
x.resize( 100000 if i > 100000 else i, refcheck=False)
x.resize( 0, refcheck=False)
return
cdef inline float chi2_k1_cdf ( float x ):
return erf( sqrt(x/2) )
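The new `clean_up_ndarray` helper above shrinks a large array in two steps (first to at most 100,000 elements, then to zero length) before the array is reused, as part of the fix for the memory issues (#122, #146) noted in the ChangeLog. A minimal sketch of the same idea in plain numpy, assuming the array is not referenced elsewhere:

```python
import numpy as np

def clean_up_ndarray(x):
    # Shrink in two steps, mirroring the Cython helper: first cap the buffer
    # at 100,000 elements (or half its current size, whichever is smaller),
    # then drop it to zero length. refcheck=False skips the reference check.
    i = x.shape[0] // 2
    x.resize(100000 if i > 100000 else i, refcheck=False)
    x.resize(0, refcheck=False)

a = np.zeros(5000000, dtype="float32")
clean_up_ndarray(a)
print(a.shape)  # (0,)
```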
......@@ -472,12 +481,15 @@ cdef class CallerFromAlignments:
# reset or clean existing self.chr_pos_treat_ctrl
if self.chr_pos_treat_ctrl: # not a beautiful way to clean
self.chr_pos_treat_ctrl[0].resize(10000,refcheck=False)
self.chr_pos_treat_ctrl[1].resize(10000,refcheck=False)
self.chr_pos_treat_ctrl[2].resize(10000,refcheck=False)
self.chr_pos_treat_ctrl[0].resize(0,refcheck=False)
self.chr_pos_treat_ctrl[1].resize(0,refcheck=False)
self.chr_pos_treat_ctrl[2].resize(0,refcheck=False)
clean_up_ndarray( self.chr_pos_treat_ctrl[0] )
clean_up_ndarray( self.chr_pos_treat_ctrl[1] )
clean_up_ndarray( self.chr_pos_treat_ctrl[2] )
#self.chr_pos_treat_ctrl[0].resize(10000,refcheck=False)
#self.chr_pos_treat_ctrl[1].resize(10000,refcheck=False)
#self.chr_pos_treat_ctrl[2].resize(10000,refcheck=False)
#self.chr_pos_treat_ctrl[0].resize(0,refcheck=False)
#self.chr_pos_treat_ctrl[1].resize(0,refcheck=False)
#self.chr_pos_treat_ctrl[2].resize(0,refcheck=False)
if self.PE_mode:
treat_pv = self.treat.pileup_a_chromosome ( chrom, [self.treat_scaling_factor,], baseline_value = 0.0 )
......
# Time-stamp: <2016-02-15 16:12:19 Tao Liu>
# Time-stamp: <2018-10-16 12:15:12 Tao Liu>
"""Module for filter duplicate tags from paired-end data
......@@ -351,6 +351,7 @@ cdef class PETrackI:
# hope there would be no mem leak...
self.__locations[k] = new_locs
if size > 1:
self.__dup_locations[k] = dup_locs
self.average_template_length = float( self.length ) / self.total
return
......@@ -479,7 +480,7 @@ cdef class PETrackI:
return
def print_to_bed (self, fhd=None):
"""Output to BED format files. If fhd is given, write to a
"""Output to BEDPE format files. If fhd is given, write to a
file, otherwise, output to standard output.
"""
......@@ -489,7 +490,6 @@ cdef class PETrackI:
if not fhd:
fhd = sys.stdout
assert isinstance(fhd, file)
assert self.fw > 0, "width should be set larger than 0!"
chrnames = self.get_chr_names()
......@@ -503,7 +503,7 @@ cdef class PETrackI:
for i in range(locs.shape[0]):
s, e = locs[ i ]
fhd.write("%s\t%d\t%d\t.\t.\t.\n" % (k, s, e))
fhd.write("%s\t%d\t%d\n" % (k, s, e))
return
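The output loop above now writes three tab-separated columns per fragment instead of the old six-column form. A small sketch of that output logic (a hypothetical helper, not the `PETrackI` method itself):

```python
import sys

def write_fragments(fragments_by_chrom, fhd=None):
    # Write each stored fragment as "chrom<TAB>start<TAB>end", one per line,
    # to the given file handle or to standard output.
    fhd = fhd or sys.stdout
    for chrom in sorted(fragments_by_chrom):
        for start, end in fragments_by_chrom[chrom]:
            fhd.write("%s\t%d\t%d\n" % (chrom, start, end))

write_fragments({"chr1": [(100, 350), (400, 620)]})
```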
......
#ifndef __PYX_HAVE__MACS2__IO__Parser
#define __PYX_HAVE__MACS2__IO__Parser
#ifndef __PYX_HAVE_API__MACS2__IO__Parser
#ifndef __PYX_EXTERN_C
#ifdef __cplusplus
#define __PYX_EXTERN_C extern "C"
#else
#define __PYX_EXTERN_C extern
#endif
#endif
#ifndef DL_IMPORT
#define DL_IMPORT(_T) _T
#endif
__PYX_EXTERN_C DL_IMPORT(int) HAS_PYSAM;
#endif /* !__PYX_HAVE_API__MACS2__IO__Parser */
#if PY_MAJOR_VERSION < 3
PyMODINIT_FUNC initParser(void);
#else
PyMODINIT_FUNC PyInit_Parser(void);
#endif
#endif /* !__PYX_HAVE__MACS2__IO__Parser */
......@@ -356,7 +356,7 @@ cdef class BEDPEParser(GenericParser):
"""
cdef public int n
cdef public int d
cdef public float d
cdef __pe_parse_line ( self, str thisline ):
""" Parse each line, and return chromosome, left and right positions
......@@ -387,9 +387,8 @@ cdef class BEDPEParser(GenericParser):
str chromname
int left_pos
int right_pos
long i = 0
long m = 0
float d = 0 # the average fragment size
long i = 0 # number of fragments
long m = 0 # sum of fragment lengths
petrack = PETrackI( buffer_size = self.buffer_size )
add_loc = petrack.add_loc
......@@ -400,19 +399,18 @@ cdef class BEDPEParser(GenericParser):
continue
assert right_pos > left_pos, "Right position must be larger than left position, check your BED file at line: %s" % thisline
d = ( d * i + right_pos-left_pos ) / ( i + 1 ) # keep track of avg fragment size
m += right_pos - left_pos
i += 1
if i % 1000000 == 0:
m += 1
logging.info( " %d" % ( m*1000000 ) )
logging.info( " %d" % i )
add_loc( chromosome, left_pos, right_pos )
self.d = float( m ) / i
self.n = i
self.d = int( d )
assert d >= 0, "Something went wrong (mean fragment size was negative)"
assert self.d >= 0, "Something went wrong (mean fragment size was negative)"
self.close()
petrack.set_rlengths( {"DUMMYCHROM":0} )
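The fix above replaces the per-fragment running average `d` with a plain sum of fragment lengths (`m`) and a count (`i`), dividing once at the end, and logs the fragment count directly. A sketch of the corrected calculation, with a hypothetical helper name:

```python
def mean_fragment_length(pairs):
    # pairs: iterable of (left_pos, right_pos) BEDPE coordinates
    m = 0  # sum of fragment lengths
    i = 0  # number of fragments
    for left_pos, right_pos in pairs:
        assert right_pos > left_pos, "Right position must be larger than left position"
        m += right_pos - left_pos
        i += 1
    return float(m) / i

print(mean_fragment_length([(100, 300), (150, 450)]))  # (200 + 300) / 2 = 250.0
```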
......@@ -425,9 +423,8 @@ cdef class BEDPEParser(GenericParser):
str chromname
int left_pos
int right_pos
long i = 0
long m = 0
float d = 0 # the average fragment size
long i = 0 # number of fragments
long m = 0 # sum of fragment lengths
add_loc = petrack.add_loc
......@@ -438,19 +435,18 @@ cdef class BEDPEParser(GenericParser):
continue
assert right_pos > left_pos, "Right position must be larger than left position, check your BED file at line: %s" % thisline
d = ( d * i + right_pos-left_pos ) / ( i + 1 ) # keep track of avg fragment size
m += right_pos - left_pos
i += 1
if i % 1000000 == 0:
m += 1
logging.info( " %d" % ( m*1000000 ) )
logging.info( " %d" % i )
add_loc( chromosome, left_pos, righ_pos )
add_loc( chromosome, left_pos, right_pos )
self.d = int( self.d * self.n + d * i )/( self.n + i )
self.d = ( self.d * self.n + m ) / ( self.n + i )
self.n += i
assert d >= 0, "Something went wrong (mean fragment size was negative)"
assert self.d >= 0, "Something went wrong (mean fragment size was negative)"
self.close()
petrack.set_rlengths( {"DUMMYCHROM":0} )
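When a second file is appended to an existing track, the stored mean `self.d` is re-weighted by the previous fragment count and merged with the new sum of lengths, rather than mixing two already-rounded averages as the old code did. A sketch of that weighted update (hypothetical function name):

```python
def combined_mean(prev_mean, prev_n, new_sum, new_n):
    # prev_mean * prev_n recovers the previous sum of lengths, so the
    # result is the exact mean over all prev_n + new_n fragments.
    return (prev_mean * prev_n + new_sum) / float(prev_n + new_n)

# 1000 fragments averaging 200 bp, plus 500 new fragments totalling 150000 bp
print(combined_mean(200.0, 1000, 150000, 500))  # 350000 / 1500 = 233.33...
```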
......@@ -1019,19 +1015,18 @@ cdef class BAMPEParser(BAMParser):
2048 supplementary alignment
"""
cdef public int n # total number of fragments
cdef public int d # the average length of fragments in integar
cdef public float d # the average length of fragments
cpdef build_petrack ( self ):
"""Build PETrackI from all lines, return a FWTrack object.
"""
cdef:
long i = 0
int m = 0
long i = 0 # number of fragments
long m = 0 # sum of fragment lengths
int entrylength, fpos, chrid, tlen
int *asint
list references
dict rlengths
float d = 0.0
str rawread
str rawentrylength
_BAMPEParsed read
......@@ -1055,16 +1050,16 @@ cdef class BAMPEParser(BAMParser):
read = self.__pe_binary_parse(rawread)
fseek(entrylength - 32, 1)
if read.ref == -1: continue
d = (d * i + abs(read.tlen)) / (i + 1) # keep track of avg fragment size
tlen = abs(read.tlen)
m += tlen
i += 1
if i % 1000000 == 0:
m += 1
info(" %d" % (m*1000000))
add_loc(references[read.ref], read.start, read.start + read.tlen)
info(" %d" % i)
add_loc(references[read.ref], read.start, read.start + tlen)
self.d = float( m ) / i
self.n = i
self.d = int(d)
assert d >= 0, "Something went wrong (mean fragment size was negative)"
assert self.d >= 0, "Something went wrong (mean fragment size was negative)"
self.fhd.close()
petrack.set_rlengths( rlengths )
return petrack
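For BAM paired-end records, the template length (TLEN) is negative on the rightmost mate of a pair, so the fix takes `abs(read.tlen)` once and reuses it for both the length sum and the stored fragment end, instead of adding a possibly negative `read.tlen` to the start position. A tiny illustration of the arithmetic (hypothetical helper, not the parser code):

```python
def fragment_span(start, tlen):
    # Mirrors the arithmetic in the fixed parser: with the old code, a
    # negative tlen produced an end coordinate to the left of start.
    length = abs(tlen)
    return start, start + length, length

print(fragment_span(10468, 312))   # (10468, 10780, 312)
print(fragment_span(10468, -312))  # (10468, 10780, 312)
```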
......@@ -1073,13 +1068,12 @@ cdef class BAMPEParser(BAMParser):
"""Build PETrackI from all lines, return a PETrackI object.
"""
cdef:
long i = 0
int m = 0
long i = 0 # number of fragments
long m = 0 # sum of fragment lengths
int entrylength, fpos, chrid, tlen
int *asint
list references
dict rlengths
float d = 0.0
str rawread
str rawentrylength
_BAMPEParsed read
......@@ -1101,16 +1095,16 @@ cdef class BAMPEParser(BAMParser):
read = self.__pe_binary_parse(rawread)
fseek(entrylength - 32, 1)
if read.ref == -1: continue
d = (d * i + abs(read.tlen)) / (i + 1) # keep track of avg fragment size
tlen = abs(read.tlen)
m += tlen
i += 1
if i == 1000000:
m += 1
info(" %d" % (m*1000000))
i=0
add_loc(references[read.ref], read.start, read.start + read.tlen)
self.d = int( self.d * self.n + d * i )/( self.n + i )
info(" %d" % i)
add_loc(references[read.ref], read.start, read.start + tlen)
self.d = ( self.d * self.n + m ) / ( self.n + i )
self.n += i
assert d >= 0, "Something went wrong (mean fragment size was negative)"
assert self.d >= 0, "Something went wrong (mean fragment size was negative)"
self.fhd.close()
# this is the problematic part. If fwtrack is finalized, then it's impossible to increase the length of it in a step of buffer_size for multiple input files.
# petrack.finalize()
......
......@@ -548,7 +548,7 @@ cdef class PeakIO:
def overlap_with_other_peaks (self, peaks2, double cover=0):
"""Peaks2 is a PeakIO object or dictionary with can be
initialzed as a PeakIO. check __init__ for PeakIO for detail.
initialized as a PeakIO. check __init__ for PeakIO for detail.
return how many peaks are intersected by peaks2 by percentage
coverage on peaks2(if 50%, cover = 0.5).
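For reference, the "cover" threshold described in this docstring is a fraction of the second peak's length that must be overlapped. A minimal sketch of that per-peak check (hypothetical helper, not the `PeakIO` implementation):

```python
def covered_fraction(peak_a, peak_b):
    # Fraction of peak_b's length that peak_a overlaps; peaks are
    # (start, end) half-open intervals.
    a_start, a_end = peak_a
    b_start, b_end = peak_b
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap / float(b_end - b_start)

print(covered_fraction((100, 200), (150, 250)))  # 0.5 -> passes cover=0.5
```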
......
# Time-stamp: <2016-02-12 00:12:45 Tao Liu>
# Time-stamp: <2018-10-02 15:12:17 Tao Liu>
"""Module for Feature IO classes.
......@@ -24,7 +24,7 @@ from copy import copy
from cpython cimport bool
#from scipy.stats import chi2 # for
#from scipy.stats import chi2
from MACS2.Signal import maxima, enforce_valleys, enforce_peakyness
......@@ -37,8 +37,6 @@ from MACS2.Constants import BYTE4, FBYTE4, array
from MACS2.Prob import poisson_cdf
from MACS2.IO.PeakIO import PeakIO, BroadPeakIO, parse_peakname
from MACS2.hashtable import Int64HashTable, Float64HashTable
import logging
# ------------------------------------
......@@ -56,7 +54,7 @@ cdef inline int int_min(int a, int b): return a if a <= b else b
LOG10_E = 0.43429448190325176
pscore_khashtable = Int64HashTable()
pscore_dict = dict()
cdef inline double get_pscore ( int observed, double expectation ):
"""Get p-value score from Poisson test. First check existing
......@@ -66,18 +64,15 @@ cdef inline double get_pscore ( int observed, double expectation ):
"""
cdef:
double score
long key_value
#key_value = ( observed, expectation )
key_value = hash( (observed, expectation ) )
try:
return pscore_khashtable.get_item(key_value)
return pscore_dict[(observed, expectation)]
except KeyError:
score = -1*poisson_cdf(observed,expectation,False,True)
pscore_khashtable.set_item(key_value, score)
pscore_dict[(observed, expectation)] = score
return score
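The old khash table was keyed on `hash((observed, expectation))`, so two distinct pairs that collided on that 64-bit key could return each other's cached score; the fix caches on the tuple itself. A sketch of the same memoization pattern, using scipy's Poisson survival function rather than MACS2's own `poisson_cdf`:

```python
from math import log10
from scipy.stats import poisson

pscore_dict = {}

def get_pscore(observed, expectation):
    # -log10 P(X >= observed) for X ~ Poisson(expectation), cached on the
    # exact (observed, expectation) tuple so distinct pairs never collide.
    key = (observed, expectation)
    try:
        return pscore_dict[key]
    except KeyError:
        score = -log10(poisson.sf(observed - 1, expectation))
        pscore_dict[key] = score
        return score

print(get_pscore(10, 2.5))
```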
asym_logLR_khashtable = Int64HashTable()
asym_logLR_dict = dict()
cdef inline double logLR_asym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
......@@ -88,22 +83,20 @@ cdef inline double logLR_asym ( double x, double y ):
"""
cdef:
double s
long key_value
key_value = hash( (x, y ) )
try:
return asym_logLR_khashtable.get_item( key_value )
except KeyError:
if asym_logLR_dict.has_key( ( x, y ) ):
return asym_logLR_dict[ ( x, y ) ]
else:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif x < y:
s = (x*(-log(x)+log(y))-y+x)*LOG10_E
else:
s = 0
asym_logLR_khashtable.set_item(key_value, s)
asym_logLR_dict[ ( x, y ) ] = s
return s
sym_logLR_khashtable = Int64HashTable()
sym_logLR_dict = dict()
cdef inline double logLR_sym ( double x, double y ):
"""Calculate log10 Likelihood between H1 ( enriched ) and H0 (
......@@ -114,22 +107,19 @@ cdef inline double logLR_sym ( double x, double y ):
"""
cdef:
double s
long key_value
key_value = hash( (x, y ) )
try:
return sym_logLR_khashtable.get_item( key_value )
except KeyError:
if sym_logLR_dict.has_key( ( x, y ) ):
return sym_logLR_dict[ ( x, y ) ]
else:
if x > y:
s = (x*(log(x)-log(y))+y-x)*LOG10_E
elif y > x:
s = (y*(log(x)-log(y))+y-x)*LOG10_E
else:
s = 0
sym_logLR_khashtable.set_item(key_value, s)
sym_logLR_dict[ ( x, y ) ] = s
return s
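The two likelihood-ratio helpers get the same treatment: the cache key is now the `(x, y)` tuple itself rather than its hash. A sketch of the asymmetric version with a plain dict cache, matching the formula shown in the diff:

```python
from math import log

LOG10_E = 0.43429448190325176
logLR_asym_dict = {}

def logLR_asym(x, y):
    # log10 likelihood ratio between H1 (enriched) and H0 (not enriched),
    # cached on the exact (x, y) pair.
    if (x, y) in logLR_asym_dict:
        return logLR_asym_dict[(x, y)]
    if x > y:
        s = (x * (log(x) - log(y)) + y - x) * LOG10_E
    elif x < y:
        s = (x * (-log(x) + log(y)) - y + x) * LOG10_E
    else:
        s = 0.0
    logLR_asym_dict[(x, y)] = s
    return s

print(logLR_asym(10.0, 2.0))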
cdef inline double get_logFE ( float x, float y ):
""" return 100* log10 fold enrichment with +1 pseudocount.
"""
......@@ -761,17 +751,18 @@ cdef class scoreTrackII:
# convert pvalue2qvalue to a simple dict based on khash
# khash has big advantage while checking keys for millions of times.
s_p2q = Float64HashTable()
for k in pqtable.keys():
s_p2q.set_item(k,pqtable[k])
#s_p2q = Float64HashTable()
#for k in pqtable.keys():
# s_p2q.set_item(k,pqtable[k])
g = s_p2q.get_item
#g = s_p2q.get_item
for chrom in self.data.keys():
v = self.data[chrom][3]
l = self.datalength[chrom]
for i in range(l):
v[ i ] = g( v[ i ])
v[ i ] = pqtable[ v[ i ] ]
#v [ i ] = g( v[ i ])
self.scoring_method = 'q'
return
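Finally, the p-to-q conversion now looks scores up directly in `pqtable` instead of copying it into a `Float64HashTable` first. A sketch of applying such a mapping to one chromosome's score array, with illustrative names:

```python
import numpy as np

def apply_p2q(scores, pqtable):
    # Replace each p-score with its q-score by direct dictionary lookup;
    # every value in `scores` must already be a key of pqtable.
    out = scores.copy()
    for i in range(out.shape[0]):
        out[i] = pqtable[out[i]]
    return out

pqtable = {5.0: 3.2, 2.0: 1.1}
print(apply_p2q(np.array([5.0, 2.0, 5.0]), pqtable))  # [3.2 1.1 3.2]
```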
......