Commits on Source (7)
2019-10-01 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.4 (Python3)
* Features added
1) First Python3 version of MACS2 released.
2) Version numbers 2.2.x will be used for MACS2 in Python3, in
parallel to the 2.1.x series.
3) More comprehensive test.sh script to check the consistency of
results between the Python2 and Python3 versions.
4) Simplify the setup.py script, since the newest setuptools
transparently supports Cython; when Cython is not installed by the
user, setup.py can still compile using only the bundled C code.
5) Fix Signal.pyx to use np.array instead of np.mat (see the
illustrative sketch right after this list).
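As a hedged illustration of item 5 above -- this is not the actual Signal.pyx change, only the generic difference between the two container types (`np.mat` was an alias for `np.asmatrix`):

```python
# Generic np.matrix vs. np.ndarray contrast; not MACS2 code.
import numpy as np

data = [[1.0, 2.0], [3.0, 4.0]]
m = np.asmatrix(data)   # old style: np.mat()/np.asmatrix() returns np.matrix; '*' is a matrix product
a = np.array(data)      # new style: plain ndarray; '*' is element-wise

print((m * m).tolist())   # [[7.0, 10.0], [15.0, 22.0]]  (matrix multiplication)
print((a * a).tolist())   # [[1.0, 4.0], [9.0, 16.0]]    (element-wise product)
print((a @ a).tolist())   # [[7.0, 10.0], [15.0, 22.0]]  (explicit matrix product on ndarray)
```

Plain ndarrays behave consistently with the rest of NumPy, whereas the `np.matrix` class is discouraged upstream.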
2019-09-30 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.4
......
# INSTALL Guide For MACS
Time-stamp: <2019-10-03 11:56:03 taoliu>
Please check the following instructions to complete your installation.
## Prerequisites
MACS v2.2.x requires Python3. We have tested MACS with Python 3.5,
3.6, and 3.7. MACS runs more slowly under Python 3.5, so Python 3.6 or
3.7 is recommended.
MACS also requires [Numpy](http://www.scipy.org/Download) (>=1.17).
GCC is required to compile the `.c` code in the MACS v2 package, and
Python header files are needed. If you are using Mac OSX, I recommend
you install Xcode; if you are using Linux, you need to make sure the
`python-dev` package is installed -- the actual package name depends
on the Linux distribution you are using.
[Cython](http://cython.org/) is **NOT** required, although most of the
MACS code is written in Cython. If you plan to generate the `.c` files
from the `.pyx` files yourself, install Cython (>=0.29) and then use
the `setup.py` script.
## Prepare a virtual Python environment
I strongly recommend installing your MACS program in a virtual
environment, so that you have full control of your installation and
won't mess up your system libraries. To learn about virtual
environments, read [this
article](https://docs.python.org/3/library/venv.html). A simple way to
create a Python3 virtual environment is
`$ python3 -m venv MyPythonEnv/`
Then activate it by
`$ source MyPythonEnv/bin/activate`
## Install through PyPI
The easiest way to install MACS is through the PyPI system. Get `pip`
if it's not available in your system. If you created a virtual
environment as described above, your `pip` command will install
everything under the folder you specified previously through the
`python3 -m venv` command. Then, on the command line, type `pip
install macs2`. PyPI will install Numpy automatically if it is absent.
To upgrade MACS2, type `pip install --upgrade macs2`. It will check
the currently installed MACS2, compare its version with the one in the
PyPI repository, and download and install a newer version when
necessary.
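A quick, hedged way to confirm which MACS2 ended up installed (assuming the install succeeded; `MACS2.Constants` carries the version string, as the Constants diff further down shows):

```python
# Post-install sanity check: import the installed package and print its version.
from MACS2.Constants import MACS_VERSION

print(MACS_VERSION)  # expect "2.2.4" for this release
```

Running `macs2 --version` from the command line should report the same, provided the executable is on your `PATH`.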
## Install through Conda
Please check [MACS2 page on
bioconda](https://anaconda.org/bioconda/macs2) for instructions.
## Install through Debian APT
MACS is included in Debian Stretch/Buster/Unstable, blessed by some
kind of Super Cow Powers.
## Install from source
MACS uses Python's [setuptools](https://setuptools.readthedocs.io) for
source code installations. To install a source distribution of MACS,
unpack the distribution tarball, or clone the Git repository with `git
clone git@github.com:taoliu/MACS.git`. Go to the directory where you
unpacked MACS, and simply run the install script:
`$ python setup.py install`
By default, the script will install the python library and executable
codes according to the environment. When you run the command under a
virtualenv, the script will install to the virtual environment; when
you run it without a virtual environment, you may need to be root or
administrator of the machine to complete the installation. Please
contact the system administrator if you need their help. If you need
to provide a nonstandard install prefix, or any other nonstandard
options, you can provide many command line options to the install
script. Use the `--help` option to see a brief list of available
options:
`$ python setup.py --help`
...... For example, to install everything under your home directory, use this command:
`$ python setup.py install --prefix /home/taoliu/`
As mentioned in *Prerequisites*, you don't need to install Cython in
order to install MACS. When Cython is available, this setup script
will regenerate the C code from the Pyx code when necessary. When
Cython is not available, the setup script will simply use the C code
included in the release package (or your GitHub clone) for
installation.
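A rough sketch of the "Cython optional" behaviour described above -- this is not the actual MACS2 `setup.py`, and the extension name is only an example:

```python
# Minimal sketch: build from .pyx when Cython is installed, otherwise fall back
# to the pre-generated .c files shipped with the source distribution.
from setuptools import setup, Extension

try:
    from Cython.Build import cythonize
    HAS_CYTHON = True
except ImportError:
    HAS_CYTHON = False

suffix = ".pyx" if HAS_CYTHON else ".c"
extensions = [Extension("MACS2.Signal", ["MACS2/Signal" + suffix])]

if HAS_CYTHON:
    # Regenerate the C code from the Cython source before compiling.
    extensions = cythonize(extensions, compiler_directives={"language_level": 3})

setup(name="MACS2-sketch", ext_modules=extensions)
```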
## Configure environment variables
*Note*: if you are using a virtual environment, you can skip this
section, since all the corresponding environment variables are set
correctly when you `activate` the environment.
After running the setup script, you might need to add the install
location to your `PYTHONPATH` and `PATH` environment variables. The
exact procedure varies by platform, but the concept is the same across platforms.
### PYTHONPATH
To set up your `PYTHONPATH` environment variable, you'll need to add
the value `PREFIX/lib/pythonX.Y/site-packages` to your existing
`PYTHONPATH`. In this value, X.Y stands for the major–minor version of
Python you are using (such as 3.7; you can find this with
`sys.version[:3]` from a Python command line). `PREFIX` is the install
prefix where you installed MACS. If you did not specify a prefix on
the command line, MACS will be installed using Python's sys.prefix
value.
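For example, a minimal way to check the `X.Y` value and the default prefix from Python (assuming CPython; `sys.version[:3]` works for single-digit minor versions such as 3.7, while `sys.version_info` is the more robust form):

```python
# Print the pieces used to build the PREFIX/lib/pythonX.Y/site-packages path.
import sys

print(sys.version[:3])                            # e.g. "3.7", as referenced above
print(".".join(map(str, sys.version_info[:2])))   # same value, robust to two-digit minors
print(sys.prefix)                                 # Python's default install prefix mentioned above
```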
On Linux, using bash, I include the new value in my `PYTHONPATH` by
adding this line to my `~/.bashrc`:
`$ export PYTHONPATH=/home/taoliu/lib/python3.7/site-packages:$PYTHONPATH`
On Windows, you need to open the system properties dialog and locate
the tab labeled Environment. Add your value to the `PYTHONPATH`
variable, or create a new `PYTHONPATH` variable if there isn't one
already.
### PATH
Just like your `PYTHONPATH`, you'll also need to add a new value to
your `PATH` environment variable so that you can use the MACS command
line directly. Unlike the `PYTHONPATH` value, however, this time
you'll need to add `PREFIX/bin` to your `PATH` environment
variable. The process for updating this is the same as described above
for the `PYTHONPATH` variable:
`$ export PATH=/home/taoliu/bin:$PATH`
-- Tao Liu <vladimir.liu@gmail.com>
......@@ -53,6 +53,7 @@ MACS2/IO/BedGraphIO.c
MACS2/IO/BedGraphIO.pyx
MACS2/IO/CallPeakUnit.c
MACS2/IO/CallPeakUnit.pyx
MACS2/IO/CallPeakUnitPrecompiled.c
MACS2/IO/FixWidthTrack.c
MACS2/IO/FixWidthTrack.pyx
MACS2/IO/PairedEndTrack.c
......@@ -64,5 +65,4 @@ MACS2/IO/PeakIO.pyx
MACS2/IO/ScoreTrack.c
MACS2/IO/ScoreTrack.pyx
MACS2/IO/__init__.py
MACS2/data/__init__.py
bin/macs2
\ No newline at end of file
MACS_VERSION = "2.1.4"
#MACSDIFF_VERSION = "1.0.4 20110212 (tag:alpha)"
MACS_VERSION = "2.2.4"
FILTERDUP_VERSION = "1.0.0 20140616"
RANDSAMPLE_VERSION = "1.0.0 20120703"
MAX_PAIRNUM = 1000
......
# Time-stamp: <2019-09-20 11:37:21 taoliu>
# cython: language_level=3
# Time-stamp: <2019-10-02 11:05:35 taoliu>
"""Module for BedGraph data class.
......@@ -91,12 +92,12 @@ cdef class bedGraphTrackI:
self.minvalue = 10000000 # initial minimum value is large since I want safe_add_loc to update it
self.baseline_value = baseline_value
def add_a_chromosome ( self, chrom, d ):
def add_a_chromosome ( self, bytes chrom, d ):
"""Unsafe method. Only to be used by cPileup.pyx.
"""
self.__data[chrom] = d
cpdef add_loc ( self, str chromosome, int startpos, int endpos, float value ):
cpdef add_loc ( self, bytes chromosome, int startpos, int endpos, float value ):
"""Add a chr-start-end-value block into __data dictionary.
Difference between safe_add_loc: no check, but faster. Save
......@@ -112,7 +113,7 @@ cdef class bedGraphTrackI:
if startpos < 0:
startpos = 0
if not self.__data.has_key(chromosome):
if chromosome not in self.__data:
self.__data[chromosome] = [array(BYTE4,[]),array(FBYTE4,[])] # for (endpos,value)
c = self.__data[chromosome]
if startpos:
......@@ -147,16 +148,16 @@ cdef class bedGraphTrackI:
"""
cdef:
set chrs
str chromosome
bytes chromosome
chrs = self.get_chr_names()
for chromosome in chrs:
if self.__data.has_key(chromosome):
if chromosome in self.__data:
self.__data[chromosome] = [None, None]
self.__data.pop(chromosome)
return True
def safe_add_loc ( self, str chromosome, int startpos, int endpos, double value):
def safe_add_loc ( self, bytes chromosome, int startpos, int endpos, double value):
"""Add a chr-start-end-value block into __data dictionary.
"""
......@@ -168,7 +169,7 @@ cdef class bedGraphTrackI:
if startpos < 0:
startpos = 0
if not self.__data.has_key(chromosome):
if chromosome not in self.__data:
self.__data[chromosome] = [array(BYTE4,[]),array(FBYTE4,[])] # for (endpos,value)
c = self.__data[chromosome]
if startpos:
......@@ -209,13 +210,13 @@ cdef class bedGraphTrackI:
if value < self.minvalue:
self.minvalue = value
def get_data_by_chr (self, str chromosome):
def get_data_by_chr (self, bytes chromosome):
"""Return array of counts by chromosome.
The return value is a tuple:
([end pos],[value])
"""
if self.__data.has_key(chromosome):
if chromosome in self.__data:
return self.__data[chromosome]
else:
return None
......@@ -238,7 +239,7 @@ cdef class bedGraphTrackI:
cdef:
int pre, pos, i
double value
str chrom
bytes chrom
if trackline:
trackcontents = (name.replace("\"", "\\\""), description.replace("\"", "\\\""))
......@@ -246,8 +247,8 @@ cdef class bedGraphTrackI:
chrs = self.get_chr_names()
for chrom in chrs:
(p,v) = self.__data[chrom]
pnext = iter(p).next
vnext = iter(v).next
pnext = iter(p).__next__
vnext = iter(v).__next__
pre = 0
for i in range(len(p)):
......@@ -255,7 +256,7 @@ cdef class bedGraphTrackI:
value = vnext()
#if value != self.baseline_value:
# never write baseline_value
fhd.write("%s\t%d\t%d\t%.5f\n" % (chrom,pre,pos,value))
fhd.write("%s\t%d\t%d\t%.5f\n" % (chrom.decode(),pre,pos,value))
pre = pos
def reset_baseline (self, double baseline_value):
......@@ -276,13 +277,13 @@ cdef class bedGraphTrackI:
cdef:
int new_pre_pos, pos, i
double new_pre_value, value
str chrom
bytes chrom
chrs = set(self.__data.keys())
for chrom in chrs:
(p,v) = self.__data[chrom]
pnext = iter(p).next
vnext = iter(v).next
pnext = iter(p).__next__
vnext = iter(v).__next__
# new arrays
new_pos = array(BYTE4,[pnext(),])
......@@ -317,13 +318,13 @@ cdef class bedGraphTrackI:
cdef:
int new_pre_pos, pos, i
double new_pre_value, value
str chrom
bytes chrom
chrs = set(self.__data.keys())
for chrom in chrs:
(p,v) = self.__data[chrom]
pnext = iter(p).next
vnext = iter(v).next
pnext = iter(p).__next__
vnext = iter(v).__next__
# new arrays
new_pos = array(BYTE4,[])
......@@ -412,7 +413,7 @@ cdef class bedGraphTrackI:
cdef:
int peak_length, x, pre_p, p, i, summit, tstart, tend
double v, summit_value, tvalue
str chrom
bytes chrom
#if call_summits: close_peak = self.__close_peak2
#else: close_peak = self.__close_peak
......@@ -423,8 +424,8 @@ cdef class bedGraphTrackI:
peak_content = None
peak_length = 0
(ps,vs) = self.get_data_by_chr(chrom) # arrays for position and values
psn = iter(ps).next # assign the next function to a viable to speed up
vsn = iter(vs).next
psn = iter(ps).__next__ # assign the next function to a viable to speed up
vsn = iter(vs).__next__
x = 0
pre_p = 0 # remember previous position
while True:
......@@ -466,7 +467,7 @@ cdef class bedGraphTrackI:
close_peak(peak_content, peaks, min_length, chrom) #, smoothlen=max_gap / 2 )
return peaks
def __close_peak( self, peak_content, peaks, int min_length, str chrom ):
def __close_peak( self, peak_content, peaks, int min_length, bytes chrom ):
peak_length = peak_content[-1][1]-peak_content[0][0]
if peak_length >= min_length: # if the peak is too small, reject it
......@@ -509,7 +510,7 @@ cdef class bedGraphTrackI:
Return both general PeakIO object for highly enriched regions
and gapped broad regions in BroadPeakIO.
"""
cdef str chrom
cdef bytes chrom
cdef int i, j
#cdef int tmp_n
......@@ -524,7 +525,7 @@ cdef class bedGraphTrackI:
#tmp_n = 0
lvl1peakschrom = lvl1_peaks.get_data_from_chrom(chrom)
lvl2peakschrom = lvl2_peaks.get_data_from_chrom(chrom)
lvl1peakschrom_next = iter(lvl1peakschrom).next
lvl1peakschrom_next = iter(lvl1peakschrom).__next__
tmppeakset = [] # to temporarily store lvl1 region inside a lvl2 region
# our assumption is lvl1 regions should be included in lvl2 regions
try:
......@@ -551,7 +552,7 @@ cdef class bedGraphTrackI:
#print len(lvl1peakschrom), len(lvl2peakschrom), tmp_n
return lvl1_peaks, broadpeaks
return broadpeaks
def __add_broadpeak (self, bpeaks, chrom, lvl2peak, lvl1peakset):
"""Internal function to create broad peak.
......@@ -559,40 +560,40 @@ cdef class bedGraphTrackI:
"""
cdef:
long start, end, blockNum
str blockSizes, blockStarts, thickStart, thickEnd
bytes blockSizes, blockStarts, thickStart, thickEnd
start = lvl2peak["start"]
end = lvl2peak["end"]
# the following code will add those broad/lvl2 peaks with no strong/lvl1 peaks inside
if not lvl1peakset:
# try:
# will complement by adding 1bps start and end to this region
# may change in the future if gappedPeak format was improved.
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=str(start), thickEnd=str(end),
blockNum = 2, blockSizes = "1,1", blockStarts = "0,"+str(end-start-1), pileup = lvl2peak["pileup"],
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=(b"%d" % start), thickEnd=(b"%d" % end),
blockNum = 2, blockSizes = b"1,1", blockStarts = (b"0,%d" % (end-start-1)), pileup = lvl2peak["pileup"],
pscore = lvl2peak["pscore"], fold_change = lvl2peak["fc"],
qscore = lvl2peak["qscore"] )
return bpeaks
thickStart = str(lvl1peakset[0]["start"])
thickEnd = str(lvl1peakset[-1]["end"])
thickStart = b"%d" % lvl1peakset[0]["start"]
thickEnd = b"%d" % lvl1peakset[-1]["end"]
blockNum = len(lvl1peakset)
blockSizes = ",".join( map(lambda x:str(x["length"]),lvl1peakset) )
blockStarts = ",".join( map(lambda x:str(x["start"]-start),lvl1peakset) )
blockSizes = b",".join( [b"%d" % x["length"] for x in lvl1peakset] )
blockStarts = b",".join( [b"%d" % (x["start"]-start) for x in lvl1peakset] )
# add 1bp left and/or right block if necessary
if int(thickStart) != start:
# add 1bp left block
thickStart = str(start)
thickStart = b"%d" % start
blockNum += 1
blockSizes = "1,"+blockSizes
blockStarts = "0,"+blockStarts
blockSizes = b"1,"+blockSizes
blockStarts = b"0,"+blockStarts
if int(thickEnd) != end:
# add 1bp right block
thickEnd = str(end)
thickEnd = b"%d" % end
blockNum += 1
blockSizes = blockSizes+",1"
blockStarts = blockStarts+","+str(end-start-1)
blockSizes = blockSizes+b",1"
blockStarts = blockStarts + b"," + (b"%d" % (end-start-1))
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=thickStart, thickEnd=thickEnd,
blockNum = blockNum, blockSizes = blockSizes, blockStarts = blockStarts)
......@@ -615,7 +616,7 @@ cdef class bedGraphTrackI:
"""
cdef:
str chrom
bytes chrom
int max_p
ret = bedGraphTrackI()
......@@ -663,7 +664,7 @@ cdef class bedGraphTrackI:
cdef:
int pre_p, p1, p2
double v1, v2
str chrom
bytes chrom
nr_tracks = len(bdgTracks) + 1 # +1 for self
assert nr_tracks >= 2, "Specify at least two replicates"
......@@ -698,9 +699,9 @@ cdef class bedGraphTrackI:
ps, vs, pn, vn = [], [], [], []
for data in datas:
ps.append(data[0])
pn.append(iter(ps[-1]).next)
pn.append(iter(ps[-1]).__next__)
vs.append(data[1])
vn.append(iter(vs[-1]).next)
vn.append(iter(vs[-1]).__next__)
pre_p = 0 # remember the previous position in the new bedGraphTrackI object ret
try:
......@@ -748,7 +749,7 @@ cdef class bedGraphTrackI:
for other type of scores.
"""
cdef:
str chrom
bytes chrom
object pos_array, pscore_array
dict pvalue_stat = {}
dict pqtable = {}
......@@ -767,14 +768,14 @@ cdef class bedGraphTrackI:
[pos_array, pscore_array] = self.__data[ chrom ]
pn = iter(pos_array).next
vn = iter(pscore_array).next
pn = iter(pos_array).__next__
vn = iter(pscore_array).__next__
for i in range( len( pos_array ) ):
this_p = pn()
this_v = vn()
this_l = this_p - pre_p
if pvalue_stat.has_key( this_v ):
if this_v in pvalue_stat:
pvalue_stat[ this_v ] += this_l
else:
pvalue_stat[ this_v ] = this_l
......@@ -826,7 +827,7 @@ cdef class bedGraphTrackI:
cdef:
int pre_p, p1, p2, i
double v1, v2
str chrom
bytes chrom
assert isinstance(bdgTrack2,bedGraphTrackI), "bdgTrack2 is not a bedGraphTrackI object"
......@@ -844,12 +845,12 @@ cdef class bedGraphTrackI:
for i in range( len( common_chr ) ):
chrom = common_chr.pop()
(p1s,v1s) = self.get_data_by_chr(chrom) # arrays for position and values
p1n = iter(p1s).next # assign the next function to a viable to speed up
v1n = iter(v1s).next
p1n = iter(p1s).__next__ # assign the next function to a viable to speed up
v1n = iter(v1s).__next__
(p2s,v2s) = bdgTrack2.get_data_by_chr(chrom) # arrays for position and values
p2n = iter(p2s).next # assign the next function to a viable to speed up
v2n = iter(v2s).next
p2n = iter(p2s).__next__ # assign the next function to a viable to speed up
v2n = iter(v2s).__next__
pre_p = 0 # remember the previous position in the new bedGraphTrackI object ret
......@@ -914,7 +915,7 @@ cdef class bedGraphTrackI:
cdef:
int pre_p, p1, p2
double v1, v2
str chrom
bytes chrom
assert isinstance(bdgTrack2,bedGraphTrackI), "bdgTrack2 is not a bedGraphTrackI object"
......@@ -927,12 +928,12 @@ cdef class bedGraphTrackI:
for chrom in common_chr:
(p1s,v1s) = self.get_data_by_chr(chrom) # arrays for position and values
p1n = iter(p1s).next # assign the next function to a viable to speed up
v1n = iter(v1s).next
p1n = iter(p1s).__next__ # assign the next function to a viable to speed up
v1n = iter(v1s).__next__
(p2s,v2s) = bdgTrack2.get_data_by_chr(chrom) # arrays for position and values
p2n = iter(p2s).next # assign the next function to a viable to speed up
v2n = iter(v2s).next
p2n = iter(p2s).__next__ # assign the next function to a viable to speed up
v2n = iter(v2s).__next__
chrom_max_len = len(p1s)+len(p2s) # this is the maximum number of locations needed to be recorded in scoreTrackI for this chromosome.
......@@ -982,7 +983,8 @@ cdef class bedGraphTrackI:
cpdef str cutoff_analysis ( self, int max_gap, int min_length, int steps = 100 ):
cdef:
list chrs, tmplist, peak_content
str chrom, ret
bytes chrom
str ret
float cutoff
long total_l, total_p, i, n, ts, te, lastp, tl, peak_length
dict cutoff_npeaks, cutoff_lpeaks
......@@ -1017,8 +1019,8 @@ cdef class bedGraphTrackI:
continue
# first bit of region above cutoff
acs_next = iter(above_cutoff_startpos).next
ace_next = iter(above_cutoff_endpos).next
acs_next = iter(above_cutoff_startpos).__next__
ace_next = iter(above_cutoff_endpos).__next__
ts = acs_next()
te = ace_next()
......@@ -1062,7 +1064,7 @@ def scoreTracktoBedGraph (scoretrack, str colname):
"""
cdef:
int pre, i
str chrom
bytes chrom
bdgtrack = bedGraphTrackI( baseline_value = 0 )
if colname not in ['sample','control','-100logp','-100logq']:
......@@ -1098,7 +1100,7 @@ class bedRegionTrackI (bedGraphTrackI):
self.minvalue = 0
self.baseline_value = 0
def safe_add_loc (self, str chromosome, int startpos, int endpos):
def safe_add_loc (self, bytes chromosome, int startpos, int endpos):
"""Add a chr-start-end-value block into __data dictionary.
"""
......@@ -1114,7 +1116,7 @@ class bedRegionTrackI (bedGraphTrackI):
if startpos < 0:
startpos = 0
if not self.__data.has_key(chromosome):
if chromosome not in self.__data:
self.__data[chromosome] = [array(BYTE4,[]),array(FBYTE4,[])] # for (endpos,value)
c = self.__data[chromosome]
if startpos:
......
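The hunks above are mostly mechanical Python 2 to Python 3 changes. Here is a hedged, self-contained illustration of the recurring patterns (not MACS2 code; the names are invented for the example):

```python
# Recurring Python 3 porting patterns seen in the diff above.
data = {b"chr1": [100, 200, 300]}   # chromosome names become bytes keys in the port

chrom = b"chr1"
if chrom in data:                   # 'in' replaces the removed dict.has_key()
    positions = data[chrom]

pnext = iter(positions).__next__    # .__next__ replaces the Python 2 .next method
first = pnext()

d = 147
five_shift = d // 2                 # '//' keeps integer division (Python 3 '/' is float)

# bytes must be decoded before mixing into str formatting, as in the write() calls above;
# files opened in binary mode ("rb") likewise yield bytes lines.
print("%s\t%d" % (chrom.decode(), first))   # print is a function in Python 3
```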
# Time-stamp: <2019-09-20 11:37:42 taoliu>
# cython: language_level=3
# Time-stamp: <2019-10-02 11:05:42 taoliu>
"""Module Description: IO Module for bedGraph file
......@@ -62,13 +63,13 @@ cdef class bedGraphIO:
If either of the above two criteria is violated, parsing will fail.
"""
cdef:
char * bedGraph_filename
str bedGraph_filename
def __init__ ( self, bedGraph_filename ):
def __init__ ( self, str bedGraph_filename ):
"""f must be a filename or a file handler.
"""
self.bedGraph_filename = <bytes> bedGraph_filename
self.bedGraph_filename = bedGraph_filename
def build_bdgtrack (self, double baseline_value=0):
"""Use this function to return a bedGraphTrackI object.
......@@ -82,19 +83,19 @@ cdef class bedGraphIO:
Then the region chr1:200..250 should be filled with
baseline_value. Default of baseline_value is 0.
"""
cdef str i
cdef bytes i
data = bedGraphTrackI(baseline_value=baseline_value)
add_func = data.add_loc
# python open file
bedGraph_file = open( self.bedGraph_filename, "r" )
bedGraph_file = open( self.bedGraph_filename, "rb" )
for i in bedGraph_file:
if i.startswith("track"):
if i.startswith(b"track"):
continue
elif i.startswith("#"):
elif i.startswith(b"#"):
continue
elif i.startswith("browse"):
elif i.startswith(b"browse"):
continue
else:
fs = i.split()
......@@ -103,39 +104,6 @@ cdef class bedGraphIO:
bedGraph_file.close()
return data
def build_bdgtrack_c (self, double baseline_value=0):
"""Use this function to return a bedGraphTrackI object. This is a C version.
baseline_value is the value to fill in the regions not defined
in bedGraph. For example, if the bedGraph is like:
chr1 100 200 1
chr1 250 350 2
Then the region chr1:200..250 should be filled with
baseline_value. Default of baseline_value is 0.
"""
cdef:
char[80] chrom # will there be chromosome name longer than 80 characters?
int start, end
float value
FILE * bedGraph_file
data = bedGraphTrackI(baseline_value=baseline_value)
add_func = data.add_loc
# C open file
bedGraph_file = fopen( self.bedGraph_filename, "r" )
if bedGraph_file == NULL:
raise Exception("File '%s' can not be opened!" % self.bedGraph_filename )
while ( fscanf( bedGraph_file, "%s\t%d\t%d\t%f\n", chrom, &start, &end, &value ) != EOF ):
add_func( chrom, start, end, value )
fclose( bedGraph_file )
#data.finalize()
return data
cdef class genericBedIO:
"""File Parser Class for generic bed File with at least column #1,#2,#3,and #5.
......@@ -156,8 +124,8 @@ cdef class genericBedIO:
"""
if type(f) == str:
self.fhd = open(f,"r")
elif type(f) == file:
self.fhd = open(f,"rb")
elif type(f) == io.IOBase:
self.fhd = f
else:
raise Exception("f must be a filename or a file handler.")
......
# Time-stamp: <2019-09-20 11:38:45 taoliu>
# cython: language_level=3
# Time-stamp: <2019-10-02 11:06:00 taoliu>
"""Module for FWTrack classes.
......@@ -15,6 +16,7 @@ import logging
#from array import array
#from random import sample as random_sample
import sys
import io
from copy import copy
from collections import Counter
......@@ -98,18 +100,18 @@ cdef class FWTrack:
"""
cdef:
set chrs
str chromosome
bytes chromosome
chrs = set(self.get_chr_names())
for chromosome in chrs:
if self.__locations.has_key(chromosome):
if chromosome in self.__locations:
self.__locations[chromosome][0].resize( self.buffer_size, refcheck=False )
self.__locations[chromosome][0].resize( 0, refcheck=False )
self.__locations[chromosome][1].resize( self.buffer_size, refcheck=False )
self.__locations[chromosome][1].resize( 0, refcheck=False )
self.__locations[chromosome] = [None, None]
self.__locations.pop(chromosome)
if self.__dup_locations.has_key(chromosome):
if chromosome in self.__dup_locations:
self.__dup_locations[chromosome][0].resize( self.buffer_size, refcheck=False )
self.__dup_locations[chromosome][0].resize( 0, refcheck=False )
self.__dup_locations[chromosome][1].resize( self.buffer_size, refcheck=False )
......@@ -120,7 +122,7 @@ cdef class FWTrack:
return True
cpdef add_loc ( self, str chromosome, int32_t fiveendpos, int strand ):
cpdef add_loc ( self, bytes chromosome, int32_t fiveendpos, int strand ):
"""Add a location to the list according to the sequence name.
chromosome -- mostly the chromosome name
......@@ -130,7 +132,7 @@ cdef class FWTrack:
cdef:
long i
if not self.__locations.has_key(chromosome):
if chromosome not in self.__locations:
self.__locations[chromosome] = [ np.zeros(self.buffer_size, dtype='int32'), np.zeros(self.buffer_size, dtype='int32') ] # [plus,minus strand]
self.__pointer[chromosome] = [ 0, 0 ]
self.__locations[chromosome][strand][0] = fiveendpos
......@@ -143,10 +145,10 @@ cdef class FWTrack:
self.__locations[chromosome][strand][i]= fiveendpos
self.__pointer[chromosome][strand] += 1
except:
print "i:",i
print "self.buffer_size:", self.buffer_size
print "strand:", strand
print "loc len:", len( self.__locations[chromosome][strand] )
print ("i:",i)
print ("self.buffer_size:", self.buffer_size)
print ("strand:", strand)
print ("loc len:", len( self.__locations[chromosome][strand] ))
raise Exception("!!")
cdef np.ndarray[np.int32_t, ndim=1] __expand__ ( self, np.ndarray[np.int32_t, ndim=1] arr ):
......@@ -161,7 +163,7 @@ cdef class FWTrack:
cdef:
int32_t i
str c
bytes c
self.total = 0
......@@ -191,7 +193,7 @@ cdef class FWTrack:
"""
cdef:
set valid_chroms, missed_chroms, extra_chroms
str chrom
bytes chrom
valid_chroms = set(self.__locations.keys()).intersection(rlengths.keys())
for chrom in valid_chroms:
......@@ -211,11 +213,11 @@ cdef class FWTrack:
self.rlengths = dict([(k, INT_MAX) for k in self.__locations.keys()])
return self.rlengths
cpdef get_locations_by_chr ( self, str chromosome ):
cpdef get_locations_by_chr ( self, bytes chromosome ):
"""Return a tuple of two lists of locations for certain chromosome.
"""
if self.__locations.has_key(chromosome):
if chromosome in self.__locations:
return self.__locations[chromosome]
else:
raise Exception("No such chromosome name (%s) in TrackI object!\n" % (chromosome))
......@@ -223,7 +225,7 @@ cdef class FWTrack:
cpdef list get_chr_names ( self ):
"""Return all the chromosome names stored in this track object.
"""
l = self.__locations.keys()
l = list(self.__locations.keys())
l.sort()
return l
......@@ -238,7 +240,7 @@ cdef class FWTrack:
"""
cdef:
int32_t i
str c
bytes c
chrnames = self.get_chr_names()
......@@ -258,7 +260,7 @@ cdef class FWTrack:
int p, m, n, current_loc, i_chrom
unsigned long i_old, i_new # index for old array, and index for new one
unsigned long i_dup, size, new_size, dup_size
str k
bytes k
np.ndarray[np.int32_t, ndim=1] plus, new_plus, dup_plus, minus, new_minus, dup_minus
if not self.__sorted:
......@@ -384,7 +386,7 @@ cdef class FWTrack:
int p, m, n, current_loc, i_chrom
unsigned long i_old, i_new # index for old array, and index for new one
unsigned long i_dup, size, new_size, dup_size
str k
bytes k
np.ndarray[np.int32_t, ndim=1] plus, new_plus, dup_plus, minus, new_minus, dup_minus
if not self.__sorted:
......@@ -452,7 +454,7 @@ cdef class FWTrack:
int p, m, n, current_loc, i_chrom
# index for old array, and index for new one
unsigned long i_old, i_new, size, new_size
str k
bytes k
np.ndarray[np.int32_t, ndim=1] plus, new_plus, minus, new_minus
if maxnum < 0: return self.total # do nothing
......@@ -560,7 +562,7 @@ cdef class FWTrack:
"""
cdef:
int32_t num, i_chrom # num: number of reads allowed on a certain chromosome
str key
bytes key
self.total = 0
self.length = 0
......@@ -612,11 +614,11 @@ cdef class FWTrack:
"""
cdef:
int32_t i, i_chrom, p
str k
bytes k
if not fhd:
fhd = sys.stdout
assert isinstance(fhd, file)
assert isinstance(fhd,io.IOBase)
assert self.fw > 0, "FWTrack object .fw should be set larger than 0!"
chrnames = self.get_chr_names()
......@@ -631,16 +633,16 @@ cdef class FWTrack:
for i in range(plus.shape[0]):
p = plus[i]
fhd.write("%s\t%d\t%d\t.\t.\t%s\n" % (k,p,p+self.fw,"+") )
fhd.write("%s\t%d\t%d\t.\t.\t%s\n" % (k.decode(),p,p+self.fw,"+") )
minus = self.__locations[k][1]
for i in range(minus.shape[0]):
p = minus[i]
fhd.write("%s\t%d\t%d\t.\t.\t%s\n" % (k,p-self.fw,p,"-") )
fhd.write("%s\t%d\t%d\t.\t.\t%s\n" % (k.decode(),p-self.fw,p,"-") )
return
cpdef tuple extract_region_tags ( self, str chromosome, int32_t startpos, int32_t endpos ):
cpdef tuple extract_region_tags ( self, bytes chromosome, int32_t startpos, int32_t endpos ):
cdef:
int32_t i, pos
np.ndarray[np.int32_t, ndim=1] rt_plus, rt_minus
......@@ -696,7 +698,7 @@ cdef class FWTrack:
cdef:
int32_t m, i, j, pre_i, pre_j, pos, startpos, endpos
np.ndarray[np.int32_t, ndim=1] plus, minus, rt_plus, rt_minus
str chrom, name
bytes chrom, name
list temp, retval, pchrnames
pchrnames = peaks.get_chr_names()
......@@ -777,7 +779,7 @@ cdef class FWTrack:
cdef:
int32_t m, i, j, pre_i, pre_j, pos, startpos, endpos #, n_peaks
np.ndarray[np.int32_t, ndim=1] plus, minus, rt_plus, rt_minus
str chrom #, peak_name
bytes chrom #, peak_name
list temp, retval, pchrnames, cpeaks
np.ndarray[np.int32_t, ndim=1] adjusted_summits, passflags
......@@ -863,7 +865,7 @@ cdef class FWTrack:
# end of a loop
return ret_peaks
cpdef pileup_a_chromosome ( self, str chrom, list ds, list scale_factor_s, float baseline_value = 0.0, bint directional = True, int end_shift = 0 ):
cpdef pileup_a_chromosome ( self, bytes chrom, list ds, list scale_factor_s, float baseline_value = 0.0, bint directional = True, int end_shift = 0 ):
"""pileup a certain chromosome, return [p,v] (end position and value) list.
ds : tag will be extended to this value to 3' direction,
......@@ -897,8 +899,8 @@ cdef class FWTrack:
three_shift_s.append( end_shift + d)
else:
# both sides
five_shift_s.append( d/2 - end_shift )
three_shift_s.append( end_shift + d - d/2)
five_shift_s.append( d//2 - end_shift )
three_shift_s.append( end_shift + d - d//2)
prev_pileup = None
......@@ -906,13 +908,13 @@ cdef class FWTrack:
five_shift = five_shift_s[i]
three_shift = three_shift_s[i]
scale_factor = scale_factor_s[i]
tmp_pileup = se_all_in_one_pileup ( self.__locations[chrom][0], self.__locations[chrom][1], five_shift, three_shift, rlength, scale_factor, baseline_value )
if prev_pileup:
prev_pileup = max_over_two_pv_array ( prev_pileup, tmp_pileup )
else:
prev_pileup = tmp_pileup
return prev_pileup
cdef inline int32_t left_sum ( data, int pos, int width ):
......
# Time-stamp: <2019-09-20 11:39:05 taoliu>
# cython: language_level=3
# Time-stamp: <2019-10-02 11:06:06 taoliu>
"""Module for filter duplicate tags from paired-end data
......@@ -8,16 +9,19 @@ the distribution).
"""
from MACS2.Constants import *
import io
import sys
from logging import debug, info
import numpy as np
cimport numpy as np
from libc.stdint cimport uint32_t, uint64_t, int32_t, int64_t
from copy import copy
from cpython cimport bool
cimport cython
from libc.stdint cimport uint32_t, uint64_t, int32_t, int64_t
from MACS2.Constants import *
from MACS2.Pileup import quick_pileup, max_over_two_pv_array, se_all_in_one_pileup
import sys
cdef INT_MAX = <int>((<unsigned int>-1)>>1)
......@@ -67,7 +71,7 @@ cdef class PETrackI:
self.length = 0
self.average_template_length = 0.0
cpdef add_loc ( self, str chromosome, int start, int end):
cpdef add_loc ( self, bytes chromosome, int start, int end):
"""Add a location to the list according to the sequence name.
chromosome -- mostly the chromosome name
......@@ -76,7 +80,7 @@ cdef class PETrackI:
cdef:
long i
if not self.__locations.has_key(chromosome):
if chromosome not in self.__locations:
self.__locations[chromosome] = np.zeros(shape=self.buffer_size, dtype=[('l','int32'),('r','int32')]) # note: ['l'] is the leftmost end, ['r'] is the rightmost end of fragment.
self.__locations[chromosome][ 0 ] = ( start, end )
self.__pointer[chromosome] = 1
......@@ -93,16 +97,16 @@ cdef class PETrackI:
"""
cdef:
set chrs
str chromosome
bytes chromosome
chrs = set(self.get_chr_names())
for chromosome in chrs:
if self.__locations.has_key(chromosome):
if chromosome in self.__locations:
self.__locations[chromosome].resize( self.buffer_size, refcheck=False )
self.__locations[chromosome].resize( 0, refcheck=False )
self.__locations[chromosome] = None
self.__locations.pop(chromosome)
if self.__dup_locations.has_key(chromosome):
if chromosome in self.__dup_locations:
self.__dup_locations[chromosome].resize( self.buffer_size, refcheck=False )
self.__dup_locations[chromosome].resize( 0, refcheck=False )
self.__dup_locations[chromosome] = None
......@@ -126,7 +130,7 @@ cdef class PETrackI:
"""
cdef:
set valid_chroms, missed_chroms
str chrom
bytes chrom
valid_chroms = set(self.__locations.keys()).intersection(rlengths.keys())
for chrom in valid_chroms:
......@@ -154,7 +158,7 @@ cdef class PETrackI:
"""
cdef int32_t i
cdef str c
cdef bytes c
self.total = 0
......@@ -170,11 +174,11 @@ cdef class PETrackI:
self.average_template_length = float( self.length ) / self.total
return
def get_locations_by_chr ( self, str chromosome ):
def get_locations_by_chr ( self, bytes chromosome ):
"""Return a tuple of two lists of locations for certain chromosome.
"""
if self.__locations.has_key(chromosome):
if chromosome in self.__locations:
return self.__locations[chromosome]
else:
raise Exception("No such chromosome name (%s) in TrackI object!\n" % (chromosome))
......@@ -182,7 +186,9 @@ cdef class PETrackI:
cpdef list get_chr_names ( self ):
"""Return all the chromosome names stored in this track object.
"""
l = self.__locations.keys()
cdef list l
l = list(self.__locations.keys())
l.sort()
return l
......@@ -203,7 +209,7 @@ cdef class PETrackI:
"""
cdef uint32_t i
cdef str c
cdef bytes c
cdef list chrnames = self.get_chr_names()
for i in range(len(chrnames)):
......@@ -241,7 +247,7 @@ cdef class PETrackI:
np.ndarray[np.float64_t, ndim=1] pmf
np.ndarray[np.int64_t, ndim=1] bins
np.ndarray[np.int32_t, ndim=1] sizes, locs
str c
bytes c
int i, bins_len
int max_bins = 0
list chrnames = self.get_chr_names()
......@@ -275,7 +281,7 @@ cdef class PETrackI:
np.ndarray locs, new_locs, dup_locs
unsigned long i_old, i_new, i_dup, new_size, dup_size
list chrnames = self.get_chr_names()
str k
bytes k
if not self.__sorted: self.sort()
......@@ -362,7 +368,7 @@ cdef class PETrackI:
# np.ndarray[np.int32_t, ndim=1] current_loc #= np.zeros([1,2], np.int32)
int loc_start, loc_end, current_loc_start, current_loc_end
unsigned long i_old, i_new, size, new_size
str k
bytes k
np.ndarray locs, new_locs
if maxnum < 0: return self.total # condition to return if not filtering
......@@ -435,7 +441,7 @@ cdef class PETrackI:
"""
cdef:
uint32_t num, i_chrom # num: number of reads allowed on a certain chromosome
str key
bytes key
self.total = 0
self.length = 0
......@@ -479,11 +485,11 @@ cdef class PETrackI:
"""
cdef int32_t i, i_chrom, s, e
cdef str k
cdef bytes k
if not fhd:
fhd = sys.stdout
assert isinstance(fhd, file)
assert isinstance(fhd, io.IOBase)
chrnames = self.get_chr_names()
......@@ -497,11 +503,11 @@ cdef class PETrackI:
for i in range(locs.shape[0]):
s, e = locs[ i ]
fhd.write("%s\t%d\t%d\n" % (k, s, e))
fhd.write("%s\t%d\t%d\n" % (k.decode(), s, e))
return
cpdef pileup_a_chromosome ( self, str chrom, list scale_factor_s, float baseline_value = 0.0 ):
cpdef pileup_a_chromosome ( self, bytes chrom, list scale_factor_s, float baseline_value = 0.0 ):
"""pileup a certain chromosome, return [p,v] (end position and value) list.
scale_factor_s : linearly scale the pileup value applied to each d in ds. The list should have the same length as ds.
......@@ -528,7 +534,7 @@ cdef class PETrackI:
return prev_pileup
cpdef pileup_a_chromosome_c ( self, str chrom, list ds, list scale_factor_s, float baseline_value = 0.0 ):
cpdef pileup_a_chromosome_c ( self, bytes chrom, list ds, list scale_factor_s, float baseline_value = 0.0 ):
"""pileup a certain chromosome, return [p,v] (end position and value) list.
This function is for control track. Basically, here is a
......@@ -557,8 +563,8 @@ cdef class PETrackI:
for i in range(len(scale_factor_s)):
d = ds[i]
scale_factor = scale_factor_s[i]
five_shift = d/2
three_shift= d/2
five_shift = d//2
three_shift= d//2
tmp_pileup = se_all_in_one_pileup ( self.__locations[chrom]['l'], self.__locations[chrom]['r'], five_shift, three_shift, rlength, scale_factor, baseline_value )
......