Copyright (c) 2013, Tao Liu lab at UB and Xiaole Shirley Liu lab at DFCI
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Tao Liu lab nor Xiaole Shirley Liu lab nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT
HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
\ No newline at end of file
2019-09-19 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.3.3
* Features added
1) Support Docker auto-deploy. PR #309
2) Support Travis CI auto-testing, update unit-testing
scripts, and enable subcommand testing on small datasets.
3) Update README documents. #297 PR #306
4) `cmbreps` supports more than 2 replicates. Merged from PR #304
@Maarten-vd-Sande and PR #307 (our own chi-sq test code)
5) `--d-min` option is added to `callpeak` and `predictd` to
exclude predicted fragment sizes smaller than the given
value. Merged from PR #267 @shouldsee.
6) `--buffer-size` option is added to the `predictd`, `filterdup`,
`pileup` and `refinepeak` subcommands. Users can use this option
to decrease memory usage when there are a large number of contigs
in the data. Also, `callpeak`, `predictd`, `filterdup`,
`pileup` and `refinepeak` will now suggest that users tweak
`--buffer-size` when a MemoryError is caught. #313 PR #314
* Bugs fixed
1) #265 Fixed a bug where the pseudocount had not been applied
while calculating the p-value score in the ScoreTrack object
(a sketch of the idea follows this list).
2) Fixed `bdgbroadcall` so that it reports broad peaks that contain
no strong peak inside, consistent with the behavior of `callpeak
--broad`.
3) Renamed COPYING to LICENSE.
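A minimal sketch of the pseudocount idea behind bug 1), assuming a
Poisson model for the pileup-versus-lambda score; the function name,
the cap value, and the use of scipy are illustrative assumptions, not
MACS2's actual implementation:

```python
# Illustrative only -- MACS2's real scoring lives in its ScoreTrack code.
from math import log10
from scipy.stats import poisson

def pscore(obs, exp, pseudocount=1.0):
    """-log10 Poisson p-value of observing >= obs reads given the
    expected count exp, with the pseudocount applied to BOTH values
    before scoring (the essence of the #265 fix)."""
    p = poisson.sf(obs + pseudocount - 1, exp + pseudocount)  # P(X >= obs')
    return -log10(p) if p > 0 else 3100.0  # cap underflowed p-values

# e.g. pscore(10, 2.5) computes -log10 P(X >= 11) under Poisson(3.5)
```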
2018-10-17 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.2
......
# INSTALL Guide For MACS
Time-stamp: <2018-10-17 16:18:48 Tao Liu>
Time-stamp: <2019-09-20 15:08:35 taoliu>
Please check the following instructions to complete your installation.
## Prerequisites
Python version must be equal to *2.7* to run MACS. I recommend
using the version *2.7.9*.
using the version *2.7.15*.
[Numpy](http://www.scipy.org/Download) (>=1.6) is required to run MACS v2.
[Numpy](http://www.scipy.org/Download) (>=1.15) is required to run MACS v2.
GCC is required to compile `.c` code in the MACS v2 package, and Python
header files are needed. If you are using Mac OSX, I recommend you
install Xcode; if you are using Linux, you need to make sure
`python-dev` is installed.
[Cython](http://cython.org/) (>=0.18) is required *only if* you want to regenerate `.c`
files from `.pyx` files using `setup_w_cython.py` script.
[Cython](http://cython.org/) (>=0.25) is required to generate `.c`
files from `.pyx` files using `setup.py` script.
## Easy installation through PyPI
......@@ -28,19 +28,12 @@ access to system-wide numpy and scipy libraries so that you don't need
to install them again.
Then under the command line, type `pip install MACS2`. PyPI will
install Numpy and Scipy automatically if they are absent.
install Numpy and Cython automatically if they are absent.
To upgrade MACS2, type `pip install -U MACS2`. It will check the
currently installed MACS2, compare its version with the one in the PyPI
repository, and download and install the newer version when necessary.
Note: you may not want pip to fix dependencies. For example, if you
already have workable Scipy and Numpy, and during `pip install -U
MACS2` pip downloads the newest Scipy and Numpy but fails to compile
and install them, the whole installation will fail. In that case, pass
the `--no-deps` option to let pip skip all dependencies: type
`pip install -U --no-deps MACS2`.
## Install from source
MACS uses Python's distutils tools for source installations. To
......@@ -65,15 +58,6 @@ directory, use this command:
`$ python setup.py install --prefix /home/taoliu/`
If you want to re-generate `.c` files from `.pyx` files, you need
to install Cython first, then use the `setup_w_cython.py` script in
place of the `setup.py` script in the previous commands, such as:
`$ python setup_w_cython.py install`
or:
`$ python setup_w_cython.py install --prefix /home/taoliu/`
## Configure environment variables
......
BSD 3-Clause License
Copyright (c) 2019, Tao Liu lab at RPCCC and Xiaole Shirley Liu lab at DFCI
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
COPYING
ChangeLog
INSTALL.md
LICENSE
MANIFEST.in
README.md
setup.py
setup_w_cython.py
MACS2/Constants.py
MACS2/OptValidator.py
MACS2/OutputWriter.py
......
numpy>=1.6
numpy>=1.15
cython>=0.25
MACS_VERSION = "2.1.2"
MACS_VERSION = "2.1.3.3"
#MACSDIFF_VERSION = "1.0.4 20110212 (tag:alpha)"
FILTERDUP_VERSION = "1.0.0 20140616"
RANDSAMPLE_VERSION = "1.0.0 20120703"
......
# Time-stamp: <2016-02-12 00:39:32 Tao Liu>
# Time-stamp: <2019-09-20 11:37:21 taoliu>
"""Module for Feature IO classes.
Copyright (c) 2010,2011 Tao Liu <taoliu@jimmy.harvard.edu>
"""Module for BedGraph data class.
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included
with the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: taoliu@jimmy.harvard.edu
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
# ------------------------------------
......@@ -22,7 +15,6 @@ import logging
from array import array
import numpy as np
np_convolve = np.convolve
from libc.math cimport sqrt
from libc.math cimport log
......@@ -31,7 +23,7 @@ from cpython cimport bool
from MACS2.Constants import *
from MACS2.IO.ScoreTrack import scoreTrackII,CombinedTwoTrack
from MACS2.IO.PeakIO import PeakIO, BroadPeakIO
from MACS2.Prob import chisq_logp_e
# ------------------------------------
# constants
......@@ -500,97 +492,6 @@ cdef class bedGraphTrackI:
)
return True
# def __close_peak2 (self, peak_content, peaks, int min_length, str chrom, int smoothlen=50):
# # this is where the summits are called, need to fix this
# end, start = peak_content[ -1 ][ 1 ], peak_content[ 0 ][ 0 ]
# if end - start < min_length: return # if the peak is too small, reject it
# #for (start,end,value,summitvalue,index) in peak_content:
# peakdata = np.zeros(end - start, dtype='float32')
# peakindices = np.zeros(end - start, dtype='int32')
# for (tmpstart,tmpend,tmpvalue,tmpsummitvalue, tmpindex) in peak_content:
# i, j = tmpstart-start, tmpend-start
# peakdata[i:j] = self.data[chrom]['sample'][tmpindex]
# peakindices[i:j] = tmpindex
# # apply smoothing window of tsize / 2
# w = np.ones(smoothlen, dtype='float32')
# smoothdata = np_convolve(w/w.sum(), peakdata, mode='same')
# # find maxima and minima
# local_extrema = np.where(np.diff(np.sign(np.diff(smoothdata))))[0]+1
# # get only maxima by requiring it be greater than the mean
# # might be better to take another derivative instead
# plateau_offsets = np.intersect1d(local_extrema,
# np.where(peakdata>peakdata.mean())[0])
# # sometimes peak summits are plateaus, so check for adjacent coordinates
# # and take the middle ones if needed
# if len(plateau_offsets)==0:
# #####################################################################
# # ***failsafe if no summits so far*** #
# summit_offset_groups = [[(end - start) / 2]] #
# #####################################################################
# elif len(plateau_offsets) == 1:
# summit_offset_groups = [[plateau_offsets[0]]]
# else:
# previous_offset = plateau_offsets[0]
# summit_offset_groups = [[previous_offset]]
# for offset in plateau_offsets:
# if offset == previous_offset + 1:
# summit_offset_groups[-1].append(offset)
# else:
# summit_offset_groups.append([offset])
# summit_offsets = []
# for offset_group in summit_offset_groups:
# summit_offsets.append(offset_group[len(offset_group) / 2])
# summit_indices = peakindices[summit_offsets]
# # also purge offsets that have the same summit_index
# unique_offsets = []
# summit_offsets = np.fromiter(summit_offsets, dtype='int32')
# for index in np.unique(summit_indices):
# those_index_indices = np.where(summit_indices == index)[0]
# those_offsets = summit_offsets[those_index_indices]
# unique_offsets.append(int(those_offsets.mean()))
# # also require a valley of at least 0.6 * taller peak
# # in every adjacent two peaks or discard the lesser one
# # this behavior is like PeakSplitter
# better_offsets = []
# previous_offset = unique_offsets.pop()
# while True:
# if len(unique_offsets) == 0:
# better_offsets.append(previous_offset)
# break
# else:
# this_offset = unique_offsets.pop()
# this_h, prev_h = peakdata[[this_offset, previous_offset]]
# if this_h > prev_h:
# prev_is_taller = False
# min_valley = 0.6 * this_h
# else:
# prev_is_taller = True
# min_valley = 0.6 * prev_h
# s = slice(this_offset, previous_offset)
# valley = np.where(peakdata[s] < min_valley)[0]
# if len(valley) > 0: better_offsets.append(previous_offset)
# else:
# if prev_is_taller: continue # discard this peak
# # else: discard previous peak by ignoring it
# previous_offset = this_offset
# better_offsets.reverse()
# better_indices = peakindices[better_offsets]
# assert len(better_offsets) > 0, "Lost peak summit(s) near %s %d" % (chrom, start)
# for summit_offset, summit_index in zip(better_offsets, better_indices):
# peaks.add( chrom,
# start,
# end,
# summit = start + summit_offset,
# peak_score = peakdata[summit_offset],
# pileup = 0,
# pscore = 0,
# fold_change = 0,
# qscore = 0,
# )
# # start a new peak
# return True
def call_broadpeaks (self, double lvl1_cutoff=500, double lvl2_cutoff=100, int min_length=200,
int lvl1_max_gap=50, int lvl2_max_gap=400):
"""This function try to find enriched regions within which,
......@@ -662,25 +563,36 @@ cdef class bedGraphTrackI:
start = lvl2peak["start"]
end = lvl2peak["end"]
if not lvl1peakset:
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=".", thickEnd=".",
blockNum = 0, blockSizes = ".", blockStarts = ".")
#try:
# will complement by adding 1bp blocks at the start and end of this region
# may change in the future if the gappedPeak format is improved.
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=str(start), thickEnd=str(end),
blockNum = 2, blockSizes = "1,1", blockStarts = "0,"+str(end-start-1), pileup = lvl2peak["pileup"],
pscore = lvl2peak["pscore"], fold_change = lvl2peak["fc"],
qscore = lvl2peak["qscore"] )
return bpeaks
thickStart = str(lvl1peakset[0]["start"])
thickEnd = str(lvl1peakset[-1]["end"])
blockNum = len(lvl1peakset)
blockSizes = ",".join( map(lambda x:str(x["length"]),lvl1peakset) )
blockStarts = ",".join( map(lambda x:str(x["start"]-start),lvl1peakset) )
#if lvl2peak["start"] != thickStart:
# # add 1bp mark for the start of lvl2 peak
# blockNum += 1
# blockSizes = "1,"+blockSizes
# blockStarts = "0,"+blockStarts
#if lvl2peak["end"] != thickEnd:
# # add 1bp mark for the end of lvl2 peak
# blockNum += 1
# blockSizes = blockSizes+",1"
# blockStarts = blockStarts+","+str(end-start-1)
# add 1bp left and/or right block if necessary
if int(thickStart) != start:
# add 1bp left block
thickStart = str(start)
blockNum += 1
blockSizes = "1,"+blockSizes
blockStarts = "0,"+blockStarts
if int(thickEnd) != end:
# add 1bp right block
thickEnd = str(end)
blockNum += 1
blockSizes = blockSizes+",1"
blockStarts = blockStarts+","+str(end-start-1)
bpeaks.add(chrom, start, end, score=lvl2peak["score"], thickStart=thickStart, thickEnd=thickEnd,
blockNum = blockNum, blockSizes = blockSizes, blockStarts = blockStarts)
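An illustrative sketch of the block bookkeeping above, assuming
sorted, non-overlapping strong (level-1) regions inside the broad
(level-2) region; `gapped_blocks` is a hypothetical helper, not part
of MACS2:

```python
def gapped_blocks(start, end, subpeaks):
    """Build gappedPeak blockNum/blockSizes/blockStarts for a broad
    region [start, end) holding sorted strong sub-regions `subpeaks`,
    padding with 1bp blocks when the first/last sub-region does not
    reach the broad region's edges."""
    sizes  = [e - s for s, e in subpeaks]
    starts = [s - start for s, e in subpeaks]
    if subpeaks[0][0] != start:           # add 1bp left block
        sizes, starts = [1] + sizes, [0] + starts
    if subpeaks[-1][1] != end:            # add 1bp right block
        sizes, starts = sizes + [1], starts + [end - start - 1]
    return len(sizes), ",".join(map(str, sizes)), ",".join(map(str, starts))

# A broad region 100..1000 with strong peaks at 300..400 and 600..800
# yields (4, "1,100,200,1", "0,200,500,899").
```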
......@@ -716,7 +628,7 @@ cdef class bedGraphTrackI:
ret.add_loc(chrom,0,max_p,new_value)
return ret
def overlie (self, bdgTrack2, func="max" ):
def overlie (self, bdgTracks, func="max" ):
"""Calculate two bedGraphTrackI objects by letting self
overlying bdgTrack2, with user-defined functions.
......@@ -744,6 +656,7 @@ cdef class bedGraphTrackI:
Then the given 'func' will be applied on the values from all tracks as func(#1,#2,...)
Supported 'func' are "max", "mean" and "fisher".
Return value is a bedGraphTrackI object.
"""
......@@ -752,66 +665,62 @@ cdef class bedGraphTrackI:
double v1, v2
str chrom
assert isinstance(bdgTrack2,bedGraphTrackI), "bdgTrack2 is not a bedGraphTrackI object"
nr_tracks = len(bdgTracks) + 1 # +1 for self
assert nr_tracks >= 2, "Specify at least two replicates"
for i, bdgTrack in enumerate(bdgTracks):
assert isinstance(bdgTrack, bedGraphTrackI), "bdgTrack{} is not a bedGraphTrackI object".format(i + 1)
if func == "max":
f = max
elif func == "mean":
f = lambda x, y: ( x + y ) / 2
#f = np_mean
def f(*args):
return sum(args)/nr_tracks
elif func == "fisher":
f = lambda p1, p2: ( p1 + p2 ) - log1p( ( p1 + p2 ) / LOG10_E ) * LOG10_E
# combine -log10pvalues
def f(*args):
# chisq statistics = sum(-log10p)/log10(e)*2, chisq df = 2*number_of_reps
return chisq_logp_e(2*sum(args)/LOG10_E, 2*nr_tracks, log10=True)
else:
raise Exception("Invalid function")
ret = bedGraphTrackI()
retadd = ret.add_loc
chr1 = set(self.get_chr_names())
chr2 = set(bdgTrack2.get_chr_names())
common_chr = chr1.intersection(chr2)
common_chr = set(self.get_chr_names())
for track in bdgTracks:
common_chr = common_chr.intersection(set(track.get_chr_names()))
for chrom in common_chr:
(p1s,v1s) = self.get_data_by_chr(chrom) # arrays for position and values
p1n = iter(p1s).next # assign the next function to a variable to speed up
v1n = iter(v1s).next
datas = [self.get_data_by_chr(chrom)]
datas.extend([bdgTracks[i].get_data_by_chr(chrom) for i in range(len(bdgTracks))])
(p2s,v2s) = bdgTrack2.get_data_by_chr(chrom) # arrays for position and values
p2n = iter(p2s).next # assign the next function to a variable to speed up
v2n = iter(v2s).next
ps, vs, pn, vn = [], [], [], []
for data in datas:
ps.append(data[0])
pn.append(iter(ps[-1]).next)
vs.append(data[1])
vn.append(iter(vs[-1]).next)
pre_p = 0 # remember the previous position in the new bedGraphTrackI object ret
try:
p1 = p1n()
v1 = v1n()
p2 = p2n()
v2 = v2n()
ps_cur = [pn[i]() for i in range(len(pn))]
vs_cur = [vn[i]() for i in range(len(pn))]
while True:
if p1 < p2:
# clip a region from pre_p to p1, then set pre_p as p1.
retadd(chrom,pre_p,p1,f(v1,v2))
pre_p = p1
# call for the next p1 and v1
p1 = p1n()
v1 = v1n()
elif p2 < p1:
# clip a region from pre_p to p2, then set pre_p as p2.
retadd(chrom,pre_p,p2,f(v1,v2))
pre_p = p2
# call for the next p2 and v2
p2 = p2n()
v2 = v2n()
elif p1 == p2:
# from pre_p to p1 or p2, then set pre_p as p1 or p2.
retadd(chrom,pre_p,p1,f(v1,v2))
pre_p = p1
# call for the next p1, v1, p2, v2.
p1 = p1n()
v1 = v1n()
p2 = p2n()
v2 = v2n()
# get the lowest position
lowest_p = min(ps_cur)
# at least one track ends at the lowest position; possibly several do
locations = [i for i in range(len(ps_cur)) if ps_cur[i] == lowest_p]
# emit the interval from pre_p up to the lowest position
ret.add_loc(chrom, pre_p, ps_cur[locations[0]], f(*vs_cur))
pre_p = ps_cur[locations[0]]
for index in locations:
ps_cur[index] = pn[index]()
vs_cur[index] = vn[index]()
except StopIteration:
# reached the end of one of the bedGraphTrackI objects; simply exit
pass
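The loop above is a k-way sweep over the per-track iterators: emit the
interval up to the smallest current end position with `f` applied
across the current values, then advance every iterator sitting at that
position. A minimal pure-Python sketch under those assumptions, with a
Fisher-method combiner matching the chi-square comment above
(`overlie_sketch` and `fisher_combine` are illustrative names, and
scipy is assumed for the chi-square survival function):

```python
from math import log10
from scipy.stats import chi2

LOG10_E = 0.43429448190325176  # log10(e)

def fisher_combine(*neglog10_ps):
    """Fisher's method on -log10 p-values: 2 * sum(-ln p) follows a
    chi-square distribution with df = 2 * number of tracks."""
    stat = 2 * sum(neglog10_ps) / LOG10_E        # convert log10 to ln
    return -log10(chi2.sf(stat, df=2 * len(neglog10_ps)))

def overlie_sketch(tracks, f=max):
    """tracks: one [(end, value), ...] list per replicate, all covering
    the same chromosome span. Yields (start, end, combined_value)."""
    its = [iter(t) for t in tracks]
    cur = [next(it) for it in its]               # current (end, value)
    pre_p = 0
    while True:
        lowest = min(p for p, v in cur)          # smallest current end
        yield pre_p, lowest, f(*[v for p, v in cur])
        pre_p = lowest
        try:
            for i, (p, v) in enumerate(cur):
                if p == lowest:                  # advance ended tracks
                    cur[i] = next(its[i])
        except StopIteration:
            return                               # one track is exhausted

# e.g. combining three replicate p-score tracks:
#   list(overlie_sketch([t1, t2, t3], f=fisher_combine))
```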
......@@ -1145,80 +1054,6 @@ cdef class bedGraphTrackI:
ret += "%.2f\t%d\t%d\t%.2f\n" % ( cutoff, cutoff_npeaks[ cutoff ], cutoff_lpeaks[ cutoff ], cutoff_lpeaks[ cutoff ]/cutoff_npeaks[ cutoff ] )
return ret
# def make_scoreTrack_for_macs2diff (self, bdgTrack2 ):
# """A modified overlie function for MACS v2 differential call.
# Return value is a bedGraphTrackI object.
# """
# cdef:
# int pre_p, p1, p2
# double v1, v2
# str chrom
# assert isinstance(bdgTrack2,bedGraphTrackI), "bdgTrack2 is not a bedGraphTrackI object"
# ret = CombinedTwoTrack()
# retadd = ret.add
# chr1 = set(self.get_chr_names())
# chr2 = set(bdgTrack2.get_chr_names())
# common_chr = chr1.intersection(chr2)
# for chrom in common_chr:
# (p1s,v1s) = self.get_data_by_chr(chrom) # arrays for position and values
# p1n = iter(p1s).next # assign the next function to a variable to speed up
# v1n = iter(v1s).next
# (p2s,v2s) = bdgTrack2.get_data_by_chr(chrom) # arrays for position and values
# p2n = iter(p2s).next # assign the next function to a variable to speed up
# v2n = iter(v2s).next
# chrom_max_len = len(p1s)+len(p2s) # this is the maximum number of locations needed to be recorded in scoreTrackI for this chromosome.
# ret.add_chromosome(chrom,chrom_max_len)
# pre_p = 0 # remember the previous position in the new bedGraphTrackI object ret
# try:
# p1 = p1n()
# v1 = v1n()
# p2 = p2n()
# v2 = v2n()
# while True:
# if p1 < p2:
# # clip a region from pre_p to p1, then set pre_p as p1.
# retadd( chrom, p1, v1, v2 )
# pre_p = p1
# # call for the next p1 and v1
# p1 = p1n()
# v1 = v1n()
# elif p2 < p1:
# # clip a region from pre_p to p2, then set pre_p as p2.
# retadd( chrom, p2, v1, v2 )
# pre_p = p2
# # call for the next p2 and v2
# p2 = p2n()
# v2 = v2n()
# elif p1 == p2:
# # from pre_p to p1 or p2, then set pre_p as p1 or p2.
# retadd( chrom, p1, v1, v2 )
# pre_p = p1
# # call for the next p1, v1, p2, v2.
# p1 = p1n()
# v1 = v1n()
# p2 = p2n()
# v2 = v2n()
# except StopIteration:
# # meet the end of either bedGraphTrackI, simply exit
# pass
# ret.finalize()
# #ret.merge_regions()
# return ret
def scoreTracktoBedGraph (scoretrack, str colname):
"""Produce a bedGraphTrackI object with certain column as scores.
......
# Time-stamp: <2015-03-13 12:50:14 Tao Liu>
# Time-stamp: <2019-09-20 11:37:42 taoliu>
"""Module Description: IO Module for bedGraph file
Copyright (c) 2011 Tao Liu <taoliu@jimmy.harvard.edu>
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included with
under the terms of the BSD License (see the file LICENSE included with
the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: taoliu@jimmy.harvard.edu
"""
# ------------------------------------
......
# Time-stamp: <2011-03-14 17:52:00 Tao Liu>
# Time-stamp: <2019-09-20 11:38:14 taoliu>
"""Module Description: BinKeeper for Wiggle-like tracks.
Copyright (c) 2008 Tao Liu <taoliu@jimmy.harvard.edu>
"""Module Description: BinKeeper for Wiggle-like tracks. (obsolete)
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included with
under the terms of the BSD License (see the file LICENSE included with
the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: taoliu@jimmy.harvard.edu
"""
# ------------------------------------
......
# Time-stamp: <2016-05-19 10:22:22 Tao Liu>
# Time-stamp: <2019-09-20 11:38:30 taoliu>
"""Module for Calculate Scores.
Copyright (c) 2013 Tao Liu <vladimir.liu@gmail.com>
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included
with the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: vladimir.liu@gmail.com
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
# ------------------------------------
......
# Time-stamp: <2015-06-03 00:49:19 Tao Liu>
# Time-stamp: <2019-09-20 11:38:45 taoliu>
"""Module for FWTrack classes.
Copyright (c) 2010,2011 Tao Liu <taoliu@jimmy.harvard.edu>
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included
with the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: taoliu@jimmy.harvard.edu
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
# ------------------------------------
......@@ -446,7 +439,7 @@ cdef class FWTrack:
return 0
@cython.boundscheck(False) # do not check that np indices are valid
cpdef filter_dup ( self, int32_t maxnum = -1):
cpdef unsigned long filter_dup ( self, int32_t maxnum = -1):
"""Filter the duplicated reads.
Run it right after you add all data into this object.
......@@ -462,7 +455,7 @@ cdef class FWTrack:
str k
np.ndarray[np.int32_t, ndim=1] plus, new_plus, minus, new_minus
if maxnum < 0: return # do nothing
if maxnum < 0: return self.total # do nothing
if not self.__sorted:
self.sort()
......@@ -558,83 +551,7 @@ cdef class FWTrack:
self.__locations[k]=[new_plus,new_minus]
self.length = self.fw * self.total
return
@cython.boundscheck(False) # do not check that np indices are valid
cpdef filter_dup_dryrun ( self, int32_t maxnum = -1):
"""Filter the duplicated reads. (dry run) only return number of remaining reads
Run it right after you add all data into this object.
Note, unlike `filter_dup`, this dry-run function does not remove
duplicates permanently; it only counts the reads that would remain
after filtering.
"""
cdef:
int p, m, n, current_loc, i_chrom, total
# index for old array, and index for new one
unsigned long i_old, size
str k
np.ndarray[np.int32_t, ndim=1] plus, minus
if maxnum < 0: return # do nothing
if not self.__sorted:
self.sort()
total = 0
chrnames = self.get_chr_names()
for i_chrom in range( len(chrnames) ):
# for each chromosome.
# This loop body is too big, I may need to split code later...
k = chrnames[ i_chrom ]
# + strand
plus = self.__locations[k][0]
size = plus.shape[0]
if len(plus) < 1:
pass
else:
total += 1
n = 1 # the number of tags in the current location
current_loc = plus[0]
for i_old in range( 1, size ):
p = plus[ i_old ]
if p == current_loc:
n += 1
if n <= maxnum:
total += 1
else:
logging.debug("Duplicate reads found at %s:%d at + strand" % (k,p) )
else:
current_loc = p
total += 1
n = 1
# - strand
minus = self.__locations[k][1]
size = minus.shape[0]
if len(minus) < 1:
pass
else:
total += 1
n = 1 # the number of tags in the current location
current_loc = minus[0]
for i_old in range( 1, size ):
p = minus[ i_old ]
if p == current_loc:
n += 1
if n <= maxnum:
total += 1
else:
logging.debug("Duplicate reads found at %s:%d at + strand" % (k,p) )
else:
current_loc = p
total += 1
n = 1
return total
return self.total
cpdef sample_percent (self, float percent, int seed = -1 ):
"""Sample the tags for a given percentage.
......
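For reference, a compact NumPy sketch of the duplicate-filtering
logic, assuming a sorted coordinate array for one strand of one
chromosome; `filter_dup_sketch` is an illustrative stand-in for the
Cython implementation and, like the new `filter_dup`, returns the
number of reads kept:

```python
import numpy as np

def filter_dup_sketch(locs, maxnum):
    """Keep at most `maxnum` reads per position in the sorted 1-D int
    array `locs`; return the filtered array and its size."""
    if maxnum < 0:
        return locs, len(locs)        # maxnum=-1 means no filtering
    keep = np.ones(len(locs), dtype=bool)
    n = 1                             # reads seen at the current position
    for i in range(1, len(locs)):
        n = n + 1 if locs[i] == locs[i - 1] else 1
        if n > maxnum:
            keep[i] = False           # drop duplicates beyond maxnum
    new = locs[keep]
    return new, len(new)

# e.g. filter_dup_sketch(np.array([5, 5, 5, 9]), 1) -> (array([5, 9]), 2)
```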
# Time-stamp: <2018-10-16 12:15:12 Tao Liu>
# Time-stamp: <2019-09-20 11:39:05 taoliu>
"""Module for filter duplicate tags from paired-end data
Copyright (c) 2010,2011 Tao Liu <taoliu@jimmy.harvard.edu>
This code is free software; you can redistribute it and/or modify it
under the terms of the BSD License (see the file COPYING included
with the distribution).
@status: experimental
@version: $Revision$
@author: Tao Liu
@contact: taoliu@jimmy.harvard.edu
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
from MACS2.Constants import *
from logging import debug, info
import numpy as np
......@@ -357,7 +351,7 @@ cdef class PETrackI:
return
@cython.boundscheck(False) # do not check that np indices are valid
def filter_dup ( self, int maxnum=-1):
cpdef unsigned long filter_dup ( self, int maxnum=-1):
"""Filter the duplicated reads.
Run it right after you add all data into this object.
......@@ -371,7 +365,7 @@ cdef class PETrackI:
str k
np.ndarray locs, new_locs
if maxnum < 0: return # condition to return if not filtering
if maxnum < 0: return self.total # condition to return if not filtering
if not self.__sorted: self.sort()
......@@ -432,7 +426,7 @@ cdef class PETrackI:
self.__locations[k] = new_locs
self.average_template_length = float( self.length ) / self.total
return
return self.total
def sample_percent (self, float percent, int seed = -1):
"""Sample the tags for a given percentage.
......