Commits on Source (5)
# Change log
All notable changes to this project will be documented in this file.
## Version 2.5.2 2019-05-16
- Made some improvements in command-line version of 'tn5gaps'
- Added flags for trimming insertions in N- and C-termini of genes for tn5gaps (-iN and -iC)
## Version 2.5.1 2019-04-25
- Add support for handling interactions in ZINB
- Fix selection bug for gff3 in GUI
## Version 2.5.0 2019-03-28
- Added analysis method for Zero-Inflated Negative Binomial (ZINB)
- Fix LOESS flag bug in resampling 2.4.2
- Resampling supports combined_wig files
- Change ordering of metadata and annotation file in ANOVA cmd
## Version 2.4.2 2019-03-15
#### TPP:
- updated docs for TPP; expanded discussion of protocols, including Mme1
- for Mme1, change min read length from 20bp to 15bp (for genomic part of read1)
- replaced '-himar1' and 'tn5' flags with '-protocol [sassetti|tn5|mme1]'
- added 'auto' for -replicon-ids
- added 'pre-trimmed' as option for transposon in TPP GUI (prefix="")
- resampling can now be done between TnSeq libraries from different strains
- add documentation for 'griffin' and Mann-Whitney 'utest' analysis methods
## Version 2.4.1 2019-03-04
#### TPP:
- allow the primer sequence to be the empty string (i.e. -primer "" on command-line; for pre-trimmed reads)
- do not throw an error if header ids in read1 and read2 fastq files happen to match identically
- minor bug fixes:
  - fixed problem of order of data in tn_stats table when there are multiple contigs but only single-ended reads
  - fixed name of flag from "replicon-id" to "replicon-ids"
  - prevent div-by-zero error in cases where no reads map
## Version 2.4.0 2019-02-28
#### TPP:
- **can now handle genomes with multiple contigs** (thanks to modifications by Robert Jenquin and William Matern); it creates multiple .wig files as output
- BWA: switched from using 'aln' to 'mem' by default
- added flags to set the nucleotide window for searching for start of primer sequence (-primer-window-start)
- fixed bug in counting misprimed reads, and reads mapped to both R1 and R2
- added some fields to TPP GUI, and made it more consistent about saving/reading parameters in the tpp.cfg config file
#### Transit:
- fixed bug in handling '-minreads' flag in Gumbel analysis
- updated support for converting .gff files to .prot_table format (in GUI and on command line)
- added a status field to ANOVA output
- TrackView scales all plots simultaneously by default
- updated documentation
## Pull Request 18 by Robert Jenquin and William Matern (Jan, 2019)
- Added the ability to accept multiple replicons in the form of either multiline reference genomes or multiple reference genome files.
- Added `-bwa-alg` argument, allowing the user to specify `mem` or `aln` to use `bwa mem` or `bwa aln` algorithms
- Now requires the `-replicon-id` argument to name the replicons when multiple reference genomes are given (names must be listed in the same order as the replicons appear in the reference genome(s))
- Code cleanup: closing dangling file handles
- Bug fix: if adapter is at exact end of R1, it is now properly handled
- Bug fix: trimmed\_reads now counted properly
- Added support for specifying `-window-size` argument
- **Sample usage:**
python2 src/ -himar1 -bwa /usr/bin/bwa -bwa-alg aln -ref MAC109_genome.fa -replicon-id CP029332 CP029333 CP029334 -reads1 ../HJKK5BCX2_ATGCTG_1.fastq -reads2 ../HJKK5BCX2_ATGCTG_2.fastq -primer AACCTGTTA -mismatches 2 -window-size 6 -output tpp_output/avium
- Explanation of arguments
- `-himar1` specifies that the Himar1 transposon was used in the transposon mutagenesis procedure. Tn5 is also supported (`-tn5`)
- `-bwa` specifies the path to the `bwa` executable
- `-bwa-alg` specifies which `bwa` algorithm to use, either `mem` or `aln`. `aln` is widely considered obsolete relative to `mem` for reads longer than 70bp. `aln` is the default.
- `-ref` specifies the reference genome(s) in FASTA format to which reads will be mapped. If more than one, they can be specified in either multiple FASTAs, or as a multilined FASTA (or a combination of both).
- `-replicon-id [contig1 contig2 ...]` specifies the names of the contigs in the genome(s). These are used as filename suffixes for output files (i.e. \*\_contig1.wig, \*\_contig2.wig, etc.). The order of the contigs is assumed to be the same as they appear in the reference genome(s) (as given with `-ref`). Specifying this option is only required if there is more than one contig. Note: While you can technically use any contig name at this step, if you wish to use `` to organize the data you should use the contig names as they appear in the Genbank file (as specified by ` -g`).
- `-reads1` specifies the file containing the raw reads (untrimmed) for read1 in FASTQ or FASTA format
- `-reads2` specifies the file containing the raw reads (untrimmed) for read2 in FASTQ or FASTA format
- `-primer` specifies the nucleotide sequence at the end of the transposon, which is used to separate transposon DNA from genomic DNA in read 1.
- `-window-size` specifies how many positions in read 1 to search for `-primer`. It should be set to at least the difference between the maximum and minimum expected positions of the first base of genomic DNA in read 1 (and larger if you want to allow for insertions/deletions). For the Long et al 2015 protocol (using a pool of 4 shifting prefixes) the window-size should be at least 6. Default value is 6.
- `-mismatches` specifies the number of mismatches to allow when searching for the transposon in read 1 (i.e. the number of mismatches to `-primer`).
- `-output` specifies the filename prefix to be applied to output files. Can include directories, allowing custom paths to be specified.
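The interplay between `-primer`, `-window-size`, and `-mismatches` can be illustrated with a minimal sketch. This is not TPP's actual implementation: the function names `hamming` and `find_primer`, the `window_start` parameter, and the toy read are assumptions made for illustration. The sketch tries each start position inside the window, accepts the first one where the primer matches within the allowed number of mismatches, and treats everything after the primer as genomic DNA.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_primer(read, primer, window_start=0, window_size=6, mismatches=1):
    """Return the index just past the primer, or -1 if it is not found.

    Only start positions window_start .. window_start + window_size - 1
    are tried (hypothetical semantics, chosen for illustration).
    """
    for start in range(window_start, window_start + window_size):
        candidate = read[start:start + len(primer)]
        if len(candidate) < len(primer):
            break  # the primer would run off the end of the read
        if hamming(candidate, primer) <= mismatches:
            return start + len(primer)
    return -1


# Toy read: 3-base shifting prefix, primer with one mismatch, then genomic DNA.
read = "GGT" + "AACCTGTAA" + "TACGGATCCA"
end = find_primer(read, "AACCTGTTA", window_size=6, mismatches=2)
genomic = read[end:] if end >= 0 else None
```

A window that is too small would miss reads whose primer starts later because of a longer shifting prefix, which is why the text above recommends a window of at least 6 for a pool of 4 shifting prefixes.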
## Version 2.3.4 2019-01-14
- Minor bug fixes related to flags in Resampling and HMM
From r-base:3.4.1
RUN apt-get update -y && apt-get install -y -f python2 python-dev python-pip
ADD src/ /src
ADD tests/ /tests
RUN pip install pytest 'numpy~=1.15' 'scipy~=1.2' 'matplotlib~=2.2' 'pillow~=5.0' 'statsmodels~=0.9' 'rpy2<2.9.0'
RUN R -e "install.packages('MASS')"
RUN R -e "install.packages('pscl')"
CMD [ "pytest", "./tests" ]
# TRANSIT 2.3.4
# TRANSIT 2.5.1
Welcome! This is the distribution for the TRANSIT and TPP tools developed by the Ioerger Lab at Texas A&M University.
TRANSIT is a tool for processing and statistical analysis of Tn-Seq data.
It provides an easy to use graphical interface and access to three different analysis methods that allow the user to determine essentiality in a single condition as well as between conditions.
TRANSIT Home page:
......@@ -18,8 +17,8 @@ TRANSIT Documentation:
## Features
TRANSIT offers a variety of features including:
- More than **8 analysis methods**, including methods for determining **conditional essentiality** as well as **genetic interactions**.
- More than **10 analysis methods**, including methods for determining **conditional essentiality** as well as **genetic interactions**.
- Ability to analyze datasets from libraries constructed using **himar1 or tn5 transposons**.
tnseq-transit (2.5.2-1) unstable; urgency=medium
* New upstream version
* Asked upstream for Python3 port
-- Andreas Tille <> Tue, 09 Jul 2019 15:21:53 +0200
tnseq-transit (2.3.4-1) unstable; urgency=medium
* New upstream version
......@@ -2,6 +2,6 @@
__all__ = ["transit_tools", "tnseq_tools", "norm_tools", "stat_tools"]
__version__ = "v2.3.4"
__version__ = "v2.5.2"
prefix = "[TRANSIT]"
......@@ -21,11 +21,12 @@ import pytransit
import pytransit.transit_tools as transit_tools
import pytransit.analysis
import pytransit.export
import pytransit.convert
method_wrap_width = 250
methods = pytransit.analysis.methods
export_methods = pytransit.export.methods
convert_methods = pytransit.convert.methods
all_methods = {}
......@@ -48,10 +49,20 @@ def main(*args, **kwargs):
if (not args and 'h' in kwargs):
if (not args and ('v' in kwargs or '-version' in kwargs)):
print "Version: {0}".format(pytransit.__version__)
if (not args and ('h' in kwargs or '-help' in kwargs)):
print "For commandline mode, please use one of the known methods (or see documentation to add a new one):"
print("Analysis methods: ")
for m in all_methods:
## TODO :: Move normalize to separate subcommand?
if (m == "normalize"): continue
print "\t - %s" % m
print("Other functions: ")
print("\t - normalize")
print("\t - convert")
print("\t - export")
print "Usage: python %s <method>" % sys.argv[0]
......@@ -73,7 +84,7 @@ def main(*args, **kwargs):
#start the applications
# Tried GUI mode but has no wxPython
elif not (args or kwargs) and not hasWx:
print "Please install wxPython to run in GUI Mode."
......@@ -93,7 +104,7 @@ def main(*args, **kwargs):
export_method_name = ""
if len(args) > 1:
export_method_name = args[1]
if export_method_name not in export_methods:
print "Error: Need to specify the export method."
print "Please use one of the known methods (or see documentation to add a new one):"
......@@ -103,7 +114,20 @@ def main(*args, **kwargs):
methodobj = export_methods[export_method_name].method.fromconsole()
elif method_name.lower() == "convert":
convert_method_name = ""
if len(args) > 1:
convert_method_name = args[1]
if convert_method_name not in convert_methods:
print "Error: Need to specify the convert method."
print "Please use one of the known methods (or see documentation to add a new one):"
for m in convert_methods:
print "\t - %s" % m
print "Usage: python %s convert <method>" % sys.argv[0]
methodobj = convert_methods[convert_method_name].method.fromconsole()
print "Error: The '%s' method is unknown." % method_name
print "Please use one of the known methods (or see documentation to add a new one):"
......@@ -20,7 +20,9 @@ import utest
import normalize
import pathway_enrichment #08/22/2018 by Ivan
import anova
import zinb
import tnseq_stats
import winsorize
methods = {}
methods["example"] = example.ExampleAnalysis()
......@@ -34,12 +36,14 @@ methods["rankproduct"] = rankproduct.RankProductAnalysis()
methods["utest"] = utest.UTestAnalysis()
methods["GI"] = gi.GIAnalysis()
methods["anova"] = anova.AnovaAnalysis()
methods["zinb"] = zinb.ZinbAnalysis()
#methods["mcce"] = mcce.MCCEAnalysis()
#methods["mcce2"] = mcce2.MCCE2Analysis()
#methods["motifhmm"] = motifhmm.MotifHMMAnalysis()
methods["normalize"] = normalize.Normalize()
methods["winsorize"] = winsorize.Winsorize()
import norm
......@@ -33,9 +33,9 @@ class AnovaMethod(base.MultiConditionMethod):
def __init__(self, combined_wig, metadata, annotation, normalization, output_file, ignored_conditions=set()):
base.MultiConditionMethod.__init__(self, short_name, long_name, short_desc, long_desc, combined_wig, metadata, annotation, output_file, normalization=normalization)
self.ignored_conditions = ignored_conditions
def __init__(self, combined_wig, metadata, annotation, normalization, output_file, ignored_conditions=[], included_conditions=[], nterm=0.0, cterm=0.0):
base.MultiConditionMethod.__init__(self, short_name, long_name, short_desc, long_desc, combined_wig, metadata, annotation, output_file,
normalization=normalization, ignored_conditions=ignored_conditions, included_conditions=included_conditions, nterm=nterm, cterm=cterm)
def fromargs(self, rawargs):
......@@ -46,13 +46,21 @@ class AnovaMethod(base.MultiConditionMethod):
combined_wig = args[0]
annotation = args[1]
metadata = args[2]
annotation = args[2]
metadata = args[1]
output_file = args[3]
normalization = kwargs.get("n", "TTR")
ignored_conditions = set(kwargs.get("-ignore-conditions", "Unknown").split(","))
NTerminus = float(kwargs.get("iN", 0.0))
CTerminus = float(kwargs.get("iC", 0.0))
ignored_conditions = filter(None, kwargs.get("-ignore-conditions", "").split(","))
included_conditions = filter(None, kwargs.get("-include-conditions", "").split(","))
if len(included_conditions) > 0 and len(ignored_conditions) > 0:
print(self.transit_error("Cannot use both include-conditions and ignore-conditions flags"))
return self(combined_wig, metadata, annotation, normalization, output_file, ignored_conditions)
return self(combined_wig, metadata, annotation, normalization, output_file, ignored_conditions, included_conditions, NTerminus, CTerminus)
def wigs_to_conditions(self, conditionsByFile, filenamesInCombWig):
......@@ -60,7 +68,7 @@ class AnovaMethod(base.MultiConditionMethod):
({FileName: Condition}, [FileName]) -> [Condition]
Condition :: [String]
return [conditionsByFile.get(f, "Unknown") for f in filenamesInCombWig]
return [conditionsByFile.get(f, self.unknown_cond_flag) for f in filenamesInCombWig]
def means_by_condition_for_gene(self, sites, conditions, data):
......@@ -69,45 +77,12 @@ class AnovaMethod(base.MultiConditionMethod):
Site :: Number
Condition :: String
nTASites = len(sites)
wigsByConditions = collections.defaultdict(lambda: [])
for i, c in enumerate(conditions):
return { c: numpy.mean(data[wigIndex][:, sites]) for (c, wigIndex) in wigsByConditions.items() }
def filter_by_conditions_blacklist(self, data, conditions, ignored_conditions):
Filters out wigfiles, with ignored conditions.
([[Wigdata]], [Condition]) -> Tuple([[Wigdata]], [Condition])
d_filtered, cond_filtered = [], [];
for i, c in enumerate(conditions):
if c not in ignored_conditions:
return (numpy.array(d_filtered), numpy.array(cond_filtered))
def read_samples_metadata(self, metadata_file):
Filename -> ConditionMap
ConditionMap :: {Filename: Condition}
wigFiles = []
conditionsByFile = {}
headersToRead = ["condition", "filename"]
with open(metadata_file) as mfile:
lines = mfile.readlines()
headIndexes = [i
for h in headersToRead
for i, c in enumerate(lines[0].split())
if c.lower() == h]
for line in lines:
if line[0]=='#': continue
vals = line.split()
[condition, wfile] = vals[headIndexes[0]], vals[headIndexes[1]]
conditionsByFile[wfile] = condition
return conditionsByFile
return { c: numpy.mean(data[wigIndex][:, sites]) if nTASites > 0 else 0 for (c, wigIndex) in wigsByConditions.items() }
def means_by_rv(self, data, RvSiteindexesMap, genes, conditions):
......@@ -121,8 +96,7 @@ class AnovaMethod(base.MultiConditionMethod):
MeansByRv = {}
for gene in genes:
Rv = gene["rv"]
if len(RvSiteindexesMap[gene["rv"]]) > 0: # skip genes with no TA sites
MeansByRv[Rv] = self.means_by_condition_for_gene(RvSiteindexesMap[Rv], conditions, data)
MeansByRv[Rv] = self.means_by_condition_for_gene(RvSiteindexesMap[Rv], conditions, data)
return MeansByRv
def group_by_condition(self, wigList, conditions):
......@@ -134,10 +108,12 @@ class AnovaMethod(base.MultiConditionMethod):
DataForCondition :: [Number]
countsByCondition = collections.defaultdict(lambda: [])
countSum = 0
for i, c in enumerate(conditions):
countSum += numpy.sum(wigList[i])
return [numpy.array(v).flatten() for v in countsByCondition.values()]
return (countSum, [numpy.array(v).flatten() for v in countsByCondition.values()])
def run_anova(self, data, genes, MeansByRv, RvSiteindexesMap, conditions):
......@@ -152,15 +128,25 @@ class AnovaMethod(base.MultiConditionMethod):
count = 0
pvals,Rvs = [],[]
pvals,Rvs,status = [],[],[]
for gene in genes:
count += 1
Rv = gene["rv"]
if Rv in MeansByRv:
countsvec = self.group_by_condition(map(lambda wigData: wigData[RvSiteindexesMap[Rv]], data), conditions)
stat,pval = scipy.stats.f_oneway(*countsvec)
if (len(RvSiteindexesMap[Rv]) <= 1):
status.append("TA sites <= 1")
countSum, countsVec = self.group_by_condition(map(lambda wigData: wigData[RvSiteindexesMap[Rv]], data), conditions)
if (countSum == 0):
pval = 1
status.append("No counts in all conditions")
stat,pval = scipy.stats.f_oneway(*countsVec)
# Update progress
text = "Running Anova Method... %5.1f%%" % (100.0*count/len(genes))
......@@ -171,10 +157,10 @@ class AnovaMethod(base.MultiConditionMethod):
qvals = numpy.full(pvals.shape, numpy.nan)
qvals[mask] = statsmodels.stats.multitest.fdrcorrection(pvals[mask])[1] # BH, alpha=0.05
p,q = {},{}
p,q,statusMap = {},{},{}
for i,rv in enumerate(Rvs):
p[rv],q[rv] = pvals[i],qvals[i]
return (p, q)
p[rv],q[rv],statusMap[rv] = pvals[i],qvals[i],status[i]
return (p, q, statusMap)
def Run(self):
self.transit_message("Starting Anova analysis")
......@@ -186,43 +172,50 @@ class AnovaMethod(base.MultiConditionMethod):
self.transit_message("Normalizing using: %s" % self.normalization)
(data, factors) = norm_tools.normalize_data(data, self.normalization)
conditionsByFile, _, _, _ = tnseq_tools.read_samples_metadata(self.metadata)
conditions = self.wigs_to_conditions(
data, conditions = self.filter_by_conditions_blacklist(data, conditions, self.ignored_conditions)
data, conditions, _, _ = self.filter_wigs_by_conditions(data, conditions, ignored_conditions = self.ignored_conditions, included_conditions = self.included_conditions)
genes = tnseq_tools.read_genes(self.annotation_path)
TASiteindexMap = {TA: i for i, TA in enumerate(sites)}
RvSiteindexesMap = tnseq_tools.rv_siteindexes_map(genes, TASiteindexMap)
RvSiteindexesMap = tnseq_tools.rv_siteindexes_map(genes, TASiteindexMap, nterm=self.NTerminus, cterm=self.CTerminus)
MeansByRv = self.means_by_rv(data, RvSiteindexesMap, genes, conditions)
self.transit_message("Running Anova")
pvals,qvals = self.run_anova(data, genes, MeansByRv, RvSiteindexesMap, conditions)
pvals,qvals,run_status = self.run_anova(data, genes, MeansByRv, RvSiteindexesMap, conditions)
self.transit_message("Adding File: %s" % (self.output))
file = open(self.output,"w")
conditionsList = list(set(conditions))
vals = "Rv Gene TAs".split() + conditionsList + "pval padj".split()
conditionsList = self.included_conditions if len(self.included_conditions) > 0 else list(set(conditions))
heads = ("Rv Gene TAs".split() +
conditionsList +
"pval padj".split() + ["status"])
file.write("#Console: python %s\n" % " ".join(sys.argv))
for gene in genes:
Rv = gene["rv"]
if Rv in MeansByRv:
vals = ([Rv, gene["gene"], str(len(RvSiteindexesMap[Rv]))] +
["%0.1f" % MeansByRv[Rv][c] for c in conditionsList] +
["%f" % x for x in [pvals[Rv], qvals[Rv]]])
["%0.2f" % MeansByRv[Rv][c] for c in conditionsList] +
["%f" % x for x in [pvals[Rv], qvals[Rv]]] + [run_status[Rv]])
self.transit_message("Finished Anova analysis")
self.transit_message("Time: %0.1fs\n" % (time.time() - start_time))
def usage_string(self):
return """python %s anova <combined wig file> <annotation .prot_table> <samples_metadata file> <output file> [Optional Arguments]
return """python %s anova <combined wig file> <samples_metadata file> <annotation .prot_table> <output file> [Optional Arguments]
Optional Arguments:
-n <string> := Normalization method. Default: -n TTR
--ignore-conditions <cond1,cond2> := Comma seperated list of conditions to ignore, for the analysis. Default --ignore-conditions Unknown
--ignore-conditions <cond1,cond2> := Comma separated list of conditions to ignore, for the analysis. Default --ignore-conditions Unknown
--include-conditions <cond1,cond2> := Comma separated list of conditions to include, for the analysis. Conditions not in this list, will be ignored.
-iN <float> := Ignore TAs occurring within given percentage (as integer) of the N terminus. Default: -iN 0
-iC <float> := Ignore TAs occurring within given percentage (as integer) of the C terminus. Default: -iC 0
""" % (sys.argv[0])
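The `--ignore-conditions` and `--include-conditions` flags documented in the usage string above are mutually exclusive, and samples whose wig file has no entry in the metadata are always dropped. A minimal, self-contained sketch of that filtering follows. It is not the method from this diff: the function name `filter_by_conditions` is hypothetical, it works on plain lists instead of numpy arrays, and it raises an exception where the diff reports the conflict through `transit_error`.

```python
# Sentinel the diff assigns to wig files that have no condition in the metadata.
UNKNOWN = "FLAG-UNMAPPED-CONDITION-IN-WIG"


def filter_by_conditions(data, conditions, ignored=(), included=()):
    """Keep only the samples whose condition passes the ignore/include filter.

    data[i] holds the counts for the sample with condition conditions[i].
    Returns (filtered_data, filtered_conditions).
    """
    ignored, included = set(ignored), set(included)
    if ignored and included:
        # Assumption: fail loudly; the diff prints an error message instead.
        raise ValueError("use either ignored or included conditions, not both")
    kept = []
    for i, cond in enumerate(conditions):
        if cond == UNKNOWN:
            continue  # wig file not listed in the metadata
        if ignored and cond in ignored:
            continue
        if included and cond not in included:
            continue
        kept.append(i)
    return [data[i] for i in kept], [conditions[i] for i in kept]


# Toy input: four samples, one of them unmapped.
data = [[1, 2], [3, 4], [5, 6], [7, 8]]
conditions = ["glycerol", "cholesterol", UNKNOWN, "glycerol"]
without_chol = filter_by_conditions(data, conditions, ignored=["cholesterol"])
only_chol = filter_by_conditions(data, conditions, included=["cholesterol"])
```

Tracking the kept indexes rather than the kept rows makes it straightforward to apply the same selection to parallel lists such as covariates.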
......@@ -18,6 +18,7 @@ if hasWx:
import traceback
import datetime
import numpy
import pytransit.transit_tools as transit_tools
file_prefix = "[FileDisplay]"
......@@ -508,14 +509,56 @@ class MultiConditionMethod(AnalysisMethod):
Class to be inherited by analysis methods that compare essentiality between multiple conditions (e.g Anova).
def __init__(self, short_name, long_name, short_desc, long_desc, combined_wig, metadata, annotation_path, output, normalization=None, LOESS=False, ignoreCodon=True, wxobj=None):
def __init__(self, short_name, long_name, short_desc, long_desc, combined_wig, metadata, annotation_path, output, normalization=None, LOESS=False, ignoreCodon=True, wxobj=None, ignored_conditions=[], included_conditions=[], nterm=0.0, cterm=0.0):
AnalysisMethod.__init__(self, short_name, long_name, short_desc, long_desc, output,
annotation_path, wxobj)
self.combined_wig = combined_wig
self.metadata = metadata
self.normalization = normalization
self.ignoreCodon = ignoreCodon
self.NTerminus = nterm
self.CTerminus = cterm
self.unknown_cond_flag = "FLAG-UNMAPPED-CONDITION-IN-WIG"
self.ignored_conditions = ignored_conditions
self.included_conditions = included_conditions
def filter_wigs_by_conditions(self, data, conditions, covariates = [], interactions = [], ignored_conditions = [], included_conditions = []):
Filters conditions that are ignored/included.
([[Wigdata]], [Condition], [[Covar]], [Condition], [Condition]) -> Tuple([[Wigdata]], [Condition])
ignored_conditions, included_conditions = (set(ignored_conditions), set(included_conditions))
d_filtered, cond_filtered, filtered_indexes = [], [], [];
if len(ignored_conditions) > 0 and len(included_conditions) > 0:
self.transit_error("Both ignored and included conditions have len > 0", ignored_conditions, included_conditions)
elif (len(ignored_conditions) > 0):
self.transit_message("conditions ignored: {0}".format(ignored_conditions))
for i, c in enumerate(conditions):
if (c != self.unknown_cond_flag) and (c not in ignored_conditions):
elif (len(included_conditions) > 0):
self.transit_message("conditions included: {0}".format(included_conditions))
for i, c in enumerate(conditions):
if (c != self.unknown_cond_flag) and (c in included_conditions):
for i, c in enumerate(conditions):
if (c != self.unknown_cond_flag):
covariates_filtered = [[c[i] for i in filtered_indexes] for c in covariates]
interactions_filtered = [[c[i] for i in filtered_indexes] for c in interactions]
return (numpy.array(d_filtered),
......@@ -516,8 +516,8 @@ class BinomialMethod(base.SingleConditionMethod):
Optional Arguments:
-s <int> := Number of samples to take. Default: -s 10000
-b <int> := Number of burn-in samples to take. Default: -b 500
-iN <float> := Ignore TAs occurring at given fraction of the N terminus. Default: -iN 0.0
-iC <float> := Ignore TAs occurring at given fraction of the C terminus. Default: -iC 0.0
-iN <float> := Ignore TAs occurring at given percentage (as integer) of the N terminus. Default: -iN 0
-iC <float> := Ignore TAs occurring at given percentage (as integer) of the C terminus. Default: -iC 0
-pi0 <float> := Hyper-parameters for rho, non-essential genes. Default: -pi0 0.5
......@@ -901,8 +901,8 @@ class GIMethod(base.QuadConditionMethod):
-n <string> := Normalization method. Default: -n TTR
-iz := Include rows with zero across conditions.
-l := Perform LOESS Correction; Helps remove possible genomic position bias. Default: Turned Off.
-iN <float> := Ignore TAs occurring at given fraction of the N terminus. Default: -iN 0.0
-iC <float> := Ignore TAs occurring at given fraction of the C terminus. Default: -iC 0.0
-iN <float> := Ignore TAs occurring at given percentage (as integer) of the N terminus. Default: -iN 0
-iC <float> := Ignore TAs occurring at given percentage (as integer) of the C terminus. Default: -iC 0
""" % (sys.argv[0])
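The `-iN` and `-iC` options in the usage strings above trim TA sites near the ends of each gene. A minimal sketch of one way such trimming could work is given below. It is an illustration under stated assumptions, not the code TRANSIT uses: the function name `trim_sites` is hypothetical, the values are read as percentages on a 0-100 scale, gene coordinates are inclusive, and on the minus strand the N terminus is taken to be at the higher coordinate.

```python
def trim_sites(sites, start, end, strand="+", n_pct=0.0, c_pct=0.0):
    """Drop TA sites within n_pct / c_pct percent of the N / C terminus.

    sites are genomic coordinates; start and end are the inclusive gene bounds.
    """
    length = end - start + 1
    n_cut = length * n_pct / 100.0
    c_cut = length * c_pct / 100.0
    if strand == "-":
        # Minus strand: the C terminus sits at the low coordinate.
        lo, hi = start + c_cut, end - n_cut
    else:
        lo, hi = start + n_cut, end - c_cut
    return [s for s in sites if lo <= s <= hi]


# Toy gene spanning 1001..2000 (1000 bp) with five TA sites.
sites = [1010, 1060, 1500, 1960, 1995]
both_ends = trim_sites(sites, 1001, 2000, "+", n_pct=5, c_pct=5)
minus_n_only = trim_sites(sites, 1001, 2000, "-", n_pct=5)
untrimmed = trim_sites(sites, 1001, 2000)
```

With the default of 0 for both options no sites are removed, which matches the defaults shown in the usage strings.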