Commits on Source (6)
......@@ -217,3 +217,7 @@
proteinortho_grab_proteins.pl speedup for -exact and a given proteinortho file
proteinortho6.pl replaced chomp with s/[\r\n]+$//
proteinortho_clustering.cpp fix bug that only uses lapack if -pld is set, regardless of the value.
11. Sept (uid: 3813)
updated shebang of ffadj such that python2.7 is used directly (ffadj fails if called with higher version of python)
-p=blastp is now alias of blastp+ and legacy blast is now -p=blastp_legacy (blastn is equivalent)
Makefile: static now includes -lquadmath
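The alias rule from the changelog entry above (`-p=blastp` and `-p=blastn` now select the BLAST+ tools, while the `_legacy` suffix selects legacy blast) can be sketched as follows. This is a minimal Python illustration, not the Perl code in proteinortho6.pl; the names `BLAST_PLUS_ALIASES`, `normalize_blast_mode` and `uses_legacy_blast` are hypothetical and only the option values come from the changelog.

```python
# Sketch only: the table and function names are hypothetical;
# the option values are the ones named in the changelog.
BLAST_PLUS_ALIASES = {
    "blastp": "blastp+",
    "blastn": "blastn+",
}


def normalize_blast_mode(value):
    """Return the internal mode name for a -p=<value> option."""
    return BLAST_PLUS_ALIASES.get(value, value)


def uses_legacy_blast(mode):
    """True when the mode explicitly requests the legacy blast tools."""
    return mode.endswith("_legacy")


for requested in ("blastp", "blastn", "blastp_legacy", "diamond"):
    mode = normalize_blast_mode(requested)
    note = "(legacy)" if uses_legacy_blast(mode) else ""
    print(requested, "->", mode, note)
```

Keeping the aliases in one table means any value not listed (for example `diamond`) passes through unchanged.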
......@@ -132,7 +132,7 @@ ifeq ($(USELAPACK),TRUE)
ifeq ($(USEPRECOMPILEDLAPACK),TRUE)
ifeq ($(STATIC),TRUE)
@echo "[ 20%] Building **proteinortho_clustering** with LAPACK (static linking)";
@$(CXX) $(CXXFLAGS) $(CXXFLAGS_PO) -fopenmp -o $@ $< $(LDFLAGS) $(LDLIBS) -static -Wl,--allow-multiple-definition -llapack -lblas -lgfortran -pthread -Wl,--whole-archive -lpthread -Wl,--no-whole-archive && ([ $$? -eq 0 ] ) || ( \
@$(CXX) $(CXXFLAGS) $(CXXFLAGS_PO) -fopenmp -o $@ $< $(LDFLAGS) $(LDLIBS) -static -Wl,--allow-multiple-definition -llapack -lblas -lgfortran -lquadmath -pthread -Wl,--whole-archive -lpthread -Wl,--no-whole-archive && ([ $$? -eq 0 ] ) || ( \
echo "......$(ORANGE)static linking failed, now I try dynamic linking.$(NC)"; \
$(CXX) $(CXXFLAGS) $(CXXFLAGS_PO) -fopenmp -o $@ $< $(LDFLAGS) $(LDLIBS) -llapack -lblas -pthread -Wl,--whole-archive -lpthread -Wl,--no-whole-archive && ([ $$? -eq 0 ] && echo "......OK dynamic linking was successful for proteinortho_clustering!";) || ( \
echo "......$(ORANGE)dynamic linking failed too, now I try dynamic linking without -WL,-whole-archive (this should now work for OSX).$(NC)"; \
......
......@@ -39,8 +39,8 @@ You can also send a mail to lechner@staff.uni-marburg.de.
# Installation
**Proteinortho comes with precompiled binaries of all executables (Linux/x86) so just run the proteinortho6.pl in the downloaded directory.**
You could also move all executables to your favorite bin directory (e.g. with make install PREFIX=/home/paul/bin).
**Proteinortho comes with precompiled binaries of all executables (Linux/x86) so you should be able to run perl proteinortho6.pl in the downloaded directory.**
You could also move all executables to your favorite directory (e.g. with make install PREFIX=/home/paul/bin).
If you cannot execute the src/BUILD/Linux_x86_64/proteinortho_clustering, then you have to recompile with make, see the section 2. Building and installing proteinortho from source.
<br>
......@@ -73,6 +73,21 @@ If you need brew (see [here](https://brew.sh/index_de))
<br>
#### Easy installation with dpkg (root privileges are required)
The deb package can be downloaded here: [https://packages.debian.org/unstable/proteinortho](https://packages.debian.org/unstable/proteinortho).
Afterwards the deb package can be installed with `sudo dpkg -i proteinortho*deb`.
<br>
#### *(Easy installation with apt-get)*
**! Disclaimer: Work in progress !**
*proteinortho will be released to stable with Debian 11 (~2021); after that, proteinortho can be installed with `sudo apt-get install proteinortho` (currently this installs the outdated version v5.16b)*
<br>
#### 1. Prerequisites
Proteinortho uses standard software which is often installed already or is part of the package repositories and can thus easily be installed. The sources come with a precompiled version of Proteinortho for 64bit Linux.
......@@ -94,7 +109,7 @@ Proteinortho uses standard software which is often installed already or is part
- mmseqs2 (conda install mmseqs2, https://github.com/soedinglab/MMseqs2)
- Perl v5.08 or higher (to test this, type perl -v in the command line)
- Python v2.6.0 or higher to include synteny analysis (to test this, type 'python -V' in the command line)
- Perl modules: Thread::Queue, File::Basename, Pod::Usage, threads (if you miss one just install with `cpan install Thread::Queue` )
- Perl standard modules (these should come with Perl): Thread::Queue, File::Basename, Pod::Usage, threads (if you miss one just install with `cpan install ...` )
</details>
<br>
......@@ -113,9 +128,9 @@ Proteinortho uses standard software which is often installed already or is part
#### 2. Building and installing proteinortho from source (linux and osx)
Here you <i>can</i> use a working lapack library, check this with 'dpkg --get-selections | grep lapack'. Install lapack e.g. with 'apt-get install libatlas3-base' or liblapack3.
Here you can use a working lapack library, check this with 'dpkg --get-selections | grep lapack'. Install lapack e.g. with 'apt-get install libatlas3-base' or liblapack3.
If you don't have one (or you have no root permissions), then 'make' will automatically compile a lapack (v3.8.0) for you!
If you don't have Lapack, then 'make' will automatically compile Lapack v3.8.0 for you!
Fetch the latest source code archive from <a href="https://gitlab.com/paulklemm_PHD/proteinortho/-/archive/master/proteinortho-master.zip">here</a>
<details> <summary>or from here (Click to expand)</summary>
......@@ -281,7 +296,7 @@ Open `proteinorthoHelper.html` in your favorite browser or visit [lechnerlab.de/
<details>
<summary>show all algorithms (Click to expand)</summary>
- blastn,blastp,tblastx : legacy blast family (shell commands: blastall -). The suffix 'n' or 'p' indicates nucleotide or protein input files.
- blastn_legacy,blastp_legacy,tblastx_legacy : legacy blast family (shell commands: blastall -). The suffix 'n' or 'p' indicates nucleotide or protein input files.
- blastn+,blastp+,tblastx+ : standard blast family (shell commands: blastn,blastp,tblastx). The suffix 'n' or 'p' indicates nucleotide or protein input files.
......
proteinortho (6.0.7+dfsg-1) UNRELEASED; urgency=medium
* New upstream version
* debhelper-compat 12
* Use 2to3 to port to Python3
-- Andreas Tille <tille@debian.org> Mon, 16 Sep 2019 12:41:53 +0200
proteinortho (6.0.6+dfsg-1) unstable; urgency=medium
[ Paul Klemm ]
......
......@@ -3,7 +3,7 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.
Uploaders: Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 12~),
Build-Depends: debhelper-compat (= 12),
ncbi-blast+,
liblapack-dev | libatlas-base-dev | liblapack.so,
diamond-aligner
......
Description: Use 2to3 to port to Python3
Author: Andreas Tille <tille@debian.org>
Last-Update: Mon, 16 Sep 2019 12:41:53 +0200
--- a/src/proteinortho_ffadj_mcs.py
+++ b/src/proteinortho_ffadj_mcs.py
@@ -1,9 +1,9 @@
-#!/usr/bin/env python2.7
+#!/usr/bin/python3
-from sys import stdout, stderr, exit, argv, maxint
+from sys import stdout, stderr, exit, argv, maxsize
from copy import deepcopy
from bisect import bisect
-from itertools import izip, product
+from itertools import product
from os.path import basename, dirname
from random import randint
from math import ceil
@@ -46,7 +46,7 @@ class Run:
adjTerm = 0
if len(self.weight) > 1:
adjTerm = sum([self.weight[i] * self.weight[i+1] for i in
- xrange(len(self.weight)-1)])
+ range(len(self.weight)-1)])
edgeTerm = sum([w **2 for w in self.weight])
# edgeTerm = max(self.weight)**2
return alpha * adjTerm + (1-alpha) * edgeTerm
@@ -101,9 +101,9 @@ def readDistsAndOrder(data, edgeThreshol
if edgeWeight < edgeThreshold:
continue
- if not g1_chromosomes.has_key(chr1):
+ if chr1 not in g1_chromosomes:
g1_chromosomes[chr1] = set()
- if not g2_chromosomes.has_key(chr2):
+ if chr2 not in g2_chromosomes:
g2_chromosomes[chr2] = set()
g1_chromosomes[chr1].add(g1)
@@ -124,19 +124,19 @@ def readDistsAndOrder(data, edgeThreshol
# add telomeres
for t1, t2 in product(tel1, tel2):
- if not res.has_key(t1):
+ if t1 not in res:
res[t1] = dict()
res[t1][t2] = (DIRECTION_BOTH_STRANDS, 1)
-# res[maxint] = dict([
-# (maxint, (DIRECTION_WATSON_STRAND, 1)),
+# res[maxsize] = dict([
+# (maxsize, (DIRECTION_WATSON_STRAND, 1)),
# (0, (DIRECTION_WATSON_STRAND, 1)),
-# (maxint, (DIRECTION_CRICK_STRAND, 1)),
+# (maxsize, (DIRECTION_CRICK_STRAND, 1)),
# (0, (DIRECTION_CRICK_STRAND, 1))])
-# res[maxint] = dict([
-# (maxint, (DIRECTION_WATSON_STRAND, 1)),
+# res[maxsize] = dict([
+# (maxsize, (DIRECTION_WATSON_STRAND, 1)),
# (0, (DIRECTION_WATSON_STRAND, 1)),
-# (maxint, (DIRECTION_CRICK_STRAND, 1)),
+# (maxsize, (DIRECTION_CRICK_STRAND, 1)),
# (0, (DIRECTION_CRICK_STRAND, 1))])
return hasMultipleChromosomes, g1, g2, res
@@ -148,20 +148,20 @@ def establish_linear_genome_order(chromo
g.append((k, -1))
telomeres.add((k, -1))
g.extend([(k, i) for i in sorted(chromosomes[k])])
- g.append((k, maxint))
- telomeres.add((k, maxint))
+ g.append((k, maxsize))
+ telomeres.add((k, maxsize))
return telomeres, g
def insertIntoRunList(runs, runList):
- keys = map(lambda x: x.getWeight(alpha), runList)
+ keys = [x.getWeight(alpha) for x in runList]
for run in runs:
i = bisect(keys, run.getWeight(alpha))
keys.insert(i, run.getWeight(alpha))
runList.insert(i, run)
def checkMatching(g1, g2, g1_runs, g2_runs, runs, dist):
- g1pos = dict(izip(g1, xrange(len(g1))))
- g2pos = dict(izip(g2, xrange(len(g2))))
+ g1pos = dict(zip(g1, range(len(g1))))
+ g2pos = dict(zip(g2, range(len(g2))))
if len(g1) != len(g2):
@@ -177,7 +177,7 @@ def checkMatching(g1, g2, g1_runs, g2_ru
r_counter = 0
prev_run = None
c_adj = 0
- for i in xrange(len(g1)):
+ for i in range(len(g1)):
if not g1_runs[i]:
logging.error('Gene %s is not included in any run' %g1[i])
continue
@@ -213,7 +213,7 @@ def checkMatching(g1, g2, g1_runs, g2_ru
missing_runs = all_included.symmetric_difference(runs)
if missing_runs:
logging.error(('Additional runs in runslist that are not part in the' + \
- ' matching: %s') %(map(str, missing_runs)))
+ ' matching: %s') %(list(map(str, missing_runs))))
logging.info('Number of adjacencies is %s in matching of size %s.' %(c_adj,
len(g1)))
@@ -222,7 +222,7 @@ def checkMatching(g1, g2, g1_runs, g2_ru
logging.error(('Sum of run lengths does not equal matching size! Sum ' + \
'of run lengths: %s, matching size: %s') % (r_counter, len(g1)))
- for j in xrange(len(g2)):
+ for j in range(len(g2)):
if not g2_runs[j]:
logging.error('Gene %s is not included in any run' %g2[j])
if len(g2_runs[j]) > 1:
@@ -262,8 +262,8 @@ def checkMatching(g1, g2, g1_runs, g2_ru
'Weights: %s, run length: %s, run: %s') %(len(r.weight),
g1pos[r.endG1] - g1pos[r.startG1], r))
- g1_chromosomes = set(map(lambda x: x[0], g1[g1pos[r.startG1]:g1pos[r.endG1]+1]))
- g2_chromosomes = set(map(lambda x: x[0], g2[g2pos[r.startG2]:g2pos[r.endG2]+1]))
+ g1_chromosomes = set([x[0] for x in g1[g1pos[r.startG1]:g1pos[r.endG1]+1]])
+ g2_chromosomes = set([x[0] for x in g2[g2pos[r.startG2]:g2pos[r.endG2]+1]])
if len(g1_chromosomes) != 1 and len(g2_chromosomes) != 1:
logging.error(('Number of chromosomes on G1 (#chrs: %s) or G2 ' + \
'(#chrs: %s) in run %s is not 1 (Meaning that possibly' + \
@@ -281,7 +281,7 @@ def checkMatching(g1, g2, g1_runs, g2_ru
run_ends[r.startG1] = (r.direction, r.endG2)
run_ends[r.endG1] = (r.direction, r.startG2)
- for i in xrange(len(g1)-1):
+ for i in range(len(g1)-1):
g1i = g1[i]
g1i2 = g1[i+1]
if g1i in run_ends and g1i2 in run_ends and run_ends[g1i][0] == \
@@ -290,13 +290,13 @@ def checkMatching(g1, g2, g1_runs, g2_ru
g2i = run_ends[g1i][1]
g2i2 = run_ends[g1i2][1]
if direction == DIRECTION_CRICK_STRAND and g2pos[g2i] == g2pos[g2i2]-1:
- logging.error('Runs %s and %s could be merged, but are not!' % (map(str, g1_runs[i])[0], map(str, g1_runs[i+1])[0]))
+ logging.error('Runs %s and %s could be merged, but are not!' % (list(map(str, g1_runs[i]))[0], list(map(str, g1_runs[i+1]))[0]))
elif direction == DIRECTION_WATSON_STRAND and g2pos[g2i] == g2pos[g2i2]+1:
- logging.error('Runs %s and %s could be merged, but are not!' % (map(str, g1_runs[i])[0], map(str, g1_runs[i+1])[0]))
+ logging.error('Runs %s and %s could be merged, but are not!' % (list(map(str, g1_runs[i]))[0], list(map(str, g1_runs[i+1]))[0]))
def getAllRuns(g1, g2, d):
- g2pos = dict(izip(g2, xrange(len(g2))))
+ g2pos = dict(zip(g2, range(len(g2))))
g1_runs = [set() for _ in g1]
g2_runs = [set() for _ in g2]
@@ -305,7 +305,7 @@ def getAllRuns(g1, g2, d):
reportedRuns= list()
- for i in xrange(len(g1)):
+ for i in range(len(g1)):
curPos = g1[i]
@@ -355,7 +355,7 @@ def getAllRuns(g1, g2, d):
# if no edge exists, nothing has to be done...
if e:
- for (g2_gene, (direction, weight)) in d[curPos].items():
+ for (g2_gene, (direction, weight)) in list(d[curPos].items()):
if (direction, g2_gene) not in forbiddenRunStarts:
j = g2pos[g2_gene]
if isinstance(direction, BothStrands):
@@ -391,12 +391,12 @@ def replaceByNew(g1_runs, g2_runs, i, j,
break
def doMatching(g1, g2, g1_runs, g2_runs, m, runList):
- g1pos = dict(izip(g1, xrange(len(g1))))
- g2pos = dict(izip(g2, xrange(len(g2))))
+ g1pos = dict(zip(g1, range(len(g1))))
+ g2pos = dict(zip(g2, range(len(g2))))
newRuns = set()
- for k in xrange(g1pos[m.endG1] - g1pos[m.startG1] + 1):
+ for k in range(g1pos[m.endG1] - g1pos[m.startG1] + 1):
i = g1pos[m.startG1] + k
j = g2pos[m.startG2] + k
@@ -516,13 +516,13 @@ def doMatching(g1, g2, g1_runs, g2_runs,
insertIntoRunList(newRuns, runList)
def mergeRuns(mod_g1, g1, g2, g1_runs, g2_runs, runList, alreadyMatched):
- g1pos = dict(izip(g1, xrange(len(g1))))
- g2pos = dict(izip(g2, xrange(len(g2))))
+ g1pos = dict(zip(g1, range(len(g1))))
+ g2pos = dict(zip(g2, range(len(g2))))
newRuns = set()
wSrt = lambda x: x.getWeight(alpha)
mod_g1 = list(mod_g1)
- for x in xrange(len(mod_g1)):
+ for x in range(len(mod_g1)):
g1i = mod_g1[x]
i = g1pos[g1i]
if len(g1) < i+2:
@@ -601,18 +601,18 @@ def removeSingleGenes(genome, genome_run
def findRandomRunSequence(g1, g2, dists, topXperCent):
g2dists = dict()
- for g1i, x in dists.items():
- for g2j, d in x.items():
+ for g1i, x in list(dists.items()):
+ for g2j, d in list(x.items()):
if g2j not in g2dists:
g2dists[g2j] = dict()
g2dists[g2j][g1i] = d
# copy g1, g2 and dists map, because we'll modify it. Also remove all genes
# that do not contain edges.
- g1 = [x for x in g1 if dists.has_key(x) and len(dists[x])]
- g2 = [x for x in g2 if g2dists.has_key(x) and len(g2dists[x])]
+ g1 = [x for x in g1 if x in dists and len(dists[x])]
+ g2 = [x for x in g2 if x in g2dists and len(g2dists[x])]
- g1pos = dict(izip(g1, xrange(len(g1))))
+ g1pos = dict(zip(g1, range(len(g1))))
g1_runs, g2_runs, runs = getAllRuns(g1, g2, dists)
logging.info('Found %s runs.' %len(runs))
@@ -621,7 +621,7 @@ def findRandomRunSequence(g1, g2, dists,
res = set()
while runList:
- noOfAdjacencies = len(filter(lambda x: x.getWeight(alpha) and x.getWeight(alpha) or 0, runList))
+ noOfAdjacencies = len([x for x in runList if x.getWeight(alpha) and x.getWeight(alpha) or 0])
if noOfAdjacencies:
randPos = randint(1, ceil(noOfAdjacencies * topXperCent))
else:
@@ -645,7 +645,7 @@ def findRandomRunSequence(g1, g2, dists,
for g in del_g1.intersection(mod_g1):
mod_g1.remove(g)
- g1pos = dict(izip(g1, xrange(len(g1))))
+ g1pos = dict(zip(g1, range(len(g1))))
# add new modification points
mod_g1.update(new_mod_g1)
@@ -653,7 +653,7 @@ def findRandomRunSequence(g1, g2, dists,
if del_g2:
logging.info('Zombie genes removed from G2: %s' %', '.join(map(str, del_g2)))
for g2j in mod_g2:
- for g1i, (d, _) in g2dists[g2j].items():
+ for g1i, (d, _) in list(g2dists[g2j].items()):
if g1i in g1:
if d == DIRECTION_CRICK_STRAND:
mod_g1.add(g1i)
@@ -665,8 +665,8 @@ def findRandomRunSequence(g1, g2, dists,
runList, res)
if res:
- logging.info('Matching finished. Longest run size is %s.' %(max(map(len,
- res))))
+ logging.info('Matching finished. Longest run size is %s.' %(max(list(map(len,
+ res)))))
else:
logging.info('Matching finished, but no runs found. Empty input?')
@@ -681,19 +681,19 @@ def repeatMatching(g1, g2, g1_mod, g2_mo
g2_runs_res = g2_runs
selectedRuns_res = list()
- g1pos = dict(izip(g1_mod, xrange(len(g1_mod))))
- g2pos = dict(izip(g2_mod, xrange(len(g2_mod))))
+ g1pos = dict(zip(g1_mod, range(len(g1_mod))))
+ g2pos = dict(zip(g2_mod, range(len(g2_mod))))
noReps = repMatching
while repMatching:
- for i in xrange(len(g1_runs)):
+ for i in range(len(g1_runs)):
run_set = g1_runs[i]
if len(run_set) != 1:
logging.error(('Expected run, set length of 1, but was told' + \
' different: %s.') %(', '.join(map(str, run_set))))
- run = run_set.__iter__().next()
+ run = next(run_set.__iter__())
g1i = g1_mod[i]
@@ -720,11 +720,11 @@ def repeatMatching(g1, g2, g1_mod, g2_mo
len(g1_mod), noReps-repMatching+2))
# remove runs that fall below min length of minCsSize
- ff = lambda x: len(x.__iter__().next()) >= minCsSize
- g1_mod = [g1_mod[i] for i in xrange(len(g1_mod)) if ff(g1_runs[i])]
- g2_mod = [g2_mod[i] for i in xrange(len(g2_mod)) if ff(g2_runs[i])]
- g1_runs = filter(ff, g1_runs)
- g2_runs = filter(ff, g2_runs)
+ ff = lambda x: len(next(x.__iter__())) >= minCsSize
+ g1_mod = [g1_mod[i] for i in range(len(g1_mod)) if ff(g1_runs[i])]
+ g2_mod = [g2_mod[i] for i in range(len(g2_mod)) if ff(g2_runs[i])]
+ g1_runs = list(filter(ff, g1_runs))
+ g2_runs = list(filter(ff, g2_runs))
selectedRuns = set([s for s in selectedRuns if len(s) >= minCsSize])
# stop if no runs were found matching the criteria
@@ -736,10 +736,10 @@ def repeatMatching(g1, g2, g1_mod, g2_mo
logging.info('%s feasible runs retained.' %len(selectedRuns))
# reconciliate with result data
- g2pos = dict(izip(g2_mod, xrange(len(g2_mod))))
- g1pos = dict(izip(g1_mod, xrange(len(g1_mod))))
- g2pos_res = dict(izip(g2_mod_res, xrange(len(g2_mod_res))))
- g1pos_res = dict(izip(g1_mod_res, xrange(len(g1_mod_res))))
+ g2pos = dict(zip(g2_mod, range(len(g2_mod))))
+ g1pos = dict(zip(g1_mod, range(len(g1_mod))))
+ g2pos_res = dict(zip(g2_mod_res, range(len(g2_mod_res))))
+ g1pos_res = dict(zip(g1_mod_res, range(len(g1_mod_res))))
chr_srt = lambda x, y: x[0] == y[0] and (x[1] < y[1] and -1 or 1) or (x[0] < y[0] and -1 or 1)
g1_mod_new = sorted(set(g1_mod_res + g1_mod), cmp=chr_srt)
@@ -749,17 +749,17 @@ def repeatMatching(g1, g2, g1_mod, g2_mo
for g1i in g1_mod_new:
x = set()
- if g1pos_res.has_key(g1i):
+ if g1i in g1pos_res:
x.update(g1_runs_res[g1pos_res[g1i]])
- if g1pos.has_key(g1i):
+ if g1i in g1pos:
x.update(g1_runs[g1pos[g1i]])
g1_runs_new.append(x)
for g2j in g2_mod_new:
x = set()
- if g2pos_res.has_key(g2j):
+ if g2j in g2pos_res:
x.update(g2_runs_res[g2pos_res[g2j]])
- if g2pos.has_key(g2j):
+ if g2j in g2pos:
x.update(g2_runs[g2pos[g2j]])
g2_runs_new.append(x)
@@ -776,21 +776,21 @@ def repeatMatching(g1, g2, g1_mod, g2_mo
def printMatching(g1, g2, g1_runs, hasMultipleChromosomes, out):
if hasMultipleChromosomes:
- print >> f, 'Chr(G1)\tG1\tChr(G2)\tG2\tdirection\tedge weight'
+ print('Chr(G1)\tG1\tChr(G2)\tG2\tdirection\tedge weight', file=f)
else:
- print >> f, 'G1\tG2\tdirection\tedge weight'
+ print('G1\tG2\tdirection\tedge weight', file=f)
- g2pos = dict(izip(g2, xrange(len(g2))))
- g1pos = dict(izip(g1, xrange(len(g1))))
+ g2pos = dict(zip(g2, range(len(g2))))
+ g1pos = dict(zip(g1, range(len(g1))))
cur_index = dict()
- for i in xrange(len(g1_runs)):
+ for i in range(len(g1_runs)):
run_set = g1_runs[i]
for run in run_set:
g1i = g1[i]
j = 0
- if cur_index.has_key(run):
+ if run in cur_index:
j = cur_index[run]
if run.direction == DIRECTION_CRICK_STRAND:
g2j = g2[g2pos[run.startG2] + j]
@@ -800,22 +800,22 @@ def printMatching(g1, g2, g1_runs, hasMu
direction = run.direction == DIRECTION_CRICK_STRAND and '1' or '-1'
g1i1 = g1i[1] == -1 and 'TELOMERE_START' or g1i[1]
- g1i1 = g1i[1] == maxint and 'TELOMERE_END' or g1i1
+ g1i1 = g1i[1] == maxsize and 'TELOMERE_END' or g1i1
g2j1 = g2j[1] == -1 and 'TELOMERE_START' or g2j[1]
- g2j1 = g2j[1] == maxint and 'TELOMERE_END' or g2j1
+ g2j1 = g2j[1] == maxsize and 'TELOMERE_END' or g2j1
if hasMultipleChromosomes:
- print >> f, '%s\t%s\t%s\t%s\t%s\t%s' %(g1i[0], g1i1, g2j[0],
- g2j1, direction, run.weight[j])
+ print('%s\t%s\t%s\t%s\t%s\t%s' %(g1i[0], g1i1, g2j[0],
+ g2j1, direction, run.weight[j]), file=f)
else:
- print >> f, '%s\t%s\t%s\t%s' %(g1i1, g2j1, direction,
- run.weight[j])
+ print('%s\t%s\t%s\t%s' %(g1i1, g2j1, direction,
+ run.weight[j]), file=f)
cur_index[run] = j+1
if __name__ == '__main__':
if len(argv) < 3 or len(argv) > 8:
- print '\tusage: %s <DIST FILE> <ALPHA> [ <EDGE WEIGHT THRESHOLD> --repeat-matching (-R) <NUMBER >= 2> --min-cs-size (-M) <NUMBER >= 1> ]' %argv[0]
+ print('\tusage: %s <DIST FILE> <ALPHA> [ <EDGE WEIGHT THRESHOLD> --repeat-matching (-R) <NUMBER >= 2> --min-cs-size (-M) <NUMBER >= 1> ]' %argv[0])
exit(1)
repMatching= '--repeat-matching' in argv or '-R' in argv
@@ -826,8 +826,8 @@ if __name__ == '__main__':
minCsSize = int(argv[pos+1])
argv = argv[:pos] + argv[pos+2:]
if not repMatching:
- print >> stderr, ('Argument --min-cs-size (-M) only valid in ' + \
- 'combination with --repeat-matching (-R)')
+ print(('Argument --min-cs-size (-M) only valid in ' + \
+ 'combination with --repeat-matching (-R)'), file=stderr)
exit(1)
else:
minCsSize = 1
@@ -869,7 +869,7 @@ if __name__ == '__main__':
# sum of weights of adjacencies
wAdj = sum([r.getWeight(1) for r in selectedRuns])
# sum of weights of all edges of the matching
- wEdg = sum([sum(map(lambda x: x**2, r.weight)) for r in selectedRuns])
+ wEdg = sum([sum([x**2 for x in r.weight]) for r in selectedRuns])
edg = sum(map(len, selectedRuns))
@@ -892,6 +892,6 @@ if __name__ == '__main__':
' is %s with #edg = %s, adj(M) = %.3f and edg(M) = %.3f') %(bkp, edg,
wAdj, wEdg))
- print '#bkp\t#edg\tadj\tedg'
- print '%s\t%s\t%.6f\t%.6f' %(bkp, edg, wAdj, wEdg)
+ print('#bkp\t#edg\tadj\tedg')
+ print('%s\t%s\t%.6f\t%.6f' %(bkp, edg, wAdj, wEdg))
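One call the patch above leaves untouched is `sorted(set(g1_mod_res + g1_mod), cmp=chr_srt)` in `repeatMatching`. Python 3's `sorted` no longer accepts a `cmp` argument, so that line would most likely still need porting. A minimal sketch of one way to do it with `functools.cmp_to_key` follows; the comparator mirrors the `chr_srt` lambda from the source, while the sample `genes` list is made-up illustration data.

```python
from functools import cmp_to_key


def chr_srt(x, y):
    """Order (chromosome, position) pairs by chromosome, then by position.

    Mirrors the chr_srt lambda in the source; it never returns 0,
    which is harmless here because the input is a set of distinct pairs.
    """
    if x[0] == y[0]:
        return -1 if x[1] < y[1] else 1
    return -1 if x[0] < y[0] else 1


# Hypothetical sample data, not taken from the project.
genes = [("chr2", 5), ("chr1", 9), ("chr1", -1), ("chr2", 1)]

# Python 3 replacement for sorted(..., cmp=chr_srt)
ordered = sorted(set(genes), key=cmp_to_key(chr_srt))
print(ordered)
```

For pairs whose elements are mutually comparable, plain `sorted(set(genes))` yields the same order, since tuples already compare element by element; `cmp_to_key` simply keeps the port closest to the original call.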
......@@ -58,8 +58,8 @@ You can also send a mail to lechner@staff.uni-marburg.de.</p>
<h1 id="installation">Installation</h1>
<p><strong>Proteinortho comes with precompiled binaries of all executables (Linux/x86) so just run the proteinortho6.pl in the downloaded directory.</strong>
You could also move all executables to your favorite bin directory (e.g. with make install PREFIX=/home/paul/bin).
<p><strong>Proteinortho comes with precompiled binaries of all executables (Linux/x86) so you should be able to run perl proteinortho6.pl in the downloaded directory.</strong>
You could also move all executables to your favorite directory (e.g. with make install PREFIX=/home/paul/bin).
If you cannot execute the src/BUILD/Linux_x86_64/proteinortho_clustering, then you have to recompile with make, see the section 2. Building and installing proteinortho from source.</p>
<p><br></p>
......@@ -95,6 +95,20 @@ If you cannot execute the src/BUILD/Linux<em>x86</em>64/proteinortho_clustering,
<p><br></p>
<h4 id="easyinstallationwithdpkgrootprivilegesarerequired">Easy installation with dpkg (root privileges are required)</h4>
<p>The deb package can be downloaded here: <a href="https://packages.debian.org/unstable/proteinortho">https://packages.debian.org/unstable/proteinortho</a>.
Afterwards the deb package can be installed with <code>sudo dpkg -i proteinortho*deb</code>.</p>
<p><br></p>
<h4 id="easyinstallationwithaptget"><em>(Easy installation with apt-get)</em></h4>
<p><strong>! Disclaimer: Work in progress !</strong>
<em>proteinortho will be released to stable with Debian 11 (~2021); after that, proteinortho can be installed with <code>sudo apt-get install proteinortho</code> (currently this installs the outdated version v5.16b)</em></p>
<p><br></p>
<h4 id="1prerequisites">1. Prerequisites</h4>
<p>Proteinortho uses standard software which is often installed already or is part of the package repositories and can thus easily be installed. The sources come with a precompiled version of Proteinortho for 64bit Linux.</p>
......@@ -128,7 +142,7 @@ If you cannot execute the src/BUILD/Linux<em>x86</em>64/proteinortho_clustering,
<li><p>Python v2.6.0 or higher to include synteny analysis (to test this, type 'python -V' in the command line) </p></li>
<li><p>Perl modules: Thread::Queue, File::Basename, Pod::Usage, threads (if you miss one just install with <code>cpan install Thread::Queue</code> )
<li><p>Perl standard modules (these should come with Perl): Thread::Queue, File::Basename, Pod::Usage, threads (if you miss one just install with <code>cpan install ...</code> )
</details></p></li>
</ul>
......@@ -154,9 +168,9 @@ If you cannot execute the src/BUILD/Linux<em>x86</em>64/proteinortho_clustering,
<h4 id="2buildingandinstallingproteinorthofromsourcelinuxandosx">2. Building and installing proteinortho from source (linux and osx)</h4>
<p>Here you <i>can</i> use a working lapack library, check this with 'dpkg --get-selections | grep lapack'. Install lapack e.g. with 'apt-get install libatlas3-base' or liblapack3.</p>
<p>Here you can use a working lapack library, check this with 'dpkg --get-selections | grep lapack'. Install lapack e.g. with 'apt-get install libatlas3-base' or liblapack3.</p>
<p>If you don't have one (or you have no root permissions), then 'make' will automatically compile a lapack (v3.8.0) for you!</p>
<p>If you don't have Lapack, then 'make' will automatically compile Lapack v3.8.0 for you!</p>
<p>Fetch the latest source code archive from <a href="https://gitlab.com/paulklemm_PHD/proteinortho/-/archive/master/proteinortho-master.zip">here</a>
<details> <summary>or from here (Click to expand)</summary></p>
......@@ -345,7 +359,7 @@ blast. 3 -> run the clustering.</p></li></ul>
<p><details>
<summary>show all algorithms (Click to expand)</summary></p>
<pre><code>- blastn,blastp,tblastx : legacy blast family (shell commands: blastall -). The suffix 'n' or 'p' indicates nucleotide or protein input files.
<pre><code>- blastn_legacy,blastp_legacy,tblastx_legacy : legacy blast family (shell commands: blastall -). The suffix 'n' or 'p' indicates nucleotide or protein input files.
- blastn+,blastp+,tblastx+ : standard blast family (shell commands: blastn,blastp,tblastx). The suffix 'n' or 'p' indicates nucleotide or protein input files.
......
#!/usr/bin/perl
#!/usr/bin/env perl
##########################################################################################
# This file is part of Proteinortho.
......@@ -59,7 +59,7 @@ To enhance the prediction accuracy, the relative order of genes (synteny) can be
Proteinortho assumes, that you have all your gene sequences in FASTA format either
represented as amino acids or as nucleotides. The source code archive contains some examples, namely C.faa, E.faa, L.faa, M.faa located in the test/ directory.
I<By default Proteinortho assumes amino> I<acids and thus uses diamond> (-p=diamond) to compare sequences. If you have nucleotide sequences, you need to change this by adding the parameter
-p=blastn+ (or some other algorithm). (In case you only have NCBI BLAST legacy installed, you need to state this too, either by adding -p=blastp or -p=blastn respectively.)
-p=blastn+ (or some other algorithm). (In case you only have NCBI BLAST legacy installed, you need to state this too, either by adding -p=blastp_legacy or -p=blastn_legacy respectively.)
The full command for the example files would thus be proteinortho6.pl -project=test test/C.faa test/E.faa test/L.faa test/M.faa.
Instead of naming the FASTA files one by one, you could also supply test/*.faa as argument.
Please note that the parameter -project=test is optional. With this, you can set the prefix of the output files generated by Proteinortho.
......@@ -119,7 +119,7 @@ removes all database files generated by the -p= algorithm afterwards
=item B<--p>=algorithm (default: diamond)
B<blastn>,B<blastp>,B<tblastx>,B<blastn+>,B<blastp+>,B<tblastx+> : standard blast family. The suffix 'n' or 'p' indicates nucleotide or protein version (of the input files).
B<blastn>,B<blastp>,B<tblastx>,B<blastn+>,B<blastp+>,B<tblastx+> : standard blast family. The suffix 'n' or 'p' indicates nucleotide or protein version (of the input files). Use *_legacy for legacy blast.
B<diamond> : Only for protein files! standard diamond procedure and for genes/proteins of length >40 with the additional --sensitive flag
......@@ -567,7 +567,7 @@ foreach my $option (@ARGV) {
elsif ($option =~ m/^--?debug$/) { $debug = 1; }
elsif ($option =~ m/^--?exactstep3$/) { $exactstep3 = 1; }
elsif ($option =~ m/^--?debug=([\da-zA-Z_]*)$/) { $debug = $1; }
elsif ($option =~ m/^--?p=(.*)$/) { $blastmode = $1; }
elsif ($option =~ m/^--?p=(.*)$/) { $blastmode = $1; if($blastmode eq "blastn"){$blastmode.="+";} if($blastmode eq "blastp"){$blastmode.="+";}}
elsif ($option =~ m/^--?e=(.*)$/) { $evalue = $1; }
elsif ($option =~ m/^--?cpus=(\d*)$/) { $cpus = $1; }
elsif ($option =~ m/^--?cpus=auto$/) { $cpus = 0; }
......@@ -627,8 +627,8 @@ $po_path = &get_po_path(); # Determine local path
our $nucleotideAlphabet="ACGTURYSWKMBDHVNXacgturyswkmbdhvnx\.\-";
our $aminoAlphabet="XOUBZACDEFGHIKLMNPQRSTVWYxoubzacdefghiklmnpqrstvwy\.\*\-";
our $allowedAlphabet = {
'blastn' => 'n' ,
'blastp' => 'a' ,
'blastn_legacy' => 'n' ,
'blastp_legacy' => 'a' ,
'tblastx' => 'n' ,
'blastn+' => 'n' ,
'blastp+' => 'a' ,
......@@ -645,8 +645,8 @@ our $allowedAlphabet = {
'topaz' => 'a' };
##MARK_FOR_NEW_BLAST_ALGORITHM
our $blastmode_pendant = { # if you choose blastp+ and the input is nucleotide -> choose the pendant blastn+ and restart (only if you check files with -checkfasta)
'blastn' => 'blastp' ,
'blastp' => 'blastn' ,
'blastn_legacy' => 'blastp_legacy' ,
'blastp_legacy' => 'blastn_legacy' ,
'blastn+' => 'blastp+' ,
'blastp+' => 'blastn+' ,
'rapsearch' => 'a' ,
......@@ -989,6 +989,7 @@ Options:
{blast*|tblastx|blast*+|tblastx+|diamond|usearch|ublast|lastp|lastn|rapsearch|topaz|*blat*|mmseqs*}
blast*|tblastx : standard blast family (blastp : protein files, blastn : dna files)
blast*+|tblastx+ : standard blast+ family (blastp+ : protein files, blastn+ : dna files)
blast*_legacy : legacy blast family
diamond : Only for protein files! standard diamond procedure and for genes/proteins of length >40 with the additional --sensitive flag
usearch : usearch_local procedure with -id 0 (minimum identity percentage).
ublast : usearch_ublast procedure.
......@@ -1867,7 +1868,7 @@ sub blast {
}
}
if ($blastmode eq "blastp" || $blastmode eq "blastn" || $blastmode eq "tblastx") {lock($threads_per_process); $command = $binpath."blastall -a $threads_per_process -d '$a.$blastmode' -i '$_[1]' -p $blastmode -m8 -e $evalue $blastOptions $printSTDERR";}
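# blastall -p expects blastp, blastn or tblastx, so the _legacy suffix is stripped for the program name
if ($blastmode eq "blastp_legacy" || $blastmode eq "blastn_legacy" || $blastmode eq "tblastx_legacy") {lock($threads_per_process); (my $blastall_program = $blastmode) =~ s/_legacy$//; $command = $binpath."blastall -a $threads_per_process -d '$a.$blastmode' -i '$_[1]' -p $blastall_program -m8 -e $evalue $blastOptions $printSTDERR";}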
elsif ($blastmode eq "blastp+") {lock($threads_per_process); $command = $binpath."blastp -num_threads $threads_per_process -db '$a.$blastmode' -query '$_[1]' -evalue $evalue -outfmt 6 $blastOptions $printSTDERR";}
elsif ($blastmode eq "blastn+") {lock($threads_per_process); $command = $binpath."blastn -num_threads $threads_per_process -db '$a.$blastmode' -query '$_[1]' -evalue $evalue -outfmt 6 $blastOptions $printSTDERR";}
elsif ($blastmode eq "tblastx+") {lock($threads_per_process); $command = $binpath."tblastx -num_threads $threads_per_process -db '$a.$blastmode' -query '$_[1]' -evalue $evalue -outfmt 6 $blastOptions $printSTDERR";}
......@@ -2046,14 +2047,14 @@ sub check_bins {
}elsif ($blastmode eq "blast+") {
&Error("Please call -p=blastp+ for protein datasets and -p=blastn+ for nucleotide datasets (and -p=tblastx+ for translated query/db).");
}
elsif ($blastmode eq "blastp" || $blastmode eq "blastn" || $blastmode eq "tblastx") {
elsif ($blastmode eq "blastp_legacy" || $blastmode eq "blastn_legacy" || $blastmode eq "tblastx_legacy") {
my $cmd = $binpath."blastall";
my @blastv = qx($cmd 2>&1);
foreach (@blastv) {
$_=~s/[\r\n]+$//;
if ($_ =~ /blastall.+?([^\s]+)/) {
my $versionnumber = $1;
if ($blastmode eq "blastp") {$makedb = $binpath."formatdb -p T -o F -i";}
if ($blastmode eq "blastp_legacy") {$makedb = $binpath."formatdb -p T -o F -i";}
elsif ($blastmode eq "blastn_legacy") {$makedb = $binpath."formatdb -p F -o F -i";}
elsif ($blastmode eq "tblastx_legacy") {$makedb = $binpath."formatdb -p F -o F -i";}
else {&Error("This should not happen! Please submit the FASTA file(s) and the parameter vector (above) to incoming+paulklemm-phd-proteinortho-7278443-issue-\@incoming.gitlab.com to help fix this issue.");}
......@@ -2064,7 +2065,7 @@ sub check_bins {
}
&Error("Failed to detect '$blastmode'! Tried to call '$binpath/blastall'.\nPlease install $blastmode in $binpath (or specify another binpath with -binpath=/home/...)");
}elsif ($blastmode eq "blast") {
&Error("Please call -p=blastp for protein datasets and -p=blastn for nucleotide datasets.");
&Error("Please call -p=blastp_legacy for protein datasets and -p=blastn_legacy for nucleotide datasets.");
}
elsif ($blastmode eq "topaz") {
my $cmd = $binpath."topaz -h";
......
#!/usr/bin/perl
#!/usr/bin/env perl
#pk
##########################################################################################
......
#!/usr/bin/perl
#!/usr/bin/env perl
##########################################################################################
# This file is part of proteinortho.
......
#!/usr/bin/perl
#!/usr/bin/env perl
use strict;
use warnings "all";
......
#!/usr/bin/perl
#!/usr/bin/env perl
#pk
##########################################################################################
......
#!/usr/bin/perl
#!/usr/bin/env perl
use warnings;
use strict;
......
#!/usr/bin/python
#!/usr/bin/env python2.7
from sys import stdout, stderr, exit, argv, maxint
from copy import deepcopy
......
#!/usr/bin/perl
#!/usr/bin/env perl
use warnings;
use strict;
......
#!/usr/bin/perl
#!/usr/bin/env perl
#pk
##########################################################################################
......
#!/usr/bin/perl
#!/usr/bin/env perl
use strict;
use warnings "all";
......