Commits on Source (5)
......@@ -14,6 +14,7 @@ __pycache__/
# Distribution / packaging
.Python
env/
venv/
build/
develop-eggs/
dist/
......@@ -26,7 +27,7 @@ sdist/
var/
*.egg-info/
.installed.cfg
*.egg
*.egg*
# PyInstaller
# Usually these files are written by a python script from a template
......@@ -43,6 +44,7 @@ htmlcov/
.tox/
.coverage
.cache
out.card*
nosetests.xml
coverage.xml
......@@ -55,3 +57,10 @@ docs/_build/
# PyBuilder
target/
# PyCharm
.idea
# Mac files
.DS_Store
......@@ -8,7 +8,7 @@ addons:
- libgfortran3
- libncurses5-dev
python:
- '3.5'
- '3.6'
sudo: false
install:
- source ./install_dependencies.sh
......
# Change Log
## [Unreleased](https://github.com/sanger-pathogens/ariba/tree/HEAD)
[v2.14.2](https://github.com/sanger-pathogens/ariba/tree/v2.14.2) (2019-06-18)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.14.1...v2.14.2)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.1...HEAD)
**Fixed bugs:**
- Added Spades assembler into Docker file - RT ticket 660940
- Incremented release number
- Added LICENSE file into release distribution - RT ticket 660890
[v2.14.1](https://github.com/sanger-pathogens/ariba/tree/v2.14.1) (2019-06-13)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.14.0...v2.14.1)
**Fixed bugs:**
- Ariba fails to install from PyPI due to missing .h files in distribution. Related to MANIFEST.in change in [\#269](https://github.com/sanger-pathogens/ariba/pull/269)
- Fix for Issue [\#263](https://github.com/sanger-pathogens/ariba/issues/263)
[v2.14.0](https://github.com/sanger-pathogens/ariba/tree/v2.14.0) (2019-06-06)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.5...v2.14.0)
**Closed issues:**
- Reference dataset of ARG-ANNOT cannot be downloaded [\#265](https://github.com/sanger-pathogens/ariba/issues/265)
- unable to download mlst schemes [\#264](https://github.com/sanger-pathogens/ariba/issues/264)
- Several "At least one cluster failed!Stopping..." errors [\#261](https://github.com/sanger-pathogens/ariba/issues/261)
- Allow increasing cd-hit-est memory allocation [\#255](https://github.com/sanger-pathogens/ariba/issues/255)
- Ariba pubmlstget Error [\#240](https://github.com/sanger-pathogens/ariba/issues/240)
- segmentation fault when I import ariba [\#230](https://github.com/sanger-pathogens/ariba/issues/230)
- How to use spades and set minimum coverage [\#215](https://github.com/sanger-pathogens/ariba/issues/215)
**Merged pull requests:**
- Additional v2.14.0 updates [\#270](https://github.com/sanger-pathogens/ariba/pull/270) ([kpepper](https://github.com/kpepper))
- Added getref feature for NCBI's Bacterial Antimicrobial Resistance Reference Gene Database [\#269](https://github.com/sanger-pathogens/ariba/pull/269) ([schultzm](https://github.com/schultzm))
## [v2.13.5](https://github.com/sanger-pathogens/ariba/tree/v2.13.5) (2019-03-26)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.4...v2.13.5)
**Fixed bugs:**
- Ariba fails without --noclean depending on database [\#205](https://github.com/sanger-pathogens/ariba/issues/205)
**Closed issues:**
- Installation failed: clang: error: linker command failed with exit code 1 \(use -v to see invocation\) [\#245](https://github.com/sanger-pathogens/ariba/issues/245)
- virfinddb db not downloading properly [\#229](https://github.com/sanger-pathogens/ariba/issues/229)
- getref error with resfinder db [\#225](https://github.com/sanger-pathogens/ariba/issues/225)
**Merged pull requests:**
- Updated CHANGELOG.md [\#260](https://github.com/sanger-pathogens/ariba/pull/260) ([kpepper](https://github.com/kpepper))
- Bump version to 2.13.5 and fix Spades invocation issue [\#259](https://github.com/sanger-pathogens/ariba/pull/259) ([kpepper](https://github.com/kpepper))
- Minor code change to mitigate issue \#245 \(installation failure\) [\#258](https://github.com/sanger-pathogens/ariba/pull/258) ([kpepper](https://github.com/kpepper))
## [v2.13.4](https://github.com/sanger-pathogens/ariba/tree/v2.13.4) (2019-03-15)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.3...v2.13.4)
**Closed issues:**
- Ariba run signal 28 [\#252](https://github.com/sanger-pathogens/ariba/issues/252)
- NCBI database. [\#241](https://github.com/sanger-pathogens/ariba/issues/241)
**Merged pull requests:**
- Rebuilt CHANGELOG for v2.13.4 [\#257](https://github.com/sanger-pathogens/ariba/pull/257) ([kpepper](https://github.com/kpepper))
- Allow increasing cd-hit-est memory allocation \#255 [\#256](https://github.com/sanger-pathogens/ariba/pull/256) ([kpepper](https://github.com/kpepper))
## [v2.13.3](https://github.com/sanger-pathogens/ariba/tree/v2.13.3) (2019-01-02)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.2...v2.13.3)
**Merged pull requests:**
- TB D94A fix [\#251](https://github.com/sanger-pathogens/ariba/pull/251) ([martinghunt](https://github.com/martinghunt))
## [v2.13.2](https://github.com/sanger-pathogens/ariba/tree/v2.13.2) (2018-12-21)
[Full Changelog](https://github.com/sanger-pathogens/ariba/compare/v2.13.1...v2.13.2)
**Merged pull requests:**
- Update tb panel [\#250](https://github.com/sanger-pathogens/ariba/pull/250) ([martinghunt](https://github.com/martinghunt))
- Added changelog [\#248](https://github.com/sanger-pathogens/ariba/pull/248) ([ssjunnebo](https://github.com/ssjunnebo))
- Update python min version [\#247](https://github.com/sanger-pathogens/ariba/pull/247) ([ssjunnebo](https://github.com/ssjunnebo))
## [v2.13.1](https://github.com/sanger-pathogens/ariba/tree/v2.13.1) (2018-11-16)
......@@ -596,3 +671,6 @@
- Initial working version [\#1](https://github.com/sanger-pathogens/ariba/pull/1) ([martinghunt](https://github.com/martinghunt))
\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
\ No newline at end of file
FROM ubuntu:17.10
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install --no-install-recommends -y \
ENV DEBIAN_FRONTEND=noninteractive
MAINTAINER ariba-help@sanger.ac.uk
# Software version numbers
ARG BOWTIE2_VERSION=2.2.9
ARG SPADES_VERSION=3.13.1
ARG ARIBA_VERSION=2.14.2
RUN apt-get -qq update && \
apt-get install --no-install-recommends -y \
build-essential \
cd-hit \
curl \
......@@ -9,7 +18,6 @@ RUN apt-get install --no-install-recommends -y \
libbz2-dev \
liblzma-dev \
mummer \
python \
python3-dev \
python3-setuptools \
python3-pip \
......@@ -19,17 +27,25 @@ RUN apt-get install --no-install-recommends -y \
wget \
zlib1g-dev
RUN wget -q http://downloads.sourceforge.net/project/bowtie-bio/bowtie2/2.2.9/bowtie2-2.2.9-linux-x86_64.zip \
&& unzip bowtie2-2.2.9-linux-x86_64.zip \
&& rm bowtie2-2.2.9-linux-x86_64.zip
RUN wget -q http://downloads.sourceforge.net/project/bowtie-bio/bowtie2/${BOWTIE2_VERSION}/bowtie2-${BOWTIE2_VERSION}-linux-x86_64.zip \
&& unzip bowtie2-${BOWTIE2_VERSION}-linux-x86_64.zip \
&& rm -f bowtie2-${BOWTIE2_VERSION}-linux-x86_64.zip
RUN wget -q https://github.com/ablab/spades/releases/download/v${SPADES_VERSION}/SPAdes-${SPADES_VERSION}-Linux.tar.gz \
&& tar -zxf SPAdes-${SPADES_VERSION}-Linux.tar.gz \
&& rm -f SPAdes-${SPADES_VERSION}-Linux.tar.gz
# Need MPLBACKEND="agg" to make matplotlib work without X11, otherwise get the error
# _tkinter.TclError: no display name and no $DISPLAY environment variable
ENV ARIBA_BOWTIE2=$PWD/bowtie2-2.2.9/bowtie2 ARIBA_CDHIT=cdhit-est MPLBACKEND="agg"
ENV ARIBA_BOWTIE2=$PWD/bowtie2-${BOWTIE2_VERSION}/bowtie2 ARIBA_CDHIT=cdhit-est MPLBACKEND="agg"
ENV PATH=$PATH:$PWD/SPAdes-${SPADES_VERSION}-Linux/bin
RUN cd /usr/local/bin && ln -s /usr/bin/python3 python && cd
RUN git clone https://github.com/sanger-pathogens/ariba.git \
&& cd ariba \
&& git checkout v2.12.0 \
&& git checkout v${ARIBA_VERSION} \
&& rm -rf .git \
&& python3 setup.py test \
&& python3 setup.py install
......
recursive-include third_party *.h
include LICENSE
include AUTHORS
\ No newline at end of file
......@@ -39,15 +39,15 @@ The input is a FASTA file of reference sequences (can be a mix of genes and nonc
## Quick Start
Get reference data, for instance from [CARD](https://card.mcmaster.ca/). See [getref](https://github.com/sanger-pathogens/ariba/wiki/Task%3A-getref) for a full list.
ariba getref card out.card
ariba getref ncbi out.ncbi
Prepare reference data for ARIBA:
ariba prepareref -f out.card.fa -m out.card.tsv out.card.prepareref
ariba prepareref -f out.ncbi.fa -m out.ncbi.tsv out.ncbi.prepareref
Run local assemblies and call variants:
ariba run out.card.prepareref reads1.fastq reads2.fastq out.run
ariba run out.ncbi.prepareref reads1.fastq reads2.fastq out.run
Summarise data from several runs:
......@@ -60,7 +60,7 @@ Please read the [ARIBA wiki page][ARIBA wiki] for full usage instructions.
If you encounter an issue when installing ARIBA please contact your local system administrator. If you encounter a bug please log it [here](https://github.com/sanger-pathogens/ariba/issues) or email us at ariba-help@sanger.ac.uk.
### Required dependencies
* [Python3][python] version >= 3.4.0
* [Python3][python] version >= 3.6.0
* [Bowtie2][bowtie2] version >= 2.1.0
* [CD-HIT][cdhit] version >= 4.6
* [MUMmer][mummer] version >= 3.23
......@@ -69,7 +69,7 @@ ARIBA also depends on several Python packages, all of which are available
via pip. Installing ARIBA with pip3 will get these automatically if they
are not already installed:
* dendropy >= 4.2.0
* matplotlib (no minimum version required, but only tested on 2.0.0)
* matplotlib>=3.1.0
* pyfastaq >= 3.12.0
* pysam >= 0.9.1
* pymummer >= 0.10.1
......@@ -85,10 +85,16 @@ Download the latest release from this github repository or clone it. Run the tes
python3 setup.py test
**Note for OS X:** The tests require gawk which will need to be installed separately, e.g. via Homebrew.
If the tests all pass, install:
python3 setup.py install
Alternatively, install directly from github using:
pip3 install git+https://github.com/sanger-pathogens/ariba.git #--user
### Docker
ARIBA can be run in a Docker container. First install Docker, then install ARIBA:
......@@ -98,6 +104,8 @@ To use ARIBA use a command like this (substituting in your directories), where y
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/ariba ariba -h
When calling ARIBA via Docker (as above), you will also need to prefix every file or directory name you pass in with **/data/** (e.g. /data/my_output_folder).
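As a rough illustration of the path-prefixing rule, the sketch below builds the `docker run` argument list in Python. The helper names `docker_ariba_command` and `in_container`, and the example file names, are hypothetical; only the image name and the `/data` mount point come from the command shown above.

```python
import posixpath


def docker_ariba_command(host_dir, ariba_args, image="sangerpathogens/ariba"):
    """Build a `docker run` argument list that mounts host_dir at /data.

    Hypothetical helper for illustration; it is not part of ARIBA.
    """
    return [
        "docker", "run", "--rm", "-it",
        "-v", "{}:/data".format(host_dir),
        image, "ariba",
    ] + list(ariba_args)


def in_container(relative_path):
    """Prefix a path (relative to the mounted directory) with /data/."""
    return posixpath.join("/data", relative_path)


cmd = docker_ariba_command(
    "/home/ubuntu/data",
    [
        "run",
        in_container("out.ncbi.prepareref"),
        in_container("reads1.fastq"),
        in_container("reads2.fastq"),
        in_container("my_output_folder"),
    ],
)
print(" ".join(cmd))
```

Every path handed to ARIBA is expressed relative to the mounted host directory, so the container sees it under `/data/`.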
### Debian (testing)
ARIBA is available in the latest version of Debian, and over time will progressively filter through to Ubuntu and other distributions which use Debian. To install it as root:
......@@ -219,5 +227,3 @@ Microbial Genomics 2017. doi: [110.1099/mgen.0.000131](http://mgen.microbiologyr
[ARIBA wiki]: https://github.com/sanger-pathogens/ariba/wiki
[mummer]: http://mummer.sourceforge.net/
[python]: https://www.python.org/
......@@ -140,7 +140,7 @@ class Assembly:
spades_out_seq_base = "contigs.fasta"
else:
raise ValueError("Unknown spades_mode value: {}".format(self.spades_mode))
asm_cmd = [spades_exe, "-t", str(self.threads), "--pe1-1", self.reads1, "--pe1-2", self.reads2, "-o", self.assembler_dir] + \
asm_cmd = ['python3', spades_exe, "-t", str(self.threads), "--pe1-1", self.reads1, "--pe1-2", self.reads2, "-o", self.assembler_dir] + \
spades_options
asm_ok,err = common.syscall(asm_cmd, verbose=True, verbose_filehandle=self.log_fh, shell=False, allow_fail=True)
if not asm_ok:
......
......@@ -13,6 +13,7 @@ class Runner:
seq_identity_threshold=0.9,
threads=1,
length_diff_cutoff=0.0,
memory_limit=None,
verbose=False,
min_cluster_number=0
):
......@@ -20,10 +21,14 @@ class Runner:
if not os.path.exists(infile):
raise Error('File not found: "' + infile + '". Cannot continue')
if (memory_limit is not None) and (memory_limit < 0):
raise Error('Input parameter cdhit_max_memory is set to an invalid value. Cannot continue')
self.infile = os.path.abspath(infile)
self.seq_identity_threshold = seq_identity_threshold
self.threads = threads
self.length_diff_cutoff = length_diff_cutoff
self.memory_limit = memory_limit
self.verbose = verbose
self.min_cluster_number = min_cluster_number
extern_progs = external_progs.ExternalProgs(fail_on_error=True, using_spades=False)
......@@ -133,15 +138,11 @@ class Runner:
return clusters
def run(self):
tmpdir = tempfile.mkdtemp(prefix='tmp.run_cd-hit.', dir=os.getcwd())
cdhit_fasta = os.path.join(tmpdir, 'cdhit')
cluster_info_outfile = cdhit_fasta + '.bak.clstr'
def get_run_cmd(self, output_file):
cmd = ' '.join([
self.cd_hit_est,
'-i', self.infile,
'-o', cdhit_fasta,
'-o', output_file,
'-c', str(self.seq_identity_threshold),
'-T', str(self.threads),
'-s', str(self.length_diff_cutoff),
......@@ -149,8 +150,21 @@ class Runner:
'-bak 1',
])
# Add in cdhit memory allocation if one has been specified
if self.memory_limit is not None:
cmd = ' '.join([cmd, '-M', str(self.memory_limit)])
return cmd
def run(self):
tmpdir = tempfile.mkdtemp(prefix='tmp.run_cd-hit.', dir=os.getcwd())
cdhit_fasta = os.path.join(tmpdir, 'cdhit')
cluster_info_outfile = cdhit_fasta + '.bak.clstr'
cmd = self.get_run_cmd(cdhit_fasta)
common.syscall(cmd, verbose=self.verbose)
clusters = self._get_clusters_from_bak_file(cluster_info_outfile, self.min_cluster_number)
common.rmtree(tmpdir)
return clusters
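The command construction in this change can be sketched in isolation as follows. This is a minimal, standalone approximation rather than the code in `cdhit.py`: `build_cdhit_cmd` is a hypothetical function name, the default executable name `cd-hit-est` is an assumption, and the `-d 0` flag is taken from the expected command strings in the accompanying tests because the corresponding source lines are elided in the hunk.

```python
def build_cdhit_cmd(infile, output_file, seq_identity_threshold=0.9,
                    threads=1, length_diff_cutoff=0.0, memory_limit=None,
                    cd_hit_est="cd-hit-est"):
    """Return a cd-hit-est command line, adding -M only when a limit is given."""
    # Mirror the validation in Runner.__init__: negative limits are rejected.
    if memory_limit is not None and memory_limit < 0:
        raise ValueError("memory_limit must be None or >= 0")

    parts = [
        cd_hit_est,
        "-i", infile,
        "-o", output_file,
        "-c", str(seq_identity_threshold),
        "-T", str(threads),
        "-s", str(length_diff_cutoff),
        "-d", "0",
        "-bak", "1",
    ]

    # None means "leave cd-hit's own default alone"; 0 is passed through as-is.
    if memory_limit is not None:
        parts += ["-M", str(memory_limit)]

    return " ".join(parts)


print(build_cdhit_cmd("in.fa", "foo/bar/file.out"))
print(build_cdhit_cmd("in.fa", "foo/bar/file.out", memory_limit=900))
```

Keeping `None` distinct from `0` matters here: `None` omits `-M` entirely, while `0` is forwarded to cd-hit, which documents `-M 0` as unlimited memory.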
......@@ -168,7 +168,7 @@ class Clusters:
if self.verbose:
print('Temporary directory:', self.tmp_dir)
for i in [x for x in dir(signal) if x.startswith("SIG") and x not in {'SIGCHLD', 'SIGCLD'}]:
for i in [x for x in dir(signal) if x.startswith("SIG") and x not in {'SIGCHLD', 'SIGCLD', 'SIGPIPE', 'SIGTSTP', 'SIGCONT'}]:
try:
signum = getattr(signal, i)
signal.signal(signum, self._receive_signal)
......
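The handler registration loop above can be sketched on its own as below. This is an illustrative approximation, not the code in `clusters.py`: the function names are hypothetical, and the `isinstance` check against `signal.Signals` is an added assumption used to skip non-signal names such as `SIG_DFL` that also start with "SIG".

```python
import signal

# Signals left at their default behaviour, as listed in the change above.
EXCLUDED = {"SIGCHLD", "SIGCLD", "SIGPIPE", "SIGTSTP", "SIGCONT"}


def catchable_signal_names(excluded=EXCLUDED):
    """Return sorted names of real signals that are not in the excluded set."""
    names = []
    for name in dir(signal):
        value = getattr(signal, name)
        # Only genuine signal numbers are kept; SIG_DFL, SIG_IGN etc. are skipped.
        if isinstance(value, signal.Signals) and name not in excluded:
            names.append(name)
    return sorted(names)


def install_handler(handler, excluded=EXCLUDED):
    """Try to register handler for each signal; return the names that succeeded."""
    installed = []
    for name in catchable_signal_names(excluded):
        try:
            signal.signal(getattr(signal, name), handler)
        except (ValueError, OSError, RuntimeError):
            # Some signals (for example SIGKILL and SIGSTOP) cannot be caught.
            continue
        installed.append(name)
    return installed


print(catchable_signal_names())
```

Leaving `SIGPIPE`, `SIGTSTP` and `SIGCONT` out of the handled set means that closing a pipe, suspending the job with Ctrl-Z, or resuming it no longer triggers the cleanup handler.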
......@@ -144,9 +144,14 @@ class ExternalProgs:
Returns tuple (bool, version). First element True iff found version ok.
Second element is version string (if found), otherwise an error message'''
assert prog in prog_to_version_cmd
cmd, regex = prog_to_version_cmd[prog]
cmd = path + ' ' + cmd
args, regex = prog_to_version_cmd[prog]
cmd = path + ' ' + args
if prog == 'spades':
cmd_output = subprocess.Popen(['python3', path, args], shell=False, stdout=subprocess.PIPE,
stderr=subprocess.PIPE).communicate()
else:
cmd_output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
cmd_output = common.decode(cmd_output[0]).split('\n')[:-1] + common.decode(cmd_output[1]).split('\n')[:-1]
for line in cmd_output:
......
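The version check above runs a program, collects stdout and stderr, and scans each line with a regular expression. A minimal standalone sketch of that pattern follows. The function name `get_version` and the fake tool are hypothetical, and the sketch passes an argument list with `shell=False` for every program, which is a simplification of the code above.

```python
import re
import subprocess
import sys


def get_version(cmd_args, regex):
    """Run cmd_args and return (True, version) or (False, error message)."""
    try:
        proc = subprocess.run(
            cmd_args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as err:
        return False, "Could not run {}: {}".format(cmd_args[0], err)

    # Tools differ in whether they print the version to stdout or stderr,
    # so both streams are searched line by line.
    for line in proc.stdout.splitlines() + proc.stderr.splitlines():
        hit = regex.search(line)
        if hit:
            return True, hit.group(1)

    return False, "Version not found in output of: " + " ".join(cmd_args)


# Stand-in for an external tool: a Python one-liner that prints a version banner.
fake_tool = [sys.executable, "-c", "print('fake-tool v3.13.1')"]
print(get_version(fake_tool, re.compile(r"v([0-9.]+)")))
```

Passing the interpreter explicitly, as the change does with `python3` for SPAdes, avoids relying on the script's shebang line to select a Python version.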
......@@ -86,8 +86,10 @@ def run_bowtie2(
if LooseVersion(bowtie2_version) >= LooseVersion('2.3.1'):
map_cmd.append('--score-min G,1,10')
# We use gawk instead of awk here as we need bitwise comparisons
# and these are not available via awk on Mac OSX.
if remove_both_unmapped:
map_cmd.append(r''' | awk ' !(and($2,4)) || !(and($2,8)) ' ''')
map_cmd.append(r''' | gawk ' !(and($2,4)) || !(and($2,8)) ' ''')
tmp_sam_file = out_prefix + '.unsorted.sam'
map_cmd.append(' > ' + tmp_sam_file)
......
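The gawk expression tests two bits of the SAM FLAG field: 4 (read unmapped) and 8 (mate unmapped). A line is kept unless both bits are set. The same predicate is sketched in Python below for clarity. `keep_sam_line` is a hypothetical name, and passing header lines through unchanged is an assumption about how the gawk filter behaves on non-numeric fields.

```python
READ_UNMAPPED = 0x4   # SAM FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM FLAG bit: the mate is unmapped


def keep_sam_line(line):
    """Return True unless both the read and its mate are unmapped."""
    if line.startswith("@"):
        # Header lines carry no FLAG field; keep them (assumed behaviour).
        return True
    flag = int(line.split("\t")[1])
    return not (flag & READ_UNMAPPED) or not (flag & MATE_UNMAPPED)


lines = [
    "@HD\tVN:1.0",
    "r1\t99\tref\t1\t42\t5M\t=\t50\t54\tACGTA\tIIIII",   # both mapped
    "r2\t73\tref\t1\t42\t5M\t=\t1\t0\tACGTA\tIIIII",     # mate unmapped
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",         # both unmapped
]
for line in lines:
    print(keep_sam_line(line), line.split("\t")[0])
```

POSIX awk has no built-in bitwise functions, which is why the change switches to gawk's `and()`.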
......@@ -20,6 +20,7 @@ allowed_ref_dbs = {
'vfdb_core',
'vfdb_full',
'virulencefinder',
'ncbi',#added by schultzm
}
argannot_ref = '"ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes",\nGupta et al 2014, PMID: 24145532\n'
......@@ -461,7 +462,7 @@ class RefGenesGetter:
@classmethod
def _fix_virulencefinder_fasta_file(cls, infile, outfile):
'''Some line breaks are missing in the FASTA files from
viruslence finder. Which means there are lines like this:
virulence finder. Which means there are lines like this:
AAGATCCAATAACTGAAGATGTTGAACAAACAATTCATAATATTTATGGTCAATATGCTATTTTCGTTGA
AGGTGTTGCGCATTTACCTGGACATCTCTCTCCATTATTAAAAAAATTACTACTTAAATCTTTATAA>coa:1:BA000018.3
ATGAAAAAGCAAATAATTTCGCTAGGCGCATTAGCAGTTGCATCTAGCTTATTTACATGGGATAACAAAG
......@@ -541,5 +542,124 @@ class RefGenesGetter:
print('If you use this downloaded data, please cite:')
print('"Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli", Joensen al 2014, PMID: 24574290\n')
def _get_from_ncbi(self, outprefix, test=None): ## author github:schultzm
"""
Download the NCBI-curated Bacterial Antimicrobial Resistance Reference Gene Database.
Uses BioPython to do the data collection and extraction.
Author: schultzm (github) May 31, 2019.
>>> from Bio import Entrez
>>> import getpass
>>> import socket
>>> BIOPROJECT = "PRJNA313047"
>>> RETMAX = 100
>>> Entrez.email = getpass.getuser()+'@'+socket.getfqdn()
>>> search_results = Entrez.read(Entrez.esearch(db="nucleotide",
... term=BIOPROJECT,
... retmax=RETMAX,
... usehistory="y",
... idtype="acc"))
>>> webenv = search_results["WebEnv"]
>>> query_key = search_results["QueryKey"]
>>> target_accn = "NG_061627.1"
>>> records = Entrez.efetch(db="nucleotide",
... rettype="gbwithparts",
... retmode="text",
... retstart=0,
... retmax=RETMAX,
... webenv=webenv,
... query_key=query_key,
... idtype="acc")
>>> from Bio.Alphabet import generic_dna
>>> from Bio import SeqIO
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> for gb_record in SeqIO.parse(records, "genbank"):
... if gb_record.id == 'NG_061627.1':
... gb_record
SeqRecord(seq=Seq('TAATCCTTGGAAACCTTAGAAATTGATGGAGGATCTTAACAAGATCCTGACATA...GGC', IUPACAmbiguousDNA()), id='NG_061627.1', name='NG_061627', description='Klebsiella pneumoniae SCKLB88 mcr-8 gene for phosphoethanolamine--lipid A transferase MCR-8.2, complete CDS', dbxrefs=['BioProject:PRJNA313047'])
"""
outprefix = os.path.abspath(outprefix)
final_fasta = outprefix + '.fa'
final_tsv = outprefix + '.tsv'
# Download the database as genbank using Bio.Entrez
from Bio import Entrez
import getpass
import socket
import sys
BIOPROJECT = "PRJNA313047" ## https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA313047
RETMAX=100000000
Entrez.email = getpass.getuser()+'@'+socket.getfqdn()
# See section 9.15 Using the history and WebEnv in
# http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:entrez-webenv
search_results = Entrez.read(Entrez.esearch(db="nucleotide",
term=BIOPROJECT,
retmax=RETMAX,
usehistory="y",
idtype="acc"))
acc_list = search_results["IdList"]
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
if test:
return acc_list
#up to here
if len(acc_list) > 0:
print(f"E-fetching {len(acc_list)} genbank records from BioProject {BIOPROJECT} and writing to {final_fasta} and {final_tsv}. This may take a while.", file=sys.stderr)
records = Entrez.efetch(db="nucleotide",
rettype="gbwithparts", retmode="text",
retstart=0, retmax=RETMAX,
webenv=webenv, query_key=query_key,
idtype="acc")
#pull out the records as fasta from the genbank
from Bio.Alphabet import generic_dna
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
print("Parsing genbank records.")
with open(final_fasta, "w") as f_out_fa, \
open(final_tsv, "w") as f_out_tsv:
for idx, gb_record in enumerate(SeqIO.parse(records, "genbank")):
print(f"'{gb_record.id}'")
n=0
record_new=[]
for index, feature in enumerate(gb_record.features):
if feature.type == 'CDS':
n+=1
gb_feature = gb_record.features[index]
id = None
try:
id = gb_feature.qualifiers["allele"]
except KeyError:
try:
try:
id = gb_feature.qualifiers["gene"]
except KeyError:
id = gb_feature.qualifiers["locus_tag"]
except KeyError:
print("gb_feature.qualifier not found", file=sys.stderr)
accession = gb_record.id
seq_out = Seq(str(gb_feature.extract(gb_record.seq)), generic_dna)
record_new.append(SeqRecord(seq_out,
id=f"{id[0]}.{accession}",
description=""))
if len(record_new) == 1:
print(f"Processing record {idx+1} of {len(acc_list)} (accession {accession})", file=sys.stderr)
f_out_fa.write(f"{record_new[0].format('fasta').rstrip()}\n")
f_out_tsv.write(f"{id[0]}.{accession}\t1\t0\t.\t.\t{gb_feature.qualifiers['product'][0]}\n")
if idx == len(acc_list)-1:
print('Finished. Final files are:', final_fasta, final_tsv, sep='\n\t', end='\n\n')
print('You can use them with ARIBA like this:')
print('ariba prepareref -f', final_fasta, '-m', final_tsv, 'output_directory\n')
else:
print("Nothing to do. Exiting.")
def run(self, outprefix):
exec('self._get_from_' + self.ref_db + '(outprefix)')
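The `run` method builds a method name from `self.ref_db` and executes it with `exec`. A common alternative for this kind of name-based dispatch is `getattr`, which looks the method up directly instead of evaluating a string as code. The sketch below shows the pattern on a hypothetical `Getter` class with placeholder methods; it is not ARIBA's implementation.

```python
class Getter:
    """Hypothetical, stripped-down stand-in for a reference downloader class."""

    allowed_ref_dbs = {"card", "ncbi"}

    def __init__(self, ref_db):
        if ref_db not in self.allowed_ref_dbs:
            raise ValueError("Unknown reference database: " + ref_db)
        self.ref_db = ref_db

    def _get_from_card(self, outprefix):
        # Placeholder body; a real method would download and write files.
        return "card -> " + outprefix

    def _get_from_ncbi(self, outprefix):
        # Placeholder body; a real method would download and write files.
        return "ncbi -> " + outprefix

    def run(self, outprefix):
        # Look up the bound method by name and call it with the argument.
        method = getattr(self, "_get_from_" + self.ref_db)
        return method(outprefix)


print(Getter("ncbi").run("out.ncbi"))
```

With this approach, supporting a new database still only requires adding its name to the allowed set and defining the matching `_get_from_<name>` method.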
......@@ -19,6 +19,7 @@ class RefPreparer:
genetic_code=11,
cdhit_min_id=0.9,
cdhit_min_length=0.0,
cdhit_max_memory=None,
run_cdhit=True,
clusters_file=None,
threads=1,
......@@ -40,6 +41,7 @@ class RefPreparer:
self.genetic_code = genetic_code
self.cdhit_min_id = cdhit_min_id
self.cdhit_min_length = cdhit_min_length
self.cdhit_max_memory = cdhit_max_memory
self.run_cdhit = run_cdhit
self.clusters_file = clusters_file
self.threads = threads
......@@ -193,6 +195,7 @@ class RefPreparer:
seq_identity_threshold=self.cdhit_min_id,
threads=self.threads,
length_diff_cutoff=self.cdhit_min_length,
memory_limit=self.cdhit_max_memory,
nocluster=not self.run_cdhit,
verbose=self.verbose,
clusters_file=self.clusters_file,
......@@ -214,4 +217,4 @@ class RefPreparer:
print(' grep REMOVE', os.path.join(outdir, '01.filter.check_genes.log'), file=sys.stderr)
if number_of_bad_variants_logged > 0:
print('WARNING. Problem with at least one variant. Problem variants are rmoved. Please see the file', os.path.join(outdir, '01.filter.check_metadata.log'), 'for details.', file=sys.stderr)
print('WARNING. Problem with at least one variant. Problem variants are removed. Please see the file', os.path.join(outdir, '01.filter.check_metadata.log'), 'for details.', file=sys.stderr)
......@@ -434,7 +434,7 @@ class ReferenceData:
pyfastaq.utils.close(f_out)
def cluster_with_cdhit(self, outprefix, seq_identity_threshold=0.9, threads=1, length_diff_cutoff=0.0, nocluster=False, verbose=False, clusters_file=None):
def cluster_with_cdhit(self, outprefix, seq_identity_threshold=0.9, threads=1, length_diff_cutoff=0.0, memory_limit=None, nocluster=False, verbose=False, clusters_file=None):
clusters = {}
ReferenceData._write_sequences_to_files(self.sequences, self.metadata, outprefix)
ref_types = ('noncoding', 'noncoding.varonly', 'gene', 'gene.varonly')
......@@ -454,6 +454,7 @@ class ReferenceData:
seq_identity_threshold=seq_identity_threshold,
threads=threads,
length_diff_cutoff=length_diff_cutoff,
memory_limit=memory_limit,
verbose=verbose,
min_cluster_number = min_cluster_number,
)
......
......@@ -21,6 +21,7 @@ def run(options):
genetic_code=options.genetic_code,
cdhit_min_id=options.cdhit_min_id,
cdhit_min_length=options.cdhit_min_length,
cdhit_max_memory=options.cdhit_max_memory,
run_cdhit=not options.no_cdhit,
clusters_file=options.cdhit_clusters,
threads=options.threads,
......
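The new `options.cdhit_max_memory` attribute implies a matching command-line option. A minimal `argparse` sketch of how such an option could be declared is shown below. The flag spelling `--cdhit_max_memory`, the help text and the `non_negative_int` validator are assumptions made for illustration; the real option definition is not part of this diff.

```python
import argparse


def non_negative_int(text):
    """argparse type: accept integers >= 0, reject everything else."""
    value = int(text)
    if value < 0:
        raise argparse.ArgumentTypeError("must be >= 0, got " + text)
    return value


parser = argparse.ArgumentParser(prog="prepareref-sketch")
parser.add_argument(
    "--cdhit_max_memory",
    type=non_negative_int,
    default=None,          # None: do not pass -M to cd-hit at all
    metavar="INT",
    help="Memory limit passed to cd-hit-est via -M (assumed option name)",
)

options = parser.parse_args(["--cdhit_max_memory", "900"])
print(options.cdhit_max_memory)
```

Validating at parse time reports a bad value before any work starts, while the check in `Runner.__init__` still guards callers that bypass the command line.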
import unittest
import os
import re
from ariba import cdhit, external_progs
modules_dir = os.path.dirname(os.path.abspath(cdhit.__file__))
data_dir = os.path.join(modules_dir, 'tests', 'data')
extern_progs = external_progs.ExternalProgs()
......@@ -13,6 +15,13 @@ class TestCdhit(unittest.TestCase):
cdhit.Runner('oopsnotafile', 'out')
def test_init_fail_invalid_memory(self):
'''test_init_fail_invalid_memory'''
infile = os.path.join(data_dir, 'cdhit_test_run.in.fa')
with self.assertRaises(cdhit.Error):
cdhit.Runner(infile, memory_limit=-10)
def test_get_clusters_from_bak_file(self):
'''test _get_clusters_from_bak_file'''
infile = os.path.join(data_dir, 'cdhit_test_get_clusters_from_bak_file.in')
......@@ -162,3 +171,30 @@ class TestCdhit(unittest.TestCase):
'1': {'seq3'},
}
self.assertEqual(clusters, expected_clusters)
def test_get_run_cmd_with_default_memory(self):
'''test_get_run_cmd_with_default_memory'''
fa_infile = os.path.join(data_dir, 'cdhit_test_run_get_clusters_from_dict_rename.in.fa')
r = cdhit.Runner(fa_infile)
run_cmd = r.get_run_cmd('foo/bar/file.out')
match = re.search('^.+ -o foo/bar/file.out -c 0.9 -T 1 -s 0.0 -d 0 -bak 1$', run_cmd)
self.assertIsNotNone(match, msg="Command output was " + run_cmd)
def test_get_run_cmd_with_non_default_memory(self):
'''test_get_run_cmd_with_non_default_memory'''
fa_infile = os.path.join(data_dir, 'cdhit_test_run_get_clusters_from_dict_rename.in.fa')
r = cdhit.Runner(fa_infile, memory_limit=900)
run_cmd = r.get_run_cmd('foo/bar/file.out')
match = re.search('^.+ -o foo/bar/file.out -c 0.9 -T 1 -s 0.0 -d 0 -bak 1 -M 900$', run_cmd)
self.assertIsNotNone(match, msg="Command output was " + run_cmd)
def test_get_run_cmd_with_unlimited_memory(self):
'''test_get_run_cmd_with_unlimited_memory'''
fa_infile = os.path.join(data_dir, 'cdhit_test_run_get_clusters_from_dict_rename.in.fa')
r = cdhit.Runner(fa_infile, memory_limit=0)
run_cmd = r.get_run_cmd('foo/bar/file.out')
match = re.search('^.+ -o foo/bar/file.out -c 0.9 -T 1 -s 0.0 -d 0 -bak 1 -M 0$', run_cmd)
self.assertIsNotNone(match, msg="Command output was " + run_cmd)
#!/usr/bin/env python3
import unittest
import os
from ariba.ref_genes_getter import RefGenesGetter
class TestNcbiGetter(unittest.TestCase):
def setUp(self):
self.ncbi_db = RefGenesGetter('ncbi')._get_from_ncbi('ncbi.test', 'test')
# self.ncbi_db = RefGenesGetter.run('ncbi')
def test_ncbi(self):
'''
Test that more than 4000 records have been found on NCBI AMR DB.
'''
self.assertTrue(len(self.ncbi_db) > 4000)
if __name__ == '__main__':
unittest.main()
\ No newline at end of file
ariba (2.14.2+ds-1) unstable; urgency=medium
* New upstream release.
* Make sure all testdata is installed before running after-build tests.
* Disable tests attempting Internet access.
-- Sascha Steinbiss <satta@debian.org> Mon, 08 Jul 2019 18:06:19 +0200
ariba (2.13.3+ds-1) unstable; urgency=medium
* New upstream release.
......
--- a/setup.py
+++ b/setup.py
@@ -57,7 +57,42 @@
version='2.14.2',
description='ARIBA: Antibiotic Resistance Identification By Assembly',
packages = find_packages(),
- package_data={'ariba': ['test_run_data/*', 'tb_data/*']},
+ package_data={'ariba': ['test_run_data/*',
+ 'tb_data/*',
+ 'tests/data/*',
+ 'tests/data/refdata_query_prepareref/*',
+ 'tests/data/cluster_test_full_run_ok_non_coding/*',
+ 'tests/data/cluster_test_full_run_ref_not_in_cluster/*',
+ 'tests/data/cluster_full_run_smtls_snp_varonly_gene_no_snp/*',
+ 'tests/data/ref_preparer_test_run.out/*',
+ 'tests/data/clusters_run_with_tb.ref/*',
+ 'tests/data/cluster_full_run_smtls_snp_varonly_gene/*',
+ 'tests/data/cluster_test_full_run_delete_codon/*',
+ 'tests/data/cluster_full_run_known_smtls_snp_presabs_nonc/*',
+ 'tests/data/cluster_test_full_run_assembly_fail/*',
+ 'tests/data/cluster_test_full_run_ok_gene_start_mismatch/*',
+ 'tests/data/cluster_test_full_run_ok_presence_absence/*',
+ 'tests/data/cluster_test_full_run_multiple_vars/*',
+ 'tests/data/cluster_full_run_known_smtls_snp_presabs_gene/*',
+ 'tests/data/cluster_test_full_run_partial_asmbly/*',
+ 'tests/data/pubmlst_ref_prepare.test_load_fa_and_clusters.in/*',
+ 'tests/data/cluster_test_init_no_refs_fa/*',
+ 'tests/data/cluster_full_run_smtls_snp_varonly_gene_2/*',
+ 'tests/data/cluster_test_full_run_choose_ref_fail/*',
+ 'tests/data/ref_preparer_test_run_all_noncoding.out/*',
+ 'tests/data/megares_zip_parser_write_files/*',
+ 'tests/data/cluster_full_run_smtls_snp_presabs_nonc/*',
+ 'tests/data/cluster_test_full_run_no_reads_after_filtering/*',
+ 'tests/data/cluster_full_run_smtls_snp_varonly_nonc_no_snp/*',
+ 'tests/data/cluster_test_init_no_reads_2/*',
+ 'tests/data/clusters_load_ref_data_from_dir/*',
+ 'tests/data/cluster_full_run_smtls_known_snp_presabs_nonc/*',
+ 'tests/data/cluster_test_full_run_smtls_snp_varonly_nonc/*',
+ 'tests/data/cluster_test_full_run_ok_variants_only/*',
+ 'tests/data/cluster_full_run_smtls_snp_varonly_nonc/*',
+ 'tests/data/cluster_test_full_run_insert_codon/*',
+ 'tests/data/cluster_test_init_no_reads_1/*',
+ 'tests/data/cluster_full_run_smtls_snp_presabs_gene/*']},
author='Martin Hunt',
author_email='path-help@sanger.ac.uk',
url='https://github.com/sanger-pathogens/ariba',
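The patch enumerates every test-data directory by hand. One way such a list could be generated automatically is sketched below. `package_data_globs` is a hypothetical helper, and the demonstration runs against a temporary directory rather than the real source tree, so treat it as an illustration and not as what the Debian packaging does.

```python
import os
import tempfile


def package_data_globs(package_dir, data_root):
    """Return 'sub/dir/*' patterns for data_root and every directory below it.

    Patterns are relative to package_dir and use forward slashes.
    """
    patterns = []
    top = os.path.join(package_dir, data_root)
    for dirpath, dirnames, _filenames in os.walk(top):
        dirnames.sort()  # make the traversal order deterministic
        rel = os.path.relpath(dirpath, package_dir).replace(os.sep, "/")
        patterns.append(rel + "/*")
    return patterns


# Demonstration on a throwaway directory tree with two data subdirectories.
with tempfile.TemporaryDirectory() as pkg:
    os.makedirs(os.path.join(pkg, "tests", "data", "cluster_a"))
    os.makedirs(os.path.join(pkg, "tests", "data", "cluster_b"))
    demo_patterns = package_data_globs(pkg, os.path.join("tests", "data"))

print(demo_patterns)
```

A generated list stays in step with the test data as directories are added or removed, at the cost of making the packaged file set less explicit in `setup.py`.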