language: python
python:
- "3.5"
- "3.6"
before_install:
- sudo apt-get -qq update
- sudo apt-get install -y ncbi-blast+ prodigal
- wget http://github.com/bbuchfink/diamond/releases/download/v0.8.36/diamond-linux64.tar.gz
- tar xzf diamond-linux64.tar.gz
- sudo mv diamond /usr/bin
install:
- pip install -r requirements.txt
- pip install .
script:
- ./test.sh
# base image (stripped down ubuntu for Docker)
FROM phusion/baseimage
# metadata
LABEL base.image="ubuntu"
LABEL version="1"
LABEL software="RGI"
LABEL software.version="4.0.3"
LABEL description="Tool to identify resistance genes using the CARD database"
LABEL website="https://card.mcmaster.ca/"
LABEL documentation="https://github.com/arpcard/rgi/blob/master/README.rst"
LABEL license="https://github.com/arpcard/rgi/blob/master/LICENSE"
LABEL tags="Genomics"
# maintainer
MAINTAINER Finlay Maguire <finlaymaguire@gmail.com>
# install system dependencies
RUN \
apt-get update && \
apt-get install -y git python3 python3-dev python3-pip ncbi-blast+ prodigal wget && \
wget http://github.com/bbuchfink/diamond/releases/download/v0.8.36/diamond-linux64.tar.gz && \
tar xvf diamond-linux64.tar.gz && \
mv diamond /usr/bin
# install and test rgi
RUN git clone https://github.com/arpcard/rgi
WORKDIR rgi/
RUN pip3 install -r requirements.txt && \
pip3 install . && \
bash test.sh
# sort the database path issue
RUN pwd
WORKDIR /usr
RUN cp /rgi/card_data/card.json /usr/local/lib/python3.5/dist-packages/app/_data/card.json
# move to workdir
WORKDIR /data/
# set rgi executable as cmd to allow overriding
CMD ["rgi"]
1. About the CARD, ARO, and RGI
The Comprehensive Antibiotic Resistance Database ("CARD") provides data, models, and
algorithms relating to the molecular basis of antimicrobial resistance. The CARD provides
curated reference sequences and SNPs organized via the Antibiotic Resistance Ontology
("ARO"). These data can be browsed on the website or downloaded in a number of formats.
These data are additionally associated with detection models, in the form of curated
homology cut-offs and SNP maps, for prediction of resistome from molecular sequences.
These models can be downloaded or can be used for analysis of genome sequences using the
Resistance Gene Identifier ("RGI"), either online or as a stand-alone tool.
The CARD was designed and developed by the laboratories of Drs. Gerry Wright and Andrew G.
McArthur of McMaster University's Department of Biochemistry & Biomedical Sciences
(Hamilton, Ontario, Canada) with the help of a global team of collaborators. It is built
entirely using open source software and tools. This research has been supported by funds
from the Canadian Foundation for Innovation, Canadian Institutes of Health Research,
Natural Sciences and Engineering Research Council of Canada, Medical Research Council
(UK), and Ontario Research Fund, as well as a Cisco Research Chair in Bioinformatics
supported by Cisco Systems Canada, Inc. (Dr. McArthur), Canada Research Chair (Dr.
Wright), and Killam Research Fellowship (Dr. Wright).
2. Trade Marks and Official Marks
All trade marks used or displayed on the CARD (card.mcmaster.ca) website are registered or
unregistered trade marks or official marks of McMaster University. Nothing on the website
should be interpreted to grant any license to use any trade mark or official mark without
the express written permission of McMaster University.
3. Copyright and Disclaimer
All of the materials on the website, including but not limited to the CARD, the RGI, and
the underlying HTML, text, illustration, designs, icons, audio clips, video clips,
documents, products, software and all other content (the "Materials") are protected by
copyright and are either owned by McMaster University, provided by the creator for free
use by the scientific community, or McMaster University has obtained a license to use the
Materials for their website. Except as noted in the Terms of Use below, specifically
sections 4 and 5 below, none of the Materials may be copied, reproduced, distributed,
republished, downloaded, displayed, posted or transmitted in any form or by any means,
without the prior written consent of McMaster University, or the original creator, where
applicable.
The only exception is the distribution of the Resistance Gene Identifier (RGI) software on
the Conda open source, cross-platform, language-agnostic package manager and environment
management system (https://conda.io/). Conda users are still required to abide by the Terms
of Use listed, specifically sections 4 and 5 below.
To obtain permission to reproduce material on the CARD website, please send a request to:
McMaster University
c/o McMaster Industry Liaison Office
305 – 175 Longwood Rd South
Hamilton, Ontario L8P 0A1
Attention: Copyright Advisor
E: milodsk@mcmaster.ca
4. Terms of Use for Non-Commercial, Research or Academic Reproduction
All information and the Materials at CARD have been posted to provide the public with
direct access to information about antibiotic resistance. The information and the
Materials have been posted with the intent that they be readily available for personal and
public non-commercial, research or academic use by individuals at academic institutions
only and, except where otherwise prohibited, may be reproduced, in part or in whole and by
any means, without charge or further permission from McMaster University. Any other
organization must seek a license as described in section 5 below.
We only ask that:
4.1 The Materials not be modified and be used "as is"
4.2 Users exercise due diligence in ensuring the accuracy of the Materials
4.3 McMaster University be identified as the source of the Materials
4.4 The reproduction is not represented as an official version of the Materials
reproduced, nor as having been made in affiliation with or with the endorsement of
McMaster University
5. Terms of Use for Commercial or Non-Academic Organizations
Use or reproduction of Materials on the site, in whole or in part, by any non-academic
organization whether or not for non-commercial (including research) or commercial purposes
is prohibited, except with written permission of McMaster University. Commercial uses of
the Materials on the site are offered only pursuant to a written license and user fee. To
obtain permission and begin the licensing process, please contact McMaster University
using the address in section 3.
6. Disclaimer
The Materials on this site are provided "as is" without warranties or conditions of any
kind either expressed or implied. To the fullest extent possible under applicable law,
McMaster University disclaims all warranties and conditions, expressed or implied,
including but not limited to, implied warranties or conditions of merchantability and
fitness for a particular purpose, non-infringement or other violation of rights. McMaster
University does not warrant or make any other representations regarding the use, accuracy,
timeliness, applicability, performance, security, availability or reliability of this
website or any of the sites linked to this website, or the results from the use of this
website or any sites linked to this website, or otherwise respecting the Materials on this
website or any sites linked to this website.
7. Limitation of Liability
Under no circumstances, including but not limited to negligence, shall McMaster University
be liable for any direct, indirect, special, punitive, incidental or consequential damages
arising out of the use of, or the inability to use, the website or the Materials.
Resistance Gene Identifier (RGI)
--------------------------------------------
This application is used to predict resistome(s) from protein or nucleotide data based on homology and SNP models. The application uses data from the `CARD database <https://card.mcmaster.ca/>`_.
.. image:: https://travis-ci.org/arpcard/rgi.svg?branch=master
:target: https://travis-ci.org/arpcard/rgi
Table of Contents
-------------------------------------
- `License`_
- `Requirements`_
- `Install dependencies`_
- `Install RGI from project root`_
- `Running RGI tests`_
- `Help menu`_
- `Usage`_
- `Load card.json`_
- `Check database version`_
- `Run RGI`_
- `Run RGI using GNU parallel`_
- `Running RGI with short contigs to predict partial genes`_
- `Clean previous or old databases`_
- `RGI Heatmap`_
- `Run RGI from docker`_
- `Tab-delimited results file`_
- `Support & Bug Reports`_
License
--------
Use or reproduction of these materials, in whole or in part, by any non-academic organization whether or not for non-commercial (including research) or commercial purposes is prohibited, except with written permission of McMaster University. Commercial uses are offered only pursuant to a written license and user fee. To obtain permission and begin the licensing process, see `CARD website <https://card.mcmaster.ca/about>`_
Requirements
--------------------
- `Prodigal v2.6.3 <https://github.com/hyattpd/prodigal/wiki/Installation>`_
- `NCBI BLAST v2.6.0+ <https://blast.ncbi.nlm.nih.gov/Blast.cgi>`_
- `DIAMOND v0.8.36 <https://ab.inf.uni-tuebingen.de/software/diamond>`_
- `Python 3.6 <https://www.python.org/>`_
Install dependencies
--------------------
- pip3 install biopython
- pip3 install filetype
- pip3 install pandas
- pip3 install pytest
- pip3 install mock
Install RGI from project root
-----------------------------
.. code-block:: sh
pip3 install .
or
.. code-block:: sh
python3 setup.py build
python3 setup.py test
python3 setup.py install
Running RGI tests
-------------------
.. code-block:: sh
cd tests
pytest -v -rxs
Help menu
-------------------
.. code-block:: sh
rgi --help
Usage
-------------------
.. code-block:: sh
usage: rgi <command> [<args>]
commands are:
main Runs rgi application
tab Creates a tab-delimited file from rgi results
parser Creates categorical JSON files for RGI wheel visualization
load Loads CARD database JSON file
clean Removes BLAST databases and temporary files
galaxy Galaxy project wrapper
database Information on installed card database
heatmap Heatmap for multiple analysis
---------------------------------------------------------------------------------------
bwt Metagenomics resistomes (Experimental)
card_annotation Create fasta files with annotations from card.json (Experimental)
wildcard_annotation Create fasta files with annotations from variants (Experimental)
baits_annotation Create fasta files with annotations from baits (Experimental)
remove_duplicates Removes duplicate sequences (Experimental)
kmer_build Build CARD*kmer database (Experimental)
kmer_query Query sequences through CARD*kmers (Experimental)
Resistance Gene Identifier - <version_number>
positional arguments:
command Subcommand to run
optional arguments:
-h, --help show this help message and exit
Use the Resistance Gene Identifier to predict resistome(s) from protein or
nucleotide data based on homology and SNP models. Check
https://card.mcmaster.ca/download for software and data updates. Receive email
notification of monthly CARD updates via the CARD Mailing List
(https://mailman.mcmaster.ca/mailman/listinfo/card-l)
Load card.json
-------------------
- local or working directory
.. code-block:: sh
rgi load --afile /path/to/card.json --local
- system wide
.. code-block:: sh
rgi load --afile /path/to/card.json
Check database version
-----------------------
- local or working directory
.. code-block:: sh
rgi database --version --local
- system wide
.. code-block:: sh
rgi database --version
Run RGI
----------------------
- local or working directory
.. code-block:: sh
rgi main --input_sequence /path/to/protein_input.fasta --output_file /path/to/output_file --input_type protein --local
- system wide
.. code-block:: sh
rgi main --input_sequence /path/to/nucleotide_input.fasta --output_file /path/to/output_file --input_type contig
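The invocations above can also be built programmatically when scripting many runs. A minimal sketch, assuming only the flags documented in this section; the helper name `rgi_main_cmd` is ours, not part of RGI:

```python
# Hypothetical helper (not part of RGI): assembles the `rgi main`
# argument list from the flags documented above.
def rgi_main_cmd(fasta, out, input_type="contig", local=False):
    cmd = [
        "rgi", "main",
        "--input_sequence", fasta,
        "--output_file", out,
        "--input_type", input_type,
    ]
    if local:
        cmd.append("--local")  # use the card.json loaded with `rgi load --local`
    return cmd

print(" ".join(rgi_main_cmd("protein_input.fasta", "out", "protein", local=True)))
```

The resulting list can be passed to `subprocess.run` without shell quoting concerns.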
Run RGI using GNU parallel
--------------------------------------------
- system wide and writing log files for each input file. (Note: add the code below to `script.sh`, then run with `./script.sh /path/to/input_files`)
.. code-block:: sh
#!/bin/bash
DIR=$(find . -mindepth 1 -type d)
for D in $DIR; do
NAME=$(basename "$D");
parallel --no-notice --progress -j+0 'rgi main -i {} -o {.} -n 16 -a diamond --clean --debug > {.}.log 2>&1' ::: "$NAME"/*.{fa,fasta};
done
Running RGI with short contigs to predict partial genes
--------------------------------------------------------
- local or working directory
.. code-block:: sh
rgi main --input_sequence /path/to/nucleotide_input.fasta --output_file /path/to/output_file --local --low_quality
- system wide
.. code-block:: sh
rgi main --input_sequence /path/to/nucleotide_input.fasta --output_file /path/to/output_file --low_quality
Clean previous or old databases
--------------------------------
- local or working directory
.. code-block:: sh
rgi clean --local
- system wide
.. code-block:: sh
rgi clean
RGI Heatmap
------------
- Default Heatmap
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/
- Heatmap with `AMR Gene Family` categorization
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --category gene_family
- Heatmap with `AMR Gene Family` categorization and fill display
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --category gene_family --display fill
- Heatmap with `AMR Gene Family` categorization and coloured y-axis labels display
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --category gene_family --display text
- Heatmap with frequency display enabled
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --frequency
- Heatmap with drug class category and frequency enabled
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --category drug_class --frequency --display text
- Heatmap with samples and genes clustered
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --cluster both
- Heatmap with resistance mechanism categorization and clustered samples
.. code-block:: sh
rgi heatmap --input /path/to/rgi_results_json_files_directory/ --cluster samples --category resistance_mechanism --display fill
Run RGI from docker
-------------------
- First, you must either pull the docker container from Docker Hub (the latest CARD version is installed automatically)
.. code-block:: sh
docker pull finlaymaguire/rgi
- Or, alternatively, build it locally from the Dockerfile (the latest CARD version is installed automatically)
.. code-block:: sh
git clone https://github.com/arpcard/rgi
docker build -t arpcard/rgi rgi
- Then you can either run interactively (mounting a local directory called `rgi_data` in your current directory
to `/data/` within the container)
.. code-block:: sh
docker run -i -v $PWD/rgi_data:/data -t arpcard/rgi bash
- Or you can run the container directly as an executable, with `$RGI_ARGS` being any of the commands described above. Remember that paths to input and output files are relative to the container (i.e. `/data/` if mounted as above).
.. code-block:: sh
docker run -v $PWD/rgi_data:/data arpcard/rgi $RGI_ARGS
Tab-delimited results file
---------------------------
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| ORF_ID | Open Reading Frame identifier (internal to RGI)|
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Contig | Source Sequence |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Start | Start co-ordinate of ORF |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Stop | End co-ordinate of ORF |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Orientation | Strand of ORF |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Cut_Off | RGI Detection Paradigm |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Pass_Bitscore | STRICT detection model bitscore value cut-off |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Best_Hit_Bitscore | Bitscore value of match to top hit in CARD |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Best_Hit_ARO | ARO term of top hit in CARD |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Best_Identities | Percent identity of match to top hit in CARD |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| ARO | ARO accession of top hit in CARD |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Model_type | CARD detection model type |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| | Mutations observed in the ARO term of top hit |
| SNPs_in_Best_Hit_ARO | in CARD (if applicable) |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| | Mutations observed in ARO terms of other hits |
| Other_SNPs | indicated by model id (if applicable) |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Drug Class | ARO Categorization |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Resistance Mechanism | ARO Categorization |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| AMR Gene Family | ARO Categorization |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Predicted_DNA | ORF predicted nucleotide sequence |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Predicted_Protein | ORF predicted protein sequence |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| CARD_Protein_Sequence | Protein sequence of top hit in CARD |
+----------------------------------------------------------+------------------------------------------------+
| :: | Calculated as percentage |
| | (length of ORF protein / |
| Percentage Length of Reference Sequence | length of CARD reference protein) |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| ID | HSP identifier (internal to RGI) |
+----------------------------------------------------------+------------------------------------------------+
| :: | |
| Model_id | CARD detection model id |
+----------------------------------------------------------+------------------------------------------------+
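The tab-delimited report is plain TSV and can be post-processed with ordinary tooling. A minimal sketch using Python's `csv` module; the two data rows below are invented for illustration, only the column names come from the table above:

```python
import csv
import io

# Illustrative two-row sample of the tab-delimited report (not real output);
# column names match the table above.
tsv = (
    "ORF_ID\tBest_Hit_ARO\tCut_Off\tBest_Identities\n"
    "orf_1\tSHV-12\tStrict\t99.3\n"
    "orf_2\tNDM-1\tPerfect\t100.0\n"
)

# Keep only hits detected under the Perfect or Strict paradigm.
strict_or_better = [
    row["Best_Hit_ARO"]
    for row in csv.DictReader(io.StringIO(tsv), delimiter="\t")
    if row["Cut_Off"] in ("Perfect", "Strict")
]
print(strict_or_better)  # ['SHV-12', 'NDM-1']
```

For a real run, replace the string with `open("your_output.txt")`.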
Support & Bug Reports
----------------------
Please log an issue on `GitHub <https://github.com/arpcard/rgi/issues>`_.
You can also email the CARD curators or developers directly at `card@mcmaster.ca <mailto:card@mcmaster.ca>`_, or reach us via Twitter at `@arpcard <http://www.twitter.com/arpcard>`_.
protein*
!__init__.py
|--------------------------------------------------------------------------
| Resistance Gene Identifier (RGI) Documentation
|--------------------------------------------------------------------------
Before you run the RGI scripts, make sure you have installed the needed external tools:
|--------------------------------------------------------------------------
| Install open reading frame callers : Prodigal, http://prodigal.ornl.gov/, https://github.com/hyattpd/prodigal/wiki/Installation
|--------------------------------------------------------------------------
# Install Prodigal - conda should be installed
$ conda install --channel bioconda prodigal
# Mac OS X
$ brew tap homebrew/science
$ brew install homebrew/science/prodigal
# Linux Redhat / Centos
$ sudo yum groupinstall 'Development Tools' && sudo yum install curl git irb python-setuptools ruby
$ git clone https://github.com/Linuxbrew/brew.git ~/.linuxbrew
$ export PATH="$HOME/.linuxbrew/bin:$PATH"
$ export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
$ export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"
# Linux Debian / Ubuntu
$ brew tap homebrew/science
$ brew install homebrew/science/prodigal
|--------------------------------------------------------------------------
| Install alignment software : BLAST and DIAMOND
|--------------------------------------------------------------------------
|--------------------------------------------------------------------------
| BLAST ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
|--------------------------------------------------------------------------
- Tested with BLAST 2.2.28, BLAST 2.2.31+ and 2.5.0+ on 64-bit Linux and Mac OS X
* You can also run the following command to install BLAST (note this installs only version 2.2.28):
$ sudo apt-get install ncbi-blast+
- Test the BLAST install with the following command:
$ makeblastdb
* Biopython http://biopython.org/DIST/docs/install/Installation.html#sec12
* Run the following command to install Biopython
$ sudo apt-get install python-biopython
* Download the database - card.json from Downloads on the CARD website (a copy may be included with this release)
|--------------------------------------------------------------------------
| DIAMOND
| - https://ab.inf.uni-tuebingen.de/software/diamond
| - https://github.com/bbuchfink/diamond
| - https://github.com/bbuchfink/diamond/releases
|--------------------------------------------------------------------------
* Install diamond using conda packages
$ conda install --channel bioconda diamond
|--------------------------------------------------------------------------
| Running RGI:
|--------------------------------------------------------------------------
Open a terminal, type:
$ python rgi.py -h
Check software version:
$ python rgi.py -sv or python rgi.py --software_version
Check data version:
$ python rgi.py -dv or python rgi.py --data_version
|--------------------------------------------------------------------------
| RGI inputs
|--------------------------------------------------------------------------
$ python rgi.py -h
usage: rgi.py [-h] [-t INTYPE] [-i INPUTSEQ] [-n THREADS] [-o OUTPUT]
[-e CRITERIA] [-c CLEAN] [-d DATA] [-l VERBOSE] [-a ALIGNER]
[-r DATABASE] [-sv] [-dv]
Resistance Gene Identifier - Version 3.1.2
optional arguments:
-h, --help show this help message and exit
-t INTYPE, --input_type INTYPE
must be one of contig, orf, protein, read (default:
contig)
-i INPUTSEQ, --input_sequence INPUTSEQ
input file must be in either FASTA (contig and
protein), FASTQ (read) or gzip format, e.g.
myFile.fasta, myFasta.fasta.gz
-n THREADS, --num_threads THREADS
Number of threads (CPUs) to use in the BLAST search
(default=32)
-o OUTPUT, --output_file OUTPUT
Output JSON file (default=Report)
-e CRITERIA, --loose_criteria CRITERIA
The options are YES to include loose hits and NO to
exclude loose hits. (default=NO to exclude loose hits)
-c CLEAN, --clean CLEAN
This removes temporary files in the results directory
after run. Options are NO or YES (default=YES for
remove)
-d DATA, --data DATA Specify a data-type, i.e. wgs, chromosome, plasmid,
etc. (default = NA)
-l VERBOSE, --verbose VERBOSE
log progress to file. Options are OFF or ON (default =
OFF for no logging)
-a ALIGNER, --alignment_tool ALIGNER
choose between BLAST and DIAMOND. Options are BLAST or
DIAMOND (default = BLAST)
-r DATABASE, --db DATABASE
specify path to CARD blast databases (default: None)
-sv, --software_version
Prints software version number
-dv, --data_version Prints data version number
INTYPE could be one of 'contig', 'protein' or 'read'.
1. 'contig' means that inputSequence is a DNA sequence stored in a FASTA file, presumably a complete genome or assembly contigs. RGI will predict ORFs de novo and predict the resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.
2. 'protein', as its name suggests, requires a FASTA file with protein sequences. As above, RGI predicts the resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.
3. 'read' expects raw FASTQ format nucleotide data and predicts resistome using a combination of BLASTX against the CARD data, curated cut-offs, and SNP screening. This is an experimental tool and we have yet to adjust the CARD cut-offs for BLASTX. We will be exploring other metagenomics or FASTQ screening methods. Note that RGI does not perform any pre-processing of the FASTQ data (linker trimming, etc).
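The input modes above differ only in preprocessing and alignment program. The following summary table is our own schematic of that dispatch, not RGI code; the 'orf' entry is an assumption, since that mode is not described in the text above:

```python
# Schematic of the INTYPE dispatch described above (illustrative only):
# (preprocessing step, alignment program) per input type. The 'orf'
# entry is an assumption; the others follow the descriptions above.
PIPELINES = {
    "contig":  ("predict ORFs de novo", "BLASTP vs CARD"),
    "orf":     ("translate ORFs", "BLASTP vs CARD"),
    "protein": (None, "BLASTP vs CARD"),
    "read":    ("none (raw FASTQ)", "BLASTX vs CARD"),
}

def pipeline_for(input_type):
    """Return the (preprocessing, alignment) pair for a given -t value."""
    if input_type not in PIPELINES:
        raise ValueError("input_type must be one of: " + ", ".join(sorted(PIPELINES)))
    return PIPELINES[input_type]

print(pipeline_for("read"))
```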
|--------------------------------------------------------------------------
| RGI outputs
|--------------------------------------------------------------------------
RGI will produce a detailed JSON file, a summary tab-delimited file, and a gff3 file (where applicable).
The JSON is as follows (example shows only one hit):
- gene_71|gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence: {
// Hit 1
"gnl|BL_ORD_ID|39|hsp_num:0": {
"SequenceFromBroadStreet": "MRYIRLCIISLLATLPLAVHASPQPLEQIKQSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAA",
"orf_start": 67822,
"ARO_name": "SHV-12",
"type_match": "Loose",
"query": "INDWRLDYNECRPHSSLNYLTPAEFAAGWRN",
"evalue": 3.82304,
"max-identities": 10,
"orf_strand": "-",
"bit-score": 24.6386,
"cvterm_id": "35914",
"sequenceFromDB": "LDRWETELNEALPGDARDTTTPASMAATLRK",
"match": "++ W + NE P + + TPA AA R ",
"model_id": "103",
"orf_From": "gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence",
"pass_evalue": 1e-100,
"query_end": 68607,
"ARO_category": {
"36696": {
"category_aro_name": "antibiotic inactivation enzyme",
"category_aro_cvterm_id": "36696",
"category_aro_accession": "3000557",
"category_aro_description": "Enzyme that catalyzes the inactivation of an antibiotic resulting in resistance. Inactivation includes chemical modification, destruction, etc."
},
"36268": {
"category_aro_name": "beta-lactam resistance gene",
"category_aro_cvterm_id": "36268",
"category_aro_accession": "3000129",
"category_aro_description": "Genes conferring resistance to beta-lactams."
}
},
"ARO_accession": "3001071",
"query_start": 68515,
"model_name": "SHV-12",
"model_type": "protein homolog model",
"orf_end": 68646
},
...
// Hit 2
...
// Hit 3
...
}
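Because the JSON nests hits under each query sequence, summarising it takes only a dictionary comprehension. A sketch, assuming the key names shown above (`ARO_name`, `bit-score`); the sample data is abbreviated and the second hit is invented:

```python
# Sketch: pick the best hit per ORF from the nested JSON shown above.
# Key names mirror the example output; the data itself is abbreviated
# and the "TEM-1" hit is invented for illustration.
results = {
    "gene_71": {
        "hsp_1": {"ARO_name": "SHV-12", "type_match": "Loose", "bit-score": 24.6},
        "hsp_2": {"ARO_name": "TEM-1", "type_match": "Strict", "bit-score": 410.2},
    }
}

# For each ORF, keep the ARO term of the highest-scoring hit.
best = {
    orf: max(hits.values(), key=lambda h: h["bit-score"])["ARO_name"]
    for orf, hits in results.items()
}
print(best)  # {'gene_71': 'TEM-1'}
```

For a real file, load `results` with `json.load(open("Report.json"))` first.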
|--------------------------------------------------------------------------
| Getting Tab Delimited output after running RGI:
|--------------------------------------------------------------------------
Run the following command to get help on generating the tab-delimited output:
$ python convertJsonToTSV.py -h
|--------------------------------------------------------------------------
| convertJsonToTSV inputs
|--------------------------------------------------------------------------
$ python convertJsonToTSV.py -h
usage: convertJsonToTSV.py [-h] [-i AFILE] [-o OUTPUT] [-v VERBOSE]
Convert RGI JSON file to Tab-delimited file
optional arguments:
-h, --help show this help message and exit
-i AFILE, --afile AFILE
must be a json file generated from RGI in JSON or gzip
format e.g out.json, out.json.gz
-o OUTPUT, --out_file OUTPUT
Output JSON file (default=dataSummary)
-v VERBOSE, --verbose VERBOSE
include help menu. Options are OFF or ON (default =
OFF for no help)
|--------------------------------------------------------------------------
| convertJsonToTSV outputs
|--------------------------------------------------------------------------
This outputs a tab-delimited text file: dataSummary.txt
The tab-output is as follows:
---------------------------------------------------------------------
COLUMN HELP_MESSAGE
---------------------------------------------------------------------
ORF_ID ------- Open Reading Frame identifier (internal to RGI)
CONTIG ------- Source Sequence
START ------- Start co-ordinate of ORF
STOP ------- End co-ordinate of ORF
ORIENTATION ------- Strand of ORF
CUT_OFF ------- RGI Detection Paradigm
PASS_EVALUE ------- STRICT detection model Expectation value cut-off
Best_Hit_evalue ------- Expectation value of match to top hit in CARD
Best_Hit_ARO ------- ARO term of top hit in CARD
Best_Identities ------- Percent identity of match to top hit in CARD
ARO ------- ARO accession of top hit in CARD
ARO_name ------- ARO term of top hit in CARD
Model_type ------- CARD detection model type
SNP ------- Observed mutation (if applicable)
ARO_category ------- ARO Categorization
bit_score ------- Bitscore of match to top hit in CARD
Predicted_Protein ------- ORF predicted protein sequence
CARD_Protein_Sequence ------- Protein sequence of top hit in CARD
LABEL ------- ORF label (internal to RGI)
ID ------- HSP identifier (internal to RGI)
|--------------------------------------------------------------------------
| Files Structure
|--------------------------------------------------------------------------
`-- rgi/
|-- _data/
|-- card.json
|-- _db/
... (BLAST DBs)
|-- _docs/
|-- INSTALL
|-- README
|-- _tmp/
|-- tests/
|-- __init__.py
|-- blastnsnp.py
|-- clean.py
|-- contigToORF.py
|-- contigToProteins.py
|-- convertJsonToTSV.py
|-- create_gff3_file.py
|-- filepaths.py
|-- formatJson.py
|-- fqToFsa.py
|-- load.py
|-- rgi.py
|-- rrna.py
|--------------------------------------------------------------------------
| Loading new card.json:
|--------------------------------------------------------------------------
* If a new card.json is available, replace card.json in the directory shown above, then use the following command:
$ python load.py -h
|--------------------------------------------------------------------------
| Load inputs
|--------------------------------------------------------------------------
$ python load.py -h
usage: load.py [-h] [-i AFILE]
Load card database json file
optional arguments:
-h, --help show this help message and exit
-i AFILE, --afile AFILE
must be a card database json file
|--------------------------------------------------------------------------
| Clean databases
|--------------------------------------------------------------------------
* Databases are created once rgi.py is run. Use clean.py to remove databases after a new card.json is loaded.
* Then run clean.py to clean the directory.
$ python clean.py -h
|--------------------------------------------------------------------------
| Clean inputs
|--------------------------------------------------------------------------
$ python clean.py -h
usage: clean.py [-h]
Removes BLAST databases created using card.json
optional arguments:
-h, --help show this help message and exit
|--------------------------------------------------------------------------
| Format JSON
|--------------------------------------------------------------------------
$ python formatJson.py -h
usage: formatJson.py [-h] [-i IN_FILE] [-o OUT_FILE]
Convert RGI JSON file to Readable JSON file
optional arguments:
-h, --help show this help message and exit
-i IN_FILE, --in_file IN_FILE
input file must be in JSON format e.g Report.json
-o OUT_FILE, --out_file OUT_FILE
Output JSON file (default=ReportFormatted)
|--------------------------------------------------------------------------
| Contact Us:
|--------------------------------------------------------------------------
For help please contact us at:
* CARD Developers <card@mcmaster.ca>
|--------------------------------------------------------------------------
| Running RGI locally
|--------------------------------------------------------------------------
|--------------------------------------------------------------------------
| Files
|--------------------------------------------------------------------------
`-- rgi/
|-- _data/
|-- card.json
|-- _db/
... (BLAST DBs)
|-- _docs/
|-- INSTALL
|-- README
|-- _tmp/
|-- tests/
|-- __init__.py
|-- blastnsnp.py
|-- clean.py
|-- contigToORF.py
|-- contigToProteins.py
|-- convertJsonToTSV.py
|-- create_gff3_file.py
|-- filepaths.py
|-- formatJson.py
|-- fqToFsa.py
|-- load.py
|-- rgi.py
|-- rrna.py
|--------------------------------------------------------------------------
| Commands for Running RGI locally
|--------------------------------------------------------------------------
$ python rgi.py -h
$ python clean.py -h
$ python convertJsonToTSV.py -h
$ python load.py -h
$ python formatJson.py -h
|--------------------------------------------------------------------------
| Running RGI system-wide
|--------------------------------------------------------------------------
# install RGI (specify the conda channel that provides the rgi package)
$ conda install --channel <channel> rgi
# un-install RGI
$ conda remove rgi
|--------------------------------------------------------------------------
| Commands for Running RGI system-wide
|--------------------------------------------------------------------------
$ rgi -h
$ rgi_clean -h
$ rgi_jsontab -h
$ rgi_load -h
$ rgi_jsonformat -h
from app.settings import *
class Analyser(object):
"""Class to find analyse fasta files."""
def __init__(self,input_file):
"""Creates Analyser object for analysing fasta files."""
self.input_file = input_file
logger.info(repr(self))
def __repr__(self):
"""Returns Analyser class full object."""
return "Analyser({}".format(self.__dict__)
def run(self):
checksums = set()
d = {}
# Using the Biopython fasta parse we can read our fasta input
for record in SeqIO.parse(self.input_file, "fasta"):
checksum = seguid(record.seq)
if checksum in checksums:
model_type_id = record.description.split("|")[1].split(":")[-1].strip()
if model_type_id in ["40292"]:
logger.warning("Duplicates {} -> {} ~ {} - [{}]".format(record.id, record.description.split("|")[-1], d[checksum].split("|")[-1], model_type_id))
continue
d[checksum] = record.description
checksums.add(checksum)
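Analyser.run detects duplicate reference sequences by indexing each record under a sequence checksum (Biopython's seguid) and flagging any record whose checksum has already been seen. The same idea can be sketched without Biopython using hashlib; this is a minimal standalone sketch, with plain (header, sequence) tuples standing in for SeqIO records.

```python
import hashlib

def find_duplicates(records):
    """Return (first_header, duplicate_header) pairs for records
    whose sequences share a checksum.

    records: iterable of (header, sequence) tuples.
    """
    seen = {}        # checksum -> first header seen with that sequence
    duplicates = []
    for header, seq in records:
        # case-insensitive, like a sequence checksum over normalized residues
        checksum = hashlib.sha1(seq.upper().encode()).hexdigest()
        if checksum in seen:
            duplicates.append((seen[checksum], header))
        else:
            seen[checksum] = header
    return duplicates
```

Checksumming keeps memory proportional to the number of distinct sequences rather than their total length, which is why the real code stores checksums instead of the sequences themselves.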
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
Base classes used throughout RGI
"""
import os
from Bio import SeqIO
import json
from abc import ABCMeta, abstractmethod
from app.settings import logger
from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from pyfaidx import Fasta
class RGIBase(object):
"""Interface for RGI"""
__metaclass__ = ABCMeta
@abstractmethod
def from_string(self): pass
@abstractmethod
def from_args(self): pass
@abstractmethod
def validate_inputs(self): pass
@abstractmethod
def run(self): pass
@abstractmethod
def create_databases(self): pass
@abstractmethod
def run_blast(self): pass
@abstractmethod
def filter_process(self): pass
@abstractmethod
def output(self): pass
class BaseModel(object):
def extract_nth_bar(self, bunch_str, n):
"""
Parse the nth "|" delimited result field from aligner output
and return the result information with the appropriate type
Parameters
----------
Args:
bunch_str (str): "|" delimited result information from aligner
n (int): "|" field index to extract
Return:
result (int, float, ascii byte-str): result with correct type
"""
# to get appropriate field set offset
start = n + 3
end = n + 4
# subset the string to find the relevant set of "|" delimited results
subset_split_str = bunch_str.split('|')[start: end]
temporary_str = "|".join(subset_split_str)
# rebuild the string and extract the hit information after colon
result = temporary_str[temporary_str.find(':')+2:]
result = result.rstrip()
# check if integer first
if result.isdigit():
return int(result)
# otherwise try to coerce to float
else:
try:
return float(result)
# if that doesn't work encode into ascii-bytestring
except ValueError:
# return result.encode("ascii", "replace")
return result
def extract_nth_hash(self, bunch_str, n):
"""
Parse and return the nth hash delimited field from
alignment output
Parameters
----------
Args:
bunch_str (str): '#' delimited string from aligner
n (int): '#' field index to extract
Return:
result (int, str): extracted value from output in appropriate type
"""
if "#" not in bunch_str:
return 0
else:
arr = bunch_str.split("#")
if n >= len(arr):
return ""
else:
# if first two positional information fields return as integers
if n == 1 or n == 2:
return int(arr[n])
# strandedness data field
elif n == 3:
# convert 1/0 strand indicator to +/- notation
if int(arr[n]) == 1:
return "+"
else:
return "-"
# return as string if not specific strandedness or positional
else:
return arr[n]
def find_num_dash(self, subject, index):
"""
Finds location of mutation by counting the
number of dashes in the aligner subject output
Parameters
----------
Args:
subject (str): aligner output string
index (int): max output size
Returns:
dash_count (int): position of the mutation/SNP
"""
dash_count = 0
output = []
for i in range(len(subject)):
if subject[i] == '-':
dash_count += 1
else:
output.append(subject[i])
if len(output) == index:
break
return dash_count
def get_submitted_protein_sequence(self, seq_filepath):
"""
Parses sequence fasta into a dictionary keyed with the sequence IDs
Parameters
----------
Args:
seq_filepath (str): sequence filepath
Returns:
submitted_proteins_dict (dict): dictionary of sequences of the
format {seq_id: sequence string}
"""
submitted_proteins_dict = {}
if os.stat(seq_filepath).st_size != 0:
for record in SeqIO.parse(seq_filepath, 'fasta'):
submitted_proteins_dict[record.id] = str(record.seq)
return submitted_proteins_dict
def get_orf_dna_sequence(self, input_file, input_type):
"""
Get the predicted open reading frame nucleotide sequence.
Args:
input_file (str): filepath of the input file
input_type (str): [contig, read]
"""
filename = os.path.basename(input_file)
predicted_genes_dict = {}
if input_type in ["contig"]:
contig_filepath = os.path.join(self.working_directory,
filename + ".temp.contigToORF.fsa")
if os.stat(contig_filepath).st_size != 0:
for record in SeqIO.parse(contig_filepath, 'fasta'):
predicted_genes_dict[record.id] = str(record.seq)
elif input_type in ["read"]:
read_filepath = os.path.join(self.working_directory,
filename + ".temp.read.fsa")
if os.stat(read_filepath).st_size != 0:
for record in SeqIO.parse(read_filepath, 'fasta'):
predicted_genes_dict[record.id] = str(record.seq)
else:
raise ValueError("input_type invalid \
(must be 'contig' or 'read'): {}".format(input_type))
# write json for all predicted file
pjson = json.dumps(predicted_genes_dict)
predicted_filepath = os.path.join(self.working_directory,
filename + ".temp.predictedGenes.json")
with open(predicted_filepath, 'w') as wf:
wf.write(pjson)
return predicted_genes_dict
def results(self, blast_results, query_id, perfect, strict, loose):
"""
Sort results to perfect, strict, loose paradigm
Parameters
----------
Args:
blast_results (dict): dictionary containing perfect, strict and loose hits
query_id (str): blast record query
perfect (dict): dictionary containing perfect hits
strict (dict): dictionary containing strict hits
loose (dict): dictionary containing loose hits
Returns:
blast_results (dict): dictionary of sorted results
"""
nudged = False
if len(perfect) == 0 and len(strict) == 0 and len(loose) > 0:
nudged , loose = self.nudge_loose_to_strict(loose)
if nudged is True and self.loose is False:
blast_results[query_id] = loose
elif self.loose is True:
blast_results[query_id] = loose
elif len(perfect) == 0:
if len(strict) > 0:
nudged , strict = self.nudge_strict_to_perfect(strict)
blast_results[query_id] = strict
else:
if len(perfect) > 0:
blast_results[query_id] = perfect
return blast_results
def nudge_strict_to_perfect(self, strict):
"""
Nudge strict hits with missing n-terminus, c-terminus and alternate start codons
Parameters
----------
Args:
strict (dict): dictionary containing strict hits
Returns:
nudged (bool): True or False
strict (dict): dictionary containing strict or perfect hits
"""
nudged = False
# - check if there is 100% match with matching part to the reference
# - getting matching protein then pull nucleotides from reference and translate
# - check the start codons including alternates
# - promote to perfect if the start codon is present in the N-terminus
for s in strict:
if int(strict[s]["perc_identity"]) == 100 and strict[s]["type_match"] == "Strict" and strict[s]["model_type_id"] not in [40295] and self.input_type == 'contig':
reference = strict[s]["sequence_from_broadstreet"]
query = strict[s]["orf_prot_sequence"]
orf_dna_sequence_ = ""
orf_end_ = ""
orf_start_ = ""
_partial_protein = ""
# Missing n-terminus or c-terminus
if len(query) < len(reference) and query in reference:
length_nucleotides = (len(reference) - len(strict[s]["match"]))*3
# pull nucleotides from query or submitted sequence
partial_bases, orf_start_, orf_end_ = self.get_part_sequence(
self.input_sequence, strict[s]["orf_from"],
strict[s]["orf_start"], strict[s]["orf_end"],
length_nucleotides, strict[s]["orf_strand"],
strict[s]["ARO_name"]
)
# check if partial_bases are multiples of 3
if len(partial_bases) % 3 == 0:
logger.info("Missing part: [{}]".format(partial_bases))
if strict[s]["orf_strand"] == "-":
# update orf_end
strict[s]["orf_end_possible"] = orf_end_ + 1
strict[s]["orf_start_possible"] = strict[s]["orf_start"]
orf_dna_sequence_ = str(Seq(partial_bases, generic_dna).reverse_complement()) + strict[s]["orf_dna_sequence"]
partial_protein = str(Seq(partial_bases, generic_dna).reverse_complement().translate(table=11))
# logger.info("Reverse strand: {}".format(partial_protein))
else:
# update orf_start
strict[s]["orf_start_possible"] = orf_start_
strict[s]["orf_end_possible"] = int(strict[s]["orf_end"]) + 1
orf_dna_sequence_ = partial_bases + strict[s]["orf_dna_sequence"]
partial_protein = str(Seq(partial_bases, generic_dna).translate(table=11))
# logger.info("Forward strand: {}".format(partial_protein))
logger.info("Translated protein: [{}]".format(partial_protein))
# update start codon to M for all other alternate start codons
if len(partial_protein) > 0:
_partial_protein = partial_protein[0]
if partial_protein[0] in ["L","M","I","V"]:
_partial_protein = "M"+partial_protein[1:]
combine = _partial_protein + strict[s]["match"]
if combine == strict[s]["sequence_from_broadstreet"]:
logger.info("Missing n-terminus push to Perfect: {} #complete_gene {}".format(strict[s]["ARO_name"],
json.dumps({
"ARO_name": strict[s]["ARO_name"],
"strand": strict[s]["orf_strand"],
"header": strict[s]["orf_from"],
"orf_start_possible": strict[s]["orf_start_possible"],
"orf_end_possible": strict[s]["orf_end_possible"]
}, indent=2)
)
)
strict[s]["type_match"] = "Perfect"
strict[s]["nudged"] = True
strict[s]["note"] = "possible complete gene, missing n-terminus"
strict[s]["missed_bases_by_prodigal"] = partial_bases
# update orf_dna_sequence
strict[s]["orf_dna_sequence_possible"] = orf_dna_sequence_
nudged = True
else:
logger.info("incomplete nucleotides for gene: {} #partial_gene {}".format(strict[s]["ARO_name"],
json.dumps({
"ARO_name": strict[s]["ARO_name"],
"strand": strict[s]["orf_strand"],
"header": strict[s]["orf_from"],
"orf_start": strict[s]["orf_start"],
"orf_end": strict[s]["orf_end"],
"partial_bases": partial_bases
}, indent=2)
)
)
# reference contained within open reading frame
elif len(query) > len(reference) and reference in query:
strict[s]["type_match"] = "Perfect"
strict[s]["nudged"] = True
strict[s]["note"] = "possible complete gene, reference contained within open reading frame"
nudged = True
if strict[s]["orf_strand"] == "-":
strict[s]["orf_start_possible"] = strict[s]["orf_start"]
strict[s]["orf_end_possible"] = int(strict[s]["orf_start"]) + len(strict[s]["dna_sequence_from_broadstreet"])
else:
strict[s]["orf_start_possible"] = int(strict[s]["orf_end"]) - len(strict[s]["dna_sequence_from_broadstreet"]) + 1
strict[s]["orf_end_possible"] = int(strict[s]["orf_end"]) + 1
logger.info("Reference contained within open reading frame push to Perfect: {} #complete_gene {}".format(strict[s]["ARO_name"],
json.dumps({
"ARO_name": strict[s]["ARO_name"],
"strand": strict[s]["orf_strand"],
"header": strict[s]["orf_from"],
"orf_start_possible": strict[s]["orf_start_possible"],
"orf_end_possible": strict[s]["orf_end_possible"]
}, indent=2)
))
# orf and reference are overlapping
'''
elif reference not in query and query not in reference:
logger.warning("TODO:: orf and reference are overlapping")
'''
return nudged, strict
def get_part_sequence(self, fasta_file, header, start, stop, nterminus, strand, name):
"""
Pull part sequence from fasta file
# https://github.com/mdshw5/pyfaidx
# pip install pyfaidx
Parameters
----------
Args:
fasta_file (str): input fasta file
header (str): header for fasta sequence
start (str): start coordinate
stop (str): stop coordinate
nterminus (int): length of missing sequence
strand (str): strand
name (str): gene name
Returns:
sequence (str): portion on a sequence
"""
# remove the last 2 characters from header as this is appended by prodigal
header = header[:header.rfind("_")]
# logger.info("[PARTIAL] ARO: {} | contig: {} | filename: {}".format(name, header, fasta_file))
genes = Fasta(fasta_file, sequence_always_upper=False, read_long_names=False, one_based_attributes=True)
# logger.info(genes.records)
# logger.info(json.dumps({"strand":strand, "start":start, "stop":stop, "nterminus":nterminus}, indent=2))
if strand == "-":
_start = stop + 1
_stop = stop + nterminus
# logger.info("grep sequence from {}|-|{}-{}".format(header,_start, _stop,))
return str(genes.get_spliced_seq( header, [[_start, _stop]])), _start, _stop
elif strand == "+":
_start = start - nterminus
_stop = start - 1
if _start <= 0:
_start = 1
if _stop <= 0:
_stop = 1
# logger.info("grep sequence from {}|+|{}-{}".format(header,_start, _stop))
return str(genes.get_spliced_seq( header, [[_start, _stop]])), _start, _stop
def nudge_loose_to_strict(self, loose):
"""
Nudge loose hits with at least 95 percent identity to be strict hits
Parameters
----------
Args:
loose (dict): dictionary containing loose hits
Returns:
nudged (bool): True or False
loose (dict): dictionary containing loose or strict hits
"""
nudged = False
# check if there are any loose hits that might be strict
for i in loose:
if 95 <= int(loose[i]["perc_identity"]) <= 100:
# add to strict
logger.info("loose hit with at least 95 percent identity push to Strict: {} #partial_gene".format(loose[i]["ARO_name"]))
loose[i]["type_match"] = "Strict"
loose[i]["nudged"] = True
nudged = True
return nudged, loose
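extract_nth_bar and extract_nth_hash above pull typed fields out of the "|"- and "#"-delimited FASTA headers RGI writes into its reference database. The "|" parser can be exercised standalone; the following is a minimal re-implementation of the method's logic, and the example header is hypothetical, written in the style of the headers produced by write_fasta_from_json.

```python
def extract_nth_bar(bunch_str, n):
    """Return the value after ': ' in the (n + 3)-th '|' field,
    coerced to int or float when possible, else left as a string."""
    field = "|".join(bunch_str.split('|')[n + 3:n + 4])
    # skip past "label: " to the value, then strip trailing whitespace
    result = field[field.find(':') + 2:].rstrip()
    if result.isdigit():
        return int(result)
    try:
        return float(result)
    except ValueError:
        return result

# hypothetical header in the style of RGI's reference FASTA
header = "Q1 | hsp | extra | model_type_id: 40292 | pass_bitscore: 1200.5 | OXA-1"
```

The `n + 3` offset skips the fixed leading fields of the aligner's subject description, so `n` indexes the metadata fields that follow.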
from app.settings import *
class Blast(object):
"""Class to create Blast object and align for protein and translated DNA searches."""
def __init__(self,input_file, output_file=None, program = 'blastp', num_threads=32, local_database=False ):
"""Creates Blast object for running NCBI BLAST algorithm."""
self.input_file = input_file
if output_file is None:
f_path, f_name = os.path.split(input_file)
self.output_file = os.path.join(f_path,"{}.blastRes.xml".format(f_name))
else:
self.output_file = output_file
self.local_database = local_database
self.db = path
if self.local_database:
self.db = LOCAL_DATABASE
self.program = program
self.num_threads = num_threads
self.outfmt = 5
def __repr__(self):
"""Returns Blast class full object."""
return "Blast({}".format(self.__dict__)
def run(self):
"""Runs BLAST algorithm."""
logger.info("run blast")
os.system('{program} -query {input} -db {path} \
-num_threads {num_threads} -outfmt {outfmt} -out {output_file}' \
.format(
program=self.program,
num_threads=self.num_threads,
outfmt=self.outfmt,
input=self.input_file,
path=os.path.join(self.db,"protein.db"),
output_file=self.output_file
)
)
def run_custom(self, db):
"""Runs BLAST against a custom database."""
os.system('{program} -query {input} -db {db} -num_threads {num_threads} \
-outfmt {outfmt} -out {output_file} -perc_identity {perc_identity} \
-strand {strand}' \
.format(
program=self.program,
input=self.input_file,
db=db,
num_threads=self.num_threads,
outfmt=self.outfmt,
output_file=self.output_file,
perc_identity=95.0,
strand="both"
)
)
logger.info("done running {} -> {}".format(self.program, db))
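Blast.run shells out with os.system and string interpolation. The same invocation can be expressed with subprocess and an argument list, which avoids shell quoting issues and surfaces failures as exceptions. This is a sketch of an alternative, not the code above; paths and database names in the usage are placeholders.

```python
import subprocess

def build_blast_cmd(program, query, db, output_file, num_threads=32, outfmt=5):
    """Assemble the BLAST invocation as an argument list (no shell interpolation)."""
    return [program,
            "-query", query,
            "-db", db,
            "-num_threads", str(num_threads),
            "-outfmt", str(outfmt),
            "-out", output_file]

def run_blast(**kwargs):
    """Run BLAST; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_blast_cmd(**kwargs), check=True)
```

Separating command construction from execution also makes the invocation testable without BLAST installed.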
import csv
from app.settings import *
from operator import itemgetter, attrgetter
class ConvertJsonToTSV(object):
def __init__(self, filepath, homolog_file=None, variant_file=None, overexpression_file=None, rrna_file=None):
f_path, f_name = os.path.split(filepath)
name, ext = os.path.splitext(f_name)
self.filepath = os.path.join(f_path, "{}.json".format(f_name))
if ext.lower() == ".json":
self.filepath = os.path.join(f_path, "{}{}".format(name,ext))
self.homolog_file = homolog_file
self.variant_file = variant_file
self.overexpression_file = overexpression_file
self.rrna_file = rrna_file
def __repr__(self):
"""Returns ConvertJsonToTSV class full object."""
return "ConvertJsonToTSV({}".format(self.__dict__)
def parse_jsons(self, part, data):
if data.keys():
for k1,v1 in part.items():
if k1 in data:
data[k1].update(v1)
elif k1 not in data:
data[k1]=v1
else:
data.update(part)
return data
def combine_jsons(self):
if self.homolog_file is not None and self.variant_file is not None and self.overexpression_file is not None and self.rrna_file is not None:
data={}
if os.path.isfile(self.homolog_file):
with open (self.homolog_file) as hf:
homolog = json.load(hf)
data = self.parse_jsons(homolog, data)
if os.path.isfile(self.variant_file):
with open (self.variant_file) as vf:
variant = json.load(vf)
data = self.parse_jsons(variant, data)
if os.path.isfile(self.overexpression_file):
with open (self.overexpression_file) as of:
overexpression = json.load(of)
data = self.parse_jsons(overexpression, data)
if os.path.isfile(self.rrna_file):
with open (self.rrna_file) as rf:
rrna = json.load(rf)
data = self.parse_jsons(rrna, data)
with open(self.filepath, "w") as outfile:
json.dump(data, outfile)
def run(self):
if os.path.isfile(self.filepath):
f_path, f_name = os.path.split(self.filepath)
with open(os.path.join(f_path, "{}.txt".format(os.path.splitext(f_name)[0])), "w") as af:
writer = csv.writer(af, delimiter='\t', dialect='excel')
writer.writerow(["ORF_ID",
"Contig",
"Start",
"Stop",
"Orientation",
"Cut_Off",
"Pass_Bitscore",
"Best_Hit_Bitscore",
"Best_Hit_ARO",
"Best_Identities",
"ARO",
"Model_type",
"SNPs_in_Best_Hit_ARO",
"Other_SNPs",
"Drug Class",
"Resistance Mechanism",
"AMR Gene Family",
"Predicted_DNA",
"Predicted_Protein",
"CARD_Protein_Sequence",
"Percentage Length of Reference Sequence",
"ID",
"Model_ID"])
if os.path.isfile(self.filepath):
with open(self.filepath) as rgi_file:
rgi_data = json.load(rgi_file)
try:
del rgi_data["_metadata"]
except KeyError:
pass
for hsp in rgi_data:
order_perfect = []
order_loose = []
order_strict = []
dna = 0
cgList = []
hitID = []
temp2=[]
temp3 = []
best_snps = ""
other_snps = ""
for hit in rgi_data[hsp]:
if rgi_data[hsp][hit]["type_match"] == "Perfect":
order_perfect.append((
hit, rgi_data[hsp][hit]["bit_score"], rgi_data[hsp][hit]["perc_identity"]
))
if rgi_data[hsp][hit]["type_match"] == "Strict":
order_strict.append((
hit, rgi_data[hsp][hit]["bit_score"], rgi_data[hsp][hit]["perc_identity"]
))
if rgi_data[hsp][hit]["type_match"] == "Loose":
order_loose.append((
hit, rgi_data[hsp][hit]["bit_score"], rgi_data[hsp][hit]["perc_identity"]
))
ordered = []
# sort by bitscore and percent identity
if len(order_perfect) > 0:
ordered = sorted(order_perfect, key=itemgetter(1,2)).pop(0)
elif len(order_strict) > 0:
ordered = sorted(order_strict, key=itemgetter(1,2)).pop(0)
else:
ordered = sorted(order_loose, key=itemgetter(1,2)).pop(0)
ordered = [i for i in ordered]
if "orf_dna_sequence" in rgi_data[hsp][ordered[0]]:
dna = 1
if "ARO_category" in rgi_data[hsp][ordered[0]]:
for aroctkey in rgi_data[hsp][hit]["ARO_category"]:
cgList.append(str(rgi_data[hsp][hit]["ARO_category"][aroctkey]["category_aro_name"].encode('ascii','replace').decode("utf-8")))
if "hsp_num:" in rgi_data[hsp][ordered[0]]:
hitID.append(rgi_data[hsp][ordered[0]])
match_dict = {}
if dna == 1:
if len(rgi_data[hsp]) != 0:
if rgi_data[hsp][hit]["model_type_id"] == 41091:
if "snp" in rgi_data[hsp][ordered[0]]:
for x in rgi_data[hsp].values():
if "snp" in x.keys():
if x['model_id'] == rgi_data[hsp][ordered[0]]['model_id']:
temp2.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"])
best_snps = ', '.join(temp2)
else:
temp3.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"] + ":" + x['model_id'])
other_snps = ', '.join(temp3)
else:
best_snps = "n/a"
other_snps = "n/a"
elif rgi_data[hsp][hit]["model_type_id"] in [40293,40295]:
if "snp" in rgi_data[hsp][ordered[0]]:
for x in rgi_data[hsp].values():
if "snp" in x.keys():
if x['model_id'] == rgi_data[hsp][ordered[0]]['model_id']:
temp2.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"])
best_snps = ', '.join(temp2)
else:
temp3.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"] + ":" + x['model_id'])
other_snps = ', '.join(temp3)
else:
best_snps = "n/a"
other_snps = "n/a"
elif rgi_data[hsp][hit]["model_type_id"] == 40292:
best_snps = "n/a"
other_snps = "n/a"
if not other_snps:
other_snps = "n/a"
if rgi_data[hsp][hit]["model_type_id"] in [40295]:
percentage_length_reference_sequence = format(((rgi_data[hsp][ordered[0]]["orf_end"] - rgi_data[hsp][ordered[0]]["orf_start"]) /\
len(rgi_data[hsp][ordered[0]]["dna_sequence_from_broadstreet"]))*100, '.2f')
else:
percentage_length_reference_sequence = format((len(rgi_data[hsp][ordered[0]]["orf_prot_sequence"]) /\
len(rgi_data[hsp][ordered[0]]["sequence_from_broadstreet"]))*100, '.2f')
match_dict[hsp] = [hsp,
rgi_data[hsp][ordered[0]]["orf_from"],
rgi_data[hsp][ordered[0]]["orf_start"],
rgi_data[hsp][ordered[0]]["orf_end"],
rgi_data[hsp][ordered[0]]["orf_strand"],
rgi_data[hsp][ordered[0]]["type_match"],
rgi_data[hsp][ordered[0]]["pass_bitscore"],
rgi_data[hsp][ordered[0]]["bit_score"],
rgi_data[hsp][ordered[0]]["ARO_name"],
rgi_data[hsp][ordered[0]]["perc_identity"],
rgi_data[hsp][ordered[0]]["ARO_accession"],
rgi_data[hsp][ordered[0]]["model_type"],
best_snps,
other_snps,
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'Drug Class'),
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'Resistance Mechanism'),
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'AMR Gene Family'),
rgi_data[hsp][ordered[0]]["orf_dna_sequence"],
rgi_data[hsp][ordered[0]]["orf_prot_sequence"],
rgi_data[hsp][ordered[0]]["sequence_from_broadstreet"],
# length of hsps / length reference
percentage_length_reference_sequence,
ordered[0],
rgi_data[hsp][ordered[0]]["model_id"]
]
for key, value in match_dict.items():
writer.writerow(value)
else:
if len(rgi_data[hsp]) != 0:
if rgi_data[hsp][hit]["model_type_id"] == 41091:
if "snp" in rgi_data[hsp][ordered[0]]:
for x in rgi_data[hsp].values():
if "snp" in x.keys():
if x['model_id'] == rgi_data[hsp][ordered[0]]['model_id']:
temp2.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"])
best_snps = ', '.join(temp2)
else:
temp3.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"] + ":" + x['model_id'])
other_snps = ', '.join(temp3)
else:
best_snps = "n/a"
other_snps = "n/a"
elif rgi_data[hsp][hit]["model_type_id"] == 40293:
if "snp" in rgi_data[hsp][ordered[0]]:
for x in rgi_data[hsp].values():
if "snp" in x.keys():
if x['model_id'] == rgi_data[hsp][ordered[0]]['model_id']:
temp2.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"])
best_snps = ', '.join(temp2)
else:
temp3.append(x["snp"]["original"] + str(x["snp"]["position"]) + x["snp"]["change"] + ":" + x['model_id'])
other_snps = ', '.join(temp3)
else:
best_snps = "n/a"
other_snps = "n/a"
elif rgi_data[hsp][hit]["model_type_id"] == 40292:
best_snps = "n/a"
other_snps = "n/a"
match_dict[hsp] = [hsp, "", "", "", "",
rgi_data[hsp][ordered[0]]["type_match"],
rgi_data[hsp][ordered[0]]["pass_bitscore"],
rgi_data[hsp][ordered[0]]["bit_score"],
rgi_data[hsp][ordered[0]]["ARO_name"],
rgi_data[hsp][ordered[0]]["perc_identity"],
rgi_data[hsp][ordered[0]]["ARO_accession"],
rgi_data[hsp][ordered[0]]["model_type"],
best_snps,
other_snps,
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'Drug Class'),
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'Resistance Mechanism'),
"; ".join(rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_name"] for x in rgi_data[hsp][ordered[0]]["ARO_category"] \
if rgi_data[hsp][ordered[0]]["ARO_category"][x]["category_aro_class_name"] == 'AMR Gene Family'),
"",
rgi_data[hsp][ordered[0]]["orf_prot_sequence"],
rgi_data[hsp][ordered[0]]["sequence_from_broadstreet"],
format((len(rgi_data[hsp][ordered[0]]["orf_prot_sequence"]) / len(rgi_data[hsp][ordered[0]]["sequence_from_broadstreet"]))*100, '.2f'),
ordered[0],
rgi_data[hsp][ordered[0]]["model_id"]
]
for key, value in match_dict.items():
writer.writerow(value)
def manual():
h = {}
h["ORF_ID"] = "Open Reading Frame identifier (internal to RGI)"
h["Contig"] = "Source Sequence"
h["Start"] = "Start co-ordinate of ORF"
h["Stop"] = "End co-ordinate of ORF"
h["Orientation"] = "Strand of ORF"
h["Cut_Off"] = "RGI Detection Paradigm"
h["Pass_Bitscore"] = "STRICT detection model bitscore value cut-off"
h["Best_Hit_Bitscore"] = "bitscore value of match to top hit in CARD"
h["Best_Hit_ARO"] = "ARO term of top hit in CARD"
h["Best_Identities"] = "Percent identity of match to top hit in CARD"
h["ARO"] = "ARO accession of top hit in CARD"
h["Model_type"] = "CARD detection model type"
h["SNPs_in_Best_Hit_ARO"] = "Mutations observed in the ARO term of top hit in CARD (if applicable)"
h["Other_SNPs"] = "Mutations observed in ARO terms of other hits indicated by model id (if applicable)"
h["Drug Class"] = "ARO Categorization"
h["Resistance Mechanism"] = "ARO Categorization"
h["AMR Gene Family"] = "ARO Categorization"
h["Predicted_DNA"] = "ORF predicted nucleotide sequence"
h["Predicted_Protein"] = "ORF predicted protein sequence"
h["CARD_Protein_Sequence"] = "Protein sequence of top hit in CARD"
h["Percentage Length of Reference Sequence"] = "Percentage Length of Reference Sequence"
h["ID"] = "HSP identifier (internal to RGI)"
h["Model_ID"] = "CARD detection model id"
print ("\n")
print ("COLUMN","\t\t\t","HELP_MESSAGE")
for i in h:
print (i,"\t\t\t",h[i])
print ("\n")
from app.settings import *
class Database(object):
"""Class to create BLAST databases from a card.json file."""
def __init__(self, local_database=False):
"""Creates Database object."""
self.local_database = local_database
self.db = path
self.data = data_path
self.stdout = "2>&1 >> /dev/null" #"2> /dev/null"
if self.local_database:
self.db = LOCAL_DATABASE
self.data = LOCAL_DATABASE
def __repr__(self):
"""Returns Database class full object."""
return "Database({}".format(self.__dict__)
def build_databases(self):
"""Build BLAST and DIAMOND databases."""
self.write_fasta_from_json()
self.make_blast_database()
self.make_diamond_database()
self.write_fasta_from_json_rna()
def make_blast_database(self):
"""Build BLAST database from a FASTA file."""
if os.path.isfile(os.path.join(self.db,"proteindb.fsa")) \
and os.path.exists(os.path.join(self.db,"protein.db.phr")) \
and os.path.exists(os.path.join(self.db,"protein.db.pin")) \
and os.path.exists(os.path.join(self.db,"protein.db.psq")):
logger.info("blast DB exists")
else:
logger.info("create blast DB.")
os.system('makeblastdb -in {} -dbtype prot -out {} {stdout}'.format(os.path.join(self.db,"proteindb.fsa"),os.path.join(self.db,"protein.db"),stdout=self.stdout))
def make_diamond_database(self):
"""Build DIAMOND database from a FASTA file."""
if os.path.isfile(os.path.join(self.db,"proteindb.fsa")) \
and os.path.exists(os.path.join(self.db,"protein.db.dmnd")):
logger.info("diamond DB exists")
else:
logger.info("create diamond DB.")
os.system('diamond makedb --quiet --in {} --db {} {stdout}'.format(os.path.join(self.db,"proteindb.fsa"),os.path.join(self.db,"protein.db"),stdout=self.stdout))
def make_custom_db(self, in_file, out_file, db_type="nucl", program="blast"):
if program == 'blast':
os.system('makeblastdb -in {in_file} -dbtype {db_type} -out {out_file} \
{stdout}'.format( in_file=in_file, db_type=db_type, out_file=out_file, stdout=self.stdout))
else:
exit("Only NCBI BLAST is supported.")
def write_fasta_from_json(self):
"""Creates a fasta file from card.json file."""
if os.path.isfile(os.path.join(self.db, "proteindb.fsa")):
# logger.info("Database already exists.")
return
else:
try:
with open(os.path.join(self.data, "card.json"), 'r') as jfile:
j = json.load(jfile)
except Exception as e:
logger.error(e)
exit()
with open(os.path.join(self.db, "proteindb.fsa"), 'w') as fout:
for i in j:
if i.isdigit():
# model_type: protein homolog model
if j[i]['model_type_id'] == '40292':
try:
pass_bit_score = j[i]['model_param']['blastp_bit_score']['param_value']
except KeyError:
logger.warning("No bitscore for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
else:
try:
for seq in j[i]['model_sequences']['sequence']:
fout.write('>%s_%s | model_type_id: 40292 | pass_bitscore: %s | %s\n' % (i, seq, pass_bit_score, j[i]['ARO_name']))
fout.write('%s\n' %(j[i]['model_sequences']['sequence'][seq]['protein_sequence']['sequence']))
except Exception as e:
logger.warning("No model sequences for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
# model_type: protein variant model
elif j[i]["model_type_id"] == "40293":
try:
pass_bit_score = j[i]['model_param']['blastp_bit_score']['param_value']
except KeyError:
logger.warning("No bitscore for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
else:
try:
snpList = [j[i]['model_param']['snp']['param_value'][k] for k in j[i]['model_param']['snp']['param_value']]
except Exception as e:
logger.warning("No snp for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
try:
for seq in j[i]['model_sequences']['sequence']:
fout.write('>%s_%s | model_type_id: 40293 | pass_bit_score: %s | SNP: %s | %s\n' \
% (i, seq, pass_bit_score, ','.join(snpList), j[i]['ARO_name']))
fout.write('%s\n' % (j[i]['model_sequences']['sequence'][seq]['protein_sequence']['sequence']))
except Exception as e:
logger.warning("No model sequences for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
# model_type: protein overexpression model
elif j[i]["model_type_id"] == "41091":
try:
pass_bit_score = j[i]["model_param"]["blastp_bit_score"]["param_value"]
except KeyError:
logger.warning("No bitscore for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
else:
try:
snpList = [j[i]['model_param']['snp']['param_value'][k] for k in j[i]['model_param']['snp']['param_value']]
except Exception as e:
logger.warning("No snp for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
try:
for seq in j[i]['model_sequences']['sequence']:
fout.write('>%s_%s | model_type_id: 41091 | pass_bit_score: %s | SNP: %s | %s\n' \
% (i, seq, pass_bit_score, ','.join(snpList), j[i]['ARO_name']))
fout.write('%s\n' % (j[i]['model_sequences']['sequence'][seq]['protein_sequence']['sequence']))
except Exception as e:
logger.warning("No model sequences for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
def write_fasta_from_json_rna(self):
"""Creates a fasta file for 16S and 23S data from card.json file."""
snpList_16s = []
snpList_23s = []
if os.path.isfile(os.path.join(self.db, "rnadb.fsa")):
# logger.info("RNA database already exists.")
return
else:
with open(os.path.join(self.data, "card.json"), 'r') as jfile:
j = json.load(jfile)
with open(os.path.join(self.db, "rnadb.fsa"), 'w') as fout:
for i in j:
if i.isdigit():
# model_type: ribosomal RNA model
if j[i]["model_type_id"] == "40295":
try:
pass_bit_score = j[i]['model_param']['blastn_bit_score']['param_value']
except KeyError:
logger.warning("No bitscore for model (%s, %s). RGI will omit this model and keep running." \
% (j[i]['model_id'], j[i]['model_name']))
logger.info("Please let the CARD Admins know! Email: card@mcmaster.ca")
else:
snpList = list(j[i]['model_param'].get('snp', {}).get('param_value', {}).values())  # tolerate models with no SNP data, as the other model types do
for s in snpList:
if "16S" in j[i]['ARO_name']:
if s not in snpList_16s:
snpList_16s.append(s)
if "23S" in j[i]['ARO_name']:
if s not in snpList_23s:
snpList_23s.append(s)
for seq in j[i]['model_sequences']['sequence']:
dna_sequence = j[i]['model_sequences']['sequence'][seq]['dna_sequence']['sequence']
if j[i]['model_sequences']['sequence'][seq]['dna_sequence']['strand'] == "-":
dna_sequence = self.complementary_strand(dna_sequence)
fout.write('>%s_%s | model_type_id: 40295 | pass_bit_score: %s | SNP: %s | %s\n' \
% (i, seq, pass_bit_score, ','.join(snpList), j[i]['ARO_name']))
fout.write('%s\n' % (dna_sequence))
# write snps to file
with open(os.path.join(self.db,"16s_rRNA.txt"), 'w') as f16s:
snpList_16s.sort()
f16s.write(',\n'.join(snpList_16s))
with open(os.path.join(self.db, "23s_rRNA.txt"), 'w') as f23s:
snpList_23s.sort()
f23s.write(',\n'.join(snpList_23s))
def complementary_strand(self, strand):
'''Takes a DNA strand string and returns its complement (IUPAC-aware; does not reverse).'''
trans = { "T": "A", "A": "T", "G": "C", "C": "G", "N": "N", "M": "K", "K": "M", \
"R": "Y", "Y": "R", "S": "S", "W": "W", "B": "V", "V": "B", "H": "D", "D": "H"}
return ''.join(trans[base] for base in strand)
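The complement table above can be exercised in isolation. A minimal standalone sketch of the same mapping (the `complement` name is hypothetical, not part of RGI):

```python
# Sketch of the base-complement mapping used by complementary_strand() above,
# including IUPAC ambiguity codes. It complements each base in place but does
# not reverse the string; reversing is left to the caller.
TRANS = {"T": "A", "A": "T", "G": "C", "C": "G", "N": "N", "M": "K", "K": "M",
         "R": "Y", "Y": "R", "S": "S", "W": "W", "B": "V", "V": "B", "H": "D", "D": "H"}

def complement(strand):
    """Return the complementary strand, preserving character order."""
    return ''.join(TRANS[base] for base in strand)
```

For a reverse complement, callers would reverse the result themselves, e.g. `complement("ATGC")[::-1]`.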
from app.settings import *
class Diamond(object):
"""Class to create Diamond object and align for protein and translated DNA searches."""
def __init__(self, input_file, output_file=None, program='blastp', num_threads=32, local_database=False):
"""Creates Diamond object for running DIAMOND algorithm."""
self.input_file = input_file
if output_file is None:
f_path, f_name = os.path.split(input_file)
self.output_file = os.path.join(f_path,"{}.blastRes.xml".format(f_name))
else:
self.output_file = output_file
self.local_database = local_database
self.db = path
if self.local_database:
self.db = LOCAL_DATABASE
self.program = program
self.num_threads = num_threads
self.index_chunks = 1
self.block_size = 1
self.outfmt = 5
def __repr__(self):
"""Returns Diamond class full object."""
return "Diamond({})".format(self.__dict__)
def run(self):
"""Runs DIAMOND algorithm."""
# logger.info("run diamond")
os.system('diamond {program} --in {in_ref} --db {db} \
--query {input} --outfmt {outfmt} --out {output_file} \
--threads {num_threads} --index-chunks {index_chunks} \
--block-size {block_size} \
--salltitles --quiet --more-sensitive' \
.format(
program=self.program,
in_ref=os.path.join(self.db,"proteindb.fsa"),
db=os.path.join(self.db,"protein.db"),
input=self.input_file,
output_file=self.output_file,
num_threads=self.num_threads,
index_chunks=self.index_chunks,
block_size=self.block_size,
outfmt=self.outfmt
)
)
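Diamond.run above shells out via os.system, which silently discards a non-zero exit status. A hedged sketch of building the same invocation as an explicit argument list suitable for subprocess (the helper name is hypothetical; flag values mirror the attributes set in Diamond.__init__ above):

```python
# Build the DIAMOND argument list explicitly so no shell quoting is needed;
# running it with subprocess.run(cmd, check=True) would raise on a non-zero
# exit status instead of ignoring it as os.system does.
def build_diamond_cmd(program, ref_fasta, db, query, out_xml, threads=32):
    """Return the DIAMOND argument list; run with subprocess.run(cmd, check=True)."""
    return [
        "diamond", program,
        "--in", ref_fasta,
        "--db", db,
        "--query", query,
        "--outfmt", "5",        # 5 = BLAST XML, matching self.outfmt
        "--out", out_xml,
        "--threads", str(threads),
        "--index-chunks", "1",
        "--block-size", "1",
        "--salltitles", "--quiet", "--more-sensitive",
    ]
```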
from app.Base import BaseModel
from app.HomologModel import Homolog
from app.VariantModel import Variant
from app.OverexpressionModel import Overexpression
from app.RrnaModel import Rrna
from app.Blast import Blast
from app.Database import Database
from app.ConvertRGIJsonToTSV import ConvertJsonToTSV
from app.settings import *
import hashlib
import multiprocessing
class Filter(BaseModel):
"""This class takes a BLAST XML file and a card.json file and produces the perfect/strict paradigm for RGI."""
def __init__(self, input_type, loose, input_sequence, xml_file, card_json, input_file, output_file, num_threads ,rgi_obj=None):
self.input_type = input_type
self.xml_file = xml_file
self.card_json = card_json
self.input_filename = input_file
self.input_sequence = input_sequence
self.loose = loose
self.blast_results = {}
self.rna_results = {}
self.rgi_obj = rgi_obj
self.num_threads = num_threads
self.working_directory = rgi_obj.working_directory
if output_file is None:
f_path, f_name = os.path.split(input_file)
self.output_file = os.path.join(f_path,"{}.Report.json".format(f_name))
else:
self.output_file = output_file
def __repr__(self):
"""Returns Filter class full object."""
return "Filter({})".format(self.__dict__)
def getHashName(self, name):
m = hashlib.md5()
t = time.gmtime()
# hashlib requires bytes in Python 3, so encode before updating
m.update((name + str(t)).encode('utf-8'))
return m.hexdigest()
def worker(self, model_type):
logger.info("{}_worker started...".format(model_type))
# results = {}
try:
if model_type == "homolog":
obj = Homolog(self.input_type, self.loose, self.input_sequence, self.xml_file, self.working_directory, self.rgi_obj.local_database)
if model_type == "variant":
obj = Variant(self.input_type, self.loose, self.input_sequence, self.xml_file, self.working_directory, self.rgi_obj.local_database)
if model_type == "overexpression":
obj = Overexpression(self.input_type, self.loose, self.input_sequence, self.xml_file, self.working_directory, self.rgi_obj.local_database)
results = obj.run()
logger.info("save {} results...".format(model_type))
file_name = os.path.basename(self.input_sequence)
with open(os.path.join(self.working_directory,"{}.temp.{}.json".format(file_name, model_type)), 'w') as fout:
fout.write(json.dumps(results))
except Exception as e:
logger.warning("Exception: {} -> {} -> model_type: {}".format(type(e), e, model_type))
def async_func(self):
p = multiprocessing.Process(target=self.worker, args=("homolog",))
p2 = multiprocessing.Process(target=self.worker, args=("variant",))
p3 = multiprocessing.Process(target=self.worker, args=("overexpression",))
p4 = multiprocessing.Process(target=self.process_rrna, args=("rrna",))
p.start()
p2.start()
p3.start()
p4.start()
# pids are only assigned once the processes have started
logger.info("{} -> {}".format(p.pid, p.name))
logger.info("{} -> {}".format(p2.pid, p2.name))
logger.info("{} -> {}".format(p3.pid, p3.name))
logger.info("{} -> {}".format(p4.pid, p4.name))
def prepare_output(self):
"""
Read all json into one json results file
"""
logger.info("prepare output(s) for input: {}".format(self.input_sequence))
file_name = os.path.basename(self.input_sequence)
obj=ConvertJsonToTSV(self.output_file, \
os.path.join(self.working_directory,"{}.temp.{}.json".format(file_name, "homolog")), \
os.path.join(self.working_directory,"{}.temp.{}.json".format(file_name, "variant")), \
os.path.join(self.working_directory,"{}.temp.{}.json".format(file_name, "overexpression")), \
os.path.join(self.working_directory,"{}.temp.{}.json".format(file_name, "rrna"))
)
# combine the four temp json files (homolog, variant, overexpression, rrna)
obj.combine_jsons()
# write tsv
obj.run()
def cleanup(self):
self.rgi_obj.clean_files()
def process_xml_file(self):
"""Processes the BLAST XML results file."""
model_thread = multiprocessing.Process(target=self.async_func, args=())
model_thread.start()
model_thread.join()
prepare_output_thread = multiprocessing.Process(target=self.prepare_output, args=())
prepare_output_thread.start()
prepare_output_thread.join()
cleanup_thread = multiprocessing.Process(target=self.cleanup, args=())
cleanup_thread.start()
def process_rrna(self, model_type="rrna"):
logger.info("rRNA process: {}".format(self.input_type))
if self.input_type == "protein":
logger.info("Skip rRNA...")
else:
logger.info("rRNA process")
self.format_fasta()
# clean the rRNA model's previous results and temporary files
self.file_name = os.path.basename(self.input_sequence)
d, x = self.create_db_query()
rrna_obj = Rrna(self.input_sequence, self.output_file, d, x, self.loose, self.rgi_obj.local_database)
res = rrna_obj.run()
file_name = os.path.basename(self.input_sequence)
with open(os.path.join(self.working_directory,"{}.{}.json".format(file_name, model_type)), 'w') as fout:
fout.write(json.dumps(res))
# with open(os.path.splitext(self.output_file)[0]+".{}.json".format(model_type), 'w') as fout:
# fout.write(json.dumps(res))
logger.info("rRNA process Done.")
def create_db_query(self):
logger.info("create_db_query")
# make_custom_db(self, in_file, out_file, db_type="diamond")
in_file = self.input_sequence
f_path, f_name = os.path.split(self.input_sequence)
out_file = os.path.join(self.working_directory, "{}.db".format(f_name))
xml_file = os.path.join(self.working_directory,"{}.blastRes.rrna.xml".format(f_name))
logger.info("DB from user query")
db_obj = Database(self.rgi_obj.local_database)
db_obj.make_custom_db(in_file, out_file)
self.blast_reference_to_db_query(out_file, xml_file)
return out_file, xml_file
def blast_reference_to_db_query(self, db, xml_file):
logger.info("blast_reference_to_db_query")
# blast all rrna db against query db
rrna_db_fasta = os.path.join(self.rgi_obj.db, "rnadb.fsa")
blast_obj = Blast(rrna_db_fasta, program='blastn', output_file=xml_file, local_database=self.rgi_obj.local_database, num_threads=self.num_threads)
blast_obj.run_custom(db)
def format_fasta(self):
f_path, f_name = os.path.split(self.input_sequence)
temp_file = os.path.join(self.working_directory, "{}.temp".format(f_name))
with open(temp_file, 'w') as fout:
for record in SeqIO.parse(self.input_sequence, 'fasta'):
header = record.id
seq = record.seq
fout.write(">{}\n{}\n".format(header, seq))
self.input_sequence = temp_file
def encode_header(self,name):
return hashlib.md5(name.encode('utf-8')).hexdigest()
def write_output(self):
file_name = os.path.basename(self.input_sequence)
logger.info(self.output_file)
with open(self.output_file, 'w') as rrna_js:
rrna_js.write(json.dumps(self.rna_results))
def run(self):
if os.path.exists(self.xml_file):
self.process_xml_file()
else:
logger.error("missing blast xml file ({}). Please check that input_type: '{}' corresponds with input file: '{}', or use the '--low_quality' flag for short contigs to predict partial genes." \
.format(self.xml_file, self.input_type, self.input_sequence))
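Filter.encode_header above hashes a FASTA header down to a stable hexadecimal id. A standalone sketch of the same idea (the free function name is hypothetical):

```python
import hashlib

# MD5 of the UTF-8 bytes gives a fixed-length (32-char), deterministic id
# that is safe to embed in file names and report keys, regardless of what
# characters appear in the original FASTA header.
def encode_header(name):
    return hashlib.md5(name.encode('utf-8')).hexdigest()
```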
from app.settings import *
import shutil
class Galaxy(object):
def __init__(self, database, debug):
self.database = database
self.debug = debug
if self.debug:
logger.setLevel(10)  # 10 == logging.DEBUG
def __repr__(self):
"""Returns Galaxy class full object."""
return "Galaxy({})".format(self.__dict__)
def load_db_galaxy(self):
"""
verify that we have the following files at the specified location:
# CARD json data file
- card.json
# diamond blast
- protein.db.dmnd
# ncbi blast
- protein.db.phr
- protein.db.pin
- protein.db.psq
# Protein fasta file
- proteindb.fsa
"""
needed_files = ['card.json','proteindb.fsa','protein.db.dmnd','protein.db.phr','protein.db.pin','protein.db.psq']
found_files = []
files = os.listdir(self.database)
for f in files:
if not f.startswith('.'):
found_files.append(f)
missing_files = list(set(needed_files) - set(found_files))
if len(missing_files) > 0:
logger.error("missing database files {}".format(missing_files))
exit()
logger.info("found all needed files: {}".format(json.dumps(needed_files, indent=2)))
# Files found - move files into _data and _db directories
logger.info("files found - now move files into _data and _db directories")
for f in os.listdir(self.database):
if not f.startswith('.') and f in needed_files:
src_path = os.path.join(self.database,f)
dst_path = ""
if f in ['card.json']:
dst_path = os.path.join(data_path,f)
else:
dst_path = os.path.join(path,f)
logger.info("copy {} to {}".format(src_path,dst_path))
try:
shutil.copy2(src_path,dst_path)
except Exception as e:
logger.error("failed to copy file: {}".format(e))
exit()
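The missing-file check in load_db_galaxy above reduces to a set difference; a minimal sketch with a hypothetical directory listing (`NEEDED` and `missing_files` are illustration names, not RGI's):

```python
# The database files Galaxy.load_db_galaxy expects to find.
NEEDED = {'card.json', 'proteindb.fsa', 'protein.db.dmnd',
          'protein.db.phr', 'protein.db.pin', 'protein.db.psq'}

def missing_files(listing):
    """Return the needed files absent from a directory listing,
    ignoring hidden files (names starting with '.')."""
    found = {f for f in listing if not f.startswith('.')}
    return sorted(NEEDED - found)
```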
from app.Base import BaseModel
from app.settings import *
class Homolog(BaseModel):
"""Class for homology searches."""
def __init__(self, input_type, loose, input_sequence, xml_file, working_directory, local_database=False):
self.input_type = input_type
self.loose = loose
self.input_sequence = input_sequence
self.xml_file = xml_file
self.output = {}
self.working_directory = working_directory
self.local_database = local_database
self.data = data_path
if self.local_database:
self.db = LOCAL_DATABASE
self.data = LOCAL_DATABASE
def __repr__(self):
"""Returns Homolog class full object."""
return "Homolog({})".format(self.__dict__)
def run(self):
"""Runs homolog search."""
blastResults = {}
predicted_genes_dict = {}
submitted_proteins_dict = {}
if self.input_type == "contig":
predicted_genes_dict = self.get_orf_dna_sequence(self.input_sequence,self.input_type)
if self.input_type == "protein":
submitted_proteins_dict = self.get_submitted_protein_sequence(self.input_sequence)
with open(os.path.join(self.data,"card.json")) as json_file:
json_data = json.load(json_file)
with open(self.xml_file, 'r') as result_handle:
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
perfect = {}
strict = {}
loose = {}
for alignment in blast_record.alignments:
alignTitle = alignment.title
orfInfo = blast_record.query.encode('ascii','replace')
c = 0
barc = 0
# orfInfo is bytes, so iteration yields ints in Python 3; compare byte values
for eachc in orfInfo:
if barc >= 6:
break
elif eachc == ord('|'):
barc += 1
c += 1
else:
c += 1
orffrom = orfInfo[c:]
modelTypeID = self.extract_nth_bar(alignTitle, 0)
if modelTypeID == 40292:
spacepos = alignTitle.index(' ')
hitid = alignTitle[0:spacepos]
hitid = hitid.encode('ascii','replace')
modelDescrpt = alignTitle[alignTitle.index(' ')+1:]
underscoreinMD = modelDescrpt.index('_')
modelID = modelDescrpt[0:underscoreinMD]
seqinModel = modelDescrpt[underscoreinMD+1: modelDescrpt.index(' ')]
pass_bitscore = "{}".format(self.extract_nth_bar(alignment.title, 1))
pass_evalue = "n/a"
# logger.info("pass_evalue: {}".format(pass_evalue))
# logger.info("pass_bitscore: {}".format(pass_bitscore))
init = 0
for hsp in alignment.hsps:
querySeq = hsp.query.replace('-', '')
realQueryLength = len(querySeq)
card_sequence = ""
orf_protein_sequence = ""
try:
card_sequence = str(json_data[modelID]["model_sequences"]["sequence"][seqinModel]["protein_sequence"]["sequence"])
except Exception as e:
logger.warning("Exception : {} -> {} -> Model({}) missing in database. Please generate new database.".format(type(e), e, modelID))
if predicted_genes_dict:
if orfInfo.decode().strip() in predicted_genes_dict:
orf_protein_sequence = str(Seq(predicted_genes_dict[orfInfo.decode()], generic_dna).translate(table=11)).strip("*")
else:
orf_protein_sequence = str(Seq(predicted_genes_dict[orfInfo.decode()[:orfInfo.decode().index(' # ')]], generic_dna).translate(table=11)).strip("*")
if submitted_proteins_dict:
orf_protein_sequence = str(submitted_proteins_dict[orfInfo.decode().split(" ")[0]])
if card_sequence.upper() == orf_protein_sequence.upper():
""" Perfect hits """
# logger.info("Perfect hits")
ppinsidedict = {}
ppinsidedict["type_match"] = "Perfect"
ppinsidedict["model_id"] = modelID
ppinsidedict["orf_strand"] = self.extract_nth_bar(orfInfo.decode(), 0)
ppinsidedict["orf_start"] = self.extract_nth_bar(orfInfo.decode(), 1)
ppinsidedict["orf_end"] = self.extract_nth_bar(orfInfo.decode(), 2)
ppinsidedict["orf_from"] = orffrom.decode()
ppinsidedict["model_name"] = json_data[modelID]["model_name"]
ppinsidedict["model_type"] = json_data[modelID]["model_type"]
ppinsidedict["model_type_id"] = modelTypeID
ppinsidedict["pass_evalue"] = pass_evalue
ppinsidedict["pass_bitscore"] = pass_bitscore
ppinsidedict["ARO_accession"] = json_data[modelID]["ARO_accession"]
ppinsidedict["ARO_name"] = json_data[modelID]["ARO_name"]
ppinsidedict["ARO_category"] = json_data[modelID]["ARO_category"]
ppinsidedict["evalue"] = hsp.expect
ppinsidedict["bit_score"] = hsp.bits
ppinsidedict["max_identities"] = hsp.identities
ppinsidedict["cvterm_id"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["NCBI_taxonomy"]["NCBI_taxonomy_cvterm_id"]
ppinsidedict["query"] = hsp.query
ppinsidedict["match"] = hsp.match
ppinsidedict["sequence_from_db"] = hsp.sbjct
ppinsidedict["sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["protein_sequence"]["sequence"]
ppinsidedict["dna_sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["dna_sequence"]["sequence"]
if self.input_type == 'contig':
ppinsidedict["query_start"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3
ppinsidedict["query_end"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3 + realQueryLength*3 - 1
ppinsidedict["orf_strand"] = self.extract_nth_hash(orfInfo.decode(), 3)
ppinsidedict["orf_start"] = self.extract_nth_hash(orfInfo.decode(), 1)
ppinsidedict["orf_end"] = self.extract_nth_hash(orfInfo.decode(), 2)
ppinsidedict["orf_from"] = self.extract_nth_hash(orfInfo.decode(), 0).rstrip()
if orfInfo.decode().split(' # ')[0] in predicted_genes_dict:
ppinsidedict["orf_dna_sequence"] = predicted_genes_dict[orfInfo.decode().split(' # ')[0]]
ppinsidedict["orf_prot_sequence"] = str(Seq(predicted_genes_dict[orfInfo.decode().split(' # ')[0]], generic_dna).translate(table=11)).strip("*")
else:
ppinsidedict["orf_dna_sequence"] = ""
ppinsidedict["orf_prot_sequence"] = ""
elif self.input_type == 'protein':
ppinsidedict["query_start"] = hsp.query_start
ppinsidedict["query_end"] = hsp.query_start + realQueryLength
ppinsidedict["query_from"] = blast_record.query
ppinsidedict["orf_prot_sequence"] = orf_protein_sequence
elif self.input_type == 'read':
pass
ppinsidedict["perc_identity"] = float(format(float(ppinsidedict["max_identities"]*100) / len(ppinsidedict["query"]),'.2f'))
perfect["{}|hsp_num:{}".format(hitid.decode(),init)] = ppinsidedict
init += 1
elif hsp.bits >= float(pass_bitscore):
""" Strict hits """
# logger.info("Strict hits")
insidedict = {}
insidedict["type_match"] = "Strict"
insidedict["orf_strand"] = self.extract_nth_bar(orfInfo.decode(), 0)
insidedict["orf_start"] = self.extract_nth_bar(orfInfo.decode(), 1)
insidedict["orf_end"] = self.extract_nth_bar(orfInfo.decode(), 2)
insidedict["orf_from"] = orffrom.decode()
insidedict["model_name"] = json_data[modelID]["model_name"]
insidedict["model_type"] = json_data[modelID]["model_type"]
insidedict["model_type_id"] = modelTypeID
insidedict["model_id"] = modelID
insidedict["pass_evalue"] = pass_evalue
insidedict["pass_bitscore"] = pass_bitscore
insidedict["ARO_accession"] = json_data[modelID]["ARO_accession"]
insidedict["ARO_name"] = json_data[modelID]["ARO_name"]
insidedict["ARO_category"] = json_data[modelID]["ARO_category"]
insidedict["evalue"] = hsp.expect
insidedict["bit_score"] = hsp.bits
insidedict["max_identities"] = hsp.identities
insidedict["cvterm_id"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["NCBI_taxonomy"]["NCBI_taxonomy_cvterm_id"]
insidedict["query"] = hsp.query
insidedict["match"] = hsp.match
insidedict["sequence_from_db"] = hsp.sbjct
insidedict["sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["protein_sequence"]["sequence"]
insidedict["dna_sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["dna_sequence"]["sequence"]
if self.input_type == 'contig':
insidedict["query_start"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3
insidedict["query_end"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3 + realQueryLength*3 - 1
insidedict["orf_strand"] = self.extract_nth_hash(orfInfo.decode(), 3)
insidedict["orf_start"] = self.extract_nth_hash(orfInfo.decode(), 1)
insidedict["orf_end"] = self.extract_nth_hash(orfInfo.decode(), 2)
insidedict["orf_from"] = self.extract_nth_hash(orfInfo.decode(), 0).rstrip()
if orfInfo.decode().split(' # ')[0] in predicted_genes_dict:
insidedict["orf_dna_sequence"] = predicted_genes_dict[orfInfo.decode().split(' # ')[0]]
insidedict["orf_prot_sequence"] = str(Seq(predicted_genes_dict[orfInfo.decode().split(' # ')[0]], generic_dna).translate(table=11)).strip("*")
else:
insidedict["orf_dna_sequence"] = ""
insidedict["orf_prot_sequence"] = ""
elif self.input_type == 'protein':
insidedict["query_start"] = hsp.query_start
insidedict["query_end"] = hsp.query_start + realQueryLength
insidedict["query_from"] = blast_record.query
insidedict["orf_prot_sequence"] = orf_protein_sequence
elif self.input_type == 'read':
pass
insidedict["perc_identity"] = float(format(float(insidedict["max_identities"]*100) / len(insidedict["query"]),'.2f'))
strict["{}|hsp_num:{}".format(hitid.decode(),init)] = insidedict
init += 1
else:
""" Loose hits """
# logger.info("Loose hits")
linsidedict = {}
linsidedict["type_match"] = "Loose"
linsidedict["orf_strand"] = self.extract_nth_bar(orfInfo.decode(), 0)
linsidedict["orf_start"] = self.extract_nth_bar(orfInfo.decode(), 1)
linsidedict["orf_end"] = self.extract_nth_bar(orfInfo.decode(), 2)
linsidedict["orf_from"] = orffrom.decode().strip()
linsidedict["model_name"] = json_data[modelID]["model_name"]
linsidedict["model_type"] = json_data[modelID]["model_type"]
linsidedict["model_type_id"] = modelTypeID
linsidedict["pass_evalue"] = pass_evalue
linsidedict["pass_bitscore"] = pass_bitscore
linsidedict["model_id"] = modelID
linsidedict["ARO_accession"] = json_data[modelID]["ARO_accession"]
linsidedict["ARO_name"] = json_data[modelID]["ARO_name"]
linsidedict["ARO_category"] = json_data[modelID]["ARO_category"]
linsidedict["evalue"] = hsp.expect
linsidedict["max_identities"] = hsp.identities
linsidedict["bit_score"] = hsp.bits
linsidedict["cvterm_id"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["NCBI_taxonomy"]["NCBI_taxonomy_cvterm_id"]
linsidedict["query"] = hsp.query
linsidedict["match"] = hsp.match
linsidedict["sequence_from_db"] = hsp.sbjct
linsidedict["sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["protein_sequence"]["sequence"]
linsidedict["dna_sequence_from_broadstreet"] = json_data[modelID]["model_sequences"]["sequence"][seqinModel]["dna_sequence"]["sequence"]
if self.input_type == 'contig':
linsidedict["query_start"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3
linsidedict["query_end"] = self.extract_nth_hash(orfInfo.decode(), 1) + (hsp.query_start - 1)*3 + realQueryLength*3 - 1
linsidedict["orf_strand"] = self.extract_nth_hash(orfInfo.decode(), 3)
linsidedict["orf_start"] = self.extract_nth_hash(orfInfo.decode(), 1)
linsidedict["orf_end"] = self.extract_nth_hash(orfInfo.decode(), 2)
linsidedict["orf_from"] = self.extract_nth_hash(orfInfo.decode(), 0)
if orfInfo.decode().split(' # ')[0] in predicted_genes_dict:
linsidedict["orf_dna_sequence"] = predicted_genes_dict[orfInfo.decode().split(' # ')[0]]
linsidedict["orf_prot_sequence"] = str(Seq(predicted_genes_dict[orfInfo.decode().split(' # ')[0]], generic_dna).translate(table=11)).strip("*")
else:
linsidedict["orf_dna_sequence"] = ""
linsidedict["orf_prot_sequence"] = ""
elif self.input_type == 'protein':
linsidedict["query_start"] = hsp.query_start
linsidedict["query_end"] = hsp.query_start + realQueryLength
linsidedict["query_from"] = blast_record.query
linsidedict["orf_prot_sequence"] = orf_protein_sequence
elif self.input_type == 'read':
pass
linsidedict["perc_identity"] = float(format(float(linsidedict["max_identities"]*100) / len(linsidedict["query"]), '.2f'))
loose["{}|hsp_num:{}".format(hitid.decode(),init)] = linsidedict
init += 1
blastResults = self.results(blastResults, blast_record.query, perfect, strict , loose)
return blastResults