MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file mpa_v20_m200_marker_info.txt.bz2 can be found in the Download page here](https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v20_m200_marker_info.txt.bz2)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file `mpa_v20_m200_marker_info.txt.bz2` can be found in the Download page here](https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v20_m200_marker_info.txt.bz2)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
* unambiguous taxonomic assignments;
* accurate estimation of organismal relative abundance;
...
...
@@ -29,7 +44,7 @@ If you use StrainPhlAn, please cite the MetaPhlAn2 paper and the following Strai
-------------
## MetaPhlAn and StrainPhlAn resources
## MetaPhlAn and StrainPhlAn tutorials and resources
In addition to the information on this page, you can refer to the following additional resources.
...
...
@@ -43,15 +58,18 @@ In addition to the information on this page, you can refer to the following addi
* The related [bioBakery workflows](https://bitbucket.org/biobakery/biobakery/wiki/biobakery_workflows)
-------------
## Pre-requisites
MetaPhlAn requires *python 2.7* or higher with argparse, tempfile and[*numpy*](http://www.numpy.org/) libraries installed
(apart for numpy they are usually installed together with the python distribution).
MetaPhlAn requires *python 2.7* or higher with argparse, tempfile,[numpy](http://www.numpy.org/), and [Biopython](https://biopython.org/) libraries installed
(apart for numpy and Biopython, the others are usually installed together with the python distribution).
Python3 is also now supported.
MetaPhlAn requires the `read_fastx.py` script to be present in the system path, if not found MetaPhlAn will try to locate it in the folder containing the `metaphlan2.py`
script under `utils/read_fastx.py`.
In case you moved the `metaphlan2.py` script, please export the `read_fastx.py` script in your PATH bash variable.
**If you provide the SAM output of [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) as input, there are no additional prerequisite.**
* If you would like to use the BowTie2 integrated in MetaPhlAn, you need to have BowTie2 version 2.0.0 or higher and perl installed (bowtie2 needs to be in the system path with execute _and_ read permission)
...
...
@@ -62,22 +80,8 @@ MetaPhlAn requires *python 2.7* or higher with argparse, tempfile and [*numpy*](
* MetaPhlAn is not tightly integrated with advanced heatmap plotting with [hclust2](https://bitbucket.org/nsegata/hclust2) and cladogram visualization with [GraPhlAn](https://bitbucket.org/nsegata/graphlan/wiki/Home). If you use such visualization tool please refer to their prerequisites.
----------------------
## Installation
MetaPhlAn 2.0 can be obtained by either
Dowloading MetaPhlAn v2.0 [here](https://bitbucket.org/biobakery/metaphlan2/get/default.zip) or [here](https://www.dropbox.com/s/ztqr8qgbo727zpn/metaphlan2.zip?dl=0)
This section presents some basic usages of MetaPhlAn2, for more advanced usages, please see at [its wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
...
...
@@ -97,55 +101,28 @@ Here is the basic example to profile a metagenome from raw reads (requires BowTi
It is highly recommended to save the intermediate BowTie2 output for re-running MetaPhlAn extremely quickly (--bowtie2out), and use multiple CPUs (--nproc) if available:
It is highly recommended to save the intermediate BowTie2 output for re-running MetaPhlAn extremely quickly (`--bowtie2out`), and use multiple CPUs (`--nproc`) if available:
If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you can obtain the results in few seconds by using the previously saved --bowtie2out file and specifying the input (--input_type bowtie2out):
If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you can obtain the results in few seconds by using the previously saved `--bowtie2out` file and specifying the input (`--input_type bowtie2out`):
You can also provide an externally BowTie2-mapped SAM if you specify this format with --input_type. Two steps here: first map your metagenome with BowTie2 and then feed MetaPhlAn2 with the obtained sam:
You can also provide an externally BowTie2-mapped SAM if you specify this format with `--input_type`. Two steps here: first map your metagenome with BowTie2 and then feed MetaPhlAn2 with the obtained sam:
MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not use the paired-end information), and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
db['taxonomy']['taxonomy of genome1']=lengthofgenome1
...
...
@@ -577,12 +547,12 @@ In addition, the table below shows the number of snps between the sample strains
In the next sections, we will illustrate step by step how to run MetaPhlAn\_Strainer on this toy example to reproduce the above figures.
### Pre-requisites
StrainPhlAn requires *python 2.7* and the libraries [pysam](http://pysam.readthedocs.org/en/latest/)(tested on **version 0.8.3**), [biopython](http://biopython.org/wiki/Main_Page), [msgpack](https://pypi.python.org/pypi/msgpack-python) and [numpy](http://www.numpy.org/), [dendropy](https://pythonhosted.org/DendroPy/)(tested on version **3.12.0**). Besides, StrainPhlAn also needs the following programs in the executable path:
StrainPhlAn requires *python 2.7* and the libraries [pysam](http://pysam.readthedocs.org/en/latest/)(tested on **version 0.8.3**), [biopython](http://biopython.org/wiki/Main_Page), [msgpack](https://pypi.python.org/pypi/msgpack-python), [pandas](https://pandas.pydata.org)(tested on **version 0.22**), [numpy](http://www.numpy.org/)(tested on **version 1.14.2**) and [scipy](https://www.scipy.org)(tested on **version 1.0.0**), [dendropy](https://pythonhosted.org/DendroPy/)(tested on version **3.12.0**). Besides, StrainPhlAn also needs the following programs in the executable path:
*[bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) for mapping reads against the marker database.
*[MUSCLE](http://www.drive5.com/muscle/) for the alignment step.
*[samtools, bcftools and vcfutils.pl](http://samtools.sourceforge.net/) which can be downloaded from [here](https://github.com/samtools) for building consensus markers. Note that vcfutils.pl is included in bcftools and **StrainPhlAn only works with samtools version 0.1.19** as samtools has changed the output format after this version.
*[blastn](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) for adding reference genomes to the phylogenetic tree.
*[blast+](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) for adding reference genomes to the phylogenetic tree (blastn and makeblastdb commands)
*[raxmlHPC and raxmlHPC-PTHREADS-SSE3](http://sco.h-its.org/exelixis/web/software/raxml/index.html) for building the phylogenetic trees.
All dependence binaries on Linux 64 bit can be downloaded in the folder "bin" from [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
This step will take around 2 minutes. After this step, you will find the tree "output/RAxML\_bestTree.s\_\_Bacteroides\_caccae.tree". All the output files can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).