.TH MMSEQS2 "1" "July 2019" "MMseqs2 (Many against Many sequence searching)"
.SH NAME
MMseqs2 \- Many-against-Many sequence searching: fast, parallelized protein sequence searches and clustering of huge protein sequence data sets
.SH SYNOPSIS
.B mmseqs
.I <module>
.I args
.SH DESCRIPTION
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open-source, GPL-licensed software implemented in C++ for Linux,
macOS, and (as a beta version, via Cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run up to 10000 times faster than BLAST;
at 100 times BLAST's speed it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI\-BLAST at over 400 times its speed.
.PP
The following
.I <module>
commands are available, grouped by category.
.PP
Easy workflows (for non\-experts)
.PP
For example, an easy\-search run is invoked as
.B mmseqs
easy\-search
.I <queryFasta> <targetFasta|targetDB> <alignmentFile> <tmpDir>
.br
.TP
easy\-search
Search a query FASTA file against a target FASTA file (or database) and return BLAST\-compatible results in a single step
.TP
easy\-linsearch
Linear\-time search of a query FASTA file against a target FASTA file (or database), returning BLAST\-compatible results in a single step
.TP
easy\-linclust
Cluster the sequences of a FASTA/FASTQ file in linear time. The workflow outputs the representative sequences, a cluster TSV file, and a FASTA\-like file containing all sequences.
.TP
easy\-cluster
Cluster the sequences of a FASTA file. The workflow outputs the representative sequences, a cluster TSV file, and a FASTA\-like file containing all sequences.
.TP
easy\-taxonomy
Compute taxonomy and lowest common ancestor for each sequence. The workflow outputs a taxonomic classification for each sequence and a hierarchical summary report.
.PP
Main tools (for non\-experts)
.TP
createdb
Convert a protein or nucleotide sequence set in a FASTA file to the MMseqs2 sequence DB format
.TP
search
.br
Search a query sequence or profile DB (optionally iteratively) against a target sequence DB
.TP
linsearch
Search a query sequence DB against a target sequence DB in linear time
.TP
map
.br
Fast ungapped mapping of query sequences to target sequences.
.TP
cluster
Compute clustering of a sequence DB (quadratic time)
.TP
linclust
Cluster sequences of >30% sequence identity \fIin linear time\fR
.TP
createindex
Precompute index table of sequence DB for faster searches
.TP
createlinindex
Precompute index for linsearch
.TP
enrich
.br
Enrich a query set by searching iteratively through a profile sequence set.
.TP
rbh
.br
Find reciprocal best hits between query and target
.TP
clusterupdate
Update clustering of old sequence DB to clustering of new sequence DB
.PP
Utility tools for format conversions
.TP
createtsv
Create tab\-separated flat file from prefilter DB, alignment DB, cluster DB, or taxa DB
.TP
convertalis
Convert alignment DB to BLAST\-tab format or specified custom\-column output format
.TP
convertprofiledb
Convert ffindex DB of HMM files to profile DB
.TP
convert2fasta
Convert sequence DB to FASTA format
.TP
result2flat
Create a FASTA\-like flat file from prefilter DB, alignment DB, or cluster DB
.TP
createseqfiledb
Create DB of unaligned FASTA files (1 per cluster) from sequence DB and cluster DB
.PP
Taxonomy tools
.TP
taxonomy
Compute taxonomy and lowest common ancestor for each sequence.
.TP
createtaxdb
Annotate a sequence database with NCBI taxonomy information
.TP
addtaxonomy
Add taxonomy information to result database.
.TP
lca
.br
Compute the lowest common ancestor from a set of taxa.
.TP
taxonomyreport
Create Kraken\-style taxonomy report.
.TP
filtertaxdb
Filter taxonomy database.
.PP
Multi\-hit search tools
.TP
multihitdb
Create a sequence database and associated metadata for multi\-hit searches
.TP
multihitsearch
Search with a grouped set of sequences against another grouped set
.TP
besthitperset
For each set of sequences, compute the best element and update the p\-value
.TP
combinepvalperset
For each set compute the combined p\-value
.TP
summerizeresultsbyset
For each set compute summary statistics, such as spread\-pvalue etc.
.TP
resultsbyset
For each set compute the combined p\-value
.TP
mergeresultsbyset
Merge results from multiple ORFs back to their respective contigs
.PP
Utility tools for clustering
.TP
mergeclusters
Merge multiple cluster DBs into single cluster DB
.PP
Core tools (for advanced users)
.TP
prefilter
Search with query sequence / profile DB through target DB (k\-mer matching + ungapped alignment)
.TP
ungappedprefilter
Search with query sequence / profile DB through target DB and compute optimal ungapped alignment score