Commits on Source (10)
language: perl
sudo: false
- hmmer
- bedtools
- "5.26"
- "export PATH=$PWD/bin:$PATH"
- "sed -i~ -e 's/-name+/-name/' bin/barrnap"
- "barrnap --version"
- "barrnap --help"
- "barrnap --citation"
- "! barrnap --doesnotexist"
- "barrnap 2>&1 | grep 'ERROR: No input file'"
- "barrnap -q --kingdom bac examples/bacteria.fna"
- "barrnap -q --kingdom arc examples/bacteria.fna"
- "barrnap -q --kingdom mito examples/mitochondria.fna"
- "barrnap -q --kingdom euk examples/fungus.fna"
- "! barrnap examples/empty.fna"
- "! barrnap examples/null.fna"
- "barrnap -q examples/small.fna | grep 16S_rRNA"
- "barrnap -q < examples/small.fna | grep 16S_rRNA"
- "barrnap -q - < examples/small.fna | grep 16S_rRNA"
- "barrnap examples/nohits.fna 2>&1 | grep 'Found 0 '"
- "barrnap --threads 2 examples/small.fna"
- "barrnap -q --incseq examples/small.fna | grep '^>'"
- "barrnap -q --outseq hits.fa < examples/small.fna && head -n3 hits.fa"
# Barrnap
BAsic Rapid Ribosomal RNA Predictor
## Author
Torsten Seemann - - @torstenseemann
## Description
Barrnap predicts the location of ribosomal RNA genes in genomes.
It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S),
mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).
metazoan mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).
It takes FASTA DNA sequence as input, and writes GFF3 as output.
It uses the new NHMMER tool that comes with HMMER 3.1 for HMM searching in RNA:DNA style.
NHMMER binaries for 64-bit Linux and Mac OS X are included and will be auto-detected.
It uses the new `nhmmer` tool that comes with HMMER 3.1 for HMM searching in RNA:DNA style.
Multithreading is supported and one can expect roughly linear speed-ups with more CPUs.
## Download
* Tarballs:
* Source:
## Install
% cd $HOME
% tar zxvf barrnap-0.X.tar.gz
% echo "PATH=$PATH:$HOME/barrnap-0.x/bin" >> .bashrc
(logout and log back in)
## Installation
### Requirements
* Perl 5.xx (core modules only)
* nhmmer (part of HMMER 3.x)
* bedtools
### Conda
Install Conda or Miniconda:
conda install -c bioconda -c conda-forge barrnap
barrnap --version
### Homebrew
Install Homebrew (Mac OS X) or Linuxbrew (Linux).
brew install brewsci/bio/barrnap
barrnap --help
### Source
This will install the latest version directly from GitHub.
You'll need to add the `bin` directory to your PATH.
cd $HOME
tar zxvf barrnap-0.X.tar.gz
barrnap-0.X/barrnap -h
## Usage
% barrnap --quiet examples/small.fna
##gff-version 3
P.marinus barrnap:0.7 rRNA 353314 354793 0 + . Name=16S_rRNA;product=16S ribosomal RNA
P.marinus barrnap:0.7 rRNA 355464 358334 0 + . Name=23S_rRNA;product=23S ribosomal RNA
P.marinus barrnap:0.7 rRNA 358433 358536 7.5e-07 + . Name=5S_rRNA;product=5S ribosomal RNA
% barrnap -q -k mito examples/mitochondria.fna
##gff-version 3
AF346967.1 barrnap:0.7 rRNA 643 1610 . + . Name=12S_rRNA;product=12S ribosomal RNA
AF346967.1 barrnap:0.7 rRNA 1672 3228 . + . Name=16S_rRNA;product=16S ribosomal RNA
% barrnap --quiet examples/small.fna
##gff-version 3
P.marinus barrnap:0.8 rRNA 353314 354793 0 + . Name=16S_rRNA;product=16S ribosomal RNA
P.marinus barrnap:0.8 rRNA 355464 358334 0 + . Name=23S_rRNA;product=23S ribosomal RNA
P.marinus barrnap:0.8 rRNA 358433 358536 7.5e-07 + . Name=5S_rRNA;product=5S ribosomal RNA
% barrnap -q -k mito examples/mitochondria.fna
##gff-version 3
AF346967.1 barrnap:0.8 rRNA 643 1610 . + . Name=12S_rRNA;product=12S ribosomal RNA
AF346967.1 barrnap:0.8 rRNA 1672 3228 . + . Name=16S_rRNA;product=16S ribosomal RNA
% barrnap -o rrna.fa < contigs.fa > rrna.gff
% head -n 3 rrna.fa
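The GFF3 lines shown above can be post-processed with a few lines of code. The following is a minimal sketch in Python, assuming the columns are tab-separated as GFF3 requires; the `parse_gff3` helper and the `SAMPLE` records (copied from the example output above) are illustrative and not part of Barrnap.

```python
def parse_gff3(lines):
    """Yield one dict per GFF3 feature line, skipping directives and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow (as with --incseq); stop here
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        # Attributes look like: Name=16S_rRNA;product=16S ribosomal RNA
        tags = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        yield {
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),   # 1-based, inclusive
            "end": int(end),       # 1-based, inclusive
            "score": None if score == "." else float(score),
            "strand": strand,
            "name": tags.get("Name"),
            "product": tags.get("product"),
        }


# Records copied from the usage example; tabs restored by hand.
SAMPLE = [
    "##gff-version 3",
    "P.marinus\tbarrnap:0.8\trRNA\t353314\t354793\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA",
    "P.marinus\tbarrnap:0.8\trRNA\t355464\t358334\t0\t+\t.\tName=23S_rRNA;product=23S ribosomal RNA",
    "P.marinus\tbarrnap:0.8\trRNA\t358433\t358536\t7.5e-07\t+\t.\tName=5S_rRNA;product=5S ribosomal RNA",
]

for feat in parse_gff3(SAMPLE):
    print(feat["name"], feat["seqid"], feat["start"], feat["end"], feat["strand"])
```

In practice the same function could be fed an open file handle for `rrna.gff` instead of the in-memory list.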
## Options
### General
* `--help` show help and exit
* `--version` print version in form `barrnap X.Y` and exit
* `--citation` print a citation and exit
### Search
* `--kingdom` is the database to use: Bacteria:`bac`, Archaea:`arc`, Eukaryota:`euk`, Metazoan Mitochondria:`mito`
* `--threads` is how many CPUs to assign to `nhmmer` search
* `--evalue` is the cut-off for `nhmmer` reporting, before further scrutiny
* `--lencutoff` is the proportion of the full length that qualifies as `partial` match
* `--reject` will not include hits below this proportion of the expected length
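To illustrate how `--lencutoff` and `--reject` interact, here is a rough Python sketch of the length filter they describe. It is an assumption-laden paraphrase, not the script's actual code: the `classify_hit` function is hypothetical, the comparisons are assumed to be strict, the defaults are taken from the option table in this diff, and the expected length of 1500 bp is a made-up illustrative value.

```python
def classify_hit(hit_len, expected_len, lencutoff=0.8, reject=0.25):
    """Label a hit by the fraction of the expected gene length it covers.

    Returns "rejected", "partial" or "full".
    """
    if expected_len <= 0:
        raise ValueError("expected_len must be positive")
    frac = hit_len / expected_len
    if frac < reject:
        return "rejected"   # too short to report at all
    if frac < lencutoff:
        return "partial"    # reported, with product marked "(partial)"
    return "full"


EXPECTED_16S = 1500  # hypothetical expected length, for illustration only

for length in (1480, 900, 300):
    print(length, classify_hit(length, EXPECTED_16S))
```

Because `--reject` is applied first, setting it above `--lencutoff` would mean no hit is ever labelled partial.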
### Output
* `--quiet` will not print any messages to `stderr`
* `--incseq` will include the full input sequences in the output GFF
* `--outseq` creates a FASTA file with the hit sequences
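For `--outseq`, the script in this diff turns each hit into a BED row and hands the rows to `bedtools getfasta`. The coordinate handling can be sketched as below, in Python rather than Perl. The `hit_to_bed` name is hypothetical; the logic follows the lines in the diff that swap the alignment coordinates for minus-strand hits and subtract one from the start, since BED is 0-based and half-open whereas GFF3 is 1-based and inclusive.

```python
def hit_to_bed(seqid, ali_from, ali_to, gene):
    """Convert one nhmmer-style hit to a 6-column BED tuple.

    nhmmer reports ali_from > ali_to for hits on the reverse strand,
    so the pair is ordered first and the strand is derived from it.
    """
    if ali_from < ali_to:
        begin, end, strand = ali_from, ali_to, "+"
    else:
        begin, end, strand = ali_to, ali_from, "-"
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return (seqid, begin - 1, end, gene, 100, strand)


rows = [
    hit_to_bed("P.marinus", 353314, 354793, "16S_rRNA"),
    hit_to_bed("contig2", 200, 101, "5S_rRNA"),  # made-up reverse-strand hit
]
for row in rows:
    print("\t".join(str(field) for field in row))
```

The constant 100 in the score column mirrors the placeholder value used in the script.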
## Caveats
Barrnap does not do anything fancy. It has HMM models for each different rRNA gene.
They are built from full length seed alignments.
## Requirements
* Perl >= 5.6
* HMMER >= 3.1b
## License
Barrnap is free software, released under the GPL (version 3).
## Comparison with RNAmmer
Barrnap is designed to be a substitute for RNAmmer. It was motivated by
my desire to remove <A HREF="">Prokka's</A> dependency on RNAmmer
which is encumbered by a free-for-academic sign-up license, and by RNAmmer's
dependence on legacy HMMER 2.x which conflicts with HMMER 3.x that most people are using now.
Barrnap is designed to be a substitute for RNAmmer.
It was motivated by my desire to remove Prokka's
dependency on RNAmmer which is encumbered by a free-for-academic sign-up
license, and by RNAmmer's dependence on legacy HMMER 2.x which conflicts
with HMMER 3.x that most people are using now.
RNAmmer is more sophisticated than Barrnap, and more accurate because it
uses HMMER 2.x in glocal alignment mode whereas NHMMER 3.x currently only
supports local alignment (Sean Eddy expected glocal to be supported in 2014,
but it still isn't available in 2018).
RNAmmer is more sophisticated than Barrnap, and more accurate because it uses HMMER 2.x in glocal alignment mode whereas NHMMER 3.x currently only supports local alignment (Sean Eddy expects glocal to be supported in 2014). In practice, Barrnap will find all the typical rRNA genes in a few seconds (in bacteria), but may get the end points out by a few bases and will probably miss weird rRNAs. The HMM models it uses are derived from Rfam, Silva and RefSeq.
In practice, Barrnap will find all the typical rRNA genes in a few seconds
(in bacteria), but may get the end points out by a few bases and will
probably miss weird rRNAs. The HMM models it uses are derived from Rfam,
Silva and RefSeq.
## Data sources for HMM models
Bacteria (70S)
5S RF00001
......@@ -90,12 +130,16 @@ Eukarya (80S)
18S RF01960
Metazoan Mito
12S RefSeq (MT-RNR1, s-rRNA, rns)
16S RefSeq (MT-RNR2, l-rRNA, rnl)
TODO: [Sajeet Haridas]
## Models I would like to add
Fungi [Sajeet Haridas]
LSU 35S ?
......@@ -106,16 +150,16 @@ Fungi
21S (multiple exons)
Apicoplast []
LSU ~2500bp 28S ?
SSU ~1500bp 16S ?
Plastid [Shaun Jackman]
Plant [Shaun Jackman]
Mito []
5S ~118 bp ? rrn5 (use RF00001 ?)
18S ~1935 bp ? rrn18 (use RF01960 ?)
26S ~2568 bp ? rrn26
## Where does the name come from?
......@@ -125,3 +169,13 @@ given the new backronym _BAsic Rapid Ribosomal RNA Predictor_.
The project was originally spawned at CodeFest 2013 in Berlin, Germany
by Torsten Seemann and Tim Booth.
## License
* Barrnap: GPLv3
* Rfam: CC0
* SILVA: Free for academic use
## Author
Torsten Seemann
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
use List::Util qw(max);
use FindBin;
use File::Temp;
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# global variables
my $VERSION = "0.9";
my $EXE = $FindBin::RealScript;
my $VERSION = "0.8";
my $DESC = "rapid ribosomal RNA prediction";
my $AUTHOR = 'Torsten Seemann <>';
my $AUTHOR = 'Torsten Seemann';
my $URL = '';
my $DBDIR = "$FindBin::RealBin/../db";
my $OPSYS = $^O;
......@@ -27,7 +27,7 @@ my $MAXLEN = int( 1.2 * max(values %LENG) );
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# command line options
my(@Options, $quiet, $kingdom, $threads, $evalue, $lencutoff, $reject, $incseq);
my(@Options, $quiet, $kingdom, $threads, $evalue, $lencutoff, $reject, $incseq, $outseq);
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
......@@ -40,10 +40,8 @@ msg("Detected operating system: $OPSYS");
msg("Adding $BINDIR to end of PATH");
my($NHMMER) = qx(which -a nhmmer 2> /dev/null);
$NHMMER or err("Could not find 'nhmmer' executable in PATH");
chomp $NHMMER;
msg("Using HMMER binary: $NHMMER");
msg("Checking for dependencies:");
require_exe('nhmmer', 'bedtools');
$threads > 0 or err("Invalid --threads $threads");
msg("Will use $threads threads");
......@@ -58,27 +56,43 @@ $reject > 0 or err("Invalid --reject cutoff $reject");
msg("Will reject genes < $reject of expected length.");
my $kdom = $KINGDOM{ lc substr($kingdom,0,1) } or
err("I don't recognise --kingdom '$kingdom'. Try: bac arc euk mito");
err("I don't recognise --kingdom '$kingdom'. Try:", values(%KINGDOM) );
my $hmmdb = "$DBDIR/$kdom.hmm";
err("Can't find database: $hmmdb") unless -r $hmmdb;
msg("Using database: $hmmdb");
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# run the external command
# check if user is piping to STDIN
# nhmmer needs to fseek() so we make a temp fasta file
my $fasta = shift @ARGV;
$fasta && -r $fasta or err("Usage: $EXE <file.fasta>");
my $tmpfh;
if (defined($fasta) && $fasta eq '-' or !defined($fasta) && !-t \*STDIN) {
$tmpfh = File::Temp->new(UNLINK=>1);
msg("Copying STDIN to a temporary file:", $tmpfh->filename);
while (<STDIN>) {
print $tmpfh $_;
$fasta = $tmpfh->filename;
$fasta && -r $fasta or err("No input file on command line or stdin");
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# run the external command
msg("Scanning $fasta for $kdom rRNA genes... please wait");
my $cmd = "$NHMMER --cpu $threads -E $evalue --w_length $MAXLEN -o /dev/null --tblout /dev/stdout \Q$hmmdb\E \Q$fasta\E";
my $opts = "--cpu $threads -E $evalue --w_length $MAXLEN -o /dev/null --tblout /dev/stdout";
my $cmd = "nhmmer $opts '$hmmdb' '$fasta'";
msg("Command: $cmd");
my @hits = qx($cmd 2>&1);
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# process the output
my %hitname;
my @feat;
my @bed;
foreach (@hits) {
......@@ -88,10 +102,14 @@ foreach (@hits) {
my @x = split ' ', $_;
err("bad line in nhmmer output - @x") unless defined $x[6] and $x[6] =~ m/^\d+$/;
# massage to GFF
my($begin,$end,$strand) = $x[6] < $x[7] ? ($x[6],$x[7],'+') : ($x[7],$x[6],'-');
my($seqid, $gene, $prod) = ($x[0], $x[2], $x[2]);
my $score = defined $x[12] ? $x[12] : '.';
# record hits for --outseq retrieval later
$hitname{"$seqid/$begin:$end($strand)"} = $prod;
# check if hit makes sense to us
exists $LENG{$gene} or err("Detected unknown gene '$gene' in scan, aborting.");
......@@ -112,6 +130,9 @@ foreach (@hits) {
$prod .= " (partial)";
# keep track of good hits for retrieval later
push @bed, [ $seqid, $begin-1, $end, $gene, 100, $strand ];
msg("Found:", $gene, $seqid, "L=$len/$LENG{$gene}", "$begin..$end", $strand, $prod);
my $tags = "Name=$gene;product=$prod";
$tags .= ";note=$note" if $note;
......@@ -140,38 +161,68 @@ if ($incseq) {
print while (<FASTA>); # `cat $fasta`
if ($outseq) {
msg("Writing hit sequences to: $outseq");
my $bed = File::Temp->new();
for my $b (@bed) {
print $bed join("\t", @$b),"\n";
$bed->seek(0, SEEK_END); # seek to end of file (not a rewind); this also flushes buffered writes
my $cmd = "bedtools getfasta -s -name+ -fo '$outseq' -fi '$fasta' -bed '".$bed->filename."'";
msg("Running: $cmd");
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# final cleanup
sub require_exe {
for my $exe (@_) {
my($which) = qx(which $exe 2> /dev/null);
$which or err("Can not find required '$exe' in PATH");
chomp $which;
msg("Found $exe - $which");
sub revcom {
my($s) = @_;
$s = reverse($s);
$s =~ tr/ATCGatcg/TAGCtagc/;
return $s;
sub msg {
return if $quiet;
my $t = localtime;
my $line = "[".$t->hms."] @_\n";
my $line = "[$EXE] @_\n";
print STDERR $line;
sub err {
msg("ERROR:", @_);
sub version {
sub show_citation {
print STDERR << "EOCITE";
If you use Barrnap in your work, please cite:
Seemann T (2013)
Seemann T
......@@ -193,19 +244,19 @@ sub setOptions {
{OPT=>"help", VAR=>\&usage, DESC=>"This help"},
{OPT=>"version", VAR=>\&version, DESC=>"Print version and exit"},
{OPT=>"citation",VAR=>\&show_citation, DESC=>"Print citation for referencing $EXE"},
{OPT=>"kingdom=s", VAR=>\$kingdom, DEFAULT=>'bac',
DESC=>"Kingdom: ".join(' ', values %KINGDOM) },
{OPT=>"kingdom=s", VAR=>\$kingdom, DEFAULT=>'bac', DESC=>"Kingdom: ".join(' ', values %KINGDOM) },
{OPT=>"quiet!", VAR=>\$quiet, DEFAULT=>0, DESC=>"No screen output"},
{OPT=>"threads=i", VAR=>\$threads, DEFAULT=>8, DESC=>"Number of threads/cores/CPUs to use"},
{OPT=>"threads=i", VAR=>\$threads, DEFAULT=>1, DESC=>"Number of threads/cores/CPUs to use"},
{OPT=>"lencutoff=f",VAR=>\$lencutoff, DEFAULT=>0.8, DESC=>"Proportional length threshold to label as partial"},
{OPT=>"reject=f",VAR=>\$reject, DEFAULT=>0.5, DESC=>"Proportional length threshold to reject prediction"},
{OPT=>"reject=f",VAR=>\$reject, DEFAULT=>0.25, DESC=>"Proportional length threshold to reject prediction"},
{OPT=>"evalue=f",VAR=>\$evalue, DEFAULT=>1E-6, DESC=>"Similarity e-value cut-off"},
{OPT=>"incseq!", VAR=>\$incseq, DEFAULT=>0, DESC=>"Include FASTA input sequences in GFF3 output"},
{OPT=>"incseq!", VAR=>\$incseq, DEFAULT=>0, DESC=>"Include FASTA _input_ sequences in GFF3 output"},
{OPT=>"outseq=s", VAR=>\$outseq, DEFAULT=>'', DESC=>"Save rRNA hit seqs to this FASTA file"},
(!@ARGV) && (usage());
# (!@ARGV) && (usage(1));
&GetOptions(map {$_->{OPT}, $_->{VAR}} grep { ref } @Options) || usage();
&GetOptions(map {$_->{OPT}, $_->{VAR}} grep { ref } @Options) || usage(1);
# Now setup default values.
foreach (@Options) {
......@@ -218,9 +269,17 @@ sub setOptions {
sub usage {
print STDERR "Synopsis:\n $EXE $VERSION - $DESC\n";
print STDERR "Author:\n $AUTHOR\n";
print STDERR "Usage:\n $EXE [options] <chromosomes.fasta>\n";
my($exitcode) = @_;
$exitcode = 0 if $exitcode eq 'help'; # what gets passed by getopt func ref
$exitcode ||= 0;
select STDERR if $exitcode; # write to STDERR if exitcode is error
print "Synopsis:\n $EXE $VERSION - $DESC\n";
print "Author:\n $AUTHOR\n";
print "Usage:\n";
print " $EXE [options] chr.fa\n";
print " $EXE [options] < chr.fa\n";
print " $EXE [options] - < chr.fa\n";
foreach (@Options) {
if (ref) {
my $def = defined($_->{DEFAULT}) ? " (default '$_->{DEFAULT}')" : "";
......@@ -230,13 +289,13 @@ sub usage {
$opt =~ s/=s$/ [X]/;
$opt =~ s/=i$/ [N]/;
$opt =~ s/=f$/ [n.n]/;
printf STDERR " --%-15s %s%s\n", $opt, $_->{DESC}, $def;
printf " --%-15s %s%s\n", $opt, $_->{DESC}, $def;
else {
print STDERR "$_\n";
print "$_\n";
......@@ -68,9 +68,9 @@ fi
for K in arc bac euk mito ; do
for T in 5S 5_8S 12S 16S 23S 28S ; do
for T in 5S 5_8S 12S 16S 23S 18S 28S ; do
if [ -r "$ID.aln" ]; then
if [ -s "$ID.aln" ]; then
echo "*** $ID ***"
hmmbuild --cpu $CPUS --rna -n "${T}_rRNA" $T.$K.hmm $T.$K.aln
barrnap-data-nonfree (0.8-1) UNRELEASED; urgency=medium
barrnap-data-nonfree (0.9-1) UNRELEASED; urgency=medium
* Team upload.
* Fix Vcs URLs
* debhelper 10
* Priority: optional
* cme fix dpkg-control
* Moved packaging from SVN to Git
-- Andreas Tille <> Tue, 03 Jan 2017 09:33:20 +0100
barrnap-data-nonfree (0.5-2) unstable; urgency=low
* add license agreement dialog, update d/copyright
-- Sascha Steinbiss <> Sun, 22 Mar 2015 15:34:59 +0000
barrnap-data-nonfree (0.5-1) unstable; urgency=low
* Inject latest upstream version.
FIXME: use get-orig-source properly since my latest commits used
routine-update which only uses uscan if possible which is
not sufficient here
* Initial release (Closes: #776710)
-- Sascha Steinbiss <> Sat, 31 Jan 2015 10:12:42 +0000
-- Andreas Tille <> Mon, 08 Apr 2019 09:37:32 +0200
......@@ -4,11 +4,12 @@ Uploaders: Sascha Steinbiss <>
Section: non-free/science
XS-Autobuild: no
Priority: optional
Build-Depends: debhelper (>= 10),
Standards-Version: 3.9.8
Build-Depends: debhelper (>= 12~),
Standards-Version: 4.3.0
Package: barrnap-data-nonfree
Upstream-Name: barrnap
......@@ -2,33 +2,35 @@
# -*- makefile -*-
DEBVERS := $(shell dpkg-parsechangelog | sed -n -e 's/^Version: //p')
VERSION := $(shell echo '$(DEBVERS)' | sed -e 's/^[[:digit:]]*://' -e 's/[~-].*//')
include /usr/share/dpkg/
dh $@
# First uscan is used and afterwards the resulting tarball needs to be changed
# Thus this is no superfluous get-orig-source target
uscan --verbose --force-download --repack --compression xz --destdir=..
unxz ../barrnap-data-nonfree_$(VERSION).orig.tar.xz
unxz ../barrnap-data-nonfree_$(DEB_VERSION_UPSTREAM).orig.tar.xz
mkdir -p debian/repack-tmp
tar xf ../barrnap-data-nonfree_$(VERSION).orig.tar -C debian/repack-tmp
rm ../barrnap-data-nonfree_$(VERSION).orig.tar
mkdir -p debian/repack-tmp/barrnap-data-nonfree-$(VERSION)/db/nonfree
tar xf ../barrnap-data-nonfree_$(DEB_VERSION_UPSTREAM).orig.tar -C debian/repack-tmp
rm ../barrnap-data-nonfree_$(DEB_VERSION_UPSTREAM).orig.tar
mkdir -p debian/repack-tmp/barrnap-data-nonfree-$(DEB_VERSION_UPSTREAM)/db/nonfree
debian/filter_hmms.lua 28S t < \
debian/repack-tmp/barrnap-$(VERSION)/db/euk.hmm > \
debian/repack-tmp/barrnap-$(DEB_VERSION_UPSTREAM)/db/euk.hmm > \
debian/filter_hmms.lua 23S t < \
debian/repack-tmp/barrnap-$(VERSION)/db/arc.hmm > \
debian/repack-tmp/barrnap-$(DEB_VERSION_UPSTREAM)/db/arc.hmm > \
debian/filter_hmms.lua 23S t < \
debian/repack-tmp/barrnap-$(VERSION)/db/bac.hmm > \
mv debian/repack-tmp/barrnap-$(VERSION)/LICENSE.SILVA \
rm -rf debian/repack-tmp/barrnap-$(VERSION)
tar cf ../barrnap-data-nonfree_$(VERSION).orig.tar -C debian/repack-tmp \
xz ../barrnap-data-nonfree_$(VERSION).orig.tar
debian/repack-tmp/barrnap-$(DEB_VERSION_UPSTREAM)/db/bac.hmm > \
mv debian/repack-tmp/barrnap-$(DEB_VERSION_UPSTREAM)/LICENSE.SILVA \
rm -rf debian/repack-tmp/barrnap-$(DEB_VERSION_UPSTREAM)
tar cf ../barrnap-data-nonfree_$(DEB_VERSION_UPSTREAM).orig.tar -C debian/repack-tmp \
xz ../barrnap-data-nonfree_$(DEB_VERSION_UPSTREAM).orig.tar
rm -rf debian/repack-tmp