Commits on Source (4)
......@@ -17,7 +17,6 @@ nytprof.out
pm_to_blib
bug/
db/cm/*.i1*
db/kingdom/*/sprot.p*
db/kingdom/*/*.p??
db/hmm/*.h3?
db/genus/*.p*
language: perl
sudo: false
perl:
- "5.26"
......
[![Build Status](https://travis-ci.org/tseemann/prokka.svg?branch=master)](https://travis-ci.org/tseemann/prokka) [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) [](#lang-au) [![DOI:10.1093/bioinformatics/btu153](https://zenodo.org/badge/DOI/10.1093/bioinformatics/btu153.svg)](https://doi.org/10.1093/bioinformatics/btu153) ![Don't judge me](https://img.shields.io/badge/Language-Perl_5-steelblue.svg)
[![Build Status](https://travis-ci.org/tseemann/prokka.svg?branch=master)](https://travis-ci.org/tseemann/prokka)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![DOI:10.1093/bioinformatics/btu153](https://zenodo.org/badge/DOI/10.1093/bioinformatics/btu153.svg)](https://doi.org/10.1093/bioinformatics/btu153)
![Don't judge me](https://img.shields.io/badge/Language-Perl_5-steelblue.svg)
# Prokka: rapid prokaryotic genome annotation
......@@ -50,7 +53,7 @@ $HOME/prokka/bin/prokka --setupdb
## Test
* Type `prokka` and it should output it's help screen.
* Type `prokka` and it should output its help screen.
* Type `prokka --version` and you should see an output like `prokka 1.x`
* Type `prokka --listdb` and it will show you what databases it has installed to use.
......@@ -133,9 +136,11 @@ $HOME/prokka/bin/prokka --setupdb
-g linear -c PROK -n 11 -f PRJEB12345/EHEC-Chr1.embl \
"Escherichia coli" 562 PRJEB12345 "Escherichia coli strain EHEC" PRJEB12345/EHEC-Chr1.gff
# Download and run the EMBL validator prior to submitting the EMBL flat file
% curl -L -O ftp://ftp.ebi.ac.uk/pub/databases/ena/lib/embl-client.jar
% java -jar embl-client.jar -r PRJEB12345/EHEC-Chr1.embl
# Download and run the latest EMBL validator prior to submitting the EMBL flat file
# from http://central.maven.org/maven2/uk/ac/ebi/ena/sequence/embl-api-validator/
# which at the time of writing is v1.1.129
% curl -L -O http://central.maven.org/maven2/uk/ac/ebi/ena/sequence/embl-api-validator/1.1.129/embl-api-validator-1.1.129.jar
% java -jar embl-api-validator-1.1.129.jar -r PRJEB12345/EHEC-Chr1.embl
# Compress the file ready to upload to ENA, and calculate MD5 checksum
% gzip PRJEB12345/EHEC-Chr1.embl
......@@ -178,7 +183,6 @@ $HOME/prokka/bin/prokka --setupdb
General:
--help This help
--version Print version and exit
--docs Show full manual/documentation
--citation Print citation for referencing Prokka
--quiet No screen output (default OFF)
--debug Debug mode: keep all temporary files (default OFF)
......@@ -205,6 +209,7 @@ $HOME/prokka/bin/prokka --setupdb
Annotations:
--kingdom [X] Annotation mode: Archaea|Bacteria|Mitochondria|Viruses (default 'Bacteria')
--gcode [N] Genetic code / Translation table (set if --kingdom is set) (default '0')
--prodigaltf [X] Prodigal training file (default '')
--gram [X] Gram: -/neg +/pos (default '')
--usegenus Use genus-specific BLAST databases (needs --genus) (default OFF)
--proteins [X] Fasta file of trusted proteins to first annotate from (default '')
......@@ -235,6 +240,13 @@ use of Genbank is recommended over FASTA, because it will provide `/gene`
and `/EC_number` annotations that a typical `.faa` file will not provide, unless
you have specially formatted it for Prokka.
### Option: --prodigaltf
Instead of letting `prodigal` train its gene model on the contigs you
provide, you can pre-train it on some good closed reference genomes first
using the `prodigal -t` option. Once you've done that, provide `prokka`
the training file using the `--prodigaltf` option.
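
A minimal sketch of that two-step workflow, assuming hypothetical file names (`reference.fna`, `reference.trn`, `contigs.fa`) and that both `prodigal` and `prokka` are on the `PATH`; each step is skipped if the tool or its input is missing:

```shell
# Hypothetical file names, for illustration only.
ref=reference.fna      # closed reference genome used for training
trn=reference.trn      # Prodigal training file to create
contigs=contigs.fa     # assembly you actually want to annotate

# Step 1: train Prodigal on the reference.
# With -t, Prodigal writes the training file if it does not exist yet.
if command -v prodigal >/dev/null 2>&1 && [ -r "$ref" ]; then
  prodigal -i "$ref" -t "$trn"
fi

# Step 2: annotate with Prokka, reusing the training file.
if command -v prokka >/dev/null 2>&1 && [ -r "$trn" ] && [ -r "$contigs" ]; then
  prokka --prodigaltf "$trn" --outdir annotated --prefix sample "$contigs"
fi
```

Prokka only passes the training file to `prodigal` when the path is readable, so it is worth checking the log for the "Gene finding will be aided by Prodigal training file" message.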
### Option: --rawproduct
Prokka annotates proteins by using sequence similarity to other proteins in its database,
......@@ -262,11 +274,20 @@ BLAST+. This combination of small database and fast search typically
completes about 70% of the workload. Then a series of slower but more
sensitive HMM databases are searched using HMMER3.
The initial core databases are derived from UniProtKB; there is one per
"kingdom" supported. To qualify for inclusion, a protein must be (1) from
Bacteria (or Archaea or Viruses); (2) not be "Fragment" entries; and (3)
have an evidence level ("PE") of 2 or lower, which corresponds to
experimental mRNA or proteomics evidence.
The three core databases, applied in order, are:
1. [ISfinder](https://isfinder.biotoul.fr/):
Only the transposase (protein) sequences; the whole transposon is not annotated.
2. [NCBI Bacterial Antimicrobial Resistance Reference Gene Database](https://www.ncbi.nlm.nih.gov/bioproject/313047):
Antimicrobial resistance genes curated by NCBI.
3. [UniProtKB (SwissProt)](https://www.uniprot.org/uniprot/?query=reviewed:yes):
For each `--kingdom` we include curated proteins that
(i) are from Bacteria (or Archaea or Viruses);
(ii) are not "Fragment" entries;
and (iii) have an evidence level ("PE") of 2 or lower, which
corresponds to experimental mRNA or proteomics evidence.
#### Making a Core Database
......@@ -278,6 +299,8 @@ has been detected properly.
#### The Genus Databases
:warning: This is no longer recommended. Please use `--proteins` instead.
If you enable `--usegenus` and also provide a Genus via `--genus` then it
will first use a BLAST database which is Genus specific. Prokka comes with
a set of databases for the most common Bacterial genera; type prokka
......@@ -366,7 +389,7 @@ There is no clear reason for this. The only way to restore normal behaviour
is to edit the prokka script and change `parallel` to `parallel --gnu`.
* __Why does prokka fail when it gets to hmmscan?__
Unfortunately HMMER keeps changing it's database format, and they aren't
Unfortunately HMMER keeps changing its database format, and they aren't
upward compatible. If you upgraded HMMER (from 3.0 to 3.1 say) then you
need to "re-press" the files. This can be done as follows:
```
......@@ -388,6 +411,11 @@ compliant. It does not like the ACCESSION and VERSION strings that Prokka
produces via the "tbl2asn" tool. The following Unix command will fix them:
`egrep -v '^(ACCESSION|VERSION)' prokka.gbk > mauve.gbk`
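
Recent GNU grep releases warn that `egrep` is obsolescent, so `grep -E` is the safer spelling of the same filter. The sketch below runs it on a tiny stand-in file (the GenBank lines are made up for illustration) to show which lines are dropped:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# A made-up four-line GenBank header standing in for prokka.gbk.
printf 'LOCUS       contig1\nACCESSION   contig1\nVERSION     contig1.1\nFEATURES             Location/Qualifiers\n' \
  > "$workdir/prokka.gbk"

# Same pattern as the egrep command above: drop ACCESSION and VERSION lines.
grep -Ev '^(ACCESSION|VERSION)' "$workdir/prokka.gbk" > "$workdir/mauve.gbk"
```

Only the `LOCUS` and `FEATURES` lines remain in `mauve.gbk`.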
* __How can I make my GFF not have the contig sequences in it?__
```
sed '/^##FASTA/Q' prokka.gff > nosequence.gff
```
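
The `Q` command is a GNU sed extension, so the line above may fail with BSD/macOS sed. A portable sketch using `awk`, shown on a small made-up GFF so the effect is visible:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# A made-up GFF with one feature line followed by the embedded FASTA section.
printf '##gff-version 3\ncontig1\tProdigal:2.6\tCDS\t1\t9\t.\t+\t0\tID=demo_00001\n##FASTA\n>contig1\nATGAAATAA\n' \
  > "$workdir/prokka.gff"

# Print every line until the ##FASTA marker, then stop without printing it.
awk '/^##FASTA/ { exit } { print }' "$workdir/prokka.gff" > "$workdir/nosequence.gff"
```

The output keeps the header and feature lines and drops the `##FASTA` marker and everything after it.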
## Bugs
Submit problems or requests to the [Issue Tracker](https://github.com/tseemann/prokka/issues).
......
......@@ -23,11 +23,12 @@ use warnings;
use FindBin;
use Cwd qw(abs_path);
use File::Copy;
use File::Basename;
use Time::Piece;
use Time::Seconds;
use XML::Simple;
use Digest::MD5;
use List::Util qw(min max sum);
use List::Util qw(min max sum uniq);
use Scalar::Util qw(openhandle);
use Data::Dumper;
use Bio::Root::Version;
......@@ -45,7 +46,7 @@ my @CMDLINE = ($0, @ARGV);
my $OPSYS = $^O;
my $BINDIR = "$FindBin::RealBin/../binaries/$OPSYS";
my $EXE = $FindBin::RealScript;
my $VERSION = "1.14.0";
my $VERSION = "1.14.5";
my $AUTHOR = 'Torsten Seemann <torsten.seemann@gmail.com>';
my $URL = 'https://github.com/tseemann/prokka';
my $PROKKA_PMID = '24642063';
......@@ -104,7 +105,7 @@ my %tools = (
NEEDED => 0,
},
'barrnap' => {
GETVER => "barrnap --version 2>&1",
GETVER => "LC_ALL=C barrnap --version 2>&1",
REGEXP => qr/($BIDEC)/,
MINVER => "0.4",
NEEDED => 0,
......@@ -113,12 +114,11 @@ my %tools = (
GETVER => "prodigal -v 2>&1 | grep -i '^Prodigal V'",
REGEXP => qr/($BIDEC)/,
MINVER => "2.6",
MAXVER => "2.69", # changed cmdline options in 2.70 git :-/
NEEDED => 1,
},
'signalp' => {
# this is so long-winded as -v changed meaning (3.0=version, 4.0=verbose !?)
GETVER => "signalp -v < /dev/null 2>&1 | egrep ',|# SignalP' | sed 's/^# SignalP-//'",
GETVER => "if [ \"`signalp -version 2>&1 | grep -Eo '[0-9]+\.[0-9]+'`\" != \"\" ]; then echo `signalp -version 2>&1 | grep -Eo '[0-9]+\.[0-9]+'`; else signalp -v < /dev/null 2>&1 | egrep ',|# SignalP' | sed 's/^# SignalP-//'; fi",
REGEXP => qr/^($BIDEC)/,
MINVER => "3.0",
NEEDED => 0, # only if --gram used
......@@ -172,7 +172,6 @@ my %tools = (
NEEDED => 1,
},
# now just the standard unix tools we need
'less' => { NEEDED=>1 },
'grep' => { NEEDED=>1 }, # yes, we need this before we can test versions :-/
'egrep' => { NEEDED=>1 },
'sed' => { NEEDED=>1 },
......@@ -185,6 +184,12 @@ my %tools = (
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# functions to check if tool is installed and correct version
sub ver2str {
my($bidec) = @_;
return $bidec if $bidec !~ m/\./;
return join '', map { sprintf "%03d",$_ } (split m/\./, $bidec);
}
sub check_tool {
my($toolname) = @_;
my $t = $tools{$toolname};
......@@ -196,19 +201,17 @@ sub check_tool {
if ($t->{GETVER}) {
my($s) = qx($t->{GETVER});
if (defined $s) {
$s =~ $t->{REGEXP};
$t->{VERSION} = $1 if defined $1;
msg("Determined $toolname version is $t->{VERSION}");
if (defined $t->{MINVER} and $t->{VERSION} < $t->{MINVER}) {
chomp $s;
$s =~ $t->{REGEXP} or err("Could not parse version from '$s'");
$t->{VERSION} = ver2str($1);
msg("Determined $toolname version is $t->{VERSION} from '$s'");
if (defined $t->{MINVER} and $t->{VERSION} lt ver2str($t->{MINVER}) ) {
err("Prokka needs $toolname $t->{MINVER} or higher. Please upgrade and try again.");
}
if (defined $t->{MAXVER} and $t->{VERSION} > $t->{MAXVER}) {
err("Prokka needs a version of $toolname between $t->{MINVER} and $t->{MAXVER}. Please downgrade and try again.");
}
}
else {
err("Could not determine version of $toolname - please install version",
$t->{MINVER}, "or higher"); # FIXME: or less <= MAXVER if given
$t->{MINVER}, "or higher");
}
}
}
......@@ -227,7 +230,7 @@ sub check_all_tools {
my(@Options, $quiet, $debug, $kingdom, $fast, $force, $outdir, $prefix, $cpus, $dbdir,
$addgenes, $addmrna, $cds_rna_olap,
$gcode, $gram, $gffver, $locustag, $increment, $mincontiglen, $evalue, $coverage,
$genus, $species, $strain, $plasmid,
$genus, $species, $strain, $plasmid, $prodigaltf,
$usegenus, $proteins, $hmms, $centre, $scaffolds,
$rfam, $norrna, $notrna, $rnammer, $rawproduct, $noanno, $accver,
$metagenome, $compliant, $listdb, $citation);
......@@ -621,29 +624,31 @@ if ($rfam) {
my $num_ncrna = 0;
my $tool = "Infernal:".$tools{'cmscan'}->{VERSION};
my $icpu = $cpus || 1;
my $cmd = "cmscan --rfam --cpu $icpu -E $evalue --tblout /dev/stdout -o /dev/null --noali $cmdb \Q$outdir/$prefix.fna\E";
my $dbsize = $total_bp * 2 / 1000000;
my $cmd = "cmscan -Z $dbsize --cut_ga --rfam --nohmmonly --fmt 2 --cpu $icpu --tblout /dev/stdout -o /dev/null --noali $cmdb \Q$outdir/$prefix.fna\E";
msg("Running: $cmd");
open INFERNAL, '-|', $cmd;
while (<INFERNAL>) {
next if m/^#/; # ignore comments
my @x = split ' '; # magic Perl whitespace splitter
# msg("DEBUG: ", join("~~~", @x) );
next unless @x > 9; # avoid incorrect lines
next unless defined $x[1] and $x[1] =~ m/^RF\d/;
my $sid = $x[2];
next unless defined $x[2] and $x[2] =~ m/^RF\d/;
my $sid = $x[3];
next unless exists $seq{$sid};
next if defined $x[19] and $x[19] =~ m/^=$/; # Overlaps with a higher scoring match
push @{$seq{$sid}{FEATURE}}, Bio::SeqFeature::Generic->new(
-primary => 'misc_RNA',
-seq_id => $sid,
-source => $tool,
-start => min($x[7], $x[8]),
-end => max($x[7], $x[8]),
-strand => ($x[9] eq '-' ? -1 : +1),
-score => undef, # possibly x[16] but had problems here with '!'
-start => min($x[9], $x[10]),
-end => max($x[9], $x[10]),
-strand => ($x[11] eq '-' ? -1 : +1),
-score => $x[16],
-frame => 0,
-tag => {
'product' => $x[0],
'product' => $x[1],
'inference' => "COORDINATES:profile:$tool",
'accession' => $x[2],
'Note' => '"' . join(' ', @x[26..$#x]) . '"',
}
);
$num_ncrna++;
......@@ -710,6 +715,10 @@ my $prodigal_mode = ($totalbp >= 100000 && !$metagenome) ? 'single' : 'meta';
msg("Contigs total $totalbp bp, so using $prodigal_mode mode");
my $num_cds=0;
my $cmd = "prodigal -i \Q$outdir/$prefix.fna\E -c -m -g $gcode -p $prodigal_mode -f sco -q";
if ($prodigaltf and -r $prodigaltf) {
msg("Gene finding will be aided by Prodigal training file: $prodigaltf");
$cmd .= " -t '$prodigaltf'";
}
msg("Running: $cmd");
open my $PRODIGAL, '-|', $cmd;
my $sid;
......@@ -774,14 +783,15 @@ for my $sid (@seq) {
# Find signal peptide leader sequences
if ($tools{signalp}->{HAVE}) {
my $sigpver = substr $tools{signalp}{VERSION}, 0, 1; # first char, expect 3 or 4
my $sigpver = substr $tools{signalp}{VERSION}, 0, 1; # first char, expect 3, 4 or 5
if ($kingdom eq 'Bacteria' and $sigpver==3 || $sigpver==4) {
if ($kingdom eq 'Bacteria' and $sigpver==3 || $sigpver==4 || $sigpver==5) {
if ($gram) {
$gram = $gram =~ m/\+|[posl]/i ? 'gram+' : 'gram-';
msg("Looking for signal peptides at start of predicted proteins");
msg("Treating $kingdom as $gram");
my $spoutfn = "$outdir/signalp.faa";
my $sp5outfn = "$outdir/signalp_summary.signalp5";
open my $spoutfh, '>', $spoutfn;
my $spout = Bio::SeqIO->new(-fh=>$spoutfh, -format=>'fasta');
my %cds;
......@@ -800,12 +810,17 @@ if ($tools{signalp}->{HAVE}) {
msg("Skipping signalp because it can not handle >$SIGNALP_MAXSEQ sequences.");
}
else {
my $opts = $sigpver==3 ? '-m hmm' : '';
my $cmd = "signalp -t $gram -f short $opts \Q$spoutfn\E 2> /dev/null";
my $opts = $sigpver==3 ? "signalp -t $gram -f short -m hmm" : ($sigpver==4 ? "signalp -t $gram -f short" : '$(which signalp)'." -tmp $outdir -prefix $outdir/signalp -org $gram -format short -fasta");
my $cmd = "$opts \Q$spoutfn\E 2> /dev/null";
msg("Running: $cmd");
my $tool = "SignalP:".$tools{signalp}->{VERSION};
my $num_sigpep = 0;
if ($sigpver == 3 or $sigpver == 4) {
open SIGNALP, '-|', $cmd;
} else {
qx($cmd);
open SIGNALP, '<', $sp5outfn;
}
while (<SIGNALP>) {
my @x = split m/\s+/;
if ($sigpver == 3) {
......@@ -834,8 +849,7 @@ if ($tools{signalp}->{HAVE}) {
);
push @{$seq{$parent->seq_id}{FEATURE}}, $sigpep;
$num_sigpep++;
}
else {
} elsif ($sigpver == 4) {
# msg("sigp$sigpver: @x");
next unless @x==12 and $x[9] eq 'Y'; # has sig_pep
my $parent = $cds{ $x[0] };
......@@ -861,11 +875,45 @@ if ($tools{signalp}->{HAVE}) {
);
push @{$seq{$parent->seq_id}{FEATURE}}, $sigpep;
$num_sigpep++;
} else {
# msg("sigp$sigpver: @x");
next unless @x==12 and $x[1] =~ m/^(?:SP|TAT|LIPO)/; # has sig_pep
my $parent = $cds{ $x[0] };
my $tpprob;
if ($x[1] =~ m/^SP/) { $tpprob = $x[2] }
elsif ($x[1] =~ m/^TAT/) { $tpprob = $x[3] }
elsif ($x[1] =~ m/^LIPO/) { $tpprob = $x[4] }
my $type = "$x[1] (Probability: $tpprob)";
my ($cleave1, $cleave2) = ($1, $2) if $x[8] =~ m/(\d+)-(\d+)\./;
my $cleaveseq = $1 if $x[9] =~ m/(\w+-\w+)\./;
my $clprob = $x[11];
my $start = $parent->strand > 0 ? $parent->start : $parent->end;
# need to convert to DNA coordinates
my $end = $start + $parent->strand * ($cleave1*3 - 1);
my $sigpep = Bio::SeqFeature::Generic->new(
-seq_id => $parent->seq_id,
-source_tag => $tool,
-primary => 'sig_peptide',
-start => min($start, $end),
-end => max($start, $end),
-strand => $parent->strand,
-frame => 0, # PHASE: compulsory for peptides, can't be '.'
-tag => {
# 'ID' => $ID,
# 'Parent' => $x[0], # don't have proper IDs yet....
'product' => "putative signal peptide",
'inference' => "ab initio prediction:$tool",
'note' => "$type, predicted cleavage between residues $cleave1 and $cleave2 ($cleaveseq) with probability $clprob",
}
);
push @{$seq{$parent->seq_id}{FEATURE}}, $sigpep;
$num_sigpep++;
}
}
msg("Found $num_sigpep signal peptides");
}
delfile($spoutfn);
delfile($sp5outfn) if $sigpver == 5;
}
else {
msg("Option --gram not specified, will NOT check for signal peptides.");
......@@ -1017,8 +1065,7 @@ else {
}
# create a unique output name so we can save them in --debug mode
my $outname = $db->{DB};
$outname =~ s{^.*/}{};
my $outname = "$prefix.".basename($db->{DB}).".tmp.$$";
# we write out all the CDS which haven't been annotated yet and then search them
my $faa_name = "$outdir/$outname.faa";
......@@ -1263,7 +1310,7 @@ for my $sid (@seq) {
$fsa_fh->write_seq($ctg);
$ctg->desc(undef);
print $tbl_fh ">Feature $sid\n";
for my $f ( sort { $a->start <=> $b->start } @{ $seq{$sid}{FEATURE} }) {
for my $f ( sort { $a->start <=> $b->start || $b->end <=> $a->end || $a->has_tag('Parent') <=> $b->has_tag('Parent') } @{ $seq{$sid}{FEATURE} }) {
if ($f->primary_tag eq 'CDS' and not $f->has_tag('product')) {
$f->add_tag_value('product', $HYPO);
}
......@@ -1527,13 +1574,6 @@ sub version {
#----------------------------------------------------------------------
sub showdoc {
system("less $FindBin::Bin/../doc/$EXE-manual.txt");
exit;
}
#----------------------------------------------------------------------
sub show_citation {
print STDERR << "EOCITE";
......@@ -1567,7 +1607,7 @@ sub add_bundle_to_path {
#----------------------------------------------------------------------
sub kingdoms {
return map { m{kingdom/(\w+?)/}; $1 } glob("$dbdir/kingdom/*/*.pin");
return uniq map { m{kingdom/(\w+?)/}; $1 } glob("$dbdir/kingdom/*/*.pin");
}
sub genera {
......@@ -1622,7 +1662,7 @@ sub setup_db {
}
check_tool('cmpress');
for my $cm (<$dbdir/cm/{Viruses,Bacteria}>) {
for my $cm (<$dbdir/cm/{Viruses,Bacteria,Archaea}>) {
msg("Pressing CM database: $cm");
runcmd("cmpress \Q$cm\E");
}
......@@ -1691,7 +1731,6 @@ sub setOptions {
'General:',
{OPT=>"help", VAR=>\&usage, DESC=>"This help"},
{OPT=>"version", VAR=>\&version, DESC=>"Print version and exit"},
{OPT=>"docs", VAR=>\&showdoc, DESC=>"Show full manual/documentation"},
{OPT=>"citation",VAR=>\&show_citation, DESC=>"Print citation for referencing Prokka"},
{OPT=>"quiet!", VAR=>\$quiet, DEFAULT=>0, DESC=>"No screen output"},
{OPT=>"debug!", VAR=>\$debug, DEFAULT=>0, DESC=>"Debug mode: keep all temporary files"},
......@@ -1722,6 +1761,7 @@ sub setOptions {
'Annotations:',
{OPT=>"kingdom=s", VAR=>\$kingdom, DEFAULT=>'Bacteria', DESC=>"Annotation mode: ".join('|', kingdoms()) },
{OPT=>"gcode=i", VAR=>\$gcode, DEFAULT=>0, DESC=>"Genetic code / Translation table (set if --kingdom is set)"},
{OPT=>"prodigaltf=s", VAR=>\$prodigaltf, DEFAULT=>'', DESC=>"Prodigal training file" },
{OPT=>"gram=s", VAR=>\$gram, DEFAULT=>'', DESC=>"Gram: -/neg +/pos"},
{OPT=>"usegenus!", VAR=>\$usegenus, DEFAULT=>0, DESC=>"Use genus-specific BLAST databases (needs --genus)"},
{OPT=>"proteins=s", VAR=>\$proteins, DEFAULT=>'', DESC=>"FASTA or GBK file to use as 1st priority"},
......
......@@ -24,12 +24,20 @@ my $out = Bio::SeqIO->new(-fh=>\*STDOUT, -format=>'fasta');
my %seen;
while (my $seq = $in->next_seq) {
my(undef,$gene,$locustag) = split m"~~~", $seq->id;
$gene = '' if $gene eq $locustag;
my(undef,$gene,$acc,$abx) = split m"~~~", $seq->id;
$gene = '' if $gene eq $acc;
my $prot = $seq->translate;
die Dumper($prot) if $prot->seq =~ m/\*./; # check for stop codon in middle
die Dumper($prot) if $seen{$prot->seq}++; # check for dupes
$prot->id($locustag);
my $aa = $prot->seq;
die Dumper($prot) if $aa =~ m/\*./; # check for stop codon in middle
die Dumper($prot) if $seen{$aa}++; # check for dupes
substr($aa,0,1) = "M"; # force Met start
chop($aa) if $aa =~ m/\*$/; # remove trailing stop codon
$prot->seq($aa);
$prot->id($acc);
# 1. no /EC_number
# 2. /gene
# 3. /product
# 4. COG
$prot->desc( join('~~~', '', $gene, $prot->desc, '') );
$out->write_seq($prot);
}
......
The .cm files in this folder were generated by extracting only those RFAM entries that
had members from the Bacteria and Viruses divisions (based on their taxonomy ID in the .gff3 file)
The .cm files in this folder were generated by extracting only those Rfam entries that
had members from the Bacteria, Viruses, and Archaea divisions (based on their taxonomy
description in the public Rfam MySQL database).
Archaea had no entries, so we just use Bacteria.
For more details, see the __build/ directory and https://github.com/tseemann/prokka/issues/243.
- 523 #=GF TP Gene; miRNA; # microRNA - euk only?
- 421 #=GF TP Gene; snRNA; snoRNA; CD-box; # Small nucleolar RNAs
+ 252 #=GF TP Gene; sRNA; #
- 225 #=GF TP Gene; snRNA; snoRNA; HACA-box; Small nucleolar RNAs
- 225 #=GF TP Gene; lncRNA; # Long non-coding RNAs > 200 bp
+ 218 #=GF TP Cis-reg;
+ 87 #=GF TP Gene;
+ 65 #=GF TP Gene; CRISPR;
+ 28 #=GF TP Cis-reg; frameshift_element;
+ 27 #=GF TP Cis-reg; IRES;
+ 26 #=GF TP Cis-reg; riboswitch;
+ 23 #=GF TP Gene; antisense;
- 18 #=GF TP Gene; snRNA; snoRNA; scaRNA;
+ 15 #=GF TP Gene; ribozyme;
- 11 #=GF TP Gene; snRNA; splicing;
+ 11 #=GF TP Cis-reg; leader;
+ 10 #=GF TP Intron;
+ 7 #=GF TP Cis-reg; thermoregulator;
- 6 #=GF TP Gene; rRNA; # rnammer
+ 5 #=GF TP Gene; antitoxin;
- 3 #=GF TP Gene; snRNA;
- 2 #=GF TP Gene; tRNA; # aragorn
$ for file in __build/Rfam_*_14.1.txt; do tail -n +2 $file; done | cut -f 2 | sort | uniq -c | sort -rn
723 Gene; sRNA;
279 Cis-reg;
68 Gene; CRISPR;
62 Gene; antisense;
60 Gene; snRNA; snoRNA; CD-box;
57 Gene;
48 Cis-reg; riboswitch;
39 Gene; miRNA;
34 Cis-reg; leader;
33 Cis-reg; thermoregulator;
26 Cis-reg; frameshift_element;
21 Intron;
21 Gene; ribozyme;
18 Gene; snRNA; snoRNA; HACA-box;
12 Gene; antitoxin;
11 Cis-reg; IRES;
2 Gene; snRNA;
1 Gene; snRNA; snoRNA; HACA-box
RF00010 Gene; ribozyme; Bacterial RNase P class A
RF00017 Gene; Metazoan signal recognition particle RNA
RF00028 Intron; Group I catalytic intron
RF00029 Intron; Group II catalytic intron
RF00030 Gene; ribozyme; RNase MRP
RF00032 Cis-reg; Histone 3' UTR stem-loop
RF00050 Cis-reg; riboswitch; FMN riboswitch (RFN element)
RF00058 Gene; snRNA; snoRNA; HACA-box; HgcF RNA (Pab35)
RF00059 Cis-reg; riboswitch; TPP riboswitch (THI element)
RF00060 Gene; snRNA; snoRNA; HACA-box; HgcE RNA (Pab105)
RF00062 Gene; HgcC family RNA
RF00063 Gene; SscA RNA
RF00064 Gene; snRNA; snoRNA; HACA-box HgcG RNA (Pab40)
RF00065 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA snoR9
RF00095 Gene; snRNA; snoRNA; CD-box; Pyrococcus C/D box small nucleolar RNA
RF00150 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA SNORD42
RF00169 Gene; Bacterial small signal recognition particle RNA
RF00174 Cis-reg; riboswitch; Cobalamin riboswitch
RF00373 Gene; ribozyme; Archaeal RNase P
RF00380 Cis-reg; riboswitch; ykoK leader
RF00504 Cis-reg; riboswitch; Glycine riboswitch
RF00517 Cis-reg; leader; serC leader
RF00845 Gene; miRNA; microRNA MIR158
RF01051 Cis-reg; Cyclic di-GMP-I riboswitch
RF01068 Cis-reg; Guanidine-II riboswitch
RF01119 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR32
RF01120 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR33
RF01121 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR38
RF01122 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR39
RF01123 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR35
RF01124 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR36
RF01125 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR4
RF01126 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR41
RF01127 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR42
RF01128 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR42
RF01129 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR44
RF01130 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR46
RF01131 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR47
RF01132 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR48
RF01133 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR3
RF01134 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR30
RF01135 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR24
RF01136 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR28
RF01137 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR21
RF01138 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR23
RF01139 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR2
RF01140 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR20
RF01141 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR18
RF01142 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR19
RF01143 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR16
RF01144 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR17
RF01145 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR14
RF01146 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR15
RF01147 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR12
RF01149 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR10
RF01150 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR11
RF01152 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR1
RF01273 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR34
RF01274 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR45
RF01275 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR22
RF01276 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR53
RF01297 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR40
RF01303 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR49
RF01304 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR5
RF01305 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR51
RF01306 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR52
RF01307 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR55
RF01308 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR58
RF01309 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR60
RF01310 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR7
RF01312 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR9
RF01319 Gene; CRISPR; CRISPR RNA direct repeat element
RF01320 Gene; CRISPR; CRISPR RNA direct repeat element
RF01322 Gene; CRISPR; CRISPR RNA direct repeat element
RF01324 Gene; CRISPR; CRISPR RNA direct repeat element
RF01326 Gene; CRISPR; CRISPR RNA direct repeat element
RF01328 Gene; CRISPR; CRISPR RNA direct repeat element
RF01337 Gene; CRISPR; CRISPR RNA direct repeat element
RF01338 Gene; CRISPR; CRISPR RNA direct repeat element
RF01339 Gene; CRISPR; CRISPR RNA direct repeat element
RF01350 Gene; CRISPR; CRISPR RNA direct repeat element
RF01351 Gene; CRISPR; CRISPR RNA direct repeat element
RF01353 Gene; CRISPR; CRISPR RNA direct repeat element
RF01354 Gene; CRISPR; CRISPR RNA direct repeat element
RF01355 Gene; CRISPR; CRISPR RNA direct repeat element
RF01358 Gene; CRISPR; CRISPR RNA direct repeat element
RF01360 Gene; CRISPR; CRISPR RNA direct repeat element
RF01369 Gene; CRISPR; CRISPR RNA direct repeat element
RF01373 Gene; CRISPR; CRISPR RNA direct repeat element
RF01375 Gene; CRISPR; CRISPR RNA direct repeat element
RF01377 Gene; CRISPR; CRISPR RNA direct repeat element
RF01378 Gene; CRISPR; CRISPR RNA direct repeat element
RF01380 Cis-reg; Human immunodeficiency virus type 1 major splice donor
RF01419 Gene; antisense; Antisense RNA which regulates isiA expression
RF01689 Cis-reg; riboswitch; AdoCbl variant RNA
RF01717 Cis-reg; PhotoRC-II RNA
RF01722 Gene; sRNA; Pyrobac-1 RNA
RF01725 Cis-reg; riboswitch; SAM-I/IV variant riboswitch
RF01734 Cis-reg; riboswitch; Fluoride riboswitch
RF01737 Cis-reg; flpD RNA
RF01745 Cis-reg; manA RNA
RF01761 Cis-reg; wcaG RNA
RF01829 Gene; snRNA; snoRNA; CD-box; sR6 snoRNA
RF01854 Gene; Bacterial large signal recognition particle RNA
RF01856 Gene; Protozoan signal recognition particle RNA
RF01857 Gene; Archaeal signal recognition particle RNA
RF01982 Cis-reg; Pyrrolysine insertion sequence 1
RF01998 Intron; Group II catalytic intron D1-D4-1
RF01999 Intron; Group II catalytic intron D1-D4-2
RF02001 Intron; Group II catalytic intron D1-D4-3
RF02003 Intron; Group II catalytic intron D1-D4-4
RF02005 Intron; Group II catalytic intron D1-D4-6
RF02012 Intron; Group II catalytic intron D1-D4-7
RF02033 Gene; HNH endonuclease-associated RNA and ORF (HEARO) RNA
RF02163 Gene; snRNA; snoRNA; CD-box; Small nucleolar RNA sR-tMet
RF02194 Gene; antisense; Bacterial antisense RNA HPnc0260
RF02253 Cis-reg; Iron response element II
RF02276 Gene; ribozyme; Hammerhead ribozyme (type II)
RF02357 Gene; ribozyme; RNaseP truncated form
RF02509 Cis-reg; Pyrrolysine insertion sequence mtbB
RF02510 Cis-reg; Pyrrolysine insertion sequence mttB
RF02511 Cis-reg; Pyrrolysine insertion sequence TetR
RF02512 Cis-reg; Pyrrolysine insertion sequence transposase 1
RF02513 Cis-reg; Pyrrolysine insertion sequence transposase 2
RF02514 Gene; sRNA; 5' ureB small RNA
RF02656 Gene; sRNA; Sense overlapping transcript RNA 0042 (sot)
RF02657 Gene; sRNA; Sense overlapping transcript RNA 2652 (sot)
RF02792 Gene; antisense; Archaeal Small RNA 162
RF02794 Gene; snRNA; snoRNA; HACA-box; Pab19 RNA
RF02795 Gene; snRNA; snoRNA; HACA-box; Pab91 RNA
RF02796 Gene; snRNA; snoRNA; HACA-box; Pab160 RNA
RF02800 Gene; sRNA; Rickettsia sRNA47
RF02801 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 201
RF02802 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 204
RF02803 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 205
RF02804 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 206
RF02805 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 207
RF02806 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 208
RF02807 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 209
RF02808 Gene; snRNA; snoRNA; HACA-box; Pyrobaculum sRNA 210
RF02814 Gene; sRNA; Sulfolobus sRNA133
RF02820 Gene; antisense; Vibrio RNA AS9
RF02905 Gene; sRNA; Archaeal Small RNA 41
RF02906 Gene; sRNA; Archaeal Small RNA 154
RF02914 Cis-reg; DUF805b RNA
RF02921 Gene; sRNA; RT-14 RNA
RF02984 Gene; sRNA; DUF3800-X RNA
RF02996 Gene; sRNA; int-alpA RNA
RF03001 Cis-reg; leuA-Halobacteria RNA
RF03006 Gene; sRNA; M23 RNA
RF03019 Gene; sRNA; RT-16 RNA
RF03094 Gene; sRNA; LAGLIDADG-2 RNA
This diff is collapsed.
RF00004 Gene; snRNA; splicing; U2 spliceosomal RNA
RF00008 Gene; ribozyme; Hammerhead ribozyme (type III)
RF00024 Gene; Vertebrate telomerase RNA
RF00028 Intron; Group I catalytic intron
RF00029 Intron; Group II catalytic intron
RF00032 Cis-reg; Histone 3' UTR stem-loop
RF00036 Cis-reg; HIV Rev response element
RF00041 Cis-reg; Enteroviral 3' UTR element
RF00044 Gene; Bacteriophage pRNA
RF00048 Cis-reg; Enterovirus cis-acting replication element
RF00061 Cis-reg; IRES; Hepatitis C virus internal ribosome entry site
RF00094 Gene; ribozyme; Hepatitis delta virus ribozyme
RF00102 Gene; VA RNA
RF00106 Gene; antisense; RNAI
RF00164 Cis-reg; Coronavirus 3' stem-loop II-like motif (s2m)
RF00165 Cis-reg; Coronavirus 3' UTR pseudoknot
RF00170 Gene; Retron msr RNA
RF00171 Cis-reg; Tombusvirus 5' UTR
RF00173 Gene; ribozyme; Hairpin ribozyme
RF00175 Cis-reg; Human immunodeficiency virus type 1 dimerisation initiation site
RF00176 Cis-reg; Tombusvirus 3' UTR region IV
RF00182 Cis-reg; Coronavirus packaging signal
RF00184 Cis-reg; Potato virus X cis-acting regulatory element
RF00185 Cis-reg; Flavivirus 3' UTR cis-acting replication element (CRE)
RF00192 Cis-reg; Bovine leukaemia virus RNA packaging signal
RF00193 Cis-reg; Citrus tristeza virus replication signal
RF00194 Cis-reg; Rubella virus 3' cis-acting element
RF00196 Cis-reg; Alfalfa mosaic virus RNA 1 5' UTR stem-loop
RF00209 Cis-reg; IRES; Pestivirus internal ribosome entry site (IRES)
RF00210 Cis-reg; IRES; Aphthovirus internal ribosome entry site (IRES)
RF00214 Cis-reg; Retrovirus direct repeat 1 (dr1)
RF00215 Cis-reg; Tombus virus defective interfering (DI) RNA region 3
RF00220 Cis-reg; Human rhinovirus internal cis-acting regulatory element (CRE)
RF00225 Cis-reg; IRES; Tobamovirus internal ribosome entry site (IRES)
RF00228 Cis-reg; IRES; Hepatitis A virus internal ribosome entry site (IRES)
RF00229 Cis-reg; IRES; Picornavirus internal ribosome entry site (IRES)
RF00233 Cis-reg; Tymovirus/Pomovirus/Furovirus tRNA-like 3' UTR element
RF00250 Gene; miRNA; Trans-activation response element (TAR)
RF00252 Cis-reg; Alfalfa mosaic virus coat protein binding (CPB) RNA
RF00260 Cis-reg; Hepatitis C virus (HCV) cis-acting replication element (CRE)
RF00262 Gene; antisense; sar RNA
RF00290 Cis-reg; Bamboo mosaic potexvirus (BaMV) cis-regulatory element
RF00363 Gene; miRNA; mir-BART1 microRNA precursor family
RF00364 Gene; miRNA; mir-BART2 microRNA precursor family
RF00365 Gene; miRNA; mir-BHRF1-1 microRNA precursor family
RF00366 Gene; miRNA; mir-BHRF1-2 microRNA precursor family
RF00367 Gene; miRNA; mir-BHRF1-3 microRNA precursor family
RF00374 Cis-reg; Gammaretrovirus core encapsidation signal
RF00375 Cis-reg; HIV primer binding site (PBS)
RF00376 Cis-reg; HIV gag stem loop 3 (GSL3)
RF00384 Cis-reg; Poxvirus AX element late mRNA cis-regulatory element
RF00385 Cis-reg; Infectious bronchitis virus D-RNA
RF00386 Cis-reg; Enterovirus 5' cloverleaf cis-acting replication element
RF00389 Cis-reg; Bamboo mosaic virus satellite RNA cis-regulatory element
RF00390 Cis-reg; UPSK RNA
RF00434 Cis-reg; Luteovirus cap-independent translation element (BTE)
RF00448 Cis-reg; IRES; Epstein-Barr virus nuclear antigen (EBNA) IRES
RF00453 Cis-reg; Cardiovirus cis-acting replication element (CRE)
RF00458 Cis-reg; IRES; Cripavirus internal ribosome entry site (IRES)
RF00459 Cis-reg; Mason-Pfizer monkey virus packaging signal
RF00465 Cis-reg; Japanese encephalitis virus (JEV) hairpin structure
RF00467 Cis-reg; Rous sarcoma virus (RSV) primer binding site (PBS)
RF00468 Cis-reg; Hepatitis C virus stem-loop VII
RF00469 Cis-reg; Hepatitis C stem-loop IV
RF00470 Cis-reg; Togavirus 5' plus strand cis-regulatory element
RF00480 Cis-reg; frameshift_element; HIV Ribosomal frameshift signal
RF00481 Cis-reg; Hepatitis C virus 3'X element
RF00496 Cis-reg; Coronavirus SL-III cis-acting replication element (CRE)
RF00498 Cis-reg; leader; Equine arteritis virus leader TRS hairpin (LTH)
RF00499 Cis-reg; Human parechovirus 1 (HPeV1) cis regulatory element (CRE)
RF00500 Cis-reg; Turnip crinkle virus (TCV) repressor of minus strand synthesis H5
RF00501 Cis-reg; Rotavirus cis-acting replication element (CRE)
RF00502 Cis-reg; Turnip crinkle virus (TCV) core promoter hairpin (Pr)
RF00507 Cis-reg; frameshift_element; Coronavirus frameshifting stimulation element
RF00510 Cis-reg; Tombusvirus internal replication element (IRE)
RF00511 Cis-reg; IRES; Kaposi's sarcoma-associated herpesvirus internal ribosome entry site
RF00524 Cis-reg; R2 RNA element
RF00525 Cis-reg; Flavivirus DB element
RF00550 Cis-reg; Hepatitis E virus cis-reactive element
RF00617 Cis-reg; flavivirus capsid hairpin cHP
RF00620 Cis-reg; Hepatitis C alternative reading frame stem-loop
RF00863 Gene; miRNA; microRNA mir-BART17
RF00864 Gene; miRNA; microRNA mir-BART20
RF00866 Gene; miRNA; microRNA mir-BART3
RF00867 Gene; miRNA; microRNA mir-BART5
RF00868 Gene; miRNA; microRNA mir-BART15
RF00869 Gene; miRNA; microRNA mir-BART7
RF00874 Gene; miRNA; microRNA mir-BART12
RF01009 Gene; miRNA; microRNA mir-M7
RF01047 Cis-reg; HBV RNA encapsidation signal epsilon
RF01051 Cis-reg; Cyclic di-GMP-I riboswitch
RF01072 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01073 Cis-reg; Gag/pol translational readthrough site
RF01074 Cis-reg; frameshift_element; Putative RNA-dependent RNA polymerase ribosomal frameshift site
RF01075 Cis-reg; Pseudoknot of tRNA-like structure
RF01076 Cis-reg; frameshift_element; Polymerase ribosomal frameshift site
RF01077 Cis-reg; Pseudoknot of tRNA-like structure
RF01078 Cis-reg; 3'-terminal pseudoknot in PYVV
RF01079 Cis-reg; frameshift_element; Putative RNA-dependent RNA polymerase ribosomal frameshift site
RF01080 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01081 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01082 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01083 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01084 Cis-reg; Pseudoknot of tRNA-like structure
RF01085 Cis-reg; Pseudoknot of tRNA-like structure
RF01088 Cis-reg; Pseudoknot of tRNA-like structure
RF01091 Cis-reg; 3'-terminal pseudoknot in SPCSV
RF01092 Cis-reg; Gag/pol translational readthrough site
RF01094 Cis-reg; frameshift_element; Polymerase ribosomal frameshift site
RF01095 Cis-reg; 3'-terminal pseudoknot of CuYV/BPYV
RF01096 Cis-reg; HepA virus 3'-terminal pseudoknot
RF01097 Cis-reg; frameshift_element; Gag/pro ribosomal frameshift site
RF01098 Cis-reg; frameshift_element; Gag/pro ribosomal frameshift site
RF01099 Cis-reg; Pseudoknot of influenza A virus gene
RF01100 Cis-reg; 3'-terminal pseudoknot in BYV
RF01101 Cis-reg; Pseudoknot of tRNA-like structure
RF01102 Cis-reg; leader; 5'-leader pseudoknot of TEV/CVMV
RF01103 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01104 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01105 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01106 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01107 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01108 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01109 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01111 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01113 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01114 Cis-reg; Pseudoknot of upstream pseudoknot domain (UPD) of the 3'UTR
RF01313 Cis-reg; Avian HBV RNA encapsidation signal epsilon
RF01380 Cis-reg; Human immunodeficiency virus type 1 major splice donor
RF01381 Cis-reg; HIV-1 stem-loop 3 Psi packaging signal
RF01382 Cis-reg; HIV-1 stem-loop 4 packaging signal
RF01386 Gene; sRNA; isrB Hfq binding RNA
RF01394 Gene; sRNA; isrK Hfq binding RNA
RF01412 Gene; sRNA; BsrG
RF01415 Cis-reg; Flavivirus 3'UTR stem loop IV
RF01417 Cis-reg; Retroviral 3'UTR stability element
RF01418 Cis-reg; HIV pol-1 stem loop
RF01453 Cis-reg; 3'TE-DR1 translation enhancer element
RF01454 Cis-reg; 5'UTR enhancer element
RF01458 Gene; antisense; Listeria snRNA rli23
RF01466 Gene; sRNA; Listeria sRNA rli34
RF01479 Gene; sRNA; Listeria sRNA rli48
RF01486 Cis-reg; Listeria sRNA rli62
RF01492 Gene; sRNA; Listeria sRNA rli28
RF01497 Cis-reg; ALIL pseudoknot
RF01508 Cis-reg; Barley yellow dwarf virus 5'UTR
RF01516 Gene; snRNA; snoRNA; CD-box; Human herpesvirus 1 small nucleolar RNA
RF01668 Gene; sRNA; Pseudomonas sRNA P10
RF01695 Gene; antisense; C4 antisense RNA
RF01704 Cis-reg; Downstream peptide RNA
RF01717 Cis-reg; PhotoRC-II RNA
RF01739 Cis-reg; riboswitch; Glutamine riboswitch
RF01745 Cis-reg; manA RNA
RF01761 Cis-reg; wcaG RNA
RF01768 Cis-reg; frameshift_element; ribosomal frameshift site
RF01785 Cis-reg; frameshift_element; ribosomal frameshift site
RF01789 Gene; sRNA; Epstein-Barr virus EBER1
RF01790 Cis-reg; frameshift_element; ribosomal frameshift site
RF01792 Cis-reg; frameshift_element; ribosomal frameshift site
RF01794 Gene; antitoxin; sok antitoxin (CssrC)
RF01802 Gene; snRNA; Herpesvirus saimiri U RNA1/RNA2
RF01804 Cis-reg; thermoregulator; Lambda phage CIII thermoregulator element
RF01828 Gene; sRNA; Small pathogenicity island RNA D
RF01833 Cis-reg; frameshift_element; ribosomal frameshift site
RF01834 Cis-reg; frameshift_element; ribosomal frameshift site
RF01835 Cis-reg; frameshift_element; ribosomal frameshift site
RF01836 Cis-reg; frameshift_element; ribosomal frameshift site
RF01837 Cis-reg; frameshift_element; togavirus ribosomal frameshift element
RF01838 Cis-reg; frameshift_element; sobemovirus ribosomal frameshift elemental
RF01839 Cis-reg; frameshift_element; eastern equine encephalitis ribosomal frameshift element
RF01840 Cis-reg; frameshift_element; ribosomal frameshift element
RF01841 Cis-reg; frameshift_element; venezuelan equine encephalitis virus ribosomal frameshift element
RF01940 Gene; miRNA; microRNA hvt-mir-H9
RF02004 Intron; Group II catalytic intron D1-D4-5
RF02012 Intron; Group II catalytic intron D1-D4-7
RF02032 Gene; Giant, ornate, lake- and Lactobacillales-derived (GOLLD) RNA
RF02033 Gene; HNH endonuclease-associated RNA and ORF (HEARO) RNA
RF02076 Gene; sRNA; Gammaproteobacterial sRNA STnc100
RF02111 Gene; IS009
RF02221 Gene; sRNA; sRNA-Xcc1
RF02276 Gene; ribozyme; Hammerhead ribozyme (type II)
RF02340 Cis-reg; Dengue virus SLA
RF02359 Cis-reg; Bacteriophage MS2 operator hairpin
RF02415 Gene; sRNA; Listeria sRNA rliG
RF02416 Cis-reg; Turnip crinkle virus 3'UTR
RF02435 Gene; sRNA; Streptococcus sRNA SpF41
RF02455 Cis-reg; Dianthovirus RNA2 cap-independent translation element
RF02456 Cis-reg; Dianthovirus RNA2 3'UTR stem loops
RF02457 Cis-reg; Tombusvirus 3' cap-independent translation element
RF02458 Cis-reg; Aureusvirus cap-independent translation element
RF02459 Cis-reg; Necrovirus cap-independent translation element
RF02460 Cis-reg; Satellite tobacco necrosis virus cap-independent translation element
RF02461 Cis-reg; Blackcurrant reversion virus cap-independent translation element
RF02521 Cis-reg; Pea enation mosaic virus-2 cap-independent translation element
RF02522 Cis-reg; Pea enation mosaic virus-2 cap-independent translation element
RF02532 Cis-reg; Murine norovirus 3'UTR
RF02533 Cis-reg; Hepatitis A virus (HAV) cis-acting replication element (CRE)
RF02534 Cis-reg; Norovirus cis-acting replication element (CRE)
RF02536 Cis-reg; Avian encephalitis virus (AEV) cis-acting replication element (CRE)
RF02549 Cis-reg; Pseudoknot PSK3
RF02577 Gene; sRNA; S. aureus tsr24 small RNA
RF02585 Cis-reg; Hepatitis C virus RNA packaging signal
RF02586 Cis-reg; Hepatitis C virus RNA packaging signal 733
RF02587 Cis-reg; Hepatitis C virus RNA packaging signal 4629
RF02588 Cis-reg; Hepatitis C virus RNA packaging signal 6067
RF02589 Gene; sRNA; S. pyogenes small RNA 779816
RF02595 Gene; Epstein-Barr virus stable intronic sequence RNA 1
RF02598 Gene; Epstein-Barr virus stable intronic sequence RNA 2
RF02626 Gene; sRNA; Wolbachia sRNA 59
RF02658 Cis-reg; IRES; Rhopalosiphum padi virus 5'UTR internal ribosome entry site
RF02672 Gene; sRNA; Small pathogenicity island RNA X
RF02679 Gene; ribozyme; Pistol ribozyme
RF02702 Gene; sRNA; Anti GcvB sRNA
RF02703 Gene; sRNA; Anti stx2 sRNA
RF02712 Gene; sRNA; Epstein-Barr virus EBER2
RF02743 Gene; antisense; Saccharopolyspora sRNA 389
RF02816 Cis-reg; Hepatitis B virus post-transcriptional regulatory element 1151-1410
RF02838 Gene; sRNA; Enterococcus sRNA 55
RF02848 Gene; sRNA; Enterococcus sRNA B11
RF02855 Gene; antisense; Yersinia sRNA 251
RF02892 Gene; antisense; Bacillus SR6 antitoxin
RF02897 Gene; sRNA; Staphylococcus sRNA 808
RF02900 Gene; sRNA; Aggregatibacter sRNA 82
RF02910 Cis-reg; Coronavirus 5' stem-loops 1-2
RF02911 Cis-reg; Baculoviridae Nucleocapsid Assembly essential Element
RF02921 Gene; sRNA; RT-14 RNA
RF02924 Gene; sRNA; skipping-rope RNA
RF02931 Gene; sRNA; Bacilli-1 RNA
RF02944 Gene; sRNA; c4-2 RNA
RF02996 Gene; sRNA; int-alpA RNA
RF03003 Cis-reg; GP20-b RNA
RF03010 Gene; sRNA; mcrA RNA
RF03021 Gene; sRNA; RT-18 RNA
RF03022 Gene; sRNA; RT-10 RNA
RF03044 Gene; sRNA; Proteo-phage-1 RNA
RF03064 Gene; sRNA; RAGATH-18 RNA
RF03075 Gene; sRNA; DUF3800-VIII RNA
RF03085 Gene; sRNA; abiF RNA
RF03087 Gene; sRNA; ROOL RNA
SELECT DISTINCT f.rfam_acc, f.type, f.description
FROM taxonomy tx
  INNER JOIN rfamseq rf ON rf.ncbi_id = tx.ncbi_id
  INNER JOIN full_region fr ON fr.rfamseq_acc = rf.rfamseq_acc
  INNER JOIN family f ON f.rfam_acc = fr.rfam_acc
WHERE ((f.type LIKE 'Gene;' AND f.description NOT LIKE '%transfer-messenger RNA')
    OR f.type LIKE '%CRISPR;'
    OR f.type LIKE '%antisense;'
    OR f.type LIKE '%antitoxin;'
    OR f.type LIKE '%miRNA;'
    OR f.type LIKE '%ribozyme;'
    OR f.type LIKE '%sRNA;'
    OR f.type LIKE '%snRNA%'
    OR f.type LIKE 'Intron;'
    OR f.type LIKE 'Cis-reg;'
    OR f.type LIKE '%IRES;'
    OR f.type LIKE '%frameshift_element;'
    OR f.type LIKE '%leader;'
    OR f.type LIKE '%riboswitch;'
    OR f.type LIKE '%thermoregulator;')
  AND tx.tax_string LIKE 'Archaea%';
SELECT DISTINCT f.rfam_acc, f.type, f.description
FROM taxonomy tx
  INNER JOIN rfamseq rf ON rf.ncbi_id = tx.ncbi_id
  INNER JOIN full_region fr ON fr.rfamseq_acc = rf.rfamseq_acc
  INNER JOIN family f ON f.rfam_acc = fr.rfam_acc
WHERE ((f.type LIKE 'Gene;' AND f.description NOT LIKE '%transfer-messenger RNA')
    OR f.type LIKE '%CRISPR;'
    OR f.type LIKE '%antisense;'
    OR f.type LIKE '%antitoxin;'
    OR f.type LIKE '%miRNA;'
    OR f.type LIKE '%ribozyme;'
    OR f.type LIKE '%sRNA;'
    OR f.type LIKE '%snRNA%'
    OR f.type LIKE 'Intron;'
    OR f.type LIKE 'Cis-reg;'
    OR f.type LIKE '%IRES;'
    OR f.type LIKE '%frameshift_element;'
    OR f.type LIKE '%leader;'
    OR f.type LIKE '%riboswitch;'
    OR f.type LIKE '%thermoregulator;')
  AND tx.tax_string LIKE 'Bacteria%';
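#!/usr/bin/env bash
# Inspired by https://github.com/tseemann/prokka/issues/243#issuecomment-341672420
# Build per-kingdom Rfam covariance model subsets (Archaea, Bacteria, Viruses).
# Abort on the first failed command, unset variable, or broken pipeline.
set -euo pipefail

rfamversion=14.1

# Download the full Rfam covariance model library unless it is already present.
if [ ! -f Rfam.cm ]; then
  wget "ftp://ftp.ebi.ac.uk/pub/databases/Rfam/${rfamversion}/Rfam.cm.gz"
  gunzip Rfam.cm.gz
fi

for tax in archaea bacteria viruses; do
  # Run the query against the public Rfam MySQL server; tail drops the header row.
  mysql --user rfamro --host mysql-rfam-public.ebi.ac.uk --port 4497 --database Rfam \
    < "${tax}.sql" \
    | tail -n +2 \
    > "Rfam_${tax}_${rfamversion}.txt"
  # Fetch the listed families from the full library, then convert them to binary format.
  cmfetch -o "Rfam_${tax}.cm" -f Rfam.cm "Rfam_${tax}_${rfamversion}.txt"
  cmconvert -o "${tax}" -b "Rfam_${tax}.cm"
done

# Move the results to the parent directory under their capitalised names.
mv archaea ../Archaea
mv bacteria ../Bacteria
mv viruses ../Viruses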
SELECT DISTINCT f.rfam_acc, f.type, f.description
FROM taxonomy tx
  INNER JOIN rfamseq rf ON rf.ncbi_id = tx.ncbi_id
  INNER JOIN full_region fr ON fr.rfamseq_acc = rf.rfamseq_acc
  INNER JOIN family f ON f.rfam_acc = fr.rfam_acc
WHERE ((f.type LIKE 'Gene;' AND f.description NOT LIKE '%transfer-messenger RNA')
    OR f.type LIKE '%CRISPR;'
    OR f.type LIKE '%antisense;'
    OR f.type LIKE '%antitoxin;'
    OR f.type LIKE '%miRNA;'
    OR f.type LIKE '%ribozyme;'
    OR f.type LIKE '%sRNA;'
    OR f.type LIKE '%snRNA%'
    OR f.type LIKE 'Intron;'
    OR f.type LIKE 'Cis-reg;'
    OR f.type LIKE '%IRES;'
    OR f.type LIKE '%frameshift_element;'
    OR f.type LIKE '%leader;'
    OR f.type LIKE '%riboswitch;'
    OR f.type LIKE '%thermoregulator;')
  AND tx.tax_string LIKE 'Viruses%';
prokka (1.14.5+dfsg-1) unstable; urgency=medium

  * New upstream version

 -- Michael R. Crusoe <michael.crusoe@gmail.com>  Sat, 23 Nov 2019 17:37:23 +0100

prokka (1.14.0+dfsg-1) unstable; urgency=medium

  * New upstream version
......
#!/bin/sh
pandoc -f markdown -t plain ../README.md > prokka-manual.txt