gffread (0.11.2-1) unstable; urgency=medium

  * Initial release (Closes: #930545)

 -- Andreas Tille <> Sat, 15 Jun 2019 00:51:54 +0200
Package: gffread
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: GFF/GTF format conversions, region filtering, FASTA sequence extraction
 Gffread is a GFF/GTF parsing utility providing format conversions,
 region filtering, FASTA sequence extraction and more.
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
.TH GFFREAD "1" "June 2019" "gffread 0.11.2" "User Commands"
.SH NAME
gffread \- GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction
.SH SYNOPSIS
.B gffread
<input_gff> [\-g <genomic_seqs_fasta> | <dir>] [\-s <seq_info.fsize>]
[\-o <outfile.gff>] [\-t <tname>] [\-r [[<strand>]<chr>:]<start>..<end> [\-R]]
[\-CTVNJMKQAFPGUBHZWOLE] [\-w <exons.fa>] [\-x <cds.fa>] [\-y <tr_cds.fa>]
[\-i <maxintron>] [\-\-sort\-by <refseq_list.txt>]
.SH DESCRIPTION
Filter and convert GFF3/GTF2 records, extract the corresponding sequences, etc.
By default (i.e. without \fB\-O\fR), only transcripts are processed and all other features are ignored.
<input_gff> is a GFF file; use '\-' to read from stdin.
\fB\-i\fR discard transcripts having an intron larger than <maxintron>
\fB\-l\fR discard transcripts shorter than <minlen> bases
\fB\-r\fR only show transcripts overlapping the coordinate range <start>..<end>
(on chromosome/contig <chr> and strand <strand>, if provided)
\fB\-R\fR for the \fB\-r\fR option, discard all transcripts that are not fully
contained within the given range
\fB\-U\fR discard single\-exon transcripts
\fB\-C\fR coding only: discard mRNAs that have no CDS features
\fB\-\-nc\fR non\-coding only: discard mRNAs that have CDS features
\fB\-\-ignore\-locus\fR : discard locus features and attributes found in the input
\fB\-A\fR use the description field from <seq_info.fsize> and add it
as the value of a 'descr' attribute to the GFF record
\fB\-s\fR <seq_info.fsize> is a tab\-delimited file providing this info
for each of the mapped sequences:
<seq\-name> <seq\-length> <seq\-description>
(useful for \fB\-A\fR option with mRNA/EST/protein mappings)
.SS "Sorting:"
By default, chromosomes are kept in the order in which they were found.
\fB\-\-sort\-alpha\fR : chromosomes (reference sequences) are sorted alphabetically
\fB\-\-sort\-by\fR <refseq_list.txt> : sort the reference sequences by the order in which their
names are given in the <refseq_list.txt> file
.SS "Misc options:"
\fB\-F\fR attempt to preserve all GFF attributes
\fB\-\-keep\-exon\-attrs\fR : for \fB\-F\fR option, do not attempt to reduce redundant
exon/CDS attributes
\fB\-G\fR do not keep exon attributes; move them to the transcript feature
(for GFF3 output)
\fB\-\-keep\-genes\fR : in transcript\-only mode (default), also preserve gene records
\fB\-\-keep\-comments\fR: for GFF3 input/output, try to preserve comments
\fB\-O\fR process other non\-transcript GFF records (by default, non\-transcript
records are ignored)
\fB\-V\fR discard any mRNAs with CDS having in\-frame stop codons (requires \fB\-g\fR)
\fB\-H\fR for the \fB\-V\fR option, check and adjust the starting CDS phase
if the original phase leads to a translation with an
in\-frame stop codon
\fB\-B\fR for the \fB\-V\fR option, single\-exon transcripts are also checked on the
opposite strand (requires \fB\-g\fR)
\fB\-P\fR add transcript\-level GFF attributes about the coding status of each
transcript, including partialness or in\-frame stop codons (requires \fB\-g\fR)
\fB\-\-add\-hasCDS\fR : add a "hasCDS" attribute with value "true" for transcripts
that have CDS features
\fB\-\-adj\-stop\fR stop codon adjustment: enables \fB\-P\fR and performs automatic
adjustment of the CDS stop coordinate if premature or downstream
\fB\-N\fR discard multi\-exon mRNAs that have any intron with a non\-canonical
splice site consensus (i.e. not GT\-AG, GC\-AG or AT\-AC)
\fB\-J\fR discard any mRNAs that lack either the initial START codon
or the terminal STOP codon, or that have an in\-frame stop codon
(i.e. only print mRNAs with a complete CDS)
\fB\-\-no\-pseudo\fR: filter out records matching the 'pseudo' keyword
\fB\-\-in\-bed\fR: input should be parsed as BED format (automatic if the input
filename ends with .bed*)
\fB\-\-in\-tlf\fR: input is in a GFF\-like one\-line\-per\-transcript format without exon/CDS
features (see the \fB\-\-tlf\fR option below); automatic if the input
filename ends with .tlf
.SS "Clustering:"
\fB\-M\fR/\-\-merge : cluster the input transcripts into loci, discarding
"duplicated" transcripts (those with the same exact introns
and fully contained or equal boundaries)
\fB\-d\fR <dupinfo> : for \fB\-M\fR option, write duplication info to file <dupinfo>
\fB\-\-cluster\-only\fR: same as \fB\-M\fR/\-\-merge but without discarding any of the
"duplicate" transcripts, only create "locus" features
\fB\-K\fR for the \fB\-M\fR option: also discard as redundant the shorter, fully contained
transcripts (intron chains matching a part of the container)
\fB\-Q\fR for the \fB\-M\fR option, no longer require boundary containment when assessing
redundancy (can be combined with \fB\-K\fR); only introns have to match for
multi\-exon transcripts, and >=80% overlap for single\-exon transcripts
\fB\-Y\fR for the \fB\-M\fR option, enforce \fB\-Q\fR but also discard overlapping single\-exon
transcripts, even on the opposite strand (can be combined with \fB\-K\fR)
.SS "Output options:"
\fB\-\-force\-exons\fR: make sure that the lowest level GFF features are considered
"exon" features
\fB\-\-gene2exon\fR: for single\-line genes not parenting any transcripts, add an
exon feature spanning the entire gene (treat it as a transcript)
\fB\-D\fR decode URL\-encoded characters within attributes
\fB\-Z\fR merge very close exons into a single exon (when the intron is shorter than 4 bases)
\fB\-g\fR full path to a multi\-FASTA file with the genomic sequences
for all input mappings, OR a directory with single\-fasta files
(one per genomic sequence, with file names matching sequence names)
\fB\-w\fR write a FASTA file with the spliced exons of each GFF transcript
\fB\-x\fR write a FASTA file with the spliced CDS of each GFF transcript
\fB\-y\fR write a protein FASTA file with the translation of the CDS of each record
\fB\-W\fR for the \fB\-w\fR and \fB\-x\fR options, write in the FASTA defline the exon
coordinates projected onto the spliced sequence;
for \fB\-y\fR option, write transcript attributes in the FASTA defline
\fB\-S\fR for the \fB\-y\fR option, use '*' instead of '.' as the stop codon translation
\fB\-L\fR Ensembl GTF to GFF3 conversion (implies \fB\-F\fR; should be used with \fB\-m\fR)
\fB\-m\fR <chr_replace> is a name mapping table for converting reference
sequence names, having this 2\-column format:
<original_ref_ID> <new_ref_ID>
WARNING: all GFF records on reference sequences whose original IDs
are not found in the 1st column of this table will be discarded!
\fB\-t\fR use <tname> as the track name in the 2nd column of each GFF/GTF output line
\fB\-o\fR print the GFF records that passed all given filters
to <outfile.gff>; use \fB\-o\-\fR to print them to stdout
\fB\-T\fR for \fB\-o\fR, write GTF instead of GFF3
\fB\-\-bed\fR for \fB\-o\fR, output BED format instead of GFF3
\fB\-\-tlf\fR for \fB\-o\fR, output "transcript line format" which is like GFF
but exons, CDS features and related data are stored as GFF
attributes in the transcript feature line, like this:
<exons> is a comma\-delimited list of exon_start\-exon_end coordinates;
<CDScoords> is CDS_start:CDS_end coordinates or a list like <exons>;
\fB\-v\fR, \fB\-E\fR expose (warn about) duplicate transcript IDs and other potential
problems with the given GFF/GTF records
This manpage was written by Andreas Tille for the Debian distribution and can also be used by other distributions of the program.
