Metadata-Version: 1.1
Name: gffutils
Version: 0.8.7.1
Version: 0.9
Summary: Work with GFF and GTF files in a flexible database framework
Home-page: none
Home-page: https://github.com/daler/gffutils
Author: Ryan Dale
Author-email: dalerr@niddk.nih.gov
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
......@@ -17,4 +17,7 @@ Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries :: Python Modules
......@@ -9,10 +9,11 @@
:target: https://pypi.python.org/pypi/gffutils
See docs at http://daler.github.io/gffutils.
``gffutils`` is a Python package for working with and manipulating the GFF and
GTF format files typically used for genomic annotations. Files are loaded into
a sqlite3 database, allowing much more complex manipulation of hierarchical
features (e.g., genes, transcripts, and exons) than is possible with plain-text
methods alone.
See documentation at **http://daler.github.io/gffutils**.
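The database-backed approach described above can be sketched with plain sqlite3. The schema and names below are illustrative assumptions for the sketch, not gffutils' actual schema; the point is that parent/child queries across genes, transcripts, and exons become simple SQL.

```python
import sqlite3

# Minimal sketch of a feature/relation model: features in one table,
# parent-child links in another (illustrative, not gffutils' schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (id TEXT PRIMARY KEY, featuretype TEXT,
                       seqid TEXT, start INT, end INT);
CREATE TABLE relations (parent TEXT, child TEXT);
""")
conn.executemany("INSERT INTO features VALUES (?,?,?,?,?)", [
    ("gene1", "gene", "chr1", 100, 900),
    ("tx1", "transcript", "chr1", 100, 900),
    ("exon1", "exon", "chr1", 100, 200),
    ("exon2", "exon", "chr1", 800, 900),
])
conn.executemany("INSERT INTO relations VALUES (?,?)", [
    ("gene1", "tx1"), ("tx1", "exon1"), ("tx1", "exon2"),
])

# All exons of tx1 ordered by start -- the kind of hierarchical query
# that is awkward with plain-text tools alone.
exons = conn.execute("""
    SELECT f.id, f.start, f.end FROM features f
    JOIN relations r ON f.id = r.child
    WHERE r.parent = 'tx1' AND f.featuretype = 'exon'
    ORDER BY f.start
""").fetchall()
print(exons)  # [('exon1', 100, 200), ('exon2', 800, 900)]
```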
python-gffutils (0.8.7.1-1) unstable; urgency=medium
python-gffutils (0.9-1) UNRELEASED; urgency=medium
* Removal reported via bug #894298.
* Team non-upload.
* New upstream version.
-- Steffen Moeller <moeller@debian.org> Sun, 15 Apr 2018 14:03:00 +0200
python-gffutils (0.8.7.1-1) REMOVED; urgency=medium
* Initial release. (Closes: #851488)
......
......@@ -17,7 +17,7 @@ Build-Depends: debhelper (>= 10),
python3-nose,
python3-biopython,
python3-pybedtools
Standards-Version: 3.9.8
Standards-Version: 4.1.3
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/python-gffutils.git
Vcs-Git: https://anonscm.debian.org/git/debian-med/python-gffutils.git
Homepage: https://daler.github.io/gffutils
......
{{ fullname }}
{{ underline }}
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
{% block methods %}
.. automethod:: __init__
{% if methods %}
.. rubric:: Methods
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block attributes %}
{% if attributes %}
.. rubric:: Attributes
.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
Metadata-Version: 1.1
Name: gffutils
Version: 0.8.7.1
Version: 0.9
Summary: Work with GFF and GTF files in a flexible database framework
Home-page: none
Home-page: https://github.com/daler/gffutils
Author: Ryan Dale
Author-email: dalerr@niddk.nih.gov
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
......@@ -17,4 +17,7 @@ Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries :: Python Modules
......@@ -3,6 +3,7 @@ MANIFEST.in
README.rst
requirements.txt
setup.py
doc/source/_templates/class.rst
gffutils/__init__.py
gffutils/attributes.py
gffutils/bins.py
......@@ -34,23 +35,34 @@ gffutils/test/expected.py
gffutils/test/feature_test.py
gffutils/test/helpers_test.py
gffutils/test/parser_test.py
gffutils/test/performance_test.py
gffutils/test/test.py
gffutils/test/test_biopython_integration.py
gffutils/test/data/F3-unique-3.v2.gff
gffutils/test/data/FBgn0031208.gff
gffutils/test/data/FBgn0031208.gtf
gffutils/test/data/Saccharomyces_cerevisiae.R64-1-1.83.5000_gene_ids.txt
gffutils/test/data/Saccharomyces_cerevisiae.R64-1-1.83.5000_transcript_ids.txt
gffutils/test/data/Saccharomyces_cerevisiae.R64-1-1.83.chromsizes.txt
gffutils/test/data/c_elegans_WS199_ann_gff.txt
gffutils/test/data/c_elegans_WS199_dna_shortened.fa
gffutils/test/data/c_elegans_WS199_shortened_gff.txt
gffutils/test/data/dm6-chr2L.fa
gffutils/test/data/dmel-all-no-analysis-r5.49_50k_lines.gff
gffutils/test/data/download-large-annotation-files.sh
gffutils/test/data/ensembl_gtf.txt
gffutils/test/data/gencode-v19.gtf
gffutils/test/data/gencode.vM8.5000_gene_ids.txt
gffutils/test/data/gencode.vM8.5000_transcript_ids.txt
gffutils/test/data/gencode.vM8.chromsizes.txt
gffutils/test/data/gff_example1.gff3
gffutils/test/data/gff_example1.gff3.gz
gffutils/test/data/glimmer_nokeyval.gff3
gffutils/test/data/hybrid1.gff3
gffutils/test/data/intro_docs_example.gff
gffutils/test/data/jgi_gff2.txt
gffutils/test/data/keep-order-test.gtf
gffutils/test/data/keyval_sep_in_attrs.gff
gffutils/test/data/mouse_extra_comma.gff3
gffutils/test/data/ncbi_gff3.txt
gffutils/test/data/nonascii
......
......@@ -139,6 +139,7 @@ dialect = {
}
always_return_list = True
ignore_url_escape_characters = False
# these keyword args are used by iterators.
_iterator_kwargs = (
......
......@@ -4,24 +4,19 @@ Conversion functions that operate on :class:`FeatureDB` classes.
import six
def to_bed12(f, db, child_type='exon', name_field='ID'):
"""
Given a top-level feature (e.g., transcript), construct a BED12 entry
Parameters
----------
f : Feature object or string
This is the top-level feature represented by one BED12 line. For
a canonical GFF or GTF, this will generally be a transcript.
db : a FeatureDB object
This is needed to get the children of the feature.
child_type : str
Featuretypes that will be represented by the BED12 "blocks". Typically
"exon".
name_field : str
Attribute to be used in the "name" field of the BED12 entry. Usually
"ID" for GFF; "transcript_id" for GTF.
......
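The blockSizes/blockStarts arithmetic that a BED12 conversion needs can be sketched as follows. This is an illustrative sketch, not gffutils' implementation; `bed12_blocks` is a hypothetical helper.

```python
def bed12_blocks(tx_start, exons):
    """Compute BED12 blockCount, blockSizes, blockStarts from 1-based,
    closed-interval exon coordinates (hypothetical helper for
    illustration, not gffutils' own code)."""
    exons = sorted(exons)
    # GFF/GTF intervals are 1-based and closed, so length = end - start + 1.
    sizes = [end - start + 1 for start, end in exons]
    # BED block starts are relative to the feature's chromStart.
    starts = [start - tx_start for start, _ in exons]
    return len(exons), ",".join(map(str, sizes)), ",".join(map(str, starts))

count, sizes, starts = bed12_blocks(100, [(100, 200), (800, 900)])
print(count, sizes, starts)  # 2 101,101 0,700
```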
......@@ -57,6 +57,7 @@ class _DBCreator(object):
force_merge_fields=None,
text_factory=sqlite3.OptimizedUnicode,
pragmas=constants.default_pragmas, _keep_tempfiles=False,
directives=None,
**kwargs):
"""
Base class for _GFFDBCreator and _GTFDBCreator; see create_db()
......@@ -80,6 +81,9 @@ class _DBCreator(object):
self.pragmas = pragmas
self.merge_strategy = merge_strategy
self.default_encoding = default_encoding
if directives is None:
directives = []
self.directives = directives
if not infer_gene_extent:
warnings.warn("'infer_gene_extent' will be deprecated. For now, "
......@@ -121,6 +125,7 @@ class _DBCreator(object):
dialect=dialect
)
def set_verbose(self, verbose=None):
if verbose == 'debug':
logger.setLevel(logging.DEBUG)
......@@ -439,9 +444,10 @@ class _DBCreator(object):
In general, if you'll be adding stuff to the meta table, do it here.
"""
c = self.conn.cursor()
directives = self.directives + self.iterator.directives
c.executemany('''
INSERT INTO directives VALUES (?)
''', ((i,) for i in self.iterator.directives))
''', ((i,) for i in directives))
c.execute(
'''
INSERT INTO meta (version, dialect)
......@@ -472,6 +478,16 @@ class _DBCreator(object):
logger.info("Creating features(featuretype) index")
c.execute('DROP INDEX IF EXISTS featuretype')
c.execute('CREATE INDEX featuretype ON features (featuretype)')
logger.info("Creating features (seqid, start, end) index")
c.execute('DROP INDEX IF EXISTS seqidstartend')
c.execute('CREATE INDEX seqidstartend ON features (seqid, start, end)')
logger.info("Creating features (seqid, start, end, strand) index")
c.execute('DROP INDEX IF EXISTS seqidstartendstrand')
c.execute('CREATE INDEX seqidstartendstrand ON features (seqid, start, end, strand)')
# speeds computation 1000x in some cases
logger.info("Running ANALYZE features")
c.execute('ANALYZE features')
self.conn.commit()
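The effect of this new ANALYZE step can be seen directly with the standard library: running ANALYZE populates the `sqlite_stat1` table that the query planner consults, which is also how a database can be tested for having been analyzed. A self-contained sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (seqid TEXT, start INT, end INT)")
conn.execute("CREATE INDEX seqidstartend ON features (seqid, start, end)")
conn.executemany("INSERT INTO features VALUES (?,?,?)",
                 [("chr1", i, i + 100) for i in range(1000)])

def analyzed(conn):
    # sqlite_stat1 only exists after ANALYZE has been run.
    res = conn.execute("SELECT name FROM sqlite_master "
                       "WHERE type='table' AND name='sqlite_stat1'")
    return len(res.fetchall()) == 1

assert not analyzed(conn)
# ANALYZE gathers index statistics that let the planner choose better
# query plans -- the source of the large speedups noted in this diff.
conn.execute("ANALYZE features")
conn.commit()
assert analyzed(conn)
```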
......@@ -1104,7 +1120,7 @@ def create_db(data, dbfn, id_spec=None, force=False, verbose=False,
Using `merge_strategy="warning"`, a warning will be printed to the
logger, and the duplicate feature will be skipped.
Using `merge_strategy="replace" will replace the entire existing
Using `merge_strategy="replace"` will replace the entire existing
feature with the new feature.
transform : callable
......@@ -1216,7 +1232,6 @@ def create_db(data, dbfn, id_spec=None, force=False, verbose=False,
-------
New :class:`FeatureDB` object.
"""
_locals = locals()
# Check if any older kwargs made it in
......@@ -1235,16 +1250,16 @@ def create_db(data, dbfn, id_spec=None, force=False, verbose=False,
if dialect is None:
dialect = iterator.dialect
if isinstance(iterator, iterators._FeatureIterator):
# However, a side-effect of this is that if `data` was a generator,
# then we've just consumed `checklines` items (see
# iterators.BaseIterator.__init__, which calls iterators.peek).
#
# But it also chains those consumed items back onto the beginning, and
# the result is available as as iterator._iter.
#
# That's what we should be using now for `data:
kwargs['data'] = iterator._iter
# However, a side-effect of this is that if `data` was a generator, then
# we've just consumed `checklines` items (see
# iterators.BaseIterator.__init__, which calls iterators.peek).
#
# But it also chains those consumed items back onto the beginning, and the
# result is available as iterator._iter.
#
# That's what we should be using now for `data`:
kwargs['data'] = iterator._iter
kwargs['directives'] = iterator.directives
# Since we've already checked lines, we don't want to do it again
kwargs['checklines'] = 0
......
......@@ -74,6 +74,14 @@ class Feature(object):
dictionary and the dialect -- except if the original attributes
string was provided, in which case that will be used directly.
Notes on encoding/decoding: the only time unquoting
(e.g., "%2C" becomes ",") happens is if `attributes` is a string
and if `settings.ignore_url_escape_characters = False`. If dict or
JSON, the contents are used as-is.
Similarly, the only time characters are quoted ("," becomes "%2C")
is when the feature is printed (`__str__` method).
extra : string or list
Additional fields after the canonical 9 fields for GFF/GTF.
......@@ -114,11 +122,11 @@ class Feature(object):
"""
# start/end can be provided as int-like, ".", or None, but will be
# converted to int or None
if start == ".":
if start == "." or start == "":
start = None
elif start is not None:
start = int(start)
if end == ".":
if end == "." or end == "":
end = None
elif end is not None:
end = int(end)
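The coordinate parsing above can be captured in one small helper. `coerce_coord` is a hypothetical name, shown only to illustrate the rule that ".", the empty string, and None all mean a missing coordinate, while anything else must be int-like:

```python
def coerce_coord(value):
    # ".", "" and None all mean "no coordinate"; otherwise require
    # an int-like value (raises ValueError for anything else).
    if value in (".", "", None):
        return None
    return int(value)

print(coerce_coord("."))    # None
print(coerce_coord("100"))  # 100
```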
......@@ -224,6 +232,7 @@ class Feature(object):
return unicode(self).encode('utf-8')
def __unicode__(self):
# All fields but attributes (and extra).
items = [getattr(self, k) for k in constants._gffkeys[:-1]]
......@@ -264,7 +273,7 @@ class Feature(object):
return self.stop - self.start + 1
# aliases for official GFF field names; this way x.chrom == x.seqid; and
# x.start == x.end.
# x.stop == x.end.
@property
def chrom(self):
return self.seqid
......@@ -334,11 +343,14 @@ class Feature(object):
string
"""
if isinstance(fasta, six.string_types):
fasta = Fasta(fasta, as_raw=True)
fasta = Fasta(fasta, as_raw=False)
# recall GTF/GFF is 1-based closed; pyfaidx uses Python slice notation
# and is therefore 0-based half-open.
return fasta[self.chrom][self.start-1:self.stop]
seq = fasta[self.chrom][self.start-1:self.stop]
if use_strand and self.strand == '-':
seq = seq.reverse.complement
return seq.seq
def feature_from_line(line, dialect=None, strict=True, keep_order=False):
......
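The 1-based/0-based conversion and minus-strand handling added to `sequence()` above can be sketched without pyfaidx, using plain strings. `extract` is a hypothetical helper for illustration only:

```python
def extract(seq, start, stop, strand):
    """Extract a 1-based, closed-interval subsequence, reverse-complementing
    on the minus strand (stdlib sketch of the pyfaidx-based logic)."""
    # GFF/GTF is 1-based closed; Python slices are 0-based half-open.
    sub = seq[start - 1:stop]
    if strand == "-":
        comp = str.maketrans("ACGTacgt", "TGCAtgca")
        sub = sub.translate(comp)[::-1]
    return sub

print(extract("AACGTT", 2, 4, "+"))  # ACG
print(extract("AACGTT", 2, 4, "-"))  # CGT
```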
......@@ -2,6 +2,7 @@ import os
import six
import sqlite3
import shutil
import warnings
from gffutils import bins
from gffutils import helpers
from gffutils import constants
......@@ -150,6 +151,15 @@ class FeatureDB(object):
self.set_pragmas(pragmas)
if not self._analyzed():
warnings.warn(
"It appears that this database has not had the ANALYZE "
"sqlite3 command run on it. Doing so can dramatically "
"speed up queries, and is done by default for databases "
"created with gffutils >0.8.7.1 (this database was "
"created with version %s). Consider calling the analyze() "
"method of this object." % self.version)
def set_pragmas(self, pragmas):
"""
Set pragmas for the current database connection.
......@@ -178,6 +188,14 @@ class FeatureDB(object):
kwargs.setdefault('sort_attribute_values', self.sort_attribute_values)
return Feature(**kwargs)
def _analyzed(self):
res = self.execute(
"""
SELECT name FROM sqlite_master WHERE type='table'
AND name='sqlite_stat1';
""")
return len(list(res)) == 1
def schema(self):
"""
Returns the database schema as a string.
......@@ -442,6 +460,14 @@ class FeatureDB(object):
c = self.conn.cursor()
return c.execute(query)
def analyze(self):
"""
Runs the sqlite ANALYZE command to potentially speed up queries
dramatically.
"""
self.execute('ANALYZE features')
self.conn.commit()
def region(self, region=None, seqid=None, start=None, end=None,
strand=None, featuretype=None, completely_within=False):
"""
......
......@@ -2,6 +2,8 @@
import re
import copy
import collections
from six.moves import urllib
from gffutils import constants
from gffutils.exceptions import AttributeStringError
......@@ -16,6 +18,60 @@ logger.addHandler(ch)
gff3_kw_pat = re.compile(r'\w+=')
# Encoding/decoding notes
# -----------------------
# From
# https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md#description-of-the-format:
#
# GFF3 files are nine-column, tab-delimited, plain text files.
# Literal use of tab, newline, carriage return, the percent (%) sign,
# and control characters must be encoded using RFC 3986
# Percent-Encoding; no other characters may be encoded. Backslash and
# other ad-hoc escaping conventions that have been added to the GFF
# format are not allowed. The file contents may include any character
# in the set supported by the operating environment, although for
# portability with other systems, use of Latin-1 or Unicode are
# recommended.
#
# tab (%09)
# newline (%0A)
# carriage return (%0D)
# % percent (%25)
# control characters (%00 through %1F, %7F)
#
# In addition, the following characters have reserved meanings in
# column 9 and must be escaped when used in other contexts:
#
# ; semicolon (%3B)
# = equals (%3D)
# & ampersand (%26)
# , comma (%2C)
#
#
# See also issue #98.
#
# Note that spaces are NOT encoded. Some GFF files have spaces encoded; in
# these cases round-trip invariance will not hold since the %20 will be decoded
# but not re-encoded.
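These encoding rules can be exercised with the standard library. gffutils uses its own quoting table rather than `urllib.parse.quote`, so the hand-built `safe` set below is only an approximation for illustration:

```python
from urllib.parse import quote, unquote

# Column-9 reserved characters per the GFF3 spec excerpt above;
# note that space is deliberately NOT in this set.
reserved = ";=&,%"

def encode_attr(value):
    # quote() leaves characters in `safe` untouched; allow all printable
    # ASCII except the reserved set (sketch only, not gffutils' code).
    safe = "".join(chr(i) for i in range(32, 127) if chr(i) not in reserved)
    return quote(value, safe=safe)

print(encode_attr("marker name(s): T0028"))  # unchanged: spaces stay
print(encode_attr("HMM PF00491; match"))     # HMM PF00491%3B match
print(unquote("growth%20hormone%201"))       # growth hormone 1
```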
_to_quote = '\n\t\r%;=&,'
_to_quote += ''.join([chr(i) for i in range(32)])
_to_quote += chr(127)
# Caching idea from urllib.parse.Quoter, which uses a defaultdict for
# efficiency. Here we're sort of doing the reverse of the "reserved" idea used
# there.
class Quoter(collections.defaultdict):
def __missing__(self, b):
if b in _to_quote:
res = '%{:02X}'.format(ord(b))
else:
res = b
self[b] = res
return res
quoter = Quoter()
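The caching trick can be demonstrated with a self-contained copy of the quoting table and class above: the first lookup of a character computes its encoding in `__missing__` and stores it, so repeated characters hit a plain dict lookup.

```python
import collections

# Self-contained copy of the table and class above, for demonstration.
_to_quote = '\n\t\r%;=&,' + ''.join(chr(i) for i in range(32)) + chr(127)

class Quoter(collections.defaultdict):
    def __missing__(self, b):
        # Compute once, cache in the dict itself, return.
        res = '%{:02X}'.format(ord(b)) if b in _to_quote else b
        self[b] = res
        return res

quoter = Quoter()
encoded = ''.join(quoter[c] for c in 'a;b,c')
print(encoded)        # a%3Bb%2Cc
print(';' in quoter)  # True -- the encoding was cached on first use
```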
def _reconstruct(keyvals, dialect, keep_order=False,
sort_attribute_values=False):
......@@ -46,17 +102,27 @@ def _reconstruct(keyvals, dialect, keep_order=False,
return ""
parts = []
# Re-encode when reconstructing attributes
if constants.ignore_url_escape_characters or dialect['fmt'] != 'gff3':
attributes = keyvals
else:
attributes = {}
for k, v in keyvals.items():
attributes[k] = []
for i in v:
attributes[k].append(''.join([quoter[j] for j in i]))
# May need to split multiple values into multiple key/val pairs
if dialect['repeated keys']:
items = []
for key, val in keyvals.items():
for key, val in attributes.items():
if len(val) > 1:
for v in val:
items.append((key, [v]))
else:
items.append((key, val))
else:
items = list(keyvals.items())
items = list(attributes.items())
def sort_key(x):
# sort keys by their order in the dialect; anything not in there will
......@@ -87,7 +153,10 @@ def _reconstruct(keyvals, dialect, keep_order=False,
# Typically "=" for GFF3 or " " otherwise
part = dialect['keyval separator'].join([key, val_str])
else:
part = key
if dialect['fmt'] == 'gtf':
part = dialect['keyval separator'].join([key, '""'])
else:
part = key
parts.append(part)
# Typically ";" or "; "
......@@ -116,6 +185,19 @@ def _split_keyvals(keyval_str, dialect=None):
Otherwise, use the provided dialect (and return it at the end).
"""
def _unquote_quals(quals, dialect):
"""
Handles the unquoting (decoding) of percent-encoded characters.
See notes on encoding/decoding above.
"""
if not constants.ignore_url_escape_characters and dialect['fmt'] == 'gff3':
for key, vals in quals.items():
unquoted = [urllib.parse.unquote(v) for v in vals]
quals[key] = unquoted
return quals
infer_dialect = False
if dialect is None:
# Make a copy of default dialect so it can be modified as needed
......@@ -160,11 +242,14 @@ def _split_keyvals(keyval_str, dialect=None):
key, val = item
# Only key provided?
else:
assert len(item) == 1, item
elif len(item) == 1:
key = item[0]
val = ''
else:
key = item[0]
val = dialect['keyval separator'].join(item[1:])
try:
quals[key]
except KeyError:
......@@ -181,6 +266,7 @@ def _split_keyvals(keyval_str, dialect=None):
vals = val.split(',')
quals[key].extend(vals)
quals = _unquote_quals(quals, dialect)
return quals, dialect
# If we got here, then we need to infer the dialect....
......@@ -229,10 +315,16 @@ def _split_keyvals(keyval_str, dialect=None):
key, val = item
# Only key provided?
elif len(item) == 1:
key = item[0]
val = ''
# Pathological cases where a key's values themselves contain the
# key-val separator, e.g.,
# Alias=SGN-M1347;ID=T0028;Note=marker name(s): T0028 SGN-M1347 |identity=99.58|escore=2e-126
else:
assert len(item) == 1, item
key = item[0]
val = ''
val = dialect['keyval separator'].join(item[1:])
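The rejoining logic for these pathological cases can be sketched in isolation. `split_keyval` is a hypothetical helper: split on the separator once conceptually, then glue the tail back together so embedded separators survive.

```python
def split_keyval(item_str, sep="="):
    # Rejoin everything after the first separator so values that
    # themselves contain "=" (e.g. "|identity=99.58") survive intact.
    parts = item_str.split(sep)
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], sep.join(parts[1:])

key, val = split_keyval("Note=marker T0028 |identity=99.58|escore=2e-126")
print(key)  # Note
print(val)  # marker T0028 |identity=99.58|escore=2e-126
```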
# Is the key already in there?
if key in quals:
......@@ -258,29 +350,11 @@ def _split_keyvals(keyval_str, dialect=None):
# keep track of the order of keys
dialect['order'].append(key)
#for key, vals in quals.items():
#
# TODO: urllib.unquote breaks round trip invariance for "hybrid1.gff3"
# test file. This is because the "Note" field has %xx escape chars,
# but "Dbxref" has ":" which, if everything were consistent, should
# have also been escaped.
#
# (By the way, GFF3 spec says only literal use of \t, \n, \r, %, and
# control characters should be encoded)
#
# Solution 1: don't unquote
# Solution 2: store, along with each attribute, whether or not it
# should be quoted later upon reconstruction
# Solution 3: don't care about invariance
# unquoted = [urllib.unquote(v) for v in vals]
#quals[key] = vals
if (
(dialect['keyval separator'] == ' ') and
(dialect['quoted GFF2 values'])
):
dialect['fmt'] = 'gtf'
quals = _unquote_quals(quals, dialect)
return quals, dialect
......@@ -89,10 +89,12 @@ attrs = [
'AFFX-U95:1332_f_at',
'Swissprot:SOMA_HUMAN',
],
'Note': ['growth%20hormone%201'],
'Note': ['growth hormone 1'],
'Alias': ['GH1']},
None,
'ID=A00469;Dbxref=AFFX-U133:205840_x_at,Locuslink:2688,Genbank-mRNA:'
'A00469,Swissprot:P01241,PFAM:PF00103,AFFX-U95:1332_f_at,Swissprot:'
'SOMA_HUMAN;Note=growth hormone 1;Alias=GH1',
),
# jgi_gff2.txt
......@@ -157,8 +159,8 @@ attrs = [
'Parent': ['NC_008596.1:speB'],
'locus_tag': ['MSMEG_1072'],
'EC_number': ['3.5.3.11'],
'note': ['identified%20by%20match%20to%20protein%20family%20HMM%20P'
'F00491%3B%20match%20to%20protein%20family%20HMM%20TIGR01'
'note': ['identified by match to protein family HMM P'
'F00491; match to protein family HMM TIGR01'
'230'],
'transl_table': ['11'],
'product': ['agmatinase'],
......@@ -167,7 +169,12 @@ attrs = [
'exon_number': ['1'],
},
None,
'ID=NC_008596.1:speB:unknown_transcript_1;Parent=NC_008596.1:speB;'
'locus_tag=MSMEG_1072;EC_number=3.5.3.11;note=identified by mat'
'ch to protein family HMM PF00491%3B match to prote'
'in family HMM TIGR01230;transl_table=11;product=agmatinase;p'
'rotein_id=YP_885468.1;db_xref=GI:118469242;db_xref=GeneID:4535378;'
'exon_number=1',
),
# wormbase_gff2_alt.txt
......
I 230218
II 813184
III 316620
IV 1531933
IX 439888
Mito 85779
V 576874
VI 270161
VII 1090940
VIII 562643
X 745751
XI 666816
XII 1078177
XIII 924431
XIV 784333
XV 1091291
XVI 948066
>chr2L
Cgacaatgcacgacagaggaagcagaacagatatttagattgcctctcat
tttctctcccatattatagggagaaatatgatcgcgtatgcgagagtagt
gccaacatattgtgctctttgattttttggcaacccaaaatggtggcgga
tgaaCGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCA
TTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTG
CCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCG
CAAACCCAAAAAgacaatacacgacagagagagagagcagcggagatatt
tagattgcctattaaatatgatcgcgtatgcgagagtagtgccaacatat
tgtgctctCTATATAATGACTGCCTCTCATTCTGTCTTATTTTACCGCAA
ACCCAAatcgacaatgcacgacagaggaagcagaacagatatttagattg
cctctcattttctctcccatattatagggagaaatatgatcgcgtatgcg
agagtagtgccaacatattgtgctctttgattttttggcaacccaaaatg
gtggcggatgaaCGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAA
TAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGA
GAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTA
TATTACCGCAAACCCAAAAAgacaatacacgacagagagagagagcagcg
gagatatttagattgcctattaaatatgatcgcgtatgcgagagtagtgc
caacatattgtgctctCTATATAATGACTGCCTCTCATTCTGTCTTATTT
TACCGCAAACCCAAatcgacaatgcacgacagaggaagcagaacagatat
ttagattgcctctcattttctctcccatattatagggagaaatatgatcg
cgtatgcgagagtagtgccaacatattgtgctctttgattttttggcaac
ccaaaatggtggcggatgaaCGAGATGATAATATATTCAAGTTGCCGCTA
ATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGC
GTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCT
CTGTCTTATATTACCGCAAACCCAAAAAgacaatacacgacagagagaga
gagcagcggagatatttagattgcctattaaatatgatcgcgtatgcgag
agtagtgccaacatattgtgctctCTATATAATGACTGCCTCTCATTCTG
TCTTATTTTACCGCAAACCCAAatcgacaatgcacgacagaggaagcaga
acagatatttagattgcctctcattttctctcccatattatagggagaaa
tatgatcgcgtatgcgagagtagtgccaacatattgtgctctttgatttt
ttggcaacccaaaatggtggcggatgaaCGAGATGATAATATATTCAAGT
TGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATAT
ATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCT
CTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAgacaatacacgaca
gagagagagagcagcggagatatttagattgcctattaaatatgatcgcg
tatgcgagagtagtgccaacatattgtgctctCTATATAATGACTGCCTC
TCATTCTGTCTTATTTTACCGCAAACCCAAatcgacaatgcacgacagag
gaagcagaacagatatttagattgcctctcattttctctcccatattata
gggagaaatatgatcgcgtatgcgagagtagtgccaacatattgtgctct
ttgattttttggcaacccaaaatggtggcggatgaaCGAGATGATAATAT
ATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAG
CACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAAT
GAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAgacaat
acacgacagagagagagagcagcggagatatttagattgcctattaaata
tgatcgcgtatgcgagagtagtgccaacatattgtgctctCTATATAATG
ACTGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAatcgacaatgca
cgacagaggaagcagaacagatatttagattgcctctcattttctctccc
atattatagggagaaatatgatcgcgtatgcgagagtagtgccaacatat
tgtgctctttgattttttggcaacccaaaatggtggcggatgaaCGAGAT
# Download large annotation files needed for testing
# to gffutils/test/data/ directory.
cd $(dirname $0)
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M8/gencode.vM8.annotation.gff3.gz
gzip -d gencode.vM8.annotation.gff3.gz
wget ftp://ftp.ensembl.org/pub/release-83/gff3/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.83.gff3.gz
gzip -d Saccharomyces_cerevisiae.R64-1-1.83.gff3.gz