These files have a line for each RefSeq genome listing all metagenomic SRA runs
(as of August 2018) with Mash Containment Scores above the specified threshold.
They are provided for two screen modes:
* ``nucl``: Genomic RefSeq sequences
* ``prot``: Proteomic RefSeq sequences (combined amino acid sequences per organism). **NOTE:** The protein tables above are not p-value filtered, so large runs (more than ~50 Gb) may have spurious hits. They also do not contain plasmids. Updates coming soon!
...and at two thresholds:
* ``95idy``: 95% Mash Containment Score, any coverage. Useful for finding runs containing a specific genome.
* ``80idy_3x``: 80% Mash Containment Score, at least 3x median k-mer multiplicity. Useful for finding related, but novel, sequences.
The files are tab-separated. Each line begins with a RefSeq assembly accession, followed by the SRA accessions of the matching runs, for example: