parseTaxonomy-functions.Rd 3.17 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/IO-methods.R
\title{Parse elements of a taxonomy vector}


\item{char.vec}{(Required). A single character vector of taxonomic
ranks for a single OTU, unprocessed (ugly).}
A character vector in which each element is a different
 taxonomic rank of the same OTU, and each element name is the name of
 the rank level. For example, an element might be \code{"Firmicutes"}
 and named \code{"phylum"}.
 These parsed, named versions of the taxonomic vector should
 reflect embedded information, naming conventions,
 desired length limits, etc; or in the case of \code{\link{parse_taxonomy_default}},
 not modified at all and given dummy rank names to each element.
These are provided as both example and default functions for
parsing a character vector of taxonomic rank information for a single taxa.
As default functions, these are intended for cases where the data adheres to
the naming convention used by greengenes
or where the convention is unknown, respectively.
To work, these functions -- and any similar custom function you may want to
create and use -- must take as input a single character vector of taxonomic
ranks for a single OTU, and return a \strong{named} character vector that has
been modified appropriately (according to known naming conventions,
desired length limits, etc.
The length (number of elements) of the output named vector does \strong{not}
need to be equal to the input, which is useful for the cases where the
source data files have extra meaningless elements that should probably be
removed, like the ubiquitous 
``Root'' element often found in greengenes/QIIME taxonomy labels.
In the case of \code{parse_taxonomy_default}, no naming convention is assumed and
so dummy rank names are added to the vector.  
More usefully if your taxonomy data is based on greengenes, the
\code{parse_taxonomy_greengenes} function clips the first 3 characters that 
identify the rank, and uses these to name the corresponding element according
to the appropriate taxonomic rank name used by greengenes
(e.g. \code{"p__"} at the beginning of an element means that element is 
the name of the phylum to which this OTU belongs).
Most importantly, the expectations for these functions described above
make them compatible to use during data import,
specifcally the \code{\link{import_biom}} function, but 
it is a flexible structure that will be implemented soon for all phyloseq
import functions that deal with taxonomy (e.g. \code{\link{import_qiime}}).
 taxvec1 = c("Root", "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Staphylococcaceae")
 taxvec2 = c("Root;k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae")