repo: 9abeda989914318a9bdd9de1f76edb42c4b84a8b
node: 2b20f3dbd996d3db969b64b744e0f0107f33876c
branch: default
tag: 2019.9.13a
@@ -18,3 +18,12 @@ draw_results
*.coverage
*egg-info*
*.wpu
.cache*
*taskpaper
*.ipynb_checkpoints*
*.sublime*
*.patch
*.pytest_cache
*.tox
*.vscode
*.code-workspace
\ No newline at end of file
4ee8471d007032a28f2d75edf63504fcdafb5eaa 3.0a1
72e162442356b3090e0e884d0461c1ddb16f6c5c remove
4ee8471d007032a28f2d75edf63504fcdafb5eaa 3.0a1
72e162442356b3090e0e884d0461c1ddb16f6c5c 3.0a1
72e162442356b3090e0e884d0461c1ddb16f6c5c 3.0a1
d8e9473a0b248c3a0875fd71053114412ab290ed 3.0a1
e2e8731e8ea3dd6b483f1d9b60268cd7762a7e48 3.0a2
e2e8731e8ea3dd6b483f1d9b60268cd7762a7e48 3.0a2
01be08e75ae196debb552110547315f751d2bd64 3.0a2
36da26dc56e4f400ac62c43bed0813af9d08e204 2019.08.06a
36da26dc56e4f400ac62c43bed0813af9d08e204 2019.08.06a
91686dc745555aa467b93d3f95e2a313e9d6e69c 2019.08.06a
91686dc745555aa467b93d3f95e2a313e9d6e69c 2019.08.06a
833f75a211c160c76267a98d094e32b571b284ec 2019.08.06a
833f75a211c160c76267a98d094e32b571b284ec 2019.08.06a
aa0e8057ad0f0a225eaf715773053481eb7f7184 2019.08.06a
aa0e8057ad0f0a225eaf715773053481eb7f7184 2019.08.06a
a79911d979e8e52b74e84922171305e1686a28f5 2019.08.06a
a79911d979e8e52b74e84922171305e1686a28f5 2019.08.06a
7828e9e27b0b79473eb34c821d68590ca2d909d6 2019.08.06a
7828e9e27b0b79473eb34c821d68590ca2d909d6 2019.08.06a
d24d24a416d58e406722903179c5788b8dfc5fbb 2019.08.06a
d24d24a416d58e406722903179c5788b8dfc5fbb 2019.08.06a
124948744635e6612d35d369cbff385a6e859781 2019.08.06a
124948744635e6612d35d369cbff385a6e859781 2019.08.06a
ddff5cd51d0b0e91dddc99cf38be5c5a5e580cca 2019.08.06a
39a382d7836bda436d9970e54b84242dca88f7a2 2019.8.20a
27890ba31bb55df37a4176dbd5fdbacbec4774cc 2019.8.23a
e24728110ae2894b52bd7f6f2b8ec02411deb0ae 2019.8.28a
e24728110ae2894b52bd7f6f2b8ec02411deb0ae 2019.8.28a
5ee7ba84b3a2c606b800313a99b5bf602679ce84 2019.8.28a
5b08dd8d5184773611bf265a13d20f427335a9ee 2019.8.30a
Copyright 2019 Gavin Huttley
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
include MANIFEST.in LICENSE
recursive-include src *.c
recursive-include src *.pyx *.c
The Readme
==========
:Download: `From github <https://github.com/pycogent/pycogent>`_ or follow the :ref:`quick-install` instructions.
Dependencies
------------
The toolkit requires Python 2.5.1 or greater, and Numpy 1.3 or greater. Aside from these, the dependencies below are optional and the code will work as is. A C compiler, however, will allow the external C modules responsible for the likelihood and matrix exponentiation calculations to be compiled, resulting in significantly improved performance.
.. _required:
Required
^^^^^^^^
- Python_: the language the toolkit is primarily written in, and in which the user writes control scripts.
- Numpy_: This is a python module used for speeding up matrix computations. It is available as source code for \*nix.
- zlib_: This is a compression library which is available for all platforms and comes pre-installed on most too. If, by chance, your platform doesn't have this installed then download the source from the zlib_ site and follow the install instructions, or refer to the instructions for `compiling matplotlib`_.
.. note:: On some linux platforms (like Ubuntu), you must specifically install a ``python-dev`` package so that the Python_ header files required for building some external dependencies are available.
Optional
^^^^^^^^
- C compiler: This is standard on most \*nix platforms. On Mac OS X this is available for free in the Developer tools which, if you don't already have them, can be obtained from Apple_.
- Matplotlib_: used to plot several kinds of graphs related to codon usage. For installation, see these instructions for `compiling matplotlib`_.
- Cython_: This module is only necessary if you are a developer who wants to modify the \*.pyx files.
- mpi4py_: Message Passing Interface (MPI) bindings, required for parallel computation.
- SQLAlchemy_ and a mysql connector such as `PyMySQL <https://github.com/PyMySQL/PyMySQL>`_: These are required for the Ensembl querying code.
See the :ref:`quick-install` approach for how these can be installed using the ``all`` option.
Installation
------------
See the :ref:`quick-install`.
Testing
-------
``PyCogent/tests`` contains all the tests. You can most readily run the tests using the ``PyCogent/run_tests`` shell script. This is done by typing:
.. code-block:: guess
$ sh run_tests
which will automatically build extensions in place, set up the PYTHONPATH and run ``PyCogent/tests/alltests.py``. Note that if certain optional applications are not installed this will be indicated in the output as "can't find" or "not installed". A "`.`" will be printed to screen for each test and if they all pass, you'll see output like:
.. code-block:: guess
Ran 3299 tests in 58.128s
OK
Tips for usage
--------------
A good IDE can greatly simplify writing control scripts. Features such as code completion and definition look-up are extremely useful. For a complete list of `editors go here`_.
To get help on attributes of an object in python, use
.. code-block:: python
>>> dir(myalign)
to list the attributes of ``myalign`` or
.. code-block:: python
>>> help(myalign.writeToFile)
to figure out how to use the ``myalign.writeToFile`` method. Also note that the directory structure of the package mirrors the import statements required to use a module -- to see the contents of ``alignment.py`` or ``sequence.py`` you look in the ``cogent/core`` directory; to use the classes in those files you specify ``cogent.core`` when importing.
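For example, to use the ``Alignment`` class defined in ``cogent/core/alignment.py``:

.. code-block:: python

    >>> from cogent.core.alignment import Alignment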
.. _Python: http://www.python.org
.. _Cython: http://www.cython.org/
.. _Numpy: http://numpy.scipy.org/
.. _Matplotlib: http://matplotlib.sourceforge.net
.. _Apple: http://www.apple.com
.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
.. _`editors go here`: http://www.python.org/cgi-bin/moinmoin/PythonEditors
.. _mpi4py: http://code.google.com/p/mpi4py
.. _`restructured text`: http://docutils.sourceforge.net/rst.html
.. _gcc: http://gcc.gnu.org/
.. _SQLAlchemy: http://www.sqlalchemy.org
.. _`mysql-connector-python`: https://pypi.python.org/pypi/mysql-connector-python
.. _zlib: http://www.zlib.net/
.. _`compiling matplotlib`: http://sourceforge.net/projects/pycogent/forums/forum/651121/topic/5635916
PyCogent: A toolkit for making sense from sequence
==================================================
# NOTICE
The official PyCogent source code repository. For details on PyCogent, see http://www.pycogent.org/.
We will be departing Bitbucket for GitHub soon due to their recent decision to drop support for Mercurial. Stay tuned for details.
PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for 3rd party applications.
## PyCogent3 → Cogent3
For PyCogent news and announcements, including notification of new releases, you can subscribe to the [PyCogent News and Announcements Blog](http://pycogent.wordpress.com).
This is the port of PyCogent to Python 3. It is being renamed to Cogent3 because it has been rewritten extensively and `cogent` has always been the import name (dating back to 2004!).
## What's New?
A massive amount!
We integrated many of the new developments on modelling of non-stationary processes published by the Huttley lab over the last decade.
We have implemented a `cogent3.app` module with substantially new capabilities, notably a functional programming style interface that lowers the effort required to use cogent3's advanced capabilities. These are not yet documented.
Importantly, the app interface should be considered alpha level code. Documentation for these new features will follow in the next few weeks.
## What's Changed?
There are massive changes to the library since the PyCogent version 1.9 release (which is only compatible with python 2).
Please check the wiki for a list of the major API changes, but note that as of August 23, 2019 the wiki is very much out of date.
We will update it as soon as possible.
## Documentation
PyCogent3 documentation is on [readthedocs](https://cogent3.readthedocs.io/en/latest/).
\ No newline at end of file
image: python:3.7.1
pipelines:
default:
- step:
caches:
- pip
script: # Modify the commands below to build your repository.
- pip install numpy
- pip install pandas
- pip install tqdm
- pip install plotly
- pip install scitrack
- pip install tinydb
- pip install pytest
- pip install tox
- tox -e py37
"""The most commonly used constructors are available from this toplevel module.
The rest are in the subpackages: phylo, evolve, maths, draw, parse and format."""
import sys, os, re, cPickle
import numpy
__author__ = ""
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Peter Maxwell",
"Jeremy Widmann", "Catherine Lozupone", "Matthew Wakefield",
"Edward Lang", "Greg Caporaso", "Mike Robeson",
"Micah Hamady", "Sandra Smit", "Zongzhi Liu",
"Andrew Butterfield", "Amanda Birmingham", "Brett Easton",
"Hua Ying", "Jason Carnes", "Raymond Sammut",
"Helen Lindsay", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
#SUPPORT2425
if sys.version_info < (2, 6):
py_version = ".".join([str(n) for n in sys.version_info])
raise RuntimeError("Python-2.6 or greater is required, Python-%s used." % py_version)
numpy_version = re.split("[^\d]", numpy.__version__)
numpy_version_info = tuple([int(i) for i in numpy_version if i.isdigit()])
if numpy_version_info < (1, 3):
raise RuntimeError("Numpy-1.3 is required, %s found." % numpy_version)
version = __version__
version_info = tuple([int(v) for v in version.split(".") if v.isdigit()])
from cogent.util.table import Table as _Table
from cogent.parse.table import load_delimited, autogen_reader
from cogent.core.tree import TreeBuilder, TreeError
from cogent.parse.tree_xml import parse_string as tree_xml_parse_string
from cogent.parse.newick import parse_string as newick_parse_string
from cogent.core.alignment import SequenceCollection
from cogent.core.alignment import Alignment
from cogent.parse.sequence import FromFilenameParser
from cogent.parse.structure import FromFilenameStructureParser
#note that moltype has to be imported last, because it sets the moltype in
#the objects created by the other modules.
from cogent.core.moltype import ASCII, DNA, RNA, PROTEIN, STANDARD_CODON, \
CodonAlphabet
def Sequence(moltype=None, seq=None, name=None, filename=None, format=None):
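    """Returns a Sequence built from a string, or the first record in a file.

    Raises ValueError if the file contains more than one sequence."""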
if seq is None:
for (a_name, a_seq) in FromFilenameParser(filename, format):
if seq is None:
seq = a_seq
if name is None:
name = a_name
else:
raise ValueError("Multiple sequences in '%s'" % filename)
if moltype is not None:
seq = moltype.makeSequence(seq)
elif not hasattr(seq, 'MolType'):
seq = ASCII.makeSequence(seq)
if name is not None:
seq.Name = name
return seq
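# A minimal usage sketch for Sequence(); the values are hypothetical and a
# string is supplied directly instead of a filename:
#     seq = Sequence(moltype=DNA, seq="ACGGTT", name="demo")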
def LoadSeqs(filename=None, format=None, data=None, moltype=None,
name=None, aligned=True, label_to_name=None, parser_kw={},
constructor_kw={}, **kw):
"""Initialize an alignment or collection of sequences.
Arguments:
- filename: name of the sequence file
- format: format of the sequence file
- data: optional explicit provision of sequences
- moltype: the MolType, eg DNA, PROTEIN
- aligned: set True if sequences are already aligned and have the same
length, results in an Alignment object. If False, a SequenceCollection
instance is returned instead. If callable, will use as a constructor
(e.g. can pass in DenseAlignment or CodonAlignment).
- label_to_name: function for converting original name into another
name. Default behavior is to preserve the original FASTA label and
comment.
To remove all FASTA label comments, and pass in only the label, pass in:
label_to_name=lambda x: x.split()[0]
To look up names in a dict, pass in:
label_to_name = lambda x: d.get(x, default_name)
...where d is a dict that's in scope, and default_name is what you want
to assign any sequence that isn't in the dict.
If format is None, will attempt to infer format from the filename
suffix. If label_to_name is None, will attempt to infer correct
conversion from the format.
"""
if filename is None:
assert data is not None
assert format is None
assert not kw, kw
else:
assert data is None, (filename, data)
data = list(FromFilenameParser(filename, format, **parser_kw))
# the following is a temp hack until we have the load API sorted out.
if aligned: #if callable, call it -- expect either f(data) or bool
if hasattr(aligned, '__call__'):
return aligned(data=data, MolType=moltype, Name=name,
label_to_name=label_to_name, **constructor_kw)
else: #was not callable, but wasn't False
return Alignment(data=data, MolType=moltype, Name=name,
label_to_name=label_to_name, **constructor_kw)
else: #generic case: return SequenceCollection
return SequenceCollection(data, MolType=moltype, Name=name,
label_to_name=label_to_name, **constructor_kw)
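# A minimal usage sketch for LoadSeqs ("seqs.fasta" is a hypothetical file);
# the label_to_name example comes from the docstring above:
#     aln = LoadSeqs("seqs.fasta", moltype=DNA)
#     coll = LoadSeqs("seqs.fasta", moltype=DNA, aligned=False)
#     aln = LoadSeqs("seqs.fasta", label_to_name=lambda x: x.split()[0])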
def LoadStructure(filename, format=None, parser_kw={}):
"""Initialize a Structure from data contained in filename.
Arguments:
- filename: name of the filename to create structure from.
- format: the optional file format extension.
- parser_kw: optional keyword arguments for the parser."""
# currently there is no support for string-input
assert filename is not None, 'No filename given.'
return FromFilenameStructureParser(filename, format, **parser_kw)
def LoadTable(filename=None, sep=',', reader=None, header=None, rows=None,
row_order=None, digits=4, space=4, title='', missing_data='',
max_width = 1e100, row_ids=False, legend='', column_templates=None,
dtype=None, static_column_types=False, limit=None, **kwargs):
"""
Arguments:
- filename: path to file containing a pickled table
- sep: the delimiting character between columns
- reader: a parser for reading filename. This approach assumes the first
row returned by the reader will be the header row.
- static_column_types: if True, and reader is None, identifies columns
with a numeric data type (int, float) from the first non-header row.
This assumes all subsequent entries in that column are of the same type.
Default is False.
- header: column headings
- rows: a 2D dict, list or tuple. If a dict, it must have column
headings as top level keys, and common row labels as keys in each
column.
    - row_order: the order in which rows will be pulled from the 2D dict
- digits: floating point resolution
- space: number of spaces between columns or a string
- title: as implied
- missing_data: character assigned if a row has no entry for a column
- max_width: maximum column width for printing
- row_ids: if True, the 0'th column is used as row identifiers and keys
for slicing.
- legend: table legend
- column_templates: dict of column headings: string format templates
or a function that will handle the formatting.
- dtype: optional numpy array typecode.
    - limit: exits after this many lines. Only applied for non-pickled data
      file types.
"""
#
if filename is not None and not (reader or static_column_types):
if filename[filename.rfind(".")+1:] == 'pickle':
f = file(filename, 'U')
loaded_table = cPickle.load(f)
f.close()
return _Table(**loaded_table)
sep = sep or kwargs.pop('delimiter', None)
header, rows, loaded_title, legend = load_delimited(filename,
delimiter = sep, limit=limit, **kwargs)
title = title or loaded_title
elif filename and (reader or static_column_types):
f = file(filename, "r")
if not reader:
reader = autogen_reader(f, sep, limit=limit,
with_title=kwargs.get('with_title', False))
rows = [row for row in reader(f)]
f.close()
header = rows.pop(0)
table = _Table(header=header, rows=rows, digits=digits, row_order=row_order,
title=title,
dtype=dtype, column_templates=column_templates, space=space,
missing_data=missing_data, max_width=max_width, row_ids=row_ids,
legend=legend)
return table
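# A minimal usage sketch for LoadTable ("stats.csv" is a hypothetical
# comma-delimited file whose first row holds the column headings):
#     table = LoadTable("stats.csv", sep=",", digits=2, title="stats")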
def LoadTree(filename=None, treestring=None, tip_names=None, format=None, \
underscore_unmunge=False):
"""Constructor for tree.
Arguments, use only one of:
- filename: a file containing a newick or xml formatted tree.
- treestring: a newick or xml formatted tree string.
- tip_names: a list of tip names.
Note: underscore_unmunging is turned off by default, although it is part
of the Newick format. Set underscore_unmunge to True to replace underscores
with spaces in all names read.
"""
if filename:
assert not (treestring or tip_names)
treestring = open(filename).read()
if format is None and filename.endswith('.xml'):
format = "xml"
if treestring:
assert not tip_names
if format is None and treestring.startswith('<'):
format = "xml"
if format == "xml":
parser = tree_xml_parse_string
else:
parser = newick_parse_string
tree_builder = TreeBuilder().createEdge
#FIXME: More general strategy for underscore_unmunge
if parser is newick_parse_string:
tree = parser(treestring, tree_builder, \
underscore_unmunge=underscore_unmunge)
else:
tree = parser(treestring, tree_builder)
if not tree.NameLoaded:
tree.Name = 'root'
elif tip_names:
tree_builder = TreeBuilder().createEdge
tips = [tree_builder([], tip_name, {}) for tip_name in tip_names]
tree = tree_builder(tips, 'root', {})
else:
raise TreeError, 'filename or treestring not specified'
return tree
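# A minimal usage sketch for LoadTree; per the docstring, use only one of
# filename, treestring or tip_names:
#     tree = LoadTree(treestring="((a,b),(c,d));")
#     star = LoadTree(tip_names=["a", "b", "c"])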
#!/usr/bin/env python
"""Code for performing alignments by Needleman-Wunsch and Smith-Waterman.
"""
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Rob Knight", "Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
class ScoreCell(object):
"""Cell in a ScoreMatrix object. Contains score and pointer."""
__slots__ = ['Score', 'Pointer']
def __init__(self, Score=0, Pointer=None):
"""Returns new ScoreCell object.
Score should be a number.
Pointer should be a direction (typically None, "up", "diag", or "left")
"""
self.Score = Score
self.Pointer = Pointer
def update(self, diag, up, left):
"""Updates the ScoreCell object with scores for its three neighbors.
Finds max of (diag, up, and left), and sets the score and pointer
accordingly. In case of tie, picks up (i.e. tries to minimize
gaps since the first sequence is the shortest), then diag, then left.
"""
best = max(diag, up, left)
if best == up:
self.Score = up
self.Pointer = "up"
elif best == diag:
self.Score = diag
self.Pointer = "diag"
else:
self.Score = left
self.Pointer = "left"
def MatchScorer(match, mismatch):
"""Factory function that returns a score function set to match and mismatch.
match and mismatch should both be numbers. Typically, match should be
positive and mismatch should be negative.
Resulting function has signature f(x,y) -> number.
"""
def scorer(x, y):
if x == y:
return match
else:
return mismatch
return scorer
equality_scorer = MatchScorer(1, -1)
default_gap = -1
default_gap_symbol = '-'
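# A small usage sketch for MatchScorer: a +2/-3 scorer, analogous to
# equality_scorer above.
#     scorer = MatchScorer(2, -3)
#     scorer("A", "A")    # -> 2
#     scorer("A", "G")    # -> -3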
class ScoreMatrix(list):
"""Matrix that contains (score, pointer) pairs for sequence alignment."""
def __init__(self, First, Second, Score=equality_scorer, \
GapScore=default_gap, GapSymbol=default_gap_symbol):
"""Returns new ScoreMatrix object, initialized but not filled.
Calls internal methods to initialize first row, column and cell.
First and Second are the two sequences that will be aligned. Columns
correspond to positions in First; rows correspond to positions in
Second.
Score is a scoring function that takes two corresponding items in
First and Second, and returns a number that can be compared and added.
        GapScore is the score for inserting a gap.
Note that the matrix is one row and one column bigger than the lengths
of the sequences, since the first row and first column contain
initialization data.
"""
self.First = First #first sequence
self.Second = Second #second sequence
self.Cols = len(First) + 1 #need to add 1 for the start col
self.Rows = len(Second) + 1 #need to add 1 for the start row
self.Score = Score #scoring function: f(x,y) = num
self.GapScore = GapScore #gap penalty
self.GapSymbol = GapSymbol #symbol for gaps
self.Filled = False #whether the matrix has been filled
self.FirstAlign = [] #first sequence, aligned, as list
self.SecondAlign = [] #second sequence, aligned, as list
for r in range(self.Rows):
self.append([ScoreCell() for c in range(self.Cols)])
self._init_first_row()
self._init_first_col()
self._init_first_cell()
#print "INITIALIZED WITH:"
#print self
def __str__(self):
"""Prints score matrix, including labels for row and column."""
first = self.First
second = self.Second
if not first or not second:
return "Empty Score Matrix"
#add first sequence as first row: skip 2 positions for row label and
#for the first column, which is initialized to default values
rows = ['\t\t' + '\t'.join(self.First)]
for index, row in enumerate(self):
#if it's not the first row, add the appropriate char from the seq
if index:
curr = second[index-1]
else:
curr = ''
#add the row containing the char and the data
rows.append('%s\t'%curr + '\t'.join([str(i.Score) for i in row]))
return '\n'.join(rows)
def _init_first_row(self):
"""Hook for matrices that need to initialize the first row."""
pass
def _init_first_col(self):
"""Hook for matrices that need to initialize the first column."""
pass
def _init_first_cell(self):
"""Hook for matrices that need to initialize the first cell."""
pass
def alignment(self):
"""Returns alignment of first and second sequences.
Calls fill() and traceback() if necessary.
Converts the aligned versions to the same class as the originals.
"""
seq1, seq2 = self.First, self.Second
aln1, aln2 = self.FirstAlign, self.SecondAlign
if not aln1 or not aln2:
self.fill()
self.traceback()
aln1, aln2 = self.FirstAlign, self.SecondAlign
#Remember to return sequences that are the correct class. If the class
#subclasses str, you probably want ''.join rather than str to feed into
#the constructor, since str() on a list prints the brackets and commas.
if isinstance(seq1, str):
first_result = seq1.__class__(''.join(aln1))
else:
first_result = seq1.__class__(aln1)
if isinstance(seq2, str):
second_result = seq2.__class__(''.join(aln2))
else:
second_result = seq2.__class__(aln2)
return first_result, second_result
class NeedlemanWunschMatrix(ScoreMatrix):
"""Score matrix for global alignment using Needleman-Wunsch."""
def _init_first_row(self):
"""First row initialized with index * gap penalty."""
gap = self.GapScore
self[0] = [ScoreCell(gap * i, 'left') for i in range(self.Cols)]
#print "AFTER FIRST ROW:"
#print self
def _init_first_col(self):
"""First column initialized with index * gap penalty."""
gap = self.GapScore
for index, row in enumerate(self):
row[0] = ScoreCell(gap * index, 'up')
#print "AFTER FIRST COL:"
#print self
def _init_first_cell(self):
"""First cell initialized to 0 with no pointer."""
self[0][0] = ScoreCell(0)
#print "AFTER FIRST CELL:"
#print self
def fill(self):
"""Fills each cell with its best score and the direction of the next.
For moving up or left, calculates the gap penalty plus the score of the
cell. For moving diagonally, calculates the match/mismatch score for
the corresponding positions in the sequence plus the score of the cell.
Performs all calculations in place.
"""
score = self.Score
gap = self.GapScore
seq_1, seq_2 = self.First, self.Second
curr_row = self[0]
for row_i in range(1, self.Rows):
prev_row = curr_row
curr_row = self[row_i]
curr_cell = curr_row[0]
for col_i in range(1, self.Cols):
prev_cell = curr_cell
curr_cell = curr_row[col_i]
#remember to subtract 1 to find corresponding pos in seq
diag_score = score(seq_1[col_i-1], seq_2[row_i-1]) + \
prev_row[col_i-1].Score
up_score = gap + prev_row[col_i].Score
left_score = gap + prev_cell.Score
#print 'row: %s col: %s '%(row_i,col_i), \
# diag_score,up_score,left_score
curr_cell.update(diag_score, up_score, left_score)
self.Filled = True
self.MaxScore = (self[-1][-1].Score, len(self)-1, len(self[0])-1)
#print "FINISHED FILL"
#print self
def traceback(self):
"""Returns the optimal alignment as a (first, second) tuple w/ gaps.
Follows the pointers assigned in fill(), inserting gaps whenever the
movement is not diagonal.
"""
if not self.Filled:
self.fill()
align_1 = []
align_2 = []
seq_1 = self.First
seq_2 = self.Second
curr_row = self.Rows - 1 #subtract 1 to use length as index
curr_col = self.Cols - 1 #subtract 1 to use length as index
p = self[curr_row][curr_col].Pointer
while 1:
#print "ROW: %s, COL: %s" % (curr_row, curr_col),p
if p == 'diag':
align_1.append(seq_1[curr_col-1])
align_2.append(seq_2[curr_row-1])
curr_row -= 1
curr_col -= 1
elif p == 'left':
align_1.append(seq_1[curr_col-1])
align_2.append('-')
curr_col -= 1
elif p == 'up':
align_1.append('-')
align_2.append(seq_2[curr_row-1])
curr_row -= 1
else:
break
p = self[curr_row][curr_col].Pointer
align_1.reverse()
align_2.reverse()
self.FirstAlign, self.SecondAlign = align_1, align_2
class SmithWatermanMatrix(ScoreMatrix):
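    """Score matrix for local alignment using Smith-Waterman."""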
def fill(self):
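        """Fills each cell with its best score and pointer, leaving cells at
        their initial value of 0 when no candidate score is positive."""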
max_row, max_col, max_score = 0, 0, 0
score = self.Score
gap = self.GapScore
seq_1, seq_2 = self.First, self.Second
curr_row = self[0]
for row_i in range(1, self.Rows):
prev_row = curr_row
curr_row = self[row_i]
curr_cell = curr_row[0]
for col_i in range(1, self.Cols):
prev_cell = curr_cell
curr_cell = curr_row[col_i]
#remember to subtract 1 to find corresponding pos in seq
diag_score = score(seq_1[col_i-1], seq_2[row_i-1]) + \
prev_row[col_i-1].Score
up_score = gap + prev_row[col_i].Score
left_score = gap + prev_cell.Score
if max(up_score, left_score, diag_score) <= 0:
continue #leave uninitialized if scores all below threshold
curr_cell.update(diag_score, up_score, left_score)
if curr_cell.Score > max_score:
max_score = curr_cell.Score
max_row = row_i
max_col = col_i
self.MaxScore = (max_score, max_row, max_col)
#print "FINISHED FILL"
#print self
def traceback(self):
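        """Follows pointers back from the highest-scoring cell, adding gaps."""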
align_1 = []
align_2 = []
seq_1 = self.First
seq_2 = self.Second
max_score, curr_row, curr_col = self.MaxScore
p = self[curr_row][curr_col].Pointer
while 1:
#print "ROW: %s, COL: %s" % (curr_row, curr_col)
if p == 'diag':
align_1.append(seq_1[curr_col-1])
align_2.append(seq_2[curr_row-1])
curr_row -= 1
curr_col -= 1
elif p == 'left':
align_1.append(seq_1[curr_col-1])
align_2.append('-')
curr_col -= 1
elif p == 'up':
align_1.append('-')
align_2.append(seq_2[curr_row-1])
curr_row -= 1
else:
break
p = self[curr_row][curr_col].Pointer
align_1.reverse()
align_2.reverse()
self.FirstAlign, self.SecondAlign = align_1, align_2
def nw_align(seq1, seq2, scorer=equality_scorer, gap=default_gap, return_score=False):
"""Returns globally optimal alignment of seq1 and seq2."""
N = NeedlemanWunschMatrix(seq1, seq2, scorer, gap)
if return_score:
return N.alignment(), N.MaxScore[0]
else:
return N.alignment()
def sw_align(seq1, seq2, scorer=equality_scorer, gap=default_gap, return_score=False):
"""Returns locally optimal alignment of seq1 and seq2."""
S = SmithWatermanMatrix(seq1, seq2, scorer, gap)
if return_score:
return S.alignment(), S.MaxScore[0]
else:
return S.alignment()
def demo(seq1, seq2):
result = []
result.append("Global alignment:")
result.extend(nw_align(seq1, seq2))
result.append("Local alignment:")
result.extend(sw_align(seq1, seq2))
return "\n".join(result)
#initialization
if __name__ == '__main__':
from sys import argv
print demo(argv[1], argv[2])
#!/usr/bin/env python
"""
Implements sequence weighting using several standard algorithms.
"""
__all__ = ['util','methods']
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
"""Provides an implementation of several sequence weighting algorithm"""
from __future__ import division
from random import choice
from numpy.random.mtrand import exponential
from numpy import array, float64 as Float64, dot as matrixmultiply, transpose,\
ones, zeros
from numpy.linalg import inv as inverse
from cogent.core.profile import Profile
from cogent.core.alignment import Alignment, DenseAlignment
from cogent.parse.tree import DndParser
from cogent.util.array import hamming_distance
from cogent.align.weights.util import Weights, number_of_pseudo_seqs,\
pseudo_seqs_exact, pseudo_seqs_monte_carlo, row_to_vote, distance_matrix,\
eigenvector_for_largest_eigenvalue, DNA_ORDER,RNA_ORDER,PROTEIN_ORDER,\
SeqToProfile
from cogent.core.moltype import BYTES
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Development"
def VA(alignment, distance_method=hamming_distance):
"""Returns Weight object with seq weights according to the VA method.
alignment: Alignment object
The VA method (Vingron & Argos 1989) calculates the Hamming distance
between all sequences in the alignment. The weight assigned to a sequence
is the sum of the distances of all other sequences in the alignment to that
sequence, divided by the sum of all pairwise distances.
Example:
ABBA ABCA CBCB
ABBA 0 1 3
ABCA 1 0 2
    CBCB 3 2 0
----------------------------
total 4 3 5 (=12)
normal. 0.333 0.25 0.417
so: weight(ABBA) = 0.333, weight(ABCA)=0.25, etc.
"""
distances = distance_matrix(alignment, distance_method)
sum_dist = sum(distances)
#total weights are the normalized sum of distances (sum over each column,
    # divided by the total distance in the matrix)
weights = sum_dist/sum(sum_dist)
#create a dictionary of {seq_id: weight}
weight_dict = Weights(dict(zip(alignment.Names,weights)))
return weight_dict
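# A hedged usage sketch of VA (aln is any Alignment object, e.g. those built
# in the __main__ block at the bottom of this file):
#     weights = VA(aln)    # Weights dict of {seq_id: weight}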
def VOR(alignment,n=1000,force_monte_carlo=False,mc_threshold=1000):
"""Returns sequence weights according to the Voronoi weighting method.
alignment: Alignment object
n: sampling size (in case monte carlo is used)
force_monte_carlo: generate pseudo seqs with monte carlo always (even
    if there's only a small number of possible unique pseudo seqs)
    mc_threshold: threshold for when to use the monte carlo sampling method;
    if the number of possible pseudo seqs exceeds this threshold, monte
    carlo is used.
VOR differs from VA in the set of sequences against which it's comparing
all the sequences in the alignment. In addition to the sequences in the
alignment itself, it uses a set of pseudo sequences.
Generating discrete random sequences:
A discrete random sequence is generated by choosing with equal
likelihood at each position one of the residues observed at that position
    in the alignment. A single occurrence in the alignment column is
sufficient to make the residue type an option. Note: you're choosing
with equal likelihood from each of the observed residues (independent
of their frequency at that position). In earlier versions of the algorithm
the characters were chosen either at the frequency with which they occur
at a position or at the frequency with which they occur in the database.
    Both trials were unsuccessful because they deviate from random sampling
(see Sibbald & Argos 1990).
Depending on the number of possible pseudo sequences, all of them are
used or a random sample is taken (monte carlo).
Example:
Alignment: AA, AA, BB
AA AA BB
AA 0 (.5) 0 (.5) 2
AB 1 (1/3) 1 (1/3) 1 (1/3)
BA 1 (1/3) 1 (1/3) 1 (1/3)
BB 2 2 0 (1)
-----------------------------
total 7/6 7/6 10/6
norm .291 .291 .418
    For a bigger example with more pseudo sequences, see Henikoff 1994.
    I tried the described optimization (pre-calculating the distance to the
    closest sequence). It doesn't have an advantage over the original method.
"""
MC_THRESHOLD = mc_threshold
#decide on sampling method
if force_monte_carlo or number_of_pseudo_seqs(alignment) > MC_THRESHOLD:
sampling_method = pseudo_seqs_monte_carlo
else:
sampling_method = pseudo_seqs_exact
#change sequences into arrays
aln_array = DenseAlignment(alignment, MolType=BYTES)
weights = zeros(len(aln_array.Names),Float64)
#calc distances for each pseudo seq
rows = [array(seq, 'c') for seq in map(str, aln_array.Seqs)]
for seq in sampling_method(aln_array,n=n):
seq = array(seq, 'c')
temp = [hamming_distance(row, seq) for row in rows]
votes = row_to_vote(array(temp)) #change distances to votes
weights += votes #add to previous weights
weight_dict = Weights(dict(zip(aln_array.Names,weights)))
weight_dict.normalize() #normalize
return weight_dict
def mVOR(alignment,n=1000,order=DNA_ORDER):
"""Returns sequence weights according to the modified Voronoi method.
alignment: Alignment object
n: sample size (=number of random profiles to be generated)
order: specifies the order of the characters found in the alignment,
used to build the sequence and random profiles.
mVOR is a modification of the VOR method. Instead of generating discrete
random sequences, it generates random profiles, to sample more equally from
    the sequence space and to prevent random sequences from being equidistant to
multiple sequences in the alignment.
See the Implementation notes to see how the random profiles are generated
and compared to the 'sequence profiles' from the alignment.
Random generalized sequences (or a profile filled with random numbers):
Sequences that are equidistant to multiple sequences in the alignment
can form a problem in small datasets. For longer sequences the likelihood
    of this event is negligible. Generating 'random generalized sequences' is
a solution, because we're then sampling from continuous sequence space.
Each column of a random profile is generated by normalizing a set of
independent, exponentially distributed random numbers. In other words, a
random profile is a two-dimensional array (rows are chars in the alphabet,
    columns are positions in the alignment) filled with random numbers,
sampled from the standard exponential distribution (lambda=1, and thus
the mean=1), where each column is normalized to one. These random profiles
are compared to the special profiles of just one sequence (ones for the
single character observed at that position). The distance between the
two profiles is simply the Euclidean distance.
"""
weights = zeros(len(alignment.Names),Float64)
#get seq profiles
seq_profiles = {}
for k,v in alignment.items():
#seq_profiles[k] = ProfileFromSeq(v,order=order)
seq_profiles[k] = SeqToProfile(v,alphabet=order)
for count in range(n):
#generate a random profile
exp = exponential(1,[alignment.SeqLen,len(order)])
r = Profile(Data=exp,Alphabet=order)
r.normalizePositions()
#append the distance between the random profile and the sequence
#profile to temp
temp = [seq_profiles[key].distance(r) for key in alignment.Names]
votes = row_to_vote(array(temp))
weights += votes
weight_dict = Weights(dict(zip(alignment.Names,weights)))
weight_dict.normalize()
return weight_dict
def pos_char_weights(alignment, order=DNA_ORDER):
"""Returns the contribution of each character at each position.
    alignment: Alignment object
order: the order of characters in the profile (all observed chars
    in the alignment)
This function is used by the function position_based
For example:
GYVGS
GFDGF
GYDGF
GYQGG
0 1 2 3 4 5
G 1/1*4 1/1*4 1/3*1
Y 1/2*3
F 1/2*1 1/3*2
V 1/3*1
D 1/3*2
Q 1/3*1
S 1/3*1
"""
counts = alignment.columnFreqs()
a = zeros([len(order), alignment.SeqLen],Float64)
for col, c in enumerate(counts):
for char in c:
a[order.index(char),col] = 1/(len(c)*c[char])
return Profile(a,Alphabet=order)
def PB(alignment, order=DNA_ORDER):
"""Returns sequence weights based on the diversity at each position.
The position-based (PB) sequence weighting method is described in Henikoff
1994. The idea is that sequences are weighted by the diversity observed
at each position in the alignment rather than on the diversity measured
for whole sequences.
A simple method to represent the diversity at a position is to award
each different residue an equal share of the weight, and then to divide
up that weight equally among the sequences sharing the same residue.
So if in a position of a MSA, r different residues are represented,
a residue represented in only one sequence contributes a score of 1/r to
that sequence, whereas a residue represented in s sequences contributes
a score of 1/rs to each of the s sequences. For each sequence, the
    contributions from each position are summed to give a sequence's weight.
See Henikoff 1994 for a good example.
"""
#calculate the contribution of each character at each position
pos_weights = pos_char_weights(alignment, order)
d = pos_weights.Data
result = Weights()
for key,seq in alignment.items():
weight = 0
for idx, char in enumerate(seq):
weight += d[order.index(char),idx]
result[key] = weight
result.normalize()
return result
def SS(alignment):
"""Returns dict of {seq_id: weight} for sequences in the alignment
alignment: Alignment object
The SS sequence weighting method is named after Sander and Schneider,
who published their method in 1991.
Their method starts with the same distance matrix as in the VA method,
where distances are the pairwise Hamming distances between the sequences
in the alignment. Where the VA method uses the normalized total weights
for each sequence, the SS method continues with calculating a
self-consistent set of weights.
They do this by finding the eigenvector of the distance matrix belonging
to the largest eigenvalue for the matrix. This special eigenvector can
be found by a numerical method.
"""
distances = distance_matrix(alignment)
v = eigenvector_for_largest_eigenvalue(distances)
return Weights(dict(zip(alignment.Names,v)))
def ACL(tree):
"""Returns a normalized dictionary of sequence weights {seq_id: weight}
tree: a PhyloNode object
The ACL method is named after Altschul, Carroll and Lipman, who published a
paper on sequence weighting in 1989.
The ACL method is based on an idea of Felsenstein (1973). Imagine
electrical current flows from the root of the tree down the edges and
out the leaves. If the edge lengths are proportional to their electrical
    resistances, the current flowing out of each leaf equals the leaf weight.
The first step in the calculation of the weight vector is calculating a
variance-covariance matrix. The variance of a leaf is proportional to the
distance from the root to that leaf. The covariance of two leaves is
proportional to the distance from the root to the last common ancestor
of the two leaves.
The second step in the calculation results in a vector of weights. Suppose
there are n leaves on the tree. Let i be the vector of size n, all of whose
elements are 1.0. The weight vector is calculated as:
w = (inverse(M)*i)/(transpose(i)*inverse(M)*i)
See Altschul 1989
"""
#clip branch lengths to avoid error due to negative or zero branch lengths
_clip_branch_lengths(tree)
#get a list of sequence IDs (in the order that the tree will be traversed)
seqs = []
for n in tree.tips():
seqs.append(n.Name)
#initialize the variance-covariance matrix
m = zeros([len(seqs),len(seqs)],Float64)
#calculate (co)variances
#variance of a node is defined as the distance from the root to the leaf
#covariance of two nodes is defined as the distance from the root to the
#last common ancestor of the two leaves.
for x in tree.tips():
for y in tree.tips():
idx_x = seqs.index(x.Name)
idx_y = seqs.index(y.Name)
if idx_x == idx_y:
m[idx_x,idx_y] = x.distance(tree)
else:
lca = x.lastCommonAncestor(y)
dist_lca_root = lca.distance(tree)
m[idx_x,idx_y] = dist_lca_root
m[idx_y,idx_x] = dist_lca_root
#get the inverse of the variance-covariance matrix
inv = inverse(m)
    #build vector i (vector of ones, length = # of leaves in the tree)
i = ones(len(seqs),Float64)
numerator = matrixmultiply(inv, i)
denominator = matrixmultiply(matrixmultiply(transpose(i),inv),i)
weight_vector = numerator/denominator
#return a Weights object (is dict {seq_id: weight})
return Weights(dict(zip(seqs,weight_vector)))
def _clip_branch_lengths(tree, min_val=1e-9, max_val=1e9):
"""Clips branch lengths in tree to a minimum or maximum value
tree: TreeNode object
min: minimum value that a branch length should have
max: maximum value that a branch length should have
Note: tree is changed in place!!!
"""
for i in tree.traverse():
bl = i.Length
if bl > max_val:
i.Length = max_val
elif bl < min_val:
i.Length = min_val
def _set_branch_sum(tree):
"""Sets the branch sum to each node
tree: TreeNode object
The branch sum of a node is the total branch length of all the nodes
under that node.
WARNING: changes the tree in place!!!
"""
total = 0
for child in tree:
_set_branch_sum(child)
total += child.BranchSum
total += child.Length
tree.BranchSum = total
def _set_node_weight(tree):
"""Sets the node weight to nodes according to the GSC method.
tree: TreeNode object
See documentation in GSC on how node weights are calculated.
WARNING: changes the tree in place!!!
"""
parent = tree.Parent
if parent is None: #root of tree always has weight of 1.0
tree.NodeWeight = 1.0
else:
tree.NodeWeight = parent.NodeWeight * \
(tree.Length + tree.BranchSum)/parent.BranchSum
for child in tree:
_set_node_weight(child)
def GSC(tree):
"""Returns dict of {seq_id: weight} for each leaf in the tree
tree: PhyloNode object
The GSC method is named after Gerstein, Sonnhammer, and Chothia,
who published their method in 1994.
The idea is that the sequences are weighted by their total branch length
in the tree. In the original algorithm weights are calculated from leaves
to root, we calculate them from the root to the leaves.
    First, the branch lengths have to be clipped to a minimum and maximum
    value to avoid errors due to negative or zero branch lengths, and to
    make sure the method behaves for two identical sequences. Next the
    branch sum for each node is pre-calculated (the branch sum is the
    total branch length in the tree below that node). From the (clipped)
    branch lengths and branch sums, we can now calculate how the weight has
    to be divided.
Example:
tree = (((A:20,B:10)x:30,C:40)r:0);
weight of the root (r) is always 1
to the left node (x) goes ((30+30)/100)*1 = 0.6
    to the right node (C) goes (40/100)*1 = 0.4
to node A goes (20/30)*0.6 = 0.4
to node B goes (10/30)*0.6 = 0.2
"""
_clip_branch_lengths(tree)
_set_branch_sum(tree)
_set_node_weight(tree)
weights = Weights()
for n in tree.tips():
weights[n.Name] = n.NodeWeight
return weights
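# A hedged usage sketch of GSC, re-using the docstring example tree (assumes
# cogent.LoadTree is importable here):
#     t = LoadTree(treestring="(((A:20,B:10)x:30,C:40)r:0);")
#     GSC(t)    # -> {'A': 0.4, 'B': 0.2, 'C': 0.4}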
#======================================================
if __name__ == "__main__":
    from cogent.core.alignment import Alignment
#This main block shows the results for four different alignments
a = Alignment({'seq1':'GYVGS','seq2':'GFDGF','seq3':'GYDGF',\
'seq4':'GYQGG'},Names=['seq1','seq2','seq3','seq4'])
b = Alignment({'seq1':'AA', 'seq2':'AA', 'seq3':'BB'},\
Names=['seq1','seq2','seq3'])
c = Alignment({'seq1':'AA', 'seq2':'AA', 'seq3':'BB', 'seq4':'BB',\
'seq5':'CC'},Names=['seq1','seq2','seq3','seq4','seq5'])
d = Alignment({'seq1':'AGCTA', 'seq2':'AGGTA', 'seq3':'ACCTG',
'seq4':'TGCAA'},Names=['seq1','seq2','seq3','seq4'])
print "This is a quick test run of the alignment-based methods"
for al in [a,b,c,d]:
#determine the needed character order for each alignment
if al is b or al is c:
char_order = "ABC"
elif al is a: #need protein_order
char_order=PROTEIN_ORDER
else: #need DNA_order
char_order=DNA_ORDER
print "===========ALIGNMENT=============="
for k in al.Names:
print '\t'.join([k,al[k]])
print 'RESULT OF THE VA METHOD:'
print VA(al)
print 'RESULT OF THE VOR METHOD:'
print VOR(al)
print 'RESULT OF THE mVOR METHOD:'
print mVOR(al, n=10000, order=char_order)
print 'RESULTS OF THE SS METHODS:'
print SS(al)
print 'RESULTS OF THE PB METHODS:'
print PB(al, order=char_order)
#!/usr/bin/env python
"""Provides utility methods for sequence weighting
"""
from __future__ import division
from numpy import array, zeros, dot as matrixmultiply, ones, identity, take,\
asarray, uint8 as UInt8, add, sqrt
from random import choice
from cogent.util.array import hamming_distance
from cogent.core.profile import Profile, CharMeaningProfile
from cogent.core.moltype import DNA, RNA, PROTEIN
from cogent.core.alignment import Alignment
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight", "Gavin Huttley", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Development"
PROTEIN_ORDER = ''.join(PROTEIN.Alphabet) + '-'
DNA_ORDER = ''.join(DNA.Alphabet) + '-'
RNA_ORDER = ''.join(RNA.Alphabet) + '-'
class Weights(dict):
"""Dictionary seq_ids: seq weight
"""
def normalize(self):
"""Normalizes to one. Works in place!"""
total = sum(self.values(), 0)
for k, v in self.items():
self[k] = v/total
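# A small usage sketch for Weights.normalize (works in place; the values are
# hypothetical):
#     w = Weights({'a': 2.0, 'b': 6.0})
#     w.normalize()    # w is now {'a': 0.25, 'b': 0.75}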
def number_of_pseudo_seqs(alignment):
"""Returns the # of unique randomized sequences that can be generated
from the alignment.
alignment: Alignment object
A random sequence is created by randomly choosing at each position one
    of the residues observed at that position (column) in the alignment.
A single occurrence of that residue type is sufficient to make it an
option and the choice is with equal likelihood from any of the observed
characters. (See Implementation Notes for more details).
"""
return reduce(lambda x, y: x*y, map(len,alignment.columnProbs()))
def pseudo_seqs_exact(alignment,n=None):
"""Returns all possible pseudo sequences (generated from the alignment)
alignment: Alignment object
n: has NO FUNCTION, except to make the API for all functions that generate
pseudo sequences the same.
This function is used by the VOR method (only when the number of unique
pseudo sequences is lower than 1000).
    The original sequences will also appear in this list of pseudo sequences.
Duplications of sequences in the alignment don't cause duplications
in the list of pseudo sequences.
Example:
AA, AA, BB in the alignment
generates pseudo sequences: AA,BB,AB,BA
ABC, BCC, BAC in the alignment
generates pseudo sequences: AAC, ABC, ACC, BAC, BBC, and BCC
"""
counts = alignment.columnFreqs()
results = [[]]
for col in counts:
new_results = []
for item in results:
for option in col:
new_results.append(item + [option])
results = new_results
return [''.join(i) for i in results]
def pseudo_seqs_monte_carlo(alignment,n=1000):
"""Yields sample of possible pseudo sequences (generated from alignment)
alignment: Alignment object
n = number of pseudo sequences generated (=sample size)
This function is used by the VOR method, usually when the number of
possible pseudo sequences exceeds 1000.
To see how the pseudo sequences are generated read the Implementation
Notes at the top of this file.
"""
freqs = alignment.columnFreqs()
for x in range(n):
seq = []
for i in freqs:
seq.append(choice(i.keys()))
yield ''.join(seq)
def row_to_vote(row):
"""Changes distances to votes.
There's one vote in total. The sequence with the lowest distance
gets the vote. If there are multiple sequences equidistant at the
minimum distance, the vote is split over all the sequences at the
minimum distance.
Example:
[4,2,3,7] -> [0,1,0,0]
[1,3,2,1] -> [.5,0,0,.5]
[1,1,1,1] -> [.25,.25,.25,.25]
"""
result = array(row) - min(row) == 0
return result/sum(result, 0)
def distance_matrix(alignment, distance_method=hamming_distance):
"""Returns distance matrix for seqs in the alignment.
    Order is either the Names in the alignment or the order in which the
    rows are iterated over.
    Distance is, by default, the Hamming distance between two sequences.
"""
#change sequences into arrays
alignment = [array(str(alignment.NamedSeqs[k]), 'c') for k in\
alignment.Names]
nr_of_seqs = len(alignment)
distances = zeros([nr_of_seqs,nr_of_seqs])
for i, seq_a in enumerate(alignment):
for j, seq_b in enumerate(alignment):
if i < j:
dist = distance_method(seq_a,seq_b)
distances[i,j] = dist
distances[j,i] = dist
else: # i==j (diagonal) or i>j (lower left half)
continue
return distances
def eigenvector_for_largest_eigenvalue(matrix):
"""Returns eigenvector corresponding to largest eigenvalue of matrix.
Implements a numerical method for finding an eigenvector by repeated
application of the matrix to a starting vector. For a matrix A the
process w(k) <-- A*w(k-1) converges to eigenvector w with the largest
eigenvalue. Because distance matrix D has all entries >= 0, a theorem
due to Perron and Frobenius on nonnegative matrices guarantees good
behavior of this method, excepting degenerate cases where the
iteration oscillates. For distance matrices a remedy is to add the
identity matrix to A, permitting the iteration to converge to the
eigenvector. (From Sander and Schneider (1991), and Vingron and
Sibbald (1993))
Note: Only works on square matrices.
"""
#always add the identity matrix to avoid oscillating behavior
matrix = matrix + identity(len(matrix))
#v is a random vector (chosen as the normalized vector of ones)
v = ones(len(matrix))/len(matrix)
#iterate until convergence
for i in range(1000):
new_v = matrixmultiply(matrix,v)
new_v = new_v/sum(new_v, 0) #normalize
if sum(map(abs,new_v-v), 0) > 1e-9:
v = new_v #not converged yet
continue
else: #converged
break
return new_v
def distance_to_closest(alignment, distance_method=hamming_distance):
"""Returns vector of distances to closest neighbor for each s in alignment
alignment: Alignment object
distance_method: function used to calculate the distance between two seqs
Function returns the closest distances according to the Names in the
alignment
example:
Alignment({1:'ABCD',2:'ABCC',3:'CBDD',4:'ACAA'},Names=[3,2,1,4])
[2,1,1,3]
"""
names = alignment.Names
seqs = [array(str(alignment.NamedSeqs[n]), 'c') for n in names]
closest = []
for i, key in enumerate(names):
seq = seqs[i]
dist = None
for j, other_key in enumerate(names):
if key == other_key:
continue
d = distance_method(seq,seqs[j])
            if dist is not None:
                if d < dist:
                    dist = d
            else:
                dist = d
closest.append(dist)
return array(closest)
def SeqToProfile(seq, alphabet=None, char_order=None,\
split_degenerates=False):
"""Generates a Profile object from a Sequence object.
seq: Sequence object
    alphabet (optional): Alphabet object (if you want to split
    degenerate symbols, the alphabet object should have a
    Degenerates property). Default is the Alphabet associated with
    the Sequence object.
char_order (optional): The order the characters occur in the Profile.
Default is the list(alphabet)
split_degenerates (optional): Whether you want the counts for the
degenerate symbols to be divided over the non-degenerate symbols they
code for.
A Profile is a position x character matrix describing which characters
occur at each position. In a sequence (as opposed to an alignment) only
one character occurs at each position. In general a sequence profile
will only contain ones and zeros. However, you have the possibility of
splitting degenerate characters. For example, if a position is R, it
    means that you have a 50/50 chance of A and G. It's also possible to
ignore characters, which in a sequence profile will lead to positions
(rows) containing only zeros.
Example:
Sequence = ACGU
Profile(seq, CharOrder=UCAG):
U C A G
0 0 1 0 first pos
0 1 0 0 second pos
0 0 0 1 third pos
1 0 0 0 fourth pos
Sequence= GURY
Profile(seq,CharOrder=UCAG, split_degenerates=True)
U C A G
0 0 0 1 first pos
1 0 0 0 second pos
0 0 .5 .5 third pos
.5 .5 0 0 fourth pos
Characters can also be ignored
Sequence = ACN-
    Profile(seq, CharOrder=UCAG, split_degenerates=True)
U C A G
0 0 1 0 first pos
0 1 0 0 second pos
.25 .25 .25 .25 third pos
0 0 0 0 fourth pos <--contains only zeros
"""
if alphabet is None:
alphabet = seq.MolType
if char_order is None:
char_order = list(alphabet)
#Determine the meaning of each character based on the alphabet, the
#character order, and the option to split degenerates
char_meaning = CharMeaningProfile(alphabet, char_order,\
split_degenerates)
#construct profile data
idxs = array(str(seq).upper(), 'c').view(UInt8)
result_data = char_meaning.Data[idxs]
#result_data = take(char_meaning.Data, asarray(str(seq).upper(), UInt8), axis=0)
return Profile(result_data, alphabet, char_order)
def AlnToProfile(aln, alphabet=None, char_order=None, split_degenerates=False,\
weights=None):
"""Generates a Profile object from an Alignment.
aln: Alignment object
alphabet (optional): an Alphabet object (or list of chars, but if you
want to split degenerate symbols, the alphabet must have a
    Degenerates property). Default is the alphabet of the first seq in
the alignment.
char_order (optional): order of the characters in the profile. Default
is list(alphabet)
split_degenerates (optional): Whether you want the counts for the
degenerate symbols to be divided over the non-degenerate symbols they
code for.
weights (optional): dictionary of seq_id: weight. If not entered all seqs
are weighted equally
A Profile is a position x character matrix describing which characters
occur at each position of an alignment. The Profile is always normalized,
so it gives the probabilities of each character at each position.
Ignoring chars: you can ignore characters in the alignment by not putting
the char in the CharOrder. If you ignore all characters at a particular
position, an error will be raised, because the profile can't be normalized.
Splitting degenerates: you can split degenerate characters over the
non-degenerate characters they code for. For example: R = A or G. So,
an R at a position counts for 0.5 A and 0.5 G.
Example:
seq1 TCAG weight: 0.5
seq2 TAR- weight: 0.25
seq3 YAG- weight: 0.25
Profile(aln,alphabet=DNA,char_order="TACG",weights=w,
split_degenerates=True)
Profile:
T A C G
[[ 0.875 0. 0.125 0. ]
[ 0. 0.5 0.5 0. ]
[ 0. 0.625 0. 0.375]
[ 0. 0. 0. 1. ]]
"""
if alphabet is None:
alphabet = aln.values()[0].MolType
if char_order is None:
char_order = list(alphabet)
if weights is None:
weights = dict.fromkeys(aln.keys(),1/len(aln))
char_meaning = CharMeaningProfile(alphabet, char_order,\
split_degenerates)
profiles = []
for k,v in aln.items():
idxs = array(str(v).upper(), 'c').view(UInt8)
profiles.append(char_meaning.Data[idxs] * weights[k])
s = reduce(add,profiles)
result = Profile(s,alphabet, char_order)
try:
result.normalizePositions()
except Exception, e:
raise ValueError,e
#"Probably one of the rows in your profile adds up to zero,\n "+\
#"because you are ignoring all of the characters in the "+\
#"corresponding\n column in the alignment"
return result
#!/usr/bin/env python
"""apps: provides support libraries for controlling applications (local or web).
"""
__all__ = ['blast',
'carnac',
'cd_hit',
'clustalw',
'cmfinder',
'comrna',
'consan',
'contrafold',
'cove',
'dialign',
'dynalign',
'fasttree',
           'fasttree_v1',
'foldalign',
'gctmpca',
'ilm',
'knetfold',
'mfold',
'muscle',
'msms',
'nupack',
'parameters',
'pfold',
'pknotsrg',
'raxml',
'rdp_classifier',
'rnaalifold',
'rnaforester',
'rnashapes',
'rnaview',
'sfold',
'unafold',
'util',
'vienna_package']
__author__ = ""
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Catherine Lozuopone", "Gavin Huttley",
"Rob Knight", "Zongzhi Liu", "Sandra Smit", "Micah Hamady",
"Greg Caporaso", "Mike Robeson", "Daniel McDonald",
"Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
"""Need to add:
tcoffee alignment
meme motif finding
phylip phylogeny
mrbayes phylogeny
paml phylogeny
other packages? web tools?
"""
#!/usr/bin/env python
from cogent.app.util import CommandLineApplication,\
CommandLineAppResult, ResultPath
from cogent.app.parameters import Parameter,FlagParameter,Parameters
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2016, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.9"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class Carnac(CommandLineApplication):
"""Application controller for Carnac RNAfolding application
Info at:
http://bioinfo.lifl.fr/carnac/index.html
Options:
-a Inhibit the energy correction that is automatically
performed to create the initial set of potential
                stems. By default, the energy correction depends on the
GC percentage of each sequence.
-c Eliminate sequences that are too similar. The similarity
                threshold is 98%.
-h Add hairpins that are present only in one sequence to
the initial set of potential stems (may be time and space
demanding).
"""
    # Limitation: if -c is turned on and a file is eliminated, an error occurs
    # in the file handling in _get_result_paths
_parameters = {
'-c':FlagParameter(Prefix='-',Name='c',Value=False),
'-a':FlagParameter(Prefix='-',Name='a'),
'-h':FlagParameter(Prefix='-',Name='h')}
_command = 'carnac'
_input_handler='_input_as_string'
def _get_result_paths(self,data):
"""Specifies the paths of output files generated by the application
data: the data the instance of the application is called on
        Carnac writes its .ct and .eq output files to the location of the
        input file, and its .out files to the folder the program is run from.
        Graph and alignment files are also created.
You always get back: StdOut,StdErr, and ExitStatus
"""
result={}
name_counter = 0
ones='00'
tens='0'
count=''
if not isinstance(data,list):
#means data is file
path=str(data)
data=open(data).readlines()
else: #data input as lines
#path=''.join([self.WorkingDir,self._input_filename.split('/')[-1]])
path = ''.join(['/tmp/', self._input_filename.split('/')[-1]])
for item in data:
if item.startswith('>'):
name_counter += 1
if name_counter < 10:
count=ones
if name_counter > 9:
count=tens
if name_counter > 99:
count=''
name = item.strip('>\n')
else:
nr=name_counter
result['ct%d' % nr] =\
ResultPath(Path=('%s%s%d.ct' % (path,count,nr)))
result['eq%d' % nr] =\
ResultPath(Path=('%s%s%d.eq' % (path,count,nr)))
result['out_seq%d' % nr] = \
ResultPath(Path=(''.join([self.WorkingDir,'Z_%s%d.%s.out'% \
(count,nr,name)])))
result['graph'] =\
ResultPath(Path=(self.WorkingDir+'graph.out'))
result['align'] =\
ResultPath(Path=(self.WorkingDir+'alignment.out'))
return result
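# A hedged usage sketch (assumes the carnac binary is on the PATH; application
# controllers are invoked by calling the instance on their input):
#     app = Carnac()
#     res = app('seqs.fasta')    # 'seqs.fasta' is a hypothetical input file
#     res['StdOut'].read()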