Commits on Source (13)
# Modified from https://github.com/biocore/scikit-bio/
language: python
env:
- PYTHON_VERSION=2.7 USE_H5PY=True NOSE_ARGS="--with-doctest --with-coverage"
- PYTHON_VERSION=2.7 USE_CYTHON=True NOSE_ARGS="--with-doctest --with-coverage"
- PYTHON_VERSION=3.4 USE_H5PY=True
- PYTHON_VERSION=3.4 USE_CYTHON=True
- PYTHON_VERSION=3.5 USE_H5PY=True
- PYTHON_VERSION=3.5 USE_CYTHON=True
- PYTHON_VERSION=3.6 USE_H5PY=True
- PYTHON_VERSION=3.6 USE_CYTHON=True
- PYTHON_VERSION=2.7 WITH_DOCTEST=False USE_CYTHON=True
- PYTHON_VERSION=3.5 WITH_DOCTEST=True USE_CYTHON=True
- PYTHON_VERSION=3.6 WITH_DOCTEST=True USE_CYTHON=True
- PYTHON_VERSION=3.7 WITH_DOCTEST=True USE_CYTHON=True
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda3/bin:$PATH
install:
- conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy scipy nose pep8 flake8 coverage future six pandas
- if [ ${USE_CYTHON} ]; then conda install --yes -n env_name cython; fi
- if [ ${USE_H5PY} ]; then conda install --yes -n env_name h5py>=2.2.0; fi
- if [ ${PYTHON_VERSION} = "2.7" ]; then conda install --yes -n env_name Sphinx=1.2.2; fi
- conda create --yes -n env_name python=$PYTHON_VERSION pip click numpy scipy pep8 flake8 coverage future six "pandas>=0.20.0" nose "h5py>=2.2.0" cython
- rm biom/*.c
- source activate env_name
- if [ ${PYTHON_VERSION} = "2.7" ]; then pip install pyqi; fi
- if [ ${PYTHON_VERSION} = "2.7" ]; then conda install --yes Sphinx=1.2.2; fi
- pip install coveralls
- pip install -e . --no-deps
script:
- nosetests ${NOSE_ARGS}
- flake8 biom setup.py
- make test
- biom show-install-info
- if [ ${PYTHON_VERSION} = "2.7" ]; then make -C doc html; fi
# we can only validate the tables if we have H5PY
- if [ ${USE_H5PY} ]; then for table in examples/*hdf5.biom; do echo ${table}; biom validate-table -i ${table}; done; fi
- for table in examples/*hdf5.biom; do echo ${table}; biom validate-table -i ${table}; done
# validate JSON formatted tables
- for table in examples/*table.biom; do echo ${table}; biom validate-table -i ${table}; done;
- pushd biom/assets
- if [ ${USE_H5PY} ]; then python exercise_api.py ../../examples/rich_sparse_otu_table_hdf5.biom sample; fi
- if [ ${USE_H5PY} ]; then python exercise_api.py ../../examples/rich_sparse_otu_table_hdf5.biom observation; fi
- if [ ${USE_H5PY} ]; then sh exercise_cli.sh; fi
- popd
- python biom/assets/exercise_api.py examples/rich_sparse_otu_table_hdf5.biom sample
- python biom/assets/exercise_api.py examples/rich_sparse_otu_table_hdf5.biom observation
- sh biom/assets/exercise_cli.sh
after_success:
- coveralls
......@@ -5,7 +5,7 @@
The BIOM Format project is licensed under the terms of the Modified BSD License
(also known as New or Revised BSD), as follows:
Copyright (c) 2011-2013, The BIOM Format Development Team <gregcaporaso@gmail.com>
Copyright (c) 2011-2017, The BIOM Format Development Team <gregcaporaso@gmail.com>
All rights reserved.
......@@ -35,7 +35,7 @@ The following banner should be used in any source code file to indicate the
copyright and license terms:
#-----------------------------------------------------------------------------
# Copyright (c) 2011-2013, The BIOM Format Development Team.
# Copyright (c) 2011-2017, The BIOM Format Development Team.
#
# Distributed under the terms of the Modified BSD License.
#
......
BIOM-Format ChangeLog
=====================
biom 2.1.7
----------
New features and bug fixes, released on 28 September 2018.
Important:
* Python 3.4 support has been dropped. We now only support Python 2.7, 3.5, 3.6 and 3.7.
* We will be dropping Python 2.7 support on the next release.
* Pandas >= 0.20.0 is now the minimum required version.
* pytest is now used instead of nose.
New Features:
* Massive performance boost to `Table.collapse` with the default collapse function. Collapsing now takes tens of milliseconds rather than minutes; the slowdown stemmed from the prior use of `operator.add`. See [issue #761](https://github.com/biocore/biom-format/issues/761).
* `Table.align_to` for aligning one table to another. This is useful in multi-omic analyses where multiple preparations have been performed on the same physical samples. This is essentially a helper method around `Table.sort_order`. See [issue #747](https://github.com/biocore/biom-format/issues/747).
* Added additional sanity checks when calling `Table.to_hdf5`, see [PR #769](https://github.com/biocore/biom-format/pull/769).
* `Table.subsample()` can optionally perform subsampling with replacement. See [issue #774](https://github.com/biocore/biom-format/issues/774).
* `Table.to_dataframe()` now supports a `dense` argument to return `pd.DataFrame`. See [issue #762](https://github.com/biocore/biom-format/issues/762).
* Parsing methods for BIOM-Format 1.0.0 tables now preserve dict ordering. See [issue #781](https://github.com/biocore/biom-format/issues/781).
Bug fixes:
* `Table.subsample(by_id=True, axis='observation')` did not subsample over the observations. The bug caused an empty table to be returned, so it should have triggered empty-table errors rather than produced misleading results, except in the pathological case where the ID namespaces of features and samples are not disjoint. See [PR #759](https://github.com/biocore/biom-format/pull/759) for more information.
* Tables of shape `(0, n)` or `(n, 0)` were raising exceptions when being written out. See [issue #619](https://github.com/biocore/biom-format/issues/619).
* Tables loaded with a `list` of empty `dict`s will have their metadata attributes set to None. See [issue #594](https://github.com/biocore/biom-format/issues/594).
biom 2.1.6
----------
......
......@@ -6,9 +6,13 @@ graft biom
graft support_files
graft examples
graft doc
graft licenses
graft tests/test_data
prune docs/_build
global-exclude *.pyc
global-exclude *.pyo
global-exclude .git
global-exclude *.so
global-exclude .*.swp
# ----------------------------------------------------------------------------
# Copyright (c) 2013--, biom-format development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file COPYING.txt, distributed with this software.
# ----------------------------------------------------------------------------
ifeq ($(WITH_DOCTEST), TRUE)
TEST_COMMAND = python setup.py test -a --doctest-modules --doctest-glob='*.pyx'
else
TEST_COMMAND = python setup.py test
endif
test:
$(TEST_COMMAND)
flake8 biom setup.py
......@@ -21,7 +21,7 @@ Examples
Load an example table:
>>> from biom import example_table
>>> print example_table # doctest: +NORMALIZE_WHITESPACE
>>> print(example_table) # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1 S2 S3
O1 0.0 1.0 2.0
......
......@@ -12,7 +12,7 @@ import numpy as np
cimport numpy as cnp
def _subsample(arr, n):
def _subsample(arr, n, with_replacement):
"""Subsample non-zero values of a sparse array
Parameters
......@@ -47,17 +47,21 @@ def _subsample(arr, n):
start, end = indptr[i], indptr[i+1]
length = end - start
counts_sum = data[start:end].sum()
if counts_sum < n:
data[start:end] = 0
continue
r = np.arange(length, dtype=np.int32)
unpacked = np.repeat(r, data_i[start:end])
permuted = np.random.permutation(unpacked)[:n]
result = np.zeros(length, dtype=np.float64)
for idx in range(permuted.shape[0]):
result[permuted[idx]] += 1
if with_replacement:
pvals = data[start:end] / counts_sum
data[start:end] = np.random.multinomial(n, pvals)
else:
if counts_sum < n:
data[start:end] = 0
continue
data[start:end] = result
r = np.arange(length, dtype=np.int32)
unpacked = np.repeat(r, data_i[start:end])
permuted = np.random.permutation(unpacked)[:n]
result = np.zeros(length, dtype=np.float64)
for idx in range(permuted.shape[0]):
result[permuted[idx]] += 1
data[start:end] = result
\ No newline at end of file
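Reviewer note: the two branches of the `_subsample` hunk above can be sketched in plain NumPy as follows. This is an illustrative sketch, not the project's Cython code. The function name `subsample_counts`, the `rng` parameter, the guard for an all-zero vector, and the use of `np.bincount` in place of the explicit counting loop are all assumptions made for the example. It also returns a new array rather than writing into `data` in place.

```python
import numpy as np


def subsample_counts(counts, n, with_replacement=False, rng=None):
    """Subsample a 1-D vector of non-negative integer counts to n items.

    Hypothetical helper mirroring the two branches in the hunk above.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    total = counts.sum()

    if with_replacement:
        if total == 0:
            # Assumption: nothing to draw from, so return all zeros.
            return np.zeros_like(counts)
        # Draw n items with probability proportional to each count.
        pvals = counts / total
        return rng.multinomial(n, pvals)

    if total < n:
        # Not enough items to draw n without replacement.
        return np.zeros_like(counts)

    # Expand counts into one index per item, shuffle, keep the first n.
    unpacked = np.repeat(np.arange(counts.size), counts)
    chosen = rng.permutation(unpacked)[:n]
    return np.bincount(chosen, minlength=counts.size)
```

As in the hunk, only the without-replacement branch zeroes out a vector whose total is below `n`; the multinomial branch can draw more items than the vector originally held.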
......@@ -12,6 +12,9 @@ import numpy as np
import tempfile
import h5py
if len(sys.argv) < 3:
raise SystemExit
if '://' in sys.argv[1]:
from urllib import request
fp, _ = request.urlretrieve(sys.argv[1])
......
#!/bin/bash
set -xe
table=../../examples/min_sparse_otu_table_hdf5.biom
obsmd=../../examples/obs_md.txt
table=examples/min_sparse_otu_table_hdf5.biom
obsmd=examples/obs_md.txt
if [[ ! -f ${table} ]];
then
echo "This script expects to operate in the biom/assets directory"
echo "This script expects to operate in the base repository directory"
exit 1
fi
......
......@@ -343,8 +343,8 @@ class TableValidator(object):
return key
def _is_int(self, x):
"""Return True if x is an int"""
return isinstance(x, (int, np.int64))
"""Return True if x is an int or numpy int"""
return np.issubdtype(type(x), np.integer)
def _valid_nnz(self, table):
"""Check if nnz seems correct"""
......@@ -401,7 +401,7 @@ class TableValidator(object):
datetime.strptime(val, fmt)
valid_time = True
break
except:
except: # noqa
pass
if valid_time:
......@@ -437,7 +437,7 @@ class TableValidator(object):
for idx, coord in enumerate(table_json['data']):
try:
x, y, val = coord
except:
except: # noqa
return "Bad matrix entry idx %d: %s" % (idx, repr(coord))
if not self._is_int(x) or not self._is_int(y):
......
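Reviewer note: a small sketch of why the `_is_int` change matters. The old `isinstance` check only accepts Python `int` and `np.int64`, while `np.issubdtype(type(x), np.integer)` accepts any NumPy integer width. The function names `is_int_old` and `is_int_new` are hypothetical, standing in for the two versions of the method.

```python
import numpy as np


def is_int_old(x):
    # Previous behaviour: only Python int and np.int64 pass.
    return isinstance(x, (int, np.int64))


def is_int_new(x):
    # New behaviour: any Python or NumPy integer type passes.
    return bool(np.issubdtype(type(x), np.integer))


for value in (3, np.int64(3), np.int32(3), 3.0, "3"):
    print(type(value).__name__, is_int_old(value), is_int_new(value))
```

The `np.int32` row is the one that changes from rejected to accepted.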
......@@ -17,7 +17,8 @@ from biom.exception import BiomParseException, UnknownAxisError
from biom.table import Table
from biom.util import biom_open, __version__
import json
import collections
from collections import defaultdict, OrderedDict
__author__ = "Justin Kuczynski"
__copyright__ = "Copyright 2011-2017, The BIOM Format Development Team"
......@@ -277,7 +278,7 @@ def parse_uc(fh):
identifier in the resulting ``Table``.
"""
data = collections.defaultdict(int)
data = defaultdict(int)
sample_idxs = {}
sample_ids = []
observation_idxs = {}
......@@ -316,7 +317,7 @@ def parse_uc(fh):
if line_type == 'H' or line_type == 'S':
# get the sample id
try:
underscore_index = query_id.index('_')
underscore_index = query_id.rindex('_')
except ValueError:
raise ValueError(
"A query sequence was encountered that does not have an "
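Reviewer note: the switch from `index('_')` to `rindex('_')` changes which underscore separates the sample ID from the rest of the query ID. The sketch below assumes query IDs of the form `<sample id>_<sequence number>`; the helper names and the example ID are hypothetical.

```python
def sample_id_first_underscore(query_id):
    # Old behaviour: split at the first underscore.
    return query_id[:query_id.index('_')]


def sample_id_last_underscore(query_id):
    # New behaviour: split at the last underscore, so sample IDs
    # that themselves contain underscores are kept whole.
    return query_id[:query_id.rindex('_')]


query_id = "soil_A_42"
print(sample_id_first_underscore(query_id))
print(sample_id_last_underscore(query_id))
```

Both calls raise `ValueError` when the ID contains no underscore, which is the case the surrounding `try`/`except` handles.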
......@@ -391,7 +392,9 @@ def parse_biom_table(fp, ids=None, axis='sample', input_is_dense=False):
try:
return Table.from_hdf5(fp, ids=ids, axis=axis)
except:
except ValueError:
pass
except RuntimeError:
pass
if hasattr(fp, 'read'):
old_pos = fp.tell()
......@@ -402,18 +405,21 @@ def parse_biom_table(fp, ids=None, axis='sample', input_is_dense=False):
c = fp.read(1)
if c == '{':
fp.seek(old_pos)
t = Table.from_json(json.load(fp), input_is_dense=input_is_dense)
t = Table.from_json(json.load(fp, object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
else:
fp.seek(old_pos)
t = Table.from_tsv(fp, None, None, lambda x: x)
elif isinstance(fp, list):
try:
t = Table.from_json(json.loads(''.join(fp)),
t = Table.from_json(json.loads(''.join(fp),
object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
except ValueError:
t = Table.from_tsv(fp, None, None, lambda x: x)
else:
t = Table.from_json(json.loads(fp), input_is_dense=input_is_dense)
t = Table.from_json(json.loads(fp, object_pairs_hook=OrderedDict),
input_is_dense=input_is_dense)
def subset_ids(data, id_, md):
return id_ in ids
......
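Reviewer note: the `object_pairs_hook=OrderedDict` argument added to the `json.load` and `json.loads` calls above makes the decoder build every JSON object as an `OrderedDict`, so keys keep the order in which they appear in the file. A minimal standalone sketch follows; the JSON document is made up for illustration and is not a complete BIOM table.

```python
import json
from collections import OrderedDict

# Hypothetical fragment with metadata keys in a non-alphabetical order.
doc = '{"rows": [{"id": "O1", "metadata": {"phylum": "p", "class": "c"}}]}'

parsed = json.loads(doc, object_pairs_hook=OrderedDict)
metadata = parsed["rows"][0]["metadata"]

print(type(metadata).__name__)
print(list(metadata))
```

The hook applies to nested objects as well as the top-level one. Plain `dict` does not guarantee insertion order on Python 2.7 and 3.5, which this release still supports.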