Commits on Source (2)
[*.py]
charset=utf-8
end_of_line=lf
insert_final_newline=true
indent_style=space
indent_size=4
sudo: false
language: python
dist: xenial
cache:
directories:
- $HOME/.cache/pip
python:
- "2.7"
- "3.3"
- "3.4"
- "3.5"
- "3.6"
- "3.7"
install:
- pip install .
script:
- nosetests -P tests
- python setup.py --version # Detect encoding problems
- python -m pytest
env:
global:
- TWINE_USERNAME=marcelm
jobs:
include:
- stage: deploy
services:
- docker
python: "3.6"
install: python3 -m pip install twine 'requests-toolbelt!=0.9.0'
if: tag IS present
script:
- |
python3 setup.py sdist
ls -l dist/
python3 -m twine upload dist/*
Copyright (c) 2010-2016 Marcel Martin <mail@marcelm.net>
Copyright (c) 2010-2019 Marcel Martin <mail@marcelm.net>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
......
Metadata-Version: 2.1
Name: xopen
Version: 0.5.0
Summary: Open compressed files transparently
Home-page: https://github.com/marcelm/xopen/
Author: Marcel Martin
Author-email: mail@marcelm.net
License: MIT
Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
:target: https://travis-ci.org/marcelm/xopen
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
:target: https://pypi.python.org/pypi/xopen
=====
xopen
=====
This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.
Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.
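As a rough illustration of extension-based detection, the following sketch maps a file name to a format name. It is not taken from the module itself; ``detect_compression`` is a hypothetical helper name used only for this example.

```python
def detect_compression(filename):
    """Guess the compression format from the file extension (hypothetical helper)."""
    for ext, fmt in (('.gz', 'gzip'), ('.bz2', 'bzip2'), ('.xz', 'xz')):
        if filename.endswith(ext):
            return fmt
    return None  # no known extension: treat as an uncompressed file


print(detect_compression('reads.fastq.gz'))  # gzip
print(detect_compression('notes.txt'))       # None
```

Only the file name is inspected here; the file content is never examined.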
The focus is on being as efficient as possible on all supported Python versions.
For example, simply using ``gzip.open`` is very slow in older Pythons, and
it is a lot faster to use a ``gzip`` subprocess. For writing to gzip files,
``xopen`` uses ``pigz`` when available.
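The idea of preferring an external compressor can be sketched as follows. This is a simplified illustration that assumes Python 3.5 or newer, not the module's actual implementation; ``find_gzip_program`` and ``write_gzip`` are hypothetical names.

```python
import gzip
import shutil
import subprocess


def find_gzip_program():
    """Return 'pigz' if it is on PATH, else 'gzip', else None (hypothetical helper)."""
    for program in ('pigz', 'gzip'):
        if shutil.which(program):
            return program
    return None


def write_gzip(path, data):
    """Write bytes to a gzip file, preferring an external compressor."""
    program = find_gzip_program()
    if program is None:
        # No external program found: fall back to the gzip module
        with gzip.open(path, 'wb') as f:
            f.write(data)
        return 'gzip module'
    with open(path, 'wb') as outfile:
        # The compressor reads from stdin and writes compressed data to outfile
        subprocess.run([program, '-c'], input=data, stdout=outfile, check=True)
    return program
```

The return value only reports which method was used. The resulting file can be read back with ``gzip.open`` in every case.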
This module was originally developed as part of the `cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
Usage
-----
Open a file for reading::
from xopen import xopen
with xopen('file.txt.xz') as f:
content = f.read()
Or without context manager::
from xopen import xopen
f = xopen('file.txt.xz')
content = f.read()
f.close()
Open a file for writing::
from xopen import xopen
with xopen('file.txt.gz', mode='w') as f:
f.write('Hello')
Credits
-------
The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for appending to files.
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
Changes
-------
v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
problems some users had with too many threads when opening many files at the same time.
* xopen now accepts pathlib.Path objects.
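Both changes can be illustrated with a small sketch. It assumes Python 3.6 or newer and simplifies the CPU detection to ``os.cpu_count()``; ``pigz_command`` is a hypothetical helper name, not part of the module's API.

```python
import os
from pathlib import Path


def pigz_command(threads=None):
    """Build a pigz argument list (hypothetical helper)."""
    if threads is None:
        # Assumption for this sketch: os.cpu_count() stands in for the
        # CPU detection; it may return None, hence the "or 1"
        threads = min(os.cpu_count() or 1, 4)
    args = ['pigz']
    if threads != 0:  # 0 means: let pigz use all available cores
        args += ['-p', str(threads)]
    return args


print(pigz_command(threads=2))  # ['pigz', '-p', '2']
print(pigz_command(threads=0))  # ['pigz']

# A Path object can be converted to a plain string with os.fspath
print(os.fspath(Path('data') / 'reads.txt.gz'))
```

With the default of ``None``, the thread count never exceeds four, regardless of how many cores the machine has.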
Author
------
Marcel Martin <mail@marcelm.net> (`@marcelm_ on Twitter <https://twitter.com/marcelm_>`_)
Links
-----
* `Source code <https://github.com/marcelm/xopen/>`_
* `Report an issue <https://github.com/marcelm/xopen/issues>`_
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
Provides-Extra: dev
......@@ -23,7 +23,7 @@ tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
``xopen`` is compatible with Python 2.7, 3.3, 3.4, 3.5 and 3.6.
``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
Usage
......@@ -64,6 +64,16 @@ Some ideas were taken from the `canopener project <https://github.com/selassid/c
If you also want to open S3 files, you may want to use that module instead.
Changes
-------
v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
problems some users had with too many threads when opening many files at the same time.
* xopen now accepts pathlib.Path objects.
Author
------
......
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"]
[bdist_wheel]
universal = 1
[egg_info]
tag_build =
tag_date = 0
......@@ -8,14 +8,10 @@ if sys.version_info < (2, 7):
with open('README.rst') as f:
long_description = f.read()
if sys.version_info < (3, ):
requires = ['bz2file']
else:
requires = []
setup(
name='xopen',
version='0.3.3',
use_scm_version=True,
setup_requires=['setuptools_scm'], # Support pip versions that don't know about pyproject.toml
author='Marcel Martin',
author_email='mail@marcelm.net',
url='https://github.com/marcelm/xopen/',
......@@ -23,15 +19,21 @@ setup(
long_description=long_description,
license='MIT',
py_modules=['xopen'],
install_requires=requires,
install_requires=[
'bz2file; python_version=="2.7"',
],
extras_require={
'dev': ['pytest'],
},
python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4',
classifiers=[
"Development Status :: 4 - Beta",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
]
)
# coding: utf-8
from __future__ import print_function, division, absolute_import
import gzip
import os
import random
import sys
import signal
from contextlib import contextmanager
from nose.tools import raises
import pytest
from xopen import xopen, PipedGzipReader
base = "tests/file.txt"
files = [ base + ext for ext in ['', '.gz', '.bz2' ] ]
extensions = ["", ".gz", ".bz2"]
try:
import lzma
files.append(base + '.xz')
extensions.append(".xz")
except ImportError:
lzma = None
try:
import bz2
except ImportError:
bz2 = None
base = "tests/file.txt"
files = [base + ext for ext in extensions]
CONTENT = 'Testing, testing ...\nThe second line.\n'
major, minor = sys.version_info[0:2]
# File extensions for which appending is supported
append_extensions = extensions[:]
if sys.version_info[0] == 2:
append_extensions.remove(".bz2")
@contextmanager
......@@ -36,24 +38,24 @@ def temporary_path(name):
os.remove(path)
def test_xopen_text():
for name in files:
@pytest.mark.parametrize("name", files)
def test_xopen_text(name):
with xopen(name, 'rt') as f:
lines = list(f)
assert len(lines) == 2
assert lines[1] == 'The second line.\n', name
def test_xopen_binary():
for name in files:
@pytest.mark.parametrize("name", files)
def test_xopen_binary(name):
with xopen(name, 'rb') as f:
lines = list(f)
assert len(lines) == 2
assert lines[1] == b'The second line.\n', name
def test_no_context_manager_text():
for name in files:
@pytest.mark.parametrize("name", files)
def test_no_context_manager_text(name):
f = xopen(name, 'rt')
lines = list(f)
assert len(lines) == 2
......@@ -62,8 +64,8 @@ def test_no_context_manager_text():
assert f.closed
def test_no_context_manager_binary():
for name in files:
@pytest.mark.parametrize("name", files)
def test_no_context_manager_binary(name):
f = xopen(name, 'rb')
lines = list(f)
assert len(lines) == 2
......@@ -72,65 +74,22 @@ def test_no_context_manager_binary():
assert f.closed
@raises(IOError)
def test_nonexisting_file():
with xopen('this-file-does-not-exist') as f:
pass
@raises(IOError)
def test_nonexisting_file_gz():
with xopen('this-file-does-not-exist.gz') as f:
pass
@raises(IOError)
def test_nonexisting_file_bz2():
with xopen('this-file-does-not-exist.bz2') as f:
pass
if lzma:
@raises(IOError)
def test_nonexisting_file_xz():
with xopen('this-file-does-not-exist.xz') as f:
pass
@raises(IOError)
def test_write_to_nonexisting_dir():
with xopen('this/path/does/not/exist/file.txt', 'w') as f:
pass
@raises(IOError)
def test_write_to_nonexisting_dir_gz():
with xopen('this/path/does/not/exist/file.gz', 'w') as f:
pass
@raises(IOError)
def test_write_to_nonexisting_dir_bz2():
with xopen('this/path/does/not/exist/file.bz2', 'w') as f:
@pytest.mark.parametrize("ext", extensions)
def test_nonexisting_file(ext):
with pytest.raises(IOError):
with xopen('this-file-does-not-exist' + ext) as f:
pass
if lzma:
@raises(IOError)
def test_write_to_nonexisting_dir():
with xopen('this/path/does/not/exist/file.xz', 'w') as f:
@pytest.mark.parametrize("ext", extensions)
def test_write_to_nonexisting_dir(ext):
with pytest.raises(IOError):
with xopen('this/path/does/not/exist/file.txt' + ext, 'w') as f:
pass
def test_append():
cases = ["", ".gz"]
if bz2 and sys.version_info > (3,):
# BZ2 does NOT support append in Py 2.
cases.append(".bz2")
if lzma:
cases.append(".xz")
for ext in cases:
# On Py3, need to send BYTES, not unicode. Let's do it for all.
@pytest.mark.parametrize("ext", append_extensions)
def test_append(ext):
text = "AB".encode("utf-8")
reference = text + text
with temporary_path('truncated.fastq' + ext) as path:
......@@ -152,14 +111,8 @@ def test_append():
assert appended == reference
def test_append_text():
cases = ["", ".gz"]
if bz2 and sys.version_info > (3,):
# BZ2 does NOT support append in Py 2.
cases.append(".bz2")
if lzma:
cases.append(".xz")
for ext in cases: # BZ2 does NOT support append
@pytest.mark.parametrize("ext", append_extensions)
def test_append_text(ext):
text = "AB"
reference = text + text
with temporary_path('truncated.fastq' + ext) as path:
......@@ -210,21 +163,21 @@ class timeout:
if sys.version_info[:2] != (3, 3):
@raises(EOFError, IOError)
def test_truncated_gz():
with temporary_path('truncated.gz') as path:
create_truncated_file(path)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
f = xopen(path, 'r')
f.read()
f.close()
@raises(EOFError, IOError)
def test_truncated_gz_iter():
with temporary_path('truncated.gz') as path:
create_truncated_file(path)
with timeout(seconds=2):
with pytest.raises((EOFError, IOError)):
f = xopen(path, 'r')
for line in f:
pass
......@@ -239,3 +192,60 @@ def test_bare_read_from_gz():
def test_read_piped_gzip():
with PipedGzipReader('tests/hello.gz', 'rt') as f:
assert f.read() == 'hello'
def test_write_pigz_threads(tmpdir):
path = str(tmpdir.join('out.gz'))
with xopen(path, mode='w', threads=3) as f:
f.write('hello')
with xopen(path) as f:
assert f.read() == 'hello'
def test_write_stdout():
f = xopen('-', mode='w')
print("Hello", file=f)
f.close()
# ensure stdout is not closed
print("Still there?")
def test_write_stdout_contextmanager():
# Do not close stdout
with xopen('-', mode='w') as f:
print("Hello", file=f)
# ensure stdout is not closed
print("Still there?")
if sys.version_info[:2] >= (3, 4):
# pathlib was added in Python 3.4
from pathlib import Path
@pytest.mark.parametrize("file", files)
def test_read_pathlib(file):
path = Path(file)
with xopen(path, mode='rt') as f:
assert f.read() == CONTENT
@pytest.mark.parametrize("file", files)
def test_read_pathlib_binary(file):
path = Path(file)
with xopen(path, mode='rb') as f:
assert f.read() == bytes(CONTENT, 'ascii')
@pytest.mark.parametrize("ext", extensions)
def test_write_pathlib(ext, tmpdir):
path = Path(str(tmpdir)) / ('hello.txt' + ext)
with xopen(path, mode='wt') as f:
f.write('hello')
with xopen(path, mode='rt') as f:
assert f.read() == 'hello'
@pytest.mark.parametrize("ext", extensions)
def test_write_pathlib_binary(ext, tmpdir):
path = Path(str(tmpdir)) / ('hello.txt' + ext)
with xopen(path, mode='wb') as f:
f.write(b'hello')
with xopen(path, mode='rb') as f:
assert f.read() == b'hello'
[tox]
envlist = py27,py33,py34,py35,py36
envlist = py27,py34,py35,py36,py37
[testenv]
deps = nose
commands = nosetests -P tests
deps = pytest
commands = pytest
Metadata-Version: 2.1
Name: xopen
Version: 0.5.0
Summary: Open compressed files transparently
Home-page: https://github.com/marcelm/xopen/
Author: Marcel Martin
Author-email: mail@marcelm.net
License: MIT
Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
:target: https://travis-ci.org/marcelm/xopen
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
:target: https://pypi.python.org/pypi/xopen
=====
xopen
=====
This small Python module provides an ``xopen`` function that works like the
built-in ``open`` function, but can also deal with compressed files.
Supported compression formats are gzip, bzip2 and xz. They are automatically
recognized by their file extensions ``.gz``, ``.bz2`` or ``.xz``.
The focus is on being as efficient as possible on all supported Python versions.
For example, simply using ``gzip.open`` is very slow in older Pythons, and
it is a lot faster to use a ``gzip`` subprocess. For writing to gzip files,
``xopen`` uses ``pigz`` when available.
This module was originally developed as part of the `cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
Usage
-----
Open a file for reading::
from xopen import xopen
with xopen('file.txt.xz') as f:
content = f.read()
Or without context manager::
from xopen import xopen
f = xopen('file.txt.xz')
content = f.read()
f.close()
Open a file for writing::
from xopen import xopen
with xopen('file.txt.gz', mode='w') as f:
f.write('Hello')
Credits
-------
The name ``xopen`` was taken from the C function of the same name in the
`utils.h file which is part of BWA <https://github.com/lh3/bwa/blob/83662032a2192d5712996f36069ab02db82acf67/utils.h>`_.
Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for appending to files.
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
Changes
-------
v0.5.0
~~~~~~
* By default, pigz is now only allowed to use at most four threads. This hopefully reduces
problems some users had with too many threads when opening many files at the same time.
* xopen now accepts pathlib.Path objects.
Author
------
Marcel Martin <mail@marcelm.net> (`@marcelm_ on Twitter <https://twitter.com/marcelm_>`_)
Links
-----
* `Source code <https://github.com/marcelm/xopen/>`_
* `Report an issue <https://github.com/marcelm/xopen/issues>`_
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
Provides-Extra: dev
.editorconfig
.gitignore
.travis.yml
LICENSE
README.rst
pyproject.toml
setup.cfg
setup.py
tox.ini
xopen.py
tests/file.txt
tests/file.txt.bz2
tests/file.txt.gz
tests/file.txt.xz
tests/hello.gz
tests/test_xopen.py
xopen.egg-info/PKG-INFO
xopen.egg-info/SOURCES.txt
xopen.egg-info/dependency_links.txt
xopen.egg-info/requires.txt
xopen.egg-info/top_level.txt
\ No newline at end of file
[:python_version == "2.7"]
bz2file
[dev]
pytest
......@@ -9,8 +9,14 @@ import io
import os
import time
from subprocess import Popen, PIPE
from pkg_resources import get_distribution, DistributionNotFound
__version__ = '0.3.2'
try:
__version__ = get_distribution(__name__).version
except DistributionNotFound:
# package is not installed
pass
_PY3 = sys.version_info[0] >= 3
......@@ -32,6 +38,49 @@ except ImportError:
if _PY3:
basestring = str
try:
import pathlib # Exists in Python 3.4+
except ImportError:
pathlib = None
try:
from os import fspath # Exists in Python 3.6+
except ImportError:
def fspath(path):
if hasattr(path, "__fspath__"):
return path.__fspath__()
# Python 3.4 and 3.5 do not support the file system path protocol
if pathlib is not None and isinstance(path, pathlib.Path):
return str(path)
return path
def _available_cpu_count():
"""
Number of available virtual or physical CPUs on this system
Adapted from http://stackoverflow.com/a/1006301/715090
"""
try:
return len(os.sched_getaffinity(0))
except AttributeError:
pass
import re
try:
with open('/proc/self/status') as f:
status = f.read()
m = re.search(r'(?m)^Cpus_allowed:\s*(.*)$', status)
if m:
res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
if res > 0:
return res
except IOError:
pass
try:
import multiprocessing
return multiprocessing.cpu_count()
except (ImportError, NotImplementedError):
return 1
class Closing(object):
"""
......@@ -59,30 +108,54 @@ class PipedGzipWriter(Closing):
therefore also be faster.
"""
def __init__(self, path, mode='wt'):
def __init__(self, path, mode='wt', compresslevel=6, threads=None):
"""
mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
compresslevel -- gzip compression level
threads (int) -- number of pigz threads. If this is set to None, a reasonable default is
used. At the moment, this means that the number of available CPU cores is used, capped
at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
"""
if mode not in ('w', 'wt', 'wb', 'a', 'at', 'ab'):
raise ValueError("Mode is '{0}', but it must be 'w', 'wt', 'wb', 'a', 'at' or 'ab'".format(mode))
# TODO use a context manager
self.outfile = open(path, mode)
self.devnull = open(os.devnull, mode)
self.closed = False
self.name = path
kwargs = dict(stdin=PIPE, stdout=self.outfile, stderr=self.devnull)
# Setting close_fds to True in the Popen arguments is necessary due to
# <http://bugs.python.org/issue12786>.
kwargs = dict(stdin=PIPE, stdout=self.outfile, stderr=self.devnull, close_fds=True)
# However, close_fds is not supported on Windows. See
# <https://github.com/marcelm/cutadapt/issues/315>.
if sys.platform != 'win32':
kwargs['close_fds'] = True
if 'w' in mode and compresslevel != 6:
extra_args = ['-' + str(compresslevel)]
else:
extra_args = []
pigz_args = ['pigz']
if threads is None:
threads = min(_available_cpu_count(), 4)
if threads != 0:
pigz_args += ['-p', str(threads)]
try:
self.process = Popen(['pigz'], **kwargs)
self.process = Popen(pigz_args + extra_args, **kwargs)
self.program = 'pigz'
except OSError as e:
except OSError:
# pigz not found, try regular gzip
try:
self.process = Popen(['gzip'], **kwargs)
self.process = Popen(['gzip'] + extra_args, **kwargs)
self.program = 'gzip'
except (IOError, OSError) as e:
except (IOError, OSError):
self.outfile.close()
self.devnull.close()
raise
except IOError as e:
except IOError: # TODO IOError is the same as OSError on Python 3.3
self.outfile.close()
self.devnull.close()
raise
......@@ -110,7 +183,7 @@ class PipedGzipReader(Closing):
raise ValueError("Mode is '{0}', but it must be 'r', 'rt' or 'rb'".format(mode))
self.process = Popen(['gzip', '-cd', path], stdout=PIPE, stderr=PIPE)
self.name = path
if _PY3 and not 'b' in mode:
if _PY3 and 'b' not in mode:
self._file = io.TextIOWrapper(self.process.stdout)
else:
self._file = self.process.stdout
......@@ -165,50 +238,19 @@ if bz2 is not None:
"""
def xopen(filename, mode='r', compresslevel=6):
"""
Replacement for the "open" function that can also open files that have
been compressed with gzip, bzip2 or xz. If the filename is '-', standard
output (mode 'w') or input (mode 'r') is returned. If the filename ends
with .gz, the file is opened with a pipe to the gzip program. If that
does not work, then gzip.open() is used (the gzip module is slower than
the pipe to the gzip program). If the filename ends with .bz2, it's
opened as a bz2.BZ2File. Otherwise, the regular open() is used.
mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'
Instead of 'rt', 'wt' and 'at', 'r', 'w' and 'a' can be used as
abbreviations.
In Python 2, the 't' and 'b' characters are ignored.
Append mode ('a', 'at', 'ab') is unavailable with BZ2 compression and
will raise an error.
compresslevel is the gzip compression level. It is not used for bz2 and xz.
"""
if mode in ('r', 'w', 'a'):
mode += 't'
if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
raise ValueError("mode '{0}' not supported".format(mode))
def _open_stdin_or_out(mode):
# Do not return sys.stdin or sys.stdout directly as we want the returned object
# to be closable without closing sys.stdout.
std = dict(r=sys.stdin, w=sys.stdout)[mode[0]]
if not _PY3:
mode = mode[0]
if not isinstance(filename, basestring):
raise ValueError("the filename must be a string")
# Enforce str type on Python 2
# Note that io.open is slower than regular open() on Python 2.7, but
# it appears to be the only API that has a closefd parameter.
mode = mode[0] + 'b'
return io.open(std.fileno(), mode=mode, closefd=False)
# standard input and standard output handling
if filename == '-':
if not _PY3:
return dict(r=sys.stdin, w=sys.stdout)[mode]
else:
return dict(
r=sys.stdin,
rt=sys.stdin,
rb=sys.stdin.buffer,
w=sys.stdout,
wt=sys.stdout,
wb=sys.stdout.buffer)[mode]
if filename.endswith('.bz2'):
def _open_bz2(filename, mode):
if bz2 is None:
raise ImportError("Cannot open bz2 files: The bz2 module is not available")
if _PY3:
......@@ -220,11 +262,16 @@ def xopen(filename, mode='r', compresslevel=6):
return ClosingBZ2File(filename, mode)
else:
return bz2.BZ2File(filename, mode)
elif filename.endswith('.xz'):
def _open_xz(filename, mode):
if lzma is None:
raise ImportError("Cannot open xz files: The lzma module is not available (use Python 3.3 or newer)")
raise ImportError(
"Cannot open xz files: The lzma module is not available (use Python 3.3 or newer)")
return lzma.open(filename, mode)
elif filename.endswith('.gz'):
def _open_gz(filename, mode, compresslevel, threads):
if _PY3 and 'r' in mode:
return gzip.open(filename, mode)
if sys.version_info[:2] == (2, 7):
......@@ -241,9 +288,59 @@ def xopen(filename, mode='r', compresslevel=6):
return buffered_reader(gzip.open(filename, mode))
else:
try:
return PipedGzipWriter(filename, mode)
return PipedGzipWriter(filename, mode, compresslevel, threads=threads)
except OSError:
return buffered_writer(gzip.open(filename, mode, compresslevel=compresslevel))
def xopen(filename, mode='r', compresslevel=6, threads=None):
"""
A replacement for the "open" function that can also open files that have
been compressed with gzip, bzip2 or xz. If the filename is '-', standard
output (mode 'w') or input (mode 'r') is returned.
The file type is determined based on the filename: .gz is gzip, .bz2 is bzip2 and .xz is
xz/lzma.
When writing a gzip-compressed file, the following methods are tried in order to get the
best speed: 1) using a pigz (parallel gzip) subprocess; 2) using a gzip subprocess;
3) gzip.open. A single gzip subprocess can be faster than gzip.open because it runs in a
separate process.
Uncompressed files are opened with the regular open().
mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'. Also, the 't' can be omitted,
so instead of 'rt', 'wt' and 'at', the abbreviations 'r', 'w' and 'a' can be used.
In Python 2, the 't' and 'b' characters are ignored.
Append mode ('a', 'at', 'ab') is unavailable with BZ2 compression and
will raise an error.
compresslevel is the gzip compression level. It is not used for bz2 and xz.
threads is the number of threads for pigz. If None, a reasonable default is used: the number
of available CPU cores, capped at four. Use 0 to let pigz use all available cores.
"""
if mode in ('r', 'w', 'a'):
mode += 't'
if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
raise ValueError("mode '{0}' not supported".format(mode))
if not _PY3:
mode = mode[0]
filename = fspath(filename)
if not isinstance(filename, basestring):
raise ValueError("the filename must be a string")
if compresslevel not in range(1, 10):
raise ValueError("compresslevel must be between 1 and 9")
if filename == '-':
return _open_stdin_or_out(mode)
elif filename.endswith('.bz2'):
return _open_bz2(filename, mode)
elif filename.endswith('.xz'):
return _open_xz(filename, mode)
elif filename.endswith('.gz'):
return _open_gz(filename, mode, compresslevel, threads)
else:
# Python 2.6 and 2.7 have io.open, which we could use to make the returned
# object consistent with the one returned in Python 3, but reading a file
......