kallisto (0.45.1-1) UNRELEASED; urgency=medium
kallisto (0.45.1+dfsg-1) UNRELEASED; urgency=medium
* Initial release (Closes: #<bug>)
......
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: kallisto
Source: https://github.com/pachterlab/kallisto/releases
Files-Excluded: */htslib
Files: *
Copyright: 2016-2017 Nicolas Bray, Harold Pimentel, Páll Melsted and Lior Pachter
......
version=4
opts="repacksuffix=+dfsg,dversionmangle=auto,repack,compression=xz" \
https://github.com/pachterlab/kallisto/tags .*/v?@ANY_VERSION@@ARCHIVE_EXT@
Building and Installing HTSlib
==============================
Requirements
============
Building HTSlib requires a few programs and libraries to be present.
At least the following are required:
GNU make
C compiler (e.g. gcc or clang)
In addition, building the configure script requires:
autoheader
autoconf
Running the configure script uses awk, along with a number of
standard UNIX tools (cat, cp, grep, mv, rm, sed, among others). Almost
all installations will have these already.
Running the test harness (make test) uses:
bash
perl
HTSlib uses the following external libraries. Building requires both the
library itself, and include files needed to compile code that uses functions
from the library. Note that some Linux distributions put include files in
a development ('-dev' or '-devel') package separate from the main library.
libz (required)
libbz2 (required, unless configured with --disable-bz2)
liblzma (required, unless configured with --disable-lzma)
libcurl (optional, but strongly recommended)
libcrypto (optional for Amazon S3 support; not needed on MacOS)
Disabling libbzip2 and liblzma will make some CRAM files unreadable, so
is not recommended.
Using libcurl provides HTSlib with better network protocol support, for
example it enables the use of https:// URLs. It is also required if
direct access to Amazon S3 or Google Cloud Storage is enabled.
Amazon S3 support requires an HMAC function to calculate a message
authentication code. On MacOS, the CCHmac function from the standard
library is used. Systems that do not have CCHmac will get this from
libcrypto. libcrypto is part of OpenSSL or one of its derivatives (LibreSSL
or BoringSSL).
Building Configure
==================
This step is only needed if configure.ac has been changed, or if configure
does not exist (for example, when building from a git clone). The
configure script and config.h.in can be built by running:
autoheader
autoconf
If you have a full GNU autotools install, you can alternatively run:
autoreconf
Basic Installation
==================
To build and install HTSlib, 'cd' to the htslib-1.x directory containing
the package's source and type the following commands:
./configure
make
make install
The './configure' command checks your build environment and allows various
optional functionality to be enabled (see Configuration below). If you
don't want to select any optional functionality, you may wish to omit
configure and just type 'make; make install' as for previous versions
of HTSlib. However if the build fails you should run './configure' as
it can diagnose the common reasons for build failures.
The 'make' command builds the HTSlib library and various useful
utilities: bgzip, htsfile, and tabix. If compilation fails you should
run './configure' as it can diagnose problems with your build environment
that cause build failures.
The 'make install' command installs the libraries, library header files,
utilities, several manual pages, and a pkgconfig file to /usr/local.
The installation location can be changed by configuring with --prefix=DIR
or via 'make prefix=DIR install' (see Installation Locations below).
Configuration
=============
By default, './configure' examines your build environment, checking for
requirements such as the zlib development files, and arranges for a plain
HTSlib build. The following configure options can be used to enable
various features and specify further optional external requirements:
--enable-plugins
Use plugins to implement exotic file access protocols and other
specialised facilities. This enables such facilities to be developed
and packaged outwith HTSlib, and somewhat isolates HTSlib-using programs
from their library dependencies. By default (or with --disable-plugins),
any enabled pluggable facilities (such as libcurl file access) are built
directly within HTSlib.
The <https://github.com/samtools/htslib-plugins> repository contains
several additional plugins, including the iRODS (<http://irods.org/>)
file access plugin previously distributed with HTSlib.
--with-plugin-dir=DIR
Specifies the directory into which plugins built while building HTSlib
should be installed; by default, LIBEXECDIR/htslib.
--with-plugin-path=DIR:DIR:DIR...
Specifies the list of directories that HTSlib will search for plugins.
By default, only the directory specified via --with-plugin-dir will be
searched; you can use --with-plugin-path='DIR:$(plugindir):DIR' and so
on to cause additional directories to be searched.
--enable-libcurl
Use libcurl (<http://curl.haxx.se/>) to implement network access to
remote files via FTP, HTTP, HTTPS, etc. By default, HTSlib uses its
own simple networking code to provide access via FTP and HTTP only.
--enable-gcs
Implement network access to Google Cloud Storage. By default or with
--enable-gcs=check, this is enabled when libcurl is enabled.
--enable-s3
Implement network access to Amazon AWS S3. By default or with
--enable-s3=check, this is enabled when libcurl is enabled.
--disable-bz2
Bzip2 is an optional compression codec format for CRAM, included
in HTSlib by default. It can be disabled with --disable-bz2, but
be aware that not all CRAM files may be possible to decode.
--disable-lzma
LZMA is an optional compression codec for CRAM, included in HTSlib
by default. It can be disabled with --disable-lzma, but be aware
that not all CRAM files may be possible to decode.
The configure script also accepts the usual options and environment variables
for tuning installation locations and compilers: type './configure --help'
for details. For example,
./configure CC=icc --prefix=/opt/icc-compiled
would specify that HTSlib is to be built with icc and installed into bin,
lib, etc subdirectories under /opt/icc-compiled.
Installation Locations
======================
By default, 'make install' installs HTSlib libraries under /usr/local/lib,
HTSlib header files under /usr/local/include, utility programs under
/usr/local/bin, etc. (To be precise, the header files are installed within
a fixed 'htslib' subdirectory under the specified .../include location.)
You can specify a different location to install HTSlib by configuring
with --prefix=DIR or specify locations for particular parts of HTSlib by
configuring with --libdir=DIR and so on. Type './configure --help' for
the full list of such install directory options.
Alternatively you can specify different locations at install time by
typing 'make prefix=DIR install' or 'make libdir=DIR install' and so on.
Consult the list of prefix/exec_prefix/etc variables near the top of the
Makefile for the full list of such variables that can be overridden.
You can also specify a staging area by typing 'make DESTDIR=DIR install',
possibly in conjunction with other --prefix or prefix=DIR settings.
For example,
make DESTDIR=/tmp/staging prefix=/opt
would install into bin, lib, etc subdirectories under /tmp/staging/opt.
[Files in this distribution outwith the cram/ subdirectory are distributed
according to the terms of the following MIT/Expat license.]
The MIT/Expat License
Copyright (C) 2012-2014 Genome Research Ltd.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
[Files within the cram/ subdirectory in this distribution are distributed
according to the terms of the following Modified 3-Clause BSD license.]
The Modified-BSD License
Copyright (C) 2012-2014 Genome Research Ltd.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the names Genome Research Ltd and Wellcome Trust Sanger Institute
nor the names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY GENOME RESEARCH LTD AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL GENOME RESEARCH LTD OR ITS CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
[The use of a range of years within a copyright notice in this distribution
should be interpreted as being equivalent to a list of years including the
first and last year specified and all consecutive years between them.
For example, a copyright notice that reads "Copyright (C) 2005, 2007-2009,
2011-2012" should be interpreted as being identical to a notice that reads
"Copyright (C) 2005, 2007, 2008, 2009, 2011, 2012" and a copyright notice
that reads "Copyright (C) 2005-2012" should be interpreted as being identical
to a notice that reads "Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010,
2011, 2012".]
Noteworthy changes in release 1.4.1 (8th May 2017)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is primarily a security bug fix update.
* Fixed SECURITY issue with buffer overruns with malicious data. (#514).
* S3 support for non-Amazon AWS endpoints. (#506)
* Support for variant breakpoints in bcftools. (#516)
* Improved handling of BCF NaNs. (#485)
* Compilation / portability improvements. (#255, #423, #498, #488)
* Miscellaneous bug fixes (#482, #521, #522, #523, #524).
* Sanitise headers. (#509)
Release 1.4 (13 March 2017)
* Incompatible changes: several functions and data types have been changed
in this release, and the shared library soversion has been bumped to 2.
- bam_pileup1_t has an additional field (which holds user data)
- bam1_core_t has been modified to allow for >64K CIGAR operations
and (along with bam1_t) so that CIGAR entries are aligned in memory
- hopen() has vararg arguments for setting URL scheme-dependent options
- the various tbx_conf_* presets are now const
- auxiliary fields in bam1_t are now always stored in little-endian byte
order (previously this depended on if you read a bam, sam or cram file)
- index metadata (accessible via hts_idx_get_meta()) is now always
stored in little-endian byte order (previously this depended on if
the index was in tbi or csi format)
- bam_aux2i() now returns an int64_t value
- fai_load() will no longer save local copies of remote fasta indexes
- hts_idx_get_meta() now takes a uint32_t * for l_meta (was int32_t *)
* HTSlib now links against libbz2 and liblzma by default. To remove these
dependencies, run configure with options --disable-bz2 and --disable-lzma,
but note that this may make some CRAM files produced elsewhere unreadable.
* Added a thread pool interface and replaced the bgzf multi-threading
code to use this pool. BAM and CRAM decoding is now multi-threaded
too, using the pool to automatically balance the number of threads
between decode, encode and any data processing jobs.
* New errmod_cal(), probaln_glocal(), sam_cap_mapq(), and sam_prob_realn()
functions, previously internal to SAMtools, have been added to HTSlib.
* Files can now be accessed via Google Cloud Storage using gs: URLs, when
HTSlib is configured to use libcurl for network file access rather than
the included basic knetfile networking.
* S3 file access now also supports the "host_base" setting in the
$HOME/.s3cfg configuration file.
* Data URLs ("data:,text") now follow the standard RFC 2397 format and may
be base64-encoded (when written as "data:;base64,text") or may include
percent-encoded characters. HTSlib's previous over-simplified "data:text"
format is no longer supported -- you will need to add an initial comma.
* When plugins are enabled, S3 support is now provided by a separate
hfile_s3 plugin rather than by hfile_libcurl itself as previously.
When --enable-libcurl is used, by default both GCS and S3 support
and plugins will also be built; they can be individually disabled
via --disable-gcs and --disable-s3.
* The iRODS file access plugin has been moved to a separate repository.
Configure no longer has a --with-irods option; instead build the plugin
found at <https://github.com/samtools/htslib-plugins>.
* APIs to portably read and write (possibly unaligned) data in little-endian
byte order have been added.
* New functions bam_auxB_len(), bam_auxB2i() and bam_auxB2f() have been
added to make accessing array-type auxiliary data easier. bam_aux2i()
can now return the full range of values that can be stored in an integer
tag (including unsigned 32 bit tags). bam_aux2f() will return the value
of integer tags (as a double) as well as floating-point ones. All of
the bam_aux2 and bam_auxB2 functions will set errno if the requested
conversion is not valid.
* New functions fai_load3() and fai_build3() allow fasta indexes to be
stored in a different location to the indexed fasta file.
* New functions bgzf_index_dump_hfile() and bgzf_index_load_hfile()
allow bgzf index files (.gzi) to be written to / read from an existing
hFILE handle.
* hts_idx_push() will report when trying to add a range to an index that
is beyond the limits that the given index can handle. This means trying
to index chromosomes longer than 2^29 bases with a .bai or .tbi index
will report an error instead of apparently working but creating an invalid
index entry.
* VCF formatting is now approximately 4x faster. (Whether this is
noticeable depends on what was creating the VCF.)
* CRAM lossy_names mode now works with TLEN of 0 or TLEN within +/- 1
of the computed value. Note in these situations TLEN will be
generated / fixed during CRAM decode.
* CRAM now supports bzip2 and lzma codecs. Within htslib these are
disabled by default, but can be enabled by specifying "use_bzip2" or
"use_lzma" in an hts_opt_add() call or via the mode string of the
hts_open_format() function.
Noteworthy changes in release 1.3.2 (13 September 2016)
* Corrected bin calculation when converting directly from CRAM to BAM.
Previously a small fraction of converted reads would fail Picard's
validation with "bin field of BAM record does not equal value computed"
(SAMtools issue #574).
* Plugins can now signal to HTSlib which of RTLD_LOCAL and RTLD_GLOBAL
they wish to be opened with -- previously they were always RTLD_LOCAL.
Noteworthy changes in release 1.3.1 (22 April 2016)
* Improved error checking and reporting, especially of I/O errors when
writing output files (#17, #315, PR #271, PR #317).
* Build fixes for 32-bit systems; be sure to run configure to enable
large file support and access to 2GiB+ files.
* Numerous VCF parsing fixes (#321, #322, #323, #324, #325; PR #370).
Particular thanks to Kostya Kortchinsky of the Google Security Team
for testing and numerous input parsing bug reports.
* HTSlib now prints an informational message when initially creating a
CRAM reference cache in the default location under your $HOME directory.
(No message is printed if you are using $REF_CACHE to specify a location.)
* Avoided rare race condition when caching downloaded CRAM reference sequence
files, by using distinctive names for temporary files (in addition to O_EXCL,
which has always been used). Occasional corruption would previously occur
when multiple tools were simultaneously caching the same reference sequences
on an NFS filesystem that did not support O_EXCL (PR #320).
* Prevented race condition in file access plugin loading (PR #341).
* Fixed mpileup memory leak, so no more "[bam_plp_destroy] memory leak [...]
Continue anyway" warning messages (#299).
* Various minor CRAM fixes.
* Fixed documentation problems #348 and #358.
Noteworthy changes in release 1.3 (15 December 2015)
* Files can now be accessed via HTTPS and Amazon S3 in addition to HTTP
and FTP, when HTSlib is configured to use libcurl for network file access
rather than the included basic knetfile networking.
* HTSlib can be built to use remote access hFILE backends (such as iRODS
and libcurl) via a plugin mechanism. This allows other backends to be
easily added and facilitates building tools that use HTSlib, as they
don't need to be linked with the backends' various required libraries.
* When writing CRAM output, sam_open() etc now default to writing CRAM v3.0
rather than v2.1.
* fai_build() and samtools faidx now accept initial whitespace in ">"
headers (e.g., "> chr1 description" is taken to refer to "chr1").
* tabix --only-header works again (was broken in 1.2.x; #249).
* HTSlib's configure script and Makefile now fully support the standard
convention of allowing CC/CPPFLAGS/CFLAGS/LDFLAGS/LIBS to be overridden
as needed. Previously the Makefile listened to $(LDLIBS) instead; if you
were overriding that, you should now override LIBS rather than LDLIBS.
* Fixed bugs #168, #172, #176, #197, #206, #225, #245, #265, #295, and #296.
Noteworthy changes in release 1.2.1 (3 February 2015)
* Reinstated hts_file_type() and FT_* macros, which were available until 1.1
but briefly removed in 1.2. This function is deprecated and will be removed
in a future release -- you should use hts_detect_format() etc instead.
Noteworthy changes in release 1.2 (2 February 2015)
* HTSlib now has a configure script which checks your build environment
and allows for selection of optional extras. See INSTALL for details
* By default, reference sequences are fetched from the EBI CRAM Reference
Registry and cached in your $HOME cache directory. This behaviour can
be controlled by setting REF_PATH and REF_CACHE environment variables
(see the samtools(1) man page for details)
* Numerous CRAM improvements:
- Support for CRAM v3.0, an upcoming revision to CRAM supporting
better compression and per-container checksums
- EOF checking for v2.1 and v3.0 (similar to checking BAM EOF blocks)
- Non-standard values for PNEXT and TLEN fields are now preserved
- hts_set_fai_filename() now provides a reference file when encoding
- Generated read names are now numbered from 1, rather than being
labelled 'slice:record-in-slice'
- Multi-threading and speed improvements
* New htsfile command for identifying file formats, and corresponding
file format detection APIs
* New tabix --regions FILE, --targets FILE options for filtering via BED files
* Optional iRODS file access, disabled by default. Configure with --with-irods
to enable accessing iRODS data objects directly via 'irods:DATAOBJ'
* All occurrences of 2^29 in the source have been eliminated, so indexing
and querying against reference sequences larger than 512Mbp works (when
using CSI indices)
* Support for plain GZIP compression in various places
* VCF header editing speed improvements
* Added seq_nt16_int[] (equivalent to the samtools API's bam_nt16_nt4_table)
* Reinstated faidx_fetch_nseq(), which was accidentally removed from 1.1.
Now faidx_fetch_nseq() and faidx_nseq() are equivalent; eventually
faidx_fetch_nseq() will be deprecated and removed [#156]
* Fixed bugs #141, #152, #155, #158, #159, and various memory leaks
HTSlib is an implementation of a unified C library for accessing common file
formats, such as SAM, CRAM, VCF, and BCF, used for high-throughput sequencing
data. It is the core library used by samtools and bcftools.
See INSTALL for building and installation instructions.
/*
Copyright (C) 2017 Genome Research Ltd.
Author: Petr Danecek <pd3@sanger.ac.uk>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/
/*
    Reorder duplicate lines so that compatible variant types are
    returned together by bcf_sr_next_line():
    - readers are grouped by variants; even with many readers there
      will typically be only a few groups
*/
#ifndef __BCF_SR_SORT_H__
#define __BCF_SR_SORT_H__
#include "htslib/synced_bcf_reader.h"
#include "htslib/kbitset.h"
typedef struct
{
int nrec, mrec;
bcf1_t **rec;
}
vcf_buf_t;
typedef struct
{
char *str; // "A>C" for biallelic records or "A>C,A>CC" for multiallelic records
int type; // VCF_SNP, VCF_REF, etc.
int nalt; // number of alternate alleles in this record
int nvcf, mvcf, *vcf; // the list of readers with the same variants
bcf1_t **rec; // list of VCF records in the readers
kbitset_t *mask; // which groups contain the variant
}
var_t;
typedef struct
{
char *key; // only for debugging
int nvar, mvar, *var; // the variants and their type
int nvcf; // number of readers with the same variants
}
grp_t;
typedef struct
{
int nvar, mvar, *var; // list of compatible variants that can be output together
int cnt; // number of readers in this group
kbitset_t *mask; // which groups are populated in this set (replace with expandable bitmask)
}
varset_t;
typedef struct
{
uint8_t score[256];
int nvar, mvar;
var_t *var; // list of all variants from all readers
int nvset, mvset;
int mpmat, *pmat; // pairing matrix, i-th vset and j-th group accessible as i*ngrp+j
int ngrp, mgrp;
int mcnt, *cnt; // number of VCF covered by a varset
grp_t *grp; // list of VCF representatives, each with a unique combination of duplicate lines
varset_t *vset; // list of variant sets - combinations of compatible variants across multiple groups ready for output
vcf_buf_t *vcf_buf; // records sorted in output order, for each VCF
bcf_srs_t *sr;
void *grp_str2int;
void *var_str2int;
kstring_t str;
int moff, noff, *off, mcharp;
char **charp;
const char *chr;
int pos, nsr, msr;
int pair;
}
sr_sort_t;
sr_sort_t *bcf_sr_sort_init(sr_sort_t *srt);
int bcf_sr_sort_next(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, int pos);
void bcf_sr_sort_destroy(sr_sort_t *srt);
void bcf_sr_sort_remove_reader(bcf_srs_t *readers, sr_sort_t *srt, int i);
#endif
/* bgzip.c -- Block compression/decompression utility.
Copyright (C) 2008, 2009 Broad Institute / Massachusetts Institute of Technology
Copyright (C) 2010, 2013-2017 Genome Research Ltd.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notices and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/
#include <config.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdarg.h>
#include <getopt.h>
#include <sys/stat.h>
#include "htslib/bgzf.h"
#include "htslib/hts.h"
static const int WINDOW_SIZE = 64 * 1024;
static void error(const char *format, ...)
{
va_list ap;
va_start(ap, format);
vfprintf(stderr, format, ap);
va_end(ap);
exit(EXIT_FAILURE);
}
static int confirm_overwrite(const char *fn)
{
int save_errno = errno;
int ret = 0;
if (isatty(STDIN_FILENO)) {
char c;
fprintf(stderr, "[bgzip] %s already exists; do you wish to overwrite (y or n)? ", fn);
if (scanf("%c", &c) == 1 && (c == 'Y' || c == 'y')) ret = 1;
}
errno = save_errno;
return ret;
}
static int bgzip_main_usage(void)
{
fprintf(stderr, "\n");
fprintf(stderr, "Version: %s\n", hts_version());
fprintf(stderr, "Usage: bgzip [OPTIONS] [FILE] ...\n");
fprintf(stderr, "Options:\n");
fprintf(stderr, " -b, --offset INT decompress at virtual file pointer (0-based uncompressed offset)\n");
fprintf(stderr, " -c, --stdout write on standard output, keep original files unchanged\n");
fprintf(stderr, " -d, --decompress decompress\n");
fprintf(stderr, " -f, --force overwrite files without asking\n");
fprintf(stderr, " -h, --help give this help\n");
fprintf(stderr, " -i, --index compress and create BGZF index\n");
fprintf(stderr, " -I, --index-name FILE name of BGZF index file [file.gz.gzi]\n");
fprintf(stderr, " -r, --reindex (re)index compressed file\n");
fprintf(stderr, " -g, --rebgzip use an index file to bgzip a file\n");
fprintf(stderr, " -s, --size INT decompress INT bytes (uncompressed size)\n");
fprintf(stderr, " -@, --threads INT number of compression threads to use [1]\n");
fprintf(stderr, "\n");
return 1;
}
int main(int argc, char **argv)
{
int c, compress, pstdout, is_forced, index = 0, rebgzip = 0, reindex = 0;
BGZF *fp;
void *buffer;
long start, end, size;
char *index_fname = NULL;
int threads = 1;
static const struct option loptions[] =
{
{"help", no_argument, NULL, 'h'},
{"offset", required_argument, NULL, 'b'},
{"stdout", no_argument, NULL, 'c'},
{"decompress", no_argument, NULL, 'd'},
{"force", no_argument, NULL, 'f'},
{"index", no_argument, NULL, 'i'},
{"index-name", required_argument, NULL, 'I'},
{"reindex", no_argument, NULL, 'r'},
{"rebgzip",no_argument,NULL,'g'},
{"size", required_argument, NULL, 's'},
{"threads", required_argument, NULL, '@'},
{"version", no_argument, NULL, 1},
{NULL, 0, NULL, 0}
};
compress = 1; pstdout = 0; start = 0; size = -1; end = -1; is_forced = 0;
while((c = getopt_long(argc, argv, "cdh?fb:@:s:iI:gr",loptions,NULL)) >= 0){
switch(c){
case 'd': compress = 0; break;
case 'c': pstdout = 1; break;
case 'b': start = atol(optarg); compress = 0; pstdout = 1; break;
case 's': size = atol(optarg); pstdout = 1; break;
case 'f': is_forced = 1; break;
case 'i': index = 1; break;
case 'I': index_fname = optarg; break;
case 'g': rebgzip = 1; break;
case 'r': reindex = 1; compress = 0; break;
case '@': threads = atoi(optarg); break;
case 1:
printf(
"bgzip (htslib) %s\n"
"Copyright (C) 2017 Genome Research Ltd.\n", hts_version());
return EXIT_SUCCESS;
case 'h':
case '?': return bgzip_main_usage();
}
}
if (size >= 0) end = start + size;
if (end >= 0 && end < start) {
fprintf(stderr, "[bgzip] Illegal region: [%ld, %ld]\n", start, end);
return 1;
}
if (compress == 1) {
struct stat sbuf;
int f_src = fileno(stdin);
if ( argc>optind )
{
if ( stat(argv[optind],&sbuf)<0 )
{
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
}
if ((f_src = open(argv[optind], O_RDONLY)) < 0) {
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
}
if (pstdout)
fp = bgzf_open("-", "w");
else
{
char *name = malloc(strlen(argv[optind]) + 5);
strcpy(name, argv[optind]);
strcat(name, ".gz");
fp = bgzf_open(name, is_forced? "w" : "wx");
if (fp == NULL && errno == EEXIST && confirm_overwrite(name))
fp = bgzf_open(name, "w");
if (fp == NULL) {
fprintf(stderr, "[bgzip] can't create %s: %s\n", name, strerror(errno));
free(name);
return 1;
}
free(name);
}
}
else if (!pstdout && isatty(fileno((FILE *)stdout)) )
return bgzip_main_usage();
else if ( index && !index_fname )
{
fprintf(stderr, "[bgzip] Index file name expected when writing to stdout\n");
return 1;
}
else
fp = bgzf_open("-", "w");
if ( index && rebgzip )
{
fprintf(stderr, "[bgzip] Can't produce an index and rebgzip simultaneously\n");
return 1;
}
if ( rebgzip && !index_fname )
{
fprintf(stderr, "[bgzip] Index file name expected when writing to stdout\n");
return 1;
}
if (threads > 1)
bgzf_mt(fp, threads, 256);
if ( index ) bgzf_index_build_init(fp);
buffer = malloc(WINDOW_SIZE);
if (rebgzip){
if ( bgzf_index_load(fp, index_fname, NULL) < 0 ) error("Could not load index: %s.gzi\n", argv[optind]);
while ((c = read(f_src, buffer, WINDOW_SIZE)) > 0)
if (bgzf_block_write(fp, buffer, c) < 0) error("Could not write %d bytes: Error %d\n", c, fp->errcode);
}
else {
while ((c = read(f_src, buffer, WINDOW_SIZE)) > 0)
if (bgzf_write(fp, buffer, c) < 0) error("Could not write %d bytes: Error %d\n", c, fp->errcode);
}
if ( index )
{
if (index_fname) {
if (bgzf_index_dump(fp, index_fname, NULL) < 0)
error("Could not write index to '%s'\n", index_fname);
} else {
if (bgzf_index_dump(fp, argv[optind], ".gz.gzi") < 0)
error("Could not write index to '%s.gz.gzi'", argv[optind]);
}
}
if (bgzf_close(fp) < 0) error("Close failed: Error %d", fp->errcode);
if (argc > optind && !pstdout) unlink(argv[optind]);
free(buffer);
close(f_src);
return 0;
}
else if ( reindex )
{
if ( argc>optind )
{
fp = bgzf_open(argv[optind], "r");
if ( !fp ) error("[bgzip] Could not open file: %s\n", argv[optind]);
}
else
{
if ( !index_fname ) error("[bgzip] Index file name expected when reading from stdin\n");
fp = bgzf_open("-", "r");
if ( !fp ) error("[bgzip] Could not read from stdin: %s\n", strerror(errno));
}
buffer = malloc(BGZF_BLOCK_SIZE);
bgzf_index_build_init(fp);
int ret;
while ( (ret=bgzf_read(fp, buffer, BGZF_BLOCK_SIZE))>0 ) ;
free(buffer);
if ( ret<0 ) error("Is the file gzipped or bgzipped? The latter is required for indexing.\n");
if ( index_fname ) {
if (bgzf_index_dump(fp, index_fname, NULL) < 0)
error("Could not write index to '%s'\n", index_fname);
} else {
if (bgzf_index_dump(fp, argv[optind], ".gzi") < 0)
error("Could not write index to '%s.gzi'\n", argv[optind]);
}
if ( bgzf_close(fp)<0 ) error("Close failed: Error %d\n",fp->errcode);
return 0;
}
else
{
struct stat sbuf;
int f_dst;
if ( argc>optind )
{
if ( stat(argv[optind],&sbuf)<0 )
{
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
}
char *name;
int len = strlen(argv[optind]);
        if ( len < 3 || strcmp(argv[optind]+len-3,".gz") )
{
fprintf(stderr, "[bgzip] %s: unknown suffix -- ignored\n", argv[optind]);
return 1;
}
fp = bgzf_open(argv[optind], "r");
if (fp == NULL) {
fprintf(stderr, "[bgzip] Could not open file: %s\n", argv[optind]);
return 1;
}
if (pstdout) {
f_dst = fileno(stdout);
}
else {
const int wrflags = O_WRONLY | O_CREAT | O_TRUNC;
name = strdup(argv[optind]);
name[strlen(name) - 3] = '\0';
f_dst = open(name, is_forced? wrflags : wrflags|O_EXCL, 0666);
if (f_dst < 0 && errno == EEXIST && confirm_overwrite(name))
f_dst = open(name, wrflags, 0666);
if (f_dst < 0) {
fprintf(stderr, "[bgzip] can't create %s: %s\n", name, strerror(errno));
free(name);
return 1;
}
free(name);
}
}
else if (!pstdout && isatty(fileno((FILE *)stdin)) )
return bgzip_main_usage();
else
{
f_dst = fileno(stdout);
fp = bgzf_open("-", "r");
if (fp == NULL) {
fprintf(stderr, "[bgzip] Could not read from stdin: %s\n", strerror(errno));
return 1;
}
}
if (threads > 1)
bgzf_mt(fp, threads, 256);
buffer = malloc(WINDOW_SIZE);
if ( start>0 )
{
if ( bgzf_index_load(fp, argv[optind], ".gzi") < 0 ) error("Could not load index: %s.gzi\n", argv[optind]);
if ( bgzf_useek(fp, start, SEEK_SET) < 0 ) error("Could not seek to %d-th (uncompressed) byte\n", start);
}
while (1) {
if (end < 0) c = bgzf_read(fp, buffer, WINDOW_SIZE);
else c = bgzf_read(fp, buffer, (end - start > WINDOW_SIZE)? WINDOW_SIZE:(end - start));
if (c == 0) break;
if (c < 0) error("Could not read %d bytes: Error %d\n", (end - start > WINDOW_SIZE)? WINDOW_SIZE:(end - start), fp->errcode);
start += c;
if ( write(f_dst, buffer, c) != c ) error("Could not write %d bytes\n", c);
if (end >= 0 && start >= end) break;
}
free(buffer);
if (bgzf_close(fp) < 0) error("Close failed: Error %d\n",fp->errcode);
if (!pstdout) unlink(argv[optind]);
return 0;
}
}
# Optional configure Makefile overrides for htslib.
#
# Copyright (C) 2015-2017 Genome Research Ltd.
#
# Author: John Marshall <jm18@sanger.ac.uk>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# This is @configure_input@
#
# If you use configure, this file overrides variables and augments rules
# in the Makefile to reflect your configuration choices. If you don't run
# configure, the main Makefile contains suitable conservative defaults.
prefix = @prefix@
exec_prefix = @exec_prefix@
bindir = @bindir@
includedir = @includedir@
libdir = @libdir@
libexecdir = @libexecdir@
datarootdir = @datarootdir@
mandir = @mandir@
CC = @CC@
RANLIB = @RANLIB@
CPPFLAGS = @CPPFLAGS@
CFLAGS = @CFLAGS@
LDFLAGS = @LDFLAGS@
LIBS = @LIBS@
PLATFORM = @PLATFORM@
PLUGIN_EXT = @PLUGIN_EXT@
# Lowercase here indicates these are "local" to config.mk
plugin_OBJS =
noplugin_LDFLAGS =
noplugin_LIBS =
# ifeq/.../endif, +=, and target-specific variables are GNU Make-specific.
# If you don't have GNU Make, comment out this conditional and note that
# to enable libcurl you will need to implement the following elsewhere.
ifeq "libcurl-@libcurl@" "libcurl-enabled"
LIBCURL_LIBS = -lcurl
plugin_OBJS += hfile_libcurl.o
hfile_libcurl$(PLUGIN_EXT): LIBS += $(LIBCURL_LIBS)
noplugin_LIBS += $(LIBCURL_LIBS)
endif
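If GNU Make is unavailable, the effect of the conditional above has to be reproduced by hand. A rough sketch of what a hand-edited, libcurl-enabled, plugin-free configuration might set (illustrative only; the object list must be edited in the main Makefile since plain make has no +=):

```make
# Hand-written approximation of the "libcurl-enabled" branch, assuming
# plugins are disabled: link libcurl directly into the library.
LIBCURL_LIBS = -lcurl
# Plain make has no +=, so append hfile_libcurl.o to LIBHTS_OBJS by
# editing that list directly in the main Makefile.
LIBS = @LIBS@ -lcurl
```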
ifeq "gcs-@gcs@" "gcs-enabled"
plugin_OBJS += hfile_gcs.o
endif
ifeq "s3-@s3@" "s3-enabled"
plugin_OBJS += hfile_s3.o
CRYPTO_LIBS = @CRYPTO_LIBS@
noplugin_LIBS += $(CRYPTO_LIBS)
hfile_s3$(PLUGIN_EXT): LIBS += $(CRYPTO_LIBS)
endif
ifeq "plugins-@enable_plugins@" "plugins-yes"
plugindir = @plugindir@
pluginpath = @pluginpath@
LIBHTS_OBJS += plugin.o
PLUGIN_OBJS += $(plugin_OBJS)
plugin.o plugin.pico: CPPFLAGS += -DPLUGINPATH=\"$(pluginpath)\"
# When built as separate plugins, these record their version themselves.
hfile_gcs.o hfile_gcs.pico: version.h
hfile_libcurl.o hfile_libcurl.pico: version.h
hfile_s3.o hfile_s3.pico: version.h
# Windows DLL plugins depend on the import library, built as a byproduct.
$(plugin_OBJS:.o=.cygdll): cyghts-$(LIBHTS_SOVERSION).dll
else
LIBHTS_OBJS += $(plugin_OBJS)
LDFLAGS += $(noplugin_LDFLAGS)
LIBS += $(noplugin_LIBS)
endif
# Configure script for htslib, a C library for high-throughput sequencing data.
#
# Copyright (C) 2015-2017 Genome Research Ltd.
#
# Author: John Marshall <jm18@sanger.ac.uk>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
dnl Process this file with autoconf to produce a configure script
AC_INIT([HTSlib], m4_esyscmd_s([make print-version]),
[samtools-help@lists.sourceforge.net], [], [http://www.htslib.org/])
AC_PREREQ(2.63) dnl This version introduced 4-argument AC_CHECK_HEADER
AC_CONFIG_SRCDIR(hts.c)
AC_CONFIG_HEADERS(config.h)
dnl Copyright notice to be copied into the generated configure script
AC_COPYRIGHT([Portions copyright (C) 2016 Genome Research Ltd.
This configure script is free software: you are free to change and
redistribute it. There is NO WARRANTY, to the extent permitted by law.])
dnl Notes to be copied (by autoheader) into the generated config.h.in
AH_TOP([/* If you use configure, this file provides @%:@defines reflecting your
configuration choices. If you have not run configure, suitable
conservative defaults will be used.
Autoheader adds a number of items to this template file that are not
used by HTSlib: STDC_HEADERS and most HAVE_*_H header file defines
are immaterial, as we assume standard ISO C headers and facilities;
the PACKAGE_* defines are unused and are overridden by the more
accurate PACKAGE_VERSION as computed by the Makefile. */])
AC_PROG_CC
AC_PROG_RANLIB
dnl Avoid chicken-and-egg problem where pkg-config supplies the
dnl PKG_PROG_PKG_CONFIG macro, but we want to use it to check
dnl for pkg-config...
m4_ifdef([PKG_PROG_PKG_CONFIG], [PKG_PROG_PKG_CONFIG], [PKG_CONFIG=""])
need_crypto=no
pc_requires=
static_LDFLAGS=
static_LIBS='-lz -lm'
private_LIBS=
AC_ARG_ENABLE([bz2],
[AS_HELP_STRING([--disable-bz2],
[omit support for BZ2-compressed CRAM files])],
[], [enable_bz2=yes])
AC_ARG_ENABLE([gcs],
[AS_HELP_STRING([--enable-gcs],
[support Google Cloud Storage URLs])],
[], [enable_gcs=check])
AC_SYS_LARGEFILE
AC_ARG_ENABLE([libcurl],
[AS_HELP_STRING([--enable-libcurl],
[enable libcurl-based support for http/https/etc URLs])],
[], [enable_libcurl=check])
AC_ARG_ENABLE([lzma],
[AS_HELP_STRING([--disable-lzma],
[omit support for LZMA-compressed CRAM files])],
[], [enable_lzma=yes])
AC_ARG_ENABLE([plugins],
[AS_HELP_STRING([--enable-plugins],
[enable separately-compiled plugins for file access])],
[], [enable_plugins=no])
AC_SUBST(enable_plugins)
AC_ARG_WITH([plugin-dir],
[AS_HELP_STRING([--with-plugin-dir=DIR],
[plugin installation location [LIBEXECDIR/htslib]])],
[case $withval in
yes|no) AC_MSG_ERROR([no directory specified for --with-plugin-dir]) ;;
esac],
[with_plugin_dir='$(libexecdir)/htslib'])
AC_SUBST([plugindir], $with_plugin_dir)
AC_ARG_WITH([plugin-path],
[AS_HELP_STRING([--with-plugin-path=PATH],
[default HTS_PATH plugin search path [PLUGINDIR]])],
[case $withval in
yes) AC_MSG_ERROR([no path specified for --with-plugin-path]) ;;
no) with_plugin_path= ;;
esac],
[with_plugin_path=$with_plugin_dir])
AC_SUBST([pluginpath], $with_plugin_path)
AC_ARG_ENABLE([s3],
[AS_HELP_STRING([--enable-s3],
[support Amazon AWS S3 URLs])],
[], [enable_s3=check])
AC_MSG_CHECKING([shared library type])
test -n "$host_alias" || host_alias=unknown-`uname -s`
case $host_alias in
*-cygwin* | *-CYGWIN*)
host_result="Cygwin DLL"
PLATFORM=CYGWIN
PLUGIN_EXT=.cygdll
;;
*-darwin* | *-Darwin*)
host_result="Darwin dylib"
PLATFORM=Darwin
PLUGIN_EXT=.bundle
;;
*)
host_result="plain .so"
PLATFORM=default
PLUGIN_EXT=.so
;;
esac
AC_MSG_RESULT([$host_result])
AC_SUBST([PLATFORM])
dnl FIXME This pulls in dozens of standard header checks
AC_FUNC_MMAP
AC_CHECK_FUNCS(gmtime_r)
# Darwin has a dubious fdatasync() symbol, but no declaration in <unistd.h>
AC_CHECK_DECL([fdatasync(int)], [AC_CHECK_FUNCS(fdatasync)])
if test $enable_plugins != no; then
AC_SEARCH_LIBS([dlopen], [dl], [],
[AC_MSG_ERROR([dlopen() not found
Plugin support requires dynamic linking facilities from the operating system.
Either configure with --disable-plugins or resolve this error to build HTSlib.])])
# TODO Test whether this is required and/or needs tweaking per-platform
LDFLAGS="$LDFLAGS -rdynamic"
static_LDFLAGS="$static_LDFLAGS -rdynamic"
case "$ac_cv_search_dlopen" in
-l*) static_LIBS="$static_LIBS $ac_cv_search_dlopen" ;;
esac
AC_DEFINE([ENABLE_PLUGINS], 1, [Define if HTSlib should enable plugins.])
AC_SUBST([PLUGIN_EXT])
AC_DEFINE_UNQUOTED([PLUGIN_EXT], ["$PLUGIN_EXT"],
[Platform-dependent plugin filename extension.])
fi
AC_SEARCH_LIBS([log], [m], [],
[AC_MSG_ERROR([log() not found
HTSlib requires a working floating-point math library.
FAILED. This error must be resolved in order to build HTSlib successfully.])])
zlib_devel=ok
dnl Set a trivial non-empty INCLUDES to avoid excess default includes tests
AC_CHECK_HEADER([zlib.h], [], [zlib_devel=missing], [;])
AC_CHECK_LIB(z, inflate, [], [zlib_devel=missing])
if test $zlib_devel != ok; then
AC_MSG_ERROR([zlib development files not found
HTSlib uses compression routines from the zlib library <http://zlib.net>.
Building HTSlib requires zlib development files to be installed on the build
machine; you may need to ensure a package such as zlib1g-dev (on Debian or
Ubuntu Linux) or zlib-devel (on RPM-based Linux distributions or Cygwin)
is installed.
FAILED. This error must be resolved in order to build HTSlib successfully.])
fi
dnl connect() etc. fns are in libc on linux, but libsocket on illumos/Solaris
libsocket=unneeded
AC_SEARCH_LIBS(connect, socket, [libsocket=needed], [])
if test "$enable_bz2" != no; then
bz2_devel=ok
AC_CHECK_HEADER([bzlib.h], [], [bz2_devel=missing], [;])
AC_CHECK_LIB([bz2], [BZ2_bzBuffToBuffCompress], [], [bz2_devel=missing])
if test $bz2_devel != ok; then
AC_MSG_ERROR([libbzip2 development files not found
The CRAM format may use bzip2 compression, which is implemented in HTSlib
by using compression routines from libbzip2 <http://www.bzip.org/>.
Building HTSlib requires libbzip2 development files to be installed on the
build machine; you may need to ensure a package such as libbz2-dev (on Debian
or Ubuntu Linux) or bzip2-devel (on RPM-based Linux distributions or Cygwin)
is installed.
Either configure with --disable-bz2 (which will make some CRAM files
produced elsewhere unreadable) or resolve this error to build HTSlib.])
fi
dnl Unfortunately the 'bzip2' package-cfg module is not standard.
dnl Redhat/Fedora has it; Debian/Ubuntu does not.
if test -n "$PKG_CONFIG" && "$PKG_CONFIG" --exists bzip2; then
pc_requires="$pc_requires bzip2"
else
private_LIBS="$private_LIBS -lbz2"
fi
static_LIBS="$static_LIBS -lbz2"
fi
if test "$enable_lzma" != no; then
lzma_devel=ok
AC_CHECK_HEADER([lzma.h], [], [lzma_devel=missing], [;])
AC_CHECK_LIB([lzma], [lzma_easy_buffer_encode], [], [lzma_devel=missing])
if test $lzma_devel != ok; then
AC_MSG_ERROR([liblzma development files not found
The CRAM format may use LZMA2 compression, which is implemented in HTSlib
by using compression routines from liblzma <http://tukaani.org/xz/>.
Building HTSlib requires liblzma development files to be installed on the
build machine; you may need to ensure a package such as liblzma-dev (on Debian
or Ubuntu Linux), xz-devel (on RPM-based Linux distributions or Cygwin), or
xz (via Homebrew on macOS) is installed; or build XZ Utils from source.
Either configure with --disable-lzma (which will make some CRAM files
produced elsewhere unreadable) or resolve this error to build HTSlib.])
fi
pc_requires="$pc_requires liblzma"
static_LIBS="$static_LIBS -llzma"
fi
libcurl=disabled
if test "$enable_libcurl" != no; then
AC_CHECK_LIB([curl], [curl_easy_pause],
[AC_DEFINE([HAVE_LIBCURL], 1, [Define if libcurl file access is enabled.])
libcurl=enabled],
[AC_CHECK_LIB([curl], [curl_easy_init],
[message="library is too old (7.18+ required)"],
[message="library not found"])
case "$enable_libcurl" in
check) AC_MSG_WARN([libcurl not enabled: $message]) ;;
*) AC_MSG_ERROR([libcurl $message
Support for HTTPS and other SSL-based URLs requires routines from the libcurl
library <http://curl.haxx.se/libcurl/>. Building HTSlib with libcurl enabled
requires libcurl development files to be installed on the build machine; you
may need to ensure a package such as libcurl4-{gnutls,nss,openssl}-dev (on
Debian or Ubuntu Linux) or libcurl-devel (on RPM-based Linux distributions
or Cygwin) is installed.
Either configure with --disable-libcurl or resolve this error to build HTSlib.])
;;
esac])
dnl -lcurl is only needed for static linking if hfile_libcurl is not a plugin
if test "$libcurl" = enabled ; then
if test "$enable_plugins" != yes ; then
static_LIBS="$static_LIBS -lcurl"
fi
fi
fi
AC_SUBST([libcurl])
gcs=disabled
if test "$enable_gcs" != no; then
if test $libcurl = enabled; then
AC_DEFINE([ENABLE_GCS], 1, [Define if HTSlib should enable GCS support.])
gcs=enabled
else
case "$enable_gcs" in
check) AC_MSG_WARN([GCS support not enabled: requires libcurl support]) ;;
*) AC_MSG_ERROR([GCS support not enabled
Support for Google Cloud Storage URLs requires libcurl support to be enabled
in HTSlib. Configure with --enable-libcurl in order to use GCS URLs.])
;;
esac
fi
fi
AC_SUBST([gcs])
s3=disabled
if test "$enable_s3" != no; then
if test $libcurl = enabled; then
s3=enabled
need_crypto="$enable_s3"
else
case "$enable_s3" in
check) AC_MSG_WARN([S3 support not enabled: requires libcurl support]) ;;
*) AC_MSG_ERROR([S3 support not enabled
Support for Amazon AWS S3 URLs requires libcurl support to be enabled
in HTSlib. Configure with --enable-libcurl in order to use S3 URLs.])
;;
esac
fi
fi
CRYPTO_LIBS=
if test $need_crypto != no; then
AC_CHECK_FUNC([CCHmac],
[AC_DEFINE([HAVE_COMMONCRYPTO], 1,
[Define if you have the Common Crypto library.])],
[save_LIBS=$LIBS
AC_SEARCH_LIBS([HMAC], [crypto],
[AC_DEFINE([HAVE_HMAC], 1, [Define if you have libcrypto-style HMAC().])
case "$ac_cv_search_HMAC" in
-l*) CRYPTO_LIBS=$ac_cv_search_HMAC ;;
esac],
[case "$need_crypto" in
check) AC_MSG_WARN([S3 support not enabled: requires SSL development files])
s3=disabled ;;
*) AC_MSG_ERROR([SSL development files not found
Support for AWS S3 URLs requires routines from an SSL library. Building
HTSlib with libcurl enabled requires SSL development files to be installed
on the build machine; you may need to ensure a package such as libgnutls-dev,
libnss3-dev, or libssl-dev (on Debian or Ubuntu Linux, corresponding to the
libcurl4-*-dev package installed), or openssl-devel (on RPM-based Linux
distributions or Cygwin) is installed.
Either configure with --disable-s3 or resolve this error to build HTSlib.]) ;;
esac])
LIBS=$save_LIBS])
dnl Only need to add to static_LIBS if not building as a plugin
if test "$enable_plugins" != yes ; then
static_LIBS="$static_LIBS $CRYPTO_LIBS"
fi
fi
if test "$s3" = enabled ; then
AC_DEFINE([ENABLE_S3], 1, [Define if HTSlib should enable S3 support.])
fi
AC_SUBST([s3])
AC_SUBST([CRYPTO_LIBS])
AC_SUBST([pc_requires])
AC_SUBST([private_LIBS])
AC_SUBST([static_LDFLAGS])
AC_SUBST([static_LIBS])
AC_CONFIG_FILES([config.mk htslib.pc.tmp:htslib.pc.in])
AC_OUTPUT
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the names Genome Research Ltd and Wellcome Trust Sanger
Institute nor the names of its contributors may be used to endorse or promote
products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY GENOME RESEARCH LTD AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL GENOME RESEARCH LTD OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*! \file
* CRAM interface.
*
* Consider using the higher level hts_*() API for programs that wish to
* be file format agnostic (see htslib/hts.h).
*
* This API should be used for CRAM specific code. The specifics of the
* public API are implemented in cram_io.h, cram_encode.h and cram_decode.h
* although these should not be included directly (use this file instead).
*/
#ifndef _CRAM_H_
#define _CRAM_H_
#include "cram/cram_samtools.h"
#include "cram/sam_header.h"
#include "cram_structs.h"
#include "cram_io.h"
#include "cram_encode.h"
#include "cram_decode.h"
#include "cram_stats.h"
#include "cram_codecs.h"
#include "cram_index.h"
// Validate against the external cram.h.
//
// This contains duplicated portions from cram_io.h and cram_structs.h,
// so we want to ensure that the prototypes match.
#include "htslib/cram.h"
#endif
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the names Genome Research Ltd and Wellcome Trust Sanger
Institute nor the names of its contributors may be used to endorse or promote
products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY GENOME RESEARCH LTD AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL GENOME RESEARCH LTD OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef _CRAM_ENCODINGS_H_
#define _CRAM_ENCODINGS_H_
#include <inttypes.h>
#ifdef __cplusplus
extern "C" {
#endif
struct cram_codec;
/*
* Slow but simple huffman decoder to start with.
* Read a bit at a time, keeping track of {length, value}
 * e.g. 1 1 0 1 => {1,1}, {2,3}, {3,6}, {4,13}
*
* Keep track of this through the huffman code table.
* For fast scanning we have an index of where the first code of length X
* appears.
*/
typedef struct {
int32_t symbol;
int32_t p; // next code start value, minus index to codes[]
int32_t code;
int32_t len;
} cram_huffman_code;
typedef struct {
int ncodes;
cram_huffman_code *codes;
} cram_huffman_decoder;
#define MAX_HUFF 128
typedef struct {
cram_huffman_code *codes;
int nvals;
int val2code[MAX_HUFF+1]; // value to code lookup for small values
} cram_huffman_encoder;
typedef struct {
int32_t offset;
int32_t nbits;
} cram_beta_decoder;
typedef struct {
int32_t offset;
} cram_gamma_decoder;
typedef struct {
int32_t offset;
int32_t k;
} cram_subexp_decoder;
typedef struct {
int32_t content_id;
enum cram_external_type type;
cram_block *b;
} cram_external_decoder;
typedef struct {
struct cram_codec *len_codec;
struct cram_codec *val_codec;
} cram_byte_array_len_decoder;
typedef struct {
unsigned char stop;
int32_t content_id;
cram_block *b;
} cram_byte_array_stop_decoder;
typedef struct {
enum cram_encoding len_encoding;
enum cram_encoding val_encoding;
void *len_dat;
void *val_dat;
struct cram_codec *len_codec;
struct cram_codec *val_codec;
} cram_byte_array_len_encoder;
/*
* A generic codec structure.
*/
typedef struct cram_codec {
enum cram_encoding codec;
cram_block *out;
void (*free)(struct cram_codec *codec);
int (*decode)(cram_slice *slice, struct cram_codec *codec,
cram_block *in, char *out, int *out_size);
int (*encode)(cram_slice *slice, struct cram_codec *codec,
char *in, int in_size);
int (*store)(struct cram_codec *codec, cram_block *b, char *prefix,
int version);
void (*reset)(struct cram_codec *codec); // used between slices in a container
union {
cram_huffman_decoder huffman;
cram_external_decoder external;
cram_beta_decoder beta;
cram_gamma_decoder gamma;
cram_subexp_decoder subexp;
cram_byte_array_len_decoder byte_array_len;
cram_byte_array_stop_decoder byte_array_stop;
cram_huffman_encoder e_huffman;
cram_external_decoder e_external;
cram_byte_array_stop_decoder e_byte_array_stop;
cram_byte_array_len_encoder e_byte_array_len;
cram_beta_decoder e_beta;
};
} cram_codec;
const char *cram_encoding2str(enum cram_encoding t);
cram_codec *cram_decoder_init(enum cram_encoding codec, char *data, int size,
enum cram_external_type option,
int version);
cram_codec *cram_encoder_init(enum cram_encoding codec, cram_stats *st,
enum cram_external_type option, void *dat,
int version);
//int cram_decode(void *codes, char *in, int in_size, char *out, int *out_size);
//void cram_decoder_free(void *codes);
//#define GET_BIT_MSB(b,v) (void)(v<<=1, v|=(b->data[b->byte] >> b->bit)&1, (--b->bit == -1) && (b->bit = 7, b->byte++))
#define GET_BIT_MSB(b,v) (void)(v<<=1, v|=(b->data[b->byte] >> b->bit)&1, b->byte += (--b->bit<0), b->bit&=7)
/*
 * Check that enough bits are left in a block to satisfy a bit-based decoder.
* Return 0 if there are enough
* 1 if not.
*/
static inline int cram_not_enough_bits(cram_block *blk, int nbits) {
if (nbits < 0 ||
(blk->byte >= blk->uncomp_size && nbits > 0) ||
(blk->uncomp_size - blk->byte <= INT32_MAX / 8 + 1 &&
(blk->uncomp_size - blk->byte) * 8 + blk->bit - 7 < nbits)) {
return 1;
}
return 0;
}
/*
* Returns the content_id used by this codec, also in id2 if byte_array_len.
* Returns -1 for the CORE block and -2 for unneeded.
* id2 is only filled out for BYTE_ARRAY_LEN which uses 2 codecs.
*/
int cram_codec_to_id(cram_codec *c, int *id2);
/*
* cram_codec structures are specialised for decoding or encoding.
* Unfortunately this makes turning a decoder into an encoder (such as
* when transcoding files) problematic.
*
* This function converts a cram decoder codec into an encoder version
 * in-place (i.e. it modifies the codec itself).
*
* Returns 0 on success;
* -1 on failure.
*/
int cram_codec_decoder2encoder(cram_fd *fd, cram_codec *c);
#ifdef __cplusplus
}
#endif
#endif /* _CRAM_ENCODINGS_H_ */
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the names Genome Research Ltd and Wellcome Trust Sanger
Institute nor the names of its contributors may be used to endorse or promote
products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY GENOME RESEARCH LTD AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL GENOME RESEARCH LTD OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*! \file
* Include cram.h instead.
*
* This is an internal part of the CRAM system and is automatically included
* when you #include cram.h.
*
* Implements the decoding portion of CRAM I/O. Also see
* cram_codecs.[ch] for the actual encoding functions themselves.
*/
#ifndef _CRAM_READ_H_
#define _CRAM_READ_H_
#ifdef __cplusplus
extern "C" {
#endif
/* ----------------------------------------------------------------------
* CRAM sequence iterators.
*/
/*! Read the next cram record and return it as a cram_record.
*
* Note that to decode cram_record the caller will need to look up some data
* in the current slice, pointed to by fd->ctr->slice. This is valid until
* the next call to cram_get_seq (which may invalidate it).
*
* @return
* Returns record pointer on success (do not free);
* NULL on failure
*/
cram_record *cram_get_seq(cram_fd *fd);
/*! Read the next cram record and convert it to a bam_seq_t struct.
*
* @return
* Returns 0 on success;
* -1 on EOF or failure (check fd->err)
*/
int cram_get_bam_seq(cram_fd *fd, bam_seq_t **bam);
/* ----------------------------------------------------------------------
* Internal functions
*/
/*! INTERNAL:
* Decodes a CRAM block compression header.
*
* @return
* Returns header ptr on success;
* NULL on failure
*/
cram_block_compression_hdr *cram_decode_compression_header(cram_fd *fd,
cram_block *b);
/*! INTERNAL:
* Decodes a CRAM (un)mapped slice header block.
*
* @return
* Returns slice header ptr on success;
* NULL on failure
*/
cram_block_slice_hdr *cram_decode_slice_header(cram_fd *fd, cram_block *b);
/*! INTERNAL:
* Decode an entire slice from container blocks. Fills out s->crecs[] array.
*
* @return
* Returns 0 on success;
* -1 on failure
*/
int cram_decode_slice(cram_fd *fd, cram_container *c, cram_slice *s,
SAM_hdr *hdr);
#ifdef __cplusplus
}
#endif
#endif