Commits on Source (4)
......@@ -3,7 +3,7 @@ according to the terms of the following MIT/Expat license.]
The MIT/Expat License
Copyright (C) 2012-2018 Genome Research Ltd.
Copyright (C) 2012-2019 Genome Research Ltd.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
......@@ -29,7 +29,7 @@ according to the terms of the following Modified 3-Clause BSD license.]
The Modified-BSD License
Copyright (C) 2012-2018 Genome Research Ltd.
Copyright (C) 2012-2019 Genome Research Ltd.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
......
# HTSlib 64 bit reference positions
From version 1.10 onwards, HTSlib internally uses 64 bit reference positions. This
is to support analysis of species like axolotl, tulip and marbled lungfish
which have, or are expected to have, chromosomes longer than two gigabases.
# File format support
Currently 64 bit positions can only be stored in SAM and VCF format files.
Binary BAM, CRAM and BCF cannot be used due to limitations in the formats
themselves. As SAM and VCF are text formats, they have no limit on the
size of numeric values.
# Compatibility issues to check
Various data structure members, function parameters, and return values have
been expanded from 32 to 64 bits. As a result, some changes may be needed to
code that uses the library, even if it does not support long references.
## Variadic functions taking format strings
The type of various structure members (e.g. `bam1_core_t::pos`) and return
values from some functions (e.g. `bam_cigar2rlen()`) have been changed to
`hts_pos_t`, which is a 64-bit signed integer. Using these in 32-bit
code will generally work (as long as the stored positions are within range),
however care needs to be taken when these values are passed directly
to functions like `printf()` which take a variable-length argument list and
a format string.
Header file `htslib/hts.h` defines macro `PRIhts_pos` which can be
used in `printf()` format strings to get the correct format specifier for
an `hts_pos_t` value. Code that needs to print positions should be
changed from:
```c
printf("Position is %d\n", bam->core.pos);
```
to:
```c
printf("Position is %"PRIhts_pos"\n", bam->core.pos);
```
If for some reason compatibility with older versions of HTSlib (which do
not have `hts_pos_t` or `PRIhts_pos`) is needed, the value can be cast to
`int64_t` and printed as an explicitly 64-bit value:
```c
#include <inttypes.h> // For PRId64 and int64_t
printf("Position is %" PRId64 "\n", (int64_t) bam->core.pos);
```
Passing incorrect types to variadic functions like `printf()` can lead
to incorrect behaviour and security risks, so it is important to track down
and fix all of the places where this may happen. Modern C compilers like
gcc (version 3.0 onwards) and clang can check `printf()` and `scanf()`
parameter types for compatibility against the format string. To
enable this, build code with `-Wall` or `-Wformat` and fix all the
reported warnings.
Where functions that take `printf`-style format strings are implemented,
they should use the appropriate gcc attributes to enable format string
checking. `htslib/hts_defs.h` includes macros `HTS_FORMAT` and
`HTS_PRINTF_FMT` which can be used to provide the attribute declaration
in a portable way. For example, `test/sam.c` uses them for a function
that prints error messages:
```c
void HTS_FORMAT(HTS_PRINTF_FMT, 1, 2) fail(const char *fmt, ...) { /* ... */ }
```
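As a rough illustration of the same idea without depending on HTSlib, the sketch below defines a local macro, `MY_PRINTF_CHECK`, as a hypothetical stand-in for `HTS_FORMAT(HTS_PRINTF_FMT, ...)`. The function names `format_msg()` and `describe_pos()` are also invented for this example. With the attribute in place, gcc and clang can check callers' arguments against the format string when `-Wformat` is enabled.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for HTS_FORMAT(HTS_PRINTF_FMT, fmt_idx, first_arg):
   expands to the gcc/clang format attribute, or to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_PRINTF_CHECK(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define MY_PRINTF_CHECK(fmt_idx, first_arg)
#endif

/* Format a message into buf; returns whatever vsnprintf() returns.
   The format string is argument 3 and the variadic arguments start at 4. */
static int MY_PRINTF_CHECK(3, 4)
format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}

/* Example caller: the position is passed as an explicit 64-bit value
   and printed with the matching PRId64 specifier. */
static int describe_pos(char *buf, size_t size, int64_t pos)
{
    assert(buf != NULL);
    return format_msg(buf, size, "Position is %" PRId64, pos);
}
```

If `describe_pos()` used `%d` instead of `PRId64`, a compiler that supports the attribute would be expected to report the mismatch at the call to `format_msg()`.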
## Implicit type conversions
Conversion of signed `int` or `int32_t` to `hts_pos_t` will always work.
Conversion of `hts_pos_t` to `int` or `int32_t` will work as long as the value
converted is within the range that can be stored in the destination.
Code that casts unsigned `uint32_t` values to signed with the expectation
that the result may be negative will no longer work as `hts_pos_t` can store
values over `UINT32_MAX`. Such code should be changed to use signed values.
Functions `hts_parse_region()` and `hts_parse_reg64()` return special value
`HTS_POS_MAX` for regions which extend to the end of the reference.
This value is slightly smaller than `INT64_MAX`, but should be larger than
any reference that is likely to be used. When cast to `int32_t` the
result should be `INT32_MAX`.
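The conversion rules above can be sketched with a few small helpers. This is only an illustration: `pos64_t` is a local stand-in for `hts_pos_t` so that the code builds without HTSlib, and the helper names are invented here. Saturating on narrowing is one possible policy for code that must keep a 32-bit variable; it is not something the library does for you.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for hts_pos_t so the sketch builds without HTSlib. */
typedef int64_t pos64_t;

/* Returns 1 if pos can be stored in an int32_t without loss, else 0. */
static int pos_fits_int32(pos64_t pos)
{
    return pos >= INT32_MIN && pos <= INT32_MAX;
}

/* Narrow to 32 bits, saturating at the int32_t limits rather than
   relying on the result of an out-of-range conversion. */
static int32_t pos_to_int32_saturating(pos64_t pos)
{
    if (pos > INT32_MAX) return INT32_MAX;
    if (pos < INT32_MIN) return INT32_MIN;
    return (int32_t) pos;
}

/* Widen an unsigned 32-bit value to a 64-bit signed position. */
static pos64_t widen_u32(uint32_t v)
{
    pos64_t wide = (pos64_t) v;
    assert(wide >= 0);  /* always holds: every uint32_t fits in 64 signed bits */
    return wide;
}
```

The last helper shows why code that stored `uint32_t` values and expected large ones to turn negative stops working: once widened to 64 bits, `UINT32_MAX` stays a large positive number.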
# Upgrading code to work with 64 bit positions
Variables used to store reference positions should be changed to
type `hts_pos_t`. Use `PRIhts_pos` in format strings when printing them.
When converting positions stored in strings, use `strtoll()` in place of
`atoi()` or `strtol()`. `atoi()` returns an `int`, and `strtol()` returns a
`long`, which is only 32 bits wide on 64-bit Windows and all 32-bit platforms.
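A minimal sketch of such a conversion is shown below. The helper name `parse_pos()` is made up for this example, and it stores the result in an `int64_t`, which could be assigned to an `hts_pos_t` in code built against HTSlib 1.10 or later. How strictly to treat trailing characters is a design choice; this version rejects them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal position from str into *pos.
   Returns 0 on success, or -1 if str has no digits, has trailing
   characters, or holds a value outside the range of long long. */
static int parse_pos(const char *str, int64_t *pos)
{
    char *end;
    long long v;

    assert(str != NULL && pos != NULL);

    errno = 0;
    v = strtoll(str, &end, 10);
    if (end == str || *end != '\0') return -1;  /* nothing parsed, or junk after the number */
    if (errno == ERANGE) return -1;             /* value overflowed long long */

    *pos = (int64_t) v;
    return 0;
}
```

Unlike `atoi()`, this reports failure explicitly, so a position such as `3000000000` is either parsed in full or rejected, never silently truncated.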
Programs which need to look up a reference sequence length from a `sam_hdr_t`
structure should use `sam_hdr_tid2len()` instead of the old
`sam_hdr_t::target_len` array (which is left as 32-bit for reasons of
compatibility). `sam_hdr_tid2len()` returns `hts_pos_t`, so works correctly
for large references.
Various functions which take pointer arguments have new versions which
support `hts_pos_t *` arguments. Code supporting 64-bit positions should
use the new versions. These are:
Original function | 64-bit version
------------------ | --------------------
fai_fetch() | fai_fetch64()
fai_fetchqual() | fai_fetchqual64()
faidx_fetch_seq() | faidx_fetch_seq64()
faidx_fetch_qual() | faidx_fetch_qual64()
hts_parse_reg() | hts_parse_reg64() or hts_parse_region()
bam_plp_auto() | bam_plp64_auto()
bam_plp_next() | bam_plp64_next()
bam_mplp_auto() | bam_mplp64_auto()
Limited support has been added for 64-bit INFO values in VCF files, for large
values in structural variant END tags. New functions `bcf_update_info_int64()`
and `bcf_get_info_int64()` can be used to set and fetch 64-bit INFO values.
They both take arrays of `int64_t`. `bcf_int64_missing` and
`bcf_int64_vector_end` can be used to set missing and vector end values in
these arrays. The INFO data is stored in the minimum size needed, so there
is no harm in using these functions to store smaller integer values.
# Structure members that have changed size
```
File htslib/hts.h:
hts_pair32_t::begin
hts_pair32_t::end
(typedef hts_pair_pos_t is provided as a better-named replacement for hts_pair32_t)
hts_reglist_t::min_beg
hts_reglist_t::max_end
hts_itr_t::beg
hts_itr_t::end
hts_itr_t::curr_beg
hts_itr_t::curr_end
File htslib/regidx.h:
reg_t::start
reg_t::end
File htslib/sam.h:
bam1_core_t::pos
bam1_core_t::mpos
bam1_core_t::isize
File htslib/synced_bcf_reader.h:
bcf_sr_regions_t::start
bcf_sr_regions_t::end
bcf_sr_regions_t::prev_start
File htslib/vcf.h:
bcf_idinfo_t::info
bcf_info_t::v1::i
bcf1_t::pos
bcf1_t::rlen
```
# Functions where parameters or the return value have changed size
Functions are annotated as follows:
* `[new]` The function has been added since version 1.9
* `[parameters]` Function parameters have changed size
* `[return]` Function return value has changed size
```
File htslib/faidx.h:
[new] fai_fetch64()
[new] fai_fetchqual64()
[new] faidx_fetch_seq64()
[new] faidx_fetch_qual64()
[new] fai_parse_region()
File htslib/hts.h:
[parameters] hts_idx_push()
[new] hts_parse_reg64()
[parameters] hts_itr_query()
[parameters] hts_reg2bin()
File htslib/kstring.h:
[new] kputll()
File htslib/regidx.h:
[parameters] regidx_overlap()
File htslib/sam.h:
[new] sam_hdr_tid2len()
[return] bam_cigar2qlen()
[return] bam_cigar2rlen()
[return] bam_endpos()
[parameters] bam_itr_queryi()
[parameters] sam_itr_queryi()
[new] bam_plp64_next()
[new] bam_plp64_auto()
[new] bam_mplp64_auto()
[parameters] sam_cap_mapq()
[parameters] sam_prob_realn()
File htslib/synced_bcf_reader.h:
[parameters] bcf_sr_seek()
[parameters] bcf_sr_regions_overlap()
File htslib/tbx.h:
[parameters] tbx_readrec()
File htslib/vcf.h:
[parameters] bcf_readrec()
[new] bcf_update_info_int64()
[new] bcf_get_info_int64()
[return] bcf_dec_int1()
[return] bcf_dec_typed_int1()
```
# generated automatically by aclocal 1.14.1 -*- Autoconf -*-
# Copyright (C) 1996-2013 Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])])
# pkg.m4 - Macros to locate and utilise pkg-config. -*- Autoconf -*-
# serial 1 (pkg-config-0.24)
#
# Copyright © 2004 Scott James Remnant <scott@netsplit.com>.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.
# PKG_PROG_PKG_CONFIG([MIN-VERSION])
# ----------------------------------
AC_DEFUN([PKG_PROG_PKG_CONFIG],
[m4_pattern_forbid([^_?PKG_[A-Z_]+$])
m4_pattern_allow([^PKG_CONFIG(_(PATH|LIBDIR|SYSROOT_DIR|ALLOW_SYSTEM_(CFLAGS|LIBS)))?$])
m4_pattern_allow([^PKG_CONFIG_(DISABLE_UNINSTALLED|TOP_BUILD_DIR|DEBUG_SPEW)$])
AC_ARG_VAR([PKG_CONFIG], [path to pkg-config utility])
AC_ARG_VAR([PKG_CONFIG_PATH], [directories to add to pkg-config's search path])
AC_ARG_VAR([PKG_CONFIG_LIBDIR], [path overriding pkg-config's built-in search path])
if test "x$ac_cv_env_PKG_CONFIG_set" != "xset"; then
AC_PATH_TOOL([PKG_CONFIG], [pkg-config])
fi
if test -n "$PKG_CONFIG"; then
_pkg_min_version=m4_default([$1], [0.9.0])
AC_MSG_CHECKING([pkg-config is at least version $_pkg_min_version])
if $PKG_CONFIG --atleast-pkgconfig-version $_pkg_min_version; then
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
PKG_CONFIG=""
fi
fi[]dnl
])# PKG_PROG_PKG_CONFIG
# PKG_CHECK_EXISTS(MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
#
# Check to see whether a particular set of modules exists. Similar
# to PKG_CHECK_MODULES(), but does not set variables or print errors.
#
# Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG])
# only at the first occurrence in configure.ac, so if the first place
# it's called might be skipped (such as if it is within an "if"), you
# have to call PKG_CHECK_EXISTS manually
# --------------------------------------------------------------
AC_DEFUN([PKG_CHECK_EXISTS],
[AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
if test -n "$PKG_CONFIG" && \
AC_RUN_LOG([$PKG_CONFIG --exists --print-errors "$1"]); then
m4_default([$2], [:])
m4_ifvaln([$3], [else
$3])dnl
fi])
# _PKG_CONFIG([VARIABLE], [COMMAND], [MODULES])
# ---------------------------------------------
m4_define([_PKG_CONFIG],
[if test -n "$$1"; then
pkg_cv_[]$1="$$1"
elif test -n "$PKG_CONFIG"; then
PKG_CHECK_EXISTS([$3],
[pkg_cv_[]$1=`$PKG_CONFIG --[]$2 "$3" 2>/dev/null`
test "x$?" != "x0" && pkg_failed=yes ],
[pkg_failed=yes])
else
pkg_failed=untried
fi[]dnl
])# _PKG_CONFIG
# _PKG_SHORT_ERRORS_SUPPORTED
# -----------------------------
AC_DEFUN([_PKG_SHORT_ERRORS_SUPPORTED],
[AC_REQUIRE([PKG_PROG_PKG_CONFIG])
if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
_pkg_short_errors_supported=yes
else
_pkg_short_errors_supported=no
fi[]dnl
])# _PKG_SHORT_ERRORS_SUPPORTED
# PKG_CHECK_MODULES(VARIABLE-PREFIX, MODULES, [ACTION-IF-FOUND],
# [ACTION-IF-NOT-FOUND])
#
#
# Note that if there is a possibility the first call to
# PKG_CHECK_MODULES might not happen, you should be sure to include an
# explicit call to PKG_PROG_PKG_CONFIG in your configure.ac
#
#
# --------------------------------------------------------------
AC_DEFUN([PKG_CHECK_MODULES],
[AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
AC_ARG_VAR([$1][_CFLAGS], [C compiler flags for $1, overriding pkg-config])dnl
AC_ARG_VAR([$1][_LIBS], [linker flags for $1, overriding pkg-config])dnl
pkg_failed=no
AC_MSG_CHECKING([for $1])
_PKG_CONFIG([$1][_CFLAGS], [cflags], [$2])
_PKG_CONFIG([$1][_LIBS], [libs], [$2])
m4_define([_PKG_TEXT], [Alternatively, you may set the environment variables $1[]_CFLAGS
and $1[]_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.])
if test $pkg_failed = yes; then
AC_MSG_RESULT([no])
_PKG_SHORT_ERRORS_SUPPORTED
if test $_pkg_short_errors_supported = yes; then
$1[]_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "$2" 2>&1`
else
$1[]_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "$2" 2>&1`
fi
# Put the nasty error message in config.log where it belongs
echo "$$1[]_PKG_ERRORS" >&AS_MESSAGE_LOG_FD
m4_default([$4], [AC_MSG_ERROR(
[Package requirements ($2) were not met:
$$1_PKG_ERRORS
Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.
_PKG_TEXT])[]dnl
])
elif test $pkg_failed = untried; then
AC_MSG_RESULT([no])
m4_default([$4], [AC_MSG_FAILURE(
[The pkg-config script could not be found or is too old. Make sure it
is in your PATH or set the PKG_CONFIG environment variable to the full
path to pkg-config.
_PKG_TEXT
To get pkg-config, see <http://pkg-config.freedesktop.org/>.])[]dnl
])
else
$1[]_CFLAGS=$pkg_cv_[]$1[]_CFLAGS
$1[]_LIBS=$pkg_cv_[]$1[]_LIBS
AC_MSG_RESULT([yes])
$3
fi[]dnl
])# PKG_CHECK_MODULES
/*
Copyright (C) 2017 Genome Research Ltd.
Copyright (C) 2017-2019 Genome Research Ltd.
Author: Petr Danecek <pd3@sanger.ac.uk>
......@@ -22,12 +22,14 @@
THE SOFTWARE.
*/
#define HTS_BUILDING_LIBRARY // Enables HTSLIB_EXPORT, see htslib/hts_defs.h
#include <config.h>
#include <strings.h>
#include "bcf_sr_sort.h"
#include "htslib/khash_str2int.h"
#include "htslib/kbitset.h"
#define SR_REF 1
#define SR_SNP 2
......@@ -35,22 +37,6 @@
#define SR_OTHER 8
#define SR_SCORE(srt,a,b) (srt)->score[((a)<<4)|(b)]
// Resize a bit set.
static inline kbitset_t *kbs_resize(kbitset_t *bs, size_t ni)
{
if ( !bs ) return kbs_init(ni);
size_t n = (ni + KBS_ELTBITS-1) / KBS_ELTBITS;
if ( n==bs->n ) return bs;
bs = (kbitset_t *) realloc(bs, sizeof(kbitset_t) + n * sizeof(unsigned long));
if ( bs==NULL ) return NULL;
if ( n > bs->n )
memset(bs->b + bs->n, 0, (n - bs->n) * sizeof (unsigned long));
bs->n = n;
bs->b[n] = ~0UL;
return bs;
}
// Logical AND
static inline int kbs_logical_and(kbitset_t *bs1, kbitset_t *bs2)
{
......@@ -162,7 +148,7 @@ static int multi_is_subset(var_t *avar, var_t *bvar)
}
return 0;
}
int32_t pairing_score(sr_sort_t *srt, int ivset, int jvset)
static uint32_t pairing_score(sr_sort_t *srt, int ivset, int jvset)
{
varset_t *iv = &srt->vset[ivset];
varset_t *jv = &srt->vset[jvset];
......@@ -200,9 +186,9 @@ int32_t pairing_score(sr_sort_t *srt, int ivset, int jvset)
for (i=0; i<iv->nvar; i++) cnt += srt->var[iv->var[i]].nvcf;
for (j=0; j<jv->nvar; j++) cnt += srt->var[jv->var[j]].nvcf;
return (1<<(28+min)) + cnt;
return (1u<<(28+min)) + cnt;
}
void remove_vset(sr_sort_t *srt, int jvset)
static void remove_vset(sr_sort_t *srt, int jvset)
{
if ( jvset+1 < srt->nvset )
{
......@@ -217,7 +203,7 @@ void remove_vset(sr_sort_t *srt, int jvset)
}
srt->nvset--;
}
int merge_vsets(sr_sort_t *srt, int ivset, int jvset)
static int merge_vsets(sr_sort_t *srt, int ivset, int jvset)
{
int i,j;
if ( ivset > jvset ) { i = ivset; ivset = jvset; jvset = i; }
......@@ -241,7 +227,8 @@ int merge_vsets(sr_sort_t *srt, int ivset, int jvset)
return ivset;
}
void push_vset(sr_sort_t *srt, int ivset)
static int push_vset(sr_sort_t *srt, int ivset)
{
varset_t *iv = &srt->vset[ivset];
int i,j;
......@@ -263,6 +250,7 @@ void push_vset(sr_sort_t *srt, int ivset)
}
}
remove_vset(srt, ivset);
return 0; // FIXME: check for errs in this function
}
static int cmpstringp(const void *p1, const void *p2)
......@@ -301,14 +289,14 @@ void debug_vbuf(sr_sort_t *srt)
for (i=0; i<srt->sr->nreaders; i++)
{
vcf_buf_t *buf = &srt->vcf_buf[i];
fprintf(stderr,"\t%d", buf->rec[j] ? buf->rec[j]->pos+1 : 0);
fprintf(stderr,"\t%"PRIhts_pos, buf->rec[j] ? buf->rec[j]->pos+1 : 0);
}
fprintf(stderr,"\n");
}
}
#endif
char *grp_create_key(sr_sort_t *srt)
static char *grp_create_key(sr_sort_t *srt)
{
if ( !srt->str.l ) return strdup("");
int i;
......@@ -334,16 +322,16 @@ int bcf_sr_sort_set_active(sr_sort_t *srt, int idx)
hts_expand(int,idx+1,srt->mactive,srt->active);
srt->nactive = 1;
srt->active[srt->nactive - 1] = idx;
return 0;
return 0; // FIXME: check for errs in this function
}
int bcf_sr_sort_add_active(sr_sort_t *srt, int idx)
{
hts_expand(int,idx+1,srt->mactive,srt->active);
srt->nactive++;
srt->active[srt->nactive - 1] = idx;
return 0;
return 0; // FIXME: check for errs in this function
}
static void bcf_sr_sort_set(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, int min_pos)
static int bcf_sr_sort_set(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, hts_pos_t min_pos)
{
if ( !srt->grp_str2int )
{
......@@ -469,7 +457,11 @@ static void bcf_sr_sort_set(bcf_srs_t *readers, sr_sort_t *srt, const char *chr,
// initialize bitmask - which groups is the variant present in
for (ivar=0; ivar<srt->nvar; ivar++)
{
srt->var[ivar].mask = kbs_resize(srt->var[ivar].mask, srt->ngrp);
if ( kbs_resize(&srt->var[ivar].mask, srt->ngrp) < 0 )
{
fprintf(stderr, "[%s:%d %s] kbs_resize failed\n", __FILE__,__LINE__,__func__);
exit(1);
}
kbs_clear(srt->var[ivar].mask);
}
for (igrp=0; igrp<srt->ngrp; igrp++)
......@@ -493,7 +485,11 @@ static void bcf_sr_sort_set(bcf_srs_t *readers, sr_sort_t *srt, const char *chr,
vset->var[vset->nvar-1] = ivar;
var_t *var = &srt->var[ivar];
vset->cnt = var->nvcf;
vset->mask = kbs_resize(vset->mask, srt->ngrp);
if ( kbs_resize(&vset->mask, srt->ngrp) < 0 )
{
fprintf(stderr, "[%s:%d %s] kbs_resize failed\n", __FILE__,__LINE__,__func__);
exit(1);
}
kbs_clear(vset->mask);
kbs_bitwise_or(vset->mask, var->mask);
......@@ -557,9 +553,11 @@ static void bcf_sr_sort_set(bcf_srs_t *readers, sr_sort_t *srt, const char *chr,
srt->chr = chr;
srt->pos = min_pos;
return 0; // FIXME: check for errs in this function
}
int bcf_sr_sort_next(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, int min_pos)
int bcf_sr_sort_next(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, hts_pos_t min_pos)
{
int i,j;
assert( srt->nactive>0 );
......
......@@ -31,8 +31,8 @@
*/
#ifndef __BCF_SR_SORT_H__
#define __BCF_SR_SORT_H__
#ifndef BCF_SR_SORT_H
#define BCF_SR_SORT_H
#include "htslib/synced_bcf_reader.h"
#include "htslib/kbitset.h"
......@@ -90,7 +90,8 @@ typedef struct
int moff, noff, *off, mcharp;
char **charp;
const char *chr;
int pos, nsr, msr;
hts_pos_t pos;
int nsr, msr;
int pair;
int nactive, mactive, *active; // list of readers with lines at the current pos
}
......@@ -98,7 +99,7 @@ sr_sort_t;
sr_sort_t *bcf_sr_sort_init(sr_sort_t *srt);
void bcf_sr_sort_reset(sr_sort_t *srt);
int bcf_sr_sort_next(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, int pos);
int bcf_sr_sort_next(bcf_srs_t *readers, sr_sort_t *srt, const char *chr, hts_pos_t pos);
int bcf_sr_sort_set_active(sr_sort_t *srt, int i);
int bcf_sr_sort_add_active(sr_sort_t *srt, int i);
void bcf_sr_sort_destroy(sr_sort_t *srt);
......
.TH bgzip 1 "18 July 2018" "htslib-1.9" "Bioinformatics tools"
.TH bgzip 1 "6 December 2019" "htslib-1.10" "Bioinformatics tools"
.SH NAME
.PP
bgzip \- Block compression/decompression utility
......@@ -86,7 +86,9 @@ Write to standard output, keep original files unchanged.
Decompress.
.TP
.B "-f, --force"
Overwrite files without asking.
Overwrite files without asking, or decompress files that don't have a known
compression filename extension (e.g., \fI.gz\fR) without asking.
Use \fB--force\fR twice to do both without asking.
.TP
.B "-h, --help"
Displays a help message.
......
/* bgzip.c -- Block compression/decompression utility.
Copyright (C) 2008, 2009 Broad Institute / Massachusetts Institute of Technology
Copyright (C) 2010, 2013-2018 Genome Research Ltd.
Copyright (C) 2010, 2013-2019 Genome Research Ltd.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
......@@ -26,6 +26,7 @@
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
......@@ -33,7 +34,6 @@
#include <stdarg.h>
#include <getopt.h>
#include <inttypes.h>
#include <sys/stat.h>
#include "htslib/bgzf.h"
#include "htslib/hts.h"
......@@ -53,44 +53,78 @@ static void error(const char *format, ...)
exit(EXIT_FAILURE);
}
static int ask_yn()
{
char line[1024];
if (fgets(line, sizeof line, stdin) == NULL)
return 0;
return line[0] == 'Y' || line[0] == 'y';
}
static int confirm_overwrite(const char *fn)
{
int save_errno = errno;
int ret = 0;
if (isatty(STDIN_FILENO)) {
char c;
fprintf(stderr, "[bgzip] %s already exists; do you wish to overwrite (y or n)? ", fn);
if (scanf("%c", &c) == 1 && (c == 'Y' || c == 'y')) ret = 1;
if (ask_yn()) ret = 1;
}
errno = save_errno;
return ret;
}
static int bgzip_main_usage(void)
static int known_extension(const char *ext)
{
fprintf(stderr, "\n");
fprintf(stderr, "Version: %s\n", hts_version());
fprintf(stderr, "Usage: bgzip [OPTIONS] [FILE] ...\n");
fprintf(stderr, "Options:\n");
fprintf(stderr, " -b, --offset INT decompress at virtual file pointer (0-based uncompressed offset)\n");
fprintf(stderr, " -c, --stdout write on standard output, keep original files unchanged\n");
fprintf(stderr, " -d, --decompress decompress\n");
fprintf(stderr, " -f, --force overwrite files without asking\n");
fprintf(stderr, " -h, --help give this help\n");
fprintf(stderr, " -i, --index compress and create BGZF index\n");
fprintf(stderr, " -I, --index-name FILE name of BGZF index file [file.gz.gzi]\n");
fprintf(stderr, " -l, --compress-level INT Compression level to use when compressing; 0 to 9, or -1 for default [-1]\n");
fprintf(stderr, " -r, --reindex (re)index compressed file\n");
fprintf(stderr, " -g, --rebgzip use an index file to bgzip a file\n");
fprintf(stderr, " -s, --size INT decompress INT bytes (uncompressed size)\n");
fprintf(stderr, " -@, --threads INT number of compression threads to use [1]\n");
fprintf(stderr, " -t, --test test integrity of compressed file");
fprintf(stderr, "\n");
static const char *known[] = {
"gz", "bgz", "bgzf",
NULL
};
const char **p;
for (p = known; *p; p++)
if (strcasecmp(ext, *p) == 0) return 1;
return 0;
}
static int confirm_filename(int *is_forced, const char *name, const char *ext)
{
if (*is_forced) {
(*is_forced)--;
return 1;
}
if (!isatty(STDIN_FILENO))
return 0;
fprintf(stderr, "[bgzip] .%s is not a known extension; do you wish to decompress to %s (y or n)? ", ext, name);
return ask_yn();
}
static int bgzip_main_usage(FILE *fp, int status)
{
fprintf(fp, "\n");
fprintf(fp, "Version: %s\n", hts_version());
fprintf(fp, "Usage: bgzip [OPTIONS] [FILE] ...\n");
fprintf(fp, "Options:\n");
fprintf(fp, " -b, --offset INT decompress at virtual file pointer (0-based uncompressed offset)\n");
fprintf(fp, " -c, --stdout write on standard output, keep original files unchanged\n");
fprintf(fp, " -d, --decompress decompress\n");
fprintf(fp, " -f, --force overwrite files without asking\n");
fprintf(fp, " -h, --help give this help\n");
fprintf(fp, " -i, --index compress and create BGZF index\n");
fprintf(fp, " -I, --index-name FILE name of BGZF index file [file.gz.gzi]\n");
fprintf(fp, " -l, --compress-level INT Compression level to use when compressing; 0 to 9, or -1 for default [-1]\n");
fprintf(fp, " -r, --reindex (re)index compressed file\n");
fprintf(fp, " -g, --rebgzip use an index file to bgzip a file\n");
fprintf(fp, " -s, --size INT decompress INT bytes (uncompressed size)\n");
fprintf(fp, " -@, --threads INT number of compression threads to use [1]\n");
fprintf(fp, " -t, --test test integrity of compressed file");
fprintf(fp, "\n");
return status;
}
int main(int argc, char **argv)
{
int c, compress, compress_level = -1, pstdout, is_forced, test, index = 0, rebgzip = 0, reindex = 0;
......@@ -126,7 +160,7 @@ int main(int argc, char **argv)
case 'c': pstdout = 1; break;
case 'b': start = atol(optarg); compress = 0; pstdout = 1; break;
case 's': size = atol(optarg); pstdout = 1; break;
case 'f': is_forced = 1; break;
case 'f': is_forced++; break;
case 'i': index = 1; break;
case 'I': index_fname = optarg; break;
case 'l': compress_level = atol(optarg); break;
......@@ -137,10 +171,10 @@ int main(int argc, char **argv)
case 1:
printf(
"bgzip (htslib) %s\n"
"Copyright (C) 2018 Genome Research Ltd.\n", hts_version());
"Copyright (C) 2019 Genome Research Ltd.\n", hts_version());
return EXIT_SUCCESS;
case 'h':
case '?': return bgzip_main_usage();
case 'h': return bgzip_main_usage(stdout, EXIT_SUCCESS);
case '?': return bgzip_main_usage(stderr, EXIT_FAILURE);
}
}
if (size >= 0) end = start + size;
......@@ -149,7 +183,6 @@ int main(int argc, char **argv)
return 1;
}
if (compress == 1) {
struct stat sbuf;
int f_src = fileno(stdin);
char out_mode[3] = "w\0";
char out_mode_exclusive[4] = "wx\0";
......@@ -165,12 +198,6 @@ int main(int argc, char **argv)
if ( argc>optind )
{
if ( stat(argv[optind],&sbuf)<0 )
{
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
}
if ((f_src = open(argv[optind], O_RDONLY)) < 0) {
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
......@@ -195,7 +222,7 @@ int main(int argc, char **argv)
}
}
else if (!pstdout && isatty(fileno((FILE *)stdout)) )
return bgzip_main_usage();
return bgzip_main_usage(stderr, EXIT_FAILURE);
else if ( index && !index_fname )
{
fprintf(stderr, "[bgzip] Index file name expected when writing to stdout\n");
......@@ -216,10 +243,10 @@ int main(int argc, char **argv)
return 1;
}
if ( index ) bgzf_index_build_init(fp);
if (threads > 1)
bgzf_mt(fp, threads, 256);
if ( index ) bgzf_index_build_init(fp);
buffer = malloc(WINDOW_SIZE);
#ifdef _WIN32
_setmode(f_src, O_BINARY);
......@@ -284,26 +311,18 @@ int main(int argc, char **argv)
}
else
{
struct stat sbuf;
int f_dst;
if ( argc>optind )
{
if ( stat(argv[optind],&sbuf)<0 )
{
fprintf(stderr, "[bgzip] %s: %s\n", strerror(errno), argv[optind]);
return 1;
}
char *name;
int len = strlen(argv[optind]);
if ( strcmp(argv[optind]+len-3,".gz") && !test)
{
fprintf(stderr, "[bgzip] %s: unknown suffix -- ignored\n", argv[optind]);
return 1;
}
fp = bgzf_open(argv[optind], "r");
if (fp == NULL) {
fprintf(stderr, "[bgzip] Could not open file: %s\n", argv[optind]);
fprintf(stderr, "[bgzip] Could not open %s: %s\n", argv[optind], strerror(errno));
return 1;
}
if (bgzf_compression(fp) == no_compression) {
fprintf(stderr, "[bgzip] %s: not a compressed file -- ignored\n", argv[optind]);
bgzf_close(fp);
return 1;
}
......@@ -312,8 +331,24 @@ int main(int argc, char **argv)
}
else {
const int wrflags = O_WRONLY | O_CREAT | O_TRUNC;
char *name = argv[optind], *ext;
size_t pos;
for (pos = strlen(name); pos > 0; --pos)
if (name[pos] == '.' || name[pos] == '/') break;
if (pos == 0 || name[pos] != '.') {
fprintf(stderr, "[bgzip] can't remove an extension from %s -- please rename\n", argv[optind]);
bgzf_close(fp);
return 1;
}
name = strdup(argv[optind]);
name[strlen(name) - 3] = '\0';
name[pos] = '\0';
ext = &name[pos+1];
if (! (known_extension(ext) || confirm_filename(&is_forced, name, ext))) {
fprintf(stderr, "[bgzip] unknown extension .%s -- declining to decompress to %s\n", ext, name);
bgzf_close(fp);
free(name);
return 1;
}
f_dst = open(name, is_forced? wrflags : wrflags|O_EXCL, 0666);
if (f_dst < 0 && errno == EEXIST && confirm_overwrite(name))
f_dst = open(name, wrflags, 0666);
......@@ -326,7 +361,7 @@ int main(int argc, char **argv)
}
}
else if (!pstdout && isatty(fileno((FILE *)stdin)) )
return bgzip_main_usage();
return bgzip_main_usage(stderr, EXIT_FAILURE);
else
{
f_dst = fileno(stdout);
......@@ -335,22 +370,33 @@ int main(int argc, char **argv)
fprintf(stderr, "[bgzip] Could not read from stdin: %s\n", strerror(errno));
return 1;
}
}
if (!fp->is_compressed) {
fprintf(stderr, "[bgzip] Expected compressed file -- ignored\n");
if (bgzf_compression(fp) == no_compression) {
fprintf(stderr, "[bgzip] stdin is not compressed -- ignored\n");
bgzf_close(fp);
return 1;
}
if (threads > 1)
bgzf_mt(fp, threads, 256);
}
buffer = malloc(WINDOW_SIZE);
if ( start>0 )
{
if ( bgzf_index_load(fp, argv[optind], ".gzi") < 0 ) error("Could not load index: %s.gzi\n", argv[optind]);
if (index_fname) {
if ( bgzf_index_load(fp, index_fname, NULL) < 0 )
error("Could not load index: %s\n", index_fname);
} else {
if (optind >= argc) {
error("The -b option requires -I when reading from stdin "
"(and stdin must be seekable)\n");
}
if ( bgzf_index_load(fp, argv[optind], ".gzi") < 0 )
error("Could not load index: %s.gzi\n", argv[optind]);
}
if ( bgzf_useek(fp, start, SEEK_SET) < 0 ) error("Could not seek to %d-th (uncompressd) byte\n", start);
}
if (threads > 1)
bgzf_mt(fp, threads, 256);
#ifdef _WIN32
_setmode(f_dst, O_BINARY);
#endif
......@@ -370,7 +416,7 @@ int main(int argc, char **argv)
}
free(buffer);
if (bgzf_close(fp) < 0) error("Close failed: Error %d\n",fp->errcode);
if (!pstdout && !test) unlink(argv[optind]);
if (argc > optind && !pstdout && !test) unlink(argv[optind]);
return 0;
}
}
/* config.h.in. Generated from configure.ac by autoheader. */
/* If you use configure, this file provides #defines reflecting your
configuration choices. If you have not run configure, suitable
conservative defaults will be used.
Autoheader adds a number of items to this template file that are not
used by HTSlib: STDC_HEADERS and most HAVE_*_H header file defines
are immaterial, as we assume standard ISO C headers and facilities;
the PACKAGE_* defines are unused and are overridden by the more
accurate PACKAGE_VERSION as computed by the Makefile. */
/* Define if HTSlib should enable GCS support. */
#undef ENABLE_GCS
/* Define if HTSlib should enable plugins. */
#undef ENABLE_PLUGINS
/* Define if HTSlib should enable S3 support. */
#undef ENABLE_S3
/* Define if you have the Common Crypto library. */
#undef HAVE_COMMONCRYPTO
/* Define to 1 if you have the `drand48' function. */
#undef HAVE_DRAND48
/* Define to 1 if you have the `fdatasync' function. */
#undef HAVE_FDATASYNC
/* Define to 1 if you have the `fsync' function. */
#undef HAVE_FSYNC
/* Define to 1 if you have the `getpagesize' function. */
#undef HAVE_GETPAGESIZE
/* Define to 1 if you have the `gmtime_r' function. */
#undef HAVE_GMTIME_R
/* Define if you have libcrypto-style HMAC(). */
#undef HAVE_HMAC
/* Define to 1 if you have the <inttypes.h> header file. */
#undef HAVE_INTTYPES_H
/* Define to 1 if you have the `bz2' library (-lbz2). */
#undef HAVE_LIBBZ2
/* Define if libcurl file access is enabled. */
#undef HAVE_LIBCURL
/* Define if libdeflate is available. */
#undef HAVE_LIBDEFLATE
/* Define to 1 if you have the `lzma' library (-llzma). */
#undef HAVE_LIBLZMA
/* Define to 1 if you have the `z' library (-lz). */
#undef HAVE_LIBZ
/* Define to 1 if you have the <lzma.h> header file. */
#undef HAVE_LZMA_H
/* Define to 1 if you have the <memory.h> header file. */
#undef HAVE_MEMORY_H
/* Define to 1 if you have a working `mmap' system call. */
#undef HAVE_MMAP
/* Define to 1 if you have the <stdint.h> header file. */
#undef HAVE_STDINT_H
/* Define to 1 if you have the <stdlib.h> header file. */
#undef HAVE_STDLIB_H
/* Define to 1 if you have the <strings.h> header file. */
#undef HAVE_STRINGS_H
/* Define to 1 if you have the <string.h> header file. */
#undef HAVE_STRING_H
/* Define to 1 if you have the <sys/param.h> header file. */
#undef HAVE_SYS_PARAM_H
/* Define to 1 if you have the <sys/stat.h> header file. */
#undef HAVE_SYS_STAT_H
/* Define to 1 if you have the <sys/types.h> header file. */
#undef HAVE_SYS_TYPES_H
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
/* Define to the address where bug reports for this package should be sent. */
#undef PACKAGE_BUGREPORT
/* Define to the full name of this package. */
#undef PACKAGE_NAME
/* Define to the full name and version of this package. */
#undef PACKAGE_STRING
/* Define to the one symbol short name of this package. */
#undef PACKAGE_TARNAME
/* Define to the home page for this package. */
#undef PACKAGE_URL
/* Define to the version of this package. */
#undef PACKAGE_VERSION
/* Platform-dependent plugin filename extension. */
#undef PLUGIN_EXT
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
/* Number of bits in a file offset, on hosts where this is settable. */
#undef _FILE_OFFSET_BITS
/* Define for large files, on AIX-style hosts. */
#undef _LARGE_FILES
/* Needed for PTHREAD_MUTEX_RECURSIVE */
#undef _XOPEN_SOURCE
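The HAVE_* symbols in this template only take effect once a source file tests them. A minimal sketch of that pattern follows, assuming a build in which config.h has been generated from the template and included first. `random01()` is a hypothetical helper invented for illustration; it is not an HTSlib function.

```c
#include <stdlib.h>

/* In a real build, the generated config.h would be included before this
   point and would define HAVE_DRAND48 if configure found drand48().
   Here nothing defines it, so the conservative fallback is compiled. */

/* Hypothetical helper: returns a pseudo-random double in [0, 1). */
double random01(void)
{
#ifdef HAVE_DRAND48
    /* drand48() is POSIX, so it may need a feature-test macro such as
       _XOPEN_SOURCE to be declared by <stdlib.h>. */
    return drand48();
#else
    /* Portable ISO C fallback; dividing by RAND_MAX + 1.0 keeps the
       result strictly below 1. */
    return rand() / (RAND_MAX + 1.0);
#endif
}
```

Because the fallback uses only ISO C, the file still builds when configure has not been run, which matches the "conservative defaults" described in the header comment.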
# Optional configure Makefile overrides for htslib.
#
# Copyright (C) 2015-2017 Genome Research Ltd.
# Copyright (C) 2015-2017, 2019 Genome Research Ltd.
#
# Author: John Marshall <jm18@sanger.ac.uk>
#
......@@ -48,6 +48,10 @@ LIBS = @LIBS@
PLATFORM = @PLATFORM@
PLUGIN_EXT = @PLUGIN_EXT@
# The default Makefile enables some of the optional files, but we blank
# them so they can be controlled by configure instead.
NONCONFIGURE_OBJS =
# Lowercase here indicates these are "local" to config.mk
plugin_OBJS =
noplugin_LDFLAGS =
......@@ -74,10 +78,12 @@ endif
ifeq "s3-@s3@" "s3-enabled"
plugin_OBJS += hfile_s3.o
plugin_OBJS += hfile_s3_write.o
CRYPTO_LIBS = @CRYPTO_LIBS@
noplugin_LIBS += $(CRYPTO_LIBS)
hfile_s3$(PLUGIN_EXT): LIBS += $(CRYPTO_LIBS)
hfile_s3_write$(PLUGIN_EXT): LIBS += $(CRYPTO_LIBS) $(LIBCURL_LIBS)
endif
ifeq "plugins-@enable_plugins@" "plugins-yes"
......@@ -94,6 +100,7 @@ plugin.o plugin.pico: CPPFLAGS += -DPLUGINPATH=\"$(pluginpath)\"
hfile_gcs.o hfile_gcs.pico: version.h
hfile_libcurl.o hfile_libcurl.pico: version.h
hfile_s3.o hfile_s3.pico: version.h
hfile_s3_write.o hfile_s3_write.pico: version.h
# Windows DLL plugins depend on the import library, built as a byproduct.
$(plugin_OBJS:.o=.cygdll): cyghts-$(LIBHTS_SOVERSION).dll
......@@ -30,6 +30,7 @@ AC_CONFIG_SRCDIR(hts.c)
AC_CONFIG_HEADERS(config.h)
m4_include([m4/hts_prog_cc_warnings.m4])
m4_include([m4/hts_hide_dynamic_syms.m4])
dnl Copyright notice to be copied into the generated configure script
AC_COPYRIGHT([Portions copyright (C) 2018 Genome Research Ltd.
......@@ -89,7 +90,6 @@ AC_ARG_ENABLE([gcs],
[], [enable_gcs=check])
AC_SYS_LARGEFILE
AC_FUNC_FSEEKO
AC_ARG_ENABLE([libcurl],
[AS_HELP_STRING([--enable-libcurl],
......@@ -167,6 +167,10 @@ esac
AC_MSG_RESULT([$host_result])
AC_SUBST([PLATFORM])
dnl Try to get more control over which symbols are exported in the shared
dnl library.
HTS_HIDE_DYNAMIC_SYMBOLS
dnl FIXME This pulls in dozens of standard header checks
AC_FUNC_MMAP
AC_CHECK_FUNCS([gmtime_r fsync drand48])
......@@ -180,9 +184,12 @@ if test $enable_plugins != no; then
Plugin support requires dynamic linking facilities from the operating system.
Either configure with --disable-plugins or resolve this error to build HTSlib.])])
# Check if the compiler understands -rdynamic
# TODO Test whether this is required and/or needs tweaking per-platform
LDFLAGS="$LDFLAGS -rdynamic"
static_LDFLAGS="$static_LDFLAGS -rdynamic"
HTS_TEST_CC_C_LD_FLAG([-rdynamic],[rdynamic_flag])
AS_IF([test x"$rdynamic_flag" != "xno"],
[LDFLAGS="$LDFLAGS $rdynamic_flag"
static_LDFLAGS="$static_LDFLAGS $rdynamic_flag"])
case "$ac_cv_search_dlopen" in
-l*) static_LIBS="$static_LIBS $ac_cv_search_dlopen" ;;
esac
......@@ -275,9 +282,10 @@ fi
AS_IF([test "x$with_libdeflate" != "xno"],
[libdeflate=ok
AC_CHECK_HEADER([libdeflate.h],[],[libdeflate='missing header'],[;])
AC_CHECK_LIB([deflate], [libdeflate_deflate_compress],[],[libdeflate='missing library'])
AC_CHECK_LIB([deflate], [libdeflate_deflate_compress],[:],[libdeflate='missing library'])
AS_IF([test "$libdeflate" = "ok"],
[AC_DEFINE([HAVE_LIBDEFLATE], 1, [Define if libdeflate is available.])
LIBS="-ldeflate $LIBS"
private_LIBS="$private_LIBS -ldeflate"
static_LIBS="$static_LIBS -ldeflate"],
[AS_IF([test "x$with_libdeflate" != "xcheck"],
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Copyright (c) 2012-2013, 2015, 2018 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
......@@ -39,11 +39,11 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
* although these should not be included directly (use this file instead).
*/
#ifndef _CRAM_H_
#define _CRAM_H_
#ifndef CRAM_ALL_H
#define CRAM_ALL_H
#include "cram/cram_samtools.h"
#include "cram/sam_header.h"
#include "header.h"
#include "cram_structs.h"
#include "cram_io.h"
#include "cram_encode.h"
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Copyright (c) 2012-2015, 2018 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
......@@ -28,10 +28,10 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef _CRAM_ENCODINGS_H_
#define _CRAM_ENCODINGS_H_
#ifndef CRAM_CODECS_H
#define CRAM_CODECS_H
#include <inttypes.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
......@@ -49,7 +49,7 @@ struct cram_codec;
* appears.
*/
typedef struct {
int32_t symbol;
int64_t symbol;
int32_t p; // next code start value, minus index to codes[]
int32_t code;
int32_t len;
......@@ -65,6 +65,7 @@ typedef struct {
cram_huffman_code *codes;
int nvals;
int val2code[MAX_HUFF+1]; // value to code lookup for small values
int option;
} cram_huffman_encoder;
typedef struct {
......@@ -108,9 +109,6 @@ typedef struct {
/*
* A generic codec structure.
*/
#ifdef __SUNPRO_C
# pragma error_messages(off, E_ANONYMOUS_UNION_DECL)
#endif
typedef struct cram_codec {
enum cram_encoding codec;
cram_block *out;
......@@ -136,11 +134,8 @@ typedef struct cram_codec {
cram_byte_array_stop_decoder e_byte_array_stop;
cram_byte_array_len_encoder e_byte_array_len;
cram_beta_decoder e_beta;
};
} u;
} cram_codec;
#ifdef __SUNPRO_C
# pragma error_messages(default, E_ANONYMOUS_UNION_DECL)
#endif
const char *cram_encoding2str(enum cram_encoding t);
......@@ -198,4 +193,4 @@ int cram_codec_decoder2encoder(cram_fd *fd, cram_codec *c);
}
#endif
#endif /* _CRAM_ENCODINGS_H_ */
#endif /* CRAM_CODECS_H */
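The hunks above give the union inside `cram_codec` a name (`u`) and drop the Sun compiler pragmas that silenced anonymous-union warnings. Anonymous unions are only standard from C11, so a named member is a plausible portability fix, though the diff itself does not state the motivation. The sketch below shows the access pattern with simplified stand-in types; all `demo_*` names are hypothetical and are not the real HTSlib definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins; the real cram_codec has many more members. */
enum demo_encoding { DEMO_BETA, DEMO_HUFFMAN };

typedef struct { int32_t offset; int32_t nbits; } demo_beta;
typedef struct { int ncodes; } demo_huffman;

typedef struct demo_codec {
    enum demo_encoding codec;   /* tag saying which union member is live */
    union {
        demo_beta    beta;
        demo_huffman huffman;
    } u;                        /* named member: valid before C11 too */
} demo_codec;

/* Reads the beta bit count, checking the tag first. */
int demo_beta_nbits(const demo_codec *c)
{
    assert(c->codec == DEMO_BETA);
    return c->u.beta.nbits;     /* previously this would be c->beta.nbits */
}
```

Code that reached the members directly, as in `c->beta`, needs the extra `u.` after a change like this, so any out-of-tree users of the structure would have to be updated.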
/*
Copyright (c) 2012-2013 Genome Research Ltd.
Copyright (c) 2012-2013, 2018 Genome Research Ltd.
Author: James Bonfield <jkb@sanger.ac.uk>
Redistribution and use in source and binary forms, with or without
......@@ -38,8 +38,8 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
* cram_codecs.[ch] for the actual encoding functions themselves.
*/
#ifndef _CRAM_READ_H_
#define _CRAM_READ_H_
#ifndef CRAM_DECODE_H
#define CRAM_DECODE_H
#ifdef __cplusplus
extern "C" {
......@@ -102,7 +102,7 @@ cram_block_slice_hdr *cram_decode_slice_header(cram_fd *fd, cram_block *b);
* -1 on failure
*/
int cram_decode_slice(cram_fd *fd, cram_container *c, cram_slice *s,
SAM_hdr *hdr);
sam_hdr_t *hdr);
/*