Skip to content
Commits on Source (9)
......@@ -21,3 +21,4 @@ depcomp
install-sh
missing
Makefile.in
.vscode
......@@ -22,7 +22,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo
## Getting Help
If you can't find an answer in the VSEARCH documentation, please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
## Example
......@@ -35,9 +35,9 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
**Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:
```
wget https://github.com/torognes/vsearch/archive/v2.4.3.tar.gz
tar xzf v2.4.3.tar.gz
cd vsearch-2.4.3
wget https://github.com/torognes/vsearch/archive/v2.7.1.tar.gz
tar xzf v2.7.1.tar.gz
cd vsearch-2.7.1
./autogen.sh
./configure
make
......@@ -68,41 +68,47 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:
```sh
wget https://github.com/torognes/vsearch/releases/download/v2.4.3/vsearch-2.4.3-linux-x86_64.tar.gz
tar xzf vsearch-2.4.3-linux-x86_64.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch-2.7.1-linux-x86_64.tar.gz
tar xzf vsearch-2.7.1-linux-x86_64.tar.gz
```
Or these commands if you are using a Linux ppc64le system:
```sh
wget https://github.com/torognes/vsearch/releases/download/v2.4.3/vsearch-2.4.3-linux-ppc64le.tar.gz
tar xzf vsearch-2.4.3-linux-ppc64le.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch-2.7.1-linux-ppc64le.tar.gz
tar xzf vsearch-2.7.1-linux-ppc64le.tar.gz
```
Or these commands if you are using a Mac:
```sh
wget https://github.com/torognes/vsearch/releases/download/v2.4.3/vsearch-2.4.3-macos-x86_64.tar.gz
tar xzf vsearch-2.4.3-macos-x86_64.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch-2.7.1-macos-x86_64.tar.gz
tar xzf vsearch-2.7.1-macos-x86_64.tar.gz
```
Or if you are using Windows, download and extract (unzip) the contents of this file:
```
https://github.com/torognes/vsearch/releases/download/v2.4.3/vsearch-2.4.3-win-x86_64.zip
https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch-2.7.1-win-x86_64.zip
```
Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.4.3-linux-x86_64` or `vsearch-2.4.3-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.7.1-linux-x86_64` or `vsearch-2.7.1-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
Windows: You will now have the binary distribution in a folder called `vsearch-2.4.3-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
Windows: You will now have the binary distribution in a folder called `vsearch-2.7.1-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/doc/vsearch.1). A pdf version (vsearch_manual.pdf) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form (vsearch_manual.pdf) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
**Galaxy wrapper** Thanks to the work of the [Intergalactic Utilities Commission](https://wiki.galaxyproject.org/IUC) members, vsearch is now part of the [Galaxy ToolShed](https://toolshed.g2.bx.psu.edu/view/iuc/vsearch/).
**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.7.1/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
## Plugins, packages, and wrappers
**QIIME 2 plugin** Thanks to the [QIIME 2](https://github.com/qiime2) team, there is now a plugin called [q2-vsearch](https://github.com/qiime2/q2-vsearch) for [QIIME 2](https://qiime2.org).
**Homebrew package** Thanks to [Torsten Seeman](https://github.com/tseemann), a [vsearch package](https://github.com/Homebrew/homebrew-science/pull/2409) for [Homebrew](http://brew.sh/) has been made.
**Debian package** Thanks to the [Debian Med](https://www.debian.org/devel/debian-med/) team, there is now a [vsearch](https://packages.debian.org/sid/vsearch) package in [Debian](https://www.debian.org/).
**Homebrew package** Thanks to [Torsten Seeman](https://github.com/tseemann), a vsearch package for [Homebrew](http://brew.sh/) [has been made](https://github.com/Homebrew/homebrew-science/pull/2409).
**Galaxy wrapper** Thanks to the work of the [Intergalactic Utilities Commission](https://wiki.galaxyproject.org/IUC) members, vsearch is now part of the [Galaxy ToolShed](https://toolshed.g2.bx.psu.edu/view/iuc/vsearch/).
## Converting output to a biom file for use in QIIME and other software
......@@ -179,6 +185,7 @@ File | Description
**fastq.cc** | FASTQ file parser
**fastqops.cc** | FASTQ file statistics etc
**fastx.cc** | Detection of FASTA and FASTQ files, wrapper for FASTA and FASTQ parsers
**kmerhash.cc** | Hash for kmers used by paired-end read merger
**linmemalign.cc** | Linear memory global sequence aligner
**maps.cc** | Various character mapping arrays
**mask.cc** | Masking (DUST)
......@@ -198,6 +205,7 @@ File | Description
**sortbylength.cc** | Code for sorting by length
**sortbysize.cc** | Code for sorting by size (abundance)
**subsample.cc** | Subsampling reads from a FASTA file
**udb.cc** | UDB database file handling
**unique.cc** | Find unique kmers in a sequence
**userfields.cc** | Code for parsing the userfields option argument
**util.cc** | Various common utility functions
......@@ -243,6 +251,7 @@ The main contributors to VSEARCH:
Special thanks to the following people for patches, suggestions, computer access etc:
* Davide Albanese
* Colin Brislawn
* Jeff Epler
* Christopher M. Sullivan
......
......@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.63])
AC_INIT([vsearch], [2.4.3], [torognes@ifi.uio.no])
AC_INIT([vsearch], [2.7.1], [torognes@ifi.uio.no])
AC_CANONICAL_TARGET
AM_INIT_AUTOMAKE([subdir-objects])
AC_LANG([C++])
......
vsearch (2.7.1-1) unstable; urgency=medium
* New upstream version
* Standards-Version: 4.1.3
* debhelper 11
* do not parse d/changelog
* hardening=+all
* d/rules
- fix clean target
- use DEB_HOST_GNU_CPU directly
-- Andreas Tille <tille@debian.org> Sat, 17 Feb 2018 11:30:23 +0100
vsearch (2.4.3-1) unstable; urgency=medium
* New upstrem version
......
......@@ -4,14 +4,13 @@ Uploaders: Tim Booth <tbooth@ceh.ac.uk>,
Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 10),
dh-autoreconf,
Build-Depends: debhelper (>= 11~),
zlib1g-dev,
libbz2-dev,
python-markdown,
ghostscript,
time
Standards-Version: 3.9.8
Standards-Version: 4.1.3
Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/vsearch.git
Vcs-Git: https://anonscm.debian.org/git/debian-med/vsearch.git
Homepage: https://github.com/torognes/vsearch/
......
......@@ -3,24 +3,24 @@
# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1
pkg := $(shell dpkg-parsechangelog | sed -n 's/^Source: //p')
docdir := $(CURDIR)/debian/$(pkg)/usr/share/doc/$(pkg)
include /usr/share/dpkg/default.mk
HOST:=$(shell dpkg-architecture -qDEB_HOST_GNU_CPU)
ifeq ($(HOST),x86_64)
docdir := $(CURDIR)/debian/$(DEB_SOURCE)/usr/share/doc/$(DEB_SOURCE)
ifeq ($(DEB_HOST_GNU_CPU),x86_64)
USEOPT:=yes
else
ifeq ($(HOST),i386)
ifeq ($(DEB_HOST_GNU_CPU),i386)
USEOPT:=yes
else
USEOPT:=no
endif
endif
export DEB_BUILD_MAINT_OPTIONS = hardening=+bindnow
export DEB_BUILD_MAINT_OPTIONS = hardening=+all
%:
dh $@ --with autoreconf
dh $@
override_dh_auto_build:
dh_auto_build
......@@ -48,13 +48,15 @@ endif
override_dh_clean:
dh_clean
rm -rf data/simm
find test -mindepth 1 -not -name '*.sh' -delete
if [ -d test ] ; then find test -mindepth 1 -not -name '*.sh' -delete ; fi
rm -f README.html data/README.html
rm -f *.out
override_dh_install:
dh_install
# tweak path tp vsearch binary in test scripts
mkdir -p $(docdir)/test
for tst in test/*.sh ; do sed 's?\.\./bin/vsearch?/usr/bin/vsearch?' $${tst} > $(docdir)/$${tst} ; done
if [ -d test ] ; then \
# tweak path tp vsearch binary in test scripts \
mkdir -p $(docdir)/test ; \
for tst in test/*.sh ; do sed 's?\.\./bin/vsearch?/usr/bin/vsearch?' $${tst} > $(docdir)/$${tst} ; done ; \
fi
#!/bin/bash
DIR=../../vsearch-data/simm
DB=$DIR/simm_sp.fa
THREADS=0
UCHIME=$(which uchime)
USEARCH=$(which usearch)
VSEARCH=../bin/vsearch
echo Evaluation of uchime_ref
for t in - m1 m2 m3 m4 m5 i1 i2 i3 i4 i5; do
echo
echo Running programs on dataset $t
echo
if [ "$t" == "-" ]; then
INPUT=$DIR/simm.fa
else
INPUT=$DIR/simm.$t.fa
fi
$UCHIME --input $INPUT --db $DB --uchimeout o.$t.uchimeout --minh 0.28 --mindiv 0.8 ; grep Y$ o.$t.uchimeout | cut -f2 > o.$t.chimeras
$VSEARCH --uchime_ref $INPUT --db $DB --chimeras v.$t.chimeras
$USEARCH --uchime_ref $INPUT --db $DB --strand plus --chimeras u.$t.chimeras
done
echo
echo -e "\t__________m=2__________\t__________m=3__________\t__________m=4__________"
echo -ne "Div/Evo\tUSEARCH\tUCHIME\tVSEARCH"
echo -ne "\tUSEARCH\tUCHIME\tVSEARCH"
echo -e "\tUSEARCH\tUCHIME\tVSEARCH"
for r in 97_99 95_97 90_95; do
for t in - i1 i2 i3 i4 i5 m1 m2 m3 m4 m5; do
EXT="$t.chimeras"
echo -ne "$r$t"
for m in 2 3 4; do
echo -ne "\t$(grep -c _m${m}_${r} u.$EXT)"
echo -ne "\t$(grep -c _m${m}_${r} o.$EXT)"
echo -ne "\t$(grep -c _m${m}_${r} v.$EXT)"
done
echo
done
echo
done
rm -f o.* u.* v.*
echo Done
__________m=2__________ __________m=3__________ __________m=4__________
Div/Evo USEARCH UCHIME VSEARCH USEARCH UCHIME VSEARCH USEARCH UCHIME VSEARCH
97_99- 88 89 88 52 56 54 33 38 38
97_99i1 79 79 83 44 46 53 27 32 37
97_99i2 57 64 75 32 33 53 20 24 35
97_99i3 45 48 68 35 37 45 17 16 26
97_99i4 24 29 63 11 18 36 9 9 27
97_99i5 22 27 53 12 15 38 8 7 19
97_99m1 83 83 81 48 53 49 29 33 36
97_99m2 71 73 72 44 49 49 22 28 28
97_99m3 66 66 67 40 40 40 20 21 23
97_99m4 54 55 59 24 28 27 18 21 24
97_99m5 44 44 47 19 20 25 14 16 16
95_97- 100 100 99 77 80 78 60 64 61
95_97i1 98 100 99 75 77 75 55 54 60
95_97i2 94 96 100 55 60 70 44 48 57
95_97i3 82 86 94 50 61 68 36 38 53
95_97i4 66 75 94 41 48 64 29 29 46
95_97i5 58 64 85 32 37 62 19 24 47
95_97m1 99 99 99 73 76 74 57 60 59
95_97m2 97 98 97 69 71 70 48 50 48
95_97m3 94 93 94 61 63 62 41 41 40
95_97m4 92 92 91 55 56 57 39 39 41
95_97m5 86 86 85 51 53 50 35 35 32
90_95- 100 100 100 93 93 93 88 88 88
90_95i1 100 100 100 88 88 91 86 86 88
90_95i2 97 99 99 79 83 88 72 74 85
90_95i3 100 100 100 76 79 89 69 74 83
90_95i4 94 99 100 71 80 85 62 66 80
90_95i5 84 95 100 65 74 87 48 55 72
90_95m1 100 100 100 89 89 93 87 87 84
90_95m2 100 100 100 87 87 90 78 78 80
90_95m3 99 100 100 86 86 91 76 76 79
90_95m4 100 100 100 82 82 84 73 73 76
90_95m5 98 99 99 81 82 85 73 75 78
#!/bin/bash
P=$1
SEED=1
THREADS=0
DUPLICATES=100
DIR=.
DB=../../vsearch-data/Rfam_11_0.fasta
ID=0.5
VSEARCH=../bin/vsearch
if [ "$P" == "u" ]; then
PROG=$(which usearch)
else
if [ "$P" == "v" ]; then
PROG=$VSEARCH
else
echo You must specify u or v as first argument
exit
fi
fi
echo Creating random test set
$VSEARCH --shuffle $DB --output $DIR/temp.fsa --randseed $SEED > /dev/null 2> /dev/null
./select.pl $DIR/temp.fsa $DIR/q.fsa $DIR/db.fsa
cat q.fsa > qq.fsa
for (( i=2; i <= $DUPLICATES; i++ )); do
cat q.fsa >> qq.fsa
done
echo
echo Running search
/usr/bin/time $PROG \
--usearch_global $DIR/qq.fsa \
--db $DIR/db.fsa \
--id $ID \
--maxaccepts 1 \
--maxrejects 32 \
--strand plus \
--threads $THREADS \
--userout $DIR/userout.$P.txt \
--userfields query+target+id+qcov
# --maxhits 1
# --iddef 1
# --fulldp
# --minseqlength 1
echo
echo Results
./stats.pl $(grep -c "^>" $DIR/qq.fsa) $DIR/userout.$P.txt
rm temp.fsa q.fsa qq.fsa db.fsa userout.$P.txt
#!/usr/bin/perl -w
# Print the first representative of each Rfam familiy member
# with more than one member to one file.
# Print the others to another file.
use strict;
use warnings;
my $rfam;
my %count = ();
die "Not enough arguments" if (scalar @ARGV < 3);
my ($input, $first, $second) = @ARGV;
# first pass, just count
open I, "$input";
while(<I>)
{
if (/[>;](RF\d+);/)
{
$rfam = $1;
if (defined $count{$rfam})
{
$count{$rfam}++;
}
else
{
$count{$rfam} = 1;
}
}
}
close I;
# second pass, print
open I, "$input";
open F, ">$first";
open S, ">$second";
my $select = 0;
while(<I>)
{
if (/[>;](RF\d+);/)
{
$rfam = $1;
if ($count{$rfam} > 1)
{
$select = 1;
$count{$rfam} = 0;
}
else
{
$select = 0;
}
}
if ($select)
{
print F;
}
else
{
print S;
}
}
close S;
close F;
close I;
#!/usr/bin/perl -w
use strict;
use warnings;
my $truepos = 0;
my $falsepos = 0;
my $matched = 0;
my ($gold, $fn) = @ARGV;
open I, "$fn";
while (<I>)
{
chomp;
my @col = split /\t/;
$col[0] =~ /(^|;)(RF\d+);/;
my $x = $2;
$col[1] =~ /(^|;)(RF\d+);/;
my $y = $2;
if (($col[2] >= 75.0) && ($col[3] >= 90))
{
$matched++;
}
if ($x eq $y)
{
$truepos++;
}
else
{
$falsepos++;
}
}
close I;
my $pos = $truepos + $falsepos;
my $falseneg = $gold - $truepos;
my $prec = 100.0 * $truepos / $pos;
my $fdr = 100.0 * $falsepos / $pos;
my $recall = 100.0 * $truepos / $gold;
my $fnr = 100.0 * $falseneg / $gold;
my $fscore = 2.0 * ($prec * $recall) / ($prec + $recall);
my $matchedrate = 100.0 * $matched / $gold;
printf "Total queries: %7d\n", $gold;
printf "True positives: %7d\n", $truepos;
printf "False positives: %7d\n", $falsepos;
printf "False negatives: %7d\n", $falseneg;
printf "\n";
printf "Recall: %10.2f%%\n", $recall;
printf "Precision: %10.2f%%\n", $prec;
printf "False negative rate: %10.2f%%\n", $fnr;
printf "False discovery rate: %10.2f%%\n", $fdr;
printf "F-score: %10.2f%%\n", $fscore;
printf "\n";
printf "Matched (id>=70%%;cov>=90%%): %7d\n", $matched;
printf "Matched percentage: %10.2f%%\n", $matchedrate;
This diff is collapsed.
......@@ -32,6 +32,7 @@ fasta.h \
fastq.h \
fastqops.h \
fastx.h \
kmerhash.h \
linmemalign.h \
maps.h \
mask.h \
......@@ -51,6 +52,7 @@ shuffle.h \
sortbylength.h \
sortbysize.h \
subsample.h \
udb.h \
unique.h \
userfields.h \
util.h \
......@@ -107,6 +109,7 @@ fasta.cc \
fastq.cc \
fastqops.cc \
fastx.cc \
kmerhash.cc \
linmemalign.cc \
maps.cc \
mask.cc \
......@@ -126,6 +129,7 @@ shuffle.cc \
sortbylength.cc \
sortbysize.cc \
subsample.cc \
udb.cc \
unique.cc \
userfields.cc \
util.cc \
......
......@@ -60,122 +60,108 @@
#include "vsearch.h"
abundance_t * abundance_init(void)
bool header_find_attribute(const char * header,
int header_length,
const char * attribute,
int * start,
int * end)
{
abundance_t * a = (abundance_t *) xmalloc(sizeof(abundance_t));
if (regcomp(&a->regex, "(^|;)size=([0-9]+)(;|$)", REG_EXTENDED))
fatal("Compilation of regular expression for abundance annotation failed");
return a;
}
/*
Identify the first occurence of the pattern (^|;)size=([0-9]+)(;|$)
in the header string, where "size=" is the specified attribute.
*/
void abundance_exit(abundance_t * a)
{
regfree(&a->regex);
xfree(a);
}
const char * digit_chars = "0123456789";
int64_t abundance_get(abundance_t * a, char * header)
{
/* read size/abundance annotation */
if ((! header) || (! attribute))
return false;
int64_t abundance = 1;
regmatch_t pmatch[4];
int hlen = header_length;
int alen = strlen(attribute);
if (!regexec(&a->regex, header, 4, pmatch, 0))
int i = 0;
while (i < hlen - alen)
{
int64_t number = atol(header + pmatch[2].rm_so);
if (number > 0)
abundance = number;
else
fatal("Invalid (zero) abundance annotation in fasta header");
}
return abundance;
}
char * r = (char *) strstr(header + i, attribute);
void abundance_fprint_header_with_size(abundance_t * a,
FILE * fp,
char * header,
int header_length,
uint64_t size)
/* no match */
if (r == NULL)
break;
i = r - header;
/* check for ';' in front */
if ((i > 0) && (header[i-1] != ';'))
{
/* remove any previous size annotation */
/* regexp search for "(^|;)(\d+)(;|$)" */
/* replace by ';' if not at either end */
i += alen + 1;
continue;
}
regmatch_t pmatch[1];
int digits = (int) strspn(header + i + alen, digit_chars);
if (!regexec(&a->regex, header, 1, pmatch, 0))
/* check for at least one digit */
if (digits == 0)
{
int pat_start = pmatch[0].rm_so;
int pat_end = pmatch[0].rm_eo;
fprintf(fp,
"%.*s%s%.*s%ssize=%" PRIu64 ";",
pat_start, header,
(pat_start > 0 ? ";" : ""),
header_length - pat_end, header + pat_end,
(((pat_end < header_length) &&
(header[header_length - 1] != ';')) ? ";" : ""),
size);
i += alen + 1;
continue;
}
else
/* check for ';' after */
if ((i + alen + digits < hlen) && (header[i + alen + digits] != ';'))
{
fprintf(fp,
"%s%ssize=%" PRIu64 ";",
header,
(((header_length == 0) ||
(header[header_length - 1] != ';')) ? ";" : ""),
size);
i += alen + digits + 2;
continue;
}
/* ok */
* start = i;
* end = i + alen + digits;
return true;
}
return false;
}
void abundance_fprint_header_strip_size(abundance_t * a,
FILE * fp,
char * header,
int header_length)
int64_t abundance_get(char * header, int header_length)
{
regmatch_t pmatch[1];
if (!regexec(&a->regex, header, 1, pmatch, 0))
/* read size/abundance annotation */
int64_t abundance = 1;
int start = 0;
int end = 0;
if (header_find_attribute(header, header_length, "size=", & start, & end))
{
int pat_start = pmatch[0].rm_so;
int pat_end = pmatch[0].rm_eo;
fprintf(fp,
"%.*s%s%.*s",
pat_start, header,
((pat_start > 0) && (pat_end < header_length)) ? ";" : "",
header_length - pat_end, header + pat_end);
}
int64_t number = atol(header + start + 5);
if (number > 0)
abundance = number;
else
fprintf(fp, "%s", header);
fatal("Invalid (zero) abundance annotation in FASTA file header");
}
return abundance;
}
char * abundance_strip_size(abundance_t * a,
void abundance_fprint_header_strip_size(FILE * fp,
char * header,
int header_length)
{
int ret;
char * temp = 0;
regmatch_t pmatch[1];
if (!regexec(&a->regex, header, 1, pmatch, 0))
int start = 0;
int end = 0;
if (header_find_attribute(header, header_length, "size=", & start, & end))
{
if (start <= 1)
{
int pat_start = pmatch[0].rm_so;
int pat_end = pmatch[0].rm_eo;
ret = xsprintf(&temp,
"%.*s%s%.*s",
pat_start, header,
((pat_start > 0) && (pat_end < header_length)) ? ";" : "",
header_length - pat_end, header + pat_end);
if (end < header_length)
fprintf(fp, "%s", header + end + 1);
}
else
ret = xsprintf(&temp, "%s", header);
if (ret == -1)
fatal("Out of memory");
return temp;
{
if (end == header_length)
fprintf(fp, "%.*s", start - 1, header);
else
fprintf(fp, "%.*s;%.*s",
start - 1, header,
header_length - end - 1, header + end + 1);
}
}
else
fprintf(fp, "%s", header);
}
......@@ -58,28 +58,16 @@
*/
typedef struct abundance_s
{
regex_t regex;
} abundance_t;
int64_t abundance_get(char * header, int header_length);
abundance_t * abundance_init(void);
void abundance_exit(abundance_t * a);
int64_t abundance_get(abundance_t * a, char * header);
void abundance_fprint_header_with_size(abundance_t * a,
FILE * fp,
void abundance_fprint_header_with_size(FILE * fp,
char * header,
int header_length,
uint64_t size);
void abundance_fprint_header_strip_size(abundance_t * a,
FILE * fp,
void abundance_fprint_header_strip_size(FILE * fp,
char * header,
int header_length);
char * abundance_strip_size(abundance_t * a,
char * header,
char * abundance_strip_size(char * header,
int header_length);
......@@ -891,10 +891,10 @@ void backtrack16(s16info_s * s,
uint64_t qlen = s->qlen;
char * qseq = s->qseq;
uint64_t maskup = 3UL << (2*channel+ 0);
uint64_t maskleft = 3UL << (2*channel+16);
uint64_t maskextup = 3UL << (2*channel+32);
uint64_t maskextleft = 3UL << (2*channel+48);
uint64_t maskup = 3ULL << (2*channel+ 0);
uint64_t maskleft = 3ULL << (2*channel+16);
uint64_t maskextup = 3ULL << (2*channel+32);
uint64_t maskextleft = 3ULL << (2*channel+48);
#if 0
......
......@@ -81,6 +81,9 @@ static FILE * fp_fastapairs = 0;
static FILE * fp_matched = 0;
static FILE * fp_notmatched = 0;
static int count_matched = 0;
static int count_notmatched = 0;
inline int allpairs_hit_compare_typed(struct hit * x, struct hit * y)
{
// high id, then low id
......@@ -213,23 +216,31 @@ void allpairs_output_results(int hit_count,
if (hit_count)
{
count_matched++;
if (opt_matched)
{
fasta_print(fp_matched,
query_head,
fasta_print_general(fp_matched,
0,
qsequence,
qseqlen);
}
qseqlen,
query_head,
strlen(query_head),
0,
count_matched,
-1, -1, 0, 0.0);
}
else
{
count_notmatched++;
if (opt_notmatched)
{
fasta_print(fp_notmatched,
query_head,
fasta_print_general(fp_notmatched,
0,
qsequence,
qseqlen);
}
qseqlen,
query_head,
strlen(query_head),
0,
count_notmatched,
-1, -1, 0, 0.0);
}
}
......@@ -534,7 +545,7 @@ void allpairs_global(char * cmdline, char * progheader)
if (opt_alnout)
{
fp_alnout = fopen(opt_alnout, "w");
fp_alnout = fopen_output(opt_alnout);
if (! fp_alnout)
fatal("Unable to open alignment output file for writing");
......@@ -544,49 +555,49 @@ void allpairs_global(char * cmdline, char * progheader)
if (opt_samout)
{
fp_samout = fopen(opt_samout, "w");
fp_samout = fopen_output(opt_samout);
if (! fp_samout)
fatal("Unable to open SAM output file for writing");
}
if (opt_userout)
{
fp_userout = fopen(opt_userout, "w");
fp_userout = fopen_output(opt_userout);
if (! fp_userout)
fatal("Unable to open user-defined output file for writing");
}
if (opt_blast6out)
{
fp_blast6out = fopen(opt_blast6out, "w");
fp_blast6out = fopen_output(opt_blast6out);
if (! fp_blast6out)
fatal("Unable to open blast6-like output file for writing");
}
if (opt_uc)
{
fp_uc = fopen(opt_uc, "w");
fp_uc = fopen_output(opt_uc);
if (! fp_uc)
fatal("Unable to open uc output file for writing");
}
if (opt_fastapairs)
{
fp_fastapairs = fopen(opt_fastapairs, "w");
fp_fastapairs = fopen_output(opt_fastapairs);
if (! fp_fastapairs)
fatal("Unable to open fastapairs output file for writing");
}
if (opt_matched)
{
fp_matched = fopen(opt_matched, "w");
fp_matched = fopen_output(opt_matched);
if (! fp_matched)
fatal("Unable to open matched output file for writing");
}
if (opt_notmatched)
{
fp_notmatched = fopen(opt_notmatched, "w");
fp_notmatched = fopen_output(opt_notmatched);
if (! fp_notmatched)
fatal("Unable to open notmatched output file for writing");
}
......
......@@ -241,4 +241,60 @@ void xfree(void * ptr)
fatal("Trying to free a null pointer");
}
int xfstat(int fd, xstat_t * buf)
{
#ifdef _WIN32
return _fstat64(fd, buf);
#else
return fstat(fd, buf);
#endif
}
int xstat(const char * path, xstat_t * buf)
{
#ifdef _WIN32
return _stat64(path, buf);
#else
return stat(path, buf);
#endif
}
uint64_t xlseek(int fd, uint64_t offset, int whence)
{
#ifdef _WIN32
return _lseeki64(fd, offset, whence);
#else
return lseek(fd, offset, whence);
#endif
}
uint64_t xftello(FILE * stream)
{
#ifdef _WIN32
return _ftelli64(stream);
#else
return ftello(stream);
#endif
}
int xopen_read(const char * path)
{
#ifdef _WIN32
return _open(path, _O_RDONLY | _O_BINARY);
#else
return open(path, O_RDONLY);
#endif
}
int xopen_write(const char * path)
{
#ifdef _WIN32
return _open(path,
_O_WRONLY | _O_CREAT | _O_TRUNC | _O_BINARY,
_S_IREAD | _S_IWRITE);
#else
return open(path,
O_WRONLY | O_CREAT | O_TRUNC,
S_IRUSR | S_IWUSR);
#endif
}
......@@ -58,6 +58,12 @@
*/
#ifdef _WIN32
typedef struct __stat64 xstat_t;
#else
typedef struct stat xstat_t;
#endif
uint64_t arch_get_memused();
uint64_t arch_get_memtotal();
long arch_get_cores();
......@@ -67,3 +73,11 @@ uint64_t arch_random();
void * xmalloc(size_t size);
void * xrealloc(void * ptr, size_t size);
void xfree(void * ptr);
int xfstat(int fd, xstat_t * buf);
int xstat(const char * path, xstat_t * buf);
uint64_t xlseek(int fd, uint64_t offset, int whence);
uint64_t xftello(FILE * stream);
int xopen_read(const char * path);
int xopen_write(const char * path);