Commit 82d7c331 authored by Andreas Tille

Imported Upstream version 2.5.7

parent 05ef62fe
/win
.DS_Store
._*
*.d
*.db
*.o
*.pyc
/bin64
/build/BUILD.*
/build/COMP.mac
/centos
/ilib64
/lib64
/mac
/Makefile.config.*
/reconfigure
/schema
/test-bin64
/win
CONTENTS
Public Domain Notice
Exceptions (for bundled 3rd-party code)
Copyright F.A.Q.
==============================================================
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information
With the exception of certain third-party files summarized below, this
software is a "United States Government Work" under the terms of the
United States Copyright Act. It was written as part of the authors'
official duties as United States Government employees and thus cannot
be copyrighted. This software is freely available to the public for
use. The National Library of Medicine and the U.S. Government have not
placed any restriction on its use or reproduction.
Although all reasonable efforts have been taken to ensure the accuracy
and reliability of the software and data, the NLM and the U.S.
Government do not and cannot warrant the performance or results that
may be obtained by using this software or data. The NLM and the U.S.
Government disclaim all warranties, express or implied, including
warranties of performance, merchantability or fitness for any
particular purpose.
Please cite the authors in any work or product based on this material.
==============================================================
EXCEPTIONS (in all cases excluding NCBI-written makefiles):
See LICENSE from https://github.com/ncbi/ncbi-vdb
==============================================================
Copyright F.A.Q.
--------------------------------------------------------------
Q. Our product makes use of the NCBI source code, and we made changes
and additions to that version of the NCBI code to better fit it to
our needs. Can we copyright the code, and how?
A. You can copyright only the *changes* or the *additions* you made to the
NCBI source code. You should identify unambiguously those sections of
the code that were modified, e.g. by commenting any changes you made
in the code you distribute. Therefore, your license has to make clear
to users that your product is a combination of code that is public domain
within the U.S. (but may be subject to copyright by the U.S. in foreign
countries) and code that has been created or modified by you.
--------------------------------------------------------------
Q. Can we (re)license all or part of the NCBI source code?
A. No, you cannot license or relicense the source code written by NCBI
since you cannot claim any copyright in the software that was developed
at NCBI as a 'government work' and consequently is in the public domain
within the U.S.
--------------------------------------------------------------
Q. What if these copyright guidelines are not clear enough or are not
applicable to my particular case?
A. Contact us. Send your questions to 'sra-tools@ncbi.nlm.nih.gov'.
@@ -61,10 +61,15 @@ $(SUBDIRS_STD):
 #-------------------------------------------------------------------------------
 # install
 #
-install: std
-	$(MAKE) -s TOP=$(CURDIR) -f build/Makefile.install install
+install:
+	@ echo "Checking make status of tools..."
+	@ $(MAKE) -s --no-print-directory TOP=$(CURDIR) std
+	@ $(MAKE) -s TOP=$(CURDIR) -f build/Makefile.install install
-.PHONY: install
+uninstall:
+	@ $(MAKE) -s TOP=$(CURDIR) -f build/Makefile.install uninstall
+.PHONY: install uninstall
 #-------------------------------------------------------------------------------
 # clean
@@ -128,7 +133,7 @@ $(REPORTS):
 # configuration help
 #
 help configure:
-	@ echo "Before initial build, run 'make OUTDIR=<dir> out' from"
+	@ echo "Before initial build, run './configure --build-prefix=<out>' from"
 	@ echo "the project root to set the output directory of your builds."
 	@ echo
 	@ echo "To select a compiler, run 'make <comp>' where"
@@ -27,36 +27,29 @@ The NCBI SRA ( Sequence Read Archive )
 Contact: sra-tools@ncbi.nlm.nih.gov
-http://trace.ncbi.nlm.nih.gov/Traces/sra/std
-The SRA Toolkit and SDK from NCBI is a collection of tools and
-libraries for using data in the INSDC Sequence Read Archives.
-With this release, NCBI has implemented Compression by Reference, a
-sequence alignment compression process for storing sequence data.
-Currently BAM, Complete Genomics and Illumina export.txt formats
-contain alignment information. Compression by Reference only stores
-the difference in base pairs between sequence data and the segments it
-aligns to. The decompression process to restore original data such as
-fastq-dump would require fast access to the actual sequences of the
-references. NCBI recommends that SRA users dedicate local disk space
-to store local references downloaded from the NCBI SRA site. Linked
-references should be in a location accessible by the SRA Reader
-software.
-Older files in the NCBI system may not have been compressed using
-Compression by Reference. For more information on how to use
-Reference-based compressed files, download local references, and use
-related tools please refer to Compression by Reference file on the
-NCBI SRA website:
-"http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=softwareReadme"
-For additional information on using and building the toolkit,
-please visit our web site at:
-"http://trace.ncbi.nlm.nih.gov/Traces/sra/std"
+SRA Tools web site: http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc
+Download page: https://github.com/ncbi/sra-tools/wiki/Downloads
+The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for
+using data in the INSDC Sequence Read Archives.
+Much of the data submitted these days contain alignment information, for example
+in BAM, Illumina export.txt, and Complete Genomics formats. With aligned data,
+NCBI uses Compression by Reference, which only stores the differences in base
+pairs between sequence data and the segment it aligns to. The process to
+restore original data, for example as FastQ, requires fast access to the
+reference sequences that the original data was aligned to. NCBI recommends that
+SRA users dedicate local disk space to store references downloaded from the NCBI
+SRA site. As of February 2015, the complete collection of these reference sequences
+is 98 GB. While it isn't usually necessary to download the entirety of the
+reference sequences, this should give you an idea of the scale of the storage
+requirement. By default, the Toolkit will download missing reference sequences
+on demand and cache them in the user's home directory. The location of this
+cache is configurable, as is whether the download is automatic or manual.
+For additional information on using, configuring, and building the toolkit,
+please visit our wiki: https://github.com/ncbi/sra-tools/wiki
+or our web site at NCBI: http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc
 SRA Toolkit Development Team
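The README text above describes Compression by Reference as storing only the base-pair differences between a read and the reference segment it aligns to. The following is a toy sketch of that idea, not the SRA on-disk format or the Toolkit's implementation: it assumes substitutions only (no insertions or deletions), and the names compress_read and restore_read are hypothetical.

```python
def compress_read(read, reference, offset):
    """Store a read as (offset, length, differences from the reference).

    Toy model: the read is assumed to align to reference[offset:] with
    substitutions only; indels are not handled.
    """
    segment = reference[offset:offset + len(read)]
    if len(segment) != len(read):
        raise ValueError("read does not fit inside the reference at this offset")
    diffs = [(i, base) for i, (base, ref_base) in enumerate(zip(read, segment))
             if base != ref_base]
    return offset, len(read), diffs


def restore_read(record, reference):
    """Rebuild the original read; this needs access to the reference."""
    offset, length, diffs = record
    bases = list(reference[offset:offset + length])
    for i, base in diffs:
        bases[i] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTAC"          # made-up reference sequence
    record = compress_read("GTTC", reference, 2)
    print(record)
    print(restore_read(record, reference))
```

The sketch also shows why restoring reads requires fast access to the reference: the stored record is meaningless without the sequence it was aligned to.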
# ===========================================================================
#
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
#
# This software/database is a "United States Government Work" under the
# terms of the United States Copyright Act. It was written as part of
# the author's official duties as a United States Government employee and
# thus cannot be copyrighted. This software/database is freely available
# to the public for use. The National Library of Medicine and the U.S.
# Government have not placed any restriction on its use or reproduction.
#
# Although all reasonable efforts have been taken to ensure the accuracy
# and reliability of the software and data, the NLM and the U.S.
# Government do not and cannot warrant the performance or results that
# may be obtained by using this software or data. The NLM and the U.S.
# Government disclaim all warranties, express or implied, including
# warranties of performance, merchantability or fitness for any particular
# purpose.
#
# Please cite the author in any work or product based on this material.
#
# ===========================================================================
The NCBI SRA ( Sequence Read Archive ) SDK ( Software Development Kit )
Contact: sra-tools@ncbi.nlm.nih.gov
NOTICE:
Effective as of release 2.1.8, NCBI is no longer supporting public sources for
Windows builds. The sources will still contain everything that we use to build
the binaries, but may not build in your environment.
The reason for this change has to do with dependencies upon third party
libraries, which are commonly installed on other platforms, but could be
described as "hit and miss" on Windows. When the libraries DO exist, it is
difficult to know if they have a correct or compatible calling convention.
We will continue to distribute pre-built binaries for Windows. You are welcome
to try your luck at compiling the sources yourself under Cygwin.
ENVIRONMENT:
The Windows build uses the same makefiles as Linux and Mac, and has
been tested under Cygwin. You need to execute Cygwin AFTER sourcing
the Microsoft batch file from Visual Studio. We edit the
"cygwin.bat" file to first source "vsvars32.bat":
@echo off
call "%VS80COMNTOOLS%\vsvars32.bat"
C:
chdir C:\cygwin\bin
bash --login -i
By including vsvars32.bat before launching Cygwin, your bash shell
will have all of the Microsoft tools in its path, and the Microsoft
tools will know how to find includes and libraries.
There is a conflict between the Cygwin and Microsoft link tools. The
GNU version (that "avoids the bells and whistles of the more
commonly-used `ln' command" but otherwise duplicates it) is not used
or needed by the build, while the Microsoft version IS needed. To
address this issue, our linker scripts (build/ld.win.*.sh) reorder
the PATH directories in an attempt to ensure that the correct tool
is located. If this proves ineffective in your environment, try
renaming the GNU tool, e.g.:
$ mv /usr/bin/link.exe /usr/bin/gnu-link.exe
CYGWIN:
While we are using Cygwin as a build environment, the binaries are
NOT Cygwin binaries and do not link against any of their
libraries. As a result, there is no dependency upon their runtime
being present, but neither is there any of their emulation.
In particular, paths behave very differently. Our SDK is based upon
POSIX path conventions. Our Windows tools accept relative and
absolute paths well, in DOS, Cygwin or MinGW POSIX-style form. But
since we do not use Cygwin libraries, our tools have no idea of how
to interpret "full" paths within the Cygwin Unix root, e.g.:
$ cygpath -w /home
C:\cygwin\home
They CAN, however, interpret fully drive-letter qualified Cygwin
paths:
$ fastq-dump /cygdrive/C/cygwin/home/myname/SRRxxxxxx.sra
Internally, this Cygwin path will be interpreted as
C:\cygwin\home\myname\SRRxxxxxx.sra
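The mapping described above, from a drive-letter qualified Cygwin path to its Windows form, can be sketched as follows. This is only an illustration of the stated rule, not the SDK's own code; the function name cygdrive_to_windows is hypothetical, and upper-casing the drive letter is an assumption made for readability.

```python
import re

# Matches "/cygdrive/<letter>" optionally followed by "/rest/of/path".
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(/.*)?$")


def cygdrive_to_windows(path):
    """Return the Windows form of a /cygdrive path, or None otherwise.

    Paths inside the Cygwin Unix root (such as "/home") yield None,
    mirroring the limitation described in the text.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return None
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


if __name__ == "__main__":
    print(cygdrive_to_windows("/cygdrive/C/cygwin/home/myname/SRRxxxxxx.sra"))
    print(cygdrive_to_windows("/home"))
```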
WINDOWS PATHS IN GENERAL:
Windows is the only supported platform that does not present a
single-root unified file system, such as POSIX. In the POSIX model,
navigation from any one point in the file system to any other is a
matter of entering and exiting directories by name. Under Windows,
each drive has its own file system, and network paths form yet
another file system.
Like Cygwin and MinGW and probably countless other projects, we have
artificially bridged the drive letter file systems by introducing a
virtual root node whose immediate sub-directories are the
single-letter drive names that are currently mounted, e.g.:
C:, D: are real drives, while Z: is a mounted network drive
"/C" and "/D" and "/Z" are all subdirectories of "/"
navigation from C:\cygwin\home to D:\data
/C/cygwin/home/../../../D/data
We have NOT bridged drive letters and network paths at this
point. There is no way for us to get from C:\cygwin\home to
\\server\sradata for example.
Network paths are represented using POSIX-style slashes, but
otherwise look much like their Windows counterparts:
\\server\data
//server/data
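As a rough sketch of the conventions above, the function below resolves a path under the virtual root ("/C", "/D", ...) to a drive-letter path and rewrites a "//server/..." network path with backslashes. It is an illustration under stated assumptions, not the SDK's implementation: the name virtual_to_windows is hypothetical, and the two path spaces are deliberately kept separate, as the text says they are not bridged.

```python
import posixpath


def virtual_to_windows(path):
    """Translate a virtual-root or network POSIX-style path to Windows form."""
    if path.startswith("//"):
        # Network path: only the slashes change.
        return path.replace("/", "\\")
    # Collapse "." and ".." so that navigation across drives is resolved.
    parts = posixpath.normpath(path).split("/")   # "/D/data" -> ["", "D", "data"]
    if len(parts) < 2 or parts[0] != "":
        raise ValueError("expected an absolute path: %r" % path)
    drive = parts[1]
    if len(drive) != 1 or not (drive.isascii() and drive.isalpha()):
        raise ValueError("first component must be a drive letter: %r" % path)
    return drive.upper() + ":\\" + "\\".join(parts[2:])


if __name__ == "__main__":
    print(virtual_to_windows("/C/cygwin/home/../../../D/data"))
    print(virtual_to_windows("//server/data"))
```

The first call walks up from C:\cygwin\home through the virtual root and down into D:\data, which is the navigation example given above.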
Because network and drive paths cannot be connected, we recommend
that you execute your tools completely within one space. If you are
running your tools from local or network mounted drive letters, you
should access data from local or network mounted drives. If you are
using network paths during operation, you should run your binaries
from a network path.
Note that the Windows tool "cmd.exe" does not support cd'ing to a
network path for similar reasons.
Your tools will execute from "cmd.exe" or from bash under Cygwin
since they are general Windows command line utilities.
The NCBI SRA ( Sequence Read Archive ) SDK ( Software Development Kit )
Contact: sra-tools@ncbi.nlm.nih.gov
http://trace.ncbi.nlm.nih.gov/Traces/sra/std
This version of the NCBI SRA SDK generates loading and dumping tools with
their respective libraries for building new and accessing existing runs.
It may be built with GCC, ICC or Microsoft VC++.
REQUIREMENTS:
This software release was designed to run under Linux, MacOSX and Windows
operating systems on Intel x86-compatible 32 and 64 bit architectures.
  ar        # tested with version 2.15.90
  bash      # certain scripts require bash
  make      # GNU make version 3.80 or later
  gcc, g++  # tested with 4.1.2, but should work with others
  libz      # version 1
  libbz2    # version 1
  libxml2   # tested with version 2.6.7 [Linux and Mac only]
If your system does NOT have libz or libbz2, or if the build fails due to
missing one of the expected libraries, try running "make all" which will
attempt to download the sources to libz and libbz2 and build them.
OPTIONS:
Specific versions of ICC are supported as an alternate compiler.
  icc, icpc  # tested with 11.0 (64-bit) and 10.1 (32-bit)
             # 32-bit 11.0 does not work
WINDOWS BUILD:
The Windows build uses the same makefiles as Linux and Mac, and has been tested
under Cygwin. You need to execute Cygwin AFTER sourcing the Microsoft batch file
from Visual Studio.
CONTENTS:
  CHANGES             # describes changes at pertinent levels
  Makefile            # drives configuration and sub-target builds
  README
  README-WINDOWS.txt
  USAGE
  build               # holds special makefiles and configuration
  interfaces          # contains module interfaces, schema, plus
                      # compiler and platform specific includes
  libs                # sdk library code
  tools               # toolkit code
  test                # testing code
CONFIGURATION:
There are three configurable parameters:
1) BUILD = 'debug', 'release' etc.
2) COMP = 'GCC' etc.
3) OUTDIR = <path-to-binaries-libs-objfiles>
The target architecture is chosen to match your build host. At this
time, only the Macintosh build will support cross-compilation. In the
instructions below, x86_64 is the assumed architecture. If your host
is i386 (32-bit), then you would substitute 32 for paths that contain
64.
BUILD INSTRUCTIONS:
## create output directories and symlinks for first time
$ OUTDIR=<path-to-output>
$ make OUTDIR="$OUTDIR" out
The path in OUTDIR MUST be a full path - relative paths may fail.
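Since OUTDIR must be a full path, a wrapper script can resolve it before calling make. The sketch below only builds the argument list for the 'make OUTDIR=... out' command shown above; the helper name make_out_command and the example directory "~/ncbi-out" are hypothetical.

```python
import os


def make_out_command(outdir):
    """Build the argument list for 'make OUTDIR=<full path> out'."""
    # Expand "~" and resolve relative paths against the current directory.
    full = os.path.abspath(os.path.expanduser(outdir))
    return ["make", "OUTDIR=" + full, "out"]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run() from the project root to execute it.
    print(" ".join(make_out_command("~/ncbi-out")))
```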