Commits on Source (6)
cmake_minimum_required(VERSION 3.2)
project(spoa LANGUAGES CXX VERSION 1.1.3)
project(spoa LANGUAGES CXX VERSION 1.1.5)
include(GNUInstallDirs)
......
......@@ -4,7 +4,7 @@
[![Build status for c++/clang++](https://travis-ci.org/rvaser/spoa.svg?branch=master)](https://travis-ci.org/rvaser/spoa)
[![Published in Genome Research](https://img.shields.io/badge/published%20in-Genome%20Research-blue.svg)](https://doi.org/10.1101/gr.214270.116)
Spoa (SIMD POA) is a c++ implementation of the partial order alignment (POA) algorithm (as described in 10.1093/bioinformatics/18.3.452) which is used to generate consensus sequences (as described in 10.1093/bioinformatics/btg109). It supports three alignment modes: local (Smith-Waterman), global (Needleman-Wunsch) and semi-global alignment (overlap). It supports Intel SSE4.1+ and AVX2 (marginally faster due to high latency shifts) vectorization.
Spoa (SIMD POA) is a c++ implementation of the partial order alignment (POA) algorithm (as described in 10.1093/bioinformatics/18.3.452) which is used to generate consensus sequences (as described in 10.1093/bioinformatics/btg109). It supports three alignment modes: local (Smith-Waterman), global (Needleman-Wunsch) and semi-global alignment (overlap). It supports Intel SSE4.1+ and AVX2 vectorization (marginally faster due to high latency shifts).
## Dependencies
......@@ -39,6 +39,8 @@ Optionally, you can run `sudo make install` to install spoa library (and executa
***Note***: if you omitted `--recursive` from `git clone`, run `git submodule init` and `git submodule update` before proceeding with compilation.
To build unit tests add `-Dspoa_build_tests=ON` while running `cmake`. After installation, an executable named `spoa_test` will be created in `build/bin`.
## Usage
Usage of spoa is as follows:
......@@ -46,7 +48,8 @@ Usage of spoa is as following:
spoa [options ...] <sequences>
<sequences>
input file in FASTA/FASTQ format containing sequences
input file in FASTA/FASTQ format (can be compressed with gzip)
containing sequences
options:
-m, --match <int>
......@@ -70,6 +73,8 @@ Usage of spoa is as following:
0 - consensus
1 - multiple sequence alignment
2 - 0 & 1
-d, --dot <file>
output file for the final POA graph in DOT format
--version
prints the version number
-h, --help
......
spoa (1.1.5-1) UNRELEASED; urgency=medium
* New upstream version
* debhelper 12
* Standards-Version: 4.3.0
* Upstream changed the ABI. For now we keep the broken scheme of setting
soversion = version; upstream was asked to introduce proper soversions:
https://github.com/rvaser/spoa/issues/14
-- Andreas Tille <tille@debian.org> Mon, 28 Jan 2019 20:04:58 +0100
spoa (1.1.3-4) unstable; urgency=medium
* Set architecture to amd64 since the code is not really portable
......
......@@ -3,14 +3,14 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.
Uploaders: Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 11~),
Build-Depends: debhelper (>= 12~),
cmake,
d-shlibs,
rename,
libbioparser-dev,
libgtest-dev,
zlib1g-dev
Standards-Version: 4.2.1
Standards-Version: 4.3.0
Vcs-Browser: https://salsa.debian.org/med-team/spoa
Vcs-Git: https://salsa.debian.org/med-team/spoa.git
Homepage: https://github.com/rvaser/spoa
......@@ -27,7 +27,7 @@ Description: SIMD partial order alignment tool
(Smith-Waterman), global (Needleman-Wunsch) and semi-global alignment
(overlap).
Package: libspoa1.1.3
Package: libspoa1.1.5
Architecture: amd64
Section: libs
Depends: ${shlibs:Depends},
......@@ -47,7 +47,7 @@ Architecture: amd64
Section: libdevel
Depends: ${shlibs:Depends},
${misc:Depends},
libspoa1.1.3 (= ${binary:Version})
libspoa1.1.5 (= ${binary:Version})
Description: SIMD partial order alignment library (development files)
Spoa (SIMD POA) is a c++ implementation of the partial order alignment
(POA) algorithm (as described in 10.1093/bioinformatics/18.3.452) which
......
libspoa.so.1.1.3 libspoa1.1.3 #MINVER#
libspoa.so.1.1.5 libspoa1.1.5 #MINVER#
_ZN4spoa11createGraphEv@Base 1.1.3
_ZN4spoa15AlignmentEngine25align_sequence_with_graphERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt10unique_ptrINS_5GraphESt14default_deleteISA_EE@Base 1.1.3
_ZN4spoa15AlignmentEngineC1ENS_13AlignmentTypeEaaa@Base 1.1.3
_ZN4spoa15AlignmentEngineC2ENS_13AlignmentTypeEaaa@Base 1.1.3
(optional)_ZN4spoa19SimdAlignmentEngine10initializeINS_14InstructionSetIiEEEEvPKcRKSt10unique_ptrINS_5GraphESt14default_deleteIS7_EEjjj@Base 1.1.3
(optional)_ZN4spoa19SimdAlignmentEngine10initializeINS_14InstructionSetIsEEEEvPKcRKSt10unique_ptrINS_5GraphESt14default_deleteIS7_EEjjj@Base 1.1.3
_ZN4spoa19SimdAlignmentEngine25align_sequence_with_graphEPKcjRKSt10unique_ptrINS_5GraphESt14default_deleteIS4_EE@Base 1.1.3
(optional)_ZN4spoa19SimdAlignmentEngine5alignINS_14InstructionSetIiEEEESt6vectorISt4pairIiiESaIS6_EEPKcjRKSt10unique_ptrINS_5GraphESt14default_deleteISC_EE@Base 1.1.3
(optional)_ZN4spoa19SimdAlignmentEngine5alignINS_14InstructionSetIsEEEESt6vectorISt4pairIiiESaIS6_EEPKcjRKSt10unique_ptrINS_5GraphESt14default_deleteISC_EE@Base 1.1.3
_ZN4spoa19SimdAlignmentEngine7reallocEjjj@Base 1.1.3
_ZN4spoa19SimdAlignmentEngine8preallocEjj@Base 1.1.3
_ZN4spoa19SimdAlignmentEngineC1ENS_13AlignmentTypeEaaa@Base 1.1.3
......@@ -60,12 +56,13 @@ libspoa.so.1.1.3 libspoa1.1.3 #MINVER#
_ZN4spoa5GraphD2Ev@Base 1.1.3
_ZNK4spoa4Node8coverageEv@Base 1.1.3
_ZNK4spoa4Node9successorERjj@Base 1.1.3
_ZNK4spoa5Graph14print_graphvizEv@Base 1.1.3
_ZNK4spoa5Graph16update_alignmentERSt6vectorISt4pairIiiESaIS3_EERKS1_IiSaIiEE@Base 1.1.3
_ZNK4spoa5Graph22extract_subgraph_nodesERSt6vectorIbSaIbEEjj@Base 1.1.3
_ZNK4spoa5Graph23is_topologically_sortedEv@Base 1.1.3
_ZNK4spoa5Graph38initialize_multiple_sequence_alignmentERSt6vectorIjSaIjEE@Base 1.1.3
_ZNK4spoa5Graph8subgraphEjjRSt6vectorIiSaIiEE@Base 1.1.3
_ZNK4spoa5Graph9print_dotERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@Base 1.1.5
_ZNKSt5ctypeIcE8do_widenEc@Base 1.1.5
_ZNSt10_HashtableIjjSaIjENSt8__detail9_IdentityESt8equal_toIjESt4hashIjENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIjLb0EEEm@Base 1.1.3
_ZNSt11_Deque_baseIjSaIjEE17_M_initialize_mapEm@Base 1.1.3
_ZNSt11_Deque_baseIjSaIjEED1Ev@Base 1.1.3
......@@ -85,17 +82,12 @@ libspoa.so.1.1.3 libspoa1.1.3 #MINVER#
_ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE17_M_realloc_insertIJRS5_EEEvN9__gnu_cxx17__normal_iteratorIPS5_S7_EEDpOT_@Base 1.1.3
_ZNSt6vectorISt10shared_ptrIN4spoa4EdgeEESaIS3_EE17_M_realloc_insertIJRS3_EEEvN9__gnu_cxx17__normal_iteratorIPS3_S5_EEDpOT_@Base 1.1.3
_ZNSt6vectorISt10unique_ptrIN4spoa4NodeESt14default_deleteIS2_EESaIS5_EE17_M_realloc_insertIJS5_EEEvN9__gnu_cxx17__normal_iteratorIPS5_S7_EEDpOT_@Base 1.1.3
(optional)_ZNSt6vectorISt4pairIiiESaIS1_EE12emplace_backIJRKjiEEEvDpOT_@Base 1.1.3
(optional)_ZNSt6vectorISt4pairIiiESaIS1_EE12emplace_backIJiRiEEEvDpOT_@Base 1.1.3
(optional)_ZNSt6vectorISt4pairIiiESaIS1_EE12emplace_backIJjiEEEvDpOT_@Base 1.1.3
_ZNSt6vectorISt4pairIiiESaIS1_EE17_M_realloc_insertIJjjEEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_@Base 1.1.3
_ZNSt6vectorIbSaIbEE14_M_fill_insertESt13_Bit_iteratormb@Base 1.1.3
_ZNSt6vectorIiSaIiEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPiS1_EEmRKi@Base 1.1.3
_ZNSt6vectorIjSaIjEE12emplace_backIJRKiEEEvDpOT_@Base 1.1.3
_ZNSt6vectorIjSaIjEE12emplace_backIJRiEEEvDpOT_@Base 1.1.3
_ZNSt6vectorIjSaIjEE12emplace_backIJRjEEEvDpOT_@Base 1.1.3
(optional)_ZNSt6vectorIjSaIjEE12emplace_backIJiEEEvDpOT_@Base 1.1.3
(optional)_ZNSt6vectorIjSaIjEE12emplace_backIJjEEEvDpOT_@Base 1.1.3
_ZNSt6vectorIjSaIjEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPjS1_EEmRKj@Base 1.1.3
_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJRKjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_@Base 1.1.3
_ZTIN4spoa15AlignmentEngineE@Base 1.1.3
......
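The symbols file above lists mangled Itanium C++ ABI names, which are hard to review by eye. As a minimal sketch (assuming GCC or Clang, whose `<cxxabi.h>` provides `abi::__cxa_demangle`), a small helper can turn an entry back into a readable signature. The helper name `demangle` is hypothetical and not part of spoa or the Debian tooling.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang specific; assumption: not available on MSVC
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

For instance, `demangle("_ZN4spoa11createGraphEv")` yields `spoa::createGraph()`, and the entry added for 1.1.5 demangles to a `spoa::Graph::print_dot(...) const` signature taking a `std::string` reference. On the command line, `c++filt` does the same job.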
Note: Patch is not activated, so as not to deviate too much from upstream
Description: Invent SOVERSION different from upstream version
Author: Andreas Tille <tille@debian.org>
Last-Update: Mon, 28 Jan 2019 20:04:58 +0100
Bug-Upstream: https://github.com/rvaser/spoa/issues/14
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -41,7 +41,7 @@ target_include_directories(spoa_static P
set_target_properties(spoa
PROPERTIES
VERSION ${spoa_VERSION}
- SOVERSION ${spoa_VERSION})
+ SOVERSION 2)
install(TARGETS spoa DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(TARGETS spoa_static DESTINATION ${CMAKE_INSTALL_LIBDIR})
use_debian_packaged_libs.patch
shared_and_static.patch
no_march.patch
# fix_soversion.patch
......@@ -78,7 +78,7 @@ public:
void update_alignment(Alignment& alignment,
const std::vector<int32_t>& subgraph_to_graph_mapping) const;
void print_graphviz() const;
void print_dot(const std::string& path) const;
friend std::unique_ptr<Graph> createGraph();
private:
......
......@@ -8,6 +8,7 @@
#include <assert.h>
#include <algorithm>
#include <stack>
#include <fstream>
#include "spoa/graph.hpp"
......@@ -679,7 +680,13 @@ void Graph::update_alignment(Alignment& alignment,
}
}
void Graph::print_graphviz() const {
void Graph::print_dot(const std::string& path) const {
if (path.empty()) {
return;
}
std::ofstream out(path);
std::vector<int32_t> in_consensus(nodes_.size(), -1);
int32_t rank = 0;
......@@ -687,30 +694,34 @@ void Graph::print_graphviz() const {
in_consensus[id] = rank++;
}
printf("digraph %u {\n", num_sequences_);
printf(" graph [rankdir=LR]\n");
out << "digraph " << num_sequences_ << " {" << std::endl;
out << " graph [rankdir=LR]" << std::endl;
for (uint32_t i = 0; i < nodes_.size(); ++i) {
printf(" %u [label = \"%u - %c\"", i, i, decoder_[nodes_[i]->code_]);
out << " " << i << " [label = \"" << i << " - ";
out << static_cast<char>(decoder_[nodes_[i]->code_]) << "\"";
if (in_consensus[i] != -1) {
printf(", style=filled, fillcolor=goldenrod1");
out << ", style=filled, fillcolor=goldenrod1";
}
printf("]\n");
out << "]" << std::endl;
for (const auto& edge: nodes_[i]->out_edges_) {
printf(" %u -> %u [label = \"%lu\"", i, edge->end_node_id_,
edge->total_weight_);
out << " " << i << " -> " << edge->end_node_id_;
out << " [label = \"" << edge->total_weight_ << "\"";
if (in_consensus[i] + 1 == in_consensus[edge->end_node_id_]) {
printf(", color=goldenrod1");
out << ", color=goldenrod1";
}
printf("]\n");
out << "]" << std::endl;
}
for (const auto& aid: nodes_[i]->aligned_nodes_ids_) {
if (aid > i) {
printf(" %u -> %u [style = dotted, arrowhead = none]\n", i, aid);
out << " " << i << " -> " << aid;
out << " [style = dotted, arrowhead = none]" << std::endl;
}
}
}
printf("}\n");
out << "}" << std::endl;
out.close();
}
}
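The hunk above replaces `printf` calls with stream output so the graph goes to a file instead of stdout. The shape of the emitted DOT text can be sketched on a toy graph as follows. This is an illustration only: `ToyEdge` and `write_dot` are hypothetical names, the graph ID `toy` is made up, and the sketch writes to any `std::ostream` (and uses `'\n'` rather than `std::endl`, which avoids a flush per line) rather than mirroring the upstream implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy edge: source node, target node, weight.
struct ToyEdge {
    std::uint32_t from;
    std::uint32_t to;
    std::uint64_t weight;
};

// Write a left-to-right DOT digraph: one labelled node per character in
// `labels`, followed by one weighted edge per entry in `edges`.
void write_dot(std::ostream& out, const std::string& labels,
               const std::vector<ToyEdge>& edges) {
    out << "digraph toy {\n";
    out << "    graph [rankdir=LR]\n";
    for (std::size_t i = 0; i < labels.size(); ++i) {
        out << "    " << i << " [label = \"" << i << " - " << labels[i] << "\"]\n";
    }
    for (const auto& edge : edges) {
        out << "    " << edge.from << " -> " << edge.to;
        out << " [label = \"" << edge.weight << "\"]\n";
    }
    out << "}\n";
}
```

Passing a `std::ofstream` writes the same text to a file, which Graphviz can then render, for example with `dot -Tpng`.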
......@@ -7,7 +7,7 @@
#include "spoa/spoa.hpp"
#include "bioparser/bioparser.hpp"
static const char* version = "v1.1.3";
static const char* version = "v1.1.5";
static struct option options[] = {
{"match", required_argument, 0, 'm'},
......@@ -15,6 +15,7 @@ static struct option options[] = {
{"gap", required_argument, 0, 'g'},
{"algorithm", required_argument, 0, 'l'},
{"result", required_argument, 0, 'r'},
{"dot", required_argument, 0, 'd'},
{"version", no_argument, 0, 'v'},
{"help", no_argument, 0, 'h'},
{0, 0, 0, 0}
......@@ -31,8 +32,10 @@ int main(int argc, char** argv) {
uint8_t algorithm = 0;
uint8_t result = 0;
std::string dot_path = "";
char opt;
while ((opt = getopt_long(argc, argv, "m:x:g:l:r:h", options, nullptr)) != -1) {
while ((opt = getopt_long(argc, argv, "m:x:g:l:r:d:h", options, nullptr)) != -1) {
switch (opt) {
case 'm':
match = atoi(optarg);
......@@ -49,6 +52,9 @@ int main(int argc, char** argv) {
case 'r':
result = atoi(optarg);
break;
case 'd':
dot_path = optarg;
break;
case 'v':
printf("%s\n", version);
exit(0);
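The option handling in this hunk can be reduced to a small stand-alone sketch that parses only `-d`/`--dot`. The function name `parse_dot_path` is hypothetical. One deliberate difference from the code shown in the diff: the return value of `getopt_long` is stored in an `int`, because on platforms where `char` is unsigned a `char` variable can never compare equal to `-1` and the loop would not terminate.

```cpp
#include <cassert>
#include <getopt.h>
#include <string>

// Hypothetical helper: return the argument of -d/--dot, or "" if absent.
std::string parse_dot_path(int argc, char** argv) {
    static const struct option long_options[] = {
        {"dot", required_argument, nullptr, 'd'},
        {nullptr, 0, nullptr, 0}
    };

    std::string dot_path;
    optind = 1;  // restart scanning so the function can be called repeatedly

    int opt;  // int, not char: getopt_long returns -1 when it is done
    while ((opt = getopt_long(argc, argv, "d:", long_options, nullptr)) != -1) {
        if (opt == 'd') {
            dot_path = optarg;
        }
    }
    return dot_path;
}
```

Both `spoa -d graph.dot reads.fa` and `spoa --dot=graph.dot reads.fa` would yield `graph.dot` here; without the option the result is empty, which matches the early return on an empty path in `print_dot`.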
......@@ -66,22 +72,30 @@ int main(int argc, char** argv) {
exit(1);
}
std::string input_path = argv[optind];
auto extension = input_path.substr(std::min(input_path.rfind('.'),
input_path.size()));
std::string sequences_path = argv[optind];
auto is_suffix = [](const std::string& src, const std::string& suffix) -> bool {
if (src.size() < suffix.size()) {
return false;
}
return src.compare(src.size() - suffix.size(), suffix.size(), suffix) == 0;
};
std::unique_ptr<bioparser::Parser<spoa::Sequence>> sparser = nullptr;
if (extension == ".fasta" || extension == ".fa") {
if (is_suffix(sequences_path, ".fasta") || is_suffix(sequences_path, ".fa") ||
is_suffix(sequences_path, ".fasta.gz") || is_suffix(sequences_path, ".fa.gz")) {
sparser = bioparser::createParser<bioparser::FastaParser, spoa::Sequence>(
input_path);
} else if (extension == ".fastq" || extension == ".fq") {
sequences_path);
} else if (is_suffix(sequences_path, ".fastq") || is_suffix(sequences_path, ".fq") ||
is_suffix(sequences_path, ".fastq.gz") || is_suffix(sequences_path, ".fq.gz")) {
sparser = bioparser::createParser<bioparser::FastqParser, spoa::Sequence>(
input_path);
sequences_path);
} else {
fprintf(stderr, "[spoa::] error: "
"file %s has unsupported format extension (valid extensions: "
".fasta, .fa, .fastq, .fq)!\n", input_path.c_str());
".fasta, .fasta.gz, .fa, .fa.gz, .fastq, .fastq.gz, .fq, .fq.gz)!\n",
sequences_path.c_str());
exit(1);
}
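The suffix test introduced above can be exercised in isolation. The sketch below lifts the `is_suffix` lambda into a free function and wraps the extension checks in a hypothetical `detect_format` helper returning a hypothetical `SequenceFormat` enum; neither name exists in spoa, and the parser construction is left out.

```cpp
#include <cassert>
#include <string>

// Same logic as the lambda in the diff: true if `src` ends with `suffix`.
bool is_suffix(const std::string& src, const std::string& suffix) {
    if (src.size() < suffix.size()) {
        return false;
    }
    return src.compare(src.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical result type for illustration.
enum class SequenceFormat { kFasta, kFastq, kUnknown };

// Hypothetical helper: classify a path by its (optionally gzipped) extension.
SequenceFormat detect_format(const std::string& path) {
    if (is_suffix(path, ".fasta") || is_suffix(path, ".fa") ||
        is_suffix(path, ".fasta.gz") || is_suffix(path, ".fa.gz")) {
        return SequenceFormat::kFasta;
    }
    if (is_suffix(path, ".fastq") || is_suffix(path, ".fq") ||
        is_suffix(path, ".fastq.gz") || is_suffix(path, ".fq.gz")) {
        return SequenceFormat::kFastq;
    }
    return SequenceFormat::kUnknown;
}
```

Matching on suffixes rather than on the text after the last `.` is what lets `reads.fa.gz` be recognized: the old `rfind('.')` approach would have seen only `.gz`.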
......@@ -120,6 +134,8 @@ int main(int argc, char** argv) {
}
}
graph->print_dot(dot_path);
return 0;
}
......@@ -128,7 +144,8 @@ void help() {
"usage: spoa [options ...] <sequences>\n"
"\n"
" <sequences>\n"
" input file in FASTA/FASTQ format containing sequences\n"
" input file in FASTA/FASTQ format (can be compressed with gzip)\n"
" containing sequences\n"
"\n"
" options:\n"
" -m, --match <int>\n"
......@@ -152,6 +169,8 @@ void help() {
" 0 - consensus\n"
" 1 - multiple sequence alignment\n"
" 2 - 0 & 1\n"
" -d, --dot <file>\n"
" output file for the final POA graph in DOT format\n"
" --version\n"
" prints the version number\n"
" -h, --help\n"
......