Commits on Source (8)
......@@ -8,7 +8,7 @@ cmake_minimum_required (VERSION 2.6)
# The default version number is the latest official build
SET (gatb-tool_VERSION_MAJOR 3)
SET (gatb-tool_VERSION_MINOR 2)
SET (gatb-tool_VERSION_PATCH 0)
SET (gatb-tool_VERSION_PATCH 1)
# But, it is possible to define another release number during a local build
IF (DEFINED MAJOR)
......@@ -84,6 +84,8 @@ link_directories (${gatb-core-extra-libraries-path})
set (PROGRAM_SOURCE_DIR ${PROJECT_SOURCE_DIR}/src)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
cmake_policy(SET CMP0009 NEW) # fixes cmake complaining about symlinks
include_directories (${PROGRAM_SOURCE_DIR})
file (GLOB_RECURSE ProjectFiles ${PROGRAM_SOURCE_DIR}/*)
add_executable(${PROJECT_NAME} ${ProjectFiles})
......
......@@ -2,24 +2,27 @@
[![License](http://img.shields.io/:license-affero-blue.svg)](http://www.gnu.org/licenses/agpl-3.0.en.html)
<!---
| **Linux** | **Mac OSX** |
|-----------|-------------|
[![Build Status](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-debian7-64bits-gcc-4.7/badge/icon)](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-debian7-64bits-gcc-4.7/) | [![Build Status](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-macos-10.9.5-gcc-4.2.1/badge/icon)](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-macos-10.9.5-gcc-4.2.1/)
--->
# Before continuing..
# What is Minia ?
If you are looking to do high-quality genome or metagenome assemblies, please go here: https://github.com/GATB/gatb-minia-pipeline. This is a pipeline built on top of Minia that uses an approach similar to metaSPAdes and MEGAHIT (multi-k assembly).
Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).
# Introduction
# Getting the latest source code
Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Back when it was released, Minia produced results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet). Since then (2015 onwards), genome assemblers have evolved; to obtain high contiguity, see the previous section.
## Requirements
# Getting the latest source code
CMake 2.6+; see http://www.cmake.org/cmake/resources/software.html
## Instructions
C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
It is recommended to download the latest binary release (Linux or OSX) from: https://github.com/GATB/minia/releases
## Instructions
Otherwise, Minia may be compiled from sources as follows:
# get a local copy of minia source code
git clone --recursive https://github.com/GATB/minia.git
......@@ -28,6 +31,13 @@ C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
cd minia
sh INSTALL
## Requirements
CMake 3.10+; see http://www.cmake.org/cmake/resources/software.html
C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
# User manual
Type `minia` without any arguments for usage instructions.
......
minia (3.2.1-2) UNRELEASED; urgency=medium
minia (3.2.1+git20191130.5b131b9-1) UNRELEASED; urgency=medium
Not uploaded! Just verified that building against
gatb-core 1.4.1+git20191130.664696c+dfsg is OK!
[ Andreas Tille ]
* New upstream git commit since released version is not compatible
with gatb-core 1.4.1+git20191130.664696c+dfsg
* Test-Depends: bandage
* debhelper-compat 12
* Standards-Version: 4.4.1
* Trim trailing whitespace.
* Set upstream metadata fields: Repository, Repository-Browse.
[ Andrius Merkys ]
* Adding missing copyright details for files under thirdparty/contig2fastg/.
-- Andrius Merkys <merkys@debian.org> Wed, 13 Nov 2019 07:06:30 -0500
-- Andreas Tille <tille@debian.org> Thu, 05 Dec 2019 15:25:30 +0100
minia (3.2.1-1) unstable; urgency=medium
......@@ -88,4 +97,3 @@ minia (1.6067+dfsg-1) unstable; urgency=medium
* First debian package (Closes: #735158).
-- Olivier Sallou <osallou@debian.org> Sat, 21 Dec 2013 16:55:29 +0100
......@@ -4,14 +4,14 @@ Uploaders: Olivier Sallou <osallou@debian.org>,
Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 12~),
Build-Depends: debhelper-compat (= 12),
cmake,
bc,
zlib1g-dev,
libboost-dev,
libgatbcore-dev (>= 1.4.1+git20181225.44d5a44~),
libgatbcore-dev,
libhdf5-dev
Standards-Version: 4.3.0
Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/minia
Vcs-Git: https://salsa.debian.org/med-team/minia.git
Homepage: http://minia.genouest.org/
......
Reference:
Author: Rayan Chikhi and Guillaume Rizk
Title: >
Space-Efficient and Exact de Bruijn Graph Representation
Based on a Bloom Filter.
Journal: Algorithms for Molecular Biology
Year: 2013
Volume: 8
Number: 1
Pages: 22
DOI: 10.1186/1748-7188-8-22
PMID: 24040893
URL: http://www.almob.org/content/8/1/22
eprint: http://minia.genouest.org/files/minia.pdf
Author: Rayan Chikhi and Guillaume Rizk
Title: >
Space-Efficient and Exact de Bruijn Graph Representation
Based on a Bloom Filter.
Journal: Algorithms for Molecular Biology
Year: 2013
Volume: 8
Number: 1
Pages: 22
DOI: 10.1186/1748-7188-8-22
PMID: 24040893
URL: http://www.almob.org/content/8/1/22
eprint: http://minia.genouest.org/files/minia.pdf
Registry:
- Name: OMICtools
Entry: OMICS_00022
- Name: bio.tools
Entry: minia
- Name: SciCrunch
Entry: SCR_004986
- Name: OMICtools
Entry: OMICS_00022
- Name: bio.tools
Entry: minia
- Name: SciCrunch
Entry: SCR_004986
Repository: https://github.com/GATB/minia
Repository-Browse: https://github.com/GATB/minia
version=4
https://github.com/GATB/minia/releases .*/archive/v(\d[\d.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
opts="mode=git,pretty=3.2.1+git%cd.%h" \
https://github.com/GATB/minia.git HEAD
# Released version is not compatible with gatb-core 1.4.1+git20191130.664696c+dfsg
# So stick to latest Git commit for the moment
# https://github.com/GATB/minia/releases .*/archive/v(\d[\d.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
......@@ -373,6 +373,65 @@ static bool maybe_merge(uint64_t packed, connections_index_t &connections_index,
return true;
}
static void
parse_unitig_header(string header, float& mean_abundance)
{
bool debug = false;
if (debug) std::cout << "parsing unitig links for " << header << std::endl;
std::stringstream stream(header);
while(1) {
string tok;
stream >> tok;
if(!stream)
break;
if (tok.size() < 3)
// that's the id, skip it
continue;
string field = tok.substr(0,2);
if (field == "km")
{
mean_abundance = atof(tok.substr(tok.find_last_of(':')+1).c_str());
//std::cout << "unitig " << header << " mean abundance " << mean_abundance << std::endl;
}
}
}
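The header parsing added above can be tried in isolation. The following is a minimal sketch, not Minia's code: the helper name `mean_abundance_from_header` and the `fallback` parameter are hypothetical, and the sample header layout (`km:f:<value>` among space-separated fields) is an assumption based on the `km` prefix and the last-colon rule visible in the diff. Unlike the original, it returns a value instead of writing through a reference, so a header without a `km` field yields a defined result.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Minia): returns the value carried by the
// first whitespace-separated token that starts with "km", or `fallback` when
// the header has no such token.
float mean_abundance_from_header(const std::string& header, float fallback = 0.0f)
{
    std::istringstream stream(header);
    std::string tok;
    while (stream >> tok)
    {
        // Short tokens (e.g. the sequence id) and unrelated fields are skipped.
        if (tok.size() < 3 || tok.compare(0, 2, "km") != 0)
            continue;
        // The value is whatever follows the last ':' (assumed form "km:f:12.5").
        const std::string value = tok.substr(tok.find_last_of(':') + 1);
        // strtof yields 0 on unparsable input instead of throwing.
        return std::strtof(value.c_str(), nullptr);
    }
    return fallback;
}
```

A caller would pass the FASTA comment of a unitig and use the returned value where the diff uses `mean_abundance`.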
void renumber_glue_file(string glue_filename, uint64_t nb_out_tigs)
{
{
std::ifstream infile(glue_filename);
std::ofstream outfile(glue_filename+".tmp");
std::string line;
uint64_t counter = 1;
while (std::getline(infile, line))
{
if (line[0] == '>')
{
size_t space_pos = line.find(' ');
/* // yolo
if (space_pos >= line.size())
{
std::cout << "error: no space in this glue file header (" << line << ") contact a developer." << std::endl;
exit(1);
}
*/
auto end_header = line.substr(space_pos);
string new_header = ">" + std::to_string(nb_out_tigs+counter) + end_header;
outfile << new_header << std::endl;
counter++;
}
else
outfile << line << std::endl;
}
} // closes files
file_copy(glue_filename+".tmp",glue_filename);
System::file().remove (glue_filename+".tmp");
}
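The renumbering step can also be sketched on in-memory streams, which makes it easy to check without touching files. This is an illustrative variant under stated assumptions, not the code from the commit: the names `renumber_headers` and `renumber_fasta` are hypothetical, and a header with no space is given an empty tail here, whereas the original leaves that case unchecked. To mirror the original numbering, `first_id` would be `nb_out_tigs + 1`.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream-based variant (not Minia's API): rewrites each FASTA
// header so that ids become first_id, first_id + 1, ... and keeps the rest of
// the header line unchanged. Sequence lines are copied through as-is.
void renumber_headers(std::istream& in, std::ostream& out, std::uint64_t first_id)
{
    std::string line;
    std::uint64_t next_id = first_id;
    while (std::getline(in, line))
    {
        if (!line.empty() && line[0] == '>')
        {
            const std::size_t space_pos = line.find(' ');
            // Keep everything from the first space onward; a header without a
            // space gets an empty tail instead of triggering an exception.
            const std::string tail =
                (space_pos == std::string::npos) ? std::string() : line.substr(space_pos);
            out << '>' << next_id++ << tail << '\n';
        }
        else
            out << line << '\n';
    }
}

// Convenience wrapper for small inputs held in memory.
std::string renumber_fasta(const std::string& fasta, std::uint64_t first_id)
{
    std::istringstream in(fasta);
    std::ostringstream out;
    renumber_headers(in, out, first_id);
    return out.str();
}
```

The same function works on `std::ifstream` and `std::ofstream`, so the temporary-file approach in the diff could call it unchanged.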
static void
extend_assembly_with_connections(const string assembly, int k, int nb_threads, bool verbose, connections_index_t &connections_index, connections_t &connections, BankFasta &out, BankFasta &glue)
......@@ -415,6 +474,13 @@ extend_assembly_with_connections(const string assembly, int k, int nb_threads, b
s.getData().setRef ((char*)seq.c_str(), seq.size());
s._comment = string(lmark?"1":"0")+string(rmark?"1":"0"); //We set the sequence comment.
s._comment += " ";
// add coverage information
float mean_abundance = 0; // stays 0 if the header carries no km field
parse_unitig_header(comment,mean_abundance);
uint nb_kmers = seq.size() - k + 1;
for (uint i = 0; i < nb_kmers; i++)
s._comment += std::to_string((uint)mean_abundance) + " ";
if (lmark || rmark)
glue.insert(s);
......@@ -438,7 +504,8 @@ void merci(int k, string reads, string assembly, int nb_threads, bool verbose)
string linked_assembly = assembly + ".linked";
file_copy(assembly, linked_assembly);
uint64_t nb_tigs = 0;
link_tigs<span>( linked_assembly, k, nb_threads, nb_tigs, verbose);
bool renumber_unitigs = true; // let's allow the input to be anything. Here it doesn't matter much. We renumber at the end anyway
link_tigs<span>( linked_assembly, k, nb_threads, nb_tigs, verbose, renumber_unitigs);
// real trick here
// tigs of length exactly k are annoying, they need to be handled carefully with UNITIG_BOTH positions
......@@ -463,11 +530,20 @@ void merci(int k, string reads, string assembly, int nb_threads, bool verbose)
glue.flush();
// glue what needs to be glued. magic, we're re-using bcalm code
bglue<span> (nullptr /*no storage*/, assembly+".glue", k, 0, nb_threads, verbose);
bglue<span> (nullptr /*no storage*/, assembly+".glue", k, 0, nb_threads, false, verbose);
// renumber the .glue file just to avoid ID collision with .merci file
renumber_glue_file(assembly+".glue", nb_tigs );
// append glued to merci
out.flush();
file_append(assembly+".merci", assembly+".glue");
// bglue drops links so let's recreate them
k += 1;
file_copy(assembly+".merci", assembly+".merci.b4link");
renumber_unitigs = true; // here it's absolutely mandatory to renumber if we want the output to be processed by minia
link_tigs<span>( assembly+".merci", k, nb_threads, nb_tigs, verbose, false, renumber_unitigs);
}
class Merci : public gatb::core::tools::misc::impl::Tool
......
......@@ -154,8 +154,7 @@ struct MiniaFunctor { void operator () (Parameter parameter)
// link contigs
uint nb_threads = 1; // doesn't matter because for now link_tigs is single-threaded
bool verbose = true;
link_tigs<span>(output, minia.k, nb_threads, minia.nbContigs, verbose);
link_tigs<span>(output, minia.k, nb_threads, minia.nbContigs, verbose, false);
/** We gather some statistics. */
minia.getInfo()->add (1, minia.getTimeInfo().getProperties("time"));
......@@ -274,8 +273,8 @@ string Minia::assemble (/*const, removed because Simplifications isn't const any
graphSimplifications._bulgeLen_kAdd = getInput()->getDouble("-bulge-len-kadd");
if (getParser()->saw("-bulge-altpath-kadd"))
graphSimplifications._bulgeAltPath_kAdd = getInput()->getDouble("-bulge-altpath-kadd");
if (getParser()->saw("-bulge-altpath-covMult"))
graphSimplifications._bulgeAltPath_covMult = getInput()->getDouble("-bulge-altpath-covMult");
if (getParser()->saw("-bulge-altpath-covmult"))
graphSimplifications._bulgeAltPath_covMult = getInput()->getDouble("-bulge-altpath-covmult");
if (getParser()->saw("-ec-len-kmult"))
graphSimplifications._ecLen_kMult = getInput()->getDouble("-ec-len-kmult");
......
3732560f98d63897d2b7a122938d7a42 # osx CI
037b126f9e37db1db55d23eadc40477d # gcc 7 blok-bok
c6e5a2cf1b9c6246129ae4263da749cc # debian CI
e92d66d1e5b7450e6f6d8f6cc1de24bf # osx CI
3192031d3491f3488a210419c50b9d4d # gcc 7 blok-bok
dc556ec0e91c9aad6c1a68e48a2d8456 # debian CI
>works well for k=21; part of genome10K.fasta
CATCGATGCGAGACGCCTGTCGCGGGGAATTGTGGGGCGGACCACGCTCTGGCTAACGAGCTACCGTTTCCTTTAACCTGCCAGACGGTGACCAGGGCCGTTCGGCGTTGCATCGAGCGGTGTCGCTAGCGCAATGCGCAAGATTTTGACATTTACAAGGCAACATTGCAGCGTCCGATGGTCCGGTGGCCTCCAGATAGTGTCCAGTCGCTCTAACTGTATGGAGACCATAGGCATTTACCTTATTCTCATCGCCACGCCCCAAGATCTTTAGGACCCAGCATTCCTTTAACCACTAACATAACGCGTGTCATCTAGTTCAACAACC
>that's the bubble coverage 4
TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
>that's the bubble
TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
>that's the bubble
TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
>that's the bubble
TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
>that's the bubble path 2, coverage 2
TGTCATCTAGTTCAACAACCAAAAAAACGACTCTTGCGCTCGGATGT
>that's the bubble
TGTCATCTAGTTCAACAACCAAAAAAACGACTCTTGCGCTCGGATGT
>remaining part
CGACTCTTGCGCTCGGATGTCCGCAATGGGTTATCCCTATGTTCCGGTAATCTCTCATCTACTAAGCGCCCTAAAGGTCGTATGGTTGGAGGGCGGTTACACACCCTTAAGTACCGAACGATAGAGCACCCGTCTAGGAGGGCGTGCAGGGTCTCCCGCTAGCTAATGGTCACGGCCTCTCTGGGAAAGCTGAACAACGGATGATACCCATACTGCCACTCCAGTACCTGGGCCGCGTGTTGTACGCTGTGTATCTTGAGAGCGTTTCCAGCAGATAGAACAGGATCACATGTACAAA