language: c++
os: linux
dist: bionic
compiler: gcc
before_install:
- sudo apt-get install -y valgrind
script:
- make
- export PATH=$PWD/bin:$PATH
- git clone https://github.com/frederic-mahe/swarm-tests.git && cd swarm-tests && bash ./run_all_tests.sh | tee tests.log && ! grep -q FAIL tests.log
# SWARM
#
# Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Contact: Torbjorn Rognes <torognes@ifi.uio.no>,
# Department of Informatics, University of Oslo,
# PO Box 1080 Blindern, NO-0316 Oslo, Norway
# Makefile for SWARM
PROG=bin/swarm
MAN=man/swarm.1
swarm : $(PROG)
$(PROG) :
make -C src swarm
install : $(PROG) $(MAN)
/usr/bin/install -c $(PROG) '/usr/local/bin'
/usr/bin/install -c $(MAN) '/usr/local/share/man/man1'
clean :
make -C src clean
[![Build Status](https://travis-ci.org/torognes/swarm.svg?branch=swarm3)](https://travis-ci.org/torognes/swarm)
# swarm
A robust and fast clustering method for amplicon-based studies.
......@@ -16,7 +18,18 @@ To help users, we describe
starting from raw fastq files, clustering with **swarm** and producing
a filtered OTU table.
swarm 2.0 introduces several novelties and improvements over swarm
swarm 3.0 introduces:
* a much faster default algorithm,
* a reduced memory footprint,
* binaries for Windows x86-64, GNU/Linux ARM 64, and GNU/Linux POWER8,
* updated, hardened, and thoroughly tested code.
Please note that:
* strict dereplication of input sequences is now mandatory,
* \-\-seeds option (\-w) now outputs results sorted by decreasing
abundance, and then by alphabetical order of sequence labels.
swarm 2.0 introduced several novelties and improvements over swarm
1.0:
* built-in breaking phase now performed automatically,
* possibility to output OTU representatives in fasta format (option
......@@ -24,13 +37,13 @@ swarm 2.0 introduces several novelties and improvements over swarm
* fast algorithm now used by default for *d* = 1 (linear time
complexity),
* a new option called *fastidious* that refines *d* = 1 results and
reduces the number of small OTUs,
reduces the number of small OTUs.
## Common misconceptions
**swarm** is a single-linkage clustering method, with some superficial
similarities with other clustering methods (e.g.,
[Huse et al, 2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
similarities with other clustering methods (e.g., [Huse et al,
2010](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909393/)). **swarm**'s
novelty is its iterative growth process and the use of sequence
abundance values to delineate OTUs. **swarm** properly delineates
large OTUs (high recall), and can distinguish OTUs with as little as
......@@ -76,18 +89,18 @@ cgtcgtcgtcgtcgt
where sequence identifiers are unique and end with a value indicating
the number of occurrences of the sequence (e.g., `_1000`). Alternative
format is possible with the option `-z`, please see the
[user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
format is possible with the option `-z`; please see the [user
manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf). Swarm
**requires** each fasta entry to present a number of occurrences to
work properly. That crucial information can be produced during the
[dereplication](#dereplication) step.
[dereplication](#dereplication-mandatory) step.
Use `swarm -h` to get a short help, or see the
[user manual](https://github.com/torognes/swarm/blob/master/man/swarm_manual.pdf)
for a complete description of input/output formats and command line
options.
The memory footprint of **swarm** is roughly 1.6 times the size of the
The memory footprint of **swarm** is roughly 0.6 times the size of the
input fasta file. When using the fastidious option, memory footprint
can increase significantly. See options `-c` and `-y` to control and
cap swarm's memory consumption.
......@@ -105,14 +118,14 @@ using the
```sh
git clone https://github.com/torognes/swarm.git
cd swarm/src/
cd swarm/
make
cd ../bin/
```
If you have administrator privileges, you can make **swarm**
accessible for all users. Simply copy the binary to `/usr/bin/`. The
man page can be installed this way:
accessible for all users. Simply copy the binary `./bin/swarm` to
`/usr/local/bin/` or to `/usr/bin/`. The man page can be installed
this way:
```sh
cd ./man/
......@@ -210,15 +223,10 @@ from two different sets have the same hash code, it means that the
sequences they represent are identical.
If for some reason your fasta entries don't have abundance values, and
you still want to run swarm, you can easily add fake abundance values:
```sh
sed '/^>/ s/$/_1/' amplicons.fasta > amplicons_with_abundances.fasta
```
Alternatively, you may specify a default abundance value with
**swarm**'s `--append-abundance` (`-a`) option to be used when
abundance information is missing from a sequence.
you still want to run swarm (not recommended), you can specify a
default abundance value with **swarm**'s `--append-abundance` (`-a`)
option to be used when abundance information is missing from a
sequence.
### Launch swarm ###
......@@ -305,15 +313,6 @@ rm "${AMPLICONS}"
```
## Troubleshooting ##
If **swarm** exits with an error message saying `This program
requires a processor with SSE2`, your computer is too old to run
**swarm** (or based on a non x86-64 architecture). **swarm** only runs
on CPUs with the SSE2 instructions, i.e. most Intel and AMD CPUs
released since 2004.
## Citation ##
To cite **swarm**, please refer to:
......@@ -333,7 +332,7 @@ You are welcome to:
* submit suggestions and bug-reports at: https://github.com/torognes/swarm/issues
* send a pull request on: https://github.com/torognes/swarm/
* compose a friendly e-mail to: Frédéric Mahé <mahe@rhrk.uni-kl.de> and Torbjørn Rognes <torognes@ifi.uio.no>
* compose a friendly e-mail to: Frédéric Mahé <frederic.mahe@cirad.fr> and Torbjørn Rognes <torognes@ifi.uio.no>
## Third-party pipelines ##
......@@ -356,14 +355,19 @@ You are welcome to:
If you want to try alternative free and open-source clustering
methods, here are some links:
* [VSEARCH](https://github.com/torognes/vsearch)
* [vsearch](https://github.com/torognes/vsearch)
* [Oligotyping](http://merenlab.org/projects/oligotyping/)
* [DNAclust](http://dnaclust.sourceforge.net/)
* [Sumaclust](http://metabarcoding.org/sumatra)
* [Crunchclust](https://code.google.com/p/crunchclust/)
## Version history##
## Version history ##
### version 3.0 ###
**swarm** 3.0 is much faster when _d_ = 1, and consumes less memory.
Strict dereplication is now mandatory.
### version 2.2.2 ###
......
swarm-cluster (3.0.0+dfsg-1) unstable; urgency=medium
* Replace python-markdown by markdown
Closes: #943281
* New upstream version
* debhelper-compat 12
* Standards-Version: 4.4.1
-- Andreas Tille <tille@debian.org> Tue, 17 Dec 2019 11:05:34 +0100
swarm-cluster (2.2.2+dfsg-2) unstable; urgency=medium
* Team upload.
......
......@@ -4,10 +4,10 @@ Uploaders: Tim Booth <tbooth@ceh.ac.uk>,
Andreas Tille <tille@debian.org>
Section: science
Priority: optional
Build-Depends: debhelper (>= 11~),
Build-Depends: debhelper-compat (= 12),
man-db,
python-markdown
Standards-Version: 4.1.4
markdown
Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/swarm-cluster
Vcs-Git: https://salsa.debian.org/med-team/swarm-cluster.git
Homepage: https://github.com/torognes/swarm
......
......@@ -2,15 +2,12 @@ Description: allow to override $CXX
Author: Sascha Steinbiss <satta@debian.org>
--- a/src/Makefile
+++ b/src/Makefile
@@ -28,9 +28,9 @@ COMMON=-g
LIBS=-lpthread
LINKFLAGS=$(COMMON) $(LDFLAGS)
@@ -57,7 +57,7 @@ COMMON=$(PROFILING) -g -flto -O3 $(ARCHO
-CXX=g++
+CXX?=g++
WARNINGS=-Wall -Wsign-compare -Wextra -Wpedantic -Wno-long-long
-CXXFLAGS=$(COMMON) $(WARNINGS) -O3 -msse2 -mtune=core2 -Icityhash
+CXXFLAGS+=$(COMMON) $(WARNINGS) -O3 -msse2 -mtune=core2 -Icityhash
LINKFLAGS=$(COMMON) $(LINKOPT) $(LDFLAGS)
-CXXFLAGS=$(COMMON) $(WARNINGS)
+CXXFLAGS+=$(COMMON) $(WARNINGS)
PROG=swarm
......@@ -4,26 +4,26 @@ Description: Propagate hardening options
--- a/src/Makefile
+++ b/src/Makefile
@@ -26,7 +26,7 @@
COMMON=-g
@@ -55,7 +55,7 @@ WARNINGS = -Wall -Wextra $(WARNOPT) \
LIBS=-lpthread
-LINKFLAGS=$(COMMON)
+LINKFLAGS=$(COMMON) $(LDFLAGS)
COMMON=$(PROFILING) -g -flto -O3 $(ARCHOPT)
CXX=g++
WARNINGS=-Wall -Wsign-compare -Wextra -Wpedantic -Wno-long-long
@@ -43,7 +43,7 @@ DEPS=Makefile swarm.h bitmap.h bloom.h c
-LINKFLAGS=$(COMMON) $(LINKOPT)
+LINKFLAGS=$(COMMON) $(LINKOPT) $(LDFLAGS)
CXXFLAGS=$(COMMON) $(WARNINGS)
@@ -72,7 +72,7 @@ DEPS=Makefile swarm.h city.h citycrc.h \
all : $(PROG)
swarm : $(OBJS)
swarm : $(OBJS) $(DEPS)
- $(CXX) $(LINKFLAGS) -o $@ $(OBJS) $(LIBS)
+ $(CXX) $(CPPFLAGS) $(LINKFLAGS) -o $@ $(OBJS) $(LIBS)
mkdir -p ../bin
cp -a swarm ../bin
@@ -51,4 +51,4 @@ clean :
rm -rf swarm *.o *~ ../bin/ gmon.out cityhash/*.o ../man/*~ ../*~
@@ -83,4 +83,4 @@ clean :
$(CXX) $(CXXFLAGS) -c -o $@ $<
ssse3.o : ssse3.cc $(DEPS)
- $(CXX) $(CXXFLAGS) -mssse3 -c -o $@ $<
......
hardening.patch
allow_cxx_override.patch
fix_gcc6.patch
# fix_gcc6.patch
......@@ -8,7 +8,7 @@ export DEB_BUILD_MAINT_OPTIONS = hardening=+all
override_dh_auto_build:
dh_auto_build
markdown_py -f README.html README.md
markdown README.md > README.html
override_dh_auto_clean:
dh_auto_clean
......
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Read all fasta files and build a sorted amplicon contingency
table. Usage: python amplicon_contingency_table.py samples_*.fas
table. Usage: python3 amplicon_contingency_table.py samples_*.fas
"""
from __future__ import print_function
__author__ = "Frédéric Mahé <mahe@rhrk.uni-kl.fr>"
__date__ = "2016/03/12"
__version__ = "$Revision: 2.1"
__author__ = "Frédéric Mahé <frederic.mahe@cirad.fr>"
__date__ = "2019/09/24"
__version__ = "$Revision: 3.0"
import os
import sys
......@@ -35,7 +33,7 @@ def fasta_parse():
sample = os.path.basename(fasta_file)
sample = os.path.splitext(sample)[0]
samples[sample] = samples.get(sample, 0) + 1
with open(fasta_file, "rU") as fasta_file:
with open(fasta_file, "r") as fasta_file:
for line in fasta_file:
if line.startswith(">"):
amplicon, abundance = line.strip(">;\n").split(separator)
......@@ -65,7 +63,7 @@ def main():
all_amplicons, amplicons2samples, samples = fasta_parse()
# Sort amplicons by decreasing abundance (and by amplicon name)
sorted_all_amplicons = sorted(all_amplicons.iteritems(),
sorted_all_amplicons = sorted(iter(all_amplicons.items()),
key=operator.itemgetter(1, 0))
sorted_all_amplicons.reverse()
......
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Visualize the internal structure of a swarm (color vertices by
abundance). Requires the module igraph and python 2.7+.
Limitations: amplicons grafted with the fastidious option will be
discarded and will not be visualized.
abundance). Requires the module igraph and python 3.
"""
from __future__ import print_function
__author__ = "Frédéric Mahé <mahe@rhrk.uni-kl.fr>"
__date__ = "2016/11/09"
__version__ = "$Revision: 3.1"
__author__ = "Frédéric Mahé <frederic.mahe@cirad.fr>"
__date__ = "2019/09/24"
__version__ = "$Revision: 4.0"
import sys
import os.path
from igraph import Graph, plot
from optparse import OptionParser
#*****************************************************************************#
# *************************************************************************** #
# #
# Functions #
# #
#*****************************************************************************#
# *************************************************************************** #
def option_parse():
......@@ -76,10 +71,11 @@ def parse_files(swarms, internal_structure, OTU, drop):
"""
# List amplicon ids and abundances
amplicons = list()
with open(swarms, "rU") as swarms:
with open(swarms, "r") as swarms:
for i, swarm in enumerate(swarms):
if i == OTU - 1:
# Deal with ";size=" in a rather clumsy way... but it works
print("Reading target OTU", file=sys.stdout)
amplicons = [
tuple(
item.replace(";size=", "_").rstrip(";").rsplit("_", 1))
......@@ -88,6 +84,7 @@ def parse_files(swarms, internal_structure, OTU, drop):
# Drop amplicons with a low abundance (remove connections too)
if drop:
print("Excluding amplicons below threshold", file=sys.stdout)
amplicons = [amplicon for amplicon in amplicons
if int(amplicon[1]) > drop]
......@@ -98,7 +95,8 @@ def parse_files(swarms, internal_structure, OTU, drop):
# List pairwise relations
relations = list()
with open(internal_structure, "rU") as internal_structure:
with open(internal_structure, "r") as internal_structure:
print("Parsing amplicon relationships", file=sys.stdout)
for line in internal_structure:
# Get the first four elements of the line
ampliconA, ampliconB, d, OTU_number = line.strip().split("\t")[0:4]
......@@ -135,7 +133,7 @@ def build_graph(amplicons, relations):
amplicon_ids = [amplicon[0] for amplicon in amplicons]
abundances = [int(amplicon[1]) for amplicon in amplicons]
minimum, maximum = min(abundances), max(abundances)
maximum = max(abundances)
# Determine canvas size
if len(abundances) < 500:
......@@ -149,6 +147,7 @@ def build_graph(amplicons, relations):
node_colors = list()
node_sizes = list()
node_labels = list()
print("Building graph", file=sys.stdout)
for abundance in abundances:
# Color is coded by a 3-tuple of float values (0.0 to 1.0)
# Start from a max color in rgb(red, green, blue)
......@@ -210,11 +209,11 @@ def main():
return
#*****************************************************************************#
# *************************************************************************** #
# #
# Body #
# #
#*****************************************************************************#
# *************************************************************************** #
if __name__ == '__main__':
......
# SWARM
#
# Copyright (C) 2012-2017 Torbjorn Rognes and Frederic Mahe
#
# Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Contact: Torbjorn Rognes <torognes@ifi.uio.no>,
# Department of Informatics, University of Oslo,
#
# Contact: Torbjorn Rognes <torognes@ifi.uio.no>,
# Department of Informatics, University of Oslo,
# PO Box 1080 Blindern, NO-0316 Oslo, Norway
# Makefile for SWARM
# Profiling options
#COMMON=-pg -g
COMMON=-g
# PROFILING=-pg
PROFILING=
# Machine specific
MACHINE=$(shell uname -m)
ifeq ($(MACHINE), x86_64)
ARCHOPT = -march=x86-64 -mtune=generic -std=c++11
EXTRAOBJ = ssse3.o
else ifeq ($(MACHINE), aarch64)
ARCHOPT = -march=armv8-a+simd -mtune=generic \
-flax-vector-conversions -std=c++11
EXTRAOBJ =
else ifeq ($(MACHINE), ppc64le)
ARCHOPT = -mcpu=power8 -std=gnu++11
EXTRAOBJ =
endif
LIBS=-lpthread
LINKFLAGS=$(COMMON)
# OS specific
ifeq ($(CXX), x86_64-w64-mingw32-g++)
LIBS = -lpthread -lpsapi
WARNOPT =
LINKOPT = -static
else
LIBS = -lpthread
WARNOPT = -pedantic
LINKOPT =
endif
CXX=g++
WARNINGS=-Wall -Wsign-compare -Wextra -Wpedantic -Wno-long-long
CXXFLAGS=$(COMMON) $(WARNINGS) -O3 -msse2 -mtune=core2 -Icityhash
WARNINGS = -Wall -Wextra $(WARNOPT) \
# -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic
COMMON=$(PROFILING) -g -flto -O3 $(ARCHOPT)
LINKFLAGS=$(COMMON) $(LINKOPT)
CXXFLAGS=$(COMMON) $(WARNINGS)
PROG=swarm
OBJS=swarm.o db.o search8.o search16.o nw.o matrix.o util.o scan.o \
algo.o algod1.o qgram.o ssse3.o derep.o arch.o cityhash/city.o
algo.o algod1.o qgram.o derep.o arch.o city.o \
zobrist.o bloompat.o bloomflex.o variants.o hashtable.o \
$(EXTRAOBJ)
DEPS=Makefile swarm.h bitmap.h bloom.h cityhash/config.h cityhash/city.h \
threads.h
DEPS=Makefile swarm.h city.h citycrc.h \
threads.h zobrist.h bloompat.h bloomflex.h variants.h hashtable.h
all : $(PROG)
swarm : $(OBJS)
swarm : $(OBJS) $(DEPS)
$(CXX) $(LINKFLAGS) -o $@ $(OBJS) $(LIBS)
mkdir -p ../bin
cp -a swarm ../bin
clean :
rm -rf swarm *.o *~ ../bin/ gmon.out cityhash/*.o ../man/*~ ../*~
rm -rf swarm *.o *~ gmon.out
%.o : %.cc $(DEPS)
$(CXX) $(CXXFLAGS) -c -o $@ $<
ssse3.o : ssse3.cc $(DEPS)
$(CXX) $(CXXFLAGS) -mssse3 -c -o $@ $<
/*
SWARM
Copyright (C) 2012-2017 Torbjorn Rognes and Frederic Mahe
Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
......@@ -23,46 +23,100 @@
#include "swarm.h"
unsigned long arch_get_memused()
uint64_t arch_get_memused()
{
#ifdef _WIN32
PROCESS_MEMORY_COUNTERS pmc;
GetProcessMemoryInfo(GetCurrentProcess(),
&pmc,
sizeof(PROCESS_MEMORY_COUNTERS));
return pmc.PeakWorkingSetSize;
#else
struct rusage r_usage;
getrusage(RUSAGE_SELF, & r_usage);
#if defined __APPLE__
# ifdef __APPLE__
/* Mac: ru_maxrss gives the size in bytes */
return r_usage.ru_maxrss;
#else
return static_cast<uint64_t>(r_usage.ru_maxrss);
# else
/* Linux: ru_maxrss gives the size in kilobytes */
return r_usage.ru_maxrss * 1024;
return static_cast<uint64_t>(r_usage.ru_maxrss * 1024);
# endif
#endif
}
unsigned long arch_get_memtotal()
uint64_t arch_get_memtotal()
{
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
long phys_pages = sysconf(_SC_PHYS_PAGES);
long pagesize = sysconf(_SC_PAGESIZE);
#ifdef _WIN32
if ((phys_pages == -1) || (pagesize == -1))
fatal("Cannot determine amount of memory");
return pagesize * phys_pages;
MEMORYSTATUSEX ms;
ms.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&ms);
return ms.ullTotalPhys;
#elif defined(__APPLE__)
int mib [] = { CTL_HW, HW_MEMSIZE };
int64_t ram = 0;
size_t length = sizeof(ram);
if(sysctl(mib, 2, &ram, &length, NULL, 0) == -1)
fatal("Cannot determine amount of memory");
return ram;
if(sysctl(mib, 2, &ram, &length, nullptr, 0) == -1)
fatal("Cannot determine amount of RAM");
return static_cast<uint64_t>(ram);
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
int64_t phys_pages = sysconf(_SC_PHYS_PAGES);
int64_t pagesize = sysconf(_SC_PAGESIZE);
if ((phys_pages == -1) || (pagesize == -1))
fatal("Cannot determine amount of RAM");
return static_cast<uint64_t>(pagesize * phys_pages);
#else
struct sysinfo si;
if (sysinfo(&si))
fatal("Cannot determine amount of memory");
fatal("Cannot determine amount of RAM");
return si.totalram * si.mem_unit;
#endif
}
void arch_srandom(unsigned int seed)
{
/* initialize pseudo-random number generator */
if (seed == 0)
{
#ifdef _WIN32
srand(GetTickCount());
#else
int fd = open("/dev/urandom", O_RDONLY);
if (fd < 0)
fatal("Unable to open /dev/urandom");
if (read(fd, & seed, sizeof(seed)) < 0)
fatal("Unable to read from /dev/urandom");
close(fd);
srandom(seed);
#endif
}
else
{
#ifdef _WIN32
srand(seed);
#else
srandom(seed);
#endif
}
}
uint64_t arch_random()
{
#ifdef _WIN32
return static_cast<uint64_t>(rand());
#else
return static_cast<uint64_t>(random());
#endif
}
/*
SWARM
Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: Torbjorn Rognes <torognes@ifi.uio.no>,
Department of Informatics, University of Oslo,
PO Box 1080 Blindern, NO-0316 Oslo, Norway
*/
/*
Blocked bloom filter with precomputed bit patterns
as described in
Putze F, Sanders P, Singler J (2009)
Cache-, Hash- and Space-Efficient Bloom Filters
Journal of Experimental Algorithmics, 14, 4
https://doi.org/10.1145/1498698.1594230
*/
#include "swarm.h"
void bloomflex_patterns_generate(struct bloomflex_s * b);
void bloomflex_patterns_generate(struct bloomflex_s * b)
{
#if 0
printf("Generating %" PRIu64 " patterns with %" PRIu64 " bits set.\n",
b->pattern_count,
b->pattern_k);
#endif
for (unsigned int i = 0; i < b->pattern_count; i++)
{
uint64_t pattern = 0;
for (unsigned int j = 0; j < b->pattern_k; j++)
{
uint64_t onebit;
onebit = 1ULL << (arch_random() & 63);
while (pattern & onebit)
onebit = 1ULL << (arch_random() & 63);
pattern |= onebit;
}
b->patterns[i] = pattern;
}
}
struct bloomflex_s * bloomflex_init(uint64_t size, unsigned int k)
{
/* Input size is in bytes for full bitmap */
struct bloomflex_s * b = static_cast<struct bloomflex_s *>(xmalloc(sizeof(struct bloomflex_s)));
b->size = size >> 3;
b->pattern_shift = 16;
b->pattern_count = 1 << b->pattern_shift;
b->pattern_mask = b->pattern_count - 1;
b->pattern_k = k;
b->patterns = static_cast<uint64_t *>(xmalloc(b->pattern_count * 8));
bloomflex_patterns_generate(b);
b->bitmap = static_cast<uint64_t *>(xmalloc(size));
memset(b->bitmap, 0xff, size);
return b;
}
void bloomflex_exit(struct bloomflex_s * b)
{
xfree(b->bitmap);
xfree(b->patterns);
xfree(b);
}
/*
SWARM
Copyright (C) 2012-2017 Torbjorn Rognes and Frederic Mahe
Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
......@@ -21,56 +21,38 @@
PO Box 1080 Blindern, NO-0316 Oslo, Norway
*/
class Bitmap
struct bloomflex_s
{
private:
size_t size; /* size in bits */
unsigned char * data; /* the actual bitmap */
public:
uint64_t size; /* size in number of longs (8 bytes) */
uint64_t pattern_shift;
uint64_t pattern_count;
uint64_t pattern_mask;
uint64_t pattern_k;
uint64_t * bitmap;
uint64_t * patterns;
};
explicit Bitmap(size_t _size)
{
size = _size;
data = (unsigned char *) xmalloc((size+7)/8);
}
~Bitmap()
{
if (data)
free(data);
}
bool get(size_t x)
{
return (data[x >> 3] >> (x & 7)) & 1;
}
struct bloomflex_s * bloomflex_init(uint64_t size, unsigned int k);
void reset_all()
{
memset(data, 0, (size+7)/8);
}
void set_all()
{
memset(data, 255, (size+7)/8);
}
void reset(size_t x)
{
// data[x >> 3] &= ~ (1 << (x & 7));
__sync_fetch_and_and(data + (x >> 3), ~(1 << (x & 7)));
}
void set(size_t x)
{
// data[x >> 3] |= 1 << (x & 7);
__sync_fetch_and_or(data + (x >> 3), 1 << (x & 7));
}
void flip(size_t x)
{
// data[x >> 3] ^= 1 << (x & 7);
__sync_fetch_and_xor(data + (x >> 3), 1 << (x & 7));
}
};
void bloomflex_exit(struct bloomflex_s * b);
inline uint64_t * bloomflex_adr(struct bloomflex_s * b, uint64_t h)
{
return b->bitmap + ((h >> b->pattern_shift) % b->size);
}
inline uint64_t bloomflex_pat(struct bloomflex_s * b,
uint64_t h)
{
return b->patterns[h & b->pattern_mask];
}
inline void bloomflex_set(struct bloomflex_s * b, uint64_t h)
{
* bloomflex_adr(b, h) &= ~ bloomflex_pat(b, h);
}
inline bool bloomflex_get(struct bloomflex_s * b, uint64_t h)
{
return ! (* bloomflex_adr(b, h) & bloomflex_pat(b, h));
}
/*
SWARM
Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: Torbjorn Rognes <torognes@ifi.uio.no>,
Department of Informatics, University of Oslo,
PO Box 1080 Blindern, NO-0316 Oslo, Norway
*/
/*
Blocked bloom filter with precomputed bit patterns
as described in
Putze F, Sanders P, Singler J (2009)
Cache-, Hash- and Space-Efficient Bloom Filters
Journal of Experimental Algorithmics, 14, 4
https://doi.org/10.1145/1498698.1594230
*/
#include "swarm.h"
void bloom_patterns_generate(struct bloom_s * b);
void bloom_patterns_generate(struct bloom_s * b)
{
const unsigned int k = 8;
for (unsigned int i = 0; i < BLOOM_PATTERN_COUNT; i++)
{
uint64_t pattern = 0;
for (unsigned int j = 0; j < k; j++)
{
uint64_t onebit;
onebit = 1ULL << (arch_random() & 63);
while (pattern & onebit)
onebit = 1ULL << (arch_random() & 63);
pattern |= onebit;
}
b->patterns[i] = pattern;
}
}
void bloom_zap(struct bloom_s * b)
{
memset(b->bitmap, 0xff, b->size);
}
struct bloom_s * bloom_init(uint64_t size)
{
// Size is in bytes for full bitmap, must be power of 2
// at least 8
size = MAX(size, 8);
struct bloom_s * b = static_cast<struct bloom_s *>(xmalloc(sizeof(struct bloom_s)));
b->size = size;
b->mask = (size >> 3) - 1;
b->bitmap = static_cast<uint64_t *>(xmalloc(size));
bloom_zap(b);
bloom_patterns_generate(b);
return b;
}
void bloom_exit(struct bloom_s * b)
{
xfree(b->bitmap);
xfree(b);
}