Commits on Source (109)
......@@ -19,6 +19,3 @@
# Windows
*.bat text eol=crlf
*.cmd text eol=crlf
# .travis.yml merging
.travis.yml merge=ours
......@@ -23,6 +23,7 @@ zstdmt
# Test artefacts
tmp*
dictionary*
NUL
# Build artefacts
projects/
......
v1.3.8
perf: better decompression speed on large files (+7%) and cold dictionaries (+15%)
perf: slightly better compression ratio at high compression modes
api : finalized advanced API, last stage before "stable" status
api : new --rsyncable mode, by @terrelln
api : support decompression of empty frames into NULL (used to be an error) (#1385)
build: new set of macros to build a minimal size decoder, by @felixhandte
build: fix compilation on MIPS32, reported by @clbr (#1441)
build: fix compilation with multiple -arch flags, by @ryandesign
build: highly upgraded meson build, by @lzutao
build: improved buck support, by @obelisk
build: fix cmake script : can create debug build, by @pitrou
build: Makefile : grep works on both colored consoles and systems without color support
build: fixed zstd-pgo, by @bmwiedemann
cli : support ZSTD_CLEVEL environment variable, by @yijinfb (#1423)
cli : --no-progress flag, preserving final summary (#1371), by @terrelln
cli : ensure destination file is not source file (#1422)
cli : clearer error messages, especially when input file not present
doc : clarified zstd_compression_format.md, by @ulikunitz
misc: fixed zstdgrep, returns 1 on failure, by @lzutao
misc: NEWS renamed as CHANGELOG, in accordance with fboss
v1.3.7
perf: slightly better decompression speed on clang (depending on hardware target)
fix : performance of dictionary compression for small input < 4 KB at levels 9 and 10
build: no longer build backtrace by default in release mode; restrict further automatic mode
build: control backtrace support through build macro BACKTRACE
misc: added man pages for zstdless and zstdgrep, by @samrussell
v1.3.6
perf: much faster dictionary builder, by @jenniferliu
perf: faster dictionary compression on small data when using multiple contexts, by @felixhandte
perf: faster dictionary decompression when using a very large number of dictionaries simultaneously
cli : fix : no longer overwrites destination when source does not exist (#1082)
cli : new command --adapt, for automatic compression level adaptation
api : fix : block api can be streamed with > 4 GB, reported by @catid
api : reduced ZSTD_DDict size by 2 KB
api : minimum negative compression level is defined, and can be queried using ZSTD_minCLevel().
build: support Haiku target, by @korli
build: Read Legacy format is limited to v0.5+ by default. Can be changed at compile time with macro ZSTD_LEGACY_SUPPORT.
doc : zstd_compression_format.md updated to match wording in IETF RFC 8478
misc: tests/paramgrill, a parameter optimizer, by @GeorgeLu97
v1.3.5
perf: much faster dictionary compression, by @felixhandte
perf: small quality improvement for dictionary generation, by @terrelln
perf: slightly improved high compression levels (notably level 19)
mem : automatic memory release for long duration contexts
cli : fix : overlapLog can be manually set
cli : fix : decoding invalid lz4 frames
api : fix : performance degradation for dictionary compression when using advanced API, by @terrelln
api : change : clarify ZSTD_CCtx_reset() vs ZSTD_CCtx_resetParameters(), by @terrelln
build: select custom libzstd scope through control macros, by @GeorgeLu97
build: OpenBSD patch, by @bket
build: make and make all are compatible with -j
doc : clarify zstd_compression_format.md, updated for IETF RFC process
misc: pzstd compatible with reproducible compilation, by @lamby
v1.3.4
perf: faster speed (especially decoding speed) on recent cpus (haswell+)
perf: much better performance associating --long with multi-threading, by @terrelln
perf: better compression at levels 13-15
cli : asynchronous compression by default, for faster experience (use --single-thread for former behavior)
cli : smoother status report in multi-threading mode
cli : added command --fast=#, for faster compression modes
cli : fix crash when not overwriting existing files, by Pádraig Brady (@pixelb)
api : `nbThreads` becomes `nbWorkers` : 1 triggers asynchronous mode
api : compression levels can be negative, for even more speed
api : ZSTD_getFrameProgression() : get precise progress status of ZSTDMT anytime
api : ZSTDMT can accept new compression parameters during compression
api : implemented all advanced dictionary decompression prototypes
build: improved meson recipe, by Shawn Landden (@shawnl)
build: VS2017 scripts, by @HaydnTrigg
misc: all /contrib projects fixed
misc: added /contrib/docker script by @gyscos
v1.3.3
perf: faster zstd_opt strategy (levels 17-19)
perf: faster zstd_opt strategy (levels 16-19)
fix : bug #944 : multithreading with shared dictionary and large data, reported by @gsliepen
cli : fix : content size written in header by default
cli : fix : improved LZ4 format support, by @felixhandte
......
# Code of Conduct
Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
Please read the [full text](https://code.fb.com/codeofconduct/)
so that you can understand what actions will and will not be tolerated.
......@@ -23,20 +23,19 @@ else
EXT =
endif
## default: Build lib-release and zstd-release
.PHONY: default
default: lib-release zstd-release
.PHONY: all
all: | allmost examples manual
all: allmost examples manual contrib
.PHONY: allmost
allmost: allzstd
$(MAKE) -C $(ZWRAPDIR) all
allmost: allzstd zlibwrapper
# skip zwrapper, can't build that on alternate architectures without the proper zlib installed
.PHONY: allzstd
allzstd:
$(MAKE) -C $(ZSTDDIR) all
allzstd: lib
$(MAKE) -C $(PRGDIR) all
$(MAKE) -C $(TESTDIR) all
......@@ -45,49 +44,64 @@ all32:
$(MAKE) -C $(PRGDIR) zstd32
$(MAKE) -C $(TESTDIR) all32
.PHONY: lib
lib:
.PHONY: lib lib-release libzstd.a
lib lib-release :
@$(MAKE) -C $(ZSTDDIR) $@
.PHONY: lib-release
lib-release:
@$(MAKE) -C $(ZSTDDIR)
.PHONY: zstd
zstd:
.PHONY: zstd zstd-release
zstd zstd-release:
@$(MAKE) -C $(PRGDIR) $@
cp $(PRGDIR)/zstd$(EXT) .
.PHONY: zstd-release
zstd-release:
@$(MAKE) -C $(PRGDIR)
cp $(PRGDIR)/zstd$(EXT) .
.PHONY: zstdmt
zstdmt:
@$(MAKE) -C $(PRGDIR) $@
cp $(PRGDIR)/zstd$(EXT) ./zstdmt$(EXT)
.PHONY: zlibwrapper
zlibwrapper:
$(MAKE) -C $(ZWRAPDIR) test
zlibwrapper: lib
$(MAKE) -C $(ZWRAPDIR) all
.PHONY: check
check: shortest
## test: run long-duration tests
.PHONY: test
DEBUGLEVEL ?= 1
test: MOREFLAGS += -g -DDEBUGLEVEL=$(DEBUGLEVEL) -Werror
test:
MOREFLAGS="$(MOREFLAGS)" $(MAKE) -j -C $(PRGDIR) allVariants
$(MAKE) -C $(TESTDIR) $@
.PHONY: test shortest
test shortest:
$(MAKE) -C $(PRGDIR) allVariants MOREFLAGS="-g -DZSTD_DEBUG=1"
## shortest: same as `make check`
.PHONY: shortest
shortest:
$(MAKE) -C $(TESTDIR) $@
## check: run basic tests for `zstd` cli
.PHONY: check
check: shortest
## examples: build all examples in `/examples` directory
.PHONY: examples
examples:
examples: lib
CPPFLAGS=-I../lib LDFLAGS=-L../lib $(MAKE) -C examples/ all
## manual: generate API documentation in html format
.PHONY: manual
manual:
$(MAKE) -C contrib/gen_html $@
## man: generate man page
.PHONY: man
man:
$(MAKE) -C programs $@
## contrib: build all supported projects in `/contrib` directory
.PHONY: contrib
contrib: lib
$(MAKE) -C contrib/pzstd all
$(MAKE) -C contrib/seekable_format/examples all
$(MAKE) -C contrib/adaptive-compression all
$(MAKE) -C contrib/largeNbDicts all
.PHONY: cleanTabs
cleanTabs:
cd contrib; ./cleanTabs
......@@ -100,21 +114,47 @@ clean:
@$(MAKE) -C $(ZWRAPDIR) $@ > $(VOID)
@$(MAKE) -C examples/ $@ > $(VOID)
@$(MAKE) -C contrib/gen_html $@ > $(VOID)
@$(MAKE) -C contrib/pzstd $@ > $(VOID)
@$(MAKE) -C contrib/seekable_format/examples $@ > $(VOID)
@$(MAKE) -C contrib/adaptive-compression $@ > $(VOID)
@$(MAKE) -C contrib/largeNbDicts $@ > $(VOID)
@$(RM) zstd$(EXT) zstdmt$(EXT) tmp*
@$(RM) -r lz4
@echo Cleaning completed
#------------------------------------------------------------------------------
# make install is validated only for Linux, OSX, Hurd and some BSD targets
# make install is validated only for Linux, macOS, Hurd and some BSD targets
#------------------------------------------------------------------------------
ifneq (,$(filter $(shell uname),Linux Darwin GNU/kFreeBSD GNU FreeBSD DragonFly NetBSD MSYS_NT))
ifneq (,$(filter $(shell uname),Linux Darwin GNU/kFreeBSD GNU OpenBSD FreeBSD DragonFly NetBSD MSYS_NT Haiku))
HOST_OS = POSIX
CMAKE_PARAMS = -DZSTD_BUILD_CONTRIB:BOOL=ON -DZSTD_BUILD_STATIC:BOOL=ON -DZSTD_BUILD_TESTS:BOOL=ON -DZSTD_ZLIB_SUPPORT:BOOL=ON -DZSTD_LZMA_SUPPORT:BOOL=ON
CMAKE_PARAMS = -DZSTD_BUILD_CONTRIB:BOOL=ON -DZSTD_BUILD_STATIC:BOOL=ON -DZSTD_BUILD_TESTS:BOOL=ON -DZSTD_ZLIB_SUPPORT:BOOL=ON -DZSTD_LZMA_SUPPORT:BOOL=ON -DCMAKE_BUILD_TYPE=Release
HAVE_COLORNEVER = $(shell echo a | egrep --color=never a > /dev/null 2> /dev/null && echo 1 || echo 0)
EGREP_OPTIONS ?=
ifeq ($(HAVE_COLORNEVER), 1)
EGREP_OPTIONS += --color=never
endif
EGREP = egrep $(EGREP_OPTIONS)
# Print a two column output of targets and their description. To add a target description, put a
# comment in the Makefile with the format "## <TARGET>: <DESCRIPTION>". For example:
#
## list: Print all targets and their descriptions (if provided)
.PHONY: list
list:
@$(MAKE) -pRrq -f $(lastword $(MAKEFILE_LIST)) : 2>/dev/null | awk -v RS= -F: '/^# File/,/^# Finished Make data base/ {if ($$1 !~ "^[#.]") {print $$1}}' | sort | egrep -v -e '^[^[:alnum:]]' -e '^$@$$' | xargs
@TARGETS=$$($(MAKE) -pRrq -f $(lastword $(MAKEFILE_LIST)) : 2>/dev/null \
| awk -v RS= -F: '/^# File/,/^# Finished Make data base/ {if ($$1 !~ "^[#.]") {print $$1}}' \
| $(EGREP) -v -e '^[^[:alnum:]]' | sort); \
{ \
printf "Target Name\tDescription\n"; \
printf "%0.s-" {1..16}; printf "\t"; printf "%0.s-" {1..40}; printf "\n"; \
for target in $$TARGETS; do \
line=$$($(EGREP) "^##[[:space:]]+$$target:" $(lastword $(MAKEFILE_LIST))); \
description=$$(echo $$line | awk '{i=index($$0,":"); print substr($$0,i+1)}' | xargs); \
printf "$$target\t$$description\n"; \
done \
} | column -t -s $$'\t'
.PHONY: install clangtest armtest usan asan uasan
install:
......@@ -170,6 +210,7 @@ armfuzz: clean
CC=arm-linux-gnueabi-gcc QEMU_SYS=qemu-arm-static MOREFLAGS="-static" FUZZER_FLAGS=--no-big-tests $(MAKE) -C $(TESTDIR) fuzztest
aarch64fuzz: clean
ld -v
CC=aarch64-linux-gnu-gcc QEMU_SYS=qemu-aarch64-static MOREFLAGS="-static" FUZZER_FLAGS=--no-big-tests $(MAKE) -C $(TESTDIR) fuzztest
ppcfuzz: clean
......@@ -193,7 +234,7 @@ gcc6test: clean
clangtest: clean
clang -v
$(MAKE) all CXX=clang-++ CC=clang MOREFLAGS="-Werror -Wconversion -Wno-sign-conversion -Wdocumentation"
$(MAKE) all CXX=clang++ CC=clang MOREFLAGS="-Werror -Wconversion -Wno-sign-conversion -Wdocumentation"
armtest: clean
$(MAKE) -C $(TESTDIR) datagen # use native, faster
......@@ -231,31 +272,31 @@ msanregressiontest:
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303
usan: clean
$(MAKE) test CC=clang MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=undefined"
$(MAKE) test CC=clang MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=undefined -Werror"
asan: clean
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=address"
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=address -Werror"
asan-%: clean
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=address" $(MAKE) -C $(TESTDIR) $*
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=address -Werror" $(MAKE) -C $(TESTDIR) $*
msan: clean
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer" HAVE_LZMA=0 # datagen.c fails this test for no obvious reason
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer -Werror" HAVE_LZMA=0 # datagen.c fails this test for no obvious reason
msan-%: clean
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=memory -fno-omit-frame-pointer" FUZZER_FLAGS=--no-big-tests $(MAKE) -C $(TESTDIR) HAVE_LZMA=0 $*
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=memory -fno-omit-frame-pointer -Werror" FUZZER_FLAGS=--no-big-tests $(MAKE) -C $(TESTDIR) HAVE_LZMA=0 $*
asan32: clean
$(MAKE) -C $(TESTDIR) test32 CC=clang MOREFLAGS="-g -fsanitize=address"
uasan: clean
$(MAKE) test CC=clang MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=address,undefined"
$(MAKE) test CC=clang MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=address,undefined -Werror"
uasan-%: clean
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=address,undefined" $(MAKE) -C $(TESTDIR) $*
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize-recover=signed-integer-overflow -fsanitize=address,undefined -Werror" $(MAKE) -C $(TESTDIR) $*
tsan-%: clean
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=thread" $(MAKE) -C $(TESTDIR) $* FUZZER_FLAGS=--no-big-tests
LDFLAGS=-fuse-ld=gold MOREFLAGS="-g -fno-sanitize-recover=all -fsanitize=thread -Werror" $(MAKE) -C $(TESTDIR) $* FUZZER_FLAGS=--no-big-tests
apt-install:
sudo apt-get -yq --no-install-suggests --no-install-recommends --force-yes install $(APT_PACKAGES)
......@@ -279,6 +320,12 @@ libc6install:
gcc6install: apt-add-repo
APT_PACKAGES="libc6-dev-i386 gcc-multilib gcc-6 gcc-6-multilib" $(MAKE) apt-install
gcc7install: apt-add-repo
APT_PACKAGES="libc6-dev-i386 gcc-multilib gcc-7 gcc-7-multilib" $(MAKE) apt-install
gcc8install: apt-add-repo
APT_PACKAGES="libc6-dev-i386 gcc-multilib gcc-8 gcc-8-multilib" $(MAKE) apt-install
gpp6install: apt-add-repo
APT_PACKAGES="libc6-dev-i386 g++-multilib gcc-6 g++-6 g++-6-multilib" $(MAKE) apt-install
......@@ -310,23 +357,23 @@ cmakebuild:
c90build: clean
$(CC) -v
CFLAGS="-std=c90" $(MAKE) allmost # will fail, due to missing support for `long long`
CFLAGS="-std=c90 -Werror" $(MAKE) allmost # will fail, due to missing support for `long long`
gnu90build: clean
$(CC) -v
CFLAGS="-std=gnu90" $(MAKE) allmost
CFLAGS="-std=gnu90 -Werror" $(MAKE) allmost
c99build: clean
$(CC) -v
CFLAGS="-std=c99" $(MAKE) allmost
CFLAGS="-std=c99 -Werror" $(MAKE) allmost
gnu99build: clean
$(CC) -v
CFLAGS="-std=gnu99" $(MAKE) allmost
CFLAGS="-std=gnu99 -Werror" $(MAKE) allmost
c11build: clean
$(CC) -v
CFLAGS="-std=c11" $(MAKE) allmost
CFLAGS="-std=c11 -Werror" $(MAKE) allmost
bmix64build: clean
$(CC) -v
......@@ -340,7 +387,10 @@ bmi32build: clean
$(CC) -v
CFLAGS="-O3 -mbmi -m32 -Werror" $(MAKE) -C $(TESTDIR) test
staticAnalyze: clean
# static analyzer test uses clang's scan-build
# does not analyze zlibWrapper, due to detected issues in zlib source code
staticAnalyze: SCANBUILD ?= scan-build
staticAnalyze:
$(CC) -v
CPPFLAGS=-g scan-build --status-bugs -v $(MAKE) all
CC=$(CC) CPPFLAGS=-g $(SCANBUILD) --status-bugs -v $(MAKE) allzstd examples contrib
endif
<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/readme/doc/images/zstd_logo86.png" alt="Zstandard"></p>
<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>
__Zstandard__, or `zstd` as short version, is a fast lossless compression algorithm,
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).
The project is provided as an open-source BSD-licensed **C** library,
The project is provided as an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard homepage](http://www.zstd.net/#other-languages).
Development branch status : [![Build Status][travisDevBadge]][travisLink] [![Build status][AppveyorDevBadge]][AppveyorLink] [![Build status][CircleDevBadge]][CircleLink]
**Development branch status:**
[![Build Status][travisDevBadge]][travisLink]
[![Build status][AppveyorDevBadge]][AppveyorLink]
[![Build status][CircleDevBadge]][CircleLink]
[travisDevBadge]: https://travis-ci.org/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.org/facebook/zstd
......@@ -18,27 +22,28 @@ Development branch status : [![Build Status][travisDevBadge]][travisLink] [![B
[CircleDevBadge]: https://circleci.com/gh/facebook/zstd/tree/dev.svg?style=shield "Short test suite"
[CircleLink]: https://circleci.com/gh/facebook/zstd
### Benchmarks
## Benchmarks
For reference, several fast compression algorithms were tested and compared
on a server running Linux Debian (`Linux version 4.8.0-1-amd64`),
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`),
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with GCC 6.3.0,
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].
[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[gcc]: https://gcc.gnu.org/
| Compressor name | Ratio | Compression| Decompress.|
| --------------- | ------| -----------| ---------- |
| **zstd 1.1.3 -1** | 2.877 | 430 MB/s | 1110 MB/s |
| zlib 1.2.8 -1 | 2.743 | 110 MB/s | 400 MB/s |
| brotli 0.5.2 -0 | 2.708 | 400 MB/s | 430 MB/s |
| **zstd 1.3.4 -1** | 2.877 | 470 MB/s | 1380 MB/s |
| zlib 1.2.11 -1 | 2.743 | 110 MB/s | 400 MB/s |
| brotli 1.0.2 -0 | 2.701 | 410 MB/s | 430 MB/s |
| quicklz 1.5.0 -1 | 2.238 | 550 MB/s | 710 MB/s |
| lzo1x 2.09 -1 | 2.108 | 650 MB/s | 830 MB/s |
| lz4 1.7.5 | 2.101 | 720 MB/s | 3600 MB/s |
| snappy 1.1.3 | 2.091 | 500 MB/s | 1650 MB/s |
| lz4 1.8.1 | 2.101 | 750 MB/s | 3700 MB/s |
| snappy 1.1.4 | 2.091 | 530 MB/s | 1800 MB/s |
| lzf 3.6 -1 | 2.077 | 400 MB/s | 860 MB/s |
[zlib]: http://www.zlib.net/
......@@ -50,21 +55,21 @@ Decompression speed is preserved and remains roughly the same at all settings,
a property shared by most LZ compression algorithms, such as [zlib] or lzma.
The following tests were run
on a server running Linux Debian (`Linux version 4.8.0-1-amd64`)
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`)
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with GCC 6.3.0,
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].
Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
![Compression Speed vs Ratio](doc/images/Cspeed4.png "Compression Speed vs Ratio") | ![Decompression Speed](doc/images/Dspeed4.png "Decompression Speed")
![Compression Speed vs Ratio](doc/images/CSpeed2.png "Compression Speed vs Ratio") | ![Decompression Speed](doc/images/DSpeed3.png "Decompression Speed")
A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph.
For a larger picture including slow modes, [click on this link](doc/images/DCspeed5.png).
### The case for Small Data compression
## The case for Small Data compression
Previous charts provide results applicable to typical file and stream scenarios (several MB). Small data comes with different perspectives.
......@@ -88,24 +93,24 @@ Training works if there is some correlation in a family of small data samples. T
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.
#### Dictionary compression How To:
### Dictionary compression How To:
1) Create the dictionary
1. Create the dictionary
`zstd --train FullPathToTrainingSet/* -o dictionaryName`
2) Compress with dictionary
2. Compress with dictionary
`zstd -D dictionaryName FILE`
3) Decompress with dictionary
3. Decompress with dictionary
`zstd -D dictionaryName --decompress FILE.zst`
### Build instructions
## Build instructions
#### Makefile
### Makefile
If your system is compatible with standard `make` (or `gmake`),
invoking `make` in root directory will generate `zstd` cli in root directory.
......@@ -114,35 +119,47 @@ Other available options include:
- `make install` : create and install zstd cli, library and man pages
- `make check` : create and run `zstd`, then test its behavior on the local platform
#### cmake
### cmake
A `cmake` project generator is provided within `build/cmake`.
It can generate Makefiles or other build scripts
to create `zstd` binary, and `libzstd` dynamic and static libraries.
#### Meson
By default, `CMAKE_BUILD_TYPE` is set to `Release`.
### Meson
A Meson project is provided within [`build/meson`](build/meson). Follow
build instructions in that directory.
A Meson project is provided within `contrib/meson`.
You can also take a look at [`.travis.yml`](.travis.yml) file for an
example of how Meson is used to build this project.
#### Visual Studio (Windows)
Note that default build type is **release**.
### Visual Studio (Windows)
Going into `build` directory, you will find additional possibilities:
- Projects for Visual Studio 2005, 2008 and 2010.
+ VS2010 project is compatible with VS2012, VS2013 and VS2015.
- Automated build scripts for Visual compiler by @KrzysFR , in `build/VS_scripts`,
+ VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by [@KrzysFR](https://github.com/KrzysFR), in `build/VS_scripts`,
which will build `zstd` cli and `libzstd` library without any need to open Visual Studio solution.
### Buck
You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.
### Status
## Status
Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.
### License
## License
Zstandard is dual-licensed under [BSD](LICENSE) and [GPLv2](COPYING).
### Contributing
## Contributing
The "dev" branch is the one where all contributions are merged before reaching "master".
If you plan to propose a patch, please commit into the "dev" branch, or its own feature branch.
......
......@@ -41,4 +41,4 @@ They consist of the following tests:
- `pzstd` with asan and tsan, as well as in 32-bit mode
- Testing `zstd` with legacy mode off
- Testing `zbuff` (old streaming API)
- Entire test suite and make install on OS X
- Entire test suite and make install on macOS
# binaries generated
adapt
datagen
......@@ -22,10 +22,10 @@ FLAGS = $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) $(MULTITHREAD_LDFLAGS)
all: adapt datagen
adapt: $(ZSTD_FILES) adapt.c
adapt: $(ZSTD_FILES) $(PRGDIR)/util.c adapt.c
$(CC) $(FLAGS) $^ -o $@
adapt-debug: $(ZSTD_FILES) adapt.c
adapt-debug: $(ZSTD_FILES) $(PRGDIR)/util.c adapt.c
$(CC) $(FLAGS) -DDEBUG_MODE=2 $^ -o adapt
datagen : $(PRGDIR)/datagen.c datagencli.c
......@@ -48,7 +48,7 @@ clean:
@echo "finished cleaning"
#-----------------------------------------------------------------------------
# make install is validated only for Linux, OSX, BSD, Hurd and Solaris targets
# make install is validated only for Linux, macOS, BSD, Hurd and Solaris targets
#-----------------------------------------------------------------------------
ifneq (,$(filter $(shell uname),Linux Darwin GNU/kFreeBSD GNU OpenBSD FreeBSD NetBSD DragonFly SunOS))
......
......@@ -40,7 +40,6 @@ static unsigned g_compressionLevel = DEFAULT_COMPRESSION_LEVEL;
static UTIL_time_t g_startTime;
static size_t g_streamedSize = 0;
static unsigned g_useProgressBar = 1;
static UTIL_freq_t g_ticksPerSecond;
static unsigned g_forceCompressionLevel = 0;
static unsigned g_minCLevel = 1;
static unsigned g_maxCLevel;
......@@ -576,13 +575,12 @@ static void* compressionThread(void* arg)
/* begin compression */
{
size_t const useDictSize = MIN(getUseableDictSize(cLevel), job->dictSize);
size_t const dictModeError = ZSTD_setCCtxParameter(ctx->cctx, ZSTD_p_forceRawDict, 1);
ZSTD_parameters params = ZSTD_getParams(cLevel, 0, useDictSize);
params.cParams.windowLog = 23;
{
size_t const initError = ZSTD_compressBegin_advanced(ctx->cctx, job->src.start + job->dictSize - useDictSize, useDictSize, params, 0);
size_t const windowSizeError = ZSTD_setCCtxParameter(ctx->cctx, ZSTD_p_forceWindow, 1);
if (ZSTD_isError(dictModeError) || ZSTD_isError(initError) || ZSTD_isError(windowSizeError)) {
size_t const windowSizeError = ZSTD_CCtx_setParameter(ctx->cctx, ZSTD_c_forceMaxWindow, 1);
if (ZSTD_isError(initError) || ZSTD_isError(windowSizeError)) {
DISPLAY("Error: something went wrong while starting compression\n");
signalErrorToThreads(ctx);
return arg;
......@@ -644,21 +642,17 @@ static void* compressionThread(void* arg)
static void displayProgress(unsigned cLevel, unsigned last)
{
UTIL_time_t currTime;
UTIL_getTime(&currTime);
UTIL_time_t currTime = UTIL_getTime();
if (!g_useProgressBar) return;
{
double const timeElapsed = (double)(UTIL_getSpanTimeMicro(g_ticksPerSecond, g_startTime, currTime) / 1000.0);
{ double const timeElapsed = (double)(UTIL_getSpanTimeMicro(g_startTime, currTime) / 1000.0);
double const sizeMB = (double)g_streamedSize / (1 << 20);
double const avgCompRate = sizeMB * 1000 / timeElapsed;
fprintf(stderr, "\r| Comp. Level: %2u | Time Elapsed: %7.2f s | Data Size: %7.1f MB | Avg Comp. Rate: %6.2f MB/s |", cLevel, timeElapsed/1000.0, sizeMB, avgCompRate);
if (last) {
fprintf(stderr, "\n");
}
else {
} else {
fflush(stderr);
}
}
} }
}
static void* outputThread(void* arg)
......@@ -971,7 +965,6 @@ static int compressFilename(const char* const srcFilename, const char* const dst
{
int ret = 0;
fcResources fcr = createFileCompressionResources(srcFilename, dstFilenameOrNull);
UTIL_getTime(&g_startTime);
g_streamedSize = 0;
ret |= performCompression(fcr.ctx, fcr.srcFile, fcr.otArg);
ret |= freeFileCompressionResources(&fcr);
......@@ -1044,8 +1037,6 @@ int main(int argCount, const char* argv[])
filenameTable[0] = stdinmark;
g_maxCLevel = ZSTD_maxCLevel();
UTIL_initTimer(&g_ticksPerSecond);
if (filenameTable == NULL) {
DISPLAY("Error: could not allocate space for filename table.\n");
return 1;
......
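For reference, the arithmetic in the `displayProgress` hunk above can be exercised in isolation. The sketch below is not part of the commit: `avg_rate_mbps` is a hypothetical helper that takes an elapsed time in microseconds (the unit `UTIL_getSpanTimeMicro` is assumed to return) and a byte count, then mirrors the hunk's conversion to milliseconds and to MB. Unlike the original code, it also guards against a zero elapsed time.

```c
#include <stddef.h>

/* Hypothetical helper, not from adapt.c: average compression rate in MB/s.
 * elapsedMicro : elapsed time in microseconds
 * streamedSize : number of bytes processed so far
 * Returns 0.0 when no time has elapsed, to avoid dividing by zero. */
static double avg_rate_mbps(unsigned long long elapsedMicro, size_t streamedSize)
{
    double const timeElapsedMs = (double)elapsedMicro / 1000.0;      /* us -> ms */
    double const sizeMB = (double)streamedSize / (double)(1 << 20);  /* bytes -> MB (2^20 bytes) */
    if (timeElapsedMs <= 0.0) return 0.0;
    return sizeMB * 1000.0 / timeElapsedMs;                          /* MB per second */
}
```

Calling `avg_rate_mbps(1000000ULL, (size_t)1 << 20)` corresponds to 1 MB streamed in one second.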
......@@ -120,7 +120,7 @@ int main(int argc, const char** argv)
DISPLAYLEVEL(4, "Compressible data Generator \n");
if (probaU32!=COMPRESSIBILITY_DEFAULT)
DISPLAYLEVEL(3, "Compressibility : %i%%\n", probaU32);
DISPLAYLEVEL(3, "Seed = %u \n", seed);
DISPLAYLEVEL(3, "Seed = %u \n", (unsigned)seed);
RDG_genStdout(size, (double)probaU32/100, litProba, seed);
DISPLAYLEVEL(1, "\n");
......
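The `(unsigned)seed` cast in the hunk above matters because `printf` is variadic: the compiler does not convert an argument to match its format specifier, so `%u` must receive exactly an `unsigned int`. The sketch below is illustrative only. `seed32_t` and `format_seed` are hypothetical names, and the typedef stands in for whatever fixed-width type the project uses for `seed`.

```c
#include <stdio.h>
#include <stdint.h>

typedef uint32_t seed32_t;   /* hypothetical stand-in for the project's 32-bit type */

/* Formats "Seed = <n>" into buf and returns the value snprintf returns.
 * The explicit cast guarantees the argument matches %u, even on a platform
 * where the 32-bit typedef is not `unsigned int`. */
static int format_seed(char* buf, size_t bufSize, seed32_t seed)
{
    return snprintf(buf, bufSize, "Seed = %u", (unsigned)seed);
}
```

An alternative that avoids the cast is the `PRIu32` macro from `<inttypes.h>`, at the cost of requiring C99.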
# Dockerfile
# First image to build the binary
FROM alpine as builder
RUN apk --no-cache add make gcc libc-dev
COPY . /src
RUN mkdir /pkg && cd /src && make && make DESTDIR=/pkg install
# Second minimal image to only keep the built binary
FROM alpine
# Copy the built files
COPY --from=builder /pkg /
# Copy the license as well
RUN mkdir -p /usr/local/share/licenses/zstd
COPY --from=builder /src/LICENSE /usr/local/share/licenses/zstd/
# Just run `zstd` if no other command is given
CMD ["/usr/local/bin/zstd"]
## Requirement
The `Dockerfile` script requires a version of `docker` >= 17.05
## Installing docker
The official docker install docs use a PPA with a modern version available:
https://docs.docker.com/install/linux/docker-ce/ubuntu/
## How to run
`docker build -t zstd .`
## Test
```
echo foo | docker run -i --rm zstd | docker run -i --rm zstd zstdcat
foo
```
ARG :=
CC ?= gcc
CFLAGS ?= -O3
INCLUDES := -I ../randomDictBuilder -I ../../../programs -I ../../../lib/common -I ../../../lib -I ../../../lib/dictBuilder
RANDOM_FILE := ../randomDictBuilder/random.c
IO_FILE := ../randomDictBuilder/io.c
all: run clean
.PHONY: run
run: benchmark
echo "Benchmarking with $(ARG)"
./benchmark $(ARG)
.PHONY: test
test: benchmarkTest clean
.PHONY: benchmarkTest
benchmarkTest: benchmark test.sh
sh test.sh
benchmark: benchmark.o io.o random.o libzstd.a
$(CC) $(CFLAGS) benchmark.o io.o random.o libzstd.a -o benchmark
benchmark.o: benchmark.c
$(CC) $(CFLAGS) $(INCLUDES) -c benchmark.c
random.o: $(RANDOM_FILE)
$(CC) $(CFLAGS) $(INCLUDES) -c $(RANDOM_FILE)
io.o: $(IO_FILE)
$(CC) $(CFLAGS) $(INCLUDES) -c $(IO_FILE)
libzstd.a:
$(MAKE) -C ../../../lib libzstd.a
mv ../../../lib/libzstd.a .
.PHONY: clean
clean:
rm -f *.o benchmark libzstd.a
$(MAKE) -C ../../../lib clean
echo "Cleaning is completed"
#include <stdio.h> /* fprintf */
#include <stdlib.h> /* malloc, free, qsort */
#include <string.h> /* strcmp, strlen */
#include <errno.h> /* errno */
#include <ctype.h>
#include <time.h>
#include "random.h"
#include "dictBuilder.h"
#include "zstd_internal.h" /* includes zstd.h */
#include "io.h"
#include "util.h"
#include "zdict.h"
/*-*************************************
* Console display
***************************************/
#define DISPLAY(...) fprintf(stderr, __VA_ARGS__)
#define DISPLAYLEVEL(l, ...) if (displayLevel>=l) { DISPLAY(__VA_ARGS__); }
static const U64 g_refreshRate = SEC_TO_MICRO / 6;
static UTIL_time_t g_displayClock = UTIL_TIME_INITIALIZER;
#define DISPLAYUPDATE(l, ...) { if (displayLevel>=l) { \
if ((UTIL_clockSpanMicro(g_displayClock) > g_refreshRate) || (displayLevel>=4)) \
{ g_displayClock = UTIL_getTime(); DISPLAY(__VA_ARGS__); \
if (displayLevel>=4) fflush(stderr); } } }
/*-*************************************
* Exceptions
***************************************/
#ifndef DEBUG
# define DEBUG 0
#endif
#define DEBUGOUTPUT(...) if (DEBUG) DISPLAY(__VA_ARGS__);
#define EXM_THROW(error, ...) \
{ \
DEBUGOUTPUT("Error defined at %s, line %i : \n", __FILE__, __LINE__); \
DISPLAY("Error %i : ", error); \
DISPLAY(__VA_ARGS__); \
DISPLAY("\n"); \
exit(error); \
}
/*-*************************************
* Constants
***************************************/
static const unsigned g_defaultMaxDictSize = 110 KB;
#define DEFAULT_CLEVEL 3
#define DEFAULT_DISPLAYLEVEL 2
/*-*************************************
* Struct
***************************************/
typedef struct {
const void* dictBuffer;
size_t dictSize;
} dictInfo;
/*-*************************************
* Dictionary related operations
***************************************/
/** createDictFromFiles() :
* Based on type of param given, train dictionary using the corresponding algorithm
* @return dictInfo containing dictionary buffer and dictionary size
*/
dictInfo* createDictFromFiles(sampleInfo *info, unsigned maxDictSize,
ZDICT_random_params_t *randomParams, ZDICT_cover_params_t *coverParams,
ZDICT_legacy_params_t *legacyParams, ZDICT_fastCover_params_t *fastParams) {
unsigned const displayLevel = randomParams ? randomParams->zParams.notificationLevel :
coverParams ? coverParams->zParams.notificationLevel :
legacyParams ? legacyParams->zParams.notificationLevel :
fastParams ? fastParams->zParams.notificationLevel :
DEFAULT_DISPLAYLEVEL; /* no dict */
void* const dictBuffer = malloc(maxDictSize);
dictInfo* dInfo = NULL;
/* Checks */
if (!dictBuffer)
EXM_THROW(12, "not enough memory for trainFromFiles"); /* should not happen */
{ size_t dictSize;
if(randomParams) {
dictSize = ZDICT_trainFromBuffer_random(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, *randomParams);
}else if(coverParams) {
/* Run the optimize version if either k or d is not provided */
if (!coverParams->d || !coverParams->k){
dictSize = ZDICT_optimizeTrainFromBuffer_cover(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, coverParams);
} else {
dictSize = ZDICT_trainFromBuffer_cover(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, *coverParams);
}
} else if(legacyParams) {
dictSize = ZDICT_trainFromBuffer_legacy(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, *legacyParams);
} else if(fastParams) {
/* Run the optimize version if either k or d is not provided */
if (!fastParams->d || !fastParams->k) {
dictSize = ZDICT_optimizeTrainFromBuffer_fastCover(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, fastParams);
} else {
dictSize = ZDICT_trainFromBuffer_fastCover(dictBuffer, maxDictSize, info->srcBuffer,
info->samplesSizes, info->nbSamples, *fastParams);
}
} else {
dictSize = 0;
}
if (ZDICT_isError(dictSize)) {
DISPLAYLEVEL(1, "dictionary training failed : %s \n", ZDICT_getErrorName(dictSize)); /* should not happen */
free(dictBuffer);
return dInfo;
}
dInfo = (dictInfo *)malloc(sizeof(dictInfo));
if (!dInfo) { free(dictBuffer); return NULL; } /* allocation failed : nothing to return */
dInfo->dictBuffer = dictBuffer;
dInfo->dictSize = dictSize;
}
return dInfo;
}
/** compressWithDict() :
* Compress samples from sample buffer given dictionary stored on dictionary buffer and compression level
* @return compression ratio
*/
double compressWithDict(sampleInfo *srcInfo, dictInfo* dInfo, int compressionLevel, int displayLevel) {
/* Local variables */
size_t totalCompressedSize = 0;
size_t totalOriginalSize = 0;
const unsigned hasDict = dInfo->dictSize > 0 ? 1 : 0;
double cRatio;
size_t dstCapacity;
int i;
/* Pointers */
ZSTD_CDict *cdict = NULL;
ZSTD_CCtx* cctx = NULL;
size_t *offsets = NULL;
void* dst = NULL;
/* Allocate dst with enough space to compress the maximum sized sample */
{
size_t maxSampleSize = 0;
for (i = 0; i < srcInfo->nbSamples; i++) {
maxSampleSize = MAX(srcInfo->samplesSizes[i], maxSampleSize);
}
dstCapacity = ZSTD_compressBound(maxSampleSize);
dst = malloc(dstCapacity);
}
/* Calculate offset for each sample */
offsets = (size_t *)malloc((srcInfo->nbSamples + 1) * sizeof(size_t));
if (!offsets) {
cRatio = -1;
goto _cleanup;
}
offsets[0] = 0;
for (i = 1; i <= srcInfo->nbSamples; i++) {
offsets[i] = offsets[i - 1] + srcInfo->samplesSizes[i - 1];
}
/* Create the cctx */
cctx = ZSTD_createCCtx();
if(!cctx || !dst) {
cRatio = -1;
goto _cleanup;
}
/* Create CDict if there's a dictionary stored on buffer */
if (hasDict) {
cdict = ZSTD_createCDict(dInfo->dictBuffer, dInfo->dictSize, compressionLevel);
if(!cdict) {
cRatio = -1;
goto _cleanup;
}
}
/* Compress each sample and sum their sizes*/
const BYTE *const samples = (const BYTE *)srcInfo->srcBuffer;
for (i = 0; i < srcInfo->nbSamples; i++) {
size_t compressedSize;
if(hasDict) {
compressedSize = ZSTD_compress_usingCDict(cctx, dst, dstCapacity, samples + offsets[i], srcInfo->samplesSizes[i], cdict);
} else {
compressedSize = ZSTD_compressCCtx(cctx, dst, dstCapacity,samples + offsets[i], srcInfo->samplesSizes[i], compressionLevel);
}
if (ZSTD_isError(compressedSize)) {
cRatio = -1;
goto _cleanup;
}
totalCompressedSize += compressedSize;
}
/* Sum original sizes */
for (i = 0; i<srcInfo->nbSamples; i++) {
totalOriginalSize += srcInfo->samplesSizes[i];
}
/* Calculate compression ratio */
DISPLAYLEVEL(2, "original size is %lu\n", (unsigned long)totalOriginalSize);
DISPLAYLEVEL(2, "compressed size is %lu\n", (unsigned long)totalCompressedSize);
cRatio = (double)totalOriginalSize/(double)totalCompressedSize;
_cleanup:
free(dst);
free(offsets);
ZSTD_freeCCtx(cctx);
ZSTD_freeCDict(cdict);
return cRatio;
}
/** freeDictInfo() :
* Free memory allocated for dictInfo
*/
void freeDictInfo(dictInfo* info) {
if (!info) return;
if (info->dictBuffer) free((void*)(info->dictBuffer));
free(info);
}
/*-********************************************************
* Benchmarking functions
**********************************************************/
/** benchmarkDictBuilder() :
* Measure how long a dictionary builder takes and compression ratio with the dictionary built
* @return 0 if the benchmark succeeds, 1 otherwise
*/
int benchmarkDictBuilder(sampleInfo *srcInfo, unsigned maxDictSize, ZDICT_random_params_t *randomParam,
ZDICT_cover_params_t *coverParam, ZDICT_legacy_params_t *legacyParam,
ZDICT_fastCover_params_t *fastParam) {
/* Local variables */
const unsigned displayLevel = randomParam ? randomParam->zParams.notificationLevel :
coverParam ? coverParam->zParams.notificationLevel :
legacyParam ? legacyParam->zParams.notificationLevel :
fastParam ? fastParam->zParams.notificationLevel:
DEFAULT_DISPLAYLEVEL; /* no dict */
const char* name = randomParam ? "RANDOM" :
coverParam ? "COVER" :
legacyParam ? "LEGACY" :
fastParam ? "FAST":
"NODICT"; /* no dict */
const unsigned cLevel = randomParam ? randomParam->zParams.compressionLevel :
coverParam ? coverParam->zParams.compressionLevel :
legacyParam ? legacyParam->zParams.compressionLevel :
fastParam ? fastParam->zParams.compressionLevel:
DEFAULT_CLEVEL; /* no dict */
int result = 0;
/* Calculate speed */
const UTIL_time_t begin = UTIL_getTime();
dictInfo* dInfo = createDictFromFiles(srcInfo, maxDictSize, randomParam, coverParam, legacyParam, fastParam);
const U64 timeMicro = UTIL_clockSpanMicro(begin);
const double timeSec = timeMicro / (double)SEC_TO_MICRO;
if (!dInfo) {
DISPLAYLEVEL(1, "%s does not train successfully\n", name);
result = 1;
goto _cleanup;
}
DISPLAYLEVEL(1, "%s took %f seconds to execute \n", name, timeSec);
/* Calculate compression ratio */
const double cRatio = compressWithDict(srcInfo, dInfo, cLevel, displayLevel);
if (cRatio < 0) {
DISPLAYLEVEL(1, "Compressing with %s dictionary does not work\n", name);
result = 1;
goto _cleanup;
}
DISPLAYLEVEL(1, "Compression ratio with %s dictionary is %f\n", name, cRatio);
_cleanup:
freeDictInfo(dInfo);
return result;
}
int main(int argCount, const char* argv[])
{
const int displayLevel = DEFAULT_DISPLAYLEVEL;
const char* programName = argv[0];
int result = 0;
/* Initialize arguments to default values */
unsigned k = 200;
unsigned d = 8;
unsigned f;
unsigned accel;
unsigned i;
const unsigned cLevel = DEFAULT_CLEVEL;
const unsigned dictID = 0;
const unsigned maxDictSize = g_defaultMaxDictSize;
/* Initialize table to store input files */
const char** filenameTable = (const char**)malloc(argCount * sizeof(const char*));
unsigned filenameIdx = 0;
char* fileNamesBuf = NULL;
unsigned fileNamesNb = filenameIdx;
const int followLinks = 0;
const char** extendedFileList = NULL;
if (!filenameTable) {
DISPLAYLEVEL(1, "benchmark: not enough memory\n");
return 1;
}
/* Parse arguments */
for (i = 1; i < argCount; i++) {
const char* argument = argv[i];
if (longCommandWArg(&argument, "in=")) {
filenameTable[filenameIdx] = argument;
filenameIdx++;
continue;
}
DISPLAYLEVEL(1, "benchmark: Incorrect parameters\n");
free((void*)filenameTable);
return 1;
}
/* Expand directories recursively into a flat file list (symbolic links are not followed, since followLinks==0) */
extendedFileList = UTIL_createFileList(filenameTable, filenameIdx, &fileNamesBuf,
&fileNamesNb, followLinks);
if (extendedFileList) {
unsigned u;
for (u=0; u<fileNamesNb; u++) DISPLAYLEVEL(4, "%u %s\n", u, extendedFileList[u]);
free((void*)filenameTable);
filenameTable = extendedFileList;
filenameIdx = fileNamesNb;
}
/* get sampleInfo */
size_t blockSize = 0;
sampleInfo* srcInfo= getSampleInfo(filenameTable,
filenameIdx, blockSize, maxDictSize, displayLevel);
/* set up zParams */
ZDICT_params_t zParams;
zParams.compressionLevel = cLevel;
zParams.notificationLevel = displayLevel;
zParams.dictID = dictID;
/* with no dict */
{
const int noDictResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, NULL, NULL, NULL);
if(noDictResult) {
result = 1;
goto _cleanup;
}
}
/* for random */
{
ZDICT_random_params_t randomParam;
randomParam.zParams = zParams;
randomParam.k = k;
const int randomResult = benchmarkDictBuilder(srcInfo, maxDictSize, &randomParam, NULL, NULL, NULL);
DISPLAYLEVEL(2, "k=%u\n", randomParam.k);
if(randomResult) {
result = 1;
goto _cleanup;
}
}
/* for legacy */
{
ZDICT_legacy_params_t legacyParam;
legacyParam.zParams = zParams;
legacyParam.selectivityLevel = 9;
const int legacyResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, NULL, &legacyParam, NULL);
DISPLAYLEVEL(2, "selectivityLevel=%u\n", legacyParam.selectivityLevel);
if(legacyResult) {
result = 1;
goto _cleanup;
}
}
/* for cover */
{
/* for cover (optimizing k and d) */
ZDICT_cover_params_t coverParam;
memset(&coverParam, 0, sizeof(coverParam));
coverParam.zParams = zParams;
coverParam.splitPoint = 1.0;
coverParam.steps = 40;
coverParam.nbThreads = 1;
const int coverOptResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, &coverParam, NULL, NULL);
DISPLAYLEVEL(2, "k=%u\nd=%u\nsteps=%u\nsplit=%u\n", coverParam.k, coverParam.d, coverParam.steps, (unsigned)(coverParam.splitPoint * 100));
if(coverOptResult) {
result = 1;
goto _cleanup;
}
/* for cover (with k and d provided) */
const int coverResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, &coverParam, NULL, NULL);
DISPLAYLEVEL(2, "k=%u\nd=%u\nsteps=%u\nsplit=%u\n", coverParam.k, coverParam.d, coverParam.steps, (unsigned)(coverParam.splitPoint * 100));
if(coverResult) {
result = 1;
goto _cleanup;
}
}
/* for fastCover */
for (f = 15; f < 25; f++){
DISPLAYLEVEL(2, "current f is %u\n", f);
for (accel = 1; accel < 11; accel++) {
DISPLAYLEVEL(2, "current accel is %u\n", accel);
/* for fastCover (optimizing k and d) */
ZDICT_fastCover_params_t fastParam;
memset(&fastParam, 0, sizeof(fastParam));
fastParam.zParams = zParams;
fastParam.f = f;
fastParam.steps = 40;
fastParam.nbThreads = 1;
fastParam.accel = accel;
const int fastOptResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, NULL, NULL, &fastParam);
DISPLAYLEVEL(2, "k=%u\nd=%u\nf=%u\nsteps=%u\nsplit=%u\naccel=%u\n", fastParam.k, fastParam.d, fastParam.f, fastParam.steps, (unsigned)(fastParam.splitPoint * 100), fastParam.accel);
if(fastOptResult) {
result = 1;
goto _cleanup;
}
/* for fastCover (with k and d provided) */
for (i = 0; i < 5; i++) {
const int fastResult = benchmarkDictBuilder(srcInfo, maxDictSize, NULL, NULL, NULL, &fastParam);
DISPLAYLEVEL(2, "k=%u\nd=%u\nf=%u\nsteps=%u\nsplit=%u\naccel=%u\n", fastParam.k, fastParam.d, fastParam.f, fastParam.steps, (unsigned)(fastParam.splitPoint * 100), fastParam.accel);
if(fastResult) {
result = 1;
goto _cleanup;
}
}
}
}
/* Free allocated memory */
_cleanup:
UTIL_freeFileList(extendedFileList, fileNamesBuf);
freeSampleInfo(srcInfo);
return result;
}
/* ZDICT_trainFromBuffer_legacy() :
* issue : samplesBuffer needs to be followed by a noisy guard band.
* workaround : duplicate the buffer, and add the noise */
size_t ZDICT_trainFromBuffer_legacy(void* dictBuffer, size_t dictBufferCapacity,
const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
ZDICT_legacy_params_t params);
echo "Benchmark with in=../../../lib/common"
./benchmark in=../../../lib/common
ARG :=
CC ?= gcc
CFLAGS ?= -O3 -g
INCLUDES := -I ../../../programs -I ../randomDictBuilder -I ../../../lib/common -I ../../../lib -I ../../../lib/dictBuilder
IO_FILE := ../randomDictBuilder/io.c
TEST_INPUT := ../../../lib
TEST_OUTPUT := fastCoverDict
all: main run clean
.PHONY: test
test: main testrun testshell clean
.PHONY: run
run:
	echo "Building a fastCover dictionary with given arguments"
	./main $(ARG)
main: main.o io.o fastCover.o libzstd.a
	$(CC) $(CFLAGS) main.o io.o fastCover.o libzstd.a -o main
main.o: main.c
	$(CC) $(CFLAGS) $(INCLUDES) -c main.c
fastCover.o: fastCover.c
	$(CC) $(CFLAGS) $(INCLUDES) -c fastCover.c
io.o: $(IO_FILE)
	$(CC) $(CFLAGS) $(INCLUDES) -c $(IO_FILE)
libzstd.a:
	$(MAKE) MOREFLAGS=-g -C ../../../lib libzstd.a
	mv ../../../lib/libzstd.a .
.PHONY: testrun
testrun: main
	echo "Run with $(TEST_INPUT) and $(TEST_OUTPUT) "
	./main in=$(TEST_INPUT) out=$(TEST_OUTPUT)
	zstd -be3 -D $(TEST_OUTPUT) -r $(TEST_INPUT) -q
	rm -f $(TEST_OUTPUT)
.PHONY: testshell
testshell: test.sh
	sh test.sh
	echo "Finish running test.sh"
.PHONY: clean
clean:
	rm -f *.o main libzstd.a
	$(MAKE) -C ../../../lib clean
	echo "Cleaning is completed"
FastCover Dictionary Builder
### Permitted Arguments:
Input File/Directory (in=fileName): required; file/directory used to build dictionary; if directory, will operate recursively for files inside directory; can include multiple files/directories, each following "in="
Output Dictionary (out=dictName): if not provided, default to fastCoverDict
Dictionary ID (dictID=#): nonnegative number; if not provided, default to 0
Maximum Dictionary Size (maxdict=#): positive number; in bytes; if not provided, default to 110KB
Size of Selected Segment (k=#): positive number; in bytes; if not provided, default to 200
Size of Dmer (d=#): either 6 or 8; if not provided, default to 8
Number of steps (steps=#): positive number; if not provided, default to 32
Percentage of samples used for training (split=#): positive number; if not provided, default to 100
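
The argument list above can be read as a set of `key=value` pairs with defaults. The following is a minimal sketch of such a parser, not the builder's actual code: the names `builderArgs`, `matchPrefix`, `setDefaults`, `parseArgs` and the `MAX_INPUTS` cap are hypothetical, and value validation (for example, that d is 6 or 8) is left out. The defaults are the ones listed above, with 110KB taken as 110 * 1024 bytes.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_INPUTS 16   /* hypothetical cap, for this sketch only */

typedef struct {
    const char* inputs[MAX_INPUTS];  /* every in= value, in order of appearance */
    unsigned nbInputs;
    const char* out;                 /* out=     */
    unsigned dictID;                 /* dictID=  */
    unsigned maxDictSize;            /* maxdict=, in bytes */
    unsigned k;                      /* k=       */
    unsigned d;                      /* d=       */
    unsigned steps;                  /* steps=   */
    unsigned split;                  /* split=, as a percentage */
} builderArgs;

/* If *arg starts with prefix, advance *arg past it and return 1; else return 0. */
static int matchPrefix(const char** arg, const char* prefix)
{
    size_t const len = strlen(prefix);
    if (strncmp(*arg, prefix, len) != 0) return 0;
    *arg += len;
    return 1;
}

/* Fill args with the defaults documented in the list above. */
static void setDefaults(builderArgs* args)
{
    memset(args, 0, sizeof(*args));
    args->out = "fastCoverDict";
    args->dictID = 0;
    args->maxDictSize = 110 * 1024;
    args->k = 200;
    args->d = 8;
    args->steps = 32;
    args->split = 100;
}

/* Parse argv[1..argc-1].
 * Returns 0 on success, 1 on an unknown argument or when no in= is given. */
int parseArgs(int argc, const char* argv[], builderArgs* args)
{
    int i;
    setDefaults(args);
    for (i = 1; i < argc; i++) {
        const char* a = argv[i];
        if (matchPrefix(&a, "in=")) {
            if (args->nbInputs == MAX_INPUTS) return 1;
            args->inputs[args->nbInputs++] = a;
        }
        else if (matchPrefix(&a, "out="))     args->out = a;
        else if (matchPrefix(&a, "dictID="))  args->dictID = (unsigned)strtoul(a, NULL, 10);
        else if (matchPrefix(&a, "maxdict=")) args->maxDictSize = (unsigned)strtoul(a, NULL, 10);
        else if (matchPrefix(&a, "k="))       args->k = (unsigned)strtoul(a, NULL, 10);
        else if (matchPrefix(&a, "d="))       args->d = (unsigned)strtoul(a, NULL, 10);
        else if (matchPrefix(&a, "steps="))   args->steps = (unsigned)strtoul(a, NULL, 10);
        else if (matchPrefix(&a, "split="))   args->split = (unsigned)strtoul(a, NULL, 10);
        else return 1;   /* unknown argument */
    }
    return args->nbInputs == 0;   /* in= is required */
}
```

Each `in=` occurrence appends to the input list, which is how several files or directories can be passed in one invocation. Note that `dictID=` does not collide with `d=`, because the prefix comparison includes the `=` character.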
### Running Test:
make test
### Usage:
To build a FASTCOVER dictionary with the provided arguments: make ARG= followed by the arguments, in quotes
If k or d is not provided, the optimizing version of FASTCOVER is run.
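
The choice between the two versions can be sketched as a small decision function. This is an illustration only, assuming the convention visible in the `!fastParams->d || !fastParams->k` check of createDictFromFiles in the benchmark code, where 0 means "not provided". The names `trainMode` and `selectTrainMode` are hypothetical, and the d check only encodes the "either 6 or 8" rule from the argument list.

```c
/* Hypothetical result type for this sketch. */
typedef enum {
    TRAIN_FIXED,     /* train once with the given k and d */
    TRAIN_OPTIMIZE,  /* search for k and d (optimizing version) */
    TRAIN_INVALID    /* parameters given, but d is neither 6 nor 8 */
} trainMode;

/* Decide which training path to take.
 * Assumption: k == 0 or d == 0 means the value was not provided. */
trainMode selectTrainMode(unsigned k, unsigned d)
{
    if (k == 0 || d == 0) return TRAIN_OPTIMIZE;  /* at least one value is missing */
    if (d != 6 && d != 8) return TRAIN_INVALID;   /* README: d is either 6 or 8 */
    return TRAIN_FIXED;
}
```

With this convention, `selectTrainMode(200, 8)` selects the fixed path, while leaving either value at 0 selects the optimizing path.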
### Examples:
make ARG="in=../../../lib/dictBuilder out=dict100 dictID=520"
make ARG="in=../../../lib/dictBuilder in=../../../lib/compress"