# LZF Compressor
## Overview
LZF-compress is a Java library for encoding and decoding data in LZF format,
written by Tatu Saloranta (tatu.saloranta@iki.fi)
Data format and algorithm based on original [LZF library](http://freshmeat.net/projects/liblzf) by Marc A Lehmann.
See [LZF Format Specification](https://github.com/ning/compress/wiki/LZFFormat) for full description.
Format differs slightly from some other adaptations, such as the one used
by [H2 database project](http://www.h2database.com) (by Thomas Mueller);
although internal block compression structure is the same, block identifiers differ.
This package uses the original LZF identifiers to be 100% compatible with existing command-line `lzf` tool(s).
The LZF algorithm itself is optimized for speed, with somewhat more modest compression.
Compared to the standard `Deflate` (the algorithm gzip uses), LZF can be 5-6 times as fast to compress,
and twice as fast to decompress. The compression ratio is lower since no Huffman encoding is used
after Lempel-Ziv substring elimination.
## Usage
See [Wiki](https://github.com/ning/compress/wiki) for more details; a quick summary follows.
Both compression and decompression can be done either via a streaming approach:
```java
InputStream in = new LZFInputStream(new FileInputStream("data.lzf"));
OutputStream out = new LZFOutputStream(new FileOutputStream("results.lzf"));
InputStream compIn = new LZFCompressingInputStream(new FileInputStream("stuff.txt"));
```
or via block operations:
```java
byte[] compressed = LZFEncoder.encode(uncompressedData);
byte[] uncompressed = LZFDecoder.decode(compressedData);
```
and you can even use the LZF jar as a command-line tool (its manifest declares `com.ning.compress.lzf.LZF` as the class with the `main()` method to call), like so:
java -jar compress-lzf-1.0.3.jar
which will display the necessary usage arguments for `-c`(ompressing) or `-d`(ecompressing) files.
### Parallel processing
Since compression is more CPU-heavy than decompression, it can benefit from concurrent operation.
This works well with LZF because of its block-oriented nature: although processing within
a block (of up to 64kB) is sequential, separate blocks can be encoded completely
independently, with no dependencies on earlier blocks.
The main abstraction to use is `PLZFOutputStream`, which is a `FilterOutputStream` and also implements
`java.nio.channels.WritableByteChannel`. Its use is like that of any `OutputStream`:
```java
PLZFOutputStream output = new PLZFOutputStream(new FileOutputStream("stuff.lzf"));
// then write contents:
output.write(buffer);
// ...
output.close();
```
## Interoperability
Besides Java support, LZF codecs / bindings exist for non-JVM languages as well:
* C: [liblzf](http://oldhome.schmorp.de/marc/liblzf.html) (the original LZF package!)
* C#: [C# LZF](https://csharplzfcompression.codeplex.com/)
* Go: [Golly](https://github.com/tav/golly)
* Javascript(!): [freecode LZF](http://freecode.com/projects/lzf) (or via [SourceForge](http://sourceforge.net/projects/lzf/))
* Perl: [Compress::LZF](http://search.cpan.org/dist/Compress-LZF/LZF.pm)
* Python: [Python-LZF](https://github.com/teepark/python-lzf)
* Ruby: [glebtv/lzf](https://github.com/glebtv/lzf), [LZF/Ruby](https://rubyforge.org/projects/lzfruby/)
## More
[Project Wiki](https://github.com/ning/compress/wiki).
## Alternative High-Speed Lempel-Ziv Compressors
LZF belongs to a family of compression codecs called "simple Lempel-Ziv" codecs.
Since LZ compression is also the first part of `deflate` compression (which is used,
along with simple framing, for `gzip`), it can be viewed as the "first part of gzip"
(the second part being Huffman encoding of the compressed content).
There are many other codecs in this category, the most notable (and competitive) being:
* [Snappy](http://en.wikipedia.org/wiki/Snappy_%28software%29)
* [LZ4](http://en.wikipedia.org/wiki/LZ4_%28compression_algorithm%29)
all of which have very similar compression ratios (due to the same underlying algorithm,
with differences coming from slight encoding variations and from the efficiency of
back-reference matching), and similar performance profiles regarding the ratio of
compression to decompression speed.
1.0.4 (12-Mar-2017)
#43: estimateMaxWorkspaceSize() is too small
(reported by Roman L, leventow@github)
1.0.3 (15-Aug-2014)
#37: Incorrect de-serialization on Big Endian systems, due to incorrect usage of #numberOfTrailingZeros
(pointed out by Gireesh P, gireeshpunathil@github)
1.0.2 (09-Aug-2014)
#38: Overload of factory methods and constructors in Encoders and Streams
to allow specifying custom `BufferRecycler` instance
(contributed by `serverperformance@github`)
#39: VanillaChunkEncoder.tryCompress() not using 'inPos' as it should, potentially
causing corruption in rare cases
(contributed by Ryan E, rjerns@github)
1.0.1 (08-Apr-2014)
#35: Fix a problem with closing of `DeflaterOutputStream` (for gzip output)
......
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<!-- 13-Mar-2017, tatu: use FasterXML oss-parent over sonatype's, more
likely to get settings that work for releases
-->
<parent>
<groupId>com.fasterxml</groupId>
<artifactId>oss-parent</artifactId>
<version>24</version>
</parent>
<groupId>com.ning</groupId>
<artifactId>compress-lzf</artifactId>
<name>Compress-LZF</name>
<version>1.0.4</version>
<packaging>bundle</packaging>
<description>
Compression codec for LZF encoding, with particularly fast encoding/decoding and reasonable compression.
The compressor is a basic Lempel-Ziv codec, without Huffman (deflate/gzip) or statistical post-encoding.
See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF package.
</description>
<prerequisites>
<maven>2.2.1</maven>
</prerequisites>
<url>http://github.com/ning/compress</url>
<scm>
<connection>scm:git:git@github.com:ning/compress.git</connection>
<developerConnection>scm:git:git@github.com:ning/compress.git</developerConnection>
<url>http://github.com/ning/compress</url>
<tag>compress-lzf-1.0.4</tag>
</scm>
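<!-- Illustrative note (not part of the original POM): consumers would depend
on this artifact using the coordinates declared above, e.g.
<dependency>
<groupId>com.ning</groupId>
<artifactId>compress-lzf</artifactId>
<version>1.0.4</version>
</dependency>
-->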
<issueManagement>
<url>http://github.com/ning/compress/issues</url>
......@@ -60,7 +61,7 @@ See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF packag
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.8.21</version>
<type>jar</type>
<scope>test</scope>
</dependency>
......@@ -71,7 +72,7 @@ See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF packag
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<!-- 1.6 since 0.9.7 -->
<configuration>
<source>1.6</source>
......@@ -95,13 +96,13 @@ See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF packag
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>${version.plugin.javadoc}</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>UTF-8</encoding>
<links>
<link>http://docs.oracle.com/javase/7/docs/api/</link>
</links>
</configuration>
<executions>
......@@ -117,7 +118,6 @@ See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF packag
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-release-plugin</artifactId>
<configuration>
<mavenExecutorId>forked-path</mavenExecutorId>
</configuration>
......@@ -126,7 +126,7 @@ See "http://oldhome.schmorp.de/marc/liblzf.html" for more on original LZF packag
<plugin>
<groupId>org.apache.felix</groupId>
<artifactId>maven-bundle-plugin</artifactId>
<version>2.5.3</version>
<extensions>true</extensions>
<configuration>
<instructions><!-- note: artifact id, name, version and description use defaults (which are fine) -->
......@@ -149,6 +149,61 @@ com.ning.compress.lzf.util
</instructions>
</configuration>
</plugin>
<!-- EVEN BETTER; make executable! -->
<!-- 08-Sep-2014, tatu: except, doesn't quite work yet. Sigh.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.ning.compress.lzf.LZF</mainClass>
</transformer>
</transformers>
<createDependencyReducedPom>false</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.skife.maven</groupId>
<artifactId>really-executable-jar-maven-plugin</artifactId>
<version>1.2.0</version>
<configuration>
<programFile>lzf</programFile>
<flags>-Xmx200m</flags>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>really-executable-jar</goal>
</goals>
</execution>
</executions>
</plugin>
-->
</plugins>
</build>
......@@ -179,33 +234,5 @@ com.ning.compress.lzf.util
</plugins>
</build>
</profile>
</profiles>
</project>
......@@ -4,7 +4,7 @@ import java.io.IOException;
/**
* Abstract class that defines "push" style API for various uncompressors
* (aka decompressors or decoders). Implementations are alternatives to stream
* based uncompressors (such as {@link com.ning.compress.lzf.LZFInputStream})
* in cases where "push" operation is important and/or blocking is not allowed;
* for example, when handling asynchronous HTTP responses.
......
......@@ -171,17 +171,22 @@ public class GZIPUncompressor extends Uncompressor
public GZIPUncompressor(DataHandler h)
{
this(h, DEFAULT_CHUNK_SIZE, BufferRecycler.instance(), GZIPRecycler.instance());
}
public GZIPUncompressor(DataHandler h, int inputChunkLength)
{
this(h, inputChunkLength, BufferRecycler.instance(), GZIPRecycler.instance());
}
public GZIPUncompressor(DataHandler h, int inputChunkLength, BufferRecycler bufferRecycler, GZIPRecycler gzipRecycler)
{
_inputChunkLength = inputChunkLength;
_handler = h;
_recycler = bufferRecycler;
_decodeBuffer = bufferRecycler.allocDecodeBuffer(DECODE_BUFFER_SIZE);
_gzipRecycler = gzipRecycler;
_inflater = gzipRecycler.allocInflater();
_crc = new CRC32();
}
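// Illustrative usage sketch (not part of the original source): the 4-argument
// constructor lets callers manage recycler instances themselves; the shorter
// constructors fall back to the per-thread defaults. (The chunk length of 4000
// below is an arbitrary value chosen for the example.)
//
//   Uncompressor unc = new GZIPUncompressor(handler, 4000,
//       BufferRecycler.instance(), GZIPRecycler.instance());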
......
......@@ -77,15 +77,20 @@ public class OptimizedGZIPInputStream
*/
public OptimizedGZIPInputStream(InputStream in) throws IOException
{
this(in, BufferRecycler.instance(), GZIPRecycler.instance());
}
public OptimizedGZIPInputStream(InputStream in, BufferRecycler bufferRecycler, GZIPRecycler gzipRecycler) throws IOException
{
super();
_bufferRecycler = bufferRecycler;
_gzipRecycler = gzipRecycler;
_rawInput = in;
_buffer = bufferRecycler.allocInputBuffer(INPUT_BUFFER_SIZE);
_bufferPtr = _bufferEnd = 0;
_inflater = gzipRecycler.allocInflater();
_crc = new CRC32();
// And then need to process header...
......
......@@ -75,22 +75,35 @@ public abstract class ChunkEncoder
protected byte[] _headerBuffer;
/**
* Uses a ThreadLocal soft-referenced BufferRecycler instance.
*
* @param totalLength Total encoded length; used for calculating size
* of hash table to use
*/
protected ChunkEncoder(int totalLength)
{
this(totalLength, BufferRecycler.instance());
}
/**
* @param totalLength Total encoded length; used for calculating size
* of hash table to use
* @param bufferRecycler Buffer recycler instance, for usages where the
* caller manages the recycler instances
*/
protected ChunkEncoder(int totalLength, BufferRecycler bufferRecycler)
{
// Need room for at most a single full chunk
int largestChunkLen = Math.min(totalLength, LZFChunk.MAX_CHUNK_LEN);
int suggestedHashLen = calcHashLen(largestChunkLen);
_recycler = bufferRecycler;
_hashTable = bufferRecycler.allocEncodingHash(suggestedHashLen);
_hashModulo = _hashTable.length - 1;
// Ok, then, what's the worst case output buffer length?
// length indicator for each 32 literals, so:
// 21-Feb-2013, tatu: Plus we want to prepend chunk header in place:
int bufferLen = largestChunkLen + ((largestChunkLen + 31) >> 5) + LZFChunk.MAX_HEADER_LEN;
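// (worked example, illustrative: for a full chunk, assuming MAX_CHUNK_LEN of 0xFFFF
// and MAX_HEADER_LEN of 7, this gives 65535 + 2048 + 7 = 67590 bytes)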
_encodeBuffer = bufferRecycler.allocEncodingBuffer(bufferLen);
}
/**
......@@ -98,11 +111,20 @@ public abstract class ChunkEncoder
* buffer, in cases where caller wants full control over allocations.
*/
protected ChunkEncoder(int totalLength, boolean bogus)
{
this(totalLength, BufferRecycler.instance(), bogus);
}
/**
* Alternate constructor used when we want to avoid allocation encoding
* buffer, in cases where caller wants full control over allocations.
*/
protected ChunkEncoder(int totalLength, BufferRecycler bufferRecycler, boolean bogus)
{
int largestChunkLen = Math.max(totalLength, LZFChunk.MAX_CHUNK_LEN);
int suggestedHashLen = calcHashLen(largestChunkLen);
_recycler = bufferRecycler;
_hashTable = bufferRecycler.allocEncodingHash(suggestedHashLen);
_hashModulo = _hashTable.length - 1;
_encodeBuffer = null;
}
......@@ -297,6 +319,10 @@ public abstract class ChunkEncoder
return false;
}
public BufferRecycler getBufferRecycler() {
return _recycler;
}
/*
///////////////////////////////////////////////////////////////////////
// Abstract methods for sub-classes
......
......@@ -76,16 +76,24 @@ public class LZFCompressingInputStream extends InputStream
public LZFCompressingInputStream(InputStream in)
{
this(null, in, BufferRecycler.instance());
}
public LZFCompressingInputStream(final ChunkEncoder encoder, InputStream in)
{
this(encoder, in, null);
}
public LZFCompressingInputStream(final ChunkEncoder encoder, InputStream in, BufferRecycler bufferRecycler)
{
// may be passed by caller, or could be null
_encoder = encoder;
_inputStream = in;
if (bufferRecycler==null) {
bufferRecycler = (encoder!=null) ? _encoder._recycler : BufferRecycler.instance();
}
_recycler = bufferRecycler;
_inputBuffer = bufferRecycler.allocInputBuffer(LZFChunk.MAX_CHUNK_LEN);
// let's not yet allocate encoding buffer; don't know optimal size
}
......@@ -259,7 +267,7 @@ public class LZFCompressingInputStream extends InputStream
if (_encoder == null) {
// need 7 byte header, plus regular max buffer size:
int bufferLen = chunkLength + ((chunkLength + 31) >> 5) + 7;
_encoder = ChunkEncoderFactory.optimalNonAllocatingInstance(bufferLen, _recycler);
}
if (_encodedBytes == null) {
int bufferLen = chunkLength + ((chunkLength + 31) >> 5) + 7;
......
......@@ -11,6 +11,7 @@
package com.ning.compress.lzf;
import com.ning.compress.BufferRecycler;
import com.ning.compress.lzf.util.ChunkEncoderFactory;
/**
......@@ -22,11 +23,19 @@ import com.ning.compress.lzf.util.ChunkEncoderFactory;
*/
public class LZFEncoder
{
/* Approximate maximum size for a full chunk DURING PROCESSING, in case where it does
* not compress at all. Such chunks are converted to uncompressed chunks,
* but during compression process this amount of space is still needed.
*<p>
* NOTE: eventual maximum size is different, see below
*/
public final static int MAX_CHUNK_RESULT_SIZE = LZFChunk.MAX_HEADER_LEN + LZFChunk.MAX_CHUNK_LEN + ((LZFChunk.MAX_CHUNK_LEN + 30) / 31);
// since 1.0.4 (better name than MAX_CHUNK_RESULT_SIZE, same value)
private final static int MAX_CHUNK_WORKSPACE_SIZE = LZFChunk.MAX_HEADER_LEN + LZFChunk.MAX_CHUNK_LEN + ((LZFChunk.MAX_CHUNK_LEN + 30) / 31);
// since 1.0.4
private final static int FULL_UNCOMP_ENCODED_CHUNK = LZFChunk.MAX_HEADER_LEN + LZFChunk.MAX_CHUNK_LEN;
// Static methods only, no point in instantiating
private LZFEncoder() { }
......@@ -49,18 +58,25 @@ public class LZFEncoder
*/
public static int estimateMaxWorkspaceSize(int inputSize)
{
// single chunk; give a rough estimate with +4.6% (1 + 1/32 + 1/64)
// 12-Mar-2017, tatu: as per [compress-lzf#43], rounding down would mess this
// up for small sizes; but effect should go away after sizes of 64 and more,
// before which we may need up to 2 markers
if (inputSize <= LZFChunk.MAX_CHUNK_LEN) {
return LZFChunk.MAX_HEADER_LEN + 2 + inputSize + (inputSize >> 5) + (inputSize >> 6);
}
// one more special case, 2 chunks
inputSize -= LZFChunk.MAX_CHUNK_LEN;
if (inputSize <= LZFChunk.MAX_CHUNK_LEN) { // uncompressed chunk actually has 5 byte header but
return MAX_CHUNK_WORKSPACE_SIZE + (LZFChunk.MAX_HEADER_LEN + inputSize);
}
// check number of full chunks we should be creating:
int chunkCount = inputSize / LZFChunk.MAX_CHUNK_LEN;
inputSize -= chunkCount * LZFChunk.MAX_CHUNK_LEN; // will now be remainders
// So: the first chunk has a type marker and the rest do not, but for simplicity
// assume they all do. Also take into account that the last chunk is smaller
return MAX_CHUNK_WORKSPACE_SIZE + (chunkCount * FULL_UNCOMP_ENCODED_CHUNK)
+ (LZFChunk.MAX_HEADER_LEN + inputSize);
}
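// Illustrative usage sketch (not part of the original source): the estimate is
// meant for sizing the output buffer passed to appendEncoded():
//
//   byte[] out = new byte[LZFEncoder.estimateMaxWorkspaceSize(data.length)];
//   int end = LZFEncoder.appendEncoded(data, 0, data.length, out, 0);
//   // bytes [0, end) of 'out' now hold the complete LZF result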
/*
......@@ -121,6 +137,36 @@ public class LZFEncoder
return result;
}
/**
* Method for compressing given input data using LZF encoding and
* block structure (compatible with lzf command line utility).
* Result consists of a sequence of chunks.
*<p>
* Note that the {@link ChunkEncoder} instance used is one produced by
* {@link ChunkEncoderFactory#optimalInstance}, which is typically
* an "unsafe" instance if one can be used on the current JVM.
*/
public static byte[] encode(byte[] data, int offset, int length, BufferRecycler bufferRecycler)
{
ChunkEncoder enc = ChunkEncoderFactory.optimalInstance(length, bufferRecycler);
byte[] result = encode(enc, data, offset, length);
enc.close(); // important for buffer reuse!
return result;
}
/**
* Method that will use "safe" {@link ChunkEncoder}, as produced by
* {@link ChunkEncoderFactory#safeInstance}, for encoding. Safe here
* means that it does not use any non-compliant features beyond core JDK.
*/
public static byte[] safeEncode(byte[] data, int offset, int length, BufferRecycler bufferRecycler)
{
ChunkEncoder enc = ChunkEncoderFactory.safeInstance(length, bufferRecycler);
byte[] result = encode(enc, data, offset, length);
enc.close();
return result;
}
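// Illustrative usage sketch (not part of the original source): these overloads
// let a caller pass its own BufferRecycler (for example, one per pipeline)
// instead of relying on the default per-thread instance:
//
//   BufferRecycler recycler = BufferRecycler.instance(); // or a caller-managed one
//   byte[] compressed = LZFEncoder.encode(data, 0, data.length, recycler);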
/**
* Compression method that uses specified {@link ChunkEncoder} for actual
* encoding.
......@@ -207,6 +253,36 @@ public class LZFEncoder
/**
* Alternate version that accepts pre-allocated output buffer.
*<p>
* Note that the {@link ChunkEncoder} instance used is one produced by
* {@link ChunkEncoderFactory#optimalNonAllocatingInstance}, which is typically
* an "unsafe" instance if one can be used on the current JVM.
*/
public static int appendEncoded(byte[] input, int inputPtr, int inputLength,
byte[] outputBuffer, int outputPtr, BufferRecycler bufferRecycler) {
ChunkEncoder enc = ChunkEncoderFactory.optimalNonAllocatingInstance(inputLength, bufferRecycler);
int len = appendEncoded(enc, input, inputPtr, inputLength, outputBuffer, outputPtr);
enc.close();
return len;
}
/**
* Alternate version that accepts pre-allocated output buffer.
*<p>
* Method that will use "safe" {@link ChunkEncoder}, as produced by
* {@link ChunkEncoderFactory#safeInstance}, for encoding. Safe here
* means that it does not use any non-compliant features beyond core JDK.
*/
public static int safeAppendEncoded(byte[] input, int inputPtr, int inputLength,
byte[] outputBuffer, int outputPtr, BufferRecycler bufferRecycler) {
ChunkEncoder enc = ChunkEncoderFactory.safeNonAllocatingInstance(inputLength, bufferRecycler);
int len = appendEncoded(enc, input, inputPtr, inputLength, outputBuffer, outputPtr);
enc.close();
return len;
}
/**
* Alternate version that accepts pre-allocated output buffer.
*/
public static int appendEncoded(ChunkEncoder enc, byte[] input, int inputPtr, int inputLength,
byte[] outputBuffer, int outputPtr)
......
......@@ -83,7 +83,7 @@ public class LZFInputStream extends InputStream
public LZFInputStream(final ChunkDecoder decoder, final InputStream in)
throws IOException
{
this(decoder, in, BufferRecycler.instance(), false);
}
/**
......@@ -94,21 +94,45 @@ public class LZFInputStream extends InputStream
*/
public LZFInputStream(final InputStream in, boolean fullReads) throws IOException
{
this(ChunkDecoderFactory.optimalInstance(), in, BufferRecycler.instance(), fullReads);
}
public LZFInputStream(final ChunkDecoder decoder, final InputStream in, boolean fullReads)
throws IOException
{
this(decoder, in, BufferRecycler.instance(), fullReads);
}
public LZFInputStream(final InputStream inputStream, final BufferRecycler bufferRecycler) throws IOException
{
this(inputStream, bufferRecycler, false);
}
/**
* @param in Underlying input stream to use
* @param fullReads Whether {@link #read(byte[])} should try to read exactly
* as many bytes as requested (true); or just however many happen to be
* available (false)
* @param bufferRecycler Buffer recycler instance, for usages where the
* caller manages the recycler instances
*/
public LZFInputStream(final InputStream in, final BufferRecycler bufferRecycler, boolean fullReads) throws IOException
{
this(ChunkDecoderFactory.optimalInstance(), in, bufferRecycler, fullReads);
}
public LZFInputStream(final ChunkDecoder decoder, final InputStream in, final BufferRecycler bufferRecycler, boolean fullReads)
throws IOException
{
super();
_decoder = decoder;
_recycler = bufferRecycler;
_inputStream = in;
_inputStreamClosed = false;
_cfgFullReads = fullReads;
_inputBuffer = bufferRecycler.allocInputBuffer(LZFChunk.MAX_CHUNK_LEN);
_decodedBytes = bufferRecycler.allocDecodeBuffer(LZFChunk.MAX_CHUNK_LEN);
}
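// Illustrative usage sketch (not part of the original source): constructing a
// decoding stream with an explicit recycler and "full reads" enabled:
//
//   InputStream in = new LZFInputStream(new FileInputStream("data.lzf"),
//       BufferRecycler.instance(), true);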
/**
......
......@@ -28,7 +28,7 @@ import com.ning.compress.lzf.util.ChunkEncoderFactory;
*/
public class LZFOutputStream extends FilterOutputStream implements WritableByteChannel
{
private static final int DEFAULT_OUTPUT_BUFFER_SIZE = LZFChunk.MAX_CHUNK_LEN;
private final ChunkEncoder _encoder;
private final BufferRecycler _recycler;
......@@ -58,15 +58,34 @@ public class LZFOutputStream extends FilterOutputStream implements WritableByteC
public LZFOutputStream(final OutputStream outputStream)
{
this(ChunkEncoderFactory.optimalInstance(DEFAULT_OUTPUT_BUFFER_SIZE), outputStream);
}
public LZFOutputStream(final ChunkEncoder encoder, final OutputStream outputStream)
{
this(encoder, outputStream, DEFAULT_OUTPUT_BUFFER_SIZE, encoder._recycler);
}
public LZFOutputStream(final OutputStream outputStream, final BufferRecycler bufferRecycler)
{
this(ChunkEncoderFactory.optimalInstance(bufferRecycler), outputStream, bufferRecycler);
}
public LZFOutputStream(final ChunkEncoder encoder, final OutputStream outputStream, final BufferRecycler bufferRecycler)
{
this(encoder, outputStream, DEFAULT_OUTPUT_BUFFER_SIZE, bufferRecycler);
}
public LZFOutputStream(final ChunkEncoder encoder, final OutputStream outputStream,
final int bufferSize, BufferRecycler bufferRecycler)
{
super(outputStream);
_encoder = encoder;
if (bufferRecycler==null) {
bufferRecycler = _encoder._recycler;
}
_recycler = bufferRecycler;
_outputBuffer = bufferRecycler.allocOutputBuffer(bufferSize);
_outputStreamClosed = false;
}
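// Illustrative usage sketch (not part of the original source): the widest
// constructor combines a caller-managed recycler with a custom buffer size:
//
//   BufferRecycler recycler = BufferRecycler.instance();
//   LZFOutputStream out = new LZFOutputStream(
//       ChunkEncoderFactory.optimalInstance(recycler), rawOut, 32 * 1024, recycler);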
......
......@@ -109,14 +109,23 @@ public class LZFUncompressor extends Uncompressor
*/
public LZFUncompressor(DataHandler handler) {
this(handler, ChunkDecoderFactory.optimalInstance(), BufferRecycler.instance());
}
public LZFUncompressor(DataHandler handler, BufferRecycler bufferRecycler) {
this(handler, ChunkDecoderFactory.optimalInstance(), bufferRecycler);
}
public LZFUncompressor(DataHandler handler, ChunkDecoder dec)
{
this(handler, dec, BufferRecycler.instance());
}
public LZFUncompressor(DataHandler handler, ChunkDecoder dec, BufferRecycler bufferRecycler)
{
_handler = handler;
_decoder = dec;
_recycler = bufferRecycler;
}
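// Illustrative usage sketch (not part of the original source), assuming the
// push-style feed method declared by the Uncompressor base class:
//
//   Uncompressor unc = new LZFUncompressor(handler, BufferRecycler.instance());
//   unc.feedCompressedData(chunk, 0, chunk.length);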
/*
......
package com.ning.compress.lzf.impl;
import com.ning.compress.BufferRecycler;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
......@@ -44,6 +45,14 @@ public abstract class UnsafeChunkEncoder
super(totalLength, bogus);
}
public UnsafeChunkEncoder(int totalLength, BufferRecycler bufferRecycler) {
super(totalLength, bufferRecycler);
}
public UnsafeChunkEncoder(int totalLength, BufferRecycler bufferRecycler, boolean bogus) {
super(totalLength, bufferRecycler, bogus);
}
/*
///////////////////////////////////////////////////////////////////////
// Shared helper methods
......
package com.ning.compress.lzf.impl;
import com.ning.compress.BufferRecycler;
import com.ning.compress.lzf.LZFChunk;
/**
......@@ -16,6 +17,14 @@ public final class UnsafeChunkEncoderBE
public UnsafeChunkEncoderBE(int totalLength, boolean bogus) {
super(totalLength, bogus);
}
public UnsafeChunkEncoderBE(int totalLength, BufferRecycler bufferRecycler) {
super(totalLength, bufferRecycler);
}
public UnsafeChunkEncoderBE(int totalLength, BufferRecycler bufferRecycler, boolean bogus) {
super(totalLength, bufferRecycler, bogus);
}
@Override
protected int tryCompress(byte[] in, int inPos, int inEnd, byte[] out, int outPos)
......@@ -120,7 +129,7 @@ public final class UnsafeChunkEncoderBE
long l1 = unsafe.getLong(in, BYTE_ARRAY_OFFSET + ptr1);
long l2 = unsafe.getLong(in, BYTE_ARRAY_OFFSET + ptr2);
if (l1 != l2) {
return ptr1 - base + _leadingBytes(l1, l2);
}
ptr1 += 8;
ptr2 += 8;
......@@ -133,7 +142,15 @@ public final class UnsafeChunkEncoderBE
return ptr1 - base; // i.e.
}
/* With Big-Endian, in-memory layout is "natural", so what we consider
* leading is also leading for in-register.
*/
private final static int _leadingBytes(int i1, int i2) {
return Integer.numberOfLeadingZeros(i1 ^ i2) >> 3;
}
private final static int _leadingBytes(long l1, long l2) {
return Long.numberOfLeadingZeros(l1 ^ l2) >> 3;
}
}
package com.ning.compress.lzf.impl;
import com.ning.compress.BufferRecycler;
import com.ning.compress.lzf.LZFChunk;
/**
......@@ -17,6 +18,14 @@ public class UnsafeChunkEncoderLE
super(totalLength, bogus);
}
public UnsafeChunkEncoderLE(int totalLength, BufferRecycler bufferRecycler) {
super(totalLength, bufferRecycler);
}
public UnsafeChunkEncoderLE(int totalLength, BufferRecycler bufferRecycler, boolean bogus) {
super(totalLength, bufferRecycler, bogus);
}
@Override
protected int tryCompress(byte[] in, int inPos, int inEnd, byte[] out, int outPos)
{
......@@ -122,7 +131,7 @@ public class UnsafeChunkEncoderLE
long l1 = unsafe.getLong(in, BYTE_ARRAY_OFFSET + ptr1);
long l2 = unsafe.getLong(in, BYTE_ARRAY_OFFSET + ptr2);
if (l1 != l2) {
return ptr1 - base + _leadingBytes(l1, l2);
}
ptr1 += 8;
ptr2 += 8;
......@@ -135,7 +144,16 @@ public class UnsafeChunkEncoderLE
return ptr1 - base; // i.e.
}
/* With Little-Endian, in-memory layout is reverse of what we expect for
* in-register, so we either have to reverse bytes, or, simpler,
* calculate trailing zeroes instead.
*/
private final static int _leadingBytes(int i1, int i2) {
return Integer.numberOfTrailingZeros(i1 ^ i2) >> 3;
}
private final static int _leadingBytes(long l1, long l2) {
return Long.numberOfTrailingZeros(l1 ^ l2) >> 3;
}
}
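// Worked example (illustrative, not part of the original source): a little-endian
// load puts the FIRST byte in memory into the LOWEST bits of the register, so the
// first byte at which two regions differ shows up in the lowest differing bits of
// (l1 ^ l2); hence numberOfTrailingZeros(l1 ^ l2) >> 3 counts matching leading bytes.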
......@@ -11,6 +11,7 @@
package com.ning.compress.lzf.impl;
import com.ning.compress.BufferRecycler;
import java.nio.ByteOrder;
......@@ -39,4 +40,18 @@ public final class UnsafeChunkEncoders
}
return new UnsafeChunkEncoderBE(totalLength, false);
}
public static UnsafeChunkEncoder createEncoder(int totalLength, BufferRecycler bufferRecycler) {
if (LITTLE_ENDIAN) {
return new UnsafeChunkEncoderLE(totalLength, bufferRecycler);
}
return new UnsafeChunkEncoderBE(totalLength, bufferRecycler);
}
public static UnsafeChunkEncoder createNonAllocatingEncoder(int totalLength, BufferRecycler bufferRecycler) {
if (LITTLE_ENDIAN) {
return new UnsafeChunkEncoderLE(totalLength, bufferRecycler, false);
}
return new UnsafeChunkEncoderBE(totalLength, bufferRecycler, false);
}
}
package com.ning.compress.lzf.impl;
import com.ning.compress.BufferRecycler;
import com.ning.compress.lzf.ChunkEncoder;
import com.ning.compress.lzf.LZFChunk;
......@@ -22,10 +23,31 @@ public class VanillaChunkEncoder
super(totalLength, bogus);
}
/**
* @param totalLength Total encoded length; used for calculating size
* of hash table to use
* @param bufferRecycler The BufferRecycler instance
*/
public VanillaChunkEncoder(int totalLength, BufferRecycler bufferRecycler) {
super(totalLength, bufferRecycler);
}
/**
* Alternate constructor used when we want to avoid allocation encoding
* buffer, in cases where caller wants full control over allocations.
*/
protected VanillaChunkEncoder(int totalLength, BufferRecycler bufferRecycler, boolean bogus) {
super(totalLength, bufferRecycler, bogus);
}
public static VanillaChunkEncoder nonAllocatingEncoder(int totalLength) {
return new VanillaChunkEncoder(totalLength, true);
}
public static VanillaChunkEncoder nonAllocatingEncoder(int totalLength, BufferRecycler bufferRecycler) {
return new VanillaChunkEncoder(totalLength, bufferRecycler, true);
}
/*
///////////////////////////////////////////////////////////////////////
// Abstract method implementations
......@@ -44,7 +66,7 @@ public class VanillaChunkEncoder
{
final int[] hashTable = _hashTable;
++outPos; // To leave one byte for literal-length indicator
int seen = first(in, inPos); // past 4 bytes we have seen... (last one is LSB)
int literals = 0;
inEnd -= TAIL_LENGTH;
final int firstPos = inPos; // so that we won't have back references across block boundary
......
......@@ -41,7 +41,7 @@ import com.ning.compress.lzf.LZFChunk;
*/
public class PLZFOutputStream extends FilterOutputStream implements WritableByteChannel
{
private static final int DEFAULT_OUTPUT_BUFFER_SIZE = LZFChunk.MAX_CHUNK_LEN;
protected byte[] _outputBuffer;
protected int _position = 0;
......@@ -65,16 +65,20 @@ public class PLZFOutputStream extends FilterOutputStream implements WritableByte
*/
public PLZFOutputStream(final OutputStream outputStream) {
this(outputStream, DEFAULT_OUTPUT_BUFFER_SIZE, getNThreads());
}
protected PLZFOutputStream(final OutputStream outputStream, int nThreads) {
this(outputStream, DEFAULT_OUTPUT_BUFFER_SIZE, nThreads);
}
protected PLZFOutputStream(final OutputStream outputStream, final int bufferSize, int nThreads) {
super(outputStream);
_outputStreamClosed = false;
compressExecutor = new ThreadPoolExecutor(nThreads, nThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>()); // unbounded
((ThreadPoolExecutor)compressExecutor).allowCoreThreadTimeOut(true);
writeExecutor = Executors.newSingleThreadExecutor(); // unbounded
blockManager = new BlockManager(nThreads * 2, bufferSize); // this is where the bounds will be enforced!
_outputBuffer = blockManager.getBlockFromPool();
}
......
package com.ning.compress.lzf.util;
import com.ning.compress.BufferRecycler;
import com.ning.compress.lzf.ChunkEncoder;
import com.ning.compress.lzf.LZFChunk;
import com.ning.compress.lzf.impl.UnsafeChunkEncoders;
......@@ -34,6 +35,8 @@ public class ChunkEncoderFactory
* this method as implementations are dynamically loaded; however, on some
* non-standard platforms it may be necessary to either directly load
* instances, or use {@link #safeInstance}.
*
* <p>Uses a ThreadLocal soft-referenced BufferRecycler instance.
*
* @param totalLength Expected total length of content to compress; only matters
* for content that is smaller than maximum chunk size (64k), to optimize
......@@ -50,6 +53,8 @@ public class ChunkEncoderFactory
/**
* Factory method for constructing encoder that is always passed buffer
* externally, so that it will not (nor need) allocate encoding buffer.
* <p>
* Uses a ThreadLocal soft-referenced BufferRecycler instance.
*/
public static ChunkEncoder optimalNonAllocatingInstance(int totalLength) {
try {
......@@ -68,9 +73,12 @@ public class ChunkEncoderFactory
public static ChunkEncoder safeInstance() {
return safeInstance(LZFChunk.MAX_CHUNK_LEN);
}
/**
* Method that can be used to ensure that a "safe" compressor instance is loaded.
* Safe here means that it should work on any and all Java platforms.
* <p>
* Uses a ThreadLocal soft-referenced BufferRecycler instance.
*
* @param totalLength Expected total length of content to compress; only matters
* for content that is smaller than maximum chunk size (64k), to optimize
......@@ -83,8 +91,81 @@ public class ChunkEncoderFactory
/**
* Factory method for constructing encoder that is always passed buffer
* externally, so that it will not (nor need) allocate encoding buffer.
*<p>Uses a ThreadLocal soft-referenced BufferRecycler instance.
*/
public static ChunkEncoder safeNonAllocatingInstance(int totalLength) {
return VanillaChunkEncoder.nonAllocatingEncoder(totalLength);
}
/**
* Convenience method, equivalent to:
*<code>
* return optimalInstance(LZFChunk.MAX_CHUNK_LEN, bufferRecycler);
*</code>
*/
public static ChunkEncoder optimalInstance(BufferRecycler bufferRecycler) {
return optimalInstance(LZFChunk.MAX_CHUNK_LEN, bufferRecycler);
}
/**
* Method to use for getting compressor instance that uses the most optimal
* available methods for underlying data access. It should be safe to call
* this method as implementations are dynamically loaded; however, on some
* non-standard platforms it may be necessary to either directly load
* instances, or use {@link #safeInstance}.
*
* @param totalLength Expected total length of content to compress; only matters
* for content that is smaller than maximum chunk size (64k), to optimize
* encoding hash tables
* @param bufferRecycler The BufferRecycler instance
*/
public static ChunkEncoder optimalInstance(int totalLength, BufferRecycler bufferRecycler) {
try {
return UnsafeChunkEncoders.createEncoder(totalLength, bufferRecycler);
} catch (Exception e) {
return safeInstance(totalLength, bufferRecycler);
}
}
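// Illustrative usage sketch (not part of the original source): encoders obtained
// from the factory should be closed, so borrowed buffers return to the recycler:
//
//   ChunkEncoder enc = ChunkEncoderFactory.optimalInstance(totalLength, recycler);
//   try {
//       // ... use enc, e.g. LZFEncoder.encode(enc, data, 0, data.length)
//   } finally {
//       enc.close();
//   }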
/**
* Factory method for constructing encoder that is always passed buffer
* externally, so that it will not (nor need) allocate encoding buffer.
*/
public static ChunkEncoder optimalNonAllocatingInstance(int totalLength, BufferRecycler bufferRecycler) {
try {
return UnsafeChunkEncoders.createNonAllocatingEncoder(totalLength, bufferRecycler);
} catch (Exception e) {
return safeNonAllocatingInstance(totalLength, bufferRecycler);
}
}
/**
* Convenience method, equivalent to:
*<code>
* return safeInstance(LZFChunk.MAX_CHUNK_LEN, bufferRecycler);
*</code>
*/
public static ChunkEncoder safeInstance(BufferRecycler bufferRecycler) {
return safeInstance(LZFChunk.MAX_CHUNK_LEN, bufferRecycler);
}
/**
* Method that can be used to ensure that a "safe" compressor instance is loaded.
* Safe here means that it should work on any and all Java platforms.
*
* @param totalLength Expected total length of content to compress; only matters
* for content that is smaller than maximum chunk size (64k), to optimize
* encoding hash tables
* @param bufferRecycler The BufferRecycler instance
*/
public static ChunkEncoder safeInstance(int totalLength, BufferRecycler bufferRecycler) {
return new VanillaChunkEncoder(totalLength, bufferRecycler);
}
/**
* Factory method for constructing encoder that is always passed buffer
* externally, so that it will not (nor need) allocate encoding buffer.
*/
public static ChunkEncoder safeNonAllocatingInstance(int totalLength, BufferRecycler bufferRecycler) {
return VanillaChunkEncoder.nonAllocatingEncoder(totalLength, bufferRecycler);
}
}
......@@ -77,47 +77,62 @@ public class LZFFileInputStream
*/
public LZFFileInputStream(File file) throws FileNotFoundException {
this(file, ChunkDecoderFactory.optimalInstance(), BufferRecycler.instance());
}
public LZFFileInputStream(FileDescriptor fdObj) {
this(fdObj, ChunkDecoderFactory.optimalInstance(), BufferRecycler.instance());
}
public LZFFileInputStream(String name) throws FileNotFoundException {
this(name, ChunkDecoderFactory.optimalInstance(), BufferRecycler.instance());
}
public LZFFileInputStream(File file, ChunkDecoder decompressor) throws FileNotFoundException
{
this(file, decompressor, BufferRecycler.instance());
}
public LZFFileInputStream(FileDescriptor fdObj, ChunkDecoder decompressor)
{
this(fdObj, decompressor, BufferRecycler.instance());
}
public LZFFileInputStream(String name, ChunkDecoder decompressor) throws FileNotFoundException
{
this(name, decompressor, BufferRecycler.instance());
}
public LZFFileInputStream(File file, ChunkDecoder decompressor, BufferRecycler bufferRecycler) throws FileNotFoundException
{
super(file);
_decompressor = decompressor;
_recycler = bufferRecycler;
_inputStreamClosed = false;
_inputBuffer = bufferRecycler.allocInputBuffer(LZFChunk.MAX_CHUNK_LEN);
_decodedBytes = bufferRecycler.allocDecodeBuffer(LZFChunk.MAX_CHUNK_LEN);
_wrapper = new Wrapper();
}
public LZFFileInputStream(FileDescriptor fdObj, ChunkDecoder decompressor, BufferRecycler bufferRecycler)
{
super(fdObj);
_decompressor = decompressor;
_recycler = bufferRecycler;
_inputStreamClosed = false;
_inputBuffer = bufferRecycler.allocInputBuffer(LZFChunk.MAX_CHUNK_LEN);
_decodedBytes = bufferRecycler.allocDecodeBuffer(LZFChunk.MAX_CHUNK_LEN);
_wrapper = new Wrapper();
}
public LZFFileInputStream(String name, ChunkDecoder decompressor, BufferRecycler bufferRecycler) throws FileNotFoundException
{
super(name);
_decompressor = decompressor;
_recycler = bufferRecycler;
_inputStreamClosed = false;
_inputBuffer = bufferRecycler.allocInputBuffer(LZFChunk.MAX_CHUNK_LEN);
_decodedBytes = bufferRecycler.allocDecodeBuffer(LZFChunk.MAX_CHUNK_LEN);
_wrapper = new Wrapper();
}
......