Snappy (compression)

Not to be confused with Snappii.

Snappy
Original author(s)	Jeff Dean, Sanjay Ghemawat, Steinar H. Gunderson
Developer(s)	Google
Initial release	March 18, 2011

Stable release	1.1.3 / July 6, 2015
Repository	github.com/google/snappy
Development status	Active
Written in	C++
Operating system	Cross-platform
Platform	Portable
Size	2 MB
Type	data compression
License	Apache 2 (up to 1.0.1)/New BSD
Website	http://google.github.io/snappy/

Snappy (previously known as Zippy) is a fast data compression and decompression library written in C++ by Google, based on ideas from LZ77 and open-sourced in 2011.^[1]^[2] It does not aim for maximum compression or for compatibility with any other compression library; instead, it aims for very high speed and reasonable compression. On a single core of a Core i7 processor running in 64-bit mode, it compresses at about 250 MB/s and decompresses at about 500 MB/s. Its compression ratio is 20–100% lower than that of gzip.^[3]

Snappy is widely used in Google projects such as BigTable and MapReduce, and to compress data in Google's internal RPC systems. It can be used in open-source projects such as Cassandra, Hadoop, LevelDB, MongoDB, RocksDB and Lucene.^[4] Decompression is tested to detect any errors in the compressed stream. Snappy does not use inline assembler and is portable.

Stream format

Snappy encoding is byte-oriented rather than bit-oriented: only whole bytes are emitted to or consumed from a stream. The format uses no entropy encoder, such as a Huffman tree or an arithmetic encoder.

The first bytes of the stream hold the length of the uncompressed data, stored as a little-endian varint, which allows for variable-length encoding. The lower seven bits of each byte carry data, and the high bit is a flag that indicates whether the next byte belongs to the same integer.
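The varint rule described above can be sketched in a few lines of Python. This is an illustrative reader, not code from the Snappy library; the function name `read_varint` is hypothetical.

```python
def read_varint(buf, pos=0):
    """Decode a little-endian base-128 varint starting at buf[pos].

    Returns (value, next_pos), where next_pos is the index of the
    first byte after the varint.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low seven bits carry data
        if byte & 0x80 == 0:             # high bit clear: last byte
            return value, pos
        shift += 7                       # next byte is more significant


# The two header bytes ca 02 decode to 330.
print(read_varint(bytes.fromhex("ca02")))  # (330, 2)
```

Because the least significant group comes first, small lengths fit in a single byte and larger ones grow one byte at a time.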

The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the first byte (tag byte) of the element. The two lower bits of this byte are the type code:^[5]

00 – Literal – uncompressed data; the upper 6 bits store the length minus 1; if the data is longer than 60 bytes, the upper 6 bits instead hold a value from 60 to 63, indicating that the length minus 1 is stored in the one to four bytes following the tag byte
01 – Copy with length (minus 4) stored as 3 bits and offset stored as 11 bits; the upper 3 bits of the offset are in the tag byte and the byte after the tag byte holds the lower 8 bits;
10 – Copy with length (minus 1) stored as 6 bits of the tag byte and offset stored as a two-byte little-endian integer after the tag byte;
11 – Copy with length (minus 1) stored as 6 bits of the tag byte and offset stored as a four-byte little-endian integer after the tag byte;
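As an illustration of the four element types, the following Python sketch splits a tag byte into its fields. The function name `parse_element` and its return shape are hypothetical, and the handling of long literals (tag values 60 to 63 followed by one to four length bytes, with lengths stored minus 1) follows the format description cited as [5] rather than anything stated in more detail here.

```python
def parse_element(buf, pos):
    """Parse the element header at buf[pos].

    Returns (kind, length, offset, next_pos). kind is "literal" or
    "copy"; offset is None for literals. For a literal, next_pos is
    the index of the first literal data byte.
    """
    tag = buf[pos]
    kind = tag & 0x03   # two lower bits: type code
    upper = tag >> 2    # remaining six bits
    if kind == 0:       # 00: literal
        if upper < 60:
            return "literal", upper + 1, None, pos + 1
        extra = upper - 59  # 60..63 -> 1..4 extra length bytes
        length = int.from_bytes(buf[pos + 1:pos + 1 + extra], "little") + 1
        return "literal", length, None, pos + 1 + extra
    if kind == 1:       # 01: 3-bit length, 11-bit offset
        length = (upper & 0x07) + 4
        offset = ((tag >> 5) << 8) | buf[pos + 1]
        return "copy", length, offset, pos + 2
    width = 2 if kind == 2 else 4  # 10: 2-byte offset, 11: 4-byte offset
    offset = int.from_bytes(buf[pos + 1:pos + 1 + width], "little")
    return "copy", upper + 1, offset, pos + 1 + width


print(parse_element(bytes.fromhex("f042"), 0))  # ('literal', 67, None, 2)
print(parse_element(bytes.fromhex("093f"), 0))  # ('copy', 6, 63, 2)
```

The sketch performs no bounds checking; a real decoder would have to reject truncated headers.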

The copy refers to the dictionary (the data that has just been decompressed). The offset is the distance from the current position back into the already decompressed data. The length is the number of bytes to copy from the dictionary. The size of the dictionary was limited by the 1.0 Snappy compressor to 32768 bytes, and updated to 65536 in version 1.1.
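A copy element can be applied to the output buffer as sketched below. The helper name `apply_copy` is hypothetical. Copying byte by byte is a deliberate choice here: it lets the copied region overlap the bytes being written (length greater than offset), which the format description cited as [5] permits and which produces a repeating pattern.

```python
def apply_copy(out, offset, length):
    """Append `length` bytes to `out`, read from `offset` bytes back.

    `out` is a bytearray holding everything decompressed so far; it
    doubles as the dictionary.
    """
    if offset == 0 or offset > len(out):
        raise ValueError("copy offset outside decompressed data")
    start = len(out) - offset
    for i in range(length):
        # out grows as we append, so start + i is always in range,
        # even when the source and destination regions overlap.
        out.append(out[start + i])


buf = bytearray(b"ab")
apply_copy(buf, 2, 6)
print(bytes(buf))  # b'abababab'
```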

Example of a compressed stream

The text

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project.

may be compressed to this, shown as hex data with explanations:

0000000: ca02 f042 5769 6b69 7065 6469 6120 6973  ...BWikipedia is

The first two bytes, ca02, are the length, as a little-endian varint (see Protocol Buffers for the varint specification). Thus the most significant byte is 0x02, and the value is (0x02 << 7) | (0xca & 0x7f) = 0x014a = 330 bytes. The next two bytes, 0xf042, indicate that a literal of 66+1 bytes follows.

0000010: 2061 2066 7265 652c 2077 6562 2d62 6173   a free, web-bas
0000020: 6564 2c20 636f 6c6c 6162 6f72 6174 6976  ed, collaborativ
0000030: 652c 206d 756c 7469 6c69 6e67 7561 6c20  e, multilingual
0000040: 656e 6379 636c 6f09 3ff0 8170 726f 6a65  encyclo.?..proje

0x09 is a tag byte of type 01 with length − 4 = 010₂ = 2₁₀ (a length of 6) and offset = 0x03f = 63, which copies "pedia ";
0xf081 is a literal with a length of 129+1 bytes

0000050: 6374 2e00 0000 0000 0000 0000 0000 0000  ct.

In this example, all common substrings of four or more characters were eliminated by the compression process. More common compressors can compress this text better. Unlike compression methods such as gzip and bzip2, Snappy uses no entropy encoding to pack the alphabet into the bit stream.

Interfaces

Snappy distributions include C++ and C bindings. Third-party bindings and ports are also available.^[6]

References

[1] "Google Snappy–A Fast Compressing Library". InfoQ. Retrieved August 1, 2011.
[2] "Google open sources MapReduce compression. In the name of speed". The Register. March 24, 2011.
[3] "Snappy: A fast compressor/decompressor: Readme". Google Code. Retrieved August 1, 2011. "Snappy vs lzo vs zlib".
[4] "snappy. A fast compressor/decompressor". Project page at Google Code.
[5] https://github.com/google/snappy/blob/master/format_description.txt
[6] https://google.github.io/snappy/

External links

Snappy mailing list
