You are here

Asymmetric numeral systems - entropy coding replacing Huffman and range coding

1 post / 0 new
Jarek Duda
Asymmetric numeral systems - entropy coding replacing Huffman and range coding

ANS is a new family of large alphabet accurate entropy coders (compression ratio like arithmetic coding), which offer many times better speeds than range coding - even faster than Huffman (PCS article).
The most interesting variants are:
- tANS (tabled) - decoding step is very similar to Huffman (no multiplication), but it can operate on fractional bits (no longer limited to probabilities being a power of 1/2), e.g. https://github.com/Cyan4973/FiniteStateEntropy
- tANS (range) - analogous to range coding, but using single multiplication per symbol instead of two, has single number state instead of two (perfect for SIMD), has more convenient frequency handling (perfect for adaptive coding), e.g.: https://github.com/rygorous/ryg_rans

Used for example in
- Apple LZFSE = Lempel-Ziv + Finite State Entropy (tANS) for iOS9 - 3x faster, better compression than zlib,
- CRAM 3.0 of European Bioinformatics Institute: http://www.htslib.org/benchmarks/CRAM.html
and many others like Oodle LZNA (rANS), lzturbo (tANS), zhuff (tANS), ZSTD (tANS), LZA (rANS): http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-impleme...

Here are some compressed sizes, decoding, encoding times (ns/byte) for enwiki8 from http://mattmahoney.net/dc/text.html :
LZA 0.82b –mx9 –b7 –h7 26,396,613 449 9.7
lzturbo 1.2 –39 –b24 26,915,461 582 2.8
WinRAR 5.00 –ma5 –m5 27,835,431 1004 31
WinRAR 5.00 –ma5 –m2 29,758,785 228 30
lzturbo 1.2 –32 30,979,376 19 2.7
zhuff 0.97 –c2 34,907,478 24 3.5
gzip 1.3.5 –9 36,445,248 101 17
pkzip 2.0.4 –ex 36,556,552 171 50
ZSTD 0.0.1 40,024,854 7.7 3.8
WinRAR 5.00 –ma5 –m1 40,565,268 54 31
All these compressors can be downloaded, ZSTD is open source ( https://github.com/Cyan4973/zstd ).

ANS (especially tANS) could be also used in FPGA or ASIC as cheap better ratio replacement for prefix codes - I wanted to propose a discussion about possibilities of using it to improve some current telecommunication protocols?