research/dictionary_generator.cc


Log

Author Commit Date CI Message
Eugene Kliuchnikov 8376f72e 2021-11-10T10:34:39 Prepare for copybara (#939) Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
Eugene Kliuchnikov 62662f87 2021-09-08T09:18:45 Strip "./" in includes (#925) Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
Eugene Kliuchnikov 0e42caf3 2021-08-31T14:07:17 Migrate to github actions (#920) Not all combinations are migrated to the initial configuration; corresponding TODOs added. Drive-by: additional combinations uncovered minor portability problems -> fixed Drive-by: remove no-longer used "script" files. Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
Eugene Kliuchnikov f8c67177 2021-06-23T09:40:57 Update (#908) * re-enable Js build/test * improve decoder performance * rewrite dictionary data in Java/Js to a shorter uncompressed form * improve dictionary generation tool
Eugene Kliuchnikov 7f740f13 2020-05-15T11:06:21 Update (#807) - fix formatting - fix type conversion - fix no-op arithmetic with null-pointer - improve performance of hash_longest_match64 - go: detect read after close - java decoder: support compound dictionary - remove executable flag on non-scripts
Eugene Kliuchnikov 4b2b2d4f 2019-04-12T13:57:42 Update (#749) Update: * Bazel: fix MSVC configuration * C: common: extended documentation and helpers around distance codes * C: common: enable BROTLI_DCHECK in "debug" builds * C: common: fix implicit trailing zero in `kPrefixSuffix` * C: dec: fix possible bit reader discharge for "large-window" mode * C: dec: simplify distance decoding via lookup table * C: dec: reuse decoder state members memory via union with lookup table * C: dec: add decoder state diagram * C: enc: clarify access to static dictionary * C: enc: improve static dictionary hash * C: enc: add "stream offset" parameter for parallel encoding * C: enc: reorganize hasher; now Q2-Q3 require exactly 256KiB to avoid global TCMalloc lock * C: enc: fix rare access to uninitialized data in ring-buffer * C: enc: reorganize logging / checks in `write_bits.h` * Java: dec: add "large-window" support * Java: dec: improve speed * Java: dec: debug and 32-bit mode are now activated via system properties * Java: dec: demystify some state variables (use better names) * Dictionary generator: add single input mode * Java: dec: modernize tests * Bazel: js: pick working commit for closure rules
Eugene Kliuchnikov 35e69fc7 2018-02-26T09:04:36 New feature: "Large Window Brotli" (#640) * New feature: "Large Window Brotli" By setting special encoder/decoder flag it is now possible to extend LZ-window up to 30 bits; though produced stream will not be RFC7932 compliant. Added new dictionary generator - "DSH". It combines speed of "Sieve" and quality of "DM". Plus utilities to prepare train corpora (remove unique strings). Improved compression ratio: now two sub-blocks could be stitched: the last copy command could be extended to span the next sub-block. Fixed compression ineffectiveness caused by floating numbers rounding and wrong cost heuristic. Other C changes: - combined / moved `context.h` to `common` - moved transforms to `common` - unified some aspects of code formatting - added an abstraction for encoder (static) dictionary - moved default allocator/deallocator functions to `common` brotli CLI: - window size is auto-adjusted if not specified explicitly Java: - added "eager" decoding both to JNI wrapper and pure decoder - huge speed-up of `DictionaryData` initialization * Add dictionaryless compressed dictionary * Fix `sources.lst` * Fix `sources.lst` and add a note that `libtool` is also required. * Update setup.py * Fix `EagerStreamTest` * Fix BUILD file * Add missing `libdivsufsort` dependency * Fix "unused parameter" warning.
Eugene Kliuchnikov 39ef4bbd 2017-10-13T11:25:03 Add new (fast) dictionary generator engine. (#616) Add CLI for dictionary generation. Add BUILD file for research folder