
    Commit

  • Hash : 087c29e0
    Author : DRC
    Date : 2018-10-22T10:05:18

    Optimize Huffman encoding
    
    This commit improves the C and SSE2 Huffman encoding implementations in
    the following ways:
    
    - Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation.  There is no
      actual need to use those registers, and avoiding them produces a
      cleaner WIN64 function entry/exit, as well as shorter code, since REX
      prefixes can be avoided.  (This is helpful on certain CPUs, such as
      Intel Atom, for which instruction fetch and decoding can be a
      bottleneck.)
    - Optimize register usage so that fewer REX prefixes and
      register-register moves are needed.
    - Use the bit counter to store the number of free bits in the bit buffer
      rather than the number of bits in the bit buffer.  This changes the
      method for inserting a code into the bit buffer to:
    
      (put_buffer |= code << (free_bits -= code_size));
    
      As a result:
      * Only one bit counter needs to stay in a register (we just keep it in
        cl.)
      * The bit buffer contents are already properly aligned to be written
        out (after a byte swap.)
      * Adjusting the free bits counter and checking if the bit buffer is
        full can be combined into a single operation.
      * We can wait to flush the bit buffer until the buffer is actually
        full and not just in danger of becoming full.  Thus, eight bytes can
        be flushed at a time.
    
    - Speed is quite sensitive to the alignment of branch target labels, so
      insert some padding and remove branches from the flush code.
      (Flushing this way isn't actually faster when compared to using
      branches, but the branchless code doesn't need extra alignment and is
      thus smaller.)
    - Speculatively write out the bit buffer as a single 8-byte write,
      falling back to a byte-by-byte write only if there are any 0xFF bytes
      in the bit buffer that need to be encoded as 0xFF 0x00.
    - Use MMX registers for the 32-bit implementation (so the bit buffer can
      be 64 bits wide.)
    - Slightly reduce overall function code size.
    - Eliminate or combine a few SSE instructions.
    - Make some minor improvements to instruction scheduling.
    - Adjust flush_bits() in jchuff.c to handle cases in which the bit
      buffer has fewer than 7 free bits (apparently that couldn't happen
      before.)
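
The free-bits scheme and the speculative 8-byte flush described above can be sketched in portable C. This is an illustration only, not the code in jchuff.c or the SSE2 assembly: the `bit_writer` type and the `bw_init`, `bw_put`, `bw_flush64`, and `has_ff_byte` names are hypothetical, codes are assumed to be 1 to 16 bits wide, and the caller is assumed to provide enough output space (up to 16 bytes per flush). The fast path writes the eight bytes in a loop for portability, where an optimized version would byte-swap and issue a single 8-byte store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer state; the names are illustrative only. */
typedef struct {
  uint64_t put_buffer;  /* pending bits, filled from the most significant end */
  int free_bits;        /* number of unused low-order bits in put_buffer */
  unsigned char *out;   /* next output byte */
} bit_writer;

static void bw_init(bit_writer *bw, unsigned char *out)
{
  bw->put_buffer = 0;
  bw->free_bits = 64;
  bw->out = out;
}

/* Nonzero if any byte of v is 0xFF (zero-byte test applied to ~v). */
static int has_ff_byte(uint64_t v)
{
  uint64_t x = ~v;
  return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Write out a full 64-bit buffer, most significant byte first. */
static void bw_flush64(bit_writer *bw)
{
  uint64_t v = bw->put_buffer;
  int i;

  if (!has_ff_byte(v)) {
    /* Fast path: no stuffing needed, so emit exactly eight bytes. */
    for (i = 0; i < 8; i++)
      bw->out[i] = (unsigned char)(v >> (56 - 8 * i));
    bw->out += 8;
  } else {
    /* Slow path: byte by byte, encoding each 0xFF as 0xFF 0x00. */
    for (i = 0; i < 8; i++) {
      unsigned char b = (unsigned char)(v >> (56 - 8 * i));
      *bw->out++ = b;
      if (b == 0xFF)
        *bw->out++ = 0x00;
    }
  }
}

/* Append a code of 'size' bits; bits of 'code' above 'size' must be zero. */
static void bw_put(bit_writer *bw, uint32_t code, int size)
{
  assert(size >= 1 && size <= 16);

  /* Adjusting the counter and testing for overflow is a single step. */
  bw->free_bits -= size;
  if (bw->free_bits < 0) {
    /* The code does not fit: top up the buffer with its leading bits, */
    /* flush all eight bytes, then start over with the remaining bits. */
    bw->put_buffer |= (uint64_t)code >> (-bw->free_bits);
    bw_flush64(bw);
    bw->free_bits += 64;
    bw->put_buffer = (uint64_t)code << bw->free_bits;
  } else {
    bw->put_buffer |= (uint64_t)code << bw->free_bits;
  }
}
```

In this sketch a buffer that is exactly full is not flushed until the next code arrives, which mirrors the point above about waiting until the buffer is actually full. A complete encoder would also need a final flush for a partially filled buffer, which is omitted here.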
    
    Based on:
    https://github.com/1camper/libjpeg-turbo/commit/947a09defa2ec848322b1bae050d1b57b316a32a
    https://github.com/1camper/libjpeg-turbo/commit/262ebb6b816fd8a49ff4d7185f6c5153dddde02f
    https://github.com/1camper/libjpeg-turbo/commit/6e9a091221bb244c8ba232a942650e94254ffcf0
    
    See change log for performance claims.
    
    Closes #292
    

  • Properties

  • Git HTTP https://git.kmx.io/kc3-lang/libjpeg-turbo.git
    Git SSH git@git.kmx.io:kc3-lang/libjpeg-turbo.git
    Public access ? public
    Description

    Fork of libjpeg with SIMD
