    Commit

  • Hash : c5f269eb
    Author : Jonathan Wright
    Date : 2021-09-03T11:52:40

    Neon/AArch64: Explicitly unroll quant loop w/Clang
    
    The loop in jsimd_quantize_neon() is only executed twice and should be
    unrolled for AArch64 targets.  GCC does that by default, but Clang 11
    and later versions available at the time of this writing do not.  This
    patch adds an unroll pragma when targeting AArch64 with Clang.  We do
    not use the unroll pragma for AArch32 targets, because it causes the
    Clang-generated assembly code to exhaust the available Neon registers
    (32 x 64-bit) and spill to the stack.  (DRC: Referring to the discussion
    in #570, this is likely due to compiler confusion that results in poor
    register allocation.  It is possible to eliminate the spillage and
    reduce the instruction count by loading the data on a just-in-time
    basis, thus explicitly interleaving compute and I/O, but the performance
    implications of that are currently unknown.)
    
    The effects of unrolling the quantization loop are:
    1) elimination of the loop control flow overhead and
    2) enabling the use of LDP/STP instructions that work from a single
       base pointer, instead of using double the number of LDR/STR
       instructions, each requiring an address calculation.
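
    As a minimal sketch of the pragma placement described above (not the
    actual patch): the guard below enables "#pragma unroll" only when the
    compiler is Clang and the target is AArch64, and leaves every other
    configuration untouched.  The function name quantize_sketch() and its
    scalar division body are hypothetical stand-ins for the Neon intrinsics
    in jsimd_quantize_neon(); only the two-iteration outer loop and the
    guard are meant to illustrate the idea.

```c
#define DCTSIZE   8    /* libjpeg convention: 8x8 DCT blocks */
#define DCTSIZE2  64   /* coefficients per block */

/* Hypothetical scalar stand-in for the quantization routine.  The outer
 * loop runs exactly twice, each pass covering half of the block (4 rows
 * of 8 coefficients). */
static void quantize_sketch(short *coefs, const short *divisors)
{
  int i, j;

  /* Ask Clang to unroll only on AArch64.  GCC already unrolls this loop,
   * and AArch32 is excluded because unrolling caused register spills. */
#if defined(__clang__) && (defined(__aarch64__) || defined(_M_ARM64))
#pragma unroll
#endif
  for (i = 0; i < DCTSIZE; i += DCTSIZE / 2) {
    for (j = 0; j < (DCTSIZE / 2) * DCTSIZE; j++) {
      int k = i * DCTSIZE + j;  /* index into the 64-coefficient block */

      /* Assumption: plain integer division replaces the real
       * reciprocal-multiply-and-shift arithmetic. */
      coefs[k] = (short)(coefs[k] / divisors[k]);
    }
  }
}
```

    Because the guard is resolved by the preprocessor, other compilers and
    targets never see the pragma, so their code generation is unchanged.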
    
    Closes #570
    

  • Properties

  • Git HTTP https://git.kmx.io/kc3-lang/libjpeg-turbo.git
    Git SSH git@git.kmx.io:kc3-lang/libjpeg-turbo.git
    Public access : public
    Description

    Fork of libjpeg with SIMD
