    Commit

  • Hash : 33859880
    Author : DRC
    Date : 2020-11-13T12:12:47

    Neon: Auto-detect compiler intrinsics completeness
    
    This allows the Neon intrinsics code to be built successfully (albeit
    likely with reduced run-time performance) with Xcode 5.0-6.2
    (iOS/AArch64) and Android NDK < r19 (AArch32).  Note that Xcode 5.0-6.2
    will not build the Armv8 GAS code without gas-preprocessor.pl, and no
    version of Xcode will build the Armv7 GAS code without
    gas-preprocessor.pl, so we always use the full Neon intrinsics
    implementation by default with macOS and iOS builds.
    
    Auto-detecting the completeness of the compiler's set of Neon intrinsics
    also allows us to more intelligently set the default value of
    NEON_INTRINSICS, based on the values of HAVE_VLD1*.  This is a
    reasonable, albeit imperfect, proxy for whether a compiler has a full
    and optimal set of Neon intrinsics.  Specific notes:
    
      - 64-bit RGB-to-YCbCr color conversion
        does not use any of the intrinsics in question, regresses with GCC
      - 64-bit accurate integer forward DCT
        uses vld1_s16_x3(), regresses with GCC
      - 64-bit Huffman encoding
        uses vld1q_u8_x4(), regresses with GCC
      - 64-bit YCbCr-to-RGB color conversion
        does not use any of the intrinsics in question, regresses with GCC
      - 64-bit accurate integer inverse DCT
        uses vld1_s16_x3(), regresses with GCC
      - 64-bit 4x4 inverse DCT
        uses vld1_s16_x3().  I did not test this algorithm in isolation, so
        it may in fact regress with GCC, but the regression may be hidden by
        the speedup from the new SIMD-accelerated upsampling algorithms.
    
      - 32-bit RGB-to-YCbCr color conversion
        uses vld1_u16_x2(), regresses with GCC
      - 32-bit accurate integer forward DCT
        uses vld1_s16_x3(), regression irrelevant because there was no
        previous implementation
      - 32-bit accurate integer inverse DCT
        uses vld1_s16_x3(), regresses with GCC
      - 32-bit fast integer inverse DCT
        does not use any of the intrinsics in question, regresses with GCC
      - 32-bit 4x4 inverse DCT
        uses vld1_s16_x3().  I did not test this algorithm in isolation, so
        it may in fact regress with GCC, but the regression may be hidden by
        the speedup from the new SIMD-accelerated upsampling algorithms.
    
    Presumably when GCC includes a full and optimal set of Neon intrinsics,
    the HAVE_VLD1* tests will pass, and the full Neon intrinsics
    implementation will be enabled automatically.
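
The fallback that lets the intrinsics code build with an incomplete compiler can be sketched as follows. This is a minimal illustration, not the actual libjpeg-turbo source. It assumes the build system defines a macro named HAVE_VLD1_S16_X3 when the vld1_s16_x3() probe succeeds. The table example_consts and the functions load_consts() and sum_consts() are hypothetical names invented for this example. On non-Arm hosts the sketch takes a scalar path so that it still compiles.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#endif

/* Hypothetical table of 12 constants (three 4-lane vectors). */
static const int16_t example_consts[12] = {
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
};

#ifdef USE_NEON
/* Load three int16x4_t vectors from contiguous memory.
 * HAVE_VLD1_S16_X3 is assumed to be set by a configure-time compile test. */
static int16x4x3_t load_consts(const int16_t *p)
{
#ifdef HAVE_VLD1_S16_X3
  /* Compiler provides the multi-vector load. */
  return vld1_s16_x3(p);
#else
  /* Incomplete intrinsics set: fall back to three single-vector loads. */
  int16x4x3_t v = { { vld1_s16(p), vld1_s16(p + 4), vld1_s16(p + 8) } };
  return v;
#endif
}
#endif

/* Sum all 12 constants; the result is the same on every code path. */
int sum_consts(const int16_t *p)
{
#ifdef USE_NEON
  int16x4x3_t v = load_consts(p);
  int16x4_t s = vadd_s16(vadd_s16(v.val[0], v.val[1]), v.val[2]);
  return vget_lane_s16(s, 0) + vget_lane_s16(s, 1) +
         vget_lane_s16(s, 2) + vget_lane_s16(s, 3);
#else
  /* Scalar path so the sketch also builds on non-Arm hosts. */
  int sum = 0;
  for (int i = 0; i < 12; i++)
    sum += p[i];
  return sum;
#endif
}
```

Both Neon branches produce the same values. The fallback may simply be slower, which matches the "reduced run-time performance" caveat in the commit message.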
    

  • Properties

  • Git HTTP https://git.kmx.io/kc3-lang/libjpeg-turbo.git
    Git SSH git@git.kmx.io:kc3-lang/libjpeg-turbo.git
    Public access ? public
    Description

    Fork of libjpeg with SIMD
