
    Commit

  • Hash : d38b4f21
    Author : DRC
    Date : 2016-01-16T01:53:32

    Optimize ARM64 SIMD code for Cavium ThunderX
    
    Per @ssvb:
    ThunderX is an ARM64 chip that dedicates most of its transistor real
    estate to providing 48 cores, so each individual core is relatively
    slow.  Each core is dual-issue and in-order for scalar instructions and
    has only a single-issue, half-width NEON unit, so peak throughput is
    one 128-bit instruction per 2 cycles, which makes careful instruction
    scheduling important.  Furthermore, ThunderX has an extremely slow
    implementation of ld2 and ld3, so this commit implements the equivalent
    of those instructions using ld1.
    
    Compression speedup relative to libjpeg-turbo 1.4.2:
    48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 58-85% (avg. 74%)
    relative to jpeg-6b: 1.75-2.14x (avg. 1.95x)
    
    Refer to #49 and #51 for discussion.
    
    Closes #51.
    
    This commit also wordsmiths the ChangeLog entry (the ARMv8 SIMD
    implementation is "complete" only for compression-- it still lacks some
    decompression algorithms, as does the ARMv7 implementation.)
    
    Based on:
    https://github.com/mayeut/libjpeg-turbo/commit/9405b5fd031558113bdfeae193a2b14baa589a75
    
    which is based on:
    https://github.com/libjpeg-turbo/libjpeg-turbo/commit/f561944ff70adef65bb36212913bd28e6a2926d6
    https://github.com/libjpeg-turbo/libjpeg-turbo/commit/962c8ab21feb3d7fc2a7a1ec8d26f6b985bbb86f
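
The ld3-to-ld1 substitution described in the commit message can be illustrated with a scalar C model. This is only a sketch of the idea, not the code in this commit, which is ARM64 assembly. The type `vec128` and the functions `load_contiguous`, `ld3_reference`, `ld3_via_ld1`, and `ld3_models_agree` are hypothetical names introduced here. The byte permutation stands in for whatever shuffle instructions a real implementation would use after its contiguous loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for one 128-bit NEON register holding 16 bytes
   (hypothetical type, for illustration only). */
typedef struct { uint8_t b[16]; } vec128;

/* Models a single contiguous 16-byte load, as "ld1 {v0.16b}, [ptr]" does. */
static vec128 load_contiguous(const uint8_t *p) {
    vec128 v;
    memcpy(v.b, p, sizeof v.b);
    return v;
}

/* Reference semantics of a de-interleaving 3-register load (ld3):
   lane i of output register k comes from byte 3*i + k. */
static void ld3_reference(const uint8_t *p, vec128 out[3]) {
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 3; k++)
            out[k].b[i] = p[3 * i + k];
}

/* Same result from three contiguous loads followed by a byte permutation.
   Assumption: a real implementation would do the permutation with SIMD
   shuffle instructions; here it is a plain indexed copy. */
static void ld3_via_ld1(const uint8_t *p, vec128 out[3]) {
    vec128 in[3];
    for (int r = 0; r < 3; r++)
        in[r] = load_contiguous(p + 16 * r);
    for (int k = 0; k < 3; k++) {
        for (int i = 0; i < 16; i++) {
            int src = 3 * i + k;              /* offset within the 48 bytes */
            assert(src >= 0 && src < 48);
            out[k].b[i] = in[src / 16].b[src % 16];
        }
    }
}

/* Returns 1 if both routes produce identical registers for 16
   interleaved 3-byte pixels (for example, RGB). */
static int ld3_models_agree(void) {
    uint8_t rgb[48];
    vec128 a[3], b[3];
    for (int i = 0; i < 48; i++)
        rgb[i] = (uint8_t)(i * 7 + 1);
    ld3_reference(rgb, a);
    ld3_via_ld1(rgb, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `ld3_models_agree()` checks that the two routes agree on a sample buffer. The model says nothing about speed; it only shows that a de-interleaving load can be expressed as plain contiguous loads plus a rearrangement step.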
    

  • Properties

  • Git HTTP https://git.kmx.io/kc3-lang/libjpeg-turbo.git
    Git SSH git@git.kmx.io:kc3-lang/libjpeg-turbo.git
    Public access ? public
    Description

    Fork of libjpeg with SIMD

    Users
    thodg_m kc3_lang_org thodg_w www_kmx_io thodg_l thodg
    Tags