Commit

Hash : c5f269eb
Author :
Date : 2021-09-03T11:52:40

Neon/AArch64: Explicitly unroll quant loop w/Clang

The loop in jsimd_quantize_neon() is only executed twice and should be
unrolled for AArch64 targets.  GCC does that by default, but Clang 11
and later versions available at the time of this writing do not.  This
patch adds an unroll pragma when targeting AArch64 with Clang.  We do
not use the unroll pragma for AArch32 targets, because it causes the
Clang-generated assembly code to exhaust the available Neon registers
(32 x 64-bit) and spill to the stack.  (DRC: Referring to the discussion
in #570, this is likely due to compiler confusion that results in poor
register allocation.  It is possible to eliminate the spillage and
reduce the instruction count by loading the data on a just-in-time
basis, thus explicitly interleaving compute and I/O, but the performance
implications of that are currently unknown.)

The effects of unrolling the quantization loop are:
1) elimination of the loop control flow overhead and
2) enabling the use of LDP/STP instructions that work from a single
   base pointer, instead of using double the number of LDR/STR
   instructions, each requiring an address calculation.
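
The guard described above can be sketched roughly as follows.  This is
an illustration only, not the code from the patch: quantize_sketch() is
a hypothetical scalar stand-in (plain truncating division) for the real
Neon intrinsics code in jsimd_quantize_neon(), and the exact
preprocessor condition used in the patch may differ.  It only shows how
an unroll pragma can be limited to Clang on AArch64 while the loop is
left untouched for GCC and for AArch32 targets.

```c
#include <assert.h>
#include <stdint.h>

#define DCTSIZE  8  /* a DCT block is 8x8 coefficients */

/* Hypothetical simplified kernel: out[k] = in[k] / divisors[k].
 * The real function uses Neon intrinsics and different arithmetic. */
void quantize_sketch(int16_t *out, const int16_t *in,
                     const int16_t *divisors)
{
  int i, j;

  /* The outer loop runs exactly twice (i = 0 and i = 4), each pass
   * handling half of the block.  The pragma is emitted only for Clang
   * on AArch64 (assumed condition); other compilers and targets see a
   * plain loop. */
#if defined(__clang__) && defined(__aarch64__)
#pragma unroll
#endif
  for (i = 0; i < DCTSIZE; i += DCTSIZE / 2) {
    for (j = 0; j < (DCTSIZE / 2) * DCTSIZE; j++) {
      int idx = i * DCTSIZE + j;

      assert(divisors[idx] != 0);  /* divisors must be non-zero */
      out[idx] = (int16_t)(in[idx] / divisors[idx]);
    }
  }
}
```

Because the pragma sits inside the preprocessor guard, a build with GCC
or an AArch32 build with Clang compiles the same loop without any
unrolling hint, which matches the behavior described in this message.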

Closes #570

Properties

Git HTTP	https://git.kmx.io/kc3-lang/libjpeg-turbo.git
Git SSH	git@git.kmx.io:kc3-lang/libjpeg-turbo.git
Public access ?	public
Description	Fork of libjpeg with SIMD