|
e69dd40c
|
2024-01-23T13:26:41
|
|
Reorganize source to make things easier to find
- Move all libjpeg documentation, except for README.ijg, into the doc/
subdirectory.
- Move the TurboJPEG C API documentation from doc/html/ into
doc/turbojpeg/.
- Move all C source code and headers into a src/ subdirectory.
- Move turbojpeg-jni.c into the java/ subdirectory.
Referring to #226, there is no ideal solution to this problem. A
semantically ideal solution would have involved placing all source code,
including the SIMD and Java source code, under src/ (or perhaps placing
C library source code under lib/ and C test program source code under
test/), all header files under include/, and all documentation under
doc/. However:
- To me it makes more sense to have separate top-level directories for
each language, since the SIMD extensions and the Java API are
technically optional features. src/ now contains only the code that
is relevant to the core C API libraries and associated programs.
- I didn't want to bury the java/ and simd/ directories or add a level
of depth to them, since both directories already contain source code
that is 3-4 levels deep.
- I would prefer not to separate the header files from the C source
code, because:
1. It would be disruptive. libjpeg and libjpeg-turbo have
historically placed C source code and headers in the same
directory, and people who are familiar with both projects (self
included) are used to looking for the headers in the same directory
as the C source code.
2. In terms of how the headers are used internally in libjpeg-turbo,
the distinction between public and private headers is a bit fuzzy.
- It didn't make sense to separate the test source code from the library
source code, since there is not a clear distinction in some cases.
(For instance, the IJG image I/O functions are used by cjpeg and djpeg
as well as by the TurboJPEG API.)
This solution is minimally disruptive, since it keeps all C source code
and headers together and keeps java/ and simd/ as top-level directories.
It is a bit awkward, because java/ and simd/ technically contain source
code, even though they are not under src/. However, other solutions
would have been more awkward for different reasons.
Closes #226
|
|
98bc3eeb
|
2022-02-24T23:09:58
|
|
Neon/AArch64: Fix/suppress UBSan warnings
- Suppress a UBSan warning regarding storing a 64-bit value to a
non-64-bit-aligned address. That behavior is technically undefined
per the C spec but is supported in the context of the AArch64
architecture and compilers.
- Explicitly promote block_diff[i] to unsigned int prior to left
shifting it, in order to avoid a UBSan warning. This warning also
described behavior that is technically undefined per the C spec but is
supported in the context of the AArch64 architecture and compilers.
Changing the type cast order eliminated the warning without changing
the generated assembly code.
Closes #582
|
|
147548c0
|
2021-09-06T11:31:37
|
|
Neon/AArch64: Accelerate Huffman encoding
- Make better use of 128-bit vector registers, thus reducing the number
of Neon instructions required to construct the AC coefficient bitmap.
- Refactor the Neon computations of 'nbits' and 'diff' to use shorter
and higher-throughput instruction sequences.
DRC's notes:
This commit partially integrates #570. Arm reported a 1-4% speedup on
Cortex-A55 and Neoverse-N1 cores when using recent compilers but little
or no speedup with Clang 10. I observed no speedup with Clang 10 on my
Cortex-A53 and Cortex-A72 cores. Thus, referring to #582, the primary
purpose of this commit is to fix UBSan warnings regarding the shift
operations previously located at Line 253:
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/d640a457305164417a60f30c6457d316f0b44a9d/simd/arm/aarch64/jchuff-neon.c#L253
|
|
74e6ea45
|
2021-01-05T20:23:11
|
|
Neon: Fix Huffman enc. error w/Visual Studio+Clang
The GNU builtin function __builtin_clzl() accepts an unsigned long
argument, which is 8 bytes wide on LP64 systems (most Un*x systems,
including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused
the Neon intrinsics implementation of Huffman encoding to produce
mathematically incorrect results when compiled using Visual Studio with
Clang.
This commit changes all invocations of __builtin_clzl() in the Neon SIMD
extensions to __builtin_clzll(), which accepts an unsigned long long
argument that is guaranteed to be 8 bytes wide on all systems.
Fixes #480
Closes #490
|
|
eb14189c
|
2020-11-17T12:48:49
|
|
Fix Neon SIMD build issues with Visual Studio
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes #461
Closes #475
|
|
33859880
|
2020-11-13T12:12:47
|
|
Neon: Auto-detect compiler intrinsics completeness
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion:
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
|
|
f3c3f01d
|
2018-09-24T04:35:20
|
|
Neon: Intrinsics impl. of Huffman encoding
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
|