simd/arm/jcphuff-neon.c

Branch


Log

Author Commit Date CI Message
DRC 4e151a4a 2025-08-26T21:11:07 Remove vestigial filenames from SIMD code headers These were a relic of libjpeg/SIMD, which attempted to follow the conventions of the libjpeg source code, but they are no longer relevant (or even accurate in some cases.)
DRC 98c45838 2025-08-21T11:22:51 Fix issues with Windows Arm64EC builds Arm64EC basically wraps native Arm64 functions with an emulated Windows/x64 ABI, which can improve performance for Windows/x64 applications running under the x64 emulator on Windows/Arm. When building for Arm64EC, the compiler defines _M_X64 and _M_ARM64EC but not _M_ARM64.
DRC e69dd40c 2024-01-23T13:26:41 Reorganize source to make things easier to find - Move all libjpeg documentation, except for README.ijg, into the doc/ subdirectory. - Move the TurboJPEG C API documentation from doc/html/ into doc/turbojpeg/. - Move all C source code and headers into a src/ subdirectory. - Move turbojpeg-jni.c into the java/ subdirectory. Referring to #226, there is no ideal solution to this problem. A semantically ideal solution would have involved placing all source code, including the SIMD and Java source code, under src/ (or perhaps placing C library source code under lib/ and C test program source code under test/), all header files under include/, and all documentation under doc/. However: - To me it makes more sense to have separate top-level directories for each language, since the SIMD extensions and the Java API are technically optional features. src/ now contains only the code that is relevant to the core C API libraries and associated programs. - I didn't want to bury the java/ and simd/ directories or add a level of depth to them, since both directories already contain source code that is 3-4 levels deep. - I would prefer not to separate the header files from the C source code, because: 1. It would be disruptive. libjpeg and libjpeg-turbo have historically placed C source code and headers in the same directory, and people who are familiar with both projects (self included) are used to looking for the headers in the same directory as the C source code. 2. In terms of how the headers are used internally in libjpeg-turbo, the distinction between public and private headers is a bit fuzzy. - It didn't make sense to separate the test source code from the library source code, since there is not a clear distinction in some cases. (For instance, the IJG image I/O functions are used by cjpeg and djpeg as well as by the TurboJPEG API.) This solution is minimally disruptive, since it keeps all C source code and headers together and keeps java/ and simd/ as top-level directories. It is a bit awkward, because java/ and simd/ technically contain source code, even though they are not under src/. However, other solutions would have been more awkward for different reasons. Closes #226
DRC 78a36f6d 2022-11-15T17:01:17 Fix buffer overrun in 12-bit prog Huffman encoder Regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8 and 5b177b3cab5cfb661256c1e74df160158ec6c34e The pre-computed absolute values used in encode_mcu_AC_first() and encode_mcu_AC_refine() were stored in a JCOEF (signed short) array. When attempting to losslessly transform a specially-crafted malformed 12-bit JPEG image with a coefficient value of -32768 into a progressive 12-bit JPEG image, the progressive Huffman encoder attempted to store the absolute value of -32768 in the JCOEF array, thus overflowing the 16-bit signed data type. Therefore, at this point in the code: https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L889 the absolute value was read as -32768, which caused the test at https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L896 to fail, falling through to https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L908 with an overly large value of r (46) that, when shifted left four places, incremented, and passed to emit_symbol(), exceeded the maximum index (255) for the derived code tables. Fortunately, the buffer overrun was fully contained within phuff_entropy_encoder, so the issue did not generate a segfault or other user-visible errant behavior, but it did cause a UBSan failure that was detected by OSS-Fuzz. This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it to store the absolute values of DCT coefficients computed by the AC_first_prepare() and AC_refine_prepare() methods. Note that the changes to the Arm Neon progressive Huffman encoder extensions cause signed 16-bit instructions to be replaced with equivalent unsigned 16-bit instructions, so the changes should be performance-neutral. Based on: https://github.com/mayeut/libjpeg-turbo/commit/bbf61c0382c4f8bd1f1cfc666467581496c2fb7c Closes #628
DRC eb0a024a 2022-10-04T12:51:38 Remove redundant jconfigint.h #includes Because of 607b668ff96e40fdc749de9b1bb98e7f40c86d93, jconfigint.h is included by jinclude.h.
Peter Kasting b201838d 2021-07-10T16:07:05 Neon: Silence -Wimplicit-fallthrough warnings Refer to https://bugs.chromium.org/p/chromium/issues/detail?id=995993 Closes #534
Richard Townsend 74e6ea45 2021-01-05T20:23:11 Neon: Fix Huffman enc. error w/Visual Studio+Clang The GNU builtin function __builtin_clzl() accepts an unsigned long argument, which is 8 bytes wide on LP64 systems (most Un*x systems, including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused the Neon intrinsics implementation of Huffman encoding to produce mathematically incorrect results when compiled using Visual Studio with Clang. This commit changes all invocations of __builtin_clzl() in the Neon SIMD extensions to __builtin_clzll(), which accepts an unsigned long long argument that is guaranteed to be 8 bytes wide on all systems. Fixes #480 Closes #490
Jonathan Wright eb14189c 2020-11-17T12:48:49 Fix Neon SIMD build issues with Visual Studio - Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for compile-time detection of Arm builds, since __arm__ and __aarch64__ are only present in GNU-compatible compilers. - Neon/intrinsics: Use the _CountLeadingZeros() and _CountLeadingZeros64() intrinsics provided by Visual Studio, since __builtin_clz() and __builtin_clzl() are only present in GNU-compatible compilers. - Neon/intrinsics: Since Visual Studio does not support static vector initialization, replace static initialization of Neon vectors with the appropriate intrinsics. Compared to the static initialization approach, this produces identical assembly code with both GCC and Clang. - Neon/intrinsics: Since Visual Studio does not support inline assembly code, provide alternative code paths for Visual Studio whenever inline assembly is used. - Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds (Visual Studio does not emit fused multiply-add [FMA] instructions by default for such builds.) - Neon/intrinsics: Move temporary buffer allocation outside of nested loops. Since Visual Studio configures Arm builds with a relatively small amount of stack memory, attempting to allocate those buffers within the inner loops caused a stack overflow. Closes #461 Closes #475
Jonathan Wright 240ba417 2020-01-07T16:40:32 Neon: Intrinsics impl. of prog. Huffman encoding The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.