|
78a36f6d
|
2022-11-15T17:01:17
|
|
Fix buffer overrun in 12-bit prog Huffman encoder
Regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8 and
5b177b3cab5cfb661256c1e74df160158ec6c34e
The pre-computed absolute values used in encode_mcu_AC_first() and
encode_mcu_AC_refine() were stored in a JCOEF (signed short) array.
When attempting to losslessly transform a specially-crafted malformed
12-bit JPEG image with a coefficient value of -32768 into a progressive
12-bit JPEG image, the progressive Huffman encoder attempted to store
the absolute value of -32768 in the JCOEF array, thus overflowing the
16-bit signed data type. Therefore, at this point in the code:
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L889
the absolute value was read as -32768, which caused the test at
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L896
to fail, falling through to
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L908
with an overly large value of r (46) that, when shifted left four
places, incremented, and passed to emit_symbol(), exceeded the maximum
index (255) for the derived code tables. Fortunately, the buffer
overrun was fully contained within phuff_entropy_encoder, so the issue
did not generate a segfault or other user-visible errant behavior, but
it did cause a UBSan failure that was detected by OSS-Fuzz.
This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it
to store the absolute values of DCT coefficients computed by the
AC_first_prepare() and AC_refine_prepare() methods.
Note that the changes to the Arm Neon progressive Huffman encoder
extensions cause signed 16-bit instructions to be replaced with
equivalent unsigned 16-bit instructions, so the changes should be
performance-neutral.
Based on:
https://github.com/mayeut/libjpeg-turbo/commit/bbf61c0382c4f8bd1f1cfc666467581496c2fb7c
Closes #628
|
|
eb0a024a
|
2022-10-04T12:51:38
|
|
Remove redundant jconfigint.h #includes
Because of 607b668ff96e40fdc749de9b1bb98e7f40c86d93, jconfigint.h is
included by jinclude.h.
|
|
f579cc11
|
2022-10-03T19:46:09
|
|
Make SIMD capability variables thread-local ...
... on platforms that support TLS, which should include all
currently-supported platforms
(https://libjpeg-turbo.org/Documentation/OfficialBinaries)
Addresses a concern raised in #87
Although it is still my opinion that the data race in init_simd() was
innocuous, we can now fix it for free thanks to
ae87a958613b69628b92088b313ded0d4f59a716, so why not?
|
|
9abeff46
|
2022-03-09T11:48:30
|
|
Remove extraneous #include directives
jinclude.h already includes stdio.h, stdlib.h, and string.h.
|
|
98bc3eeb
|
2022-02-24T23:09:58
|
|
Neon/AArch64: Fix/suppress UBSan warnings
- Suppress a UBSan warning regarding storing a 64-bit value to a
non-64-bit-aligned address. That behavior is technically undefined
per the C spec but is supported in the context of the AArch64
architecture and compilers.
- Explicitly promote block_diff[i] to unsigned int prior to left
shifting it, in order to avoid a UBSan warning. This warning also
described behavior that is technically undefined per the C spec but is
supported in the context of the AArch64 architecture and compilers.
Changing the type cast order eliminated the warning without changing
the generated assembly code.
Closes #582
|
|
147548c0
|
2021-09-06T11:31:37
|
|
Neon/AArch64: Accelerate Huffman encoding
- Make better use of 128-bit vector registers, thus reducing the number
of Neon instructions required to construct the AC coefficient bitmap.
- Refactor the Neon computations of 'nbits' and 'diff' to use shorter
and higher-throughput instruction sequences.
DRC's notes:
This commit partially integrates #570. Arm reported a 1-4% speedup on
Cortex-A55 and Neoverse-N1 cores when using recent compilers but little
or no speedup with Clang 10. I observed no speedup with Clang 10 on my
Cortex-A53 and Cortex-A72 cores. Thus, referring to #582, the primary
purpose of this commit is to fix UBSan warnings regarding the shift
operations previously located at Line 253:
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/d640a457305164417a60f30c6457d316f0b44a9d/simd/arm/aarch64/jchuff-neon.c#L253
|
|
607b668f
|
2022-02-10T11:33:49
|
|
MSVC: Eliminate C4996 warnings in API libs
The primary purpose of this is to encourage adoption of libjpeg-turbo in
downstream Windows projects that forbid the use of "deprecated"
functions. libjpeg-turbo's usage of those functions was not actually
unsafe, because:
- libjpeg-turbo always checks the return value of fopen() and ensures
that a NULL filename can never be passed to it.
- libjpeg-turbo always checks the return value of getenv() and never
passes a NULL argument to it.
- The sprintf() calls in format_message() (jerror.c) could never
overflow the destination string buffer or leave it unterminated as
long as the buffer was at least JMSG_LENGTH_MAX bytes in length, as
instructed. (Regardless, this commit replaces those calls with
snprintf() calls.)
- libjpeg-turbo never uses sscanf() to read strings or multi-byte
character arrays.
- Because of b7d6e84d6a9283dc2bc50ef9fcaadc0cdeb25c9f, wrjpgcom
explicitly checks the bounds of the source and destination strings
before calling strcat() and strcpy().
- libjpeg-turbo always ensures that the destination string is
terminated when using strncpy().
(548490fe5e2aa31cb00f6602d5a478b068b99682 made this explicit.)
Regarding thread safety:
Technically speaking, getenv() is not thread-safe, because the returned
pointer may be invalidated if another thread sets the same environment
variable between the time that the first thread calls getenv() and the
time that that thread uses the return value. In practice, however, this
could only occur with libjpeg-turbo if:
(1) A multithreaded calling application used the deprecated and
undocumented TJFLAG_FORCEMMX/TJFLAG_FORCESSE/TJFLAG_FORCESSE2 flags in
the TurboJPEG API or set one of the corresponding environment variables
(which are only intended for testing purposes.) Since the TurboJPEG API
library only ever passed string constants to putenv(), the only inherent
risk (i.e. the only risk introduced by the library and not the calling
application) was that the SIMD extensions may have read an incorrect
value from one of the aforementioned environment variables.
or
(2) A multithreaded calling application modified the value of the
JPEGMEM environment variable in one thread while another thread was
reading the value of that environment variable (in the body of
jpeg_create_compress() or jpeg_create_decompress().) Given that the
libjpeg API provides a thread-safe way for applications to modify the
default memory limit without using the JPEGMEM environment variable,
direct modification of that environment variable by calling applications
is not supported.
Microsoft's implementation of getenv_s() does not claim to be
thread-safe either, so this commit uses getenv_s() solely to mollify
Visual Studio. New inline functions and macros (GETENV_S() and
PUTENV_S) wrap getenv_s()/_putenv_s() when building for Visual Studio
and getenv()/setenv() otherwise, but GETENV_S()/PUTENV_S() provide no
advantages over getenv()/setenv() other than parameter validation. They
are implemented solely for convenience.
Technically speaking, strerror() is not thread-safe, because the
returned pointer may be invalidated if another thread changes the locale
and/or calls strerror() between the time that the first thread calls
strerror() and the time that that thread uses the return value. In
practice, however, this could only occur with libjpeg-turbo if a
multithreaded calling application encountered a file I/O error in
tjLoadImage() or tjSaveImage(). Since both of those functions
immediately copy the string returned from strerror() into a thread-local
buffer, the risk is minimal, and the worst case would involve an
incorrect error string being reported to the calling application.
Regardless, this commit uses strerror_s() in the TurboJPEG API library
when building for Visual Studio. Note that strerror_r() could have been
used on Un*x systems, but it would have been necessary to handle both
the POSIX and GNU implementations of that function and perform
widespread compatibility testing. Such is left as an exercise for
another day.
Fixes #568
|
|
129f0cb7
|
2021-08-25T12:07:58
|
|
Neon/AArch64: Don't put GAS functions in .rodata
Regression introduced by 240ba417aa4b3174850d05ea0d22dbe5f80553c1
Closes #546
|
|
74e6ea45
|
2021-01-05T20:23:11
|
|
Neon: Fix Huffman enc. error w/Visual Studio+Clang
The GNU builtin function __builtin_clzl() accepts an unsigned long
argument, which is 8 bytes wide on LP64 systems (most Un*x systems,
including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused
the Neon intrinsics implementation of Huffman encoding to produce
mathematically incorrect results when compiled using Visual Studio with
Clang.
This commit changes all invocations of __builtin_clzl() in the Neon SIMD
extensions to __builtin_clzll(), which accepts an unsigned long long
argument that is guaranteed to be 8 bytes wide on all systems.
Fixes #480
Closes #490
|
|
eb14189c
|
2020-11-17T12:48:49
|
|
Fix Neon SIMD build issues with Visual Studio
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes #461
Closes #475
|
|
33859880
|
2020-11-13T12:12:47
|
|
Neon: Auto-detect compiler intrinsics completeness
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion:
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
|
|
141f26ff
|
2018-09-18T18:28:31
|
|
Neon: Intrinsics impl. of 2x2 and 4x4 scaled IDCTs
The previous AArch32 and AArch64 GAS implementations have been removed,
since the intrinsics implementations provide the same or better
performance.
|
|
4574f01f
|
2018-06-28T16:17:36
|
|
Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp
There was no previous GAS implementation.
NOTE: This doesn't produce much of a speedup when using -O3, because -O3
already enables Neon autovectorization, which works well for the scalar
C implementation of plain upsampling. However, the Neon SIMD
implementation will benefit other optimization levels.
|
|
ba52a3de
|
2018-07-19T18:46:24
|
|
Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp
There was no previous GAS implementation.
This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and
7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.
7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because
there was no Neon implementation of merged upsampling/color conversion,
and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because
of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.
|
|
240ba417
|
2020-01-07T16:40:32
|
|
Neon: Intrinsics impl. of prog. Huffman encoding
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
|
|
ed581cd9
|
2019-06-12T18:16:53
|
|
Neon: Intrinsics impl. of accurate int inverse DCT
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using the new
NEON_INTRINSICS CMake variable.
|
|
2c6b68e2
|
2018-09-25T18:20:25
|
|
Neon: Intrinsics impl. of fast integer Inverse DCT
The previous AArch32 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch64 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
|
|
2acfb93c
|
2019-05-08T15:43:26
|
|
Neon: Intrinsics impl. of h1v2 fancy upsamling
There was no previous GAS implementation.
|
|
97530777
|
2018-06-15T11:13:52
|
|
Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp
The previous AArch32 GAS implementation of h2v1 fancy upsampling has
been removed, since the intrinsics implementation provides the same or
better performance. There was no previous GAS implementation of h2v2
fancy upsampling, and there was no previous AArch64 GAS implementation
of h2v1 fancy upsampling.
|
|
5dbd3932
|
2018-08-01T16:52:31
|
|
Neon: Intrinsics implementation of YCbCr->RGB565
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
|
|
0f35cd68
|
2018-07-16T10:25:14
|
|
Neon: Intrinsics implementation of YCbCr->RGB
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
|
|
f3c3f01d
|
2018-09-24T04:35:20
|
|
Neon: Intrinsics impl. of Huffman encoding
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
|
|
d0004de5
|
2018-08-22T13:38:37
|
|
Neon: Intrinsics impl. of accurate int forward DCT
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. There was no previous AArch32 GAS implementation.
|
|
3d84668d
|
2018-08-23T14:22:23
|
|
Neon: Intrinsics impl. of fast integer forward DCT
The previous AArch32 and AArch64 GAS implementations have been removed,
since the intrinsics implementation provides the same or better
performance.
|
|
951d3677
|
2018-08-24T18:04:21
|
|
Neon: Intrinsics impl. of int sample conv./quant.
The previous AArch32 and AArch64 GAS implementations have been removed,
since the intrinsics implementation provides the same or better
performance.
|
|
366168aa
|
2018-08-06T15:14:34
|
|
Neon: Intrinsics impl. of h2v1 & h2v2 downsampling
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
|
|
f73b1dbc
|
2018-08-09T15:08:21
|
|
Neon: Intrinsics implementation of RGB->Grayscale
There was no previous GAS implementation.
|
|
4f2216b4
|
2019-11-26T18:14:33
|
|
Neon: Intrinsics implementation of RGB->YCbCr
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using a new
NEON_INTRINSICS CMake variable.
|