kmx git

Commit	Date	Message
4de8f692	2021-04-16T16:34:12	jdhuff.h: Fix ASan regression caused by 8fa70367 The 0xFF is, in fact, necessary.
785ec30e	2021-04-16T15:59:38	cjpeg_fuzzer: Add cov for h2v2 smooth downsampling
d147be83	2021-04-15T23:31:51	Huff decs: Fix/suppress more innocuous UBSan errs - UBSan complained that entropy->restarts_to_go was underflowing an unsigned integer when it was decremented while cinfo->restart_interval == 0. That was, of course, completely innocuous behavior, since the result of the underflowing computation was never used. - d3a3a73f64041c6a6905faf6f9f9832e735fd880 and 7bc9fca4309563d66b0c5665a616285d0e9baeb4 silenced a UBSan signed integer overflow error, but unfortunately other malformed JPEG images have been discovered that cause unsigned integer overflow in the same computation. Since, to the best of our understanding, this behavior is innocuous, this commit reverts the commits listed above, suppresses the UBSan errors, and adds code comments to document the issue.
8fa70367	2021-04-15T22:26:53	Huff dec: Fix non-deterministic output w/bad input Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342, there are certain very rare circumstances under which a malformed JPEG image can cause different Huffman decoder output to be produced, depending on the size of the source manager's I/O buffer. (More specifically, the fast Huffman decoder didn't handle invalid codes in the same manner as the slow decoder, and since the fast decoder requires all data to be memory-resident, the buffering strategy determines whether or not the fast decoder can be used on a particular MCU block.) After extensive experimentation, the Mozilla and Chrome developers and I determined that this truly was an innocuous issue. The patch that both browsers adopted as a workaround caused a performance regression with 32-bit code, which is why it was not accepted into libjpeg-turbo. This commit fixes the problem in a less disruptive way with no performance regression.
171b875b	2021-04-15T19:03:53	OSS-Fuzz: Check img size b4 readers allocate mem After the completion of the start_input() method, it's too late to check the image size, because the image readers may have already tried to allocate memory for the image. If the width and height are excessively large, then attempting to allocate memory for the image could slow performance or lead to out-of-memory errors prior to the fuzz target checking the image size. NOTE: Specifically, the aforementioned OOM errors and slow units were observed with the compression fuzz targets when using MSan.
3ab32348	2021-04-13T11:51:29	OSS-Fuzz: More code coverage improvements
3e68a5ee	2021-04-12T14:37:43	jchuff.c: Fix MSan error Certain rare malformed input images can cause the Huffman encoder to generate a value for nbits that corresponds to an uninitialized member of the DC code table. The ramifications of this are minimal and would basically amount to a different bogus JPEG image being generated from a particular bogus input image.
4e451616	2021-04-12T11:53:29	compress_yuv_fuzzer: Minor code coverage tweak
629e96ee	2021-04-12T11:52:55	cjpeg.c: Code formatting tweak
ebaa67ea	2021-04-12T10:38:52	rdbmp.c: Fix more innocuous UBSan errors - Referring to 3311fc00010c6cb305d87525c9ef60ebdf036cfc, we need to use unsigned intermediate math in order to make UBSan happy, even though (JDIMENSION)(A * B) is effectively the same as (JDIMENSION)A *(JDIMENSION)B, regardless of intermediate overflow. - Because of the previous commit, it is now possible for bfOffBits to be INT_MIN, which would cause the initial computation of bPad to underflow a signed integer. Thus, we need to check for that possibility as soon as we know the values of bfOffBits and headerSize. The worst case from this regression is that bPad could wrap around to a large positive value, which would cause a "Premature end of input file" error in the subsequent read_byte() loop. Thus, this issue was effectively innocuous as well, since it resulted in catching the same error later and in a different way. Also, the issue was very well-contained, since it was both introduced and fixed as part of the ongoing OSS-Fuzz integration project.
dd830b3f	2021-04-09T17:36:41	rdbmp.c/rdppm.c: Fix more innocuous UBSan errors - rdbmp.c: Because of 8fb37b81713a0cdc14622dc08892ebd28a3233aa, bfOffBits, biClrUsed, and headerSize were made into unsigned ints. Thus, if bPad would eventually be negative due to a malformed header, UBSan complained about unsigned math being used in the intermediate computations. It was unnecessary to make those variables unsigned, since they are only meant to hold small values, so this commit makes them signed again. The UBSan error was innocuous, because it is effectively (if not officially) the case that (int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b. - rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an unsigned int, then UBSan complained at the point at which row_width was set in start_input_bmp(), even though the overflow would have been detected later in the function. This commit adds overflow checks prior to setting row_width. - rdppm.c: read_pbm_integer() now bounds-checks the intermediate value computations in order to catch integer overflow caused by a malformed text PPM. It's possible, though extremely unlikely, that the intermediate value computations could have wrapped around to a value smaller than maxval, but the worst case is that this would have generated a bogus pixel in the uncompressed image rather than throwing an error.
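Illustration for dd830b3f (not taken from rdppm.c): the kind of bounds-checked decimal accumulation described for read_pbm_integer() could look roughly like the sketch below. The function name `parse_bounded_uint`, its signature, and its -1 error convention are hypothetical; the real reader works on a file stream and reports errors through the libjpeg error manager.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parse a run of decimal digits at the start of s.
 * Returns the parsed value, or -1 if s does not start with a digit, if the
 * intermediate computation would overflow an unsigned int, or if the value
 * exceeds maxval. */
long long parse_bounded_uint(const char *s, unsigned int maxval)
{
  unsigned int val = 0;

  assert(s != NULL);
  if (*s < '0' || *s > '9')
    return -1;

  for (; *s >= '0' && *s <= '9'; s++) {
    unsigned int digit = (unsigned int)(*s - '0');

    /* Check *before* computing val * 10 + digit, so that the intermediate
     * value can never wrap around to something smaller than maxval. */
    if (val > (UINT_MAX - digit) / 10)
      return -1;
    val = val * 10 + digit;

    if (val > maxval)
      return -1;
  }
  return (long long)val;
}
```

With a maxval of 255, `parse_bounded_uint("255", 255)` yields 255, while `"256"` is rejected. A digit string that would otherwise wrap past UINT_MAX is rejected rather than silently producing a small, plausible-looking value.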
4ede2ef5	2021-04-09T17:26:19	OSS-Fuzz: cjpeg fuzz target
5cda8c5e	2021-04-09T13:12:32	compress_yuv_fuzzer: Use unique filename template
47b66d1d	2021-04-09T11:26:34	OSS-Fuzz: Fix UBSan err caused by TJFLAG_FUZZING
55ab0d39	2021-04-08T16:13:06	OSS-Fuzz: YUV encoding/compression fuzz target
18bc4c61	2021-04-07T16:04:58	compress.cc: Code formatting tweak
b1079002	2021-04-07T15:51:05	rdppm.c: Fix innocuous MSan error A fuzzing test case that was effectively a 1-pixel PGM file with a maximum value of 1 and an actual value of 8 caused an uninitialized member of the rescale[] array to be accessed in get_gray_rgb_row() or get_gray_cmyk_row(). Since, for performance reasons, those functions do not perform bounds checking on the PPM values, we need to ensure that unused members of the rescale[] array are initialized.
3311fc00	2021-04-07T14:20:49	rdbmp.c: Fix innocuous UBSan error A fuzzing test case with an image width of 838860946 triggered a UBSan error: rdbmp.c:633:34: runtime error: signed integer overflow: 838860946 * 3 cannot be represented in type 'int' Because the result is cast to an unsigned int (JDIMENSION), this error is irrelevant, because (unsigned int)((int)838860946 * (int)3) == (unsigned int)838860946 * (unsigned int)3
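Illustration for 3311fc00 (not taken from rdbmp.c): a minimal sketch of the "unsigned intermediate math" idea, using the width from the fuzzing test case. The helper name `row_samples` is hypothetical, and the sketch assumes a 32-bit `int`/`unsigned int`, as on the mainstream platforms libjpeg-turbo targets.

```c
#include <assert.h>

/* Hypothetical helper: number of samples in a row.
 * Casting each operand to unsigned int *before* multiplying keeps the
 * arithmetic in an unsigned type, where wraparound is well-defined and
 * UBSan's signed-overflow check has nothing to report. */
unsigned int row_samples(int width, int samples_per_pixel)
{
  assert(width >= 0 && samples_per_pixel >= 0);
  return (unsigned int)width * (unsigned int)samples_per_pixel;
}
```

For example, `row_samples(838860946, 3)` evaluates to 2516582838, which does not fit in a 32-bit signed int but does fit in a 32-bit unsigned int. Writing `(unsigned int)(width * samples_per_pixel)` instead would perform the multiplication in `int` and trigger the reported signed overflow.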
34d264d6	2021-04-07T12:44:50	OSS-Fuzz: Private TurboJPEG API flag for fuzzing This limits the tjLoadImage() behavioral changes to the scope of the compress_fuzzer target. Otherwise, TJBench in fuzzer builds would refuse to load images larger than 1 Mpixel.
f35fd27e	2021-04-06T12:51:03	tjLoadImage: Fix issues w/loading 16-bit PPMs/PGMs - The PPM reader now throws an error rather than segfaulting (due to a buffer overrun) if an application attempts to load a 16-bit PPM file into a grayscale uncompressed image buffer. No known applications allowed that (not even the test applications in libjpeg-turbo), because that mode of operation was never expected to work and did not work under any circumstances. (In fact, it was necessary to modify TJBench in order to reproduce the issue outside of a fuzzing environment.) This was purely a matter of making the library bow out gracefully rather than crash if an application tries to do something really stupid. - The PPM reader now throws an error rather than generating incorrect pixels if an application attempts to load a 16-bit PGM file into an RGB uncompressed image buffer. - The PPM reader now correctly loads 16-bit PPM files into extended RGB uncompressed image buffers. (Previously it generated incorrect pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.) The only way that users could have potentially encountered these issues was through the tjLoadImage() function. cjpeg and TJBench were unaffected.
df17d398	2021-04-06T11:34:30	jcphuff.c: -Wjump-misses-init warning w/GCC 9 -m32 (verified that this commit does not change the generated 64-bit or 32-bit assembly code)
cd9a3185	2021-04-05T22:20:52	Bump TurboJPEG C API version to 2.1 (because of TJFLAG_LIMITSCANS)
d2d44655	2021-04-05T21:41:30	OSS-Fuzz: Compression fuzz target
5536ace1	2021-04-05T21:12:29	OSS-Fuzz: Fix C++11 compiler warnings in targets
5dd906be	2021-04-05T17:47:34	OSS-Fuzz: Test non-default opts w/ decompress_yuv The non-default options were not being tested because of a pixel format comparison buglet. This commit also changes the code in both decompression fuzz targets such that non-default options are tested based on the pixel format index rather than the pixel format value, which is a bit more idiot-proof.
c81e91e8	2021-04-05T16:08:22	TurboJPEG: New flag for limiting prog JPEG scans This also fixes timeouts reported by OSS-Fuzz.
bff7959e	2021-04-02T14:53:43	OSS-Fuzz: Require static libraries Refer to https://google.github.io/oss-fuzz/further-reading/fuzzer-environment/#runtime-dependencies for the reasons why this is necessary.
6ad658be	2021-04-02T14:50:35	OSS-Fuzz: Build fuzz targets using C++ compiler Otherwise, the targets will require libstdc++, the i386 version of which is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz build environment passes -stdlib:libc++ in the CXXFLAGS environment variable in order to mitigate this issue, since the runtime environment has the i386 version of libc++, but using that compiler flag requires using the C++ compiler.
7b57cba6	2021-03-31T11:16:51	OSS-Fuzz: Fix uninitialized reads detected by MSan
2f9e8a11	2021-03-29T18:54:12	OSS-Fuzz integration This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo source tree, thus obsoleting and improving code coverage relative to Google's OSS-Fuzz target for libjpeg-turbo (previously available here: https://github.com/google/oss-fuzz). I hope to eventually create fuzz targets for the BMP, GIF, and PPM readers as well, which would allow for fuzz-testing compression, but since those readers all require an input file, it is unclear how to build an efficient fuzzer around them. It doesn't make sense to fuzz-test compression in isolation, because compression can't accept arbitrary input data.
e4ec23d7	2021-02-10T16:45:50	Neon: Use byte-swap builtins instead of inline asm Define compiler-independent byte-swap macros and use them instead of executing 'rev' via inline assembly code with GCC-compatible compilers or a slow shift-store sequence with Visual C++. * This produces identical assembly code with: - 64-bit GCC 8.4.0 (Linux) - 64-bit GCC 9.3.0 (Linux) - 64-bit Clang 10.0.0 (Linux) - 64-bit Clang 10.0.0 (MinGW) - 64-bit Clang 12.0.0 (Xcode 12.2, macOS) - 64-bit Clang 12.0.0 (Xcode 12.2, iOS) * This produces different assembly code with: - 64-bit GCC 4.9.1 (Linux) - 32-bit GCC 4.8.2 (Linux) - 32-bit GCC 8.4.0 (Linux) - 32-bit GCC 9.3.0 (Linux) Since the intrinsics implementation of Huffman encoding is not used by default with these compilers, this is not a concern. - 32-bit Clang 10.0.0 (Linux) Verified performance neutrality Closes #507
e795afc3	2021-03-25T22:36:15	SSE2: Fix prog Huff enc err if Sl%32==0 && Al!=0 (regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8) This implements the same fix for jsimd_encode_mcu_AC_refine_prepare_sse2() that a81a8c137b3f1c65082aa61f236aa88af61b3ad4 implemented for jsimd_encode_mcu_AC_first_prepare_sse2(). Based on: https://github.com/MegaByte/libjpeg-turbo/commit/1a59587397150c9ef9dffc5813cb3891db4bc0c8 https://github.com/MegaByte/libjpeg-turbo/commit/eb176a91d87a470bf8c987be786668aa944dd1dd Fixes #509 Closes #510
2c01200c	2021-03-15T19:56:53	Build: Fix incorrect regexes w/ if(...MATCHES...) "arm*" as a regex means 'ar' followed by zero or more 'm' characters, which matches 'parisc' and 'sparc64' as well.
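Illustration for 2c01200c (not part of the build system): the over-matching can be reproduced outside CMake. The sketch below uses POSIX extended regular expressions from `<regex.h>` as a stand-in for CMake's own regex engine, on the assumption that `*` and unanchored matching behave the same way for this simple pattern. The helper `regex_matches` is hypothetical, and the anchored pattern `^arm` is only one possible tighter alternative, not necessarily the one the commit adopted.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if pattern matches anywhere in text,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

Here `regex_matches("arm*", "parisc")` and `regex_matches("arm*", "sparc64")` both return 1, because each string contains "ar" followed by zero 'm' characters. `regex_matches("^arm", "parisc")` returns 0, while `regex_matches("^arm", "armv7l")` returns 1.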
ed70101d	2021-03-15T12:36:55	ChangeLog.md: List CVE ID fixed by 1719d12e Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2, it is my opinion that the severity of this bug was grossly overstated and that a CVE never should have been assigned to it, but since one was assigned, users need to know which version of libjpeg-turbo contains the fix. Dear security community, please learn what "DoS" actually means and stop misusing that term for dramatic effect. Thanks.
8a2cad02	2021-01-21T10:51:49	Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc) We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo anymore, but they still work (bearing in mind that PowerPC builds require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or earlier.) Referring to #495, apparently MacPorts needs this functionality.
b6772910	2021-01-19T15:32:32	Add Sponsor button for GitHub repository
399aa374	2021-01-19T12:25:11	Build: Support CMAKE_OSX_ARCHITECTURES ... as long as it contains only a singular value, which must equal "x86_64" or "arm64". Refer to #495
1719d12e	2021-01-14T18:35:15	cjpeg: Fix FPE when compressing 0-width GIF Fixes #493
486cdcfb	2021-01-12T17:45:55	Fix build with Visual C++ and /std:c11 or /std:c17 Fixes #481 Closes #482
74e6ea45	2021-01-05T20:23:11	Neon: Fix Huffman enc. error w/Visual Studio+Clang The GNU builtin function __builtin_clzl() accepts an unsigned long argument, which is 8 bytes wide on LP64 systems (most Un*x systems, including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused the Neon intrinsics implementation of Huffman encoding to produce mathematically incorrect results when compiled using Visual Studio with Clang. This commit changes all invocations of __builtin_clzl() in the Neon SIMD extensions to __builtin_clzll(), which accepts an unsigned long long argument that is guaranteed to be 8 bytes wide on all systems. Fixes #480 Closes #490
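Illustration for 74e6ea45 (not taken from the Neon SIMD extensions): a minimal sketch of a width-independent leading-zero count. It assumes a GCC-compatible compiler that provides `__builtin_clzll()` and a 64-bit `unsigned long long`; the wrapper name `clz64` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: count the leading zero bits of a 64-bit value.
 * __builtin_clzl() would operate on unsigned long, which is 64 bits wide on
 * LP64 systems but only 32 bits wide on LLP64 systems (Windows), so the
 * result would differ between platforms.  __builtin_clzll() operates on
 * unsigned long long, which is 64 bits wide on both. */
int clz64(uint64_t x)
{
  assert(x != 0);  /* the builtin's result is undefined for 0 */
  return __builtin_clzll((unsigned long long)x);
}
```

For instance, `clz64(1)` is 63 and `clz64((uint64_t)1 << 32)` is 31 regardless of the platform's `long` width.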
d2c40799	2020-12-17T16:02:47	Use CLZ compiler intrinsic for Windows/Arm builds The __builtin_clz() compiler intrinsic was already used in the C Huffman encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible compiler. This commit modifies the C Huffman encoders so that they also use __builtin_clz() when building for Arm CPUs using Visual Studio + Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic when building for Arm CPUs using Visual C++. In addition to making the C Huffman encoders faster on Windows/Arm, this also prevents jpeg_nbits_table from being included in Windows/Arm builds, thus saving 128 KB of memory.
3e8911aa	2021-01-11T13:56:01	Build: Use correct SIMD exts w/VStudio IDE + Arm64 When configuring a Visual Studio IDE build and passing -A arm64 to CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of CMAKE_SYSTEM_PROCESSOR.
4b838c38	2021-01-11T13:45:25	jcphuff.c: Fix compiler warning with clang-cl Fixes #492
944f5915	2021-01-08T12:41:02	Migrate from Travis CI to GitHub Actions Note that this removes our ability to regression test the Armv8 and PowerPC SIMD extensions, effectively reverting a524b9b06be2e0c24d8abc6528cf29316cfe8dc5 and 02227e48a990911a6da35ab8034911a9fbc1055a, but at the moment, there is no other way.
3179f330	2021-01-04T14:54:35	tjexample.c: Fix mem leak if tjTransform() fails Fixes #479
1388ad67	2020-12-08T21:25:47	Build: Officially support Ninja
110d8d6d	2020-12-07T11:12:49	decompress_smooth_data(): Fix another uninit. read Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 The test case https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr DC scan followed by two non-interleaved Y DC scans. Thus, the prev_coef_bits[] array was initialized for the Y component but not the other components, the uninitialized values for Cb and Cr were transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and because cinfo->master->last_good_iMCU_row was 0, decompress_smooth_data() read those uninitialized values when attempting to smooth the second iMCU row. Possibly fixes #478
7b687649	2020-12-03T19:15:07	LICENSE.md: Remove trailing whitespace Use <br> to indicate a line break, as we do in README.md, in order to make checkstyle happy.
21d05684	2020-12-03T18:50:08	Build: Test for correct AArch32 RPM/DEBARCH value ... based on the floating point ABI being used by the compiler (which do you choose, a hard or soft option?)
6e4509a3	2020-12-01T09:04:27	LICENSE.md: Formatting tweak
c7ca521b	2020-11-28T06:38:27	Fix uninitialized read in decompress_smooth_data() Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 Referring to the discussion in #459, the OSS-Fuzz test case https://github.com/libjpeg-turbo/libjpeg-turbo/files/5597075/clusterfuzz-testcase-minimized-pngsave_buffer_fuzzer-5728375846731776.txt created a situation in which cinfo->output_iMCU_row > cinfo->master->last_good_iMCU_row but cinfo->input_scan_number == 1 thus causing decompress_smooth_data() to read from prev_coef_bits_latch[], which was uninitialized. I was unable to create the same situation with a real JPEG image.
ccaba5d7	2020-11-25T14:55:55	Fix buffer overrun with certain narrow prog JPEGs Regression introduced by 6d91e950c871103a11bac2f10c63bf998796c719 last_block_column in decompress_smooth_data() can be 0 if, for instance, decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image of width 16 or less. Since last_block_column is an unsigned int, subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed, and we attempted to access blocks from a second block column that didn't actually exist. Closes #476
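Illustration for ccaba5d7 (not taken from decompress_smooth_data()): a minimal sketch of why subtracting 1 from an unsigned zero defeats a bounds test. Both function names are hypothetical, and the second form is just one common way to avoid the wraparound, not necessarily the change made by the commit.

```c
#include <assert.h>
#include <limits.h>

/* Buggy form: if last_block_column is 0, (last_block_column - 1) wraps
 * around to UINT_MAX, so the test passes for every block_num below that. */
int has_second_next_column_buggy(unsigned int block_num,
                                 unsigned int last_block_column)
{
  return block_num < last_block_column - 1;
}

/* Safer form: move the offset to the other side of the comparison so that
 * nothing is subtracted from a value that may be 0. */
int has_second_next_column(unsigned int block_num,
                           unsigned int last_block_column)
{
  assert(block_num < UINT_MAX);  /* block_num + 1 must not wrap either */
  return block_num + 1 < last_block_column;
}
```

With a single block column (`last_block_column == 0`), `has_second_next_column_buggy(0, 0)` returns 1, which corresponds to the out-of-bounds access described above, whereas `has_second_next_column(0, 0)` returns 0.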
cfc7e6e5	2020-11-25T14:10:55	Bump revision to 2.0.91 for post-beta fixes
4e52b66f	2020-11-24T21:54:42	Travis: Use Docker tag that matches Git branch
8cf6f716	2020-11-24T21:32:48	Bump revision to 2.0.90 to prepare for beta
eb14189c	2020-11-17T12:48:49	Fix Neon SIMD build issues with Visual Studio - Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for compile-time detection of Arm builds, since __arm__ and __aarch64__ are only present in GNU-compatible compilers. - Neon/intrinsics: Use the _CountLeadingZeros() and _CountLeadingZeros64() intrinsics provided by Visual Studio, since __builtin_clz() and __builtin_clzl() are only present in GNU-compatible compilers. - Neon/intrinsics: Since Visual Studio does not support static vector initialization, replace static initialization of Neon vectors with the appropriate intrinsics. Compared to the static initialization approach, this produces identical assembly code with both GCC and Clang. - Neon/intrinsics: Since Visual Studio does not support inline assembly code, provide alternative code paths for Visual Studio whenever inline assembly is used. - Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds (Visual Studio does not emit fused multiply-add [FMA] instructions by default for such builds.) - Neon/intrinsics: Move temporary buffer allocation outside of nested loops. Since Visual Studio configures Arm builds with a relatively small amount of stack memory, attempting to allocate those buffers within the inner loops caused a stack overflow. Closes #461 Closes #475
91dd3b23	2020-11-24T19:22:38	ChangeLog: macOS Armv8/x86-64 univ. binary support
7e0d94d3	2020-11-24T20:31:51	Merge branch 'master' into dev
1c839761	2020-11-24T18:51:16	Force Git to treat testorig.ppm as a binary file Otherwise, because the file begins with an ASCII header, Git will erroneously treat it as an ASCII file, and if Git for Windows is configured with default options (specifically, "Checkout windows-style, commit Unix-style line endings"), it will add carriage return characters to all of the "linefeed" characters in the PPM file, thus corrupting it and causing libjpeg-turbo's regression tests to fail.
6d91e950	2020-10-05T13:37:44	Use 5x5 win & 9 AC coeffs when smoothing DC scans ... of progressive images. Based on: https://github.com/mo271/libjpeg-turbo/commit/be8d36d13b79a472e56da0717ba067e6139bc0e1 https://github.com/mo271/libjpeg-turbo/commit/9d528f278ee3a5ba571c0b9ec4567c557614fb25 https://github.com/mo271/libjpeg-turbo/commit/85f36f0765ea2c28909fc4c0e570cd68d3a1ed85 https://github.com/mo271/libjpeg-turbo/commit/63a4d39e387f61bcb83b393838f436b410b97308 https://github.com/mo271/libjpeg-turbo/commit/51336a6ad5acb9379dc8e3e5e5758fd439224b7c Closes #459 Closes #474
d523435e	2020-11-19T19:30:38	Travis: Use Xcode 12.2 for all iOS & macOS builds There doesn't seem to be any performance or compatibility downside to this, and it has the advantages of simplicity and consistency between the PR and official builds.
1ac83cd6	2020-11-18T18:16:12	Travis: The Mac build log is now log-macos.txt (oversight from f7a10a61e3bbab14d2e901c8823cec4961a46b2f)
0ba70b6a	2020-11-18T15:01:24	Build: Support macOS Armv8/x86-64 univ. binaries - Rename IOS_ARMV8_BUILD to ARMV8_BUILD. - Rename install_ios() to install_subbuild() in makemacpkg. - Wordsmith the build instructions accordingly. - Use xcode12.2 image in Travis CI.
e417033d	2020-11-18T14:13:54	Merge branch 'master' into dev
6d2e8837	2020-11-18T13:25:06	jpeg_skip_scanlines(): Avoid NULL + 0 UBSan error This error occurs at the call to (cinfo->cconvert->color_convert)() in sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE (i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is innocuous, since (cinfo->cconvert->color_convert)() points to a dummy function (noop_convert()) in that case. Fixes #470
f7c54892	2020-11-18T10:11:21	Travis: Add /opt/local/bin to PATH for Mac build (oversight from previous commit) macports-ci does this, and it's necessary in order for the build script to find md5sum.
f7a10a61	2020-11-17T13:51:28	Build: "OS X"/"OSX" = "macOS"/"MACOS" There are no supported versions of "OS X" anymore. The operating system has been named "macOS" since 10.12 Sierra, which was released four years ago.
d111d9ff	2020-11-17T11:54:20	Merge branch 'master' into dev
10ba6ed3	2020-11-16T17:30:37	Travis: Install MacPorts without using macports-ci
292d78e7	2020-11-16T15:28:02	Merge branch 'master' into dev
88bf1d16	2020-11-16T14:38:15	Build: Set FLOATTEST more intelligently The "32bit" vs. "64bit" floating point test results actually have nothing to do with the FPU. That was a fallacious assumption based on the observation that, with multiple CPU types, 32-bit and 64-bit builds produce different floating point test results. It seems that this is, in fact, due to differing compiler behavior-- more specifically, whether fused multiply-add (FMA) instructions are used to combine multiple floating point operations into a single instruction ("floating point expression contraction".) GCC does this by default if the target supports FMA instructions, which PowerPC and AArch64 targets both do. Fixes #468
8f830598	2020-11-13T15:21:26	Merge branch 'master' into dev
42f7c78f	2020-11-13T15:18:35	BUILDING.md: Use min. iOS v8 in iOS Armv8 example This is necessary in order to enable thread-local storage.
33859880	2020-11-13T12:12:47	Neon: Auto-detect compiler intrinsics completeness This allows the Neon intrinsics code to be built successfully (albeit likely with reduced run-time performance) with Xcode 5.0-6.2 (iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2 will not build the Armv8 GAS code without gas-preprocessor.pl, and no version of Xcode will build the Armv7 GAS code without gas-preprocessor.pl, so we always use the full Neon intrinsics implementation by default with macOS and iOS builds. Auto-detecting the completeness of the compiler's set of Neon intrinsics also allows us to more intelligently set the default value of NEON_INTRINSICS, based on the values of HAVE_VLD1. This is a reasonable, albeit imperfect, proxy for whether a compiler has a full and optimal set of Neon intrinsics. Specific notes: - 64-bit RGB-to-YCbCr color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer forward DCT uses vld1_s16_x3(), regresses with GCC - 64-bit Huffman encoding uses vld1q_u8_x4(), regresses with GCC - 64-bit YCbCr-to-RGB color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 64-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. - 32-bit RGB-to-YCbCr color conversion uses vld1_u16_x2(), regresses with GCC - 32-bit accurate integer forward DCT uses vld1_s16_x3(), regression irrelevant because there was no previous implementation - 32-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 32-bit fast integer inverse DCT does not use any of the intrinsics in question, regresses with GCC - 32-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. Presumably when GCC includes a full and optimal set of Neon intrinsics, the HAVE_VLD1 tests will pass, and the full Neon intrinsics implementation will be enabled automatically.
3e9e7c70	2020-11-11T17:54:06	Fix build if WITH_12BIT==1 && WITH_JPEG(7|8)==1 Fixes #466
bbd80892	2020-11-10T17:54:14	Neon: Finalize intrinsics implementation - Remove gas-preprocessor.pl. None of the compilers that can build the new intrinsics implementation require gas-preprocessor.pl (tested with Xcode and with Clang 3.9+ for Linux.) - Document that Xcode 6.3.x or later is now required for iOS builds (older versions of Xcode do not have a full set of Neon intrinsics.) - Add a change log entry. - Do not enable the ASM CMake language unless NEON_INTRINSICS is false. - Add a Clang/Arm64 test to .travis.yml in order to test the new intrinsics implementation. Closes #455
141f26ff	2018-09-18T18:28:31	Neon: Intrinsics impl. of 2x2 and 4x4 scaled IDCTs The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementations provide the same or better performance.
4574f01f	2018-06-28T16:17:36	Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.
ba52a3de	2018-07-19T18:46:24	Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57. 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because there was no Neon implementation of merged upsampling/color conversion, and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.
240ba417	2020-01-07T16:40:32	Neon: Intrinsics impl. of prog. Huffman encoding The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.
ed581cd9	2019-06-12T18:16:53	Neon: Intrinsics impl. of accurate int inverse DCT The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable.
2c6b68e2	2018-09-25T18:20:25	Neon: Intrinsics impl. of fast integer Inverse DCT The previous AArch32 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
2acfb93c	2019-05-08T15:43:26	Neon: Intrinsics impl. of h1v2 fancy upsampling There was no previous GAS implementation.
97530777	2018-06-15T11:13:52	Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.
5dbd3932	2018-08-01T16:52:31	Neon: Intrinsics implementation of YCbCr->RGB565 The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
0f35cd68	2018-07-16T10:25:14	Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
f3c3f01d	2018-09-24T04:35:20	Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
d0004de5	2018-08-22T13:38:37	Neon: Intrinsics impl. of accurate int forward DCT The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. There was no previous AArch32 GAS implementation.
3d84668d	2018-08-23T14:22:23	Neon: Intrinsics impl. of fast integer forward DCT The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.
951d3677	2018-08-24T18:04:21	Neon: Intrinsics impl. of int sample conv./quant. The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.
366168aa	2018-08-06T15:14:34	Neon: Intrinsics impl. of h2v1 & h2v2 downsampling The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.
f73b1dbc	2018-08-09T15:08:21	Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.
4f2216b4	2019-11-26T18:14:33	Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.
0efc4858	2020-11-09T18:55:21	Merge branch 'master' into dev
02227e48	2020-11-09T16:31:49	Travis: Combine PPC/Arm tests with jpeg-7/8 tests There is no reason not to, since the jpeg-7 and jpeg-8 API/ABI tests do not exercise the SIMD extensions any differently than the other tests.
c7dd1912	2020-11-08T15:15:02	Merge branch 'master' into dev
40557b23	2020-11-06T18:51:55	Build: Fix test failures w/ Arm Neon SIMD exts Regression caused by a46c111d9f3642f0ef3819e7298846ccc61869e0 Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, which was introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/ color conversion is disabled on platforms that have SIMD-accelerated YCbCr -> RGB color conversion but not SIMD-accelerated merged upsampling/color conversion. This was intended to improve performance with the Neon SIMD extensions, since those are the only SIMD extensions for which those circumstances apply. Under normal circumstances, the separate "plain" (non-fancy) upsampling and color conversion routines will produce bitwise-identical output to the merged upsampling/color conversion routines, but that is not the case when skipping scanlines starting at an odd-numbered scanline. The modified test introduced in a46c111d9f3642f0ef3819e7298846ccc61869e0 does precisely that in order to validate the fixes introduced in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0. Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, the segfault fixed in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0 didn't affect the Neon SIMD extensions, so this commit effectively reverts the test modifications in a46c111d9f3642f0ef3819e7298846ccc61869e0 when using those SIMD extensions. We can get rid of this hack, as well as 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, once a Neon implementation of merged upsampling/color conversion is available.
a524b9b0	2020-11-06T17:24:16	Travis: Regression-test Armv8 and PPC SIMD exts Currently this only tests the 64-bit code paths, but it's better than nothing.
7c1a1789	2020-11-05T16:04:55	Merge branch 'master' into dev
6e632af9	2020-11-04T10:13:06	Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.

4de8f692

2021-04-16T16:34:12

jdhuff.h: Fix ASan regression caused by 8fa70367 The 0xFF is, in fact, necessary.

785ec30e

2021-04-16T15:59:38

cjpeg_fuzzer: Add cov for h2v2 smooth downsampling

d147be83

2021-04-15T23:31:51

Huff decs: Fix/suppress more innocuous UBSan errs
- UBSan complained that entropy->restarts_to_go was underflowing an unsigned integer when it was decremented while cinfo->restart_interval == 0. That was, of course, completely innocuous behavior, since the result of the underflowing computation was never used.
- d3a3a73f64041c6a6905faf6f9f9832e735fd880 and 7bc9fca4309563d66b0c5665a616285d0e9baeb4 silenced a UBSan signed integer overflow error, but unfortunately other malformed JPEG images have been discovered that cause unsigned integer overflow in the same computation. Since, to the best of our understanding, this behavior is innocuous, this commit reverts the commits listed above, suppresses the UBSan errors, and adds code comments to document the issue.

8fa70367

2021-04-15T22:26:53

Huff dec: Fix non-deterministic output w/bad input Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342, there are certain very rare circumstances under which a malformed JPEG image can cause different Huffman decoder output to be produced, depending on the size of the source manager's I/O buffer. (More specifically, the fast Huffman decoder didn't handle invalid codes in the same manner as the slow decoder, and since the fast decoder requires all data to be memory-resident, the buffering strategy determines whether or not the fast decoder can be used on a particular MCU block.) After extensive experimentation, the Mozilla and Chrome developers and I determined that this truly was an innocuous issue. The patch that both browsers adopted as a workaround caused a performance regression with 32-bit code, which is why it was not accepted into libjpeg-turbo. This commit fixes the problem in a less disruptive way with no performance regression.

171b875b

2021-04-15T19:03:53

OSS-Fuzz: Check img size b4 readers allocate mem After the completion of the start_input() method, it's too late to check the image size, because the image readers may have already tried to allocate memory for the image. If the width and height are excessively large, then attempting to allocate memory for the image could slow performance or lead to out-of-memory errors prior to the fuzz target checking the image size. NOTE: Specifically, the aforementioned OOM errors and slow units were observed with the compression fuzz targets when using MSan.

3ab32348

2021-04-13T11:51:29

OSS-Fuzz: More code coverage improvements

3e68a5ee

2021-04-12T14:37:43

jchuff.c: Fix MSan error Certain rare malformed input images can cause the Huffman encoder to generate a value for nbits that corresponds to an uninitialized member of the DC code table. The ramifications of this are minimal and would basically amount to a different bogus JPEG image being generated from a particular bogus input image.

4e451616

2021-04-12T11:53:29

compress_yuv_fuzzer: Minor code coverage tweak

629e96ee

2021-04-12T11:52:55

cjpeg.c: Code formatting tweak

ebaa67ea

2021-04-12T10:38:52

rdbmp.c: Fix more innocuous UBSan errors
- Referring to 3311fc00010c6cb305d87525c9ef60ebdf036cfc, we need to use unsigned intermediate math in order to make UBSan happy, even though (JDIMENSION)(A * B) is effectively the same as (JDIMENSION)A * (JDIMENSION)B, regardless of intermediate overflow.
- Because of the previous commit, it is now possible for bfOffBits to be INT_MIN, which would cause the initial computation of bPad to underflow a signed integer. Thus, we need to check for that possibility as soon as we know the values of bfOffBits and headerSize. The worst case from this regression is that bPad could wrap around to a large positive value, which would cause a "Premature end of input file" error in the subsequent read_byte() loop. Thus, this issue was effectively innocuous as well, since it resulted in catching the same error later and in a different way. Also, the issue was very well-contained, since it was both introduced and fixed as part of the ongoing OSS-Fuzz integration project.

dd830b3f

2021-04-09T17:36:41

rdbmp.c/rdppm.c: Fix more innocuous UBSan errors
- rdbmp.c: Because of 8fb37b81713a0cdc14622dc08892ebd28a3233aa, bfOffBits, biClrUsed, and headerSize were made into unsigned ints. Thus, if bPad would eventually be negative due to a malformed header, UBSan complained about unsigned math being used in the intermediate computations. It was unnecessary to make those variables unsigned, since they are only meant to hold small values, so this commit makes them signed again. The UBSan error was innocuous, because it is effectively (if not officially) the case that (int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b.
- rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an unsigned int, then UBSan complained at the point at which row_width was set in start_input_bmp(), even though the overflow would have been detected later in the function. This commit adds overflow checks prior to setting row_width.
- rdppm.c: read_pbm_integer() now bounds-checks the intermediate value computations in order to catch integer overflow caused by a malformed text PPM. It's possible, though extremely unlikely, that the intermediate value computations could have wrapped around to a value smaller than maxval, but the worst case is that this would have generated a bogus pixel in the uncompressed image rather than throwing an error.
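
As an illustration of the row_width check described in dd830b3f, the following Python sketch rejects a width whose product with the bit depth would not fit in a 32-bit unsigned int, and does so before the product is formed. The function name `checked_row_width`, the `UINT_MAX` constant, and the exception types are assumptions made for this example; the actual check in start_input_bmp() may be structured differently.

```python
# Hypothetical sketch of an overflow pre-check, modeled on the description of
# start_input_bmp() above. Names and exception types are illustrative only.

UINT_MAX = 0xFFFFFFFF  # largest value of a 32-bit unsigned int


def checked_row_width(bi_width, bits_per_pixel):
    """Return bi_width * bits_per_pixel // 8 if the product fits in 32 bits."""
    if bi_width <= 0 or bits_per_pixel <= 0:
        raise ValueError("width and bit depth must be positive")
    # Divide instead of multiplying, so that the test itself could not
    # overflow in a language with fixed-width integers.
    if bi_width > UINT_MAX // bits_per_pixel:
        raise OverflowError("row width would overflow an unsigned int")
    return bi_width * bits_per_pixel // 8


if __name__ == "__main__":
    print(checked_row_width(640, 24))
    try:
        checked_row_width(838860946, 24)
    except OverflowError as err:
        print("rejected:", err)
```

Comparing against `UINT_MAX // bits_per_pixel` is a common way to phrase such a test in C as well, because the division cannot wrap around.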

4ede2ef5

2021-04-09T17:26:19

OSS-Fuzz: cjpeg fuzz target

5cda8c5e

2021-04-09T13:12:32

compress_yuv_fuzzer: Use unique filename template

47b66d1d

2021-04-09T11:26:34

OSS-Fuzz: Fix UBSan err caused by TJFLAG_FUZZING

55ab0d39

2021-04-08T16:13:06

OSS-Fuzz: YUV encoding/compression fuzz target

18bc4c61

2021-04-07T16:04:58

compress.cc: Code formatting tweak

b1079002

2021-04-07T15:51:05

rdppm.c: Fix innocuous MSan error A fuzzing test case that was effectively a 1-pixel PGM file with a maximum value of 1 and an actual value of 8 caused an uninitialized member of the rescale[] array to be accessed in get_gray_rgb_row() or get_gray_cmyk_row(). Since, for performance reasons, those functions do not perform bounds checking on the PPM values, we need to ensure that unused members of the rescale[] array are initialized.
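
The failure mode in b1079002 can be sketched as follows. This Python model assumes that the rescale table maps raw sample values to the 0..255 range and that the remedy is to give every entry above the declared maximum a defined value (zero here). The function name `build_rescale_table`, the table size, and the rounding formula are assumptions made for illustration, not a copy of rdppm.c.

```python
# Hypothetical model of a PPM rescale table. Entries above the declared
# maximum value are pre-initialized so that an out-of-range sample in a
# malformed file reads a defined value instead of uninitialized memory.

MAXJSAMPLE = 255  # largest 8-bit output sample


def build_rescale_table(maxval, table_size=256):
    """Map raw values 0..maxval onto 0..MAXJSAMPLE; other entries stay 0."""
    if not 0 < maxval < table_size:
        raise ValueError("maxval out of range for this sketch")
    table = [0] * table_size          # every entry starts out defined
    half = maxval // 2                # used for rounding to nearest
    for value in range(maxval + 1):
        table[value] = (value * MAXJSAMPLE + half) // maxval
    return table


if __name__ == "__main__":
    table = build_rescale_table(1)    # file header declares a maximum of 1
    print(table[0], table[1])         # the two legitimate sample values
    print(table[8])                   # bogus sample from the fuzzing case
```

The lookup itself stays free of bounds checks, which matches the performance rationale given in the commit message; only the table setup changes.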

3311fc00

2021-04-07T14:20:49

rdbmp.c: Fix innocuous UBSan error A fuzzing test case with an image width of 838860946 triggered a UBSan error: rdbmp.c:633:34: runtime error: signed integer overflow: 838860946 * 3 cannot be represented in type 'int' Because the result is cast to an unsigned int (JDIMENSION), this error is irrelevant, because (unsigned int)((int)838860946 * (int)3) == (unsigned int)838860946 * (unsigned int)3
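
The equivalence claimed in 3311fc00 can be checked with a small model. The following Python sketch emulates 32-bit int and unsigned int arithmetic with explicit masking, since Python integers are unbounded. The helpers `to_u32` and `to_i32` are assumptions made for illustration, and the sketch assumes the two's-complement wraparound that typical compilers produce for the overflowing signed multiply (the C standard itself leaves that overflow undefined, which is why UBSan reports it).

```python
# Model of the arithmetic in the rdbmp.c message above. 32-bit behavior is
# emulated by masking; to_u32() and to_i32() are hypothetical helpers.

MASK32 = 0xFFFFFFFF


def to_u32(value):
    """Reduce an integer to the range of a 32-bit unsigned int."""
    return value & MASK32


def to_i32(value):
    """Reinterpret the low 32 bits of an integer as a signed 32-bit int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


width = 838860946  # image width from the fuzzing test case
samples = 3        # multiplier from the UBSan report

# (unsigned int)((int)width * (int)samples), assuming the signed multiply
# wraps around instead of trapping
signed_then_cast = to_u32(to_i32(width * samples))

# (unsigned int)width * (unsigned int)samples, which wraps modulo 2**32
unsigned_product = to_u32(to_u32(width) * to_u32(samples))

if __name__ == "__main__":
    print(to_i32(width * samples))            # wrapped signed intermediate
    print(signed_then_cast, unsigned_product)
```

Both paths keep the same low 32 bits of the mathematical product, so the value stored in the JDIMENSION is the same either way.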

34d264d6

2021-04-07T12:44:50

OSS-Fuzz: Private TurboJPEG API flag for fuzzing This limits the tjLoadImage() behavioral changes to the scope of the compress_fuzzer target. Otherwise, TJBench in fuzzer builds would refuse to load images larger than 1 Mpixel.

f35fd27e

2021-04-06T12:51:03

tjLoadImage: Fix issues w/loading 16-bit PPMs/PGMs
- The PPM reader now throws an error rather than segfaulting (due to a buffer overrun) if an application attempts to load a 16-bit PPM file into a grayscale uncompressed image buffer. No known applications allowed that (not even the test applications in libjpeg-turbo), because that mode of operation was never expected to work and did not work under any circumstances. (In fact, it was necessary to modify TJBench in order to reproduce the issue outside of a fuzzing environment.) This was purely a matter of making the library bow out gracefully rather than crash if an application tries to do something really stupid.
- The PPM reader now throws an error rather than generating incorrect pixels if an application attempts to load a 16-bit PGM file into an RGB uncompressed image buffer.
- The PPM reader now correctly loads 16-bit PPM files into extended RGB uncompressed image buffers. (Previously it generated incorrect pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.)
The only way that users could have potentially encountered these issues was through the tjLoadImage() function. cjpeg and TJBench were unaffected.

df17d398

2021-04-06T11:34:30

jcphuff.c: -Wjump-misses-init warning w/GCC 9 -m32 (verified that this commit does not change the generated 64-bit or 32-bit assembly code)

cd9a3185

2021-04-05T22:20:52

Bump TurboJPEG C API version to 2.1 (because of TJFLAG_LIMITSCANS)

d2d44655

2021-04-05T21:41:30

OSS-Fuzz: Compression fuzz target

5536ace1

2021-04-05T21:12:29

OSS-Fuzz: Fix C++11 compiler warnings in targets

5dd906be

2021-04-05T17:47:34

OSS-Fuzz: Test non-default opts w/ decompress_yuv The non-default options were not being tested because of a pixel format comparison buglet. This commit also changes the code in both decompression fuzz targets such that non-default options are tested based on the pixel format index rather than the pixel format value, which is a bit more idiot-proof.

c81e91e8

2021-04-05T16:08:22

TurboJPEG: New flag for limiting prog JPEG scans This also fixes timeouts reported by OSS-Fuzz.

bff7959e

2021-04-02T14:53:43

OSS-Fuzz: Require static libraries Refer to https://google.github.io/oss-fuzz/further-reading/fuzzer-environment/#runtime-dependencies for the reasons why this is necessary.

6ad658be

2021-04-02T14:50:35

OSS-Fuzz: Build fuzz targets using C++ compiler Otherwise, the targets will require libstdc++, the i386 version of which is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz build environment passes -stdlib=libc++ in the CXXFLAGS environment variable in order to mitigate this issue, since the runtime environment has the i386 version of libc++, but using that compiler flag requires using the C++ compiler.

7b57cba6

2021-03-31T11:16:51

OSS-Fuzz: Fix uninitialized reads detected by MSan

2f9e8a11

2021-03-29T18:54:12

OSS-Fuzz integration This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo source tree, thus obsoleting and improving code coverage relative to Google's OSS-Fuzz target for libjpeg-turbo (previously available here: https://github.com/google/oss-fuzz). I hope to eventually create fuzz targets for the BMP, GIF, and PPM readers as well, which would allow for fuzz-testing compression, but since those readers all require an input file, it is unclear how to build an efficient fuzzer around them. It doesn't make sense to fuzz-test compression in isolation, because compression can't accept arbitrary input data.

e4ec23d7

2021-02-10T16:45:50

Neon: Use byte-swap builtins instead of inline asm
Define compiler-independent byte-swap macros and use them instead of executing 'rev' via inline assembly code with GCC-compatible compilers or a slow shift-store sequence with Visual C++.
* This produces identical assembly code with:
  - 64-bit GCC 8.4.0 (Linux)
  - 64-bit GCC 9.3.0 (Linux)
  - 64-bit Clang 10.0.0 (Linux)
  - 64-bit Clang 10.0.0 (MinGW)
  - 64-bit Clang 12.0.0 (Xcode 12.2, macOS)
  - 64-bit Clang 12.0.0 (Xcode 12.2, iOS)
* This produces different assembly code with:
  - 64-bit GCC 4.9.1 (Linux)
  - 32-bit GCC 4.8.2 (Linux)
  - 32-bit GCC 8.4.0 (Linux)
  - 32-bit GCC 9.3.0 (Linux)
    Since the intrinsics implementation of Huffman encoding is not used by default with these compilers, this is not a concern.
  - 32-bit Clang 10.0.0 (Linux)
Verified performance neutrality
Closes #507

e795afc3

2021-03-25T22:36:15

SSE2: Fix prog Huff enc err if Sl%32==0 && Al!=0
(regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8)
This implements the same fix for jsimd_encode_mcu_AC_refine_prepare_sse2() that a81a8c137b3f1c65082aa61f236aa88af61b3ad4 implemented for jsimd_encode_mcu_AC_first_prepare_sse2().
Based on:
https://github.com/MegaByte/libjpeg-turbo/commit/1a59587397150c9ef9dffc5813cb3891db4bc0c8
https://github.com/MegaByte/libjpeg-turbo/commit/eb176a91d87a470bf8c987be786668aa944dd1dd
Fixes #509
Closes #510

2c01200c

2021-03-15T19:56:53

Build: Fix incorrect regexes w/ if(...MATCHES...) "arm*" as a regex means 'ar' followed by zero or more 'm' characters, which matches 'parisc' and 'sparc64' as well.
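
The pitfall described in 2c01200c is easy to reproduce. The sketch below uses Python's `re.search()` as a stand-in for CMake's `if(... MATCHES ...)`; that substitution is an assumption, although both perform an unanchored regular-expression search and treat `*` as "zero or more of the preceding character". The helper `matches` and the anchored pattern `^arm` are illustrative only and are not necessarily what the commit adopted.

```python
import re


def matches(pattern, text):
    """Unanchored regex search, standing in for CMake's MATCHES operator."""
    return re.search(pattern, text) is not None


# Processor names used as sample inputs
processors = ["arm", "armv7l", "aarch64", "parisc", "sparc64", "x86_64"]

if __name__ == "__main__":
    for name in processors:
        # "arm*" means 'ar' followed by zero or more 'm' characters, so it
        # matches any name that merely contains "ar".
        print(name, matches("arm*", name), matches("^arm", name))
```

Anchoring the pattern, or spelling out the intended wildcard as `arm.*`, removes the accidental matches on names such as parisc and sparc64.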

ed70101d

2021-03-15T12:36:55

ChangeLog.md: List CVE ID fixed by 1719d12e Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2, it is my opinion that the severity of this bug was grossly overstated and that a CVE never should have been assigned to it, but since one was assigned, users need to know which version of libjpeg-turbo contains the fix. Dear security community, please learn what "DoS" actually means and stop misusing that term for dramatic effect. Thanks.

8a2cad02

2021-01-21T10:51:49

Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc) We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo anymore, but they still work (bearing in mind that PowerPC builds require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or earlier.) Referring to #495, apparently MacPorts needs this functionality.

b6772910

2021-01-19T15:32:32

Add Sponsor button for GitHub repository

399aa374

2021-01-19T12:25:11

Build: Support CMAKE_OSX_ARCHITECTURES ... as long as it contains only a singular value, which must equal "x86_64" or "arm64". Refer to #495

1719d12e

2021-01-14T18:35:15

cjpeg: Fix FPE when compressing 0-width GIF Fixes #493

486cdcfb

2021-01-12T17:45:55

Fix build with Visual C++ and /std:c11 or /std:c17 Fixes #481 Closes #482

74e6ea45

2021-01-05T20:23:11

Neon: Fix Huffman enc. error w/Visual Studio+Clang The GNU builtin function __builtin_clzl() accepts an unsigned long argument, which is 8 bytes wide on LP64 systems (most Un*x systems, including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused the Neon intrinsics implementation of Huffman encoding to produce mathematically incorrect results when compiled using Visual Studio with Clang. This commit changes all invocations of __builtin_clzl() in the Neon SIMD extensions to __builtin_clzll(), which accepts an unsigned long long argument that is guaranteed to be 8 bytes wide on all systems. Fixes #480 Closes #490
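
To make the width problem in 74e6ea45 concrete, the following Python sketch models a count-leading-zeros operation whose argument is first truncated to a given width, which is what happens when a 64-bit value is passed to a builtin that takes a 4-byte unsigned long. The function `clz` and its `width` parameter are assumptions made for this model; the real builtins take no width argument.

```python
def clz(value, width):
    """Count leading zero bits of value after truncating it to width bits.

    Models __builtin_clzl()/__builtin_clzll() for an argument type that is
    width bits wide. Like the builtins, it is not defined for zero.
    """
    value &= (1 << width) - 1        # implicit conversion to the argument type
    if value == 0:
        raise ValueError("clz is undefined for 0")
    return width - value.bit_length()


# A 64-bit quantity whose highest set bit is above bit 31
bits = (1 << 40) | 1

if __name__ == "__main__":
    print(clz(bits, 64))   # 8-byte argument: LP64 long, or long long anywhere
    print(clz(bits, 32))   # 4-byte argument: long on LLP64 (Windows)
```

With a 4-byte argument the upper half of the value is discarded before counting, so the result no longer describes the original 64-bit quantity.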

d2c40799

2020-12-17T16:02:47

Use CLZ compiler intrinsic for Windows/Arm builds The __builtin_clz() compiler intrinsic was already used in the C Huffman encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible compiler. This commit modifies the C Huffman encoders so that they also use __builtin_clz() when building for Arm CPUs using Visual Studio + Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic when building for Arm CPUs using Visual C++. In addition to making the C Huffman encoders faster on Windows/Arm, this also prevents jpeg_nbits_table from being included in Windows/Arm builds, thus saving 128 KB of memory.

3e8911aa

2021-01-11T13:56:01

Build: Use correct SIMD exts w/VStudio IDE + Arm64 When configuring a Visual Studio IDE build and passing -A arm64 to CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of CMAKE_SYSTEM_PROCESSOR.

4b838c38

2021-01-11T13:45:25

jcphuff.c: Fix compiler warning with clang-cl Fixes #492

944f5915

2021-01-08T12:41:02

Migrate from Travis CI to GitHub Actions Note that this removes our ability to regression test the Armv8 and PowerPC SIMD extensions, effectively reverting a524b9b06be2e0c24d8abc6528cf29316cfe8dc5 and 02227e48a990911a6da35ab8034911a9fbc1055a, but at the moment, there is no other way.

3179f330

2021-01-04T14:54:35

tjexample.c: Fix mem leak if tjTransform() fails Fixes #479

1388ad67

2020-12-08T21:25:47

Build: Officially support Ninja

110d8d6d

2020-12-07T11:12:49

decompress_smooth_data(): Fix another uninit. read Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 The test case https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr DC scan followed by two non-interleaved Y DC scans. Thus, the prev_coef_bits[] array was initialized for the Y component but not the other components, the uninitialized values for Cb and Cr were transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and because cinfo->master->last_good_iMCU_row was 0, decompress_smooth_data() read those uninitialized values when attempting to smooth the second iMCU row. Possibly fixes #478

7b687649

2020-12-03T19:15:07

LICENSE.md: Remove trailing whitespace Use <br> to indicate a line break, as we do in README.md, in order to make checkstyle happy.

21d05684

2020-12-03T18:50:08

Build: Test for correct AArch32 RPM/DEBARCH value ... based on the floating point ABI being used by the compiler (which do you choose, a hard or soft option?)

6e4509a3

2020-12-01T09:04:27

LICENSE.md: Formatting tweak

c7ca521b

2020-11-28T06:38:27

Fix uninitialized read in decompress_smooth_data() Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 Referring to the discussion in #459, the OSS-Fuzz test case https://github.com/libjpeg-turbo/libjpeg-turbo/files/5597075/clusterfuzz-testcase-minimized-pngsave_buffer_fuzzer-5728375846731776.txt created a situation in which cinfo->output_iMCU_row > cinfo->master->last_good_iMCU_row but cinfo->input_scan_number == 1 thus causing decompress_smooth_data() to read from prev_coef_bits_latch[], which was uninitialized. I was unable to create the same situation with a real JPEG image.

ccaba5d7

2020-11-25T14:55:55

Fix buffer overrun with certain narrow prog JPEGs Regression introduced by 6d91e950c871103a11bac2f10c63bf998796c719 last_block_column in decompress_smooth_data() can be 0 if, for instance, decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image of width 16 or less. Since last_block_column is an unsigned int, subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed, and we attempted to access blocks from a second block column that didn't actually exist. Closes #476
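
The wraparound described in ccaba5d7 can be modeled in a few lines. This Python sketch emulates unsigned 32-bit arithmetic by masking. The exact shape of the comparison in decompress_smooth_data() is an assumption here, and the rearranged test is only one possible way to avoid the subtraction, not necessarily the fix that was committed.

```python
MASK32 = 0xFFFFFFFF


def u32(value):
    """Wrap an integer to the range of a 32-bit unsigned int."""
    return value & MASK32


def next_column_exists_unguarded(block_num, last_block_column):
    """Models 'block_num < last_block_column - 1' with unsigned operands."""
    return block_num < u32(last_block_column - 1)


def next_column_exists_rearranged(block_num, last_block_column):
    """Models 'block_num + 1 < last_block_column', which has no subtraction."""
    return u32(block_num + 1) < last_block_column


if __name__ == "__main__":
    # An image that is only one block wide: last_block_column is 0.
    print(hex(u32(0 - 1)))
    print(next_column_exists_unguarded(0, 0))
    print(next_column_exists_rearranged(0, 0))
```

For wider images the two forms agree; they differ only in the degenerate case in which last_block_column is 0, which is the case the commit message describes.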

cfc7e6e5

2020-11-25T14:10:55

Bump revision to 2.0.91 for post-beta fixes

4e52b66f

2020-11-24T21:54:42

Travis: Use Docker tag that matches Git branch

8cf6f716

2020-11-24T21:32:48

Bump revision to 2.0.90 to prepare for beta

eb14189c

2020-11-17T12:48:49

Fix Neon SIMD build issues with Visual Studio
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for compile-time detection of Arm builds, since __arm__ and __aarch64__ are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and _CountLeadingZeros64() intrinsics provided by Visual Studio, since __builtin_clz() and __builtin_clzl() are only present in GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector initialization, replace static initialization of Neon vectors with the appropriate intrinsics. Compared to the static initialization approach, this produces identical assembly code with both GCC and Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly code, provide alternative code paths for Visual Studio whenever inline assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds (Visual Studio does not emit fused multiply-add [FMA] instructions by default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested loops. Since Visual Studio configures Arm builds with a relatively small amount of stack memory, attempting to allocate those buffers within the inner loops caused a stack overflow.
Closes #461
Closes #475

91dd3b23

2020-11-24T19:22:38

ChangeLog: macOS Armv8/x86-64 univ. binary support

7e0d94d3

2020-11-24T20:31:51

Merge branch 'master' into dev

1c839761

2020-11-24T18:51:16

Force Git to treat testorig.ppm as a binary file Otherwise, because the file begins with an ASCII header, Git will erroneously treat it as an ASCII file, and if Git for Windows is configured with default options (specifically, "Checkout windows-style, commit Unix-style line endings"), it will add carriage return characters to all of the "linefeed" characters in the PPM file, thus corrupting it and causing libjpeg-turbo's regression tests to fail.
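
The corruption described in 1c839761 can be imitated without Git. The Python sketch below builds a tiny binary PPM in memory and applies a naive LF-to-CRLF replacement, which is an assumed simplification of what a Windows-style checkout does to a file classified as text. The function `checkout_as_text` and the sample pixel values are hypothetical.

```python
def checkout_as_text(data):
    """Naive model of LF -> CRLF conversion applied to a 'text' file."""
    return data.replace(b"\n", b"\r\n")


# Binary PPM: ASCII header followed by raw 8-bit RGB samples (2 x 1 pixels).
header = b"P6\n2 1\n255\n"
pixels = bytes([10, 20, 30, 10, 10, 10])   # 10 == 0x0A, the linefeed byte
original = header + pixels

converted = checkout_as_text(original)

if __name__ == "__main__":
    print(len(original), len(converted))
    print(converted[len(header):])          # bytes at the old pixel offset
```

Any sample whose value happens to be 0x0A gains a preceding 0x0D, so the pixel data shifts and the file no longer matches the checksums that the regression tests expect.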

6d91e950

2020-10-05T13:37:44

Use 5x5 win & 9 AC coeffs when smoothing DC scans
... of progressive images.
Based on:
https://github.com/mo271/libjpeg-turbo/commit/be8d36d13b79a472e56da0717ba067e6139bc0e1
https://github.com/mo271/libjpeg-turbo/commit/9d528f278ee3a5ba571c0b9ec4567c557614fb25
https://github.com/mo271/libjpeg-turbo/commit/85f36f0765ea2c28909fc4c0e570cd68d3a1ed85
https://github.com/mo271/libjpeg-turbo/commit/63a4d39e387f61bcb83b393838f436b410b97308
https://github.com/mo271/libjpeg-turbo/commit/51336a6ad5acb9379dc8e3e5e5758fd439224b7c
Closes #459
Closes #474

d523435e

2020-11-19T19:30:38

Travis: Use Xcode 12.2 for all iOS & macOS builds There doesn't seem to be any performance or compatibility downside to this, and it has the advantages of simplicity and consistency between the PR and official builds.

1ac83cd6

2020-11-18T18:16:12

Travis: The Mac build log is now log-macos.txt (oversight from f7a10a61e3bbab14d2e901c8823cec4961a46b2f)

0ba70b6a

2020-11-18T15:01:24

Build: Support macOS Armv8/x86-64 univ. binaries
- Rename IOS_ARMV8_BUILD to ARMV8_BUILD.
- Rename install_ios() to install_subbuild() in makemacpkg.
- Wordsmith the build instructions accordingly.
- Use xcode12.2 image in Travis CI.

e417033d

2020-11-18T14:13:54

Merge branch 'master' into dev

6d2e8837

2020-11-18T13:25:06

jpeg_skip_scanlines(): Avoid NULL + 0 UBSan error This error occurs at the call to (*cinfo->cconvert->color_convert)() in sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE (i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is innocuous, since (*cinfo->cconvert->color_convert)() points to a dummy function (noop_convert()) in that case. Fixes #470

f7c54892

2020-11-18T10:11:21

Travis: Add /opt/local/bin to PATH for Mac build (oversight from previous commit) macports-ci does this, and it's necessary in order for the build script to find md5sum.

f7a10a61

2020-11-17T13:51:28

Build: "OS X"/"OSX" = "macOS"/"MACOS" There are no supported versions of "OS X" anymore. The operating system has been named "macOS" since 10.12 Sierra, which was released four years ago.

d111d9ff

2020-11-17T11:54:20

Merge branch 'master' into dev

10ba6ed3

2020-11-16T17:30:37

Travis: Install MacPorts without using macports-ci

292d78e7

2020-11-16T15:28:02

Merge branch 'master' into dev

88bf1d16

2020-11-16T14:38:15

Build: Set FLOATTEST more intelligently The "32bit" vs. "64bit" floating point test results actually have nothing to do with the FPU. That was a fallacious assumption based on the observation that, with multiple CPU types, 32-bit and 64-bit builds produce different floating point test results. It seems that this is, in fact, due to differing compiler behavior-- more specifically, whether fused multiply-add (FMA) instructions are used to combine multiple floating point operations into a single instruction ("floating point expression contraction".) GCC does this by default if the target supports FMA instructions, which PowerPC and AArch64 targets both do. Fixes #468
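
The effect that 88bf1d16 attributes to floating point expression contraction can be shown numerically. In the Python sketch below, the unfused version rounds the product and then rounds the sum, while the fused version is emulated with exact rational arithmetic followed by a single rounding. The function names and operand values are assumptions chosen for illustration, and Python floats are double precision, so this demonstrates the principle rather than libjpeg-turbo's actual float DCT results.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """a * b is rounded to a double, then the addition is rounded again."""
    return a * b + c


def mul_add_fused(a, b, c):
    """Emulated FMA: compute a * b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Operands chosen so that the exact product, 1 - 2**-60, is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

if __name__ == "__main__":
    print(mul_add_unfused(a, b, c))
    print(mul_add_fused(a, b, c))
```

The two results differ even though the source expression is identical, which is why the same code can produce different floating point test results depending on whether the compiler emits FMA instructions.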

8f830598

2020-11-13T15:21:26

Merge branch 'master' into dev

42f7c78f

2020-11-13T15:18:35

BUILDING.md: Use min. iOS v8 in iOS Armv8 example This is necessary in order to enable thread-local storage.

33859880

2020-11-13T12:12:47

Neon: Auto-detect compiler intrinsics completeness
This allows the Neon intrinsics code to be built successfully (albeit likely with reduced run-time performance) with Xcode 5.0-6.2 (iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2 will not build the Armv8 GAS code without gas-preprocessor.pl, and no version of Xcode will build the Armv7 GAS code without gas-preprocessor.pl, so we always use the full Neon intrinsics implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics also allows us to more intelligently set the default value of NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a reasonable, albeit imperfect, proxy for whether a compiler has a full and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion: uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT uses vld1_s16_x3(), regression irrelevant because there was no previous implementation
- 32-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics, the HAVE_VLD1* tests will pass, and the full Neon intrinsics implementation will be enabled automatically.

3e9e7c70

2020-11-11T17:54:06

Fix build if WITH_12BIT==1 && WITH_JPEG(7|8)==1 Fixes #466

bbd80892

2020-11-10T17:54:14

Neon: Finalize intrinsics implementation
- Remove gas-preprocessor.pl. None of the compilers that can build the new intrinsics implementation require gas-preprocessor.pl (tested with Xcode and with Clang 3.9+ for Linux.)
- Document that Xcode 6.3.x or later is now required for iOS builds (older versions of Xcode do not have a full set of Neon intrinsics.)
- Add a change log entry.
- Do not enable the ASM CMake language unless NEON_INTRINSICS is false.
- Add a Clang/Arm64 test to .travis.yml in order to test the new intrinsics implementation.
Closes #455

141f26ff

2018-09-18T18:28:31

Neon: Intrinsics impl. of 2x2 and 4x4 scaled IDCTs The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementations provide the same or better performance.

4574f01f

2018-06-28T16:17:36

Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.

ba52a3de

2018-07-19T18:46:24

Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57. 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because there was no Neon implementation of merged upsampling/color conversion, and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.

240ba417

2020-01-07T16:40:32

Neon: Intrinsics impl. of prog. Huffman encoding The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.

ed581cd9

2019-06-12T18:16:53

Neon: Intrinsics impl. of accurate int inverse DCT The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable.

2c6b68e2

2018-09-25T18:20:25

Neon: Intrinsics impl. of fast integer Inverse DCT The previous AArch32 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

2acfb93c

2019-05-08T15:43:26

Neon: Intrinsics impl. of h1v2 fancy upsampling There was no previous GAS implementation.

97530777

2018-06-15T11:13:52

Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.

5dbd3932

2018-08-01T16:52:31

Neon: Intrinsics implementation of YCbCr->RGB565 The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

0f35cd68

2018-07-16T10:25:14

Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

f3c3f01d

2018-09-24T04:35:20

Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

d0004de5

2018-08-22T13:38:37

Neon: Intrinsics impl. of accurate int forward DCT The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. There was no previous AArch32 GAS implementation.

3d84668d

2018-08-23T14:22:23

Neon: Intrinsics impl. of fast integer forward DCT The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.

951d3677

2018-08-24T18:04:21

Neon: Intrinsics impl. of int sample conv./quant. The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.

366168aa

2018-08-06T15:14:34

Neon: Intrinsics impl. of h2v1 & h2v2 downsampling The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.

f73b1dbc

2018-08-09T15:08:21

Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.

4f2216b4

2019-11-26T18:14:33

Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.

0efc4858

2020-11-09T18:55:21

Merge branch 'master' into dev

02227e48

2020-11-09T16:31:49

Travis: Combine PPC/Arm tests with jpeg-7/8 tests There is no reason not to, since the jpeg-7 and jpeg-8 API/ABI tests do not exercise the SIMD extensions any differently than the other tests.

c7dd1912

2020-11-08T15:15:02

Merge branch 'master' into dev

40557b23

2020-11-06T18:51:55

Build: Fix test failures w/ Arm Neon SIMD exts Regression caused by a46c111d9f3642f0ef3819e7298846ccc61869e0. Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, which was introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/color conversion is disabled on platforms that have SIMD-accelerated YCbCr -> RGB color conversion but not SIMD-accelerated merged upsampling/color conversion. This was intended to improve performance with the Neon SIMD extensions, since those are the only SIMD extensions for which those circumstances apply. Under normal circumstances, the separate "plain" (non-fancy) upsampling and color conversion routines will produce bitwise-identical output to the merged upsampling/color conversion routines, but that is not the case when skipping scanlines starting at an odd-numbered scanline. The modified test introduced in a46c111d9f3642f0ef3819e7298846ccc61869e0 does precisely that in order to validate the fixes introduced in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0. Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, the segfault fixed in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0 didn't affect the Neon SIMD extensions, so this commit effectively reverts the test modifications in a46c111d9f3642f0ef3819e7298846ccc61869e0 when using those SIMD extensions. We can get rid of this hack, as well as 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, once a Neon implementation of merged upsampling/color conversion is available.

a524b9b0

2020-11-06T17:24:16

Travis: Regression-test Armv8 and PPC SIMD exts Currently this only tests the 64-bit code paths, but it's better than nothing.

7c1a1789

2020-11-05T16:04:55

Merge branch 'master' into dev

6e632af9

2020-11-04T10:13:06

Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.
