Log

Author Commit Date CI Message
DRC 4de8f692 2021-04-16T16:34:12 jdhuff.h: Fix ASan regression caused by 8fa70367 The 0xFF is, in fact, necessary.
DRC 785ec30e 2021-04-16T15:59:38 cjpeg_fuzzer: Add cov for h2v2 smooth downsampling
DRC d147be83 2021-04-15T23:31:51 Huff decs: Fix/suppress more innocuous UBSan errs - UBSan complained that entropy->restarts_to_go was underflowing an unsigned integer when it was decremented while cinfo->restart_interval == 0. That was, of course, completely innocuous behavior, since the result of the underflowing computation was never used. - d3a3a73f64041c6a6905faf6f9f9832e735fd880 and 7bc9fca4309563d66b0c5665a616285d0e9baeb4 silenced a UBSan signed integer overflow error, but unfortunately other malformed JPEG images have been discovered that cause unsigned integer overflow in the same computation. Since, to the best of our understanding, this behavior is innocuous, this commit reverts the commits listed above, suppresses the UBSan errors, and adds code comments to document the issue.
DRC 8fa70367 2021-04-15T22:26:53 Huff dec: Fix non-deterministic output w/bad input Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342, there are certain very rare circumstances under which a malformed JPEG image can cause different Huffman decoder output to be produced, depending on the size of the source manager's I/O buffer. (More specifically, the fast Huffman decoder didn't handle invalid codes in the same manner as the slow decoder, and since the fast decoder requires all data to be memory-resident, the buffering strategy determines whether or not the fast decoder can be used on a particular MCU block.) After extensive experimentation, the Mozilla and Chrome developers and I determined that this truly was an innocuous issue. The patch that both browsers adopted as a workaround caused a performance regression with 32-bit code, which is why it was not accepted into libjpeg-turbo. This commit fixes the problem in a less disruptive way with no performance regression.
DRC 171b875b 2021-04-15T19:03:53 OSS-Fuzz: Check img size b4 readers allocate mem After the completion of the start_input() method, it's too late to check the image size, because the image readers may have already tried to allocate memory for the image. If the width and height are excessively large, then attempting to allocate memory for the image could slow performance or lead to out-of-memory errors prior to the fuzz target checking the image size. NOTE: Specifically, the aforementioned OOM errors and slow units were observed with the compression fuzz targets when using MSan.
DRC 3ab32348 2021-04-13T11:51:29 OSS-Fuzz: More code coverage improvements
DRC 3e68a5ee 2021-04-12T14:37:43 jchuff.c: Fix MSan error Certain rare malformed input images can cause the Huffman encoder to generate a value for nbits that corresponds to an uninitialized member of the DC code table. The ramifications of this are minimal and would basically amount to a different bogus JPEG image being generated from a particular bogus input image.
DRC 4e451616 2021-04-12T11:53:29 compress_yuv_fuzzer: Minor code coverage tweak
DRC 629e96ee 2021-04-12T11:52:55 cjpeg.c: Code formatting tweak
DRC ebaa67ea 2021-04-12T10:38:52 rdbmp.c: Fix more innocuous UBSan errors - Referring to 3311fc00010c6cb305d87525c9ef60ebdf036cfc, we need to use unsigned intermediate math in order to make UBSan happy, even though (JDIMENSION)(A * B) is effectively the same as (JDIMENSION)A *(JDIMENSION)B, regardless of intermediate overflow. - Because of the previous commit, it is now possible for bfOffBits to be INT_MIN, which would cause the initial computation of bPad to underflow a signed integer. Thus, we need to check for that possibility as soon as we know the values of bfOffBits and headerSize. The worst case from this regression is that bPad could wrap around to a large positive value, which would cause a "Premature end of input file" error in the subsequent read_byte() loop. Thus, this issue was effectively innocuous as well, since it resulted in catching the same error later and in a different way. Also, the issue was very well-contained, since it was both introduced and fixed as part of the ongoing OSS-Fuzz integration project.
DRC dd830b3f 2021-04-09T17:36:41 rdbmp.c/rdppm.c: Fix more innocuous UBSan errors - rdbmp.c: Because of 8fb37b81713a0cdc14622dc08892ebd28a3233aa, bfOffBits, biClrUsed, and headerSize were made into unsigned ints. Thus, if bPad would eventually be negative due to a malformed header, UBSan complained about unsigned math being used in the intermediate computations. It was unnecessary to make those variables unsigned, since they are only meant to hold small values, so this commit makes them signed again. The UBSan error was innocuous, because it is effectively (if not officially) the case that (int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b. - rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an unsigned int, then UBSan complained at the point at which row_width was set in start_input_bmp(), even though the overflow would have been detected later in the function. This commit adds overflow checks prior to setting row_width. - rdppm.c: read_pbm_integer() now bounds-checks the intermediate value computations in order to catch integer overflow caused by a malformed text PPM. It's possible, though extremely unlikely, that the intermediate value computations could have wrapped around to a value smaller than maxval, but the worst case is that this would have generated a bogus pixel in the uncompressed image rather than throwing an error.
DRC 4ede2ef5 2021-04-09T17:26:19 OSS-Fuzz: cjpeg fuzz target
DRC 5cda8c5e 2021-04-09T13:12:32 compress_yuv_fuzzer: Use unique filename template
DRC 47b66d1d 2021-04-09T11:26:34 OSS-Fuzz: Fix UBSan err caused by TJFLAG_FUZZING
DRC 55ab0d39 2021-04-08T16:13:06 OSS-Fuzz: YUV encoding/compression fuzz target
DRC 18bc4c61 2021-04-07T16:04:58 compress.cc: Code formatting tweak
DRC b1079002 2021-04-07T15:51:05 rdppm.c: Fix innocuous MSan error A fuzzing test case that was effectively a 1-pixel PGM file with a maximum value of 1 and an actual value of 8 caused an uninitialized member of the rescale[] array to be accessed in get_gray_rgb_row() or get_gray_cmyk_row(). Since, for performance reasons, those functions do not perform bounds checking on the PPM values, we need to ensure that unused members of the rescale[] array are initialized.
DRC 3311fc00 2021-04-07T14:20:49 rdbmp.c: Fix innocuous UBSan error A fuzzing test case with an image width of 838860946 triggered a UBSan error: rdbmp.c:633:34: runtime error: signed integer overflow: 838860946 * 3 cannot be represented in type 'int' Because the result is cast to an unsigned int (JDIMENSION), this error is irrelevant, because (unsigned int)((int)838860946 * (int)3) == (unsigned int)838860946 * (unsigned int)3
DRC 34d264d6 2021-04-07T12:44:50 OSS-Fuzz: Private TurboJPEG API flag for fuzzing This limits the tjLoadImage() behavioral changes to the scope of the compress_fuzzer target. Otherwise, TJBench in fuzzer builds would refuse to load images larger than 1 Mpixel.
DRC f35fd27e 2021-04-06T12:51:03 tjLoadImage: Fix issues w/loading 16-bit PPMs/PGMs - The PPM reader now throws an error rather than segfaulting (due to a buffer overrun) if an application attempts to load a 16-bit PPM file into a grayscale uncompressed image buffer. No known applications allowed that (not even the test applications in libjpeg-turbo), because that mode of operation was never expected to work and did not work under any circumstances. (In fact, it was necessary to modify TJBench in order to reproduce the issue outside of a fuzzing environment.) This was purely a matter of making the library bow out gracefully rather than crash if an application tries to do something really stupid. - The PPM reader now throws an error rather than generating incorrect pixels if an application attempts to load a 16-bit PGM file into an RGB uncompressed image buffer. - The PPM reader now correctly loads 16-bit PPM files into extended RGB uncompressed image buffers. (Previously it generated incorrect pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.) The only way that users could have potentially encountered these issues was through the tjLoadImage() function. cjpeg and TJBench were unaffected.
DRC df17d398 2021-04-06T11:34:30 jcphuff.c: -Wjump-misses-init warning w/GCC 9 -m32 (verified that this commit does not change the generated 64-bit or 32-bit assembly code)
DRC cd9a3185 2021-04-05T22:20:52 Bump TurboJPEG C API version to 2.1 (because of TJFLAG_LIMITSCANS)
DRC d2d44655 2021-04-05T21:41:30 OSS-Fuzz: Compression fuzz target
DRC 5536ace1 2021-04-05T21:12:29 OSS-Fuzz: Fix C++11 compiler warnings in targets
DRC 5dd906be 2021-04-05T17:47:34 OSS-Fuzz: Test non-default opts w/ decompress_yuv The non-default options were not being tested because of a pixel format comparison buglet. This commit also changes the code in both decompression fuzz targets such that non-default options are tested based on the pixel format index rather than the pixel format value, which is a bit more idiot-proof.
DRC c81e91e8 2021-04-05T16:08:22 TurboJPEG: New flag for limiting prog JPEG scans This also fixes timeouts reported by OSS-Fuzz.
DRC bff7959e 2021-04-02T14:53:43 OSS-Fuzz: Require static libraries Refer to https://google.github.io/oss-fuzz/further-reading/fuzzer-environment/#runtime-dependencies for the reasons why this is necessary.
DRC 6ad658be 2021-04-02T14:50:35 OSS-Fuzz: Build fuzz targets using C++ compiler Otherwise, the targets will require libstdc++, the i386 version of which is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz build environment passes -stdlib:libc++ in the CXXFLAGS environment variable in order to mitigate this issue, since the runtime environment has the i386 version of libc++, but using that compiler flag requires using the C++ compiler.
DRC 7b57cba6 2021-03-31T11:16:51 OSS-Fuzz: Fix uninitialized reads detected by MSan
DRC 2f9e8a11 2021-03-29T18:54:12 OSS-Fuzz integration This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo source tree, thus obsoleting and improving code coverage relative to Google's OSS-Fuzz target for libjpeg-turbo (previously available here: https://github.com/google/oss-fuzz). I hope to eventually create fuzz targets for the BMP, GIF, and PPM readers as well, which would allow for fuzz-testing compression, but since those readers all require an input file, it is unclear how to build an efficient fuzzer around them. It doesn't make sense to fuzz-test compression in isolation, because compression can't accept arbitrary input data.
Jonathan Wright e4ec23d7 2021-02-10T16:45:50 Neon: Use byte-swap builtins instead of inline asm Define compiler-independent byte-swap macros and use them instead of executing 'rev' via inline assembly code with GCC-compatible compilers or a slow shift-store sequence with Visual C++. * This produces identical assembly code with: - 64-bit GCC 8.4.0 (Linux) - 64-bit GCC 9.3.0 (Linux) - 64-bit Clang 10.0.0 (Linux) - 64-bit Clang 10.0.0 (MinGW) - 64-bit Clang 12.0.0 (Xcode 12.2, macOS) - 64-bit Clang 12.0.0 (Xcode 12.2, iOS) * This produces different assembly code with: - 64-bit GCC 4.9.1 (Linux) - 32-bit GCC 4.8.2 (Linux) - 32-bit GCC 8.4.0 (Linux) - 32-bit GCC 9.3.0 (Linux) Since the intrinsics implementation of Huffman encoding is not used by default with these compilers, this is not a concern. - 32-bit Clang 10.0.0 (Linux) Verified performance neutrality Closes #507
DRC e795afc3 2021-03-25T22:36:15 SSE2: Fix prog Huff enc err if Sl%32==0 && Al!=0 (regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8) This implements the same fix for jsimd_encode_mcu_AC_refine_prepare_sse2() that a81a8c137b3f1c65082aa61f236aa88af61b3ad4 implemented for jsimd_encode_mcu_AC_first_prepare_sse2(). Based on: https://github.com/MegaByte/libjpeg-turbo/commit/1a59587397150c9ef9dffc5813cb3891db4bc0c8 https://github.com/MegaByte/libjpeg-turbo/commit/eb176a91d87a470bf8c987be786668aa944dd1dd Fixes #509 Closes #510
Adrian Bunk 2c01200c 2021-03-15T19:56:53 Build: Fix incorrect regexes w/ if(...MATCHES...) "arm*" as a regex means 'ar' followed by zero or more 'm' characters, which matches 'parisc' and 'sparc64' as well.
DRC ed70101d 2021-03-15T12:36:55 ChangeLog.md: List CVE ID fixed by 1719d12e Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2, it is my opinion that the severity of this bug was grossly overstated and that a CVE never should have been assigned to it, but since one was assigned, users need to know which version of libjpeg-turbo contains the fix. Dear security community, please learn what "DoS" actually means and stop misusing that term for dramatic effect. Thanks.
DRC 8a2cad02 2021-01-21T10:51:49 Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc) We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo anymore, but they still work (bearing in mind that PowerPC builds require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or earlier.) Referring to #495, apparently MacPorts needs this functionality.
DRC b6772910 2021-01-19T15:32:32 Add Sponsor button for GitHub repository
DRC 399aa374 2021-01-19T12:25:11 Build: Support CMAKE_OSX_ARCHITECTURES ... as long as it contains only a singular value, which must equal "x86_64" or "arm64". Refer to #495
DRC 1719d12e 2021-01-14T18:35:15 cjpeg: Fix FPE when compressing 0-width GIF Fixes #493
DRC 486cdcfb 2021-01-12T17:45:55 Fix build with Visual C++ and /std:c11 or /std:c17 Fixes #481 Closes #482
Richard Townsend 74e6ea45 2021-01-05T20:23:11 Neon: Fix Huffman enc. error w/Visual Studio+Clang The GNU builtin function __builtin_clzl() accepts an unsigned long argument, which is 8 bytes wide on LP64 systems (most Un*x systems, including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused the Neon intrinsics implementation of Huffman encoding to produce mathematically incorrect results when compiled using Visual Studio with Clang. This commit changes all invocations of __builtin_clzl() in the Neon SIMD extensions to __builtin_clzll(), which accepts an unsigned long long argument that is guaranteed to be 8 bytes wide on all systems. Fixes #480 Closes #490
Jonathan Wright d2c40799 2020-12-17T16:02:47 Use CLZ compiler intrinsic for Windows/Arm builds The __builtin_clz() compiler intrinsic was already used in the C Huffman encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible compiler. This commit modifies the C Huffman encoders so that they also use__builtin_clz() when building for Arm CPUs using Visual Studio + Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic when building for Arm CPUs using Visual C++. In addition to making the C Huffman encoders faster on Windows/Arm, this also prevents jpeg_nbits_table from being included in Windows/Arm builds, thus saving 128 KB of memory.
DRC 3e8911aa 2021-01-11T13:56:01 Build: Use correct SIMD exts w/VStudio IDE + Arm64 When configuring a Visual Studio IDE build and passing -A arm64 to CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of CMAKE_SYSTEM_PROCESSOR.
DRC 4b838c38 2021-01-11T13:45:25 jcphuff.c: Fix compiler warning with clang-cl Fixes #492
DRC 944f5915 2021-01-08T12:41:02 Migrate from Travis CI to GitHub Actions Note that this removes our ability to regression test the Armv8 and PowerPC SIMD extensions, effectively reverting a524b9b06be2e0c24d8abc6528cf29316cfe8dc5 and 02227e48a990911a6da35ab8034911a9fbc1055a, but at the moment, there is no other way.
DRC 3179f330 2021-01-04T14:54:35 tjexample.c: Fix mem leak if tjTransform() fails Fixes #479
DRC 1388ad67 2020-12-08T21:25:47 Build: Officially support Ninja
DRC 110d8d6d 2020-12-07T11:12:49 decompress_smooth_data(): Fix another uninit. read Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 The test case https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr DC scan followed by two non-interleaved Y DC scans. Thus, the prev_coef_bits[] array was initialized for the Y component but not the other components, the uninitialized values for Cb and Cr were transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and because cinfo->master->last_good_iMCU_row was 0, decompress_smooth_data() read those uninitialized values when attempting to smooth the second iMCU row. Possibly fixes #478
DRC 7b687649 2020-12-03T19:15:07 LICENSE.md: Remove trailing whitespace Use <br> to indicate a line break, as we do in README.md, in order to make checkstyle happy.
DRC 21d05684 2020-12-03T18:50:08 Build: Test for correct AArch32 RPM/DEBARCH value ... based on the floating point ABI being used by the compiler (which do you choose, a hard or soft option?)
DRC 6e4509a3 2020-12-01T09:04:27 LICENSE.md: Formatting tweak
DRC c7ca521b 2020-11-28T06:38:27 Fix uninitialized read in decompress_smooth_data() Regression introduced by 42825b68d570fb07fe820ac62ad91017e61e9a25 Referring to the discussion in #459, the OSS-Fuzz test case https://github.com/libjpeg-turbo/libjpeg-turbo/files/5597075/clusterfuzz-testcase-minimized-pngsave_buffer_fuzzer-5728375846731776.txt created a situation in which cinfo->output_iMCU_row > cinfo->master->last_good_iMCU_row but cinfo->input_scan_number == 1 thus causing decompress_smooth_data() to read from prev_coef_bits_latch[], which was uninitialized. I was unable to create the same situation with a real JPEG image.
DRC ccaba5d7 2020-11-25T14:55:55 Fix buffer overrun with certain narrow prog JPEGs Regression introduced by 6d91e950c871103a11bac2f10c63bf998796c719 last_block_column in decompress_smooth_data() can be 0 if, for instance, decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image of width 16 or less. Since last_block_column is an unsigned int, subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed, and we attempted to access blocks from a second block column that didn't actually exist. Closes #476
DRC cfc7e6e5 2020-11-25T14:10:55 Bump revision to 2.0.91 for post-beta fixes
DRC 4e52b66f 2020-11-24T21:54:42 Travis: Use Docker tag that matches Git branch
DRC 8cf6f716 2020-11-24T21:32:48 Bump revision to 2.0.90 to prepare for beta
Jonathan Wright eb14189c 2020-11-17T12:48:49 Fix Neon SIMD build issues with Visual Studio - Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for compile-time detection of Arm builds, since __arm__ and __aarch64__ are only present in GNU-compatible compilers. - Neon/intrinsics: Use the _CountLeadingZeros() and _CountLeadingZeros64() intrinsics provided by Visual Studio, since __builtin_clz() and __builtin_clzl() are only present in GNU-compatible compilers. - Neon/intrinsics: Since Visual Studio does not support static vector initialization, replace static initialization of Neon vectors with the appropriate intrinsics. Compared to the static initialization approach, this produces identical assembly code with both GCC and Clang. - Neon/intrinsics: Since Visual Studio does not support inline assembly code, provide alternative code paths for Visual Studio whenever inline assembly is used. - Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds (Visual Studio does not emit fused multiply-add [FMA] instructions by default for such builds.) - Neon/intrinsics: Move temporary buffer allocation outside of nested loops. Since Visual Studio configures Arm builds with a relatively small amount of stack memory, attempting to allocate those buffers within the inner loops caused a stack overflow. Closes #461 Closes #475
DRC 91dd3b23 2020-11-24T19:22:38 ChangeLog: macOS Armv8/x86-64 univ. binary support
DRC 7e0d94d3 2020-11-24T20:31:51 Merge branch 'master' into dev
DRC 1c839761 2020-11-24T18:51:16 Force Git to treat testorig.ppm as a binary file Otherwise, because the file begins with an ASCII header, Git will erroneously treat is as an ASCII file, and if Git for Windows is configured with default options (specifically, "Checkout windows-style, commit Unix-style line endings"), it will add carriage return characters to all of the "linefeed" characters in the PPM file, thus corrupting it and causing libjpeg-turbo's regression tests to fail.
DRC 6d91e950 2020-10-05T13:37:44 Use 5x5 win & 9 AC coeffs when smoothing DC scans ... of progressive images. Based on: https://github.com/mo271/libjpeg-turbo/commit/be8d36d13b79a472e56da0717ba067e6139bc0e1 https://github.com/mo271/libjpeg-turbo/commit/9d528f278ee3a5ba571c0b9ec4567c557614fb25 https://github.com/mo271/libjpeg-turbo/commit/85f36f0765ea2c28909fc4c0e570cd68d3a1ed85 https://github.com/mo271/libjpeg-turbo/commit/63a4d39e387f61bcb83b393838f436b410b97308 https://github.com/mo271/libjpeg-turbo/commit/51336a6ad5acb9379dc8e3e5e5758fd439224b7c Closes #459 Closes #474
DRC d523435e 2020-11-19T19:30:38 Travis: Use Xcode 12.2 for all iOS & macOS builds There doesn't seem to be any performance or compatibility downside to this, and it has the advantages of simplicity and consistency between the PR and official builds.
DRC 1ac83cd6 2020-11-18T18:16:12 Travis: The Mac build log is now log-macos.txt (oversight from f7a10a61e3bbab14d2e901c8823cec4961a46b2f)
DRC 0ba70b6a 2020-11-18T15:01:24 Build: Support macOS Armv8/x86-64 univ. binaries - Rename IOS_ARMV8_BUILD to ARMV8_BUILD. - Rename install_ios() to install_subbuild() in makemacpkg. - Wordsmith the build instructions accordingly. - Use xcode12.2 image in Travis CI.
DRC e417033d 2020-11-18T14:13:54 Merge branch 'master' into dev
DRC 6d2e8837 2020-11-18T13:25:06 jpeg_skip_scanlines(): Avoid NULL + 0 UBSan error This error occurs at the call to (*cinfo->cconvert->color_convert)() in sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE (i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is innocuous, since (*cinfo->cconvert->color_convert)() points to a dummy function (noop_convert()) in that case. Fixes #470
DRC f7c54892 2020-11-18T10:11:21 Travis: Add /opt/local/bin to PATH for Mac build (oversight from previous commit) macports-ci does this, and it's necessary in order for the build script to find md5sum.
DRC f7a10a61 2020-11-17T13:51:28 Build: "OS X"/"OSX" = "macOS"/"MACOS" There are no supported versions of "OS X" anymore. The operating system has been named "macOS" since 10.12 Sierra, which was released four years ago.
DRC d111d9ff 2020-11-17T11:54:20 Merge branch 'master' into dev
DRC 10ba6ed3 2020-11-16T17:30:37 Travis: Install MacPorts without using macports-ci
DRC 292d78e7 2020-11-16T15:28:02 Merge branch 'master' into dev
DRC 88bf1d16 2020-11-16T14:38:15 Build: Set FLOATTEST more intelligently The "32bit" vs. "64bit" floating point test results actually have nothing to do with the FPU. That was a fallacious assumption based on the observation that, with multiple CPU types, 32-bit and 64-bit builds produce different floating point test results. It seems that this is, in fact, due to differing compiler behavior-- more specifically, whether fused multiply-add (FMA) instructions are used to combine multiple floating point operations into a single instruction ("floating point expression contraction".) GCC does this by default if the target supports FMA instructions, which PowerPC and AArch64 targets both do. Fixes #468
DRC 8f830598 2020-11-13T15:21:26 Merge branch 'master' into dev
DRC 42f7c78f 2020-11-13T15:18:35 BUILDING.md: Use min. iOS v8 in iOS Armv8 example This is necessary in order to enable thread-local storage.
DRC 33859880 2020-11-13T12:12:47 Neon: Auto-detect compiler intrinsics completeness This allows the Neon intrinsics code to be built successfully (albeit likely with reduced run-time performance) with Xcode 5.0-6.2 (iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2 will not build the Armv8 GAS code without gas-preprocessor.pl, and no version of Xcode will build the Armv7 GAS code without gas-preprocessor.pl, so we always use the full Neon intrinsics implementation by default with macOS and iOS builds. Auto-detecting the completeness of the compiler's set of Neon intrinsics also allows us to more intelligently set the default value of NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a reasonable, albeit imperfect, proxy for whether a compiler has a full and optimal set of Neon intrinsics. Specific notes: - 64-bit RGB-to-YCbCr color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer forward DCT uses vld1_s16_x3(), regresses with GCC - 64-bit Huffman encoding uses vld1q_u8_x4(), regresses with GCC - 64-bit YCbCr-to-RGB color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 64-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. - 32-bit RGB-to-YCbCr color conversion: uses vld1_u16_x2(), regresses with GCC - 32-bit accurate integer forward DCT uses vld1_s16_x3(), regression irrelevant because there was no previous implementation - 32-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 32-bit fast integer inverse DCT does not use any of the intrinsics in question, regresses with GCC - 32-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. Presumably when GCC includes a full and optimal set of Neon intrinsics, the HAVE_VLD1* tests will pass, and the full Neon intrinsics implementation will be enabled automatically.
DRC 3e9e7c70 2020-11-11T17:54:06 Fix build if WITH_12BIT==1 && WITH_JPEG(7|8)==1 Fixes #466
DRC bbd80892 2020-11-10T17:54:14 Neon: Finalize intrinsics implementation - Remove gas-preprocessor.pl. None of the compilers that can build the new intrinsics implementation require gas-preprocessor.pl (tested with Xcode and with Clang 3.9+ for Linux.) - Document that Xcode 6.3.x or later is now required for iOS builds (older versions of Xcode do not have a full set of Neon intrinsics.) - Add a change log entry. - Do not enable the ASM CMake language unless NEON_INTRINSICS is false. - Add a Clang/Arm64 test to .travis.yml in order to test the new intrinsics implementation. Closes #455
Martyn Jacques 141f26ff 2018-09-18T18:28:31 Neon: Intrinsics impl. of 2x2 and 4x4 scaled IDCTs The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementations provide the same or better performance.
Jonathan Wright 4574f01f 2018-06-28T16:17:36 Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.
Jonathan Wright ba52a3de 2018-07-19T18:46:24 Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57. 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because there was no Neon implementation of merged upsampling/color conversion, and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.
Jonathan Wright 240ba417 2020-01-07T16:40:32 Neon: Intrinsics impl. of prog. Huffman encoding The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.
Jonathan Wright ed581cd9 2019-06-12T18:16:53 Neon: Intrinsics impl. of accurate int inverse DCT The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable.
Jonathan Wright 2c6b68e2 2018-09-25T18:20:25 Neon: Intrinsics impl. of fast integer Inverse DCT The previous AArch32 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright 2acfb93c 2019-05-08T15:43:26 Neon: Intrinsics impl. of h1v2 fancy upsamling There was no previous GAS implementation.
Jonathan Wright 97530777 2018-06-15T11:13:52 Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.
Jonathan Wright 5dbd3932 2018-08-01T16:52:31 Neon: Intrinsics implementation of YCbCr->RGB565 The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright 0f35cd68 2018-07-16T10:25:14 Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright f3c3f01d 2018-09-24T04:35:20 Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright d0004de5 2018-08-22T13:38:37 Neon: Intrinsics impl. of accurate int forward DCT The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. There was no previous AArch32 GAS implementation.
Jonathan Wright 3d84668d 2018-08-23T14:22:23 Neon: Intrinsics impl. of fast integer forward DCT The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright 951d3677 2018-08-24T18:04:21 Neon: Intrinsics impl. of int sample conv./quant. The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.
Jonathan Wright 366168aa 2018-08-06T15:14:34 Neon: Intrinsics impl. of h2v1 & h2v2 downsampling The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.
Jonathan Wright f73b1dbc 2018-08-09T15:08:21 Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.
Jonathan Wright 4f2216b4 2019-11-26T18:14:33 Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.
DRC 0efc4858 2020-11-09T18:55:21 Merge branch 'master' into dev
DRC 02227e48 2020-11-09T16:31:49 Travis: Combine PPC/Arm tests with jpeg-7/8 tests There is no reason not to, since the jpeg-7 and jpeg-8 API/ABI tests do not exercise the SIMD extensions any differently than the other tests.
DRC c7dd1912 2020-11-08T15:15:02 Merge branch 'master' into dev
DRC 40557b23 2020-11-06T18:51:55 Build: Fix test failures w/ Arm Neon SIMD exts Regression caused by a46c111d9f3642f0ef3819e7298846ccc61869e0 Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, which was introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/ color conversion is disabled on platforms that have SIMD-accelerated YCbCr -> RGB color conversion but not SIMD-accelerated merged upsampling/color conversion. This was intended to improve performance with the Neon SIMD extensions, since those are the only SIMD extensions for which those circumstances apply. Under normal circumstances, the separate "plain" (non-fancy) upsampling and color conversion routines will produce bitwise-identical output to the merged upsampling/color conversion routines, but that is not the case when skipping scanlines starting at an odd-numbered scanline. The modified test introduced in a46c111d9f3642f0ef3819e7298846ccc61869e0 does precisely that in order to validate the fixes introduced in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0. Because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, the segfault fixed in 9120a247436e84c0b4eea828cb11e8f665fcde30 and a46c111d9f3642f0ef3819e7298846ccc61869e0 didn't affect the Neon SIMD extensions, so this commit effectively reverts the test modifications in a46c111d9f3642f0ef3819e7298846ccc61869e0 when using those SIMD extensions. We can get rid of this hack, as well as 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57, once a Neon implementation of merged upsampling/color conversion is available.
DRC a524b9b0 2020-11-06T17:24:16 Travis: Regression-test Armv8 and PPC SIMD exts Currently this only tests the 64-bit code paths, but it's better than nothing.
DRC 7c1a1789 2020-11-05T16:04:55 Merge branch 'master' into dev
DRC 6e632af9 2020-11-04T10:13:06 Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.