kmx git

Commit	Date	Message
4e151a4a	2025-08-26T21:11:07	Remove vestigial filenames from SIMD code headers These were a relic of libjpeg/SIMD, which attempted to follow the conventions of the libjpeg source code, but they are no longer relevant (or even accurate in some cases.)
78a36f6d	2022-11-15T17:01:17	Fix buffer overrun in 12-bit prog Huffman encoder Regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8 and 5b177b3cab5cfb661256c1e74df160158ec6c34e The pre-computed absolute values used in encode_mcu_AC_first() and encode_mcu_AC_refine() were stored in a JCOEF (signed short) array. When attempting to losslessly transform a specially-crafted malformed 12-bit JPEG image with a coefficient value of -32768 into a progressive 12-bit JPEG image, the progressive Huffman encoder attempted to store the absolute value of -32768 in the JCOEF array, thus overflowing the 16-bit signed data type. Therefore, at this point in the code: https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L889 the absolute value was read as -32768, which caused the test at https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L896 to fail, falling through to https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L908 with an overly large value of r (46) that, when shifted left four places, incremented, and passed to emit_symbol(), exceeded the maximum index (255) for the derived code tables. Fortunately, the buffer overrun was fully contained within phuff_entropy_encoder, so the issue did not generate a segfault or other user-visible errant behavior, but it did cause a UBSan failure that was detected by OSS-Fuzz. This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it to store the absolute values of DCT coefficients computed by the AC_first_prepare() and AC_refine_prepare() methods. Note that the changes to the Arm Neon progressive Huffman encoder extensions cause signed 16-bit instructions to be replaced with equivalent unsigned 16-bit instructions, so the changes should be performance-neutral. Based on: https://github.com/mayeut/libjpeg-turbo/commit/bbf61c0382c4f8bd1f1cfc666467581496c2fb7c Closes #628
4574f01f	2018-06-28T16:17:36	Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.
ba52a3de	2018-07-19T18:46:24	Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57. 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because there was no Neon implementation of merged upsampling/color conversion, and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.
2acfb93c	2019-05-08T15:43:26	Neon: Intrinsics impl. of h1v2 fancy upsamling There was no previous GAS implementation.
97530777	2018-06-15T11:13:52	Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.
0f35cd68	2018-07-16T10:25:14	Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
f3c3f01d	2018-09-24T04:35:20	Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.
f73b1dbc	2018-08-09T15:08:21	Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.
4f2216b4	2019-11-26T18:14:33	Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.
7c1a1789	2020-11-05T16:04:55	Merge branch 'master' into dev
6e632af9	2020-11-04T10:13:06	Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.
e821464f	2018-04-03T12:47:54	ARM64 NEON SIMD impl. of prog. Huffman encoding This commit adds ARM64 NEON optimizations for the encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in progressive Huffman encoding. Compression speedups for the typical set of five libjpeg-turbo test images (https://libjpeg-turbo.org/About/Performance): Cortex-A53: 23.8-39.2% (avg. 32.2%) Cortex-A72: 26.8-41.1% (avg. 33.5%) Apple A7: 29.7-45.9% (avg. 39.6%) Closes #229
2f9e7c84	2019-01-31T21:24:13	Loongson MMI h2v1 and h2v2 merged upsampling Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2
3c7199ff	2019-01-31T16:22:31	Loongson MMI h2v1 fancy upsampling Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2
73b98acd	2019-01-31T13:54:46	Loongson MMI RGB-to-Grayscale conversion Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2
ae4221f9	2019-01-29T19:51:08	Loongson MMI fast forward/inverse DCT Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/32a9ca222d7e4c8e01bb75cd137f204a0587b383
5b177b3c	2018-03-22T11:36:43	C/SSE2 optimization of encode_mcu_AC_first() This commit adds C and SSE2 optimizations for the encode_mcu_AC_first() function used in progressive Huffman encoding. The image used for testing can be retrieved from this page: https://blog.cloudflare.com/doubling-the-speed-of-jpegtran All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz` clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)` gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0` gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0` Here are the results in comparison to libjpeg-turbo@293263c using `time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg` C clang x86_64: +19% gcc-5 x86_64: +80% gcc-7 x86_64: +57% clang i386: +5% gcc-5 i386: +59% gcc-7 i386: +51% SSE2 clang x86_64: +79% gcc-5 x86_64: +158% gcc-7 x86_64: +122% clang i386: +71% gcc-5 i386: +134% gcc-7 i386: +135% Discussion in libjpeg-turbo/libjpeg-turbo#46
16bd9845	2018-03-02T22:33:19	C/SSE2 optimization of encode_mcu_AC_refine() This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine() function used in progressive Huffman encoding. The image used for testing can be retrieved from this page: https://blog.cloudflare.com/doubling-the-speed-of-jpegtran All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz` clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)` gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0` gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0` Here are the results in comparison to libjpeg-turbo@3c54642 using `time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg` C clang x86_64: +7% gcc-5 x86_64: +30% gcc-7 x86_64: +33% clang i386: +0% gcc-5 i386: +24% gcc-7 i386: +23% SSE2 clang x86_64: +42% gcc-5 x86_64: +53% gcc-7 x86_64: +64% clang i386: +35% gcc-5 i386: +46% gcc-7 i386: +49% Discussion in libjpeg-turbo/libjpeg-turbo#46
293263c3	2018-03-17T15:14:35	Format preprocessor macros more consistently Within the libjpeg API code, it seems to be more the convention than not to separate the macro name and value by two or more spaces, which improves general readability. Making this consistent across all of libjpeg-turbo is less about my individual preferences and more about making it easy to automatically detect variations from our chosen formatting convention. I intend to release the script I'm using to validate this stuff, once it matures and stabilizes a bit.
19c791cd	2018-03-08T10:55:20	Improve code formatting consistency With rare exceptions ... - Always separate line continuation characters by one space from preceding code. - Always use two-space indentation. Never use tabs. - Always use K&R-style conditional blocks. - Always surround operators with spaces, except in raw assembly code. - Always put a space after, but not before, a comma. - Never put a space between type casts and variables/function calls. - Never put a space between the function name and the argument list in function declarations and prototypes. - Always surround braces ('{' and '}') with spaces. - Always surround statements (if, for, else, catch, while, do, switch) with spaces. - Always attach pointer symbols ('' and '') to the variable or function name. - Always precede pointer symbols ('' and '**') by a space in type casts. - Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG API libraries (using min() from tjutil.h is still necessary for TJBench.) - Where it makes sense (particularly in the TurboJPEG code), put a blank line after variable declaration blocks. - Always separate statements in one-liners by two spaces. The purpose of this was to ease maintenance on my part and also to make it easier for contributors to figure out how to format patch submissions. This was admittedly confusing (even to me sometimes) when we had 3 or 4 different style conventions in the same source tree. The new convention is more consistent with the formatting of other OSS code bases. This commit corrects deviations from the chosen formatting style in the libjpeg API code and reformats the TurboJPEG API code such that it conforms to the same standard. NOTES: - Although it is no longer necessary for the function name in function declarations to begin in Column 1 (this was historically necessary because of the ansi2knr utility, which allowed libjpeg to be built with non-ANSI compilers), we retain that formatting for the libjpeg code because it improves readability when using libjpeg's function attribute macros (GLOBAL(), etc.) - This reformatting project was accomplished with the help of AStyle and Uncrustify, although neither was completely up to the task, and thus a great deal of manual tweaking was required. Note to developers of code formatting utilities: the libjpeg-turbo code base is an excellent test bed, because AFAICT, it breaks every single one of the utilities that are currently available. - The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been formatted to match the SSE2 code (refer to ff5685d5344273df321eb63a005eaae19d2496e3.) I hadn't intended to bother with this, but the Loongson MMI implementation demonstrated that there is still academic value to the MMX implementation, as an algorithmic model for other 64-bit vector implementations. Thus, it is desirable to improve its readability in the same manner as that of the SSE2 implementation.
33ce0b5e	2018-03-01T10:38:17	Loongson MMI SIMD extensions Based on: https://github.com/zhuchen1911/libjpeg-turbo/commit/42aff4497bdaca3258279cafc74511e3c25454b8 Closes #158
35ed3c97	2018-02-28T16:24:03	SIMD: Formatting tweaks + remove unnecessary code + "JSIMD_ARM_NEON" = "JSIMD_NEON" + "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2" + "_mips_dspr2" = "_dspr2" It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and this naming convention is consistent with the other SIMD extensions.
de9e9db6	2018-02-23T11:50:11	64-bit AVX2 implementation of slow int inverse DCT
39e9e65c	2018-02-17T19:39:53	64-bit AVX2 implementation of int sample conv.
264dd42a	2018-02-17T17:32:25	64-bit AVX2 implementation of slow int forward DCT
eaae2cdb	2016-07-08T13:56:30	64-bit AVX2 implementation of integer quantization
621b29f5	2016-05-31T15:19:53	64-bit AVX2 impl. of h2v2 & h2v1 merged upsampling
f1cbc328	2016-05-29T08:09:27	64-bit AVX2 impl. of h2v2 & h2v1 upsampling (Fancy & Plain)
72c837da	2016-05-29T06:54:56	64-bit AVX2 impl. of YCC->RGB color conversion
1c8a475c	2016-05-28T19:53:44	64-bit AVX2 impl. of h2v2 & h2v1 downsampling
8880e087	2016-05-28T19:15:18	64-bit AVX2 impl. of RGB->Gray color conversion
426d787c	2016-05-28T16:42:44	64-bit AVX2 impl. of RGB->YCC color conversion
2cf199cb	2016-05-20T10:45:32	Lay the groundwork for 64-bit AVX2 SIMD support
123f7258	2016-05-24T10:23:56	Format copyright headers more consistently The IJG convention is to format copyright notices as: Copyright (C) YYYY, Owner. We try to maintain this convention for any code that is part of the libjpeg API library (with the exception of preserving the copyright notices from Cendio's code verbatim, since those predate libjpeg-turbo.) Note that the phrase "All Rights Reserved" is no longer necessary, since all Buenos Aires Convention signatories signed onto the Berne Convention in 2000. However, our convention is to retain this phrase for any files that have a self-contained copyright header but to leave it off of any files that refer to another file for conditions of distribution and use. For instance, all of the non-SIMD files in the libjpeg API library refer to README.ijg, and the copyright message in that file contains "All Rights Reserved", so it is unnecessary to add it to the individual files. The TurboJPEG code retains my preferred formatting convention for copyright notices, which is based on that of VirtualGL (where the TurboJPEG API originated.)
bd49803f	2016-02-19T08:53:33	Use consistent/modern code formatting for pointers The convention used by libjpeg: type * variable; is not very common anymore, because it looks too much like multiplication. Some (particularly C++ programmers) prefer to tuck the pointer symbol against the type: type* variable; to emphasize that a pointer to a type is effectively a new type. However, this can also be confusing, since defining multiple variables on the same line would not work properly: type* variable1, variable2; /* Only variable1 is actually a pointer. / This commit reformats the entirety of the libjpeg-turbo code base so that it uses the same code formatting convention for pointers that the TurboJPEG API code uses: type variable1, *variable2; This seems to be the most common convention among C programmers, and it is the convention used by other codec libraries, such as libpng and libtiff.
8632f1b2	2016-02-09T00:38:58	ARM64: Avoid tbl instruction on Cortex-A53/A57 Full-color compression speedups relative to previous commits: Cortex-A53 (Nexus 5X), Android, 64-bit: 0.91-3.0% (avg. 1.8%) Cortex-A57 (Nexus 5X), Android, 64-bit: -0.35-1.5% (avg. 0.65%)
46ecffa3	2016-02-07T22:05:56	ARM64: Avoid LD3/ST3 at run time, not compile time ... and only if ThunderX is detected. This can be easily expanded later on to include other CPUs that are known to suffer from slow LD3/ST3, but it doesn't make sense to disable LD3/ST3 for all non-Android Linux platforms just because ThunderX is slow.
ec6941f7	2016-01-15T09:29:11	Complete the ARM64 NEON SIMD implementation This adds 64-bit NEON coverage for all of the algorithms that are covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy upsampling (used when decompressing 4:2:2 JPEG images.) It also adds 64-bit NEON SIMD coverage for: * slow integer forward DCT (compressor) * h2v2 (4:2:0) downsampling (compressor) * h2v1 (4:2:2) downsampling (compressor) which are not covered in the 32-bit implementation. Compression speedups relative to libjpeg-turbo 1.4.2: Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported) 48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%) Refer to #44 and #49 for discussion This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Based on: https://github.com/mayeut/libjpeg-turbo/commit/dcd9d84f10fae192c0e3935818dc289bca9c3e29 https://github.com/mayeut/libjpeg-turbo/commit/b0d87b811f37bd560083deea8c6e7d704e5cd944 https://github.com/mayeut/libjpeg-turbo/commit/70cd5c8a493a67f4d54dd2067ae6dedb65d95389 https://github.com/mayeut/libjpeg-turbo/commit/3e58d9a064648503c57ec2650ee79880f749a52b https://github.com/mayeut/libjpeg-turbo/commit/837b19542f53fa81af83e6ba002d559877aaf597 https://github.com/mayeut/libjpeg-turbo/commit/73dc43ccc870c2e10ba893e9764b8e48d6836585 https://github.com/mayeut/libjpeg-turbo/commit/a82b71a261b4c0213f558baf4bc745f1c27356d8 https://github.com/mayeut/libjpeg-turbo/commit/c1b1188c2106d6ea7b76644b6023b57edeb602e1 https://github.com/mayeut/libjpeg-turbo/commit/305c89284e1bb222b34fbc7261f697a0cc452a41 https://github.com/mayeut/libjpeg-turbo/commit/7f443f99950b4d7d442b9b879648eca5273209bd https://github.com/mayeut/libjpeg-turbo/commit/4c2b53b77da5a20e30e2aadaeddb0efbfe24e06d Unified version with fixes: https://github.com/mayeut/libjpeg-turbo/commit/1004a3cd05870612a194b410efeaa1b4da76d246
499c470b	2016-01-13T03:13:20	ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5
f3a8684c	2016-01-07T00:19:43	SSE2 SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%) 2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%) 2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%) 3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%) 3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%) 2.6 GHz AMD Athlon 64 X2 5050e: Performance-neutral (give or take a few percent) Full-color compression speedups relative to IPP: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%) Refer to #42 for discussion. Numerous other approaches were attempted, but this one proved to be the most performant across all platforms. This commit also fixes #3 (works around, really-- the clang-compiled version of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but that code is now bypassed by the new SSE2 Huffman algorithm.) Based on: https://github.com/mayeut/libjpeg-turbo/commit/2cb4d41330e1edc4469f6b97ba73b73abfbeb02f https://github.com/mayeut/libjpeg-turbo/commit/36c94e050d117912adbff9fbcc6fe307df240168
c641cddb	2015-01-14T15:41:11	AltiVec SIMD implementation of H2V1 and H2V2 plain upsampling (used only when decompressing YCCK images with fast upsampling enabled.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1504 632fc199-4ca6-4c93-a231-07263d6284db
86af36ae	2015-01-14T13:27:32	AltiVec SIMD implementation of H2V1 and H2V2 merged upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1503 632fc199-4ca6-4c93-a231-07263d6284db
52a4ec6c	2015-01-13T09:02:29	AltiVec SIMD implementation of H2V1 and H2V2 fancy upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1495 632fc199-4ca6-4c93-a231-07263d6284db
ac4daa77	2015-01-10T22:56:26	AltiVec SIMD implementation of YCC-to-RGB color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1489 632fc199-4ca6-4c93-a231-07263d6284db
22048207	2015-01-08T06:18:33	AltiVec SIMD implementation of 2x1 and 2x2 downsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1483 632fc199-4ca6-4c93-a231-07263d6284db
577ecd93	2014-12-23T04:14:54	AltiVec SIMD implementation of sample conversion and integer quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1474 632fc199-4ca6-4c93-a231-07263d6284db
b1fec4ff	2014-12-22T14:10:33	AltiVec SIMD implementation of RGB-to-Grayscale color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1471 632fc199-4ca6-4c93-a231-07263d6284db
62bae204	2014-12-22T13:42:26	AltiVec SIMD implementation of RGB-to-YCC color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1469 632fc199-4ca6-4c93-a231-07263d6284db
0691162a	2014-12-20T01:17:39	AltiVec SIMD implementation of slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1461 632fc199-4ca6-4c93-a231-07263d6284db
6cb7f40a	2014-12-18T10:12:29	AltiVec SIMD implementation of fast integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1445 632fc199-4ca6-4c93-a231-07263d6284db
fb0c3940	2014-12-17T08:04:39	AltiVec SIMD implementation of slow integer forward DCT; Clean up fast integer forward DCT code so that it is easier to see how it derives from the SSE2 code and to make it play more nicely with the slow FDCT code. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1443 632fc199-4ca6-4c93-a231-07263d6284db
cd2d8e1c	2014-09-05T06:33:42	AltiVec SIMD implementation of fast forward DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1405 632fc199-4ca6-4c93-a231-07263d6284db
d729f4da	2014-08-23T15:47:51	ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code: ----- https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 03:50:22 +0300 Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15 The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9. Tuning it for newer ARM processors can introduce some speed-up (up to 20%). The performance of the inner loop (conversion of 8 pixels) improves from ~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles down to ~18 cycles on ARM Cortex-A15. The performance remains exactly the same on ARM Cortex-A7 (~58 cycles), ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors. Also use larger indentation in the source code for separating two independent instruction streams. ----- https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 07:23:09 +0300 Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion The performance of the inner loop (conversion of 8 pixels): * ARM Cortex-A7: ~55 cycles * ARM Cortex-A8: ~28 cycles * ARM Cortex-A9: ~32 cycles * ARM Cortex-A15: ~20 cycles * Qualcomm Krait: ~24 cycles Based on the Linaro rgb565 patch from https://sourceforge.net/p/libjpeg-turbo/patches/24/ but implements better instructions scheduling. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
5ef46305	2014-05-18T20:04:47	SIMD-accelerated int upsample routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1315 632fc199-4ca6-4c93-a231-07263d6284db
c728cfd8	2014-05-18T19:36:05	Fix MIPS build git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1314 632fc199-4ca6-4c93-a231-07263d6284db
bc56b754	2014-05-16T10:43:44	Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1308 632fc199-4ca6-4c93-a231-07263d6284db
52ded876	2014-05-15T20:30:16	Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1307 632fc199-4ca6-4c93-a231-07263d6284db
1b3fd7ee	2014-05-15T18:26:01	SIMD-accelerated NULL convert routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1304 632fc199-4ca6-4c93-a231-07263d6284db
6a61c1e6	2014-05-14T15:00:10	SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1301 632fc199-4ca6-4c93-a231-07263d6284db
b844eaa3	2014-05-13T18:40:14	SIMD-accelerated merged upsampling routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1297 632fc199-4ca6-4c93-a231-07263d6284db
34347862	2014-05-06T09:53:21	SIMD-accelerated slow integer IDCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1269 632fc199-4ca6-4c93-a231-07263d6284db
fff6c23a	2013-10-12T21:39:20	SIMD-accelerated integer convsamp routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1059 632fc199-4ca6-4c93-a231-07263d6284db
3d727281	2013-10-09T18:39:44	SIMD-accelerated floating point quantize and convsamp routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1058 632fc199-4ca6-4c93-a231-07263d6284db
d3131c1b	2013-10-08T02:18:59	SIMD-accelerated fast integer inverse DCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1056 632fc199-4ca6-4c93-a231-07263d6284db
71e06a7d	2013-10-08T02:11:21	SIMD-accelerated fast integer forward DCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1055 632fc199-4ca6-4c93-a231-07263d6284db
a6b7fbd3	2013-09-30T18:13:27	SIMD-accelerated slow integer forward DCT and quantize routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1054 632fc199-4ca6-4c93-a231-07263d6284db
e5005917	2013-09-27T17:51:08	SIMD-accelerated 3/4 and 3/2 decompression scaling for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1047 632fc199-4ca6-4c93-a231-07263d6284db
2ccf4d1a	2013-09-27T17:43:23	SIMD-accelerated 1/2 and 1/4 decompression scaling for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1046 632fc199-4ca6-4c93-a231-07263d6284db
49eaa757	2013-09-27T17:39:57	SIMD-optimized RGB-to-grayscale conversion for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1045 632fc199-4ca6-4c93-a231-07263d6284db
16962c11	2013-07-27T21:50:02	SIMD support for performing upsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@996 632fc199-4ca6-4c93-a231-07263d6284db
6f2d3c2c	2013-07-27T21:48:18	SIMD support for performing downsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@995 632fc199-4ca6-4c93-a231-07263d6284db
86fbf35f	2013-07-27T21:44:14	SIMD support for performing fancy upsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@994 632fc199-4ca6-4c93-a231-07263d6284db
0be9fa57	2013-07-24T21:50:20	SIMD support for performing color conversion using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@993 632fc199-4ca6-4c93-a231-07263d6284db
316617fa	2012-06-13T05:17:03	Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db
ce4e3e86	2011-08-22T13:48:01	NEON-accelerated slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db
82bd5219	2011-08-17T21:00:59	NEON-accelerated quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db
7a9376c1	2011-08-12T19:27:20	ARM NEON-accelerated RGB-to-YCbCr conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db
b740054f	2011-08-10T23:31:13	Support for accelerated forward DCT using ARM NEON instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db
8c60d22f	2011-06-17T21:12:58	NEON-optimized 2x2 and 4x4 scaled iDCTs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db
321e0686	2011-05-03T08:47:43	ARM NEON support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db
ddb158c5	2011-02-27T09:09:54	Add short names for RGB->grayscale MMX functions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@468 632fc199-4ca6-4c93-a231-07263d6284db
392e0483	2011-02-18T20:43:04	Updated (C) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@394 632fc199-4ca6-4c93-a231-07263d6284db
c8666333	2011-02-18T11:23:45	SIMD-accelerated RGB-to-Grayscale color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@393 632fc199-4ca6-4c93-a231-07263d6284db
af1ca9bc	2011-02-02T05:42:37	Clarify that the C wrappers and headers fall under the same license as the rest of the SIMD code git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.0.x@335 632fc199-4ca6-4c93-a231-07263d6284db
720e1610	2009-04-05T21:51:25	Add colorspace extensions to merged upsampling routines git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@42 632fc199-4ca6-4c93-a231-07263d6284db
f25c071e	2009-04-03T12:00:51	Implement new colorspaces to allow directly compressing from/decompressing to RGB/RGBX/BGR/BGRX/XBGR/XRGB without conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@35 632fc199-4ca6-4c93-a231-07263d6284db
eea72155	2009-03-09T13:34:17	Add SSE2 SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@22 632fc199-4ca6-4c93-a231-07263d6284db
018fc429	2009-03-09T13:31:56	Add SSE SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@21 632fc199-4ca6-4c93-a231-07263d6284db
65d03173	2009-03-09T13:28:10	Add 3DNow SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@18 632fc199-4ca6-4c93-a231-07263d6284db
5eb84ff9	2009-03-09T13:25:30	Add MMX SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@17 632fc199-4ca6-4c93-a231-07263d6284db
2ae181c7	2009-03-09T13:21:27	Implement x86 SIMD framework Add NASM support and stub routine for detecting SIMD extensions. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@15 632fc199-4ca6-4c93-a231-07263d6284db

4e151a4a

2025-08-26T21:11:07

Remove vestigial filenames from SIMD code headers These were a relic of libjpeg/SIMD, which attempted to follow the conventions of the libjpeg source code, but they are no longer relevant (or even accurate in some cases.)

78a36f6d

2022-11-15T17:01:17

Fix buffer overrun in 12-bit prog Huffman encoder Regression introduced by 16bd984557fa2c490be0b9665e2ea0d4274528a8 and 5b177b3cab5cfb661256c1e74df160158ec6c34e The pre-computed absolute values used in encode_mcu_AC_first() and encode_mcu_AC_refine() were stored in a JCOEF (signed short) array. When attempting to losslessly transform a specially-crafted malformed 12-bit JPEG image with a coefficient value of -32768 into a progressive 12-bit JPEG image, the progressive Huffman encoder attempted to store the absolute value of -32768 in the JCOEF array, thus overflowing the 16-bit signed data type. Therefore, at this point in the code: https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L889 the absolute value was read as -32768, which caused the test at https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L896 to fail, falling through to https://github.com/libjpeg-turbo/libjpeg-turbo/blob/8c5e78ce292c1642057102eac42f12ab57964293/jcphuff.c#L908 with an overly large value of r (46) that, when shifted left four places, incremented, and passed to emit_symbol(), exceeded the maximum index (255) for the derived code tables. Fortunately, the buffer overrun was fully contained within phuff_entropy_encoder, so the issue did not generate a segfault or other user-visible errant behavior, but it did cause a UBSan failure that was detected by OSS-Fuzz. This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it to store the absolute values of DCT coefficients computed by the AC_first_prepare() and AC_refine_prepare() methods. Note that the changes to the Arm Neon progressive Huffman encoder extensions cause signed 16-bit instructions to be replaced with equivalent unsigned 16-bit instructions, so the changes should be performance-neutral. Based on: https://github.com/mayeut/libjpeg-turbo/commit/bbf61c0382c4f8bd1f1cfc666467581496c2fb7c Closes #628

4574f01f

2018-06-28T16:17:36

Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.

ba52a3de

2018-07-19T18:46:24

Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts 40557b23015d2f8b576420231b8dd1f39f2ceed8 and 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57. 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57 was only necessary because there was no Neon implementation of merged upsampling/color conversion, and 40557b23015d2f8b576420231b8dd1f39f2ceed8 was only necessary because of 7723d7f7d0aa40349d5bdd1fbe4f8631fd5a2b57.

2acfb93c

2019-05-08T15:43:26

Neon: Intrinsics impl. of h1v2 fancy upsamling There was no previous GAS implementation.

97530777

2018-06-15T11:13:52

Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.

0f35cd68

2018-07-16T10:25:14

Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

f3c3f01d

2018-09-24T04:35:20

Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.

f73b1dbc

2018-08-09T15:08:21

Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.

4f2216b4

2019-11-26T18:14:33

Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.

7c1a1789

2020-11-05T16:04:55

Merge branch 'master' into dev

6e632af9

2020-11-04T10:13:06

Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.

e821464f

2018-04-03T12:47:54

ARM64 NEON SIMD impl. of prog. Huffman encoding This commit adds ARM64 NEON optimizations for the encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in progressive Huffman encoding. Compression speedups for the typical set of five libjpeg-turbo test images (https://libjpeg-turbo.org/About/Performance): Cortex-A53: 23.8-39.2% (avg. 32.2%) Cortex-A72: 26.8-41.1% (avg. 33.5%) Apple A7: 29.7-45.9% (avg. 39.6%) Closes #229

2f9e7c84

2019-01-31T21:24:13

Loongson MMI h2v1 and h2v2 merged upsampling Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2

3c7199ff

2019-01-31T16:22:31

Loongson MMI h2v1 fancy upsampling Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2

73b98acd

2019-01-31T13:54:46

Loongson MMI RGB-to-Grayscale conversion Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/e8f5cee5aa83e2d1b0af3a7bd0aaa2391c930fb2

ae4221f9

2019-01-29T19:51:08

Loongson MMI fast forward/inverse DCT Based on: https://github.com/zhanglixia-hf/libjpeg-turbo/commit/32a9ca222d7e4c8e01bb75cd137f204a0587b383

5b177b3c

2018-03-22T11:36:43

C/SSE2 optimization of encode_mcu_AC_first() This commit adds C and SSE2 optimizations for the encode_mcu_AC_first() function used in progressive Huffman encoding. The image used for testing can be retrieved from this page: https://blog.cloudflare.com/doubling-the-speed-of-jpegtran All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz` clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)` gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0` gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0` Here are the results in comparison to libjpeg-turbo@293263c using `time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg` C clang x86_64: +19% gcc-5 x86_64: +80% gcc-7 x86_64: +57% clang i386: +5% gcc-5 i386: +59% gcc-7 i386: +51% SSE2 clang x86_64: +79% gcc-5 x86_64: +158% gcc-7 x86_64: +122% clang i386: +71% gcc-5 i386: +134% gcc-7 i386: +135% Discussion in libjpeg-turbo/libjpeg-turbo#46

16bd9845

2018-03-02T22:33:19

C/SSE2 optimization of encode_mcu_AC_refine() This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine() function used in progressive Huffman encoding. The image used for testing can be retrieved from this page: https://blog.cloudflare.com/doubling-the-speed-of-jpegtran All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz` clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)` gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0` gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0` Here are the results in comparison to libjpeg-turbo@3c54642 using `time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg` C clang x86_64: +7% gcc-5 x86_64: +30% gcc-7 x86_64: +33% clang i386: +0% gcc-5 i386: +24% gcc-7 i386: +23% SSE2 clang x86_64: +42% gcc-5 x86_64: +53% gcc-7 x86_64: +64% clang i386: +35% gcc-5 i386: +46% gcc-7 i386: +49% Discussion in libjpeg-turbo/libjpeg-turbo#46

293263c3

2018-03-17T15:14:35

Format preprocessor macros more consistently Within the libjpeg API code, it seems to be more the convention than not to separate the macro name and value by two or more spaces, which improves general readability. Making this consistent across all of libjpeg-turbo is less about my individual preferences and more about making it easy to automatically detect variations from our chosen formatting convention. I intend to release the script I'm using to validate this stuff, once it matures and stabilizes a bit.

19c791cd

2018-03-08T10:55:20

Improve code formatting consistency With rare exceptions ... - Always separate line continuation characters by one space from preceding code. - Always use two-space indentation. Never use tabs. - Always use K&R-style conditional blocks. - Always surround operators with spaces, except in raw assembly code. - Always put a space after, but not before, a comma. - Never put a space between type casts and variables/function calls. - Never put a space between the function name and the argument list in function declarations and prototypes. - Always surround braces ('{' and '}') with spaces. - Always surround statements (if, for, else, catch, while, do, switch) with spaces. - Always attach pointer symbols ('*' and '**') to the variable or function name. - Always precede pointer symbols ('*' and '**') by a space in type casts. - Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG API libraries (using min() from tjutil.h is still necessary for TJBench.) - Where it makes sense (particularly in the TurboJPEG code), put a blank line after variable declaration blocks. - Always separate statements in one-liners by two spaces. The purpose of this was to ease maintenance on my part and also to make it easier for contributors to figure out how to format patch submissions. This was admittedly confusing (even to me sometimes) when we had 3 or 4 different style conventions in the same source tree. The new convention is more consistent with the formatting of other OSS code bases. This commit corrects deviations from the chosen formatting style in the libjpeg API code and reformats the TurboJPEG API code such that it conforms to the same standard. NOTES: - Although it is no longer necessary for the function name in function declarations to begin in Column 1 (this was historically necessary because of the ansi2knr utility, which allowed libjpeg to be built with non-ANSI compilers), we retain that formatting for the libjpeg code because it improves readability when using libjpeg's function attribute macros (GLOBAL(), etc.) - This reformatting project was accomplished with the help of AStyle and Uncrustify, although neither was completely up to the task, and thus a great deal of manual tweaking was required. Note to developers of code formatting utilities: the libjpeg-turbo code base is an excellent test bed, because AFAICT, it breaks every single one of the utilities that are currently available. - The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been formatted to match the SSE2 code (refer to ff5685d5344273df321eb63a005eaae19d2496e3.) I hadn't intended to bother with this, but the Loongson MMI implementation demonstrated that there is still academic value to the MMX implementation, as an algorithmic model for other 64-bit vector implementations. Thus, it is desirable to improve its readability in the same manner as that of the SSE2 implementation.

33ce0b5e

2018-03-01T10:38:17

Loongson MMI SIMD extensions Based on: https://github.com/zhuchen1911/libjpeg-turbo/commit/42aff4497bdaca3258279cafc74511e3c25454b8 Closes #158

35ed3c97

2018-02-28T16:24:03

SIMD: Formatting tweaks + remove unnecessary code + "JSIMD_ARM_NEON" = "JSIMD_NEON" + "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2" + "*_mips_dspr2" = "*_dspr2" It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and this naming convention is consistent with the other SIMD extensions.

de9e9db6

2018-02-23T11:50:11

64-bit AVX2 implementation of slow int inverse DCT

39e9e65c

2018-02-17T19:39:53

64-bit AVX2 implementation of int sample conv.

264dd42a

2018-02-17T17:32:25

64-bit AVX2 implementation of slow int forward DCT

eaae2cdb

2016-07-08T13:56:30

64-bit AVX2 implementation of integer quantization

621b29f5

2016-05-31T15:19:53

64-bit AVX2 impl. of h2v2 & h2v1 merged upsampling

f1cbc328

2016-05-29T08:09:27

64-bit AVX2 impl. of h2v2 & h2v1 upsampling (Fancy & Plain)

72c837da

2016-05-29T06:54:56

64-bit AVX2 impl. of YCC->RGB color conversion

1c8a475c

2016-05-28T19:53:44

64-bit AVX2 impl. of h2v2 & h2v1 downsampling

8880e087

2016-05-28T19:15:18

64-bit AVX2 impl. of RGB->Gray color conversion

426d787c

2016-05-28T16:42:44

64-bit AVX2 impl. of RGB->YCC color conversion

2cf199cb

2016-05-20T10:45:32

Lay the groundwork for 64-bit AVX2 SIMD support

123f7258

2016-05-24T10:23:56

Format copyright headers more consistently The IJG convention is to format copyright notices as: Copyright (C) YYYY, Owner. We try to maintain this convention for any code that is part of the libjpeg API library (with the exception of preserving the copyright notices from Cendio's code verbatim, since those predate libjpeg-turbo.) Note that the phrase "All Rights Reserved" is no longer necessary, since all Buenos Aires Convention signatories signed onto the Berne Convention in 2000. However, our convention is to retain this phrase for any files that have a self-contained copyright header but to leave it off of any files that refer to another file for conditions of distribution and use. For instance, all of the non-SIMD files in the libjpeg API library refer to README.ijg, and the copyright message in that file contains "All Rights Reserved", so it is unnecessary to add it to the individual files. The TurboJPEG code retains my preferred formatting convention for copyright notices, which is based on that of VirtualGL (where the TurboJPEG API originated.)

bd49803f

2016-02-19T08:53:33

Use consistent/modern code formatting for pointers The convention used by libjpeg: type * variable; is not very common anymore, because it looks too much like multiplication. Some (particularly C++ programmers) prefer to tuck the pointer symbol against the type: type* variable; to emphasize that a pointer to a type is effectively a new type. However, this can also be confusing, since defining multiple variables on the same line would not work properly: type* variable1, variable2; /* Only variable1 is actually a pointer. */ This commit reformats the entirety of the libjpeg-turbo code base so that it uses the same code formatting convention for pointers that the TurboJPEG API code uses: type *variable1, *variable2; This seems to be the most common convention among C programmers, and it is the convention used by other codec libraries, such as libpng and libtiff.

8632f1b2

2016-02-09T00:38:58

ARM64: Avoid tbl instruction on Cortex-A53/A57 Full-color compression speedups relative to previous commits: Cortex-A53 (Nexus 5X), Android, 64-bit: 0.91-3.0% (avg. 1.8%) Cortex-A57 (Nexus 5X), Android, 64-bit: -0.35-1.5% (avg. 0.65%)

46ecffa3

2016-02-07T22:05:56

ARM64: Avoid LD3/ST3 at run time, not compile time ... and only if ThunderX is detected. This can be easily expanded later on to include other CPUs that are known to suffer from slow LD3/ST3, but it doesn't make sense to disable LD3/ST3 for all non-Android Linux platforms just because ThunderX is slow.

ec6941f7

2016-01-15T09:29:11

Complete the ARM64 NEON SIMD implementation This adds 64-bit NEON coverage for all of the algorithms that are covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy upsampling (used when decompressing 4:2:2 JPEG images.) It also adds 64-bit NEON SIMD coverage for: * slow integer forward DCT (compressor) * h2v2 (4:2:0) downsampling (compressor) * h2v1 (4:2:2) downsampling (compressor) which are not covered in the 32-bit implementation. Compression speedups relative to libjpeg-turbo 1.4.2: Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported) 48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%) Refer to #44 and #49 for discussion This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Based on: https://github.com/mayeut/libjpeg-turbo/commit/dcd9d84f10fae192c0e3935818dc289bca9c3e29 https://github.com/mayeut/libjpeg-turbo/commit/b0d87b811f37bd560083deea8c6e7d704e5cd944 https://github.com/mayeut/libjpeg-turbo/commit/70cd5c8a493a67f4d54dd2067ae6dedb65d95389 https://github.com/mayeut/libjpeg-turbo/commit/3e58d9a064648503c57ec2650ee79880f749a52b https://github.com/mayeut/libjpeg-turbo/commit/837b19542f53fa81af83e6ba002d559877aaf597 https://github.com/mayeut/libjpeg-turbo/commit/73dc43ccc870c2e10ba893e9764b8e48d6836585 https://github.com/mayeut/libjpeg-turbo/commit/a82b71a261b4c0213f558baf4bc745f1c27356d8 https://github.com/mayeut/libjpeg-turbo/commit/c1b1188c2106d6ea7b76644b6023b57edeb602e1 https://github.com/mayeut/libjpeg-turbo/commit/305c89284e1bb222b34fbc7261f697a0cc452a41 https://github.com/mayeut/libjpeg-turbo/commit/7f443f99950b4d7d442b9b879648eca5273209bd https://github.com/mayeut/libjpeg-turbo/commit/4c2b53b77da5a20e30e2aadaeddb0efbfe24e06d Unified version with fixes: https://github.com/mayeut/libjpeg-turbo/commit/1004a3cd05870612a194b410efeaa1b4da76d246

499c470b

2016-01-13T03:13:20

ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5

f3a8684c

2016-01-07T00:19:43

SSE2 SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%) 2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%) 2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%) 3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%) 3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%) 2.6 GHz AMD Athlon 64 X2 5050e: Performance-neutral (give or take a few percent) Full-color compression speedups relative to IPP: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%) Refer to #42 for discussion. Numerous other approaches were attempted, but this one proved to be the most performant across all platforms. This commit also fixes #3 (works around, really-- the clang-compiled version of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but that code is now bypassed by the new SSE2 Huffman algorithm.) Based on: https://github.com/mayeut/libjpeg-turbo/commit/2cb4d41330e1edc4469f6b97ba73b73abfbeb02f https://github.com/mayeut/libjpeg-turbo/commit/36c94e050d117912adbff9fbcc6fe307df240168

c641cddb

2015-01-14T15:41:11

AltiVec SIMD implementation of H2V1 and H2V2 plain upsampling (used only when decompressing YCCK images with fast upsampling enabled.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1504 632fc199-4ca6-4c93-a231-07263d6284db

86af36ae

2015-01-14T13:27:32

AltiVec SIMD implementation of H2V1 and H2V2 merged upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1503 632fc199-4ca6-4c93-a231-07263d6284db

52a4ec6c

2015-01-13T09:02:29

AltiVec SIMD implementation of H2V1 and H2V2 fancy upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1495 632fc199-4ca6-4c93-a231-07263d6284db

ac4daa77

2015-01-10T22:56:26

AltiVec SIMD implementation of YCC-to-RGB color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1489 632fc199-4ca6-4c93-a231-07263d6284db

22048207

2015-01-08T06:18:33

AltiVec SIMD implementation of 2x1 and 2x2 downsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1483 632fc199-4ca6-4c93-a231-07263d6284db

577ecd93

2014-12-23T04:14:54

AltiVec SIMD implementation of sample conversion and integer quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1474 632fc199-4ca6-4c93-a231-07263d6284db

b1fec4ff

2014-12-22T14:10:33

AltiVec SIMD implementation of RGB-to-Grayscale color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1471 632fc199-4ca6-4c93-a231-07263d6284db

62bae204

2014-12-22T13:42:26

AltiVec SIMD implementation of RGB-to-YCC color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1469 632fc199-4ca6-4c93-a231-07263d6284db

0691162a

2014-12-20T01:17:39

AltiVec SIMD implementation of slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1461 632fc199-4ca6-4c93-a231-07263d6284db

6cb7f40a

2014-12-18T10:12:29

AltiVec SIMD implementation of fast integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1445 632fc199-4ca6-4c93-a231-07263d6284db

fb0c3940

2014-12-17T08:04:39

AltiVec SIMD implementation of slow integer forward DCT; Clean up fast integer forward DCT code so that it is easier to see how it derives from the SSE2 code and to make it play more nicely with the slow FDCT code. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1443 632fc199-4ca6-4c93-a231-07263d6284db

cd2d8e1c

2014-09-05T06:33:42

AltiVec SIMD implementation of fast forward DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1405 632fc199-4ca6-4c93-a231-07263d6284db

d729f4da

2014-08-23T15:47:51

ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code: ----- https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 03:50:22 +0300 Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15 The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9. Tuning it for newer ARM processors can introduce some speed-up (up to 20%). The performance of the inner loop (conversion of 8 pixels) improves from ~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles down to ~18 cycles on ARM Cortex-A15. The performance remains exactly the same on ARM Cortex-A7 (~58 cycles), ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors. Also use larger indentation in the source code for separating two independent instruction streams. ----- https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 07:23:09 +0300 Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion The performance of the inner loop (conversion of 8 pixels): * ARM Cortex-A7: ~55 cycles * ARM Cortex-A8: ~28 cycles * ARM Cortex-A9: ~32 cycles * ARM Cortex-A15: ~20 cycles * Qualcomm Krait: ~24 cycles Based on the Linaro rgb565 patch from https://sourceforge.net/p/libjpeg-turbo/patches/24/ but implements better instructions scheduling. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db

5ef46305

2014-05-18T20:04:47

SIMD-accelerated int upsample routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1315 632fc199-4ca6-4c93-a231-07263d6284db

c728cfd8

2014-05-18T19:36:05

Fix MIPS build git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1314 632fc199-4ca6-4c93-a231-07263d6284db

bc56b754

2014-05-16T10:43:44

Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1308 632fc199-4ca6-4c93-a231-07263d6284db

52ded876

2014-05-15T20:30:16

Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1307 632fc199-4ca6-4c93-a231-07263d6284db

1b3fd7ee

2014-05-15T18:26:01

SIMD-accelerated NULL convert routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1304 632fc199-4ca6-4c93-a231-07263d6284db

6a61c1e6

2014-05-14T15:00:10

SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1301 632fc199-4ca6-4c93-a231-07263d6284db

b844eaa3

2014-05-13T18:40:14

SIMD-accelerated merged upsampling routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1297 632fc199-4ca6-4c93-a231-07263d6284db

34347862

2014-05-06T09:53:21

SIMD-accelerated slow integer IDCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1269 632fc199-4ca6-4c93-a231-07263d6284db

fff6c23a

2013-10-12T21:39:20

SIMD-accelerated integer convsamp routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1059 632fc199-4ca6-4c93-a231-07263d6284db

3d727281

2013-10-09T18:39:44

SIMD-accelerated floating point quantize and convsamp routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1058 632fc199-4ca6-4c93-a231-07263d6284db

d3131c1b

2013-10-08T02:18:59

SIMD-accelerated fast integer inverse DCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1056 632fc199-4ca6-4c93-a231-07263d6284db

71e06a7d

2013-10-08T02:11:21

SIMD-accelerated fast integer forward DCT routine for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1055 632fc199-4ca6-4c93-a231-07263d6284db

a6b7fbd3

2013-09-30T18:13:27

SIMD-accelerated slow integer forward DCT and quantize routines for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1054 632fc199-4ca6-4c93-a231-07263d6284db

e5005917

2013-09-27T17:51:08

SIMD-accelerated 3/4 and 3/2 decompression scaling for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1047 632fc199-4ca6-4c93-a231-07263d6284db

2ccf4d1a

2013-09-27T17:43:23

SIMD-accelerated 1/2 and 1/4 decompression scaling for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1046 632fc199-4ca6-4c93-a231-07263d6284db

49eaa757

2013-09-27T17:39:57

SIMD-optimized RGB-to-grayscale conversion for MIPS DSPr2 git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1045 632fc199-4ca6-4c93-a231-07263d6284db

16962c11

2013-07-27T21:50:02

SIMD support for performing upsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@996 632fc199-4ca6-4c93-a231-07263d6284db

6f2d3c2c

2013-07-27T21:48:18

SIMD support for performing downsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@995 632fc199-4ca6-4c93-a231-07263d6284db

86fbf35f

2013-07-27T21:44:14

SIMD support for performing fancy upsampling using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@994 632fc199-4ca6-4c93-a231-07263d6284db

0be9fa57

2013-07-24T21:50:20

SIMD support for performing color conversion using MIPS DSPr2 instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@993 632fc199-4ca6-4c93-a231-07263d6284db

316617fa

2012-06-13T05:17:03

Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db

ce4e3e86

2011-08-22T13:48:01

NEON-accelerated slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db

82bd5219

2011-08-17T21:00:59

NEON-accelerated quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db

7a9376c1

2011-08-12T19:27:20

ARM NEON-accelerated RGB-to-YCbCr conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db

b740054f

2011-08-10T23:31:13

Support for accelerated forward DCT using ARM NEON instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db

8c60d22f

2011-06-17T21:12:58

NEON-optimized 2x2 and 4x4 scaled iDCTs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db

321e0686

2011-05-03T08:47:43

ARM NEON support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db

ddb158c5

2011-02-27T09:09:54

Add short names for RGB->grayscale MMX functions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@468 632fc199-4ca6-4c93-a231-07263d6284db

392e0483

2011-02-18T20:43:04

Updated (C) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@394 632fc199-4ca6-4c93-a231-07263d6284db

c8666333

2011-02-18T11:23:45

SIMD-accelerated RGB-to-Grayscale color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@393 632fc199-4ca6-4c93-a231-07263d6284db

af1ca9bc

2011-02-02T05:42:37

Clarify that the C wrappers and headers fall under the same license as the rest of the SIMD code git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.0.x@335 632fc199-4ca6-4c93-a231-07263d6284db

720e1610

2009-04-05T21:51:25

Add colorspace extensions to merged upsampling routines git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@42 632fc199-4ca6-4c93-a231-07263d6284db

f25c071e

2009-04-03T12:00:51

Implement new colorspaces to allow directly compressing from/decompressing to RGB/RGBX/BGR/BGRX/XBGR/XRGB without conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@35 632fc199-4ca6-4c93-a231-07263d6284db

eea72155

2009-03-09T13:34:17

Add SSE2 SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@22 632fc199-4ca6-4c93-a231-07263d6284db

018fc429

2009-03-09T13:31:56

Add SSE SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@21 632fc199-4ca6-4c93-a231-07263d6284db

65d03173

2009-03-09T13:28:10

Add 3DNow SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@18 632fc199-4ca6-4c93-a231-07263d6284db

5eb84ff9

2009-03-09T13:25:30

Add MMX SIMD implementation of computationally intensive routines. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@17 632fc199-4ca6-4c93-a231-07263d6284db

2ae181c7

2009-03-09T13:21:27

Implement x86 SIMD framework Add NASM support and stub routine for detecting SIMD extensions. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@15 632fc199-4ca6-4c93-a231-07263d6284db

kc3-lang/libjpeg-turbo/simd/jsimd.h

simd/jsimd.h

Log