simd/jsimd_arm.c


Log

Author Commit Date CI Message
DRC 9d64f3c6 2017-04-24T14:42:58 Attribute ARM runtime detection code to Nokia This code was submitted in the initial ARM NEON patches (https://sourceforge.net/p/libjpeg-turbo/patches/7/) by Siarhei while he was still a Nokia employee.
DRC 9055fb40 2016-07-07T13:10:30 ARM/MIPS: Change the behavior of JSIMD_FORCE* The JSIMD_FORCE* environment variables previously meant "force the use of this instruction set if it is available but others are available as well", but that did nothing on ARM platforms, since there is only ever one instruction set available. Since the ARM and MIPS CPU feature detection code is less than bulletproof, and since there is only one SIMD instruction set (currently) supported on those platforms, it makes sense for the JSIMD_FORCE* environment variables on those platforms to actually force the use of the SIMD instruction set, thus bypassing the CPU feature detection code. This addresses a concern raised in #88 whereby parsing /proc/cpuinfo didn't work within a QEMU environment. This at least provides a workaround, allowing users to force-enable or force-disable SIMD instructions for ARM and MIPS builds of libjpeg-turbo.
DRC 123f7258 2016-05-24T10:23:56 Format copyright headers more consistently The IJG convention is to format copyright notices as: Copyright (C) YYYY, Owner. We try to maintain this convention for any code that is part of the libjpeg API library (with the exception of preserving the copyright notices from Cendio's code verbatim, since those predate libjpeg-turbo.) Note that the phrase "All Rights Reserved" is no longer necessary, since all Buenos Aires Convention signatories signed onto the Berne Convention in 2000. However, our convention is to retain this phrase for any files that have a self-contained copyright header but to leave it off of any files that refer to another file for conditions of distribution and use. For instance, all of the non-SIMD files in the libjpeg API library refer to README.ijg, and the copyright message in that file contains "All Rights Reserved", so it is unnecessary to add it to the individual files. The TurboJPEG code retains my preferred formatting convention for copyright notices, which is based on that of VirtualGL (where the TurboJPEG API originated.)
DRC bd49803f 2016-02-19T08:53:33 Use consistent/modern code formatting for pointers The convention used by libjpeg: type * variable; is not very common anymore, because it looks too much like multiplication. Some (particularly C++ programmers) prefer to tuck the pointer symbol against the type: type* variable; to emphasize that a pointer to a type is effectively a new type. However, this can also be confusing, since defining multiple variables on the same line would not work properly: type* variable1, variable2; /* Only variable1 is actually a pointer. */ This commit reformats the entirety of the libjpeg-turbo code base so that it uses the same code formatting convention for pointers that the TurboJPEG API code uses: type *variable1, *variable2; This seems to be the most common convention among C programmers, and it is the convention used by other codec libraries, such as libpng and libtiff.
DRC e8aa5fa9 2016-01-15T13:15:54 Add JSIMD_NOHUFFENC environment variable for ARM Useful in regression/performance testing
DRC 499c470b 2016-01-13T03:13:20 ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5
DRC f3a8684c 2016-01-07T00:19:43 SSE2 SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%) 2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%) 2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%) 3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%) 3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%) 2.6 GHz AMD Athlon 64 X2 5050e: Performance-neutral (give or take a few percent) Full-color compression speedups relative to IPP: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%) Refer to #42 for discussion. Numerous other approaches were attempted, but this one proved to be the most performant across all platforms. This commit also fixes #3 (works around, really-- the clang-compiled version of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but that code is now bypassed by the new SSE2 Huffman algorithm.) Based on: https://github.com/mayeut/libjpeg-turbo/commit/2cb4d41330e1edc4469f6b97ba73b73abfbeb02f https://github.com/mayeut/libjpeg-turbo/commit/36c94e050d117912adbff9fbcc6fe307df240168
DRC d729f4da 2014-08-23T15:47:51 ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code: ----- https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 03:50:22 +0300 Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15 The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9. Tuning it for newer ARM processors can introduce some speed-up (up to 20%). The performance of the inner loop (conversion of 8 pixels) improves from ~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles down to ~18 cycles on ARM Cortex-A15. The performance remains exactly the same on ARM Cortex-A7 (~58 cycles), ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors. Also use larger indentation in the source code for separating two independent instruction streams. ----- https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 07:23:09 +0300 Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion The performance of the inner loop (conversion of 8 pixels): * ARM Cortex-A7: ~55 cycles * ARM Cortex-A8: ~28 cycles * ARM Cortex-A9: ~32 cycles * ARM Cortex-A15: ~20 cycles * Qualcomm Krait: ~24 cycles Based on the Linaro rgb565 patch from https://sourceforge.net/p/libjpeg-turbo/patches/24/ but implements better instructions scheduling. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
DRC 64086281 2014-05-15T19:46:48 Clean up code formatting in the SIMD interface functions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1306 632fc199-4ca6-4c93-a231-07263d6284db
DRC 1419852c 2014-05-15T19:45:11 Clean up code formatting in the SIMD interface functions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1305 632fc199-4ca6-4c93-a231-07263d6284db
DRC b7753510 2014-05-11T09:36:25 Convert tabs to spaces in the libjpeg code and the SIMD code (TurboJPEG retains the use of tabs for historical reasons. They were annoying in the libjpeg code primarily because they were not consistently used and because they were used to format as well as indent the code. In the case of TurboJPEG, tabs are used just to indent the code, so even if the editor assumes a different tab width, the code will still be readable.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1285 632fc199-4ca6-4c93-a231-07263d6284db
DRC 1a45b81f 2014-05-09T18:06:58 Remove trailing spaces (+ one additional tab in TJUnitTest.java that was missed in the previous commit) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1279 632fc199-4ca6-4c93-a231-07263d6284db
DRC e189ec7a 2014-02-06T19:15:03 Remove trailing space git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1111 632fc199-4ca6-4c93-a231-07263d6284db
DRC 93974690 2014-02-06T19:13:24 Remove trailing space git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1110 632fc199-4ca6-4c93-a231-07263d6284db
DRC 19eeaa79 2013-10-31T07:40:24 Make environment variable syntax consistent between ARM and x86 code, and add an option to disable SIMD on x86 (this option will be added to the x86-64 code as well, but it makes more sense to add it when we add AVX support.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1073 632fc199-4ca6-4c93-a231-07263d6284db
DRC 316617fa 2012-06-13T05:17:03 Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db
DRC d65d99a9 2012-01-31T03:39:23 Compiler warnings git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@758 632fc199-4ca6-4c93-a231-07263d6284db
DRC 67ce3b23 2011-12-19T02:21:03 Added new alpha channel colorspace constants/pixel formats, so applications can specify that they need the unused byte in a 4-component RGB output buffer set to 0xFF when decompressing. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@732 632fc199-4ca6-4c93-a231-07263d6284db
DRC ce4e3e86 2011-08-22T13:48:01 NEON-accelerated slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db
DRC 82bd5219 2011-08-17T21:00:59 NEON-accelerated quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db
DRC 7a9376c1 2011-08-12T19:27:20 ARM NEON-accelerated RGB-to-YCbCr conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db
DRC b740054f 2011-08-10T23:31:13 Support for accelerated forward DCT using ARM NEON instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db
DRC 8c60d22f 2011-06-17T21:12:58 NEON-optimized 2x2 and 4x4 scaled iDCTs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db
DRC 4346f91f 2011-06-14T22:16:50 iOS ARM support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@659 632fc199-4ca6-4c93-a231-07263d6284db
DRC 321e0686 2011-05-03T08:47:43 ARM NEON support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db