kmx git

Commit	Date	Message
cf888486	2016-02-02T23:17:06	Use consistent formatting in ARM NEON SIMD code There aren't really any best practices to follow here. I tried as best as I could to adopt a standard that would ease any future maintenance burdens. The basic tenets of that standard are: * Assembly instructions always start on Column 5, and operands always start on Column 21, except: - The instruction and operand can be indented (usually by 2 spaces) to indicate a separate instruction stream. - If the instruction is within an enclosing .if block in a macro, it should always be indented relative to the .if block. * Comments are placed with an eye toward readability. There are always at least 2 spaces between the end of a line of code and the associated in-line comment. Where it made sense, I tried to line up the comments in blocks, and some were shifted right to avoid overlap with neighboring instruction lines. Not an exact science. * Assembler directives and macros use 2-space indenting rules. .if blocks are indented relative to the macro, and code within the .if blocks is indented relative to the .if directive. * No extraneous spaces between operands. Lining up the operands vertically did not really improve readability-- personally, I think it made it worse, since my eye would tend to lose its place in the uniform columns of characters. Also, code with a lot of vertical alignment is really hard to maintain, since changing one line could necessitate changing a bunch of other lines to avoid spoiling the alignment. * No extraneous spaces in #defines or other directives. In general, the only extraneous spaces (other than indenting spaces) are between: - Instructions and operands - Operands and in-line comments This standard should be more or less in keeping with other formatting standards used within the project.
499c470b	2016-01-13T03:13:20	ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5
1e32fe31	2015-10-14T17:32:39	Replace INT32 with a new internal datatype (JLONG) These days, INT32 is a commonly-defined datatype in system headers. We cannot eliminate the definition of that datatype from jmorecfg.h, since the INT32 typedef has technically been part of the libjpeg API since version 5 (1994.) However, using INT32 internally is risky, because the inclusion of a particular header (Xmd.h, for instance) could change the definition of INT32 from long to int on 64-bit platforms and thus change the internal behavior of libjpeg-turbo in unexpected ways (for instance, failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32 typedef-- perhaps as a result of including the wrong version of jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.) The library has always been built in environments in which INT32 is effectively long (on Windows, long is always 32-bit, so effectively it's the same as int), so it makes sense to turn INT32 into an explicitly long datatype. This ensures that libjpeg-turbo will always behave consistently, regardless of the headers included at compile time. Addresses a concern expressed in #26.
a6efae14	2014-08-25T15:26:09	Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1388 632fc199-4ca6-4c93-a231-07263d6284db
d729f4da	2014-08-23T15:47:51	ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code: ----- https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 03:50:22 +0300 Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15 The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9. Tuning it for newer ARM processors can introduce some speed-up (up to 20%). The performance of the inner loop (conversion of 8 pixels) improves from ~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles down to ~18 cycles on ARM Cortex-A15. The performance remains exactly the same on ARM Cortex-A7 (~58 cycles), ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors. Also use larger indentation in the source code for separating two independent instruction streams. ----- https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 07:23:09 +0300 Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion The performance of the inner loop (conversion of 8 pixels): * ARM Cortex-A7: ~55 cycles * ARM Cortex-A8: ~28 cycles * ARM Cortex-A9: ~32 cycles * ARM Cortex-A15: ~20 cycles * Qualcomm Krait: ~24 cycles Based on the Linaro rgb565 patch from https://sourceforge.net/p/libjpeg-turbo/patches/24/ but implements better instructions scheduling. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
2e2ce5a1	2014-08-22T11:31:46	.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424). git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1375 632fc199-4ca6-4c93-a231-07263d6284db
3e00f03a	2014-02-05T07:40:00	Formatting tweaks git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1107 632fc199-4ca6-4c93-a231-07263d6284db
316617fa	2012-06-13T05:17:03	Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db
b071f01c	2011-09-06T18:58:22	Update Nokia contact info git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@694 632fc199-4ca6-4c93-a231-07263d6284db
ad6955d4	2011-09-06T18:57:53	Improve performance of IFAST iDCT by changing the order of transpose and descale steps git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@693 632fc199-4ca6-4c93-a231-07263d6284db
5129e396	2011-09-06T18:55:45	Make ARM ISLOW iDCT faster on typical cases, and eliminate the possibility of 16-bit overflows when handling arbitrary coefficients. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@692 632fc199-4ca6-4c93-a231-07263d6284db
98a44fe0	2011-08-24T23:27:44	Improve the performance of YCbCr to RGB conversion on ARM git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@691 632fc199-4ca6-4c93-a231-07263d6284db
ce4e3e86	2011-08-22T13:48:01	NEON-accelerated slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db
82bd5219	2011-08-17T21:00:59	NEON-accelerated quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db
4b024a62	2011-08-15T08:36:51	Improve performance of ARM NEON IFAST iDCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@686 632fc199-4ca6-4c93-a231-07263d6284db
7a9376c1	2011-08-12T19:27:20	ARM NEON-accelerated RGB-to-YCbCr conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db
b740054f	2011-08-10T23:31:13	Support for accelerated forward DCT using ARM NEON instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db
8c60d22f	2011-06-17T21:12:58	NEON-optimized 2x2 and 4x4 scaled iDCTs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db
4346f91f	2011-06-14T22:16:50	iOS ARM support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@659 632fc199-4ca6-4c93-a231-07263d6284db
321e0686	2011-05-03T08:47:43	ARM NEON support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db

cf888486

2016-02-02T23:17:06

Use consistent formatting in ARM NEON SIMD code There aren't really any best practices to follow here. I tried as best as I could to adopt a standard that would ease any future maintenance burdens. The basic tenets of that standard are: * Assembly instructions always start on Column 5, and operands always start on Column 21, except: - The instruction and operand can be indented (usually by 2 spaces) to indicate a separate instruction stream. - If the instruction is within an enclosing .if block in a macro, it should always be indented relative to the .if block. * Comments are placed with an eye toward readability. There are always at least 2 spaces between the end of a line of code and the associated in-line comment. Where it made sense, I tried to line up the comments in blocks, and some were shifted right to avoid overlap with neighboring instruction lines. Not an exact science. * Assembler directives and macros use 2-space indenting rules. .if blocks are indented relative to the macro, and code within the .if blocks is indented relative to the .if directive. * No extraneous spaces between operands. Lining up the operands vertically did not really improve readability-- personally, I think it made it worse, since my eye would tend to lose its place in the uniform columns of characters. Also, code with a lot of vertical alignment is really hard to maintain, since changing one line could necessitate changing a bunch of other lines to avoid spoiling the alignment. * No extraneous spaces in #defines or other directives. In general, the only extraneous spaces (other than indenting spaces) are between: - Instructions and operands - Operands and in-line comments This standard should be more or less in keeping with other formatting standards used within the project.

499c470b

2016-01-13T03:13:20

ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5

1e32fe31

2015-10-14T17:32:39

Replace INT32 with a new internal datatype (JLONG) These days, INT32 is a commonly-defined datatype in system headers. We cannot eliminate the definition of that datatype from jmorecfg.h, since the INT32 typedef has technically been part of the libjpeg API since version 5 (1994.) However, using INT32 internally is risky, because the inclusion of a particular header (Xmd.h, for instance) could change the definition of INT32 from long to int on 64-bit platforms and thus change the internal behavior of libjpeg-turbo in unexpected ways (for instance, failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32 typedef-- perhaps as a result of including the wrong version of jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.) The library has always been built in environments in which INT32 is effectively long (on Windows, long is always 32-bit, so effectively it's the same as int), so it makes sense to turn INT32 into an explicitly long datatype. This ensures that libjpeg-turbo will always behave consistently, regardless of the headers included at compile time. Addresses a concern expressed in #26.

a6efae14

2014-08-25T15:26:09

Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1388 632fc199-4ca6-4c93-a231-07263d6284db

d729f4da

2014-08-23T15:47:51

ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code: ----- https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 03:50:22 +0300 Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15 The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9. Tuning it for newer ARM processors can introduce some speed-up (up to 20%). The performance of the inner loop (conversion of 8 pixels) improves from ~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles down to ~18 cycles on ARM Cortex-A15. The performance remains exactly the same on ARM Cortex-A7 (~58 cycles), ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors. Also use larger indentation in the source code for separating two independent instruction streams. ----- https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001 From: Siarhei Siamashka <siarhei.siamashka@gmail.com> Date: Wed, 13 Aug 2014 07:23:09 +0300 Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion The performance of the inner loop (conversion of 8 pixels): * ARM Cortex-A7: ~55 cycles * ARM Cortex-A8: ~28 cycles * ARM Cortex-A9: ~32 cycles * ARM Cortex-A15: ~20 cycles * Qualcomm Krait: ~24 cycles Based on the Linaro rgb565 patch from https://sourceforge.net/p/libjpeg-turbo/patches/24/ but implements better instructions scheduling. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db

2e2ce5a1

2014-08-22T11:31:46

.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424). git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1375 632fc199-4ca6-4c93-a231-07263d6284db

3e00f03a

2014-02-05T07:40:00

Formatting tweaks git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1107 632fc199-4ca6-4c93-a231-07263d6284db

316617fa

2012-06-13T05:17:03

Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db

b071f01c

2011-09-06T18:58:22

Update Nokia contact info git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@694 632fc199-4ca6-4c93-a231-07263d6284db

ad6955d4

2011-09-06T18:57:53

Improve performance of IFAST iDCT by changing the order of transpose and descale steps git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@693 632fc199-4ca6-4c93-a231-07263d6284db

5129e396

2011-09-06T18:55:45

Make ARM ISLOW iDCT faster on typical cases, and eliminate the possibility of 16-bit overflows when handling arbitrary coefficients. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@692 632fc199-4ca6-4c93-a231-07263d6284db

98a44fe0

2011-08-24T23:27:44

Improve the performance of YCbCr to RGB conversion on ARM git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@691 632fc199-4ca6-4c93-a231-07263d6284db

ce4e3e86

2011-08-22T13:48:01

NEON-accelerated slow integer inverse DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db

82bd5219

2011-08-17T21:00:59

NEON-accelerated quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db

4b024a62

2011-08-15T08:36:51

Improve performance of ARM NEON IFAST iDCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@686 632fc199-4ca6-4c93-a231-07263d6284db

7a9376c1

2011-08-12T19:27:20

ARM NEON-accelerated RGB-to-YCbCr conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db

b740054f

2011-08-10T23:31:13

Support for accelerated forward DCT using ARM NEON instructions git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db

8c60d22f

2011-06-17T21:12:58

NEON-optimized 2x2 and 4x4 scaled iDCTs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db

4346f91f

2011-06-14T22:16:50

iOS ARM support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@659 632fc199-4ca6-4c93-a231-07263d6284db

321e0686

2011-05-03T08:47:43

ARM NEON support git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db

kc3-lang/libjpeg-turbo/simd/jsimd_arm_neon.S

simd/jsimd_arm_neon.S

Log