|
499c470b
|
2016-01-13T03:13:20
|
|
ARM32 NEON SIMD implementation of Huffman encoding
Full-color compression speedups relative to libjpeg-turbo 1.4.2:
800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%)
Refer to #42 and #47 for discussion.
This commit also removes the unnecessary
if (simd_support & JSIMD_ARM_NEON)
statements from the jsimd* algorithm functions. Since the jsimd_can*()
functions check for the existence of NEON, the corresponding algorithm
functions will never be called if NEON isn't available. Removing those
if statements improved performance across the board by a couple of
percent.
Based on:
https://github.com/mayeut/libjpeg-turbo/commit/fc023c880ce1d6c908fb78ccc25f5d5fd910ccc5
|
|
1e32fe31
|
2015-10-14T17:32:39
|
|
Replace INT32 with a new internal datatype (JLONG)
These days, INT32 is a commonly-defined datatype in system headers. We
cannot eliminate the definition of that datatype from jmorecfg.h, since
the INT32 typedef has technically been part of the libjpeg API since
version 5 (1994.) However, using INT32 internally is risky, because the
inclusion of a particular header (Xmd.h, for instance) could change the
definition of INT32 from long to int on 64-bit platforms and thus change
the internal behavior of libjpeg-turbo in unexpected ways (for instance,
failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32
typedef-- perhaps as a result of including the wrong version of
jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.)
The library has always been built in environments in which INT32 is
effectively long (on Windows, long is always 32-bit, so effectively it's
the same as int), so it makes sense to turn INT32 into an explicitly
long datatype. This ensures that libjpeg-turbo will always behave
consistently, regardless of the headers included at compile time.
Addresses a concern expressed in #26.
|
|
a6efae14
|
2014-08-25T15:26:09
|
|
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1388 632fc199-4ca6-4c93-a231-07263d6284db
|
|
d729f4da
|
2014-08-23T15:47:51
|
|
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
-----
https://github.com/ssvb/libjpeg-turbo/commit/aee36252be20054afce371a92406fc66ba6627b5.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
https://github.com/ssvb/libjpeg-turbo/commit/a5efdbf22ce9c1acd4b14a353cec863c2c57557e.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
|
|
2e2ce5a1
|
2014-08-22T11:31:46
|
|
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1375 632fc199-4ca6-4c93-a231-07263d6284db
|
|
3e00f03a
|
2014-02-05T07:40:00
|
|
Formatting tweaks
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1107 632fc199-4ca6-4c93-a231-07263d6284db
|
|
316617fa
|
2012-06-13T05:17:03
|
|
Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling)
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.2.x@837 632fc199-4ca6-4c93-a231-07263d6284db
|
|
b071f01c
|
2011-09-06T18:58:22
|
|
Update Nokia contact info
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@694 632fc199-4ca6-4c93-a231-07263d6284db
|
|
ad6955d4
|
2011-09-06T18:57:53
|
|
Improve performance of IFAST iDCT by changing the order of transpose and descale steps
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@693 632fc199-4ca6-4c93-a231-07263d6284db
|
|
5129e396
|
2011-09-06T18:55:45
|
|
Make ARM ISLOW iDCT faster on typical cases, and eliminate the possibility of 16-bit overflows when handling arbitrary coefficients.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@692 632fc199-4ca6-4c93-a231-07263d6284db
|
|
98a44fe0
|
2011-08-24T23:27:44
|
|
Improve the performance of YCbCr to RGB conversion on ARM
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@691 632fc199-4ca6-4c93-a231-07263d6284db
|
|
ce4e3e86
|
2011-08-22T13:48:01
|
|
NEON-accelerated slow integer inverse DCT
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@690 632fc199-4ca6-4c93-a231-07263d6284db
|
|
82bd5219
|
2011-08-17T21:00:59
|
|
NEON-accelerated quantization
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@689 632fc199-4ca6-4c93-a231-07263d6284db
|
|
4b024a62
|
2011-08-15T08:36:51
|
|
Improve performance of ARM NEON IFAST iDCT
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@686 632fc199-4ca6-4c93-a231-07263d6284db
|
|
7a9376c1
|
2011-08-12T19:27:20
|
|
ARM NEON-accelerated RGB-to-YCbCr conversion
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@682 632fc199-4ca6-4c93-a231-07263d6284db
|
|
b740054f
|
2011-08-10T23:31:13
|
|
Support for accelerated forward DCT using ARM NEON instructions
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@678 632fc199-4ca6-4c93-a231-07263d6284db
|
|
8c60d22f
|
2011-06-17T21:12:58
|
|
NEON-optimized 2x2 and 4x4 scaled iDCTs
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@662 632fc199-4ca6-4c93-a231-07263d6284db
|
|
4346f91f
|
2011-06-14T22:16:50
|
|
iOS ARM support
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@659 632fc199-4ca6-4c93-a231-07263d6284db
|
|
321e0686
|
2011-05-03T08:47:43
|
|
ARM NEON support
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@607 632fc199-4ca6-4c93-a231-07263d6284db
|