Hash :
6e35e421
Author :
Date :
2018-10-01T14:43:03
Working on bug 3921 - Add some Fastpath to BlitNtoNKey and BlitNtoNKeyCopyAlpha
Sylvain
I did various benches. with clang 6.0.0 on linux, and ndk-r16b on android (NDK_TOOLCHAIN_VERSION=clang).
- still see a x10 speed factor.
- with duff_loops, it does not use vectorisation (but doesn't seem to be a problem).
on linux my patch is already at full speed on -O2, whereas the duff_loops need -O3 (200 ms at -03, and 300ms at -02).
I realized that on Android, I had a slight variation which fits best.
both on linux with -O2 and -O3, and on android with 02/03 and armeabi-v7a/arm64.
Here's the patch.