poclbm.cl


Log

Author Commit Date CI Message
Con Kolivas 2b6e8416 2011-06-29T23:38:16 Use a buffer of up to 512 * 4 integers when retrieving work from the GPU. This allows each local thread id to have one slot to put any positive results into, thus making overlapping results far less likely. Thus races will be much rarer, allowing more threads. It should also pick up blocks close to each other more reliably and hopefully decrease the number of rejects and OpenCL errors. Do the search over the buffer entirely in a separate thread to allow the GPU to stay as busy as possible. Detach threads from themselves to prevent the unlucky event where dereferencing occurs by freeing the data that stores the thread info.
Con Kolivas 2dbb3944 2011-06-27T22:05:03 Base was being set wrongly meaning we were repeating searches and the rate was actually lower than displayed :( Tweak Ma with new changes. Change default vectors to 2 since it's faster than 4 even when 4 is reported as preferred.
Con Kolivas 623b9b9f 2011-06-27T12:45:03 Patch bitalign separately from bfi_int. Recover from failing to patch for bfi int.
Con Kolivas 8253f141 2011-06-23T23:38:04 Use some line breaks in the kernel.
Con Kolivas 4257deaf 2011-06-23T23:14:47 Convert abcd... to an array.
Con Kolivas 75cf5ccd 2011-06-23T23:04:34 Replace Ws with an array.
ckolivas 19eea906 2011-06-23T17:50:37 Implement code detecting max work size and optimal vector width. Use this to patch the kernel to suit the ideal values for the card. Then use these values when invoking the kernel.
Con Kolivas f54d2cc0 2011-06-22T23:07:30 Make poclbm use 4 vectors and decrease worksize to keep pipelines fullish. Make it possible to have 0 CPU threads and update docs. Fix counter with no cpu threads.
ckolivas b4d2733c 2011-06-22T16:47:34 Convert to poclbm kernel.