kmx git

Commit	Date	Message
2b6e8416	2011-06-29T23:38:16	Use a buffer of up to 512 * 4 integers when retrieving work from the GPU. This allows each local thread id to have one slot to put any positive results into, thus making overlapping results far less likely. Thus races will be much rarer, allowing more threads. It should also pick up blocks close to each other more reliably and hopefully decrease the number of rejects and OpenCL errors. Do the search over the buffer entirely in a separate thread to allow the GPU to stay as busy as possible. Detach threads from themselves to prevent the unlucky event where dereferencing occurs by freeing the data that stores the thread info.
2dbb3944	2011-06-27T22:05:03	Base was being set wrongly meaning we were repeating searches and the rate was actually lower than displayed :( Tweak Ma with new changes. Change default vectors to 2 since it's faster than 4 even when 4 is reported as preferred.
623b9b9f	2011-06-27T12:45:03	Patch bitalign separately from bfi_int. Recover from failing to patch for bfi_int.
4257deaf	2011-06-23T23:14:47	Convert abcd... to an array.
75cf5ccd	2011-06-23T23:04:34	Replace Ws with an array.
19eea906	2011-06-23T17:50:37	Implement code detecting max work size and optimal vector width. Use this to patch the kernel to suit the ideal values for the card. Then use these values when invoking the kernel.
f54d2cc0	2011-06-22T23:07:30	Make poclbm use 4 vectors and decrease worksize to keep pipelines fullish. Make it possible to have 0 CPU threads and update docs. Fix counter with no CPU threads.
b4d2733c	2011-06-22T16:47:34	Convert to poclbm kernel.
8253f141	2011-06-23T23:38:04	Use some line breaks in the kernel.
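
The 2b6e8416 entry describes giving each local thread id its own slot in the result buffer so that two work items rarely overwrite each other. The following is a minimal host-side sketch of that idea in plain C, not the code from the commit. The names `store_result` and `scan_results`, the slot layout `local_id * VECTORS + lane`, and the use of 0 as the "empty slot" marker are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define WORKSIZE     512                  /* assumed maximum local work size */
#define VECTORS      4                    /* assumed maximum vector width */
#define BUFFER_SLOTS (WORKSIZE * VECTORS) /* "512 * 4 integers" from the log */

/* Simulates what a work item would do on the GPU: write a positive
 * result into the slot reserved for its local id and vector lane.
 * Hypothetical layout; the real kernel may index differently. */
static void store_result(uint32_t *buf, size_t local_id, size_t lane,
                         uint32_t nonce)
{
    buf[local_id * VECTORS + lane] = nonce;
}

/* Host side: walk the whole buffer, copy every non-empty slot into
 * `found` (up to max_found entries), and clear each slot for reuse.
 * Assumes 0 marks an empty slot. Returns the number of results copied. */
static size_t scan_results(uint32_t *buf, uint32_t *found, size_t max_found)
{
    size_t n = 0;
    for (size_t i = 0; i < BUFFER_SLOTS; i++) {
        if (buf[i] != 0 && n < max_found)
            found[n++] = buf[i];
        buf[i] = 0;
    }
    return n;
}
```

Because each (local id, lane) pair owns one slot, two results found in the same batch by different work items land in different slots, and the host scan returns both. In the commit this scan is said to run in a separate thread so the GPU is not kept waiting; the sketch leaves threading out.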

thodg/cgminer/poclbm.cl
