Don't use asynchronous work with flushes: it decreases reliability, and running two threads per GPU achieves the same throughput.
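
As a rough illustration of the two-threads-per-GPU alternative, here is a minimal sketch in Python. It assumes the GPU work can be issued as a blocking call; `run_blocking`, `worker`, and `process` are hypothetical names, and `run_blocking` is only a stand-in for whatever synchronous submit-and-wait call the real API provides. The point is that while one thread waits on its call, the other can prepare and submit the next item, so the device stays busy without asynchronous submission or explicit flushes.

```python
import queue
import threading

THREADS_PER_GPU = 2  # the thread count suggested in the note above


def run_blocking(gpu_id, item):
    """Hypothetical stand-in for a synchronous GPU call.

    A real version would submit work to the device identified by gpu_id
    and return only once that work has completed.
    """
    return (gpu_id, item * item)


def worker(gpu_id, tasks, results):
    # Each thread pulls one item at a time and blocks until it is done.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        results.put(run_blocking(gpu_id, item))


def process(items, num_gpus=1):
    tasks = queue.Queue()
    results = queue.Queue()
    # Start THREADS_PER_GPU worker threads for every GPU.
    threads = [
        threading.Thread(target=worker, args=(gpu, tasks, results))
        for gpu in range(num_gpus)
        for _ in range(THREADS_PER_GPU)
    ]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    for t in threads:
        t.join()
    # All workers have finished, so every result is already queued.
    return sorted(results.get() for _ in range(len(items)))
```

Because every call blocks until its work is finished, a failure surfaces in the thread that issued it, which is the reliability argument made above. Whether two threads really match the throughput of asynchronous submission depends on the workload and driver, so it is worth measuring on the target system.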