Commit c0386ad4e3a70c9a905f990c576c23deb6310a35

Roman Lavrov 2024-04-05T11:56:17

AsyncWorkerPool releases shared_ptr<Closure> before notifying Parallel compile (MainLinkLoadTask, Program::LinkingState) is dependent on destructor getting called before the event is notified Repro: https://crrev.com/c/5425924 More details on the parallel compile case, provided by syoussefi@: """ A race condition caused the worker pool to sometimes be destroyed from a worker thread instead of the main thread. The race condition triggered in the following scenario: - The MainLinkLoadTask holds on to the worker pool - This is necessary for the main task to spawn further tasks asynchronously - The reference to the worker pool in MainLinkLoadTask is released by its destructor - The worker thread dequeues a task (i.e. MainLinkLoadTask) to execute and holds a reference to it. - Once the task is run by the worker thread, the worker thread signals its completion - (1) At this point, the scope holding the reference to the task closes and the task is released. However, this is done after signaling the task's completion. - On program destruction, the program ensures that all its tasks are complete - This uses the signal coming from the worker thread - (2) On display destruction, the worker pool is destroyed (by dereferencing it through the shared_ptr) - The destructor of the worker pool waits for the worker thread, with the expectation that this wait is done in the main thread. The race condition led to the assert firing when (2) was done before (1). Because the task is already signaled complete, the main thread considers it done and goes ahead with the destruction of the display. However, until the scope of the worker thread closes, the task itself is still not destroyed. Since the task is holding a reference to the worker pool, that prevents the worker pool from getting destroyed too. Once the display is destroyed, the worker thread closes its scope, causing the task to be destroyed. In turn, this leads to the worker pool itself to be destroyed. On destruction, the worker pool would wait for the worker thread to end which is a deadlock. Fortunately, this was caught earlier with an ASSERT that wanted to ensure destruction happens on the main thread. In this change, the worker thread ensures it releases the task before signalling it complete, avoiding this issue. Other possible solutions would have been: - Release the worker pool from MainLinkLoadTask as soon as the subtasks are scheduled - Explicitly call a "destroy" method on the pool, instead of relying on the destructor to clean up. """ Bug: angleproject:8661 Change-Id: I37c9bc8e8f05bce4062d794df449cc3d2c80a093 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5428806 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>