Hash :
c0386ad4
Author :
Date :
2024-04-05T11:56:17
AsyncWorkerPool releases shared_ptr<Closure> before notifying
Parallel compile (MainLinkLoadTask, Program::LinkingState)
is dependent on destructor getting called before the event is notified
Repro: https://crrev.com/c/5425924
More details on the parallel compile case, provided by syoussefi@:
"""
A race condition caused the worker pool to sometimes be destroyed from a
worker thread instead of the main thread.
The race condition triggered in the following scenario:
- The MainLinkLoadTask holds on to the worker pool
- This is necessary for the main task to spawn further tasks
asynchronously
- The reference to the worker pool in MainLinkLoadTask is released by
its destructor
- The worker thread dequeues a task (i.e. MainLinkLoadTask) to execute
and holds a reference to it.
- Once the task is run by the worker thread, the worker thread signals
its completion
- (1) At this point, the scope holding the reference to the task
closes and the task is released. However, this is done after
signaling the task's completion.
- On program destruction, the program ensures that all its tasks are
complete
- This uses the signal coming from the worker thread
- (2) On display destruction, the worker pool is destroyed
(by dereferencing it through the shared_ptr)
- The destructor of the worker pool waits for the worker thread, with
the expectation that this wait is done in the main thread.
The race condition led to the assert firing when (2) was done before
(1). Because the task is already signaled complete, the main thread
considers it done and goes ahead with the destruction of the display.
However, until the scope of the worker thread closes, the task itself
is still not destroyed. Since the task is holding a reference to the
worker pool, that prevents the worker pool from getting destroyed too.
Once the display is destroyed, the worker thread closes its scope,
causing the task to be destroyed. In turn, this leads to the worker
pool itself to be destroyed. On destruction, the worker pool would wait
for the worker thread to end which is a deadlock. Fortunately, this was
caught earlier with an ASSERT that wanted to ensure destruction happens
on the main thread.
In this change, the worker thread ensures it releases the task before
signalling it complete, avoiding this issue. Other possible solutions
would have been:
- Release the worker pool from MainLinkLoadTask as soon as the subtasks
are scheduled
- Explicitly call a "destroy" method on the pool, instead of relying on
the destructor to clean up.
"""
Bug: angleproject:8661
Change-Id: I37c9bc8e8f05bce4062d794df449cc3d2c80a093
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5428806
Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Commit-Queue: Roman Lavrov <romanl@google.com>