src/libANGLE/renderer/vulkan/ContextVk.cpp


Log

Author Commit Date CI Message
Charlie Lao 295ff607 2024-06-05T14:49:33 Vulkan: Precompute stageMask of kImageMemoryBarrierData Right now every time we need a pipelineStage in kImageMemoryBarrierData, we are doing a bitwise AND with mSupportedVulkanPipelineStageMask. This get called multiple times from barrier call. This CL adds mImageLayoutAndMemoryBarrierDataMap that has already precomputed all stageMask, thus avoid run time bitwise OR. This CL also precomputes the bufferWritePipelineStageMask so that flushImpl can be use it without construct every time. Bug: b/345279810 Change-Id: I878bd31c967cd217477061976f07df13b043fa7f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5601073 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Igor Nazarov d0280f09 2024-06-03T13:21:06 Cleanup Program Executable post link task wait Frontend changes: - Add `ASSERT` into `ProgramExecutable::waitForPostLinkTasks()` to check that `mPostLinkSubTasks` must be empty after backend wait. - Add `UNIMPLEMENTED` into `ProgramExecutableImpl` `waitForPostLinkTasks()`, because this method is only required if backend fills `postLinkSubTasksOut` in `LinkTask`, and frontend must not call this method if `mPostLinkSubTasks` are empty. Vulkan backend changes: - Add public `ProgramExecutableVk::waitForComputePostLinkTasks()` with ASSERT to only allow usage with Compute executable. This change avoids confusion calling `waitForPostLinkTasksIfNecessary()` with `nullptr` in case of the Compute pipelines, which will always wait. - Rename `waitForPostLinkTasksIfNecessary()` to `waitForGraphicsPostLinkTasks()`, add ASSERT to only allow usage with Graphics executable, and change optional pointer to the `GraphicsPipelineDesc` to the reference. - Add `WaitableEvent::AllReady()` check into `ProgramExecutableVk` `waitForGraphicsPostLinkTasks()` when going to skip waiting, in order to process tasks (by calling wait) when all tasks are ready. Without this change, post link task may never be processed, causing repeated `GraphicsPipelineDesc` comparisons. - Replace `GraphicsPipelineDesc` hash comparison in `waitForGraphicsPostLinkTasks()` with `keyEqual()` call. This method is around 7 times faster, however effect on the overall performance will likely be unmeasurable. Changed for clarity. - Remove unnecessary `mWarmUpGraphicsPipelineDesc` reset from: `ProgramExecutableVk` initializer list (has default constructor), Compute warmup (already clean after constructor), wait post link tasks (not used when no tasks). - Other minor changes. Bug: angleproject:8297 Change-Id: I7d790da6712be013243083e314af75f82e73886d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5592474 Reviewed-by: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Roman Lavrov 818915a9 2024-05-27T10:53:35 PPO: Propagate dirty uniforms via mPPOProgramExecutables This eliminates the need for ProgramUniformUpdated notification Update mDefaultUniformBlocksDirty rather than just add a check to hasDirtyUniforms, as mDefaultUniformBlocksDirty can then be used elsewhere (for example, calcUniformUpdateRequiredSpace) Also adds ProgramExecutable.mIsPPO flag for an easy check for PPO without inspecting mPPOProgramExecutables Bug: angleproject:8666 Bug: b/335295728 Change-Id: I7725d02460104997df5c89a54d0e5ef3c3079946 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5569184 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 34b832a3 2024-05-24T13:38:58 Vulkan: Add RefCountedEvent recycler Previously the recycler was disabled due to race between resetEvent and setEvent. This CL splits mFreeStack into two list: mEventsToReset and mEventsToReuse. Events are first added to mEventsToReset list. Then at OutsideRenderPassCommandBufferHelper::flushToPrimary time, VkCmdResetEvents are added to reset all events in mEventsToReset list, and that reset operation is tracked by mResettingQueue. When reset command is completed, events moved into mEventsToReuse list. Since access to renderer's RefCountedEventRecycler requires lock, RefCountedEventCollector (a queue of events) is passed between ShareGroupVk and renderer's recycler to minimize the locked access. Bug: b/336844257 Change-Id: Iffac095729a81ba65a43df68cc9255d76e4be7c9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5576757 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Igor Nazarov 3467f0f2 2024-05-20T16:24:53 Vulkan: Cleanup QCOM foveated rendering extensions support Change removed `ContextVK::updateFoveatedRendering()` method because of the following: - Modifying `mGraphicsPipelineDesc` without also updating `mGraphicsPipelineTransition` may case usage of incorrect PSO. - Despite of the above, there is no bug, because the update itself is redundant. In all cases, where `updateFoveatedRendering()` was called, there also will be `onDrawFramebufferRenderPassDescChange()` call, that will call `mGraphicsPipelineDesc->updateRenderPassDesc()`. - The `onDrawFramebufferRenderPassDescChange()` will also call `invalidateCurrentGraphicsPipeline()` therefore, there is no need for this call in the `updateFoveatedRendering()`. - In all cases, the `onRenderPassFinished()` is called before `updateFoveatedRendering()`, therefore, there is no need to set `DIRTY_BIT_RENDER_PASS` bit. - All of the above made `updateFoveatedRendering()` completely redundant (maybe except for the ASSERT). Change also removed `mRenderPassDesc` update from `FramebufferVk::updateFoveationState()`. Note: similar update may be also removed when handling `shouldUpdateSrgbWriteControlMode`. Also fixes possible `mFoveationState` and `mCurrentFramebufferDesc` desynchronization in case if `updateFragmentShadingRateAttachment()` fails. Bug: angleproject:8484 Change-Id: If453bb6691e47aac3c11d0a5a6df696e885b64cb Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5573395 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Roman Lavrov a3057eed 2024-05-27T14:48:51 Vulkan: DIRTY_BIT_DESCRIPTOR_SETS in handleDirtyUniformsImpl For consistency between graphics and compute handling Bug: angleproject:7103 Change-Id: If6db0739d2f75d9e8e77bf88a05466e56d165a0a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5574006 Commit-Queue: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 1a9a703b 2024-05-21T11:14:40 Vulkan: Add DeviceQueueIndex to Context/BufferHelper/ImageHelper This CL adds a utility class DeviceQueueIndex, which encapsulates queueFamilyIndex and the queueIndex into one integer value so that we can pass around to barrier function. vk::Context and BufferHelper and ImageHelper class now keeps mCurrentDeviceQueueIndex instead of mCurrentQueueFamilyIndex. For All contexts by default it gets the default queue from renderer (which is always the one corresponding to Medium priority). For ContextVk, when priority changes it update mCurrentDeviceQueueIndex to match new context priority. Bug: b/337135577 Change-Id: I62cc483cfdb3e974d38db074e671c57299300074 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5555903 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao b22cce5f 2024-05-21T10:55:27 Vulkan: Remove Renderer::getDeviceQueueIndex Renderer::getDeviceQueueIndex() returns queueFamilyIndex. There is a function that already returns mCurrentQueueFamilyIndex, so this function is now removed. This CL also renames ImageHelper::isQueueChangeNeccesary to isQueueFamilyChangeNeccesary Bug: b/337135577 Change-Id: I3cd9ded1414d1389e162aaa5399c231a987f871e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5553067 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 0636b509 2024-05-06T12:36:20 Vulkan: Move RefCountedEvent GC and recycler to ShareGroupVk (2/3) One of the problem we had with RefCountedEvents is CPU overhead comes with it, and some part of the CPU overhead is due to atomic reference counting. The RefCountedEvents are only used by ImageHelper and ImageHelpers are per share group, so they are already protected by front end context share lock. The only reason we needs atomic here is due to garbage cleanup, which runs in separate thread and will decrement the refCount. The idea is to move that garbage list from RendererVk to ShareGroupVk so that access of RefCountedEvents are all protected already, thus we can remove the use of atomic. The down side with this approach is that a share group will hold onto its event garbage and not available for other context to reuse. But VkEvents are expected to be very light weighted objects, so that should be acceptable. This is the second CL in the series. In this CL, we added RefCountedEventsGarbageRecycler to the ShareGroupVk which is responsible to garbage collect and recycle RefCountedEvent. Since most of ImageHelper code have only access to Context argument, for convenience we also stored the RefCountedEventsGarbageRecycler pointer in the vk::Context for easy access. vk::Context argument is also passed to RefCounteEvent::init and release function so that it has access to the recycler. The garbage collection happens when RefCountedEvent is needed. The per renderer recycler is still kept to hold the RefCounteEvents that gets released from ShareGroupVk or when it is released without access to context information. Bug: b/336844257 Change-Id: I36fe5d1c8dacdbe35bb2d380f94a32b9b72bbaa5 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5529951 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Charlie Lao 2e0aefe9 2024-05-06T11:03:19 Vulkan: Move RefCountedEvent GC and recycler to ShareGroupVk (1/3) One of the problem we had with RefCountedEvents is CPU overhead comes with it, and some part of the CPU overhead is due to atomic reference counting. The RefCountedEvents are only used by ImageHelper and ImageHelpers are per share group, so they are already protected by front end context share lock. The only reason we needs atomic here is due to garbage cleanup, which runs in separate thread and will decrement the refCount. The idea is to move that garbage list from RendererVk to ShareGroupVk so that access of RefCountedEvents are all protected already, thus we can remove the use of atomic. The down side with this approach is that a share group will hold onto its event garbage and not available for other context to reuse. But VkEvents are expected to be very light weighted objects, so that should be acceptable (If not, we can add some limit to the number of events it can hold in the garbage list). This is the first CL in the series. Before this CL, the RefCounteEvents are garbage collected at flushToPrimrary time, at which time we have lost ContextVk information. In order for us to do garbage collect to ShareGroupVk, we need to move the garbage collection process early, before command buffers leaving ContextVk's visibility. For OutsideRenderPassCommands, this is easy to do, we just call flushSetEvents before we call mRenderer->flushRenderPassCommands. For RenderPassCommands, that flushSetEvents call will simply make another copy of RefCountedEvents and add to the garbage list and the actual VkCmdSetEvents are defered at the executeSetEvents call that get called from flushToPrimrary time. Bug: b/336844257 Change-Id: I1948cd8240ff61d407931083b7584a54b1dc6b0d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5517891 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 5332ab8c 2024-05-10T19:48:34 Vulkan: Add RefCountedEventRecycler to vk::Renderer This CL adds event recycler in vk::Renderer to avoid the constant create and destroy of VkEvents. When RefCountedEvent is destroyed previously, it now goes into per renderer recycler. When RefCountedEvent is created previously, it now dips into this recycler and fetch it. Before we issue VkCmdSetEvent, if this event was from recycler, we also issue VkCmdResetEvent before VkCmdSetEvebt. When glFinish/EGLSwapBuffer is called or context gets destroyed, this recycler is purged to keep the free count under limit. Bug: b/336844257 Change-Id: I92ec1b183f708112a96c3d06fcfa265024f5aa04 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5519174 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Mohan Maiya 1202055c 2024-05-09T11:25:38 Vulkan: Updates to perf counters 1. don't reset pipeline cache hit / miss counters after every frame 2. bugfix in LinkTaskVk::getResult(...) 3. accumulate counters in WarmUpTaskCommon::getResultImpl(...) Bug: angleproject:8297 Change-Id: I39d7e1a438d78e2e353c5cf8246fb24fb3cfefea Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5529942 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com>
Charlie Lao a96e9197 2024-04-25T10:35:02 Vulkan: Add RefCountedEvent class and VkCmdSetEvent call This CL defines RefCountedEvent class that adds reference counting to VkEvent. CommandBufferHelper and ImageHelper each holds one reference count to the event. Every time an event is added to the command buffer, the corresponding RefCountedEvent will be added to the garbage list which tracks the GPU completion using ResourceUse. That event garbage's reference count will not decremented until GPU is finished, thus ensures we never destroy a VkEvent until GPU is completed. For images used by RenderPassCommands, As RenderPassCommandBufferHelper::imageRead and imageWrite get called, an event with that layout gets created and added to the image. That event is saved in RenderPassCommandBufferHelper::mRefCountedEvents and that VkCmdSetEvents calls are issued from RenderPassCommandBufferHelper::flushToPrimary(). For renderPass attachments, the events are created and added to image when attachment image gets finalized. For images used in OutsideRenderPassCommands, The events are inserted as needed as we generates commands that uses image. We do not wait until commands gets flushed to issue VkCmdSetEvent calls. A convenient function trackImageWithEvent() is added to create and setEvent and add event to image all in one call. You can add this call after the image operation whenever we think it benefits, which gives us better control. (Note: Even if forgot to insert the trackImageWithEvent call, it is still okay since every time barrier is inserted, the event gets released. Next time when we inserts barrier again we will fallback to pipelineBarrier since there is no event associated with it. But that is next CL's content). This CL only adds the VkCmdSetEvent call when feature flag is enabled. The feature flag is still disabled and no VkCmdWaitEvent is used in this CL (will be added in later CL). Bug: b/336844257 Change-Id: Iae5c4d2553a80f0f74cd6065d72a9c592c79f075 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5490203 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Shahbaz Youssefi e4a12a67 2024-05-02T13:25:53 Vulkan: Dynamic depth test + static depth write When depth test changes in such a situation, the static state is still affected (because we mask depth write with depth test) so the graphics pipeline still needs to be invalidated. Bug: b/336386662 Change-Id: Iebba79ffd7d6fa3962a5b20c27efcca3aa35b10a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5511602 Commit-Queue: Yuxin Hu <yuxinhu@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Gowtham Tammana 298b8739 2024-04-15T13:58:01 Vulkan: Restrict the ContextVk dependency in CommandBufferHelper Updating the interfaces that have no need for ContextVk state and instead passing in the vk::Context. Bug: angleproject:8544 Signed-off-by: Gowtham Tammana <g.tammana@samsung.com> Change-Id: Id3b72d9eabb7d1d6ee89c46cdc24a23da9e32b5c Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5492319 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 80c8b6f0 2024-04-17T10:06:45 Revert "Vulkan: Only enable DS dynamic state if there is DS attachment." This reverts commit 471b50407d7d1c22491d066df77060cb8b9b2f89. The reverted change does not correctly handle UtilsVk functions, leading to validation failures. UtilsVk could be made to not set dynamic state when the depth/stencil attachments are missing, but instead the change is reverted because: - The original issue that prompted this is easily fixable (and fixed in this change) - Disabling depth/stencil dynamic state is not necessarily a performance improvement; every time a pipeline in such a render pass is bound, the driver would have to make sure to no-op the relevant state change if static, which is also costly. Instead, dynamic state may need to be set only once in the entire render pass. Bug: b/223456677 Bug: b/315353258 Bug: angleproject:8242 Change-Id: I8282b87857d6b9285dbcf307c3c6ecf69df5fadb Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5462079 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Mohan Maiya 924b40dc 2024-04-04T18:44:21 Selectively wait for post-link tasks in the frontend The frontend waits for post-link tasks only for a relink or in syncState when `disableProgramCaching` feature is not enabled. Bug: angleproject:8297 Change-Id: If7a3b8a10a2d01f82fd2bebac5c8f378be56e19e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5427001 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 103c1b53 2024-03-29T14:37:23 Vulkan: Drop MSRTT emulation dependency on independentResolveNone Usage of VK_RESOLVE_MODE_NONE was removed in [1], but dependency to this property was accidentally added in [2]. [1]: https://chromium-review.googlesource.com/c/angle/angle/+/2743666 [2]: https://chromium-review.googlesource.com/c/angle/angle/+/3353895. Bug: angleproject:4836 Change-Id: I25028b5d343686edd794acdac3714c4a6cb5fa17 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5407073 Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Mohan Maiya e4fe461f 2024-04-01T16:41:37 Vulkan: Apply mask during transition search When checking the transition cache for the shaders subset, mask transition bits with kShadersTransitionBitsMask Bug: angleproject:7369 Change-Id: Ic8e4ad00312d5e601dbfc0d84bbc76e809358427 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5410940 Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi b4cf07c3 2024-03-27T15:58:04 Vulkan: Move the interface pipeline library caches to share group When linking libraries into a pipeline, the linked pipeline lives in ProgramExecutableVk and may be shared between contexts in a share group. The caches for the vertex input and fragment output libraries thus cannot live in the context, but should remain alive until all contexts in the share group are destroyed. This change moves these caches to the share group. Bug: angleproject:8629 Change-Id: I2f7edf44d676505cf5e7e24640c6850c67f8b5e3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5401514 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi c71a67de 2024-03-27T15:50:00 Vulkan: Move pipeline cache graph dump to renderer In preparation for moving some caches to the share group. Bug: angleproject:6565 Bug: angleproject:8629 Change-Id: I1a06a18417502e499da0edb9abb0d510e3ad99ce Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5401513 Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: mohan maiya <m.maiya@samsung.com>
Shahbaz Youssefi 9475ac40 2023-11-15T10:25:06 Vulkan: Make efficient MSAA resolve possible Prior to this change, using a resolve attachment to implement resolve through glBlitFramebuffer was done by temporarily modifying the source FramebufferVk's framebuffer description. This caused a good deal of complexity; enough to require the render pass to be immediately closed after this optimization. The downsides to this are: - Only one attachment can be efficiently resolved - There is no chance for the MSAA attachment to be invalidated In this change, resolve attachments that are added because of glBlitFramebuffer are stored in the command buffer, with the FramebufferVk completely oblivious to them. When the render pass is closed, either the FramebufferVk's original framebuffer object is used (if no resolve attachments are added) or a temporary one is created to include those resolve attachments. With the above method, the render pass is able to accumulate many resolve attachments as well as have its MSAA attachments be invalidated before it is flushed. For a FramebufferVk that is resolved in this way, there used to be two framebuffers created each time and thrown away as the code alternated between starting a render pass without a resolve attachment and then closing with one. With this change, there is now one framebuffer (without resolve attachments) that is cached in FramebufferVk (and is not recreated every time), and only the framebuffer with resolve attachments is recreated every time. Ultimatley, when VK_KHR_dynamic_rendering is implemented in ANGLE, there would be no framebuffers to create and destroy, and this change paves the way for that support too. WindowSurfaceVk framebuffers are still imagefull. Making them imageless adds unnecessary complication with no benefit. ----------------- To achieve efficient MSAA rendering on tiling hardware, applications should do the following: ``` glBindFramebuffer(GL_FRAMEBUFFER, msaaFBO); // Clear the framebuffer to avoid a load // Or invalidate, if not needed to load: // glInvalidateFramebuffer(GL_DRAW_FRAMEBUFFER, ...); glClear(...); // Draw calls // Resolve into the single sampled framebuffer glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFBO); glBlitFramebuffer(...); // Immediately discard the contents of the MSAA buffer, to avoid store glInvalidateFramebuffer(GL_READ_FRAMEBUFFER, ...); ``` The above would translate to the following Vulkan render pass: - MSAA LOAD_OP_CLEAR/DONT_CARE - MSAA STORE_OP_DONT_CARE - Resolve LOAD_OP_DONT_CARE - Resolve STORE_OP_STORE This makes sure the MSAA data doesn't leave the tile memory and greatly reduces bandwidth usage. Once anglebug.com/4892 is fixed, this would also allow the MSAA image to never be allocated either. Bug: angleproject:7551 Bug: angleproject:8625 Change-Id: Ia9f4d20863d76a013d8495033f95c7b39f77e062 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5388492 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 914fe61b 2024-03-15T13:20:49 Vulkan: Rename RendererVk.* to vk_renderer.* Done in a separate CL from the move to namespace vk to avoid possible rebase-time confusion with the file name change. Bug: angleproject:8564 Change-Id: Ibab79029834b88514d4466a7a4c076b1352bc450 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5370107 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Mohan Maiya 21d124c4 2024-03-16T10:06:02 Vulkan: Remove support for pipeline cache control For current and upcoming uses for pipeline caches the benefits of having an externally synchronized pipeline cache is minimal at best. Remove support for that and have all caches be internally synchronized by the Vulkan driver. Bug: angleproject:8601 Change-Id: Ic5d9740934641f61b527094aa301e27302b02a57 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5375102 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 60aaf4a0 2024-03-14T12:58:56 Vulkan: Move renderer to namespace vk This class is agnostic of EGL. This change moves it to namespace vk for use with the OpenCL implementation Bug: angleproject:8564 Change-Id: I57f7807d6af8b3d5d7f8efbaf8b5d537a930f881 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5371324 Reviewed-by: Austin Annestrand <a.annestrand@samsung.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Geoff Lang e5cb7f1f 2024-03-12T16:06:37 Vulkan: Fix access to inactive attributes ... within range of active ones. Since a buffer is bound for inactive attributes, it must be considered accessed. Ultimately, the nullDescriptor feature could be used to avoid binding a buffer for inactive attributes. Bug: chromium:327807820 Change-Id: Ieceea9442310c23568c47cef7357b4094b7ebbb4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5369336 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Mohan Maiya 9bae5859 2024-03-13T10:55:18 Vulkan: Add blend factors to allow dithering to work Previously, we had - src: GL_SRC_ALPHA, dst: GL_ONE_MINUS_SRC_ALPHA Now, this adds - src: GL_ONE, dst: GL_ONE_MINUS_SRC_ALPHA This showed up in app "com.inertiasoftware.justjigsaws". Bug: b/328837151 Change-Id: I88208b1ed4dd050283d8d02cf31ccdcb3d02a444 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5369638 Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya 91ddf851 2024-03-03T10:57:22 Vulkan: support QCOM foveated rendering extensions Add support for foveated rendering in the vulkan backend. This is done by leveraging the VK_KHR_fragment_shading_rate extension. Bug: angleproject:8484 Change-Id: I0d01d07583f710b2302ea07b19c9d113c73bfe41 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5269907 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya b2773c11 2024-03-01T11:24:44 Vulkan: Bug fix in immutable sampler pipeline layout recreation An immutable sampler is tied to a sampler index and changing sampler uniform location value should force a recreation of the pipeline layout Bug: b/155487768 Bug: angleproject:5033 Bug: angleproject:5773 Tests: Texture2DTestES3.TexStorage2DMultipleYuvSamplersSwitch*Vulkan Change-Id: I82aaed332d7f87f11a2fd4923cfc004403ff0bd2 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/3657480 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi 545e3f6e 2024-03-01T23:27:03 Vulkan: Decouple RendererVk from egl::BlobCache The new vk::GlobalOps class abstracts access to egl::BlobCache. This is a step towards decoupling RendererVk from egl::Display for direct use with OpenCL. Bug: angleproject:8564 Change-Id: I7b3910254430df74b889759639da1749735584a7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5332082 Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Geoff Lang <geofflang@chromium.org> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya b978974d 2024-03-03T10:48:48 Update frontend support for QCOM foveated extensions Modifications to frontend support - 1. EXTENDED_DIRTY_BIT_FOVEATED_RENDERING is removed 2. New framebuffer attachment API - getFoveationState 3. Attachment type restriction for foveated rendering is removed 4. Addition of new test - RenderbufferAttachmentClearThenDraw Bug: angleproject:8484 Change-Id: I699cbed81346c9a6344c4ff36afa51d6cc1bf052 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5338529 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 4e6fe5e0 2024-02-29T15:01:06 Vulkan: Cache ImageLoadContext in context This avoids the need to requery this from the display every time. Bug: angleproject:8564 Change-Id: Ied650e7789741f59b7662c0f97c55132b105778d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5332074 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Amirali Abdolrashidi 2ee295b4 2024-02-15T11:27:39 Vulkan: Add per-level image update tracker * Add a per-level image write tracker to ImageHelper. * It tracks the updates scheduled for different parts of the image. Within each level, it also tracks different layers, currently up to 64. * kMaxParallelSubresourceUpload renamed to kMaxParallelLayerWrites; moved to vk_helper header. * It is reset when a barrier is issued for the image. * Modified ImageHelper::recordWriteBarrier(). * Added isWriteBarrierNecessary(). * Now it checks the added writes for the image. It will no longer issue a barrier if the image is in the same layout and there is no write to a part of the image to which was previously written. * Added ReadImageSubresources to CommandBufferAccess. * It is used for layouts that allow both reading and writing to the image (including self-copy): * TransferSrcDst (used in CopyImageSubData) * ComputeShaderWrite (used in compute-based mipmap generation) * CommandBufferImageWrite -> CommandBufferImageSubresourceAccess * Updated onImageSelfCopy() args to include read subresource data. * Improves gpu_time for TextureUploadETC2TranscodingBenchmark perf test * Windows/NVIDIA: ~180609 ns -> ~62669 ns (~2.88x) * Linux/NVIDIA: ~157283 ns -> ~93360 ns (~1.68x) * Windows/Intel: ~72297 ns -> ~57153 ns (~1.27x) * Added a test to show that self-copy for a write-after-read works. * ArraySelfCopyImageSubDataWithWriteAfterRead * (ArraySelfCopyImageSubData covers RAW hazards; renamed) Bug: b/308455694 Change-Id: I5cef296d991ce6ec02792edc3ffc5cc4994831e1 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5301855 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Gowtham Tammana bcf814fd 2024-02-02T10:30:34 Vulkan: Constrain the dependency on ContextVk in BufferHelper Make the BufferHelper interface be not dependent on ContextVk state. This makes the interface to be suitable for implementation of other APIs with Vulkan backend. Any dependency on ContextVk is made explicit and handled in ContextVk. Bug: angleproject:8544 Change-Id: I8b285f54c8758a26dd7edf27b1371f9afcf7e241 Signed-off-by: Gowtham Tammana <g.tammana@samsung.com> Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5303573 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Mohan Maiya 3ca8befb 2024-02-14T12:35:08 Vulkan: Handle multi-context apps in pipeline cache graphs Append a monotonically increasing counter to filename so apps and benchmarks with multiple contexts don't clobber each others files. Bug: angleproject:6565 Change-Id: I5c781895e1ec8cc65728aa752e28fb2acb02abe9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5297288 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com>
Mohan Maiya 6607a2b9 2024-01-17T15:58:20 Vulkan: Add support for VK_EXT_vertex_input_dynamic_state Hook into VK_EXT_vertex_input_dynamic_state so pipeline states that differ only in vertex input state can reuse existing pipelines. Bug: angleproject:7162 Tests: StateChangeTestES3.Vertex* Change-Id: Icd3134dee93fc5fc2e9d284fcfa8c674b62faec8 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5207462 Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi b380ed1f 2024-02-14T09:31:26 Vulkan: Add EGL_ANGLE_global_fence_sync Chrome has an implicit assumption that due to context virtualization, signaling a fence in one context results in synchronization with _all_ contexts that have previously made submissions. This is not per EGL spec, but the functionality is easily implementable in the Vulkan backend. In the Vulkan backend, each context is given its own "timeline" of submissions (tracked by serials associated with "indices"). The required functionality is implemented through a new EGL fence sync object whose sole difference is that it synchronizes with all the existing timelines rather than the one of the current context. Bug: b/318721705 Change-Id: I6c45d065e592d0d4ed627ce9695196b1086d5021 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5297396 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi dbc6bd9d 2024-02-12T14:07:49 Reland "Vulkan: Fix alignment issues with SecondaryCommandBuffer" This is a reland of commit e53270c9ca1afe393d6d7d0359e81cf6755b6ca5 Original change's description: > Vulkan: Fix alignment issues with SecondaryCommandBuffer > > This solves undefined behaviour on 64-bit systems. This inflates the > size of a few commands, but most commands either already did align to 8 > bytes or could be aligned to 8 bytes with a few tweaks. > > Bug: angleproject:7852 > Change-Id: Ie61976d5bf8df7790acd95c0e15d4c79402622a1 > Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5288636 > Reviewed-by: Charlie Lao <cclao@google.com> > Reviewed-by: Yuxin Hu <yuxinhu@google.com> > Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Bug: angleproject:7852 Change-Id: Ie206e66fc21c5db7c9e67eb478d9cddada5db8e0 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5296376 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Roman Lavrov c603a4f1 2024-02-08T10:53:27 Don't perf warn about ETC1->ETC2 emulation as it is efficient Format is forwards compatible: https://crsrc.org/c/third_party/angle/src/libANGLE/renderer/gl/formatutilsgl.cpp;drc=21f16cb16333802dfa942d67cac59885f904301d;l=701 Added hasInefficientlyEmulatedImageFormat() helper Bug: b/302115557 Change-Id: Ibc82c27ecf4e3afbfaac52cb45bdda776c50b4b3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5278562 Commit-Queue: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya d05c9a5e 2024-01-25T13:01:49 Frontend support for QCOM foveated extensions Add frontend state management to support foveated rendering extensions. Bug: angleproject:8484 Test: Texture2D*Foveation* Change-Id: I0e1be9f11b2d442207674562da760f5bfd7debc8 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5208091 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Shahbaz Youssefi ecc35205 2024-01-25T23:58:25 Move uniform block dirty bits to State When glUniformBlockBinding changes the mapping from a program uniform block to a buffer binding, all contexts in the share group need to reprocess the affected block index. Prior to this change, the dirty bits that indicated which blocks have their mapping redefined were placed in the program executable, and were reset by the first context that processed them. As a result, the other contexts in the share group where not aware of such modifications. Similarly, when a buffer changed in one context, the mapped program blocks were marked dirty, with similar cross-context issues. In this change, the dirty bits are moved to State, so every context would react to these changes. Bug: angleproject:8493 Change-Id: I5712002224cbc4a576bf2ac46e8e75f26ebc5b2a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5238991 Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 0c4d6446 2024-01-24T10:38:45 Rework uniform block <-> uniform buffer mapping In GLES, the shader declares which buffer binding a block (uniform, storage or atomic counter) is bound to. For example: layout(binding = 1) uniform ubo0 { ... }; layout(binding = 2) uniform ubo1 { ... }; layout(binding = 1) uniform ubo2 { ... }; In the above, ubo0 and ubo2 use data from the buffer bound to index 2 (through glBindBufferRange), while ubo1 uses data from the buffer bound to index 1. For uniform blocks in particular, omitting the binding is allowed, in which case it is implicitly bound to buffer 0. GLES allows uniform blocks (and only uniform blocks) to remap their bindings through calls to glUniformBlockBinding. This means that the mapping of uniform blocks in the program (ubo0, ubo1, ubo2) to the buffer bindings is not constant. For storage blocks and atomic counter buffers, this binding _is_ constant and is determined at link time. At link time, the mapping of blocks to buffers is determined based on values specified in the shaders. This info is stored was stored in gl::InterfaceBlock::binding (for UBOs and SSBOs), and gl::AtomicCounterBuffer::binding. For clarity, this change renames these members to ...::inShaderBinding. When glUniformBlockBinding is called, the mapping is updated. Prior to this change, gl::InterfaceBlock::binding was directly updated, trumping the mapping determined at link time. A bug here was that after a call to glProgramBinary, GL expects the mappings to reset to their original link-time values, but instead ANGLE restored the mappings to what was configured at the time the binary was retrieved. This change tracks the uniform block -> buffer binding mapping separately from the link results so that the original values can be restored during glProgramBinary. In the process, the support data structures for tracking this mapping are moved to ProgramExecutable and the algorithms are simplified. Program Pipeline Objects maintain this mapping identically to Programs and no longer require a special and more costly path when a buffer state changes. This change prepares for but does not yet fix the more fundamental bug that the dirty bits are tracked in the program executable instead of the context state, which makes changes not propagate to all contexts correctly. Bug: angleproject:8493 Change-Id: Ib0999f49be24db06ebe9a4917d06b90af899611e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5235883 Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya ac71a592 2024-01-23T11:56:42 Vulkan: updates to pipeline cache graph dumping logic 1. To dump pipeline cache graph you need to - 1. add "angle_dump_pipeline_cache_graph" compile time flag to args.gn 2. set Android property "angle.dump_pipeline_cache_graph" or envvar ANGLE_DUMP_PIPELINE_CACHE_GRAPH on non-Android platforms before app start 2. Default path for dump on Android is "/data/local/tmp/angle_dumps/" 3. "angle.pipeline_cache_graph_dump_path" Android property or envvar ANGLE_PIPELINE_CACHE_GRAPH_DUMP_PATH on non-Android platforms can be used to configure the dump path Bug: angleproject:6565 Change-Id: I38848aff58f413dd7bdffc9083116bd4b95e4960 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5226054 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com>
Shahbaz Youssefi b007c74d 2024-01-23T14:17:54 GL: Separate dirty bits leading to glUniformBlockBinding The GL backend is special in that it needs to make actual calls (native glUniformBlockBinding) in response to (application) glUniformBlockBinding calls. The other backends just remap the bindings based on that information when creating descriptor sets. Previously, an optimization to track which bindings have changed used the same dirty bits that were used to signify when the GL backend needs to make these native calls. That ended up as a source of bugs. In a previous change [1], the context DIRTY_BIT_UNIFORM_BUFFER_BINDINGS is set when these mappings change, which fixes some of these issues. That change obviates the need for an actual backend sync of programs, except for GL programs that need to make these native calls. This change splits the dirty bits maintained for the purposes of the GL backend, moves them to that backend and removes the program backend sync. [1]: https://chromium-review.googlesource.com/c/angle/angle/+/5228599 Bug: angleproject:8493 Bug: b/318806125 Change-Id: I73c6514e88a116f1cd701cb06da0d8c38f07f7f6 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5230137 Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi e00995d4 2023-12-21T15:57:39 Vulkan: Invalidate pipeline with FBO draw buffer change Enabling/disabling draw buffers can affect the graphics pipeline without changing the render pass description. In that case, the graphics pipeline was not being invalidated. Bug: angleproject:8463 Change-Id: I6848472dcbb3d3ce4c34d95be28c8ec3fc50dcd7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5147847 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: mohan maiya <m.maiya@samsung.com>
Shahbaz Youssefi a1143857 2023-12-18T15:24:42 Fix UBO dirty bits vs PPOs This change fixes propagation of UBO dirty bits (such as through glUniformBlockBinding and glBindBufferRange) to program pipeline objects. Since PPOs concatenate the attached programs' UBOs in a list, a map of program UBO indices to PPO UBO indices is introduced to offset these dirty bits appropriately. Additionally, when the program's executable's buffer bindings change (through glUniformBlockBinding), a notification is send to the PPO to update its executable's buffer binding accordingly (which is otherwise only updated during PPO link). Bug: angleproject:8462 Change-Id: I4965ae23e6fc6cac0842e1643755e42e95d3d5cc Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5131418 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Cody Northrop <cnorthrop@google.com>
Shahbaz Youssefi 4bf40237 2023-12-18T15:24:15 GL: Fix missing glUniformBlockBinding handling When a program is current and this call is made, the program is made dirty so that the GL backend reacts to this call. Prior to https://chromium-review.googlesource.com/c/angle/angle/+/4922969, the program was made dirty when its executable was installed as well (if it had any UBOs dirty), but that change removed it. As a result, if this call was made while the program was _not_ current, the GL backend would miss processing it. This call ensures that the appropriate dirty bit is set when the program is made current again. This revealed a bug in the Vulkan backend where sometimes the executable's dirty bits would not get reset. This was benign but fired an assertion, and is fixed in this CL as well. Bug: chromium:1511506 Change-Id: Iae86ba0aa5b8f9e4f20dd6df6002d37e405280e7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5123005 Reviewed-by: Cody Northrop <cnorthrop@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi 74f9da02 2023-12-04T19:53:25 Vulkan: Remove spam about depth/stencil feedback loop It triggers a warning even if the application is appropriately using BASE and MAX levels to avoid feedback loop. Bug: b/289436017 Change-Id: Ie7e8281908802e91dfaad1b49dd95197ac6de1a1 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5086070 Commit-Queue: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi 3680a5dc 2023-11-17T13:51:07 Vulkan: Let program warmup continue passed link The warmup task does not actually affect the link results, so there is no reason to wait for it when the application queries the link status. This change allows the warm up task to continue in parallel until the program is used at draw time. This allows the warm up to be more efficient when the link itself is not parallelized. For applications that create programs in the middle of every frame, it's still likely best to disable warm up (as the following immediate draw will already effectively do the warm up). Note that currently the warm up code in the Vulkan backend is not completely thread-safe, and so the program still blocks on that task before the first draw can happen (or the program is modified in any way). Bug: angleproject:8417 Change-Id: I0877fef39a0585c3279e32699ce817d4643d7cd6 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5037538 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Roman Lavrov 8b4901d0 2023-11-06T10:43:14 Avoid GLenum conversion in BlendStateExt blend and equation The following functions now return value as is without ToGLenum conversion (that is often unnecessary): getEquationColorIndexed getEquationAlphaIndexed getSrcColorIndexed getDstColorIndexed getSrcAlphaIndexed getDstAlphaIndexed (at least) getEquationColorIndexed is on the hot path with noticeable performance impact; this CL also moves the implementation to the header to allow inlining. Bug: b/300968773 Change-Id: Ie223abe14b12afd7844686863ee5806945d10e45 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5008031 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>
Hailin Zhang 2ee06400 2023-10-31T10:57:02 Vulkan: fix dynamic buffer alignment issue. Bug: b/306763053 Change-Id: I2f15fe2a2a36a9f55686987641e6afddb44ec9aa Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4994410 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Hailin Zhang <hailinzhang@google.com>
Charlie Lao 471b5040 2023-10-26T09:33:29 Vulkan: Only enable DS dynamic state if there is DS attachment. This is discovered while investigating EXT_yuv_target crash in driver. What happens is that UtilsVk::copyImage does not set depth stencil dynamic state since there is no depth stencil attachment. But we enabled dynamic state for D/S, thus driver still does D/S state setup, which sees garbage data and hitting assertion. Even though this is discovered with EXT_yuv_target test, I believe this is a general issue. This CL adds the renderPassDesc.hasDepthAttachment() and hasStencilAttachment() check and enable depth or stencil related dynamic state only if there is depth or stencil attachment. This fixes crash in driver with ImageTestES3.ClearYUVAHB test. This also has added performance benefit that we now completely skips depth/stencil related dynamic state dirty bit handling code, thus reduces state processing CPU overhead. Bug: b/223456677 Change-Id: I3a4fe6d97b14c066d78f8b8ded21c626cb2f376c Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4980765 Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 58ffa778 2023-10-11T09:41:23 Vulkan: Implement YUV_TARGET use VK_ANDROID_external_format_resolve This implements EXT_YUV_TARGET using VK_ANDROID_external_format_resolve extension. This CL is based on Chris Forbes's CL on android gerrit. Bug: b/223456677 Change-Id: Ieb6970a0787b0c2a72a76b208695a678d2c79e80 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4857459 Reviewed-by: Chris Forbes <chrisforbes@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Cody Northrop 1ea49a22 2023-10-13T11:28:41 Move uniform dirty bits to ProgramExecutable Rather than try to funnel them through Program and ProgramPipeline to the executable in the backend, just move them to ProgramExecutable in the front end. This fixes Dota Underlords at the same time due to not needing to set the Program dirty to propagate bits. Test: Dota Underlords Test: ProgramPipelineTest31.ProgramPipelineBindBufferRange Bug: b/299532942 Change-Id: Ic73c45608e22f89ca400ebf684f8cd287ed2f43a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4922969 Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Cody Northrop <cnorthrop@google.com>
Charlie Lao 2608c622 2023-10-06T13:32:49 Vulkan: Refactor SharedGarbageList into templated class This CL mostly involves non-functional changes to prepare for next CL. No behavior change is expected. This CL wraps the garbage list into its own templated class which maintains std::queue and tracks number bytes in the queue etc. This CL also renames SharedBufferSuballocationGarbageList to BufferSuballocationGarbageList to reduce verbosity a bit. This CL deleted GarbageAndQueueSerial and GarbageQueue since they are no longer being used. This renames vk::GarbageList to vk::GarbageObjects to reduce name confusion with SharedGarbageList. Bug: b/302739073 Change-Id: I7370c147847ffe69ad8aa3b48251d8b5762f97f9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4919816 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Hailin Zhang <hailinzhang@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Roman Lavrov 53e37a3e 2023-09-29T14:15:37 Replace mActiveTextures.fill(nullptr) with memset std::array::fill yields unoptimized, unrolled loop with 8 byte increments. Surprisingly high (>2%) effect on the instructions counter in my tests (first frame of real_racing3 trace) The number of samples hitting this spot in profiling is also signficantly reduced. Impact on power harder to judge due to noise but does seem to be a bit better. Added a FillWithNullptr utility to check that nullptr is 0 and used it in other places where fill(nullptr) was used in ContextVk. It's not used elsewhere in vulkan. Bug: b/302708437 Change-Id: If7fab66d858bc10ca356418d2ab26232bb9a9ce7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4902288 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Roman Lavrov 34c8778b 2023-09-26T13:45:06 Use atomic counters early in perf warning macros Before this CL, snprintf was called repeatedly to format the warning message which was then discarded after 4 logs. snprintf showed up in profiling at ~2% and this CL appears to yield an ~8% power improvement in one of the traces (egypt_1500). A mutex was previously used to avoid the race condition on the static sRepeatCount variable. This CL avoids the need for that by using static atomics instead. Also updated the Debug macro to use the VK macro vararg approach so that formatting only happens when the message is actually logged. Bug: b/302112423 Change-Id: Ia8a18361cfb5a9f2aa19ff939499754ba861efb7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4886388 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Roman Lavrov <romanl@google.com>
Yuxin Hu 9fc3baf5 2023-07-26T10:29:13 Add the missing GraphicsPipelineDesc legacy dither bit update When the app calls glEnable(GL_DITHER) or glDisable(GL_DITHER), we need to update the legacy dither bit of the renderpass that belongs to ContextVk::mGraphicsPipelineDesc. If not, there is a change that the graphics pipeline will be created with a renderpass that has outdated legacy dither bit. This results the dither being applied to the render results incorrectly: e.g. the app calls glDisable(GL_DITHER), but the render results have dithering applied. Bug: b/286921997 Bug: b/292282210 Bug: b/293349058 Bug: b/284462263 Change-Id: Ie24b95898526c9021be6e3cb7620e4050f9faaaf Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4722446 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Roman Lavrov <romanl@google.com> Commit-Queue: Yuxin Hu <yuxinhu@google.com>
Amirali Abdolrashidi e1d2e88a 2023-09-20T11:53:15 Check pending garbage after some buffer releases * Embedded BufferHelper::releaseBufferAndDescriptorSetCache() inside a new ContextVk method: releaseBufferAllocation() * After releasing the buffer, there is a check for excess pending garbage. If the tracked pending garbage size is larger than the threshold, the context will be flushed. * Unskipped the test "BufferDataInLoopManyTimes", which was failing on Android devices. Bug: b/280304441 Change-Id: Ib34319f3291dd2200fc1a92e30645f9d1da8e2b9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4879086 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 9ca025d2 2023-09-18T15:50:48 Flatten BufferVariable/ShaderVariableBuffer/InterfaceBlock struct InterfaceBlock inherits from ShaderVariableBuffer, ShaderVariableBuffer is not a trivially copyable struct, this made InterfaceBlock not trivially copyable. InterfaceBlock is being used by some app traces for uniform blocks. BufferVariable inherits from sh::ShaderVariable which is very complicated and not trivially copyable. This CL flattens all of these three structs to simple structs without inheritance, and wraps all trivially copyable data into one POD struct, thus load/save are cheaper. Bug: b/275102061 Change-Id: I96f89176ce3d3131cb1d3ea3280c3c36c257560f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4874610 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Yuxin Hu 53e3ce59 2023-09-08T17:41:56 Add device lost handle after finishImpl It is possible that the during context destroy, when calling finishImpl, the vulkan device is lost, e.g. https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/6447#issuecomment-1711479164. We should check vulkan device lost after finishImpl(). Bug: b/289544394 Change-Id: I75aa650cdd38d81815f7354770639e896e3376a7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4854763 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Yuxin Hu <yuxinhu@google.com>
Amirali Abdolrashidi 91ef1f3c 2023-09-08T16:39:53 Move buffer suballocation callers to ContextVk * Moved the following functions from BufferHelper to ContextVk. * initBufferForBufferCopy() * initBufferForImageCopy() * initBufferForVertexConversion() Bug: b/280304441 Change-Id: I890f4396b00b0c20feb44f0ad113c55924ce1014 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4854760 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi a1f52f1b 2023-09-07T14:44:24 Vulkan: Flush pending image garbage more often * Added a counter to the context object to keep track of the size of the pending image garbage: mEstimatedPendingImageGarbageSize. * Modified hasExcessPendingGarbage() to use the sum of the size of the image and and suballocation garbage. * RendererVk::calculatePendingGarbageSizeLimit() provides the limit. * Currently the limit is based on the available heap sizes. It will use a fraction of the largest memory heap size. * The portion is currently kGarbageSizeLimitCoefficient = 0.2f. * Unskipped the test "TextureDataInLoopManyTimes", which was failing on Android devices. Bug: b/280304441 Change-Id: Ibcced1d118ea8a1f347028b62d29cfbd9e38e8c0 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4851252 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Amirali Abdolrashidi 27896999 2023-08-16T16:44:22 Vulkan: Flush pending suballoc garbage more often * Added a counter to the renderer object to keep track of the pending suballocation garbage. * mPendingSuballocationGarbageSizeInBytes * Once it surpasses a limit (mPendingSuballocationGarbageSizeLimit), it will flush the context so the pending garbages can be freed. * Currently the limit is based on the available heap sizes. It will use a fraction of the largest memory heap size. * The portion is currently kGarbageSizeLimitCoefficient = 0.2f. * At the end of the render pass, it is checked if the limit has been reached. If so, context flush will occur. Bug: b/280304441 Change-Id: I08e6028cfe20059ece2b2e4e971ece897544cd6d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4787950 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Shahbaz Youssefi b4852ef9 2023-02-08T14:18:06 Vulkan: Drop support for Vulkan 1.0 Bug: angleproject:7959 Change-Id: Ib673679ea1a503af22b37092dbff1ee1fd34fba6 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4233092 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Ian Elliott <ianelliott@google.com>
Amirali Abdolrashidi e7418836 2023-08-16T14:25:52 Vulkan: Add context flushing as OOM fallback * As a new fallback for out-of-memory errors, if an allocation results in device OOM, the context is flushed and the allocation is retried. * Functions related to buffer/image allocations now return a VkResult value instead of angle::Result, which will be bubbled up to a higher level for safer handling. * The OOM is no longer handled at the level where the allocation happens, but is moved up to the context. * Added two functions to ContextVk for allocating memory for images and buffer suballocations, which also include the fallback options. * initBufferAllocation(): Uses BufferHelper::initSuballocation() * initImageAllocation(): Uses ImageHelper::initMemory() * Moved initNonZeroMemory() out of the following functions: * BufferHelper::initSuballocation() * Moved to ContextVk::initBufferAllocation(). * ImageHelper::initMemory() * Moved to ContextVk::initImageAllocation(). * Also moved to new function: ImageHelper::initMemoryAndNonZeroFillIfNeeded(). This function replaced the rest of initMemory() usages outside initImageAllocation(). * New macros for memory allocation * VK_RESULT_TRY() * If the output of the command inside it is not VK_SUCCESS, it will return with the error result from the command. * VK_RESULT_CHECK() * If the output of the command inside it is not VK_SUCCESS, it will return with the input error. * Added a test in which allocation would fail due to too much pending garbage without the fix on some platforms. The test ends once there has been a submission. * New suite: UniformBufferMemoryTest * Added a similar test for flushing texture-related pending garbage. * New suite: Texture2DMemoryTestES3 Bug: b/280304441 Change-Id: I60248ce39eae80b5a8ffe4723d8a1c5641087f23 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4787949 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 48e2c605 2023-09-07T14:30:46 More instances of program usage converted to executable Bug: angleproject:8297 Change-Id: I8e4eeef8f4f20610bbe0f994ce1141c17d588765 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4850888 Reviewed-by: Shahbaz Youssefi <syoussefi@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Shahbaz Youssefi 6698fb69 2023-08-25T22:21:32 Vulkan: Stop passing both ProgramExecutable and ...Vk around Now that ProgramExecutableVk is accessible through ProgramExecutable. Bug: angleproject:8297 Change-Id: Ie08770ef97400195d63b87f2d4b7e2a2c8f4ad24 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4812147 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi bb135f0e 2023-08-24T15:29:11 Make ProgramExecutableImpl managed by ProgramExecutable This change allows both parts of the program executable to be safely backed up and swapped on link. Bug: angleproject:8297 Change-Id: I17e4b6c05e4e481a66a227d6047dbf943d2c2603 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4812138 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Shahbaz Youssefi 571b4cdb 2023-08-14T16:55:28 Vulkan: Move pipeline/desc-set layout creation to link job The pipeline and desc-set layout caches are consequently made thread-safe. The reference counter on the layouts are also made atomic. With this change, practically all of the link in the Vulkan backend is moved to the link job. Bug: angleproject:8297 Change-Id: Iba694ece5fc5510d34cce2c34441ae08ca5bb646 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4774787 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao b8d5a423 2023-08-21T14:43:42 Add static_assert(std::is_trivially_copyable<LinkedUniform>(),"") Since we are using memcpy for LinkedUniform, it is desirable to utilize compile time assertion to ensure that in future if anyone modifies POD struct (and the class of data members of POD struct) and made it that no longer memcopy-able, we would immediately caught at compile time. std::is_trivially_copyable<>is exactly for this reason. In order to make this work, the POD struct and any data it uses can not have user defined copy constructor. The problem is that right now ANGLE is using clang_use_chrome_plugins=true, and chrome-style generates warnings if the complex struct (has more than 10 data members) does not define a copy constructor, and that warning causes build failure with -Werror. So clang_use_chrome_plugins=true and std::is_trivially_copyable have this conflicting requirements that I can not apply both. This has been raised to compiler team, but before we get a solution from them, if we have to make a choice, I think the better choice is to disable clang_use_chrome_plugins and apply std::is_trivially_copyable, since the later is more critical to ensure safety, while chrome-style is mostly trying to minimize the code size, but won't affect correctness/robustness. This CL sets clang_use_chrome_plugins to false, and removes the copy constructor and copy assignment operator from BitSetT and LinkedUniform and added static assertion is_trivially_copyable for LinkedUniform. Same thing applied to ProgramInput as well. In future once we have a better solution from compile team, we can re-enable clang_use_chrome_plugins and disable only for structs that requires is_trivially_copyable assertion. Bug: b/275102061 Change-Id: If33415ea61deda568d855a7dd6a4fd6042058be5 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4799342 Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 745023ef 2023-08-14T10:47:11 Vulkan: Ensure mComputeDirtyBits is set for potential submission. When ContextVk::flushOutsideRenderPassCommands is called and we run out of serial numbers reserved for outsideRPCommands (which means we have an already started renderpass)., we will call flushCommandsAndEndRenderPass so that we can have new queue serials for both renderPass and outsideRP commands. When this happens, the current bug is that we will not add mNewComputeCommandBufferDirtyBits to mComputeDirtyBits. If another thread comes in did the submission, and then this context calls dispatchCompute again without any state change, we will get a new primprary command buffer without dirty bits for the new command buffer. This CL ensures we always add mNewComputeCommandBufferDirtyBits immediately after mRenderer->flushOutsideRPCommands call. Bug: b/295533354 Change-Id: I1c672310b3b00cd9be25b5ee55a0a060239102a9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4778445 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Hailin Zhang <hailinzhang@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 16cfa28e 2023-08-08T22:08:24 Vulkan: Basic infra for parallel link This change moves pipeline warm up to a parallelizable task, mostly as an exercise to put in the infrastructure for parallel link in the Vulkan backend. Follow up changes will move more of the link step to this task. The end goal is to be able to make the link task independent of ContextVk, which would allow it to be run as an UnlockedTailCall, even if not using a worker thread. Bug: angleproject:8297 Change-Id: I17047162b2a41f0d681d9e3ee33f2e0239b4280d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4764231 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Yuxin Hu a990ba34 2023-08-02T17:21:00 Fix write out of bounds on non robust contexts crashes dEQP-EGL.functional.robustness.reset_context.shaders.out_of_bounds_non_robust.reset_status.writes.* tests are failing, because the test expectes EGL_SUCCESS upon eglMakeCurrent(EGL_NO_CONTEXT), regardless of whether the context was lost. This CL: 1) Changes the validation function of eglMakeCurrent: if the EGLContext passed to eglMakeCurrent is EGL_NO_CONTEXT, do not return EGL_CONTEXT_LOST even if the context is already lost. 2) Adds a lost context check in checkOneCommandBatch. If the context is lost, do not check fence status and assume all of the vulkan commands have finished execution, so that we can properly destroying all the resources before destroying the context. 3) Changes the GL error code from GL_INVALID_OPERATION to GL_CONTEXT_LOST when there is a vulkan device lost. Bug: b/286921997 Bug: b/289544394 Change-Id: I91e8a4105f0d7a3ec3b59bae58da80bc64ffa94a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4728466 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Roman Lavrov 938ee1e8 2023-07-21T16:16:23 Vulkan: legacy_dithering disallow reactivate when breaking RP Hitting the assert in dEQP GLES2.functional.fragment_ops.random.0: https://crsrc.org/c/third_party/angle/src/libANGLE/renderer/vulkan/ContextVk.cpp;drc=52fe3116ead9a5de51ddad17fcb14bf8ecb3a69d;l=2347 Bug: b/292259684 Change-Id: Ib40b90dde3b271c714b6181e4ba4d70f3e1b5e86 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4706174 Reviewed-by: Charlie Lao <cclao@google.com> Auto-Submit: Roman Lavrov <romanl@google.com> Commit-Queue: Roman Lavrov <romanl@google.com>
Shahbaz Youssefi 52fe3116 2023-07-17T16:20:54 Vulkan: Deduplicate share group's context set tracking Bug: angleproject:8224 Change-Id: I7a59a37229682fb91ff777f31e02e05d7ab2b80f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4690345 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 9f9284b7 2023-07-17T15:41:27 Move ShareGroup to its own files Bug: angleproject:8224 Change-Id: Id6d272018bb5ee8c3e35488f641efa4d99fa836d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4690003 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Roman Lavrov c0f2f71e 2023-06-27T16:00:09 Use VK_EXT_legacy_dithering when available instead of emulation Yields improvement in gpu power: http://b/284462263#comment45 Bug: b/284462263 Change-Id: I5bfd115557b6baac17c05639118feaebf19c5cd4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4652590 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>
Charlie Lao 6ffd0d20 2023-07-12T12:09:45 Vulkan: Clean up depth stencil feedback mode part 2 Right now the tracking of depth stencil buffer readOnly or feedback loop is in FramebufferVk class. This really belongs to ContextVk, since it is not a permanent state of framebuffer, but current state of context. This CL moves it to ContextVk and changes to use BitSet instead of four boolean. Bug: b/289436017 Change-Id: I955c439259935f82eff30ddfff776a69723e5d0d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4679886 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Charlie Lao a33ec5dd 2023-07-11T18:01:12 Vulkan: Clean up depthStencil feedback loop implementation Part1 This is first clean up effort for depth stencil feedback loop implementation. This CL moves updateRenderPassStencilReadOnlyMode and updateRenderPassDepthReadOnlyMode methods from FramebufferVk to RenderPassCommandBufferHelper class. The method is actually updating renderPass's state, not FramebufferVk's state. In the next CL, FramebufferVk will be removed from the argument as well. With this change, I also removes updateStartedRenderPassWithDepthMode() and updateStartedRenderPassWithStencilMode() to use updateStartedRenderPassWithDepthStencilMode() directly. This CL is mechanical changes only, no behavior chnage is expected. Bug: b/289436017 Change-Id: Id3960f973a7115c05ebea199cb8ef802e995941a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4679365 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Charlie Lao f5986fbb 2023-07-11T12:11:20 Vulkan: Dont break RP if there is actual render feedback loop There is a bit terminology confusion here that will be fixed in next CL. If a depth attachment is read only, then there is no feedback loop, we should not call feedback loop for read only depth attachment. The real depth render feedback loop mode is formed when we write to depth and sample from depth at the same time. In this condition, the content is undefined per OpenGLES spec section 9.3.1 (https://registry.khronos.org/OpenGL/specs/es/3.2/es_spec_3.2.pdf). The shouldSwitchToReadOnlyDepthStencilFeedbackLoopMode() implementation handles the usage case that the same render pass has depth write and then switch to read only. Under this usage there is no actual feedback loop, and we should still work properly by end current render pass and start a new render pass with read only depth attachment. This implementation also treating the actual feedback loop case exactly the same way by ending render pass first, even though this is undefined behavior. gangstar_vegas has the exact this undefined behavior usage case, where it write and sample from depth buffer at the same draw call. Native driver is not ending the render pass but ANGLE currently does. This puts ANGLE into worse performance. Since this is undefined behavior, either way is correct. This CL checks if there is an actual feedback loop in the current render pass and if yes, we adopt the native driver's behavior that keep the current render pass going. This improves gangstar_vegas frame time from 4.365ms to 3.89ms, and interestingly, yield the same golden image. Bug: b/289436017 Change-Id: Ifc04ecd8ad6455a88e8615bd5452b9cce88c6687 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4679361 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 2a08c33b 2023-07-10T17:47:49 Vulkan: Avoid flushCommandsAndEndRenderPass for readonlyDS switch When we switch to read only depth stencil mode, right now we always call flushCommandsAndEndRenderPass, even though the started render pass is empty and loadOp is load. This flush will cause render pass actually submitted and color attachment being cleared and then color attachment gets loaded in the subsequent render pass. In this CL, we only flush if the depthStencil attachment has clear or written This CL save one renderPass for the following app traces: antutu_refinery, aztec_ruins, manhattan_10, manhattan_31. Bug: b/290833623 Change-Id: I13b7a968d797b4c913f1cfbe9677d9b8abe791d2 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4674087 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Yuxin Hu 6ee402f6 2023-07-06T16:56:28 Clamp the max Framebuffer width and height to 16 bit GraphicsDriverUniforms struct packs framebuffer width and height into a 32 bit uint, meaning the maximum width and height supported are 16 bit each. We should make sure below values do not exceed the maximum value of a 16-bit uint: caps.maxFramebufferWidth caps.maxFramebufferHeight caps.maxRenderbufferSize so that the application won't try to create a FBO with width/height exceeding 16-bit. We have clamped the caps.max2DTextureSize to 32768, it makes sense to clamp the FBO width and height to the same value. Bug: b/286921997 Change-Id: Iae598b37215c58d1a0f6a50bba9f391d4d23d1f2 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4671327 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 5f581f87 2023-06-27T20:38:03 Pass dirty bits by value Split CL from follow up change where the dirty bits need to be passed by value as they are calculated from two sets. Many cached dirty bits are turned to constexpr as a result. Bug: angleproject:8224 Change-Id: Ibdb3090d6ee93788e1502b72bce55f4677937c58 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4650074 Reviewed-by: Roman Lavrov <romanl@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Charlie Lao 02292814 2023-06-01T14:46:05 Vulkan: Optimize the usage of FastMap in DescriptorSetDescBuilder While looking at disassemble of DescriptorSetDescBuilder::updateOneShaderBuffer() function, I noticed that there are a lot of CPU cycles spent in FastMap::operator[]. What happend here is that we are increasing size one by one as we build descriptorSet, and that hit `if (mData.size() <= key)` case and we end up resize the underline FastVector, and that resize also initialize the element with zeros, which immediately overwrite by actual data. Since we actually know the eventual size of DescriptorSetDescBuilder::mDesc/mHandles/mDynamicOffsets, we could just switch to angle::FastVector which will avoid this check size and grow every time we write to it. This CL switches the use of FastMap in DescriptorSetDescBuilder to FastVector. The only trick we need to watch out is that previously the new elements are always zero filled and now it does not. So we need to make sure we write every field of structure. This CL also renames WriteDescriptorDescBuilder to WriteDescriptorDescs since when it is read only we are passing it as const reference already, there is no added advantage to have two classes. Bug: b/282194402 Change-Id: I06a063cc51585fc17fbf0d5aa916b9aa0ab88dd4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4581881 Reviewed-by: Roman Lavrov <romanl@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 2e209516 2023-06-26T11:58:50 Move state dirty bits definitions out of the class This is in preparation for a follow up change that splits the state class. Bug: angleproject:8224 Change-Id: Ic9b253583e40fcc93ff37605b6b6e1deb55a6e55 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4631843 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Shahbaz Youssefi ec1f18db 2023-06-21T10:16:51 Vulkan: Remove ShaderVariableType and flatten info map With the conversion of the interface variable info map keys to SPIR-V ids, there is no longer a benefit to bucket resources by their type. This change removes this bucketing and flattens the map. Bug: angleproject:7220 Change-Id: If83cb02ca9e91f72dddb2deb7313fee40f9f06c3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4632577 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi bbcf54bc 2023-06-16T16:02:08 Vulkan: Refactor uniform/block binding duplication code Previously, resource binding assignment was done as such: ``` for each shader stage assign bindings to textures assign bindings to blocks assign bindings to images etc ``` To deduplicate bindings when the same resource was used in multiple stages, a map was used, keyed by the resource's name, to detect when an already visited resource is encountered in a future stage. This was both inefficient and unnecessarily complicated. With this change, resource binding assignment is done as such: ``` for each texture assign one binding to all shader stages for each block assign one binding to all shader stages for each image assign one binding to all shader stages etc ``` The aforementioned map is removed. Because the resource bindings are now changed, the rest of the code (which sets up the pipeline layout, updates descriptor sets, sets dynamic buffer offsets, etc) are all updated to follow the above pattern. As a result, nested loops are avoided and duplicate entries in the variable map are never visited. Bug: angleproject:7220 Change-Id: Iaff7b5f8b2bada8ac5078d21e5c790bf0d27aca7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4622011 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Charlie Lao e21ecd1b 2023-05-26T14:06:46 Vulkan: Add dirty bit processing for uniform buffer change When app calls glBufferData for the uniform buffer, we may end up reallocate the storage. This will set DIRTY_BIT_UNIFORM_BUFFER_BINDINGS on the context, but the exact uniform block index gets lost along the way. This CL sets mDirtyBits on the program for the corresponding block index and then changed vulkan backend to utilize the program's mDirtyBits and only update the buffer if it is dirty, instead of always update all uniform buffers even if only one of the buffer is dirty. In order to make this work, this CL also adds the reverse tracking from buffer binding to uniform blocks. Previously we already have the tracking of which buffer binding index is used for which buffer block index. This CL adds mUniformBlockBindingMasks which is an array of BitSets. Each array element tracks all the uniform block index that is using this buffer binding index (you can have the same buffer bound to multiple uniform block index). Then when a buffer binding index is dirty, that BitSet gets added into program's uniform block dirty bits. This CL and previous CL improves GfxBench gl_driver2_off score 1.8% (from average 6797 to average 6919) on pixel 7 pro. Bug: b/282194402 Change-Id: Ic5002643a5297907276fc9b20ca7d21af9bdc4fe Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4553136 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Roman Lavrov <romanl@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 2501903e 2023-05-31T11:59:36 Vulkan: Merge UpdateShader***Buffers into updateShaderBuffers Both UpdateShaderUniformBuffers() and DescriptorSetDescBuilder::updateShaderBuffers() walks the list of uniform blocks (or storage blocks). Some of the logic are the same and we are paying that overhead twice. In this CL, UpdateShaderStorageBuffers, UpdateShaderUniformBuffers and UpdateShaderAtomicCounterBuffers functions are merged into DescriptorSetDescBuilder::updateShaderBuffers and DescriptorSetDescBuilder::updateAtomicCounters. In order to handle the usage that same buffer used by multiple shader stages, a new variant of bufferRead() function is added that takes "const gl::ShaderBitSet" instead of single PipelineStage. This also paves way for next CL so that we can call updateOneShaderBuffer individually. Bug: b/282194402 Change-Id: I09d045d6295827b60bdb4c05df9333fe593fa40e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4574288 Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Shahbaz Youssefi b0e9bbd7 2023-05-31T14:23:40 Vulkan: Split features for dynamic state When a driver bug with dynamic state is encountered, it is hard to debug which dynamic state exactly is causing an issue, due to the current granularity of disabling all entire state from an extension. With this change, every dynamic state gets its own ANGLE feature, and can be toggled as necessary. Disabling the supportsExtendedDynamicState* features implicitly disables all dependent features. Bug: b/285124778 Bug: b/275210062 Bug: fuchsia:107106 Bug: angleproject:5906 Change-Id: Ic291279872df2d0eb58618ff364ab118bdcc4a9f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4577553 Reviewed-by: Cody Northrop <cnorthrop@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi ec7e0778 2023-05-26T22:11:49 Vulkan: Track the emulated texture buffer in command buffer ... instead of the original texture buffer, because the emulated one is the one that is actually being used. This removes the necessity to issue a hacky barrier after the emulation is done. Bug: angleproject:8128 Change-Id: Ibc812894204cc1b2c6147817674de44e9c7ba2f4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4571701 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Charlie Lao 01f629e3 2023-05-26T10:23:20 Vulkan: Remove the loop when calling updateShaderBuffers Right now there is a loop of getLinkedShaderStages when calling mShaderBuffersDescriptorDesc.updateShaderBuffers. The shaderType variable is only used to check if block is active or not, and get info.binding. They can be retrieved without loop of shaderType. This CL removes the loop so that we call DescriptorSetDescBuilder::updateShaderBuffers only once. Similar thing is applied to atomic counter buffer and images. Bug: b/282194402 Change-Id: I03f3b4a391e773dfc162877802a2f940311866b8 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4554625 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 297687c6 2023-05-18T17:27:54 Vulkan: Reduce CPU overhead for uniform buffer change One of the common usage pattern is change uniform buffers and draw. Right now every uniform buffer change goes into ShaderResource dirty code path that rebuilds entire ShaderResource cache key and descriptor set etc. This CL keeps SHaderResource dirty code path, and will be used when program changes or framebuffer changes. But uniform buffer change will go down its own code path that only update uniform buffer related state. This CL along with prior two CLs reduced asphalt_9 average frame time (with multi context hacked away) from 5.375 ms to 5.2594 ms (reduced 2.15%). Bug: b/282194402 Change-Id: Ibae2895663918ddc10bf13bc559f1483f94d2e11 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4528314 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Roman Lavrov <romanl@google.com>
Charlie Lao 9445fbbe 2023-05-16T10:43:55 Vulkan: Move mWriteDescriptors out of DescriptorSetDescBuilder DescriptorSetDescBuilder has two data structures: mWriteDescriptors, which keeps track of "layout" of descriptorSet, and mDescriptorInfos, which is the actual cache key for descriptorSet. mDescriptorInfos will be updated every time buffer or image changes. But mWriteDescriptors is immutable with buffer/image, it is static per program executable (with exception of InputAttachment). Right now whenever there is a buffer or image change, we call into DescriptorSetDescBuilder and update both mWriteDescriptors and mDescriptorInfos. This CL moves mWriteDescriptors out of DescriptorSetDescBuilder, and stores it in ProgramExecutableVk. To deal with InputAttachment variation, ContextVk makes a copy of mWriteDescriptors when program is bound, and then update inputAttachment with framebuffer information. This not only removes unnecessary update of mWriteDescriptors, but also removes the requirement that mShaderBuffersDescriptorDesc has to reset and rebuild as a whole (because mWriteDescriptors keeps mCurrentInfoIndex which gets incremented as we build). This allows us to do further optimization in future to do piece meal update of mDescriptorInfos with only the changed data. Bug: b/282194402 Change-Id: I443c7c3b85b7a2e2e93c68d40ea102533c43f76a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4540280 Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 2c836045 2023-05-18T12:00:41 Vulkan: Remove buffer/image tracking from DescriptorSetDescBuilder Right now DescriptorSetDescBuilder keeps an array of images and buffers as we build DescriptorSet cache key. Later on if we have a cache miss and allocated a new cache entry, we walk the array and tag the images and buffers with newSharedCacheKey. If we have a cache hit, the tracked images/buffers are unused and then cleared. This means the effort of keep track of these buffers are wasted when we have cache hit, which we expect to be most likely. This was initially implemented this way simply because of convenience, but there is really not a need to add another tracker for them, as they are readily available from context state anyway. This CL remove the tracking of images/buffers from DescriptorSetDescBuilder and replaced with context API to tag images and buffers with newSharedCacheKeywhen when we have a cache miss. Bug: b/282194402 Change-Id: I355c2fbabdfc573ce71c0a4281788c942d260271 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4539290 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Alexey Knyazev 934a25bc 2023-05-22T00:00:00 Vulkan: Implement EXT_depth_clamp Bug: angleproject:8047 Change-Id: I73244f5dcd6eeeb1889214ee3a611e4ecabbfe7e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4558744 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Alexey Knyazev <lexa.knyazev@gmail.com>
Jason Macnak 5ab2fa96 2023-05-12T08:38:19 Vulkan: Move texture QFOTs to syncState() ... (and getAttachmentRenderTarget() which is syncState()-like for deferred clear) so that QFOTs are not actually scheduled until first use. SurfaceFlinger optimistically creates EGL images and textures for AHBs in case it does need them in the future. However, the images and textures may go unused. Prior to this change, ANGLE would create pending QFOT barriers while importing AHBs into EGL images and GL textures. However, SurfaceFlinger may not be doing any "real work" (other than repeatedly creating and destroying EGL images and GL textures) which would result in the command buffers containing the QFOTs being flushed. This can result in a large build up of unreleased memory (as the VkDeviceMemory would still be kept alive by the reference from the unflushed QFOT command buffer) and lead to the low memory killer nuking processes. Bug: b/282075554 Test: cts -m CtsOpenGLTestCases -t android.opengl.cts.GLSurfaceViewTest Test: adb shell dumpsys SurfaceFlinger Test: angle_end2end_tests --gtest_filter=ImageTestES3.AHBImportReleaseStress/ES3_Vulkan Change-Id: I7776abb2c6f834e96aa3926c26e77c53352ee561 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4527437 Commit-Queue: Jason Macnak <natsu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 086b6c20 2023-05-04T15:55:31 Vulkan: Simplify TransformFeedback buffer tracking Right now the buffers that used by transform feedback is tracked by mCurrentTransformFeedbackBuffers, which is angle::FlatUnorderedSet. With the QueueSerial numbers, this can be much simplified. This CL removes mCurrentTransformFeedbackBuffers. It adds a new QueueSerial variable called mCurrentTransformFeedbackQueueSerial, and is inavlid if no active transform feedback. Then check if a buffer is used by transform feedback is as simple as check if it is written by mCurrentTransformFeedbackQueueSerial. Since buffers are already tracked by queue serial, so there is no new tracking needed. It is simply a check if the buffer contains mCurrentTransformFeedbackQueueSerial or not. Bug: b/280889890 Change-Id: I54bdc4a0cfc7194a12d2aa0abdc67a3211949024 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4507978 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Igor Nazarov 903d9fdf 2023-01-19T15:37:45 Vulkan: Implement ExternalFence for use in SyncHelperNativeFence `ExternalFence` allows concurrent usage in `CommandQueue` and `SyncHelperNativeFence` classes eliminating need of additional `vkQueueSubmit()` call. Waiting in `CommandQueue` on `QueueSerial` or `ResourceUse` will ensure corresponding state of the native FD (because `CommandQueue` will wait on the same FD instead of some other fence). After this change there will be only single `vkQueueSubmit()` call from the `SyncHelperNativeFence::initializeWithFd()` method. This CL and the follow-up is sufficient to fix the bugs below. Bug: angleproject:8115 Bug: angleproject:8117 Test: angle_end2end_tests --gtest_filter=EGLSyncTest.AndroidNativeFence_ExternalFenceWaitVVLBug* Change-Id: Ic562ecc71a95203454a1dc438589a13bcf3bff7f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4392879 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>