src/libANGLE/renderer/vulkan/vk_helpers.cpp


Log

Author Commit Date CI Message
Charlie Lao d3aaf795 2024-04-05T15:57:38 Vulkan: Early out ImageHelper::updateLayoutAndBarrier when possible If one image is attached to more than one attach points, when render pass closes we end up calling ImageHelper::updateLayoutAndBarrier multiple times. The first one is required since it does the layout transition etc. But the second call is unnecessarily inserting memory barriers. This is optimization itself, but will also fix the other bigger problem when we start using VkEvent instead of PipelineBarrier: we may end up waiting for an event that has not been set (since setEvent gets called after we end render pass but waitEvent is before render pass. Calling this sequence twice on the same image for the same render pass means second waitEvent is called before setEvent). Bug: b/333391804 Change-Id: Ic7b409c71806e63cb56c25e10b0bd0bfc9f6086d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5431033 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Shahbaz Youssefi 13829f20 2024-03-26T23:03:12 Vulkan: Optimize depth/stencil resolve with glBlitFramebuffer Like color resolve, depth/stencil resolve is now also possibly done by modifying the render pass and attaching a depth/stencil resolve attachment. Bug: angleproject:7551 Change-Id: I045e3875e24006d2473a55b6c3856dd768fe8b84 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5398004 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 103c1b53 2024-03-29T14:37:23 Vulkan: Drop MSRTT emulation dependency on independentResolveNone Usage of VK_RESOLVE_MODE_NONE was removed in [1], but dependency to this property was accidentally added in [2]. [1]: https://chromium-review.googlesource.com/c/angle/angle/+/2743666 [2]: https://chromium-review.googlesource.com/c/angle/angle/+/3353895. Bug: angleproject:4836 Change-Id: I25028b5d343686edd794acdac3714c4a6cb5fa17 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5407073 Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi b559efa8 2024-03-26T22:02:41 Vulkan: Allow depth and stencil resolve to be separately added In preparation for optimizing resolve through glBlitFramebuffer for depth/stencil attachments. Bug: angleproject:7551 Change-Id: I57650d82c0cc6e56f44591eadfc42ac794cfef09 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5399140 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi 9475ac40 2023-11-15T10:25:06 Vulkan: Make efficient MSAA resolve possible Prior to this change, using a resolve attachment to implement resolve through glBlitFramebuffer was done by temporarily modifying the source FramebufferVk's framebuffer description. This caused a good deal of complexity; enough to require the render pass to be immediately closed after this optimization. The downsides to this are: - Only one attachment can be efficiently resolved - There is no chance for the MSAA attachment to be invalidated In this change, resolve attachments that are added because of glBlitFramebuffer are stored in the command buffer, with the FramebufferVk completely oblivious to them. When the render pass is closed, either the FramebufferVk's original framebuffer object is used (if no resolve attachments are added) or a temporary one is created to include those resolve attachments. With the above method, the render pass is able to accumulate many resolve attachments as well as have its MSAA attachments be invalidated before it is flushed. For a FramebufferVk that is resolved in this way, there used to be two framebuffers created each time and thrown away as the code alternated between starting a render pass without a resolve attachment and then closing with one. With this change, there is now one framebuffer (without resolve attachments) that is cached in FramebufferVk (and is not recreated every time), and only the framebuffer with resolve attachments is recreated every time. Ultimatley, when VK_KHR_dynamic_rendering is implemented in ANGLE, there would be no framebuffers to create and destroy, and this change paves the way for that support too. WindowSurfaceVk framebuffers are still imagefull. Making them imageless adds unnecessary complication with no benefit. ----------------- To achieve efficient MSAA rendering on tiling hardware, applications should do the following: ``` glBindFramebuffer(GL_FRAMEBUFFER, msaaFBO); // Clear the framebuffer to avoid a load // Or invalidate, if not needed to load: // glInvalidateFramebuffer(GL_DRAW_FRAMEBUFFER, ...); glClear(...); // Draw calls // Resolve into the single sampled framebuffer glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFBO); glBlitFramebuffer(...); // Immediately discard the contents of the MSAA buffer, to avoid store glInvalidateFramebuffer(GL_READ_FRAMEBUFFER, ...); ``` The above would translate to the following Vulkan render pass: - MSAA LOAD_OP_CLEAR/DONT_CARE - MSAA STORE_OP_DONT_CARE - Resolve LOAD_OP_DONT_CARE - Resolve STORE_OP_STORE This makes sure the MSAA data doesn't leave the tile memory and greatly reduces bandwidth usage. Once anglebug.com/4892 is fixed, this would also allow the MSAA image to never be allocated either. Bug: angleproject:7551 Bug: angleproject:8625 Change-Id: Ia9f4d20863d76a013d8495033f95c7b39f77e062 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5388492 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 914fe61b 2024-03-15T13:20:49 Vulkan: Rename RendererVk.* to vk_renderer.* Done in a separate CL from the move to namespace vk to avoid possible rebase-time confusion with the file name change. Bug: angleproject:8564 Change-Id: Ibab79029834b88514d4466a7a4c076b1352bc450 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5370107 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Mohan Maiya d2cef82a 2024-03-18T18:15:11 Vulkan: Use fragment shading rate access flags Image memory barrier for a fragment shading rate attachment needs to use VK_ACCESS_FRAGMENT_SHADING_RATE_ATTACHMENT_READ_BIT_KHR as the access flag instead of VK_ACCESS_SHADER_READ_BIT Bug: angleproject:8484 Change-Id: I3316f1a5965ed3866e683494ee4f8df0b208d92c Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5379262 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 60aaf4a0 2024-03-14T12:58:56 Vulkan: Move renderer to namespace vk This class is agnostic of EGL. This change moves it to namespace vk for use with the OpenCL implementation Bug: angleproject:8564 Change-Id: I57f7807d6af8b3d5d7f8efbaf8b5d537a930f881 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5371324 Reviewed-by: Austin Annestrand <a.annestrand@samsung.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Roman Lavrov d4d34781 2024-03-14T13:03:55 Multisampling support check: sampleCounts > 1 and createFlags At least two drivers are returning VK_SUCCESS from vkGetPhysicalDeviceImageFormatProperties2 but also set sampleCounts to 1 which supposedly means no MSRTT Qualcomm reference device driver fails vkCreateImageView when enabling the multisampling bit one cubemaps which have sampleCounts == 1 Additionally, * include vk::GetMinimalImageCreateFlags() in createFlags - we don't get the cubemap bit without that * check both the image format and the additional view format (linear+srgb) as we set both of these when creating the image This fixes a bunch of cubemap and 3D tests on Qualcomm reference device Bug: b/329286011 Change-Id: I6d3ddea0cd997cf37b503050063f42d69723bd50 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5372826 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>
Mohan Maiya 91ddf851 2024-03-03T10:57:22 Vulkan: support QCOM foveated rendering extensions Add support for foveated rendering in the vulkan backend. This is done by leveraging the VK_KHR_fragment_shading_rate extension. Bug: angleproject:8484 Change-Id: I0d01d07583f710b2302ea07b19c9d113c73bfe41 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5269907 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Amirali Abdolrashidi 2ee295b4 2024-02-15T11:27:39 Vulkan: Add per-level image update tracker * Add a per-level image write tracker to ImageHelper. * It tracks the updates scheduled for different parts of the image. Within each level, it also tracks different layers, currently up to 64. * kMaxParallelSubresourceUpload renamed to kMaxParallelLayerWrites; moved to vk_helper header. * It is reset when a barrier is issued for the image. * Modified ImageHelper::recordWriteBarrier(). * Added isWriteBarrierNecessary(). * Now it checks the added writes for the image. It will no longer issue a barrier if the image is in the same layout and there is no write to a part of the image to which was previously written. * Added ReadImageSubresources to CommandBufferAccess. * It is used for layouts that allow both reading and writing to the image (including self-copy): * TransferSrcDst (used in CopyImageSubData) * ComputeShaderWrite (used in compute-based mipmap generation) * CommandBufferImageWrite -> CommandBufferImageSubresourceAccess * Updated onImageSelfCopy() args to include read subresource data. * Improves gpu_time for TextureUploadETC2TranscodingBenchmark perf test * Windows/NVIDIA: ~180609 ns -> ~62669 ns (~2.88x) * Linux/NVIDIA: ~157283 ns -> ~93360 ns (~1.68x) * Windows/Intel: ~72297 ns -> ~57153 ns (~1.27x) * Added a test to show that self-copy for a write-after-read works. * ArraySelfCopyImageSubDataWithWriteAfterRead * (ArraySelfCopyImageSubData covers RAW hazards; renamed) Bug: b/308455694 Change-Id: I5cef296d991ce6ec02792edc3ffc5cc4994831e1 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5301855 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Amirali Abdolrashidi 3c517e45 2024-02-14T14:26:42 Vulkan: Process ClearEmulatedChannels update first * When going through the level updates in flushStagedUpdates(), the ClearEmulatedChannels updates are expected to be before the rest. In addition, there can be only one such update in the level update list. Therefore, now they are processed and applied before the rest of the updates. By doing so, if this is the only update for the image, an unnecessary layout transition can be avoided. * Added flushStagedClearEmulatedChannelsUpdates(). * Added flushStagedUpdatesImpl() for the rest of the update types. * Used clipLevelToUpdateListUpperLimit() to limit the flush loops to the number of levels in subresource update list. * Added unit test to ensure updates after ClearEmulatedChannels are not ignored. * ImageTestES3.IncompleteRGBXAHBImportThenUploadThenEnd * The test contains a ClearEmulatedChannels followed by an image update. If the latter is ignored in this test, there is a failure during teardown due to orphanNonEmptyBufferBlock when destroying the buffer that contains the update. Bug: b/308455694 Change-Id: I53c73acb60a9c5440548886cde913112a664402d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5297317 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Amirali Abdolrashidi 38cc4cf0 2024-02-14T16:36:08 Vulkan: Update flushStagedUpdate to use switchcase * The if-else statements to check update type have been replaced with switch-case statements for more clarity. Bug: b/308455694 Change-Id: I3d5b77e697d342e77596fd177b2527ece4d228ed Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5297547 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Gowtham Tammana bcf814fd 2024-02-02T10:30:34 Vulkan: Constrain the dependency on ContextVk in BufferHelper Make the BufferHelper interface be not dependent on ContextVk state. This makes the interface to be suitable for implementation of other APIs with Vulkan backend. Any dependency on ContextVk is made explicit and handled in ContextVk. Bug: angleproject:8544 Change-Id: I8b285f54c8758a26dd7edf27b1371f9afcf7e241 Signed-off-by: Gowtham Tammana <g.tammana@samsung.com> Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5303573 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Geoff Lang <geofflang@chromium.org>
Shahbaz Youssefi dbc6bd9d 2024-02-12T14:07:49 Reland "Vulkan: Fix alignment issues with SecondaryCommandBuffer" This is a reland of commit e53270c9ca1afe393d6d7d0359e81cf6755b6ca5 Original change's description: > Vulkan: Fix alignment issues with SecondaryCommandBuffer > > This solves undefined behaviour on 64-bit systems. This inflates the > size of a few commands, but most commands either already did align to 8 > bytes or could be aligned to 8 bytes with a few tweaks. > > Bug: angleproject:7852 > Change-Id: Ie61976d5bf8df7790acd95c0e15d4c79402622a1 > Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5288636 > Reviewed-by: Charlie Lao <cclao@google.com> > Reviewed-by: Yuxin Hu <yuxinhu@google.com> > Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Bug: angleproject:7852 Change-Id: Ie206e66fc21c5db7c9e67eb478d9cddada5db8e0 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5296376 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Yuly Novikov 243f8ad9 2024-02-14T11:44:58 Revert "Vulkan: Fix alignment issues with SecondaryCommandBuffer" This reverts commit e53270c9ca1afe393d6d7d0359e81cf6755b6ca5. Reason for revert: breaks x86 Android build: https://chromium-review.googlesource.com/c/chromium/src/+/5293321 https://ci.chromium.org/ui/p/chromium/builders/try/android-x86-rel/144329/overview Original change's description: > Vulkan: Fix alignment issues with SecondaryCommandBuffer > > This solves undefined behaviour on 64-bit systems. This inflates the > size of a few commands, but most commands either already did align to 8 > bytes or could be aligned to 8 bytes with a few tweaks. > > Bug: angleproject:7852 > Change-Id: Ie61976d5bf8df7790acd95c0e15d4c79402622a1 > Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5288636 > Reviewed-by: Charlie Lao <cclao@google.com> > Reviewed-by: Yuxin Hu <yuxinhu@google.com> > Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Bug: angleproject:7852 Change-Id: Id9c7a94ccc12816bc9e8c3803bd940550d9f7953 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5295854 Auto-Submit: Yuly Novikov <ynovikov@chromium.org> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Shahbaz Youssefi e53270c9 2024-02-12T14:07:49 Vulkan: Fix alignment issues with SecondaryCommandBuffer This solves undefined behaviour on 64-bit systems. This inflates the size of a few commands, but most commands either already did align to 8 bytes or could be aligned to 8 bytes with a few tweaks. Bug: angleproject:7852 Change-Id: Ie61976d5bf8df7790acd95c0e15d4c79402622a1 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5288636 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Roman Lavrov c603a4f1 2024-02-08T10:53:27 Don't perf warn about ETC1->ETC2 emulation as it is efficient Format is forwards compatible: https://crsrc.org/c/third_party/angle/src/libANGLE/renderer/gl/formatutilsgl.cpp;drc=21f16cb16333802dfa942d67cac59885f904301d;l=701 Added hasInefficientlyEmulatedImageFormat() helper Bug: b/302115557 Change-Id: Ibc82c27ecf4e3afbfaac52cb45bdda776c50b4b3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5278562 Commit-Queue: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Amirali Abdolrashidi 5a061558 2024-01-31T13:05:59 Vulkan: Update dynamic buffer size policy When allocating a dynamic buffer, it is checked if the new data can fit in an existing allocation. However, if the size of the new data exceeds that of the current buffer, a new one is allocated. To avoid using too much memory, if the data size is less than a threshold (a fraction of the current buffer size, a smaller size will be used for the new buffer. However, with a specific pattern for the new sizes, combined with the threshold value, there could be many allocations and deallocations, which can affect the performance. In this CL, the policy to update the dynamic buffer size is updated to avoid this issue. * Instead of using a smaller buffer when the required size is less than 1/4 of the current buffer size, it is done when the average required size is less than 1/8 of the current size. * Added a decaying average required size for the DynamicBuffer object. * mSizeInRecentHistory * For each new buffer allocation, the new required size is used with the average size to calculate the new average. * For each calculation, kDecayCoeffPercent is used as the weight for the existing average, and the rest is the new required size, plus rounding. * kDecayCoeffPercent is currently set to 20%. * sizeIgnoringHistory renamed to minRequiredBlockSize for more clarity. Bug: b/322216767 Change-Id: Idcabbbe50f656910fe2103925e4d6d8602ca3425 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5254218 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Chris Forbes f5f3304a 2024-02-02T16:27:38 Vulkan: Simplify handling of YUV filtering support When the requested filtering mode changes, we need to consider whether it is actually supported by the Vulkan driver. Now that we support renderable YUV textures, there are now three interesting cases: 1) The texture has a VkFormat, and so filtering support can be queried from GPDFP, as was already done. 2) The texture is imported from an opaque AHB using an external format, that format is renderable, and so we have assigned one of the EXTERNALn angle formats. This was *not* covered properly, and would lead to VVL errors or UB. 3) The texture is imported from an opaque AHB using an external format, and we have not assigned an EXTERNALn angle format to it, because the format is not renderable, or the Vulkan driver is missing the external format resolve functionality; In this case the angle format is NONE. This was similarly *not* covered properly, although the code did attempt to protect itself from querying the capabilities of format NONE. VVL errors and UB were still possible. To most simply cover all of these cases, capture whether the image has the VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_LINEAR_FILTER feature upfront, and forget about format lookups in the internals of the YcbcrConversionDesc. Bug: b/315387961 Change-Id: Ie140293d52c2b88bf06ef19bc54bb1c95927b8ce Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5259719 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Cody Northrop <cnorthrop@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya 5c2fc02b 2024-01-29T16:49:28 Vulkan: Bugfix in ImageFormat FixedVector::operator[] does not update FixedVector::mSize. Need to call FixedVector::push_back(...) instead. Bug: angleproject:7553 Change-Id: I544a68276d1635957a27d21c222f50f71d35a609 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5246088 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Chris Forbes 6367f541 2024-01-25T10:16:04 Vulkan: supply YcbcrConversionDesc earlier Previously, the AHB import path would allow ImageHelper to build a bogus YcbcrConversionDesc (in initExternal) and then later overwrite it with what it wanted. The intermediate state was not necessarily valid, and could cause assertion failures and VVL errors. Instead, have ImageHelper clients provide the conversion they want upfront. In the non-external case, build an appropriate conversion for formats which need them, before delegating to initExternal. Bug: b/315387961 Change-Id: Icc8f561bb2de0289ceec56d41978b8c4651a47a2 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5232769 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Chris Forbes <chrisforbes@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi fad2adb2 2024-01-12T14:43:33 Vulkan: Fix importing external object and using as storage image The create flags used to create the imported object was 0. Later, when the texture was used as a storage image, TextureVk::syncState would recreate the image (losing connection to external object). This change makes sure the create flags include all the necessary create flags such that the texture can be correctly used as storage image. Bug: angleproject:8464 Change-Id: I6587b53b1c2819a11dec8f2d5a3a30c889a4c63f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5194064 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi ed2a8ef8 2023-12-20T00:06:10 Vulkan: Defer QFOT when acquiring texture with GL_NONE layout Instead of issuing a queue family ownership transfer with the UNDEFINED layout (and then hack its dst layout to be GENERAL), this change simply lets the queue family be changed when the image is next accessed (at which point a layout transition is necessary anyway). Bug: angleproject:8464 Change-Id: Iab36af0c641bd04029bdc0d9097e766e8a0f4145 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5138657 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi 4b356c39 2023-08-17T10:49:19 Vulkan: Drop support for VMA 2.0 Since ANGLE and Chromium were last updated to use VMA 3.0, there have been no breaking issues so far. * ANGLE update: https://crrev.com/c/4777337 * Chromium update: https://crrev.com/c/4911597 Therefore, the support for the old VMA (2.3) can now be removed. * Removed ANGLE_VMA_VERSION from the build files, since they are no longer required. * Removed the VMA-related guards in the code as well. * Removed the flags and thresholds for the buddy algorithm, which were used in VMA 2.3. * Share group buffer pools is no longer a map. Bug: b/303290680 Change-Id: Ic2b29e8f95ca5c941b297b20442c5bad4b8f52e3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4791667 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 0aaa8de9 2023-12-19T23:23:40 Vulkan: Fix memory tracking vs external texture acquire ... with a layout of GL_NONE. Bug: angleproject:8464 Change-Id: I94690c5693c5bcb6d510e4a27097206f0da58a41 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5138656 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 40f4de8f 2023-12-15T10:17:32 Vulkan: Ensure we use cached memory for readPixels stagingBuffer Previous CL crrev.com/c/5112759 does not solve the performance issue for ChromeOS. The reason is that on more recent intel GPU, there is no hostVisibleCachedCoherent heap. When we allocate staging buffer, we specify CachedCoherent as the preferredFlags instead of requiredFlags. This means we still end up getting UncachedCoherent since VMA tries to respect coherent bits as first priority. This CL Changes CachedCoherent to CachedPreferCoherent, and made Cached as required bit, thus ensures the memory allocated is cached. Since coherent bit may not be honored, thus we have to call invalidate/flush (which underline implementation will check the bit and early out if no need). Somehow on ARM GPU using cachedNonCoherent staging buffer causing many test failures, even though we do call invalidate() after allocation, and tests pass on all other GPUs. It almost indicates ARM driver have a bug with invalidate() that it is not doing expected. But before I can be sure and fixed, I added feature bit to keep ARM the old behavior, which uses UnCached memory for readPixels which should suffer the performance as well. Bug: b/315836169 Bug: b/310701311 Change-Id: I1eec6105ce74275faa893b0206be8470f0cde72f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5122318 Commit-Queue: Charlie Lao <cclao@google.com> Auto-Submit: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao de591cff 2023-12-11T13:15:30 Vulkan: Add CachedCoherent staging buffer Right now if we allocate a coherent staging buffer, it always uncached. I believe the reason it picked uncached is that most usage for staging buffer is data flow from CPU to GPU. CPU only sequentially write into staging buffer. Uncached may has better performance here due to write combined. But this performs horrible if CPU ever read from it. This CL adds a CachedCoherent staging buffer and let staging buffer use that for coherent memory. UncachedCoherent is currently not used, but I still kept here in case we find regression for certain type of usage. Bug: b/315836169 Change-Id: Ica331914c1f4729baa9d2eab048dc3099a2887b5 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5112759 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao d0eb968d 2023-12-08T16:11:46 Vulkan: Fix the AHB leak for AHB backed buffer object For client buffer backed OpenGL buffer object, we call InitAndroidExternalMemory which calls AHB acquire. But when buffer object is released/destroyed, we never call ReleaseAndroidExternalMemory, which end up leaking AHB. Bug: b/314791770 Change-Id: I693c74213e73008497a6dfeca93ea62e84c71352 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5106599 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Charlie Lao <cclao@google.com>
Shahbaz Youssefi e9d5f13e 2023-12-04T22:36:41 Vulkan: More trace points in the readpixels path Occassionally we hit a bad path in this case, this change makes it clearer from the traces which bad path is hit. Bug: b/310701311 Change-Id: Ic674d6396b0e88f1a1db3ded7efe195fb7397135 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5087207 Reviewed-by: Charlie Lao <cclao@google.com> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Steven Noonan 99b077b7 2023-11-26T08:24:18 Vulkan: fix data clobbering with AllocateNonZeroMemory The offset of the suballocation was not being used when initializing memory, so it was possible to overwrite the start of an existing buffer with garbage. Bug: angleproject:8427 Change-Id: I8205068a173dc4342894c6c49ee5fa9c4a8a255a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5060776 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Steven Noonan <steven@uplinklabs.net>
Mohan Maiya 8ae36a93 2023-11-14T10:11:51 Vulkan: Bugfix in isFastUnpackPossible Disallow fast unpack when there is a mismatch between the actual texture format and intended buffer (PBO or client buffer) format. Bug: angleproject:3777 Test: Texture2DTestES3.UnpackCompatibleFormatButDifferentType* Change-Id: I9ea9d9cdd5e1391acebb3d75d69437e27cfa90df Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5029504 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: mohan maiya <m.maiya@samsung.com>
Shahbaz Youssefi 3a6b385f 2023-11-09T15:48:48 Vulkan: Fix depth/stencil texture copy Bug: angleproject:7289 Change-Id: Icde8a26e855e95a6c0a1e506d2435e981adc6f28 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5018798 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi d896fab8 2023-11-09T15:03:05 Vulkan: Fix texture self-copy A new layout is introduced to support self-copy. Bug: angleproject:7289 Change-Id: Ib914c433d55b9a79cfeb7a91f8a2b8680824d473 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5018797 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi 9a5d75de 2023-10-30T11:59:19 Vulkan: Fix incompatible redefinition of cube faces The TextureVk::mRedefinedLevels bitmask tracked which levels are incompatibly redefined, greatly reducing the complexity of dealing with GL's mutable textures. It did not however take into account the fact that GL allows each cubemap face to be separately redefined (unlike 2D arrays, where all layers are defined together). This change turns the bitmask into an array of bitmasks. Previously, a single bit represented whether the level is incompatibly redefined. Now, elements of the array track the same information for each cube face. For non-cube-map textures, only element 0 is used. Bug: chromium:1494664 Change-Id: I69568d3da2391796bf5f01505861fee42c6c8924 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4986289 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Hailin Zhang 5dd0c842 2023-10-24T13:48:29 Vulkan: improve memory type mismatch issue. for dynamic buffer usage, the memoryTypeIndex returned from findMemoryTypeIndexForBufferInfo not used. if we add more flags like VK_MEMORY_PROPERTY_HOST_CACHED_BIT at alloction. the actual memory type allocated may not have such flag. Bug: b/306763053 Change-Id: I778e51fdd5ce0bc0810a965c45b5763a155fc391 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4973574 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Hailin Zhang <hailinzhang@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Igor Nazarov 4f611a1f 2023-10-24T20:56:19 Vulkan: Remove RendererVk::collectAllocationGarbage Instead of adding separate method overload `DestroyGarbage()` method. This will avoid checking `mUse` and creating garbage list twice. Bug: b/218891184 Change-Id: If56ffe72a639021b1fd37feb02ebb91f62ad0933 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4974318 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi ba65feb4 2023-10-18T17:33:38 Vulkan: Limit mutable texture flush to one update In case there are many updates for a mutable texture, flushing it preemptively can reduce performance, especially if it is done repeatedly. * Added getLevelUpdateCount() to ImageHelper. * Previous mutable textures will now be flushed only if they have exactly one update per mip level/cubemap face (if defined). * This means that mutable textures with no data will also not be flushed. * Added unit tests for single-level texture flushing and situations with no updates or more than one update. Bug: b/285613719 Change-Id: I1592ecf502051a55ebfbb7fcd22577c9ce87bf43 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4953847 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao d2de7451 2023-10-19T14:10:44 Vulkan: Fix VK_android_external_format_resolve VVL error part 5 Fix assertion in RendererVk::getFormatFeatureBits(). When formatID is external format, we can not use vkGetPhysicalDeviceFormatProperties to get the formatFeature (since VkFormat is undefined). To fix this, we keep the formatFetaure that returned from AHB in the ExternalYuvFormatInfo and use that in getFormatFeatureBits() if it is external format. This also fixes the VVL error VUID-VkImageCreateInfo-pNext-02396: The Vulkan spec states: If the pNext chain includes a VkExternalFormatANDROID structure whose externalFormat member is not 0, flags must not include VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT Bug: b/223456677 Change-Id: I625c2bf4fe534fa206918b16772ac3ac7c6fa79a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4956117 Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 4d7fc442 2023-10-18T12:49:06 Vulkan: Fix VK_android_external_format_resolve VVL error part 3 VUID-VkRenderPassAttachmentBeginInfo-pAttachments-parameter: The Vulkan spec states: If attachmentCount is not 0, pAttachments must be a valid pointer to an array of attachmentCount valid VkImageView handles. The bug here is that when nullColorAttachmentWithExternalFormatResolve is true, there is no color attachment, but the RenderPassDesc still appears having a color attachment because we need to store the formatID in it. This CL changes to use mFramebuffer.getImageViews().size() instead of mRenderPassDesc.attachmentCount() which is more correct anyway. Bug: b/223456677 Change-Id: I0f0947f0c642bac9cd18a80525b92c62ef0723ec Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4952969 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Chris Forbes <chrisforbes@google.com>
Charlie Lao 58ffa778 2023-10-11T09:41:23 Vulkan: Implement YUV_TARGET use VK_ANDROID_external_format_resolve This implements EXT_YUV_TARGET using VK_ANDROID_external_format_resolve extension. This CL is based on Chris Forbes's CL on android gerrit. Bug: b/223456677 Change-Id: Ieb6970a0787b0c2a72a76b208695a678d2c79e80 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4857459 Reviewed-by: Chris Forbes <chrisforbes@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Hailin Zhang 493ca47c 2023-10-13T23:22:34 Vulkan: remove unused image view creation remove unused image view creation. Bug: b/303708135 Change-Id: I5e20788a6f3e042db04d739144236c065c407209 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4940252 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Hailin Zhang <hailinzhang@google.com>
Alexey Knyazev dc8c053e 2023-10-12T00:00:00 Split row and depth pitch computation for compressed formats Besides doing extra operations, computeCompressedImageSize cannot be used for computing pitch values for formats that have minimum block layout requirements, such as PVRTC1. Fixed: angleproject:8375 Change-Id: Id276e8cf723f0bb99b6f4a9b20d6d84e4840f6d7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4935568 Commit-Queue: Alexey Knyazev <lexa.knyazev@gmail.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Hailin Zhang 76608554 2023-10-02T15:07:45 Vulkan: use cpu transcoding for small texture size. Bug: b/250042517 Change-Id: I9a70fb7d4823d10b09f498bfc01b5384951e2ce4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4908660 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Hailin Zhang <hailinzhang@google.com>
Charlie Lao 2608c622 2023-10-06T13:32:49 Vulkan: Refactor SharedGarbageList into templated class This CL mostly involves non-functional changes to prepare for next CL. No behavior change is expected. This CL wraps the garbage list into its own templated class which maintains std::queue and tracks number bytes in the queue etc. This CL also renames SharedBufferSuballocationGarbageList to BufferSuballocationGarbageList to reduce verbosity a bit. This CL deleted GarbageAndQueueSerial and GarbageQueue since they are no longer being used. This renames vk::GarbageList to vk::GarbageObjects to reduce name confusion with SharedGarbageList. Bug: b/302739073 Change-Id: I7370c147847ffe69ad8aa3b48251d8b5762f97f9 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4919816 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Hailin Zhang <hailinzhang@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 7f5143c2 2023-10-02T15:38:15 Vulkan: Notify VAO when VBO's mBufferWithUserSize changed. When buffer robust access is enabled, and bufferData is called with different size and we end up reusing the underline storage, we will have to recreate VkBuffer with user's size, and driver is relying on VkBuffer's size to implement robust access. The bug here is that we notify VAO when storage changes. But when storage is reused and we have dedicated VkBufer with user size and that VkBuffer changed, we were not notifying the VAO. This CL adds that notification so that VAO gets notified and dirty bits processed and its cache of VkBuffer gets updated Bug: chromium:1488055 Bug: b/303138134 Change-Id: Ie693c92c2edde9a22a41a25f5bde493397550d95 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4906568 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi d919870f 2023-09-14T16:00:07 Vulkan: Do host image copy without holding the share group lock When an application uploads texture data such as with `glTexSubImage2D`, the share group lock is being hold while the data is being copied. Without VK_EXT_host_image_copy, this is a copy to a staging buffer, which may itself be expensive. With VK_EXT_host_image_copy, the cost of the copy is higher and so the lock would be held for a longer duration. This is particularly harmful to applications that spawn a separate thread for texture uploads (as the main thread is unable to make GL calls). This change moves the actual copy call to the tail of the call after the share group lock has been released. As a result: - The upload thread may be a bit slower, but - The copy does not interfere with the main thread, and - The copy does not interfere with the GPU's rendering work. As a result, games that load content seamlessly during gameplay should experience less stutter during texture uploads. Bug: angleproject:8341 Change-Id: I818c4389d4bf828847578da89414623e4b5e844e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4864290 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Roman Lavrov 1579eed5 2023-09-25T18:21:39 BufferHelperPointerVector -> BufferHelperQueue (deque) DynamicBuffer.mBufferFreeList is a vector<unique_ptr>. DynamicBuffer::allocate erases an element from the front, which is what deque is exactly good for. Changed other uses too (mBufferFreeList, VertexArrayVk.mCachedStreamIndexBuffers) per Charlie's recommendation. Yields a significant power improvement in words_crush trace. Bug: b/302020992 Change-Id: I1b0242481404b8fc2cfdc27611252308b922f4d7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4886367 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>
Charlie Lao be8739f2 2023-09-22T14:36:45 Vulkan: Fix StatInfo in vk_mem_alloc_wrapper.h to match VMA Right now we are defining our own StatInfo structure in vk_mem_alloc_wrapper.h to avoid inclusion of VMA header directly in other ANGLE code. This caused this struct no longer matches VMA's structure since VMA 3.0 switch. For quick fix, this CL just update StatInfo to match VMA 3.0 define. Bug: b/301653706 Change-Id: Ic510c362f30d9296a13964e6ba9c617e80e49ceb Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4888625 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi 91ef1f3c 2023-09-08T16:39:53 Move buffer suballocation callers to ContextVk * Moved the following functions from BufferHelper to ContextVk. * initBufferForBufferCopy() * initBufferForImageCopy() * initBufferForVertexConversion() Bug: b/280304441 Change-Id: I890f4396b00b0c20feb44f0ad113c55924ce1014 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4854760 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi a1f52f1b 2023-09-07T14:44:24 Vulkan: Flush pending image garbage more often * Added a counter to the context object to keep track of the size of the pending image garbage: mEstimatedPendingImageGarbageSize. * Modified hasExcessPendingGarbage() to use the sum of the size of the image and and suballocation garbage. * RendererVk::calculatePendingGarbageSizeLimit() provides the limit. * Currently the limit is based on the available heap sizes. It will use a fraction of the largest memory heap size. * The portion is currently kGarbageSizeLimitCoefficient = 0.2f. * Unskipped the test "TextureDataInLoopManyTimes", which was failing on Android devices. Bug: b/280304441 Change-Id: Ibcced1d118ea8a1f347028b62d29cfbd9e38e8c0 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4851252 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Charlie Lao 1b450b92 2023-09-15T11:07:25 Vulkan: Fix buffer storage reuse bug when robustAccess is enabled There is an optimization in vulkan backend that when the bufferData is called and current storage size is big enough for new bufferData call, we just reuse the storage. Mean while, when hasRobustAccess() is true, we must use the VkBuffer with the exact user size that glBufferData call provides so that driver can set proper access boundary. In order to satisfy both requirement, if robust resource access is enabled, we create a separate VkBuffer with the exact user provided size but bind to the same memory. There is a bug here that if robustAccess is true, this buffer of user provided size is not been recreated when storage is reused but with different user size (both has same allocation size). This causes we keep using the smaller VkBuffer and subsequently causes missing triangles. This CL clears mBufferWithUserSize when size changes and storage is reused. The other bug here is that previously we are checking isRobustResourceInitEnabled, which is incorrect. We should check hasRobustAccess. This appears works for chrome possibly due to both are enabled. This CL switches it to check hasRobustAccess. This CL also renames mBufferForVertexArray to mBufferWithUserSize to reflect what its true meaning. Bug: chromium:1476475 Change-Id: I843cc3a705f8a582a97bc0307f03aa1eb9fad3ff Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4864003 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi b2e6a196 2023-09-11T15:27:20 Vulkan: Use VK_EXT_host_image_copy for texture uploads Of all the scenarios where host image copy may be useful, this is likely the most common case. There are numerous conditions for when the copy may be done on the host: - The image format must support it, - It must be unused by the GPU, - It must not have any pending updates (this can potentially be mitigated if needed), and - It must be in a host-copyable layout. However, many texture uploads are done: - To compressed formats, where support is highly likely, - On init, where: - the image is never previously used, - the image has no previous uploads - the image is in the UNDEFINED layout which satisfies the conditions above. As a result of this change, when the upload is done on the host, creation of a temp buffer is avoided which greatly reduces memory pressure (specially during app loading which is when most texture data is uploaded) and may even improve performance (due to avoiding a double copy). Testing the first 3 frames of the following traces with a SwiftShader implementation shows the amount of buffer allocated for staged uploads changed as such: - Black Desert: 185MB -> 65MB - Genshin Impact: 125MB -> 12MB - Asphalt 9: 138MB -> 0MB Bug: angleproject:8341 Change-Id: Id71dcc4a7a0f8b67960d2d283fe9d19ce7429a03 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4856676 Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Amirali Abdolrashidi e7418836 2023-08-16T14:25:52 Vulkan: Add context flushing as OOM fallback * As a new fallback for out-of-memory errors, if an allocation results in device OOM, the context is flushed and the allocation is retried. * Functions related to buffer/image allocations now return a VkResult value instead of angle::Result, which will be bubbled up to a higher level for safer handling. * The OOM is no longer handled at the level where the allocation happens, but is moved up to the context. * Added two functions to ContextVk for allocating memory for images and buffer suballocations, which also include the fallback options. * initBufferAllocation(): Uses BufferHelper::initSuballocation() * initImageAllocation(): Uses ImageHelper::initMemory() * Moved initNonZeroMemory() out of the following functions: * BufferHelper::initSuballocation() * Moved to ContextVk::initBufferAllocation(). * ImageHelper::initMemory() * Moved to ContextVk::initImageAllocation(). * Also moved to new function: ImageHelper::initMemoryAndNonZeroFillIfNeeded(). This function replaced the rest of initMemory() usages outside initImageAllocation(). * New macros for memory allocation * VK_RESULT_TRY() * If the output of the command inside it is not VK_SUCCESS, it will return with the error result from the command. * VK_RESULT_CHECK() * If the output of the command inside it is not VK_SUCCESS, it will return with the input error. * Added a test in which allocation would fail due to too much pending garbage without the fix on some platforms. The test ends once there has been a submission. * New suite: UniformBufferMemoryTest * Added a similar test for flushing texture-related pending garbage. * New suite: Texture2DMemoryTestES3 Bug: b/280304441 Change-Id: I60248ce39eae80b5a8ffe4723d8a1c5641087f23 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4787949 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 6553225d 2023-09-11T11:33:19 Vulkan: Refactor image usage/flags support check logic The function that checks whether a format supports a specific usage is moved to ImageHelper. For VK_EXT_host_image_copy, Renderbuffer, AHB etc may also use this function. Bug: angleproject:8341 Change-Id: I6ebc06f97fd29e66aa8d43fcf045f51717d27864 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4856144 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Yiwei Zhang 6258d479 2023-08-30T22:11:38 Vulkan: disallow pbo copies when offset is incompatible with vulkan Per spec of vkCmdCopyImageToBuffer: If srcImage does not have either a depth/stencil or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size This change disallows pbo copies if violating that spec vu. Bug: b/297927542 Test: org.skia.skqp.SkQPRunner#UnitTest_TransferPixelsFromTextureTest Change-Id: I7df4bee5fa574c44dd872f7225567049c5562a99 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4827694 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Yiwei Zhang <zzyiwei@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi 571b4cdb 2023-08-14T16:55:28 Vulkan: Move pipeline/desc-set layout creation to link job The pipeline and desc-set layout caches are consequently made thread-safe. The reference counter on the layouts are also made atomic. With this change, practically all of the link in the Vulkan backend is moved to the link job. Bug: angleproject:8297 Change-Id: Iba694ece5fc5510d34cce2c34441ae08ca5bb646 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4774787 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Amirali Abdolrashidi f11c972b 2023-08-04T15:06:39 Update requiredFlags use for VMA image allocations For the OOM fallbacks, we currently remove bits from the required bits when device memory allocation is no longer possible. In doing so, allocating on the device has become a strong preference rather than a requirement. Therefore, we change this method a bit in this CL. * Removed the device-local bit from the required flags when calling allocateAndBindMemory(). * preferredFlags is now used in lieu of requiredFlags initially within allocateAndBindMemory() to signal to the VMA to prioritize allocating on the device. If it fails, we use requiredFlags for the fallback. Bug: b/280304441 Change-Id: Id47a224cd74dacd3fb12d4fbfd815d8cefc016c4 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4753758 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi 16cfa28e 2023-08-08T22:08:24 Vulkan: Basic infra for parallel link This change moves pipeline warm up to a parallelizable task, mostly as an exercise to put in the infrastructure for parallel link in the Vulkan backend. Follow up changes will move more of the link step to this task. The end goal is to be able to make the link task independent of ContextVk, which would allow it to be run as an UnlockedTailCall, even if not using a worker thread. Bug: angleproject:8297 Change-Id: I17047162b2a41f0d681d9e3ee33f2e0239b4280d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4764231 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 7c69116f 2023-08-08T10:14:47 Vulkan: Fix data race with DynamicDescriptorPool Right now DynamicDescriptorPool::destroyCachedDescriptorSet can be called from garbage clean up thread, while simultaneously accessed from context main thread, and data race will happen and cause bugs. This can only happen when the buffer is not being suballocated. In this case, suballocation owns the bufferBlock and bufferBlock gets destroyed when suballocation is destroyed from garbage collection thread. If buffer is suballocated, the shared group owns pool which owns bufferBlocks and they gets destroyed from shared group with the share group lock. This CL avoids this race problem by release the shared cacheKey when the buffer is released, while we still had the shared group lock. Bug: chromium:1469542 Change-Id: Ic1f99e6b6083d63e4efb9c3f408921da62c006ac Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4761365 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Mohan Maiya e50d3876 2023-08-04T14:47:41 Vulkan: Retain loadOp when there is a resolve attachment Rendertargets with resolve attachments cannot optimize away loadOp Load or Clear even if they are marked ResourceAccess::Unused and storeOp is RenderPassStoreOp::DontCare. Bug: angleproject:4836 Bug: angleproject:5981 Tests: ImageTest.SourceAHBTarget2DMSRTTInteraction* Change-Id: I39ec67a457de6876ed0bd47d66a963cc59fab064 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4753735 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: mohan maiya <m.maiya@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 52fe3116 2023-07-17T16:20:54 Vulkan: Deduplicate share group's context set tracking Bug: angleproject:8224 Change-Id: I7a59a37229682fb91ff777f31e02e05d7ab2b80f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4690345 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 6ffd0d20 2023-07-12T12:09:45 Vulkan: Clean up depth stencil feedback mode part 2 Right now the tracking of depth stencil buffer readOnly or feedback loop is in FramebufferVk class. This really belongs to ContextVk, since it is not a permanent state of framebuffer, but current state of context. This CL moves it to ContextVk and changes to use BitSet instead of four boolean. Bug: b/289436017 Change-Id: I955c439259935f82eff30ddfff776a69723e5d0d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4679886 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Charlie Lao a33ec5dd 2023-07-11T18:01:12 Vulkan: Clean up depthStencil feedback loop implementation Part1 This is first clean up effort for depth stencil feedback loop implementation. This CL moves updateRenderPassStencilReadOnlyMode and updateRenderPassDepthReadOnlyMode methods from FramebufferVk to RenderPassCommandBufferHelper class. The method is actually updating renderPass's state, not FramebufferVk's state. In the next CL, FramebufferVk will be removed from the argument as well. With this change, I also removes updateStartedRenderPassWithDepthMode() and updateStartedRenderPassWithStencilMode() to use updateStartedRenderPassWithDepthStencilMode() directly. This CL is mechanical changes only, no behavior chnage is expected. Bug: b/289436017 Change-Id: Id3960f973a7115c05ebea199cb8ef802e995941a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4679365 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Shahbaz Youssefi 0892420b 2023-06-28T23:03:51 Vulkan: Optimize PBO download between RGBA and BGRA Google Meet hits this path in Chrome when blurring the background. The CPU readback was particularly slow on Intel/Mesa where readback from the temp buffer took hundreds of milliseconds. This change adds a compute shader that directly copies from the image to the pack buffer in simple but common cases. Bug: b/286882707 Change-Id: I9877ea01e3d8377db96f2539362aca67cf832b4a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4657058 Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Yuxin Hu <yuxinhu@google.com>
Amirali Abdolrashidi 5f9548c3 2023-05-19T11:51:04 Vulkan: Free the garbage memory before realloc Currently image allocations fall back to system memory in case of a device OOM. However, in some cases, it is also possible to gain some memory by freeing garbage memory from the device. This allows us to keep the allocation on the device memory. * Updated the image allocation fallback, so we will try cleaning the garbage memory through the renderer before retrying the allocation. * finishOneCommandBatchAndCleanup() in RendererVk, which will call a similar function in its CommandQueue. It will be called until there are no more in-flight submissions. * The existing finishOneCommandBatchAndCleanup() in CommandQueue has been renamed to finishOneCommandBatchAndCleanupImpl(). * Updated the flags used for VMA image allocations. If any device memory is freed after garbage cleanup to make enough space for the new allocation, it will take precedence over the system memory. * Added unit tests in which a new image allocation could happen on the device after freeing the garbage memory. * They use a 2D texture and a 2D texture array for garbage. Bug: b/280304441 Change-Id: Ia5e605e180833b44af8c77550ab1b0b8ba21724e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4547941 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Shahbaz Youssefi 86e21fa2 2023-06-12T13:36:21 Vulkan: Refactor angle::Format depth/stencil checks Bug: b/246008627 Change-Id: If0a2992c5bd66adf27c6866aea04e54ba465a522 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4608489 Reviewed-by: Roman Lavrov <romanl@google.com> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Charlie Lao ba857c47 2023-06-01T11:33:22 Vulkan: Move some bufferRead logic into CommandBufferHelperCommon Some of the logic of bufferRead is common, to avoid code duplication, it has been refactored into bufferReadImpl and moved into CommandBufferHelperCommon. Bug: b/282194402 Change-Id: I98cd0788db7fe9f14bd3155b28dc208f9f4a138a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4581061 Reviewed-by: Roman Lavrov <romanl@google.com> Commit-Queue: Charlie Lao <cclao@google.com>
Charlie Lao 2501903e 2023-05-31T11:59:36 Vulkan: Merge UpdateShader***Buffers into updateShaderBuffers Both UpdateShaderUniformBuffers() and DescriptorSetDescBuilder::updateShaderBuffers() walks the list of uniform blocks (or storage blocks). Some of the logic are the same and we are paying that overhead twice. In this CL, UpdateShaderStorageBuffers, UpdateShaderUniformBuffers and UpdateShaderAtomicCounterBuffers functions are merged into DescriptorSetDescBuilder::updateShaderBuffers and DescriptorSetDescBuilder::updateAtomicCounters. In order to handle the usage that same buffer used by multiple shader stages, a new variant of bufferRead() function is added that takes "const gl::ShaderBitSet" instead of single PipelineStage. This also paves way for next CL so that we can call updateOneShaderBuffer individually. Bug: b/282194402 Change-Id: I09d045d6295827b60bdb4c05df9333fe593fa40e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4574288 Reviewed-by: Roman Lavrov <romanl@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Charlie Lao <cclao@google.com>
Shahbaz Youssefi b76166d0 2023-05-18T09:57:25 Vulkan: Separate image and sampler 2D view of 3D features The sampler feature is used to determine if EGL_KHR_gl_texture_3D_image can be exposed. The image feature is used to support base GLES 3.1 storage images. Bug: b/274478146 Change-Id: Ifb283633078ace7ee65f8aafe756d0a02b727bd7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4545005 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Shahbaz Youssefi 80dd54f9 2023-05-17T22:16:59 Vulkan: Enable VK_EXT_image_2d_view_of_3d and set flag Based on a change by Yiwei Zhang <zzyiwei@chromium.org> Enabling the extension was missed from https://chromium-review.googlesource.com/c/angle/angle/+/3648586, and some implementations do rely on the feature enablement (e.g. RADV). This also fixes the VVL violations on satisfied implementations. Additionally, the VK_IMAGE_CREATE_2D_VIEW_COMPATIBLE_BIT_EXT flag is always set on 3D images when this extension is supported; it is needed for both sampled and storage images, not just storage images. Bug: b/274478146 Change-Id: Ibc210275e2d39ac0c54d1ae4c2451a5402360972 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4544762 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Auto-Submit: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Cody Northrop ec308b35 2023-05-15T15:03:10 Vulkan: Add feature to limit sample count to 2 This CL adds a feature called `limitSampleCountTo2`. Using it will have the Vulkan backend limit max samples to 2. Why 2? That's the minimum required in Vulkan to multisample without error. Here's an example validation error: vkCmdResolveImage: srcImage sample count is VK_SAMPLE_COUNT_1_BIT. The Vulkan spec states: srcImage must have a sample count equal to any valid sample count value other than VK_SAMPLE_COUNT_1_BIT. https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkCmdResolveImage-srcImage-00257 Using a limit as opposed to forcing a value allows non-multisampling (sample count of 1) to continue working. To see how tests fare when the feature is set, see the following test results that force the value on: https://chromium-review.googlesource.com/c/angle/angle/+/4534098/4 Test: adb shell setprop debug.angle.feature_overrides_enabled limitSampleCountTo2 Bug: b/279498079 Bug: angleproject:8162 Change-Id: I1df2822709151e6084c32055b5aff444e0b10db5 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4518562 Commit-Queue: Cody Northrop <cnorthrop@google.com> Reviewed-by: Geoff Lang <geofflang@chromium.org> Reviewed-by: Greg Schlomoff <gregschlom@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi 6f959e07 2023-04-28T16:00:11 Vulkan: Add non-device memory option for VMA image * Updated the required flags for allocateAndBindMemory() to no longer include VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, though still preferred. This allows VMA to allocate from another memory type if the device is out of memory. * Added a debug message to indicate when allocated memory for VMA image does not have all the preferred property flags. * Also added a warning in the case of memory allocation fallback. * Added a perf counter to keep track of image allocation fallbacks from the device memory. * deviceMemoryImageAllocationFallbacks * Added a test to make sure that VMA images can still be allocated from other memory types even if device memory is unavailable. * VulkanImageTest.AllocateVMAImageWhenDeviceOOM Bug: b/280304441 Change-Id: Ic452c18ded25345cdb7e271442372b99aede045e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4493483 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Igor Nazarov e24f4519 2023-01-19T02:30:39 Vulkan: Add externalFence into submitCommands() Currently one-off fence in the `queueSubmitOneOff()` is used only in `SyncHelperNativeFence::initializeWithFd()` to submit external fence. Other `queueSubmitOneOff()` calls may use `QueueSerial` instead of a fence. Providing `fence` into `queueSubmitOneOff()` prevents tracking that submission with `QueueSerial`. Therefore using `mUse` to collecting `mFenceWithFd` as garbage will not work as intended. This CL removes `fence` from `queueSubmitOneOff()` and adds optional `externalFence` into `submitCommands()` instead. Providing `externalFence` will cause additional `vkQueueSubmit()` call: - first submission will submit everything as usual except using the `externalFence`. - second, will only submit internal `CommandQueue` fence for `QueueSerial` tracking. As the result of this CL, call to `initializeWithFd()` will always produce two (2) `vkQueueSubmit()` calls. Previously it may be one (1) or two (2) submissions. Future CL will reduce submission count to one (1). If add additional submission into `queueSubmitOneOff()` instead of `submitCommands()`, then maximum number of submissions will be three (3). Bug: angleproject:8117 Change-Id: I6f1ec12682aaab71bfc871e665fec2659df96b26 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4392877 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Amirali Abdolrashidi a73e546c 2023-04-14T13:40:19 Vulkan: Add pending memory size for VMA images * Added implementation for vma::FindMemoryTypeIndexForImageInfo(). * Add pending memory allocation tracking for VMA images using the memory requirement from getMemoryRequirements() and the memory type index from the function above. Bug: b/218891184 Change-Id: I1c3d3a8f5f36eb57bd7a5a059aa3bf713b819831 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4428535 Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>
Shahbaz Youssefi 13935469 2023-04-25T09:58:42 Vulkan: Fix access mask of generateMipmap's blit Bug: angleproject:8143 Bug: angleproject:7125 Change-Id: I6e5b6cd1f445c2c41d4b78aeb368e30cc4c5354b Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4475444 Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Roman Lavrov 2ce6d8df 2023-04-20T17:16:00 TextureVk calls onStateChange when releasing buffer views TextureVk::setBuffer releases buffer views and they don't get initialized unless a dirty bit is set. That works on the first draw call because Texture::Texture() sets DIRTY_BIT_IMPLEMENTATION in constructor, but when called twice mBufferViews remains unitialized causing mBufferViews.getView() misbehavior. Fix this by calling onStateChange when buffer views are released. As mBufferViews is only initialized conditionally for texture buffers, add a bool for tracking whether it is in the initialized state to avoid marking texture as dirty unnecessarily. Note that this isn't handled by signalDirtyStorage() inside Texture::setBufferRange because (in this test case?) there are no observers on Texture so onStateChange called from Texture is a no-op. Texture however observes its TextureVk implementation so onStateChange calls from TextureVk end up setting the dirty bit in Texture::onSubjectStateChange. Bug: b/278585075 Change-Id: I2b83160cdd89a086ed81e8412cd64d0aad930911 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4457147 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Roman Lavrov <romanl@google.com>
Amirali Abdolrashidi a7bd3f53 2023-03-31T16:57:35 Fix the retrace issue for VMA image suballocation After adding the VMA image suballocation feature, VVL errors were seen when using the retracing script for some traces, causing failure. After analysis, it was seen that the functionality of the allocation differs from the original method when it comes to non-zero memory allocation. * In allocateAndBindMemory(), the memory property flags from the allocated memory are returned to be used for non-zero memory feature usage. * Added mapMemoryAndInitWithNonZeroValue() to ImageMemorySuballocator, which is used when allocateNonZeroMemory is enabled and the allocated memory is host-visible. * Merged the following into ImageHelper::initializeNonZeroMemory(): * mapMemoryAndInitWithNonZeroValue() * InitMappableDeviceMemory(); used when VMA image suballocation is disabled. * Moved onMemoryAlloc() inside allocateAndBindMemory(). Test: retrace_restricted_traces.py Bug: b/277618656 Bug: angleproject:8058 Change-Id: If411a073e900c1c034d40a99e3fffefe30c82548 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4391403 Reviewed-by: Cody Northrop <cnorthrop@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Charlie Lao d58fbf04 2023-04-05T12:32:09 Vulkan: Wait for surface ANI semaphore only if image is used Right now there is a bug that surface's ANI semaphore is added to context when WindowSurfaceVk::getAttachmentRenderTarget get called, which gets called from FramebufferVk::syncState, which is before we end the previous render pass, due to the endRenderpass usually is deferred until next renderPass starts. This caused ANI semaphore gets added to the previous render pass's submission, which does not use surface, and thus a bubble in GPU execution pipeline where the user FBO rendering gets unnecessarily blocked until ANI semaphore is signaled. This lowers GPU utilization and thus GPU frequency gets dropped and frame time increased. This CL stores ANI semaphore to ImageHelper object and when barrier is generated, the ANI semaphore is moved to CommandBuffer. When CommandBuffer gets flushed and submitted, it gets added to the waitSemaphores vector and submitted to vulkan. Since all use of swap chain image must go through barrier code first (you need at least change layout), this ensures ANI semaphore gets waited in exact and robust way. Only the submission that references the swap chain image will be waited. With this CL, professional_baseball_spirits reduces frame time from 3.8 ms to 2.7 ms, achieving parity with native GLES on pixel 7 pro. Bug: b/275624771 Change-Id: Ifa6cacf9e3bc147bfde54eb7def2fca48c50aca0 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4400011 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 65ed3050 2023-04-04T11:57:38 Vulkan: Let recordWriteBarrier use CommandBufferHelper This is preparation for next CL. It changes OutsideRenderPassCommandBuffer argument to OutsideRenderPassCommandBufferHelper for recordWriteBarrier() and recordReadBarrier() calls, so that it has access to the helper object (will be used in next CL). It passes CommandsState to executeBarriers() instead of PrimaryCommandBuffer. No actual functionality change is expected. Bug: b/275624771 Change-Id: Ia06a0398a127539b0b642005803a498cb2a9d7f1 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4400407 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 20f3df07 2023-03-30T11:36:29 Vulkan: ImageHelper::initExternal missing some variable init When ImageHelper::releaseImage is called and then re-initialized (due to mipmap change or usage change etc), right now we have a bug that mCurrentShaderReadStageMask and mLastNonShaderReadOnlyLayout are still left as is. And they do not get reset when image is initialized. This means if the image was reinitialized again, it may end up think the new image was previously used by fragment shader and insert an unnecessary barrier when new VkImage is used. This was discovered when I do the experimental work try to ghost VkImage for glTexSubImage call and realized that new VkImage was having a strong fragment->vertex dependency which it should not (since this is a new VkImage). This CL initialize mCurrentShaderReadStageMask and mLastNonShaderReadOnlyLayout and mCurrentQueueFamilyIndex from initExternal call since it is a new vkImage. Bug: b/273808966 Change-Id: I5bbeae5f1012f10c24620cfae8fa20365a7b5ab7 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4386136 Reviewed-by: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Igor Nazarov 9524c639 2023-01-17T18:47:47 Vulkan: Fix Secondary Command Buffers with asyncCommandQueue. This change fixes following errors (asyncCommandQueue = true): 1. "vkEndCommandBuffer()" called from "asyncCommandQueue" thread. Call stack: vkEndCommandBuffer() VulkanSecondaryCommandBuffer::end() OutsideRenderPassCommandBufferHelper::flushToPrimary() CommandQueue::flushOutsideRPCommands() CommandProcessor::processTask() Fixed by calling "vkEndCommandBuffer()" from the Context thread in the new "OutsideRenderPassCommandBufferHelper::detachCommandPool()" method. 2. "vkAllocateCommandBuffers()/vkBeginCommandBuffer()" called from "asyncCommandQueue" thread. Call stack: vkAllocateCommandBuffers()/vkBeginCommandBuffer() VulkanSecondaryCommandBuffer::initialize() <*>CommandBufferHelper::initializeCommandBuffer() <*>CommandBufferHelper::reset() <*>CommandBufferHelper::flushToPrimary() CommandQueue::flush<*>Commands() CommandProcessor::processTask() Fixed by calling "vkAllocateCommandBuffers()/vkBeginCommandBuffer()" from the Context thread in the new "<*>CommandBufferHelper::attachCommandPool()" method. 3. "SecondaryCommandPool::collect()" called from "asyncCommandQueue" thread without synchronization. Call stack: SecondaryCommandPool::collect() rx::vk::RecycleCommandBufferHelper() CommandBufferRecycler<>::recycleCommandBufferHelper) RendererVk::recycle<*>CommandBufferHelper() CommandProcessor::processTask() No need for this call, because "SecondaryCommandPool" is already detached. Notes: This CL not only fixes errors, but also optimizes CommandBufferHelper recycling. Before, there was no recycling plus unnecessary "VulkanSecondaryCommandBuffer" allocation/freeing. Further optimization may add multiple "VkCommandPool"s to the "SecondaryCommandPool" to allow resetting buffers in the async thread. Bug: angleproject:6811 Bug: angleproject:6100 Change-Id: I7004c27a112e916c5c973b43b137193017d6aa3d Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4342189 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Greg Schlomoff a621ea88 2023-03-17T10:21:56 Adding a trace point for texture metrics. Bug: b/236121838 Change-Id: I28e7977f7b5dc6e24cb5a2be10689c223851ba0b Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4348737 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Doug Horn <doughorn@google.com> Reviewed-by: Charlie Lao <cclao@google.com>
Igor Nazarov 7d1a401b 2023-01-17T18:45:55 Vulkan: Fix freeing not completed Secondary Command Buffers. Problem: - Protected Context flushes its commands to the Protected Primary Command Buffer; - Unprotected Context flushes its commands to the Unprotected Primary Command Buffer; - Context with different "egl::ContextPriority" may flush commands into different Primary Command Buffers. - Secondary Command Buffers from all Contexts end-up in the single "CommandBufferRecycler::mSecondaryCommandBuffersToReset" list; - One of the Contexts submits its Primary Command Buffer, and attaches current "mSecondaryCommandBuffersToReset" list to the "CommandBatch"; - Secondary Command Buffers of other Contexts may be collected and later freed by "SecondaryCommandPool" without submitting/completion corresponding Primary Command Buffers. Fix: - Moving "mSecondaryCommandBuffersToReset" to the new "SecondaryCommandBufferCollector" class. - Separate "SecondaryCommandBufferCollector" instance is stored in the "CommandQueue" for each current Primary Command Buffer. Additionally fixes "asyncCommandQueue" related problem: "releaseCommandBuffersToReset()" may get outdated results if flush is not yet executed in the "asyncCommandQueue" thread. Bug: angleproject:6100 Change-Id: I7df161ac1f999fb34d4eccaebb603c58ecb1ac11 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4334579 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Igor Nazarov 9b6368cc 2023-03-14T14:48:30 Vulkan: Fix freeing Secondary Command Buffers from wrong thread. Problem: - Secondary Command Buffers are freed in the "CommandQueue" class. - This may happen from any Context thread that calls "checkCompletedCommands()" or "finish<*>()" methods. - As the result, one Command Buffer may be freed from one thread, while other Command Buffer from the same "VkCommandPool" is allocated/reset/recorded in the other thread. Vulkan spec demands external "VkCommandPool" synchronization for any modifications (begin/end/reset/free/cmd) on its "VkCommandBuffer"s. Fix: - Added new "rx::vk::SecondaryCommandPool" class that replaces the "rx::vk::CommandPool" wrapper. - This class has "collect()" method for storing "VkCommandBuffer"s. Collected buffers are freed from the correct thread on the next "allocate()" call. This CL only fixes the problem, keeping Secondary Command Buffer memory management as is (allocate/free single buffer without reuse). In the future CLs this behavior may be changed (reuse buffers, reset/free entire pools). Bug: angleproject:6100 Change-Id: If938416c4df4fe55f0cfb418b6759721ac53098b Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4334577 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 4c157b4b 2023-03-14T18:45:38 Vulkan: Switch staging buffer to Buddy algorithm Staging buffer should be considered as dynamic and uses buddy allocator. Bug: b/271915956 Change-Id: I7cbe3765fdae120582034b24376560043e007e67 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4327290 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com>
Charlie Lao d6a25bfa 2023-03-07T15:06:10 Vulkan: Optimize glBufferData call to improve storage reuse If app calls glBufferData with certain size, then calls it again with size 0, and then call it again with same old size again, we should try to reuse the existing storage. When size is zero, with the existing logic, we never free the storage. When glBufferData is called third time with the same size as the first glBufferData call, we expect to reuse the existing storage. But because of the storage reuse logic is comparing buffer's new size to the old size (which is 0), we missed the opportunity to reuse the existing storage. This CL update the reuse logic so that it checks the new size against storage's size (instead of OpenGLES buffer's size) and if we will end up with same sized allocation and same pool and memory type, then we reuse instead of reallocate. This reduces efootball_pes_2021 frame time from 4.670 ms to 4.277 ms on pixel 7 pro. Bug: b/271915956 Change-Id: I6f91e3e85b104eca215b28e7d0bea413ecc4401c Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4317488 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Charlie Lao 67ad3ddc 2023-03-06T16:44:36 Vulkan: Relax size limit for dynamicBuffer to pick buddy algorithm If glBufferData's usage is one of the dynamic usage, app may keep calling glBufferData frequently, which means get into suballocation code frequently. There are two suballocation algorithms today: buddy algorithm (faster) and generic (slower). Right now the decision of which algorithm (i.e, which pool) to use is purely based on size or memory type. This CL also utilize usage information so that dynamic usage will pick buddy algorithm with bigger size threshold. mSmallBufferPool is removed and replaced with the BufferPoolPointerArray that gets picked based on allocation algorithm. This CL reduces average frame time of efootball_pes_2021 from 7.518 ms to 4.670 ms on pixel 7 Pro. Bug: b/271915956 Change-Id: I1c2f270ac49f56e6f405501d20691cfbab49e7eb Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4313685 Reviewed-by: Amirali Abdolrashidi <abdolrashidi@google.com> Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Igor Nazarov 233c128b 2023-01-17T19:21:58 Vulkan: Fix UBs when deleted attachment is used in a RenderPass. Problem: - "RenderbufferVk"/"TextureVk" with "mOwnsImage == false" used as RenderPass attachment. - "RenderbufferVk"/"TextureVk" deleted. - Owning resource is destroyed ("EGLImage" and all siblings / "EGLSurface"). - Crash (UB) may happen when ending RenderPass, flushing or executing commands. Fix adds tracking of "vk::ImageSourceID" value in "vk::RenderPassAttachment" - IDs of objects, that originally provide "vk::ImageHelper" images. This is necessary, because when using EGLImage, there may be multiple "TextureVk" objects with same "vk::ImageHelper", and need to call "finalizeImageLayout()" for the correct attachment. Bug: angleproject:8032 Test: angle_end2end_tests --gtest_filter=ImageTest*DeletedWhileInUse* Test: angle_end2end_tests --gtest_filter=PbufferTest.UseAsFramebufferColorThenDestroy* Change-Id: I50fdd9d6b6a9677adad2262373303b46de1dee4c Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4296014 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Igor Nazarov 244e1931 2023-01-17T19:22:10 Vulkan: Fix use of pending Outside RenderPass CommandBuffer. Regression from this commit: 730c127102b540ce2c4ec086b037c8b732706e26 "Vulkan: Submit queue more often for texture data" Bug: angleproject:6354 Test: angle_end2end_tests --gtest_filter=*SubmittingOutsideCommandBufferAssertIsOpen* Change-Id: I5f72f499cd7153c94c8e5f8a3415df2182726c8e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4296802 Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Igor Nazarov ad7949c6 2023-01-17T19:24:59 Vulkan: Remove "rx::vk::ImageHelper" move constructor. Move constructor was only used in the "rx::SwapchainImage" structure. This CL replaces "image" member with "std::unique_ptr<vk::ImageHelper>". This will remove source of bugs. Bug: angleproject:8052 Change-Id: Ic16f674095233baaa56fbe8a8fb7ef3e323a7331 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4294905 Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Igor Nazarov dbece66f 2023-01-17T19:25:24 Vulkan: Fix move constructor/assignment of Resource classes. Bug in "DynamicallyGrowingPool<Pool>::PoolResource" causes real problems with queries. "QueryPool" may be reused without waiting for the previous use. Bugs "QueryHelper" may affect "mInFlightGpuEventQueries" (not used in "QueryVk"). Updated "FramebufferHelper" move assignment so it uses "Resource" assignment instead of protected member access. Bug: angleproject:8053 Change-Id: I441b62102fcf232456027fb42eefa97ed8958676 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4300050 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Charlie Lao 7eb6869a 2022-08-30T16:28:08 Vulkan: Change ResourceAccess::Write to ResourceAccess::ReadWrite AS a preparation for the next CL which will optimize for WriteOnly access, this CL changes Write to ReadWrite and adds WriteOnly access (but not used yet). Mechanical changes only and no function difference is expected. Bug: b/243711628 Change-Id: I509d6045ae87635e24076b646af42f35d88d52cf Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/3866672 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Yuxin Hu <yuxinhu@google.com>
Igor Nazarov 764cdbad 2023-01-25T18:25:57 Vulkan: Add missing mutex lock into resetCommandBuffer(). Regression after the commit: 77d19e39794cc2de28f33a714f27af5a7de128ae Vulkan: Add ThreadSafeCommandQueue class Bug: b/261106868 Change-Id: I083b1bdc42a1382d32ab9087c92adbb963ff7d1e Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4300869 Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com>
Igor Nazarov 1365f5b3 2022-09-09T17:19:26 Vulkan: Fix Swapchain Acquire Image Semaphore wait stage flags. There is a screen tearing on G996B with single "glClear(GL_COLOR_BUFFER_BIT)" no scissor in the frame. Fixed by defining "kSwapchainAcquireImageWaitStageFlags" and adding "VK_PIPELINE_STAGE_TRANSFER_BIT" stage flag. Also added "VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT" stage, because first use of the Swapchain Image after Acquire may be in the "glBlitFramebuffer()" command. This fix may slightly affect performance. In such case, a better fix should be implemented (include only stages based on the actual first use). However, this may be not trivial. Additionally, "kSwapchainAcquireImageWaitStageFlags" is used as a source stage mask in the "ImageLayout::Present" pipeline barrier. This is needed in order to build a dependency chain from the Acquire Image Semaphore to the layout transition's first synchronization scope, so that layout transition happens after acquire semaphore is signaled. Reference: https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples#combined-graphicspresent-queue https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Rendering_and_presentation Alternative fix of both issues is to define: kSwapchainAcquireImageWaitStageFlags = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT This might potentially delay command buffer execution relative to the Swapchain Acquire Image Semaphore signal operations, but will relax the pipeline barrier. Bug: angleproject:8030 Test: angle_end2end_tests --gtest_also_run_disabled_tests --gtest_filter=EGLSurfaceTest.DISABLED_RandomClearTearing* Change-Id: I29f58862c4b369524b2555dd944e2fb67eebe956 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4271377 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com>
Amirali Abdolrashidi 3a7904e1 2023-01-25T23:56:56 Vulkan: Use VMA suballocation for images There is a maximum limit for device memory object allocation. On some platforms, there can be an error regarding too many object allocations when 4096 device memory handles have been allocated. Suballocation can help mitigate this issue. In this CL, some images will be allocated using VMA API calls, which use suballocation. * Added a new feature (useVmaForImageSuballocation). * Added VMA allocation for ImageHelper, which is used in initMemory(). * Suballocation is used for VMA image allocation. * If enabled, mVmaAllocation will be initialized in the ImageHelper object (instead of mDeviceMemory). * It is currently used for all platforms. * Minor change to the name of an arg in CreateBuffer() declaration. * Added test to make sure we can allocate at least 4096 images on supported platforms (8000 in the test). * Skipped the test "NonZeroBaseEmulatedClear" when run on Linux/Intel if this feature is enabled (due to output color mismatch). * Skipped several tests for capture/replay on Windows. Bug: b/218891184 Change-Id: Ibf80c9c8c485b301da7d23b5ba4bcbb1a8e3194f Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4191202 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com> Commit-Queue: Amirali Abdolrashidi <abdolrashidi@google.com>
Igor Nazarov 9a4a9f3f 2023-02-16T17:33:52 Revert "Fix dEQP-EGL.functional.mutable_render_buffer#basic" This PARTIALLY reverts commit: 629da7fc9cd4886dd87f07a069c259551e892936. Fix dEQP-EGL.functional.mutable_render_buffer#basic From the reverted CL: This CL also addresses similar issue in some other rx::vk::ImageLayout items in kImageMemoryBarrierData. These were unnecessary changes that may harm performance: - adding BOTTOM_OF_PIPE for source stage when transition from LAYOUT_UNDEFINED may add unnecessary GPU bubble. - Transition to LAYOUT_PRESENT_SRC_KHR does not require barrier. All writes will be automatically made visible for Presentation Engine. Execution dependency satisfied by VkSemaphore. - Transition from LAYOUT_PRESENT_SRC_KHR is RAR/WAR. Execution dependency satisfied by VkSemaphore. - Some layouts may not be a destination so BOTTOM_OF_PIPE is OK. Bug: b/264420030 Change-Id: I8b57b1636e1f5cf5b647003adf1502bd3286c5a3 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4262067 Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com>
Charlie Lao c402ea1c 2023-02-15T12:01:38 Vulkan: Rename hasUnfinishedUse to hasResourceUseFinished Most usage of hasUnfinishedUse is for !hasUnfinishedUse, and there was feedback that negative API is not preferred. This CL changes it to positive API name. Similarly renamed hasUnsubmittedUse to hasResourceUseSubmitted. Bug: b/267348918 Change-Id: Idb10b0f998ec50116ffb6aada19a98a516e87824 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4257105 Commit-Queue: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Igor Nazarov 8cac53cd 2023-01-17T19:22:57 Vulkan: Fix incorrect "SharedPresent" barrier. Previous fix swapped top/bottom barriers: 629da7fc9cd4886dd87f07a069c259551e892936 Fix dEQP-EGL.functional.mutable_render_buffer#basic Above fix is only partial, because it only includes execution dependency without memory barriers (top/bottom stages has no memory access). Fixed by forcing all possible stages for "SharedPresent" images. Better solution requires creating specific versions of "ImageLayout::SharedPresent". Added new test that skips "glFlush()" before "glReadPixels()". Performing flush executes present and may "fix" the barrier problem. New test fails on "Samsung Galaxy S22+ S906B" Bug: b/264420030 Test: angle_end2end_tests --gtest_filter="EGLSingleBufferTest.SharedPresentBarrier*" Change-Id: Icbb50900d99e42d2e9482cd6109981bbc460348a Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4262068 Commit-Queue: Igor Nazarov <i.nazarov@samsung.com> Reviewed-by: Charlie Lao <cclao@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org>
Yuxin Hu 629da7fc 2023-01-26T12:25:04 Fix dEQP-EGL.functional.mutable_render_buffer#basic These are vulkan commands submitted between glClear() and glReadPixels() when the EGL_RENDER_BUFFER is EGL_SINGLE_BUFFER (ImageLayour is SharedPresent): ``` vkCmdClearColorImage() vkCmdPipelineBarrier: ( VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,//srcStageMask VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,//dstStageMask VK_ACCESS_MEMORY_WRITE_BIT,//srcAccessMask VK_ACCESS_MEMORY_WRITE_BIT|VK_ACCESS_MEMORY_READ_BIT//dstAccessMask ) vkCmdCopyImageToBuffer() ``` This means that operations at the bottom of pipeline in vkCmdCopyImageToBuffer() need to wait for operations at the top of pipeline in vmCmdClearColorImage(), which translates to vkCmdCopyImageToBuffer() does not have to wait for vkCmdClearColorImage() to finish. Even the dstAccessMask ensures that vkCmdCopyImageToBuffer() will invalidate cache before copying image, it is possible that it will retrieve the old Framebuffer color attachment data as the vkCmdClearColorImage() has not finished. This CL fixes the bug by making the srcStageMask to be VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT and the dstStageMask to be VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, when the ImageLayout is SharedPresent. This ensures that vkCmdCopyImageToBuffer() waits for vkCmdClearColorImage() to finish. This CL also addresses similar issue in some other rx::vk::ImageLayout items in kImageMemoryBarrierData. Bug: b/264420030 Change-Id: If47ab071afaf96e396357cb0f50131339fa58509 Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4198476 Commit-Queue: Yuxin Hu <yuxinhu@google.com> Reviewed-by: Shahbaz Youssefi <syoussefi@chromium.org> Reviewed-by: Charlie Lao <cclao@google.com>