Hash: e54d0f90
Author:
Date: 2019-06-30T03:26:18
Vulkan: Debug overlay
A debug overlay system for the Vulkan backend designed with efficiency
and runtime configurability in mind. Overlay widgets are of two
fundamental types:
- Text widgets: A single line of text with small, medium or large font.
- Graph widgets: A bar graph of data.
Built on these, various overlay widget types are defined that gather
statistics. Five such types are defined, each with one example widget:
- Count: A widget that counts something. VulkanValidationMessageCount
is an overlay widget of this type that shows the number of validation
messages received from the validation layers.
- Text: Generic text. VulkanLastValidationMessage is an overlay
widget of this type that shows the last validation message.
- PerSecond: A value that gets reset every second automatically. FPS is
an overlay widget of this type that simply gets incremented on every
swap().
- RunningGraph: A graph of the last N values. VulkanCommandGraphSize is
an overlay widget of this type. On every vkQueueSubmit, the number of
nodes in the command graph is accumulated. On every present(), the
accumulated value is taken as the number of nodes for the whole frame.
- RunningHistogram: A histogram of the last N values. Input values are
in the [0, 1] range and are binned into N buckets for the histogram
calculation. VulkanSecondaryCommandBufferPoolWaste is an overlay
widget of this type. On vkQueueSubmit, the memory waste from command
buffer pool allocations is recorded in the histogram.
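The bookkeeping behind the PerSecond, RunningGraph and RunningHistogram types can be sketched on the CPU as below. This is a minimal illustration under stated assumptions, not ANGLE's C++ implementation: the class and method names (`add`, `next_second`, `next_frame`, `buckets`) are hypothetical, and the histogram takes separate sample and bucket counts because the description does not say whether its two Ns are equal.

```python
from collections import deque


class PerSecond:
    """Hypothetical sketch: shows the count accumulated during the last full second."""

    def __init__(self):
        self.current = 0  # events counted in the running second
        self.last = 0     # value the widget would display

    def add(self, n=1):
        # e.g. called on every swap() for an FPS counter
        self.current += n

    def next_second(self):
        # assumed to be called once per second by some timer check
        self.last = self.current
        self.current = 0


class RunningGraph:
    """Hypothetical sketch: keeps the last N per-frame values as graph bars."""

    def __init__(self, n):
        self.values = deque([0] * n, maxlen=n)  # oldest value drops off the front
        self.accumulated = 0

    def add(self, n):
        # e.g. called on every vkQueueSubmit with the number of command graph nodes
        self.accumulated += n

    def next_frame(self):
        # e.g. called on present(): the frame's total becomes one bar
        self.values.append(self.accumulated)
        self.accumulated = 0


class RunningHistogram:
    """Hypothetical sketch: histogram of the last n_samples values in [0, 1]."""

    def __init__(self, n_samples, n_buckets):
        self.samples = deque(maxlen=n_samples)
        self.n_buckets = n_buckets

    def add(self, value):
        self.samples.append(min(max(value, 0.0), 1.0))  # clamp into [0, 1]

    def buckets(self):
        counts = [0] * self.n_buckets
        for v in self.samples:
            # v == 1.0 would map to index n_buckets, so clamp it into the last bucket
            counts[min(int(v * self.n_buckets), self.n_buckets - 1)] += 1
        return counts
```

For example, a `RunningHistogram(4, 4)` fed 0.1, 0.3, 0.9, 1.0 and 0.2 keeps only the last four samples and reports `[1, 1, 0, 2]`. The bounded `deque` is simply a convenient way to express "last N values" here.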
The overlay font is placed in libANGLE/overlay/, where
gen_overlay_fonts.py processes it into an array of bits. At runtime,
this array is expanded into the actual font image (an image with 3 layers).
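Storing the font as "an array of bits" and expanding it at runtime can be illustrated with a one-bit-per-pixel packing. This is a hedged sketch: the row-major, most-significant-bit-first layout and the helpers `pack_glyph`/`unpack_glyph` are assumptions made for illustration, not the format gen_overlay_fonts.py actually emits, and assembling the 3-layer image is left out.

```python
def pack_glyph(rows):
    """Pack a glyph drawn with '#' (on) and '.' (off) into bytes, row-major, MSB first."""
    bits = [1 if c == '#' else 0 for row in rows for c in row]
    bits += [0] * (-len(bits) % 8)  # pad the last byte with zeros
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def unpack_glyph(data, width, height):
    """Expand packed bits into a height x width grid of 0/255 texel values."""
    texels = []
    for y in range(height):
        row = []
        for x in range(width):
            i = y * width + x
            bit = (data[i // 8] >> (7 - i % 8)) & 1
            row.append(255 if bit else 0)
        texels.append(row)
    return texels


glyph = pack_glyph(["#..#", ".##.", "#..#"])
print(glyph.hex())  # 9690
```

Packing at one bit per texel keeps the generated source array small; the cost is a cheap expansion step once at startup.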
The overlay widget layout is defined in overlay_widgets.json, which
gen_overlay_widgets.py processes to generate an array of widgets, each
of its respective type, and to set their properties, such as color and
bounding box. The json file allows widgets to align against other
widgets as well as against the framebuffer edges.
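Resolving such a layout into pixel bounding boxes for a given framebuffer size might look like the sketch below. The JSON schema is hypothetical and is not the real overlay_widgets.json format. In this schema, negative `coords` are measured from the right or bottom framebuffer edge, and `below` stacks a widget under an earlier one. Only the widget names come from this description.

```python
import json

# Hypothetical layout schema, for illustration only.
LAYOUT = json.loads("""
{
  "widgets": [
    {"name": "FPS", "coords": [10, 10], "size": [80, 16]},
    {"name": "VulkanValidationMessageCount", "coords": [-10, 10], "size": [120, 16]},
    {"name": "VulkanLastValidationMessage", "below": "FPS", "gap": 4, "size": [300, 16]}
  ]
}
""")


def resolve_layout(layout, fb_width, fb_height):
    """Return {name: (x0, y0, x1, y1)} in framebuffer pixels."""
    boxes = {}
    for w in layout["widgets"]:  # assumes widgets only align to earlier entries
        width, height = w["size"]
        if "below" in w:
            # align the left edge with the referenced widget and stack underneath it
            ax0, _, _, ay1 = boxes[w["below"]]
            x0, y0 = ax0, ay1 + w.get("gap", 0)
        else:
            x, y = w["coords"]
            # negative coordinates are measured from the right/bottom edge
            x0 = x if x >= 0 else fb_width + x - width
            y0 = y if y >= 0 else fb_height + y - height
        boxes[w["name"]] = (x0, y0, x0 + width, y0 + height)
    return boxes
```

Edge-relative boxes depend on the framebuffer size, so a step like this would have to be re-run when the surface is resized.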
Two compute shaders are implemented to efficiently render the UI:
- OverlayCull: For each subgroup processed by OverlayDraw, this shader
creates a bitset of the Text and Graph widgets whose bounding boxes
intersect the region covered by that subgroup. This is done only when
the enabled overlay widgets change (a feature that is not yet
implemented) or the surface is resized.
- OverlayDraw: Using the bitsets generated by OverlayCull, which are
uniform across each workgroup (the workgroup size is set equal to the
hardware subgroup size), this shader loops over the enabled widgets that
can possibly intersect the pixel being processed, and renders and blends
in text and graphs. This is done once per frame on present().
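The cull-then-draw split can be mimicked on the CPU to show why the bitsets help. In the sketch, each tile stands in for the pixels covered by one OverlayDraw workgroup, and the tile dimensions are arbitrary parameters. A single bitset covers all widgets for brevity, whereas the description mentions Text and Graph widgets. This is an illustration of the idea, not the GLSL compute shaders themselves.

```python
def cull(boxes, fb_width, fb_height, tile_w, tile_h):
    """Per tile, a bitmask of the widgets whose half-open boxes [x0, x1) x [y0, y1) overlap it."""
    tiles_x = (fb_width + tile_w - 1) // tile_w
    tiles_y = (fb_height + tile_h - 1) // tile_h
    masks = [[0] * tiles_x for _ in range(tiles_y)]
    for bit, (x0, y0, x1, y1) in enumerate(boxes):
        for ty in range(tiles_y):
            for tx in range(tiles_x):
                tx0, ty0 = tx * tile_w, ty * tile_h
                if x0 < tx0 + tile_w and tx0 < x1 and y0 < ty0 + tile_h and ty0 < y1:
                    masks[ty][tx] |= 1 << bit
    return masks


def widgets_at(masks, boxes, px, py, tile_w, tile_h):
    """Per-pixel step: only widgets whose bit is set for this pixel's tile are tested."""
    mask = masks[py // tile_h][px // tile_w]
    hits = []
    while mask:
        bit = (mask & -mask).bit_length() - 1  # index of the lowest set bit
        mask &= mask - 1                        # clear that bit
        x0, y0, x1, y1 = boxes[bit]
        if x0 <= px < x1 and y0 <= py < y1:
            hits.append(bit)  # a shader would sample the font or graph and blend here
    return hits


boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
masks = cull(boxes, 32, 16, 8, 8)
print(masks[0])  # [1, 1, 2, 2]
```

Because every pixel in a tile reads the same mask, the loop has the same trip count across the tile, which mirrors the stated reason for making the bitsets uniform per workgroup.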
Currently, overlay widgets are enabled through an environment variable.
For example:
$ export ANGLE_OVERLAY=FPS:VulkanSecondaryCommandBufferPoolWaste
$ ./hello_triangle --use-angle=vulkan
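The `ANGLE_OVERLAY` value in the example is a colon-separated list of widget names. A minimal sketch of reading it follows. The set of known names is taken from the widgets listed above. Ignoring unrecognized names, with a printed warning, is an assumed behavior and not necessarily what ANGLE does.

```python
import os

# Widget names taken from the examples in this description.
KNOWN_WIDGETS = {
    "FPS",
    "VulkanValidationMessageCount",
    "VulkanLastValidationMessage",
    "VulkanCommandGraphSize",
    "VulkanSecondaryCommandBufferPoolWaste",
}


def enabled_widgets(environ=os.environ):
    """Return the known widget names listed in ANGLE_OVERLAY (':'-separated)."""
    value = environ.get("ANGLE_OVERLAY", "")
    requested = {name.strip() for name in value.split(":") if name.strip()}
    unknown = requested - KNOWN_WIDGETS
    if unknown:
        # assumed behavior: warn about and skip names that are not recognized
        print("ignoring unknown overlay widgets:", ", ".join(sorted(unknown)))
    return requested & KNOWN_WIDGETS
```

For the export shown above, this returns `{"FPS", "VulkanSecondaryCommandBufferPoolWaste"}`.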
Possible future work:
- On Android, add settings in developer options and enable widgets based
on those.
- Spawn a small server in ANGLE and write an application that sends
enable/disable commands remotely.
- Implement overlay for other backends.
Bug: angleproject:3757
Change-Id: If9c6974d1935c18f460ec569e79b41188bd7afcc
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/1729440
Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Reviewed-by: Jamie Madill <jmadill@chromium.org>
angle_perftests is a standalone testing suite that contains targeted tests for OpenGL, Vulkan and ANGLE internal classes. The tests currently run on the Chromium ANGLE infrastructure and report results to the Chromium perf dashboard.
You can also build your own dashboards. For example, a comparison of ANGLE’s back-end draw call performance on Windows with NVIDIA GPUs can be found at this link. Note that this link is not kept current.
You can follow the usual instructions to check out and build ANGLE. Build the angle_perftests target. Note that all test scores are higher-is-better. You should also ensure is_debug=false in your build. Running with dcheck_always_on or debug validation enabled is not recommended.
Variance can be a problem when benchmarking. To help deal with it, we have a test harness that runs a single test in an infinite loop and prints some statistics. See scripts/perf_test_runner.py. To use the script, first compile angle_perftests into a folder whose name contains the word Release. Then pass the name of the test as the argument to the script. The script automatically picks up the most recent angle_perftests build and runs it in an infinite loop.
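The two chores the runner performs, finding the newest build and summarizing repeated scores, can be sketched as follows. This is not the code of scripts/perf_test_runner.py. The `out` root, the binary file names and the choice of mean, sample standard deviation and coefficient of variation are all assumptions made for illustration.

```python
import glob
import os
import statistics


def newest_perftests(root="out"):
    """Most recently modified angle_perftests binary in a directory whose name contains 'Release'."""
    names = ("angle_perftests", "angle_perftests.exe")
    candidates = [p for d in glob.glob(os.path.join(root, "*Release*"))
                  for p in (os.path.join(d, n) for n in names)
                  if os.path.isfile(p)]
    return max(candidates, key=os.path.getmtime) if candidates else None


def summarize(scores):
    """Mean, sample standard deviation and coefficient of variation of repeated scores."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, stdev, (stdev / mean if mean else 0.0)
```

A coefficient of variation that stays high after many iterations suggests the machine is too noisy for small differences between runs to be meaningful.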
You can choose individual tests to run with --gtest_filter=*TestName*. To select a particular ANGLE back-end, add the name of the back-end to the test filter. For example: DrawCallPerfBenchmark.Run/gl or DrawCallPerfBenchmark.Run/d3d11. Many tests have sub-tests that run slightly different code paths. You might need to experiment to find the right sub-test and its name.
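To predict which tests a given `--gtest_filter` value selects, the sketch below approximates GoogleTest's filter rules. In those rules, `*` matches any string and `?` matches a single character. Patterns are separated by `:`, and anything after the first `-` is a list of negative patterns. This is a simplified model for illustration, not GoogleTest's own matcher, and test names other than those quoted in the text are hypothetical.

```python
import re


def _pattern_to_regex(pattern):
    # '*' matches any string, '?' matches any single character, everything else is literal
    body = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.compile(body + r"\Z")


def matches_filter(test_name, gtest_filter):
    """Approximate --gtest_filter semantics: 'pos1:pos2-neg1:neg2'."""
    positive, _, negative = gtest_filter.partition("-")
    pos = [p for p in positive.split(":") if p] or ["*"]  # empty positive part means '*'
    neg = [p for p in negative.split(":") if p]

    def hit(patterns):
        return any(_pattern_to_regex(p).match(test_name) for p in patterns)

    return hit(pos) and not hit(neg)
```

Note that a filter without wildcards must match the full name, so `DrawCallPerfBenchmark.Run/gl` does not select a sub-test whose name merely starts with it.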
ANGLE implements a no-op driver for OpenGL, D3D11 and Vulkan. To run on these configurations, use the gl_null, d3d11_null or vulkan_null test configurations. These null drivers do not do any GPU work; they skip the driver entirely. These null configs are useful for diagnosing performance overhead in ANGLE code.
- DrawCallPerfBenchmark: Runs a tight loop around DrawArrays calls.
  - validation_only: Skips all rendering.
  - render_to_texture: Renders to a user Framebuffer instead of the default FBO.
  - vbo_change: Applies a Vertex Array change between each draw.
  - tex_change: Applies a Texture change between each draw.
- UniformsBenchmark: Tests the performance of updating various uniform counts, each followed by a DrawArrays call.
  - vec4: Tests vec4 Uniforms.
  - matrix: Tests using Matrix uniforms instead of vec4.
  - multiprogram: Tests switching Programs between updates and draws.
  - repeating: Skips the update of uniforms before each draw call.
- DrawElementsPerfBenchmark: Similar to DrawCallPerfBenchmark but for indexed DrawElements calls.
- BindingsBenchmark: Tests Buffer binding performance. Does no draw call operations.
  - 100_objects_allocated_every_iteration: Tests repeated glBindBuffer calls with new buffers allocated each iteration.
  - 100_objects_allocated_at_initialization: Tests repeated glBindBuffer calls on the same objects each iteration.
- TexSubImageBenchmark: Tests glTexSubImage update performance.
- BufferSubDataBenchmark: Tests glBufferSubData update performance.
- TextureSamplingBenchmark: Tests Texture sampling performance.
- TextureBenchmark: Tests Texture state change performance.
- LinkProgramBenchmark: Tests the performance of glLinkProgram.
- glmark2: Runs the glmark2 benchmark.

Many other tests exist and are documented in their classes.