Hash :
6136cbcb
Author :
Date :
2020-09-23T21:31:05
Metal: Implement transform feedback
- XFB is currently emulated by writing to storage buffers.
- Metal doesn't allow vertex shader to both write to storage buffers and
to stage output (i.e clip position). So if GL_RASTERIZER_DISCARD is
NOT enabled, the draw with XFB enabled will have 2 passes:
+ First pass: vertex shader writes to XFB buffers + not write to stage
output + disable rasterizer.
+ Second pass: vertex shader writes to stage output (i.e.
[[position]]) + enable rasterizer. If GL_RASTERIZER_DISCARD is
enabled, the second pass is omitted.
+ This effectively executes the same vertex shader twice. TODO:
possible improvement is writing vertex outputs to buffer in first
pass then re-use that buffer as input for second pass which has a
passthrough vertex shader.
- If GL_RASTERIZER_DISCARD is enabled, and XFB is enabled:
+ Only first pass above will be executed, and the render pass will use
an empty 1x1 texture attachment since rasterization is not needed.
- If GL_RASTERIZER_DISCARD is enabled, but XFB is NOT enabled:
+ we still enable Metal rasterizer.
+ but vertex shader must emulate the discard by writing gl_Position =
(-3, -3, -3, 1). This effectively moves the vertex out of clip
space's visible area.
+ This is because GLSL still allows vertex shader to write to stage
output when rasterizer is disabled. However, Metal doesn't allow
that. In Metal, if rasterizer is disabled, then vertex shader must
not write to stage output.
- See src/libANGLE/renderer/metal/doc/TransformFeedback.md for more
details.
Bug: angleproject:2634
Change-Id: I6c700e031052560326b7f660ee7597202d38e6aa
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/2408594
Reviewed-by: Jonah Ryan-Davis <jonahr@google.com>
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jonah Ryan-Davis <jonahr@google.com>
[[position]],
varying variables, …) then the Metal runtime won’t allow the MTLRenderPipelineState to be
created. It is only allowed to either write to buffers or to stage output not both on Metal. This
brings challenges to implement Transform Feedback when GL_RASTERIZER_DISCARD is not enabled,
because in that case, by right OpenGL will do both the Transform Feedback and rasterization
(feeding stage output to Fragment Shader) at the same time. [[vertex_id]]/gl_VertexIndex & [[instance_id]]/gl_InstanceID. MTLRenderPipelineState‘s rasterization will be disabled. This can be done in spirv-cross
translation step. spirv-cross can convert the Vertex Shader to a void function,
effectively won’t produce any stage output values for Fragment Shader. GL_RASTERIZER_DISCARD is enabled when Transform Feedback is enabled: GL_RASTERIZER_DISCARD is enabled and Transform Feedback is NOT enabled, we cannot disable
MTLRenderPipelineState‘s rasterization because if doing so, Metal runtime requires the Vertex
Shader to be a void function, i.e. not returning any stage output values. In order to
work-around this: MTLRenderPipelineState‘s rasterization will still be enabled this case. (-3, -3, -3, 1) to
[[position]]/gl_Position variable at the end. Effectively forcing the vertex to be clipped
and preventing it from being sent down to Fragment Shader. Note that the (-3, -3, -3, 1)
writing are controlled by a specialized constant, thus it could be turned on and off base on
GL_RASTERIZER_DISCARD state. It is more efficient doing this way than re-translating the
whole shader code again using spirv-cross to turn it to a void function. GL_RASTERIZER_DISCARD
enabled, the original approach will just simply executes the Vertex Shader once and use a cheap
1x1 render pass, thus it should be fast enough.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
# Transform Feedback implementation on Metal back-end
### Overview
- OpenGL ES 3.0 introduces Transform Feedback as a way to capture vertex outputs to buffers before
the introduction of Compute Shader in later versions.
- Metal doesn't support Transform Feedback natively but it is possible to be emulated using Compute
Shader or Vertex Shader to write vertex outputs to buffers directly.
- If Vertex Shader writes to buffers directly as well as to stage output (i.e. `[[position]]`,
varying variables, ...) then the Metal runtime won't allow the `MTLRenderPipelineState` to be
created. It is only allowed to either write to buffers or to stage output not both on Metal. This
brings challenges to implement Transform Feedback when `GL_RASTERIZER_DISCARD` is not enabled,
because in that case, by right OpenGL will do both the Transform Feedback and rasterization
(feeding stage output to Fragment Shader) at the same time.
### Current implementation
- Transform Feedback will be implemented by inserting additional code snippet to write vertex's
varying variables to buffers called XFB buffers at compilation time. The buffers' offsets are
calculated based on `[[vertex_id]]`/`gl_VertexIndex` & `[[instance_id]]`/`gl_InstanceID`.
- When Transform Feedback ends, a memory barrier must be inserted because the XFB buffers could be
used as vertex inputs in future draw calls. Due to Metal not supporting explicit memory barrier
(currently only macOS 10.14 and above supports it, ARM based macOS doesn't though), the only
reliable way to insert memory barrier currently is ending the render pass.
- In order to support Transform Feedback capturing and rasterization at the same time, the draw call
must be split into 2 passes:
- First pass: Vertex Shader will write captured varyings to XFB buffers.
`MTLRenderPipelineState`'s rasterization will be disabled. This can be done in `spirv-cross`
translation step. `spirv-cross` can convert the Vertex Shader to a `void` function,
effectively won't produce any stage output values for Fragment Shader.
- Second pass: Vertex Shader will write to stage output normally, but the XFB buffers writing
snippet are disabled. Note that the Vertex Shader in this pass is essential the same as the
first pass's, only difference is the output route (stage output vs XFB buffers). This
effectively executes the same Vertex Shader's internal logic twice.
- If `GL_RASTERIZER_DISCARD` is enabled when Transform Feedback is enabled:
- Only first pass above will be executed, the render pass will use 1x1 empty texture attachment
because rasterization is not needed and small texture attachment's load & store at render
pass's start & end boundary could be cheap. Recall that we have to end the render pass to
enforce XFB buffers' memory barrier as mentioned above.
- If `GL_RASTERIZER_DISCARD` is enabled and Transform Feedback is NOT enabled, we cannot disable
`MTLRenderPipelineState`'s rasterization because if doing so, Metal runtime requires the Vertex
Shader to be a `void` function, i.e. not returning any stage output values. In order to
work-around this:
- `MTLRenderPipelineState`'s rasterization will still be enabled this case.
- However, the Vertex Shader will be translated to write `(-3, -3, -3, 1)` to
`[[position]]`/`gl_Position` variable at the end. Effectively forcing the vertex to be clipped
and preventing it from being sent down to Fragment Shader. Note that the `(-3, -3, -3, 1)`
writing are controlled by a specialized constant, thus it could be turned on and off base on
`GL_RASTERIZER_DISCARD` state. It is more efficient doing this way than re-translating the
whole shader code again using `spirv-cross` to turn it to a `void` function.
### Future improvements
- Use explicit memory barrier on macOS devices supporting it instead of ending the render pass.
- Instead of executing the same Vertex Shader's logic twice, one alternative approach is writing the
vertex outputs to a temporary buffer. Then in second pass, copy the varyings from that buffer to
XFB buffers. If rasterization is still enabled, then the 3rd pass will be invoked to use the
temporary buffer as vertex input, the Vertex Shader in 3rd pass might just a simple passthrough
shader:
1. Original VS -> All outputs to temp buffer.
2. Temp buffer -> Copy captured varying to XFB buffers. Could be done in a Compute Shader.
3. Temp buffer -> VS pass through to FS for rasterization.
- However, this approach might even be slower than executing the Vertex Shader twice. Because a
memory barrier must be inserted after 1st step. This prevents multiple draw calls with Transform
Feedback to be parallelized. Furthermore, on iOS devices or devices not supporting explicit
barrier, the render pass must be ended and restarted after each draw call.
- Most of the time, the application usually uses Transform Feedback with `GL_RASTERIZER_DISCARD`
enabled, the original approach will just simply executes the Vertex Shader once and use a cheap
1x1 render pass, thus it should be fast enough.