Hash: 93b97a59
Author:
Date: 2023-11-03T22:07:23
Make link job directly wait on compile job

Previously, program link waited on the compile job on the calling thread
before launching the link job. As a result, sequences of intermixed
compile and link would get largely serialized as such:

    Main Thread      Thread 1       Thread 2       Thread 3       Thread 4

    Compile -------> Compile
    Compile -----------|----------> Compile
    Link               |              |
      Wait             |              |
       |               |              |
       |<--------------/--------------/
       \------------------------------------------> Link
    Compile -------> Compile                         |
    Compile -----------|----------> Compile          |
    Link               |              |              |
      Wait             |              |              |
       |               |              |              |
       |<--------------/--------------/              |
       \---------------------------------------------|-----------> Link
    Compile -------> Compile                         |              |
    Compile -----------|----------> Compile          |              |
    Link               |              |              |              |
      Wait             |              |              |              |
       |               |              |              |              |
      ...

With this change, the main thread no longer waits for compilation to
finish. It's the link job itself that does the waiting. This allows the
main thread to go through Compile and Link commands without blocking,
generating as many jobs as needed. The above scenario therefore becomes:

    Main    T1     T2     T3     T4     T5     T6     T7     T8     T9

    C ----> C
    C ------|----> C
    L ------|------|----> L
    C ------|------|-------W---> C
    C ------|------|-------|-----|----> C
    L ------|------|-------|-----|------|----> L
    C ------|------|-------|-----|------|-------W---> C
    C ------|------|-------|-----|------|-------|-----|----> C
    L ------|------|-------|-----|------|-------|-----|------|----> L
    .        \-----\------>/     |      |       |     |      |       W
    .                      |      \-----\------>/     |      |       |
    .                      |                    |      \-----\------>/
    .                      |                    |                    |
    .                      |                    |                    |

This greatly improves the amount of parallelism compile and link jobs
get.

The careful observer may note that the link job being blocked on the
compile job is now wasting a thread from the thread pool. While this
change is strictly an improvement, parallelism can be further improved
if the link job is just not assigned to a thread until the corresponding
compile jobs are finished. This is currently not possible, but may be
if:

- Instead of a thread pool, the operating system's FIFO scheduler is
  used. Then the operating system would automatically put blocking
  tasks to sleep and pick up another task. This has the downside of
  requiring threads to be created for each task.
- The thread pool work scheduler is enhanced to be made aware of
  relationship between tasks and avoid scheduling jobs whose
  dependencies are not yet met.

Alternatively, the number of threads in the pool can be increased by 30%
and hope for the best.

Bug: angleproject:8297
Change-Id: If4e6540ade47558a10cfab55e2286f073b904928
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/5006874
Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Reviewed-by: Charlie Lao <cclao@google.com>
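
The scheduling change described in the commit message can be illustrated with a minimal sketch. This is not ANGLE's code: CompileShader, LinkProgram and CompileAndLink are hypothetical names, and std::async is assumed here as a stand-in for the thread pool. The point it shows is that the link job receives the compile futures and waits on them on its own worker thread, so the launching thread returns immediately.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical compile job: sleeps briefly to simulate compilation work.
std::string CompileShader(const std::string &source)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return "compiled(" + source + ")";
}

// Hypothetical link job. It is handed the compile futures and does the
// waiting itself, so the thread that launched it is never blocked.
std::string LinkProgram(std::vector<std::shared_future<std::string>> compileJobs)
{
    std::string program = "link[";
    for (const std::shared_future<std::string> &job : compileJobs)
    {
        // Blocks this worker thread (not the main thread) until the
        // corresponding compile job has finished.
        program += job.get();
    }
    return program + "]";
}

// Launches two compile jobs and one link job, then returns without waiting
// for any of them. std::async stands in for a thread pool (an assumption).
std::future<std::string> CompileAndLink(const std::string &vertexSource,
                                        const std::string &fragmentSource)
{
    std::shared_future<std::string> vertexJob =
        std::async(std::launch::async, CompileShader, vertexSource).share();
    std::shared_future<std::string> fragmentJob =
        std::async(std::launch::async, CompileShader, fragmentSource).share();

    return std::async(std::launch::async, LinkProgram,
                      std::vector<std::shared_future<std::string>>{vertexJob, fragmentJob});
}
```

A caller can invoke CompileAndLink several times in a row to queue up work, and only call get() on the returned futures when the linked result is actually needed. As the commit message notes, the waiting link job still occupies a thread while it is blocked; the sketch has the same limitation.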
//
// Copyright 2022 The ANGLE Project Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
//
// CompiledShaderState.h:
// Defines a struct containing any data that is needed to build
// a ShaderState from a TCompiler.
//
#ifndef COMMON_COMPILEDSHADERSTATE_H_
#define COMMON_COMPILEDSHADERSTATE_H_

#include "common/BinaryStream.h"
#include "common/Optional.h"
#include "common/PackedEnums.h"

#include <GLSLANG/ShaderLang.h>
#include <GLSLANG/ShaderVars.h>

#include <memory>
#include <string>
#include <vector>

namespace sh
{
struct BlockMemberInfo;
}

namespace gl
{

// @todo this type is also defined in compiler/Compiler.h and libANGLE/renderer_utils.h. Move this
// to a single common definition?
using SpecConstUsageBits = angle::PackedEnumBitSet<sh::vk::SpecConstUsage, uint32_t>;

// Helper functions for serializing shader variables
void WriteShaderVar(gl::BinaryOutputStream *stream, const sh::ShaderVariable &var);
void LoadShaderVar(gl::BinaryInputStream *stream, sh::ShaderVariable *var);

void WriteShInterfaceBlock(gl::BinaryOutputStream *stream, const sh::InterfaceBlock &block);
void LoadShInterfaceBlock(gl::BinaryInputStream *stream, sh::InterfaceBlock *block);

bool CompareShaderVar(const sh::ShaderVariable &x, const sh::ShaderVariable &y);

struct CompiledShaderState
{
    CompiledShaderState(gl::ShaderType shaderType);
    ~CompiledShaderState();

    void buildCompiledShaderState(const ShHandle compilerHandle, const bool isBinaryOutput);

    void serialize(gl::BinaryOutputStream &stream) const;
    void deserialize(gl::BinaryInputStream &stream);

    const gl::ShaderType shaderType;

    int shaderVersion;

    std::string translatedSource;
    sh::BinaryBlob compiledBinary;

    sh::WorkGroupSize localSize;

    std::vector<sh::ShaderVariable> inputVaryings;
    std::vector<sh::ShaderVariable> outputVaryings;
    std::vector<sh::ShaderVariable> uniforms;
    std::vector<sh::InterfaceBlock> uniformBlocks;
    std::vector<sh::InterfaceBlock> shaderStorageBlocks;
    std::vector<sh::ShaderVariable> allAttributes;
    std::vector<sh::ShaderVariable> activeAttributes;
    std::vector<sh::ShaderVariable> activeOutputVariables;

    bool hasClipDistance;
    bool hasDiscard;
    bool enablesPerSampleShading;
    gl::BlendEquationBitSet advancedBlendEquations;
    SpecConstUsageBits specConstUsageBits;

    // GL_OVR_multiview / GL_OVR_multiview2
    int numViews;

    // Geometry Shader
    Optional<gl::PrimitiveMode> geometryShaderInputPrimitiveType;
    Optional<gl::PrimitiveMode> geometryShaderOutputPrimitiveType;
    Optional<GLint> geometryShaderMaxVertices;
    int geometryShaderInvocations;

    // Tessellation Shader
    int tessControlShaderVertices;
    GLenum tessGenMode;
    GLenum tessGenSpacing;
    GLenum tessGenVertexOrder;
    GLenum tessGenPointMode;
};

using SharedCompiledShaderState = std::shared_ptr<CompiledShaderState>;

} // namespace gl

#endif // COMMON_COMPILEDSHADERSTATE_H_