17. RenderGraph Compute/UAV Patterns (SSAO/SSR Style)

How to configure compute passes in URP RenderGraph with AddComputePass/ComputeGraphContext, and design patterns for safely reading and writing UAV (RandomWrite) textures and buffers.

This chapter treats compute passes in URP RenderGraph as practical patterns for two common structures:

  • SSAO-style: screen-space data (depth/normals) → output texture
  • SSR-style: temporal history combined with motion vectors, depth, normals, and color

Prerequisites:

17.1 Raster vs Compute: When is Compute Advantageous?

Typical cases where compute is advantageous:

  • When UAV (RandomWrite) access is needed (e.g., histograms, reductions, tile classification)
  • Preprocessing that divides the screen into tiles/clusters (Forward+ light lists, SSR tiles, etc.)
  • Multiple outputs or irregular memory access (e.g., scattered writes or building lists in a buffer)
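The first bullet is easiest to see with a histogram, the textbook case for UAV scatter writes: each pixel increments a bin chosen by its own value, so the write address depends on the data, which a raster pass that writes only its own pixel cannot express. The sketch below is a plain .NET console program (not Unity code) standing in for such a kernel on the CPU; Interlocked.Increment plays the role of HLSL's InterlockedAdd on a RWStructuredBuffer, and the 16-bin layout and sample values are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// CPU stand-in for a histogram kernel: each "thread" handles one pixel and
// atomically increments a bin chosen by its own value (a scatter write).
static int[] BuildHistogram(float[] luminance, int binCount)
{
    var bins = new int[binCount];
    Parallel.For(0, luminance.Length, i =>
    {
        // Clamp to [0, 1], then map to a bin index in [0, binCount - 1].
        float v = Math.Clamp(luminance[i], 0f, 1f);
        int bin = Math.Min((int)(v * binCount), binCount - 1);
        // Equivalent of InterlockedAdd(_Histogram[bin], 1) in a compute shader.
        Interlocked.Increment(ref bins[bin]);
    });
    return bins;
}

float[] pixels = { 0.0f, 0.1f, 0.1f, 0.5f, 0.99f, 1.0f, 2.0f, -1.0f };
int[] histogram = BuildHistogram(pixels, 16);
Console.WriteLine(string.Join(",", histogram));
```

Without atomics, two pixels landing in the same bin could overwrite each other's increment; that hazard, not the arithmetic, is what makes this a compute/UAV problem.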

Conversely, raster (a full-screen pass) is advantageous when:

  • The work is mostly texture sampling, as in simple per-pixel filters (blur, color correction)
  • Load/store traffic can be minimized on tile-based GPUs (raster passes can keep intermediate results in on-chip tile memory)

Practical conclusion
Choosing compute usually means accepting more complex resource design (buffers, UAVs, barriers).
So first make clear why you actually need compute.

17.2 Basic structure of a compute pass in RenderGraph

Following the URP documentation, a compute pass differs from a raster pass mainly in two places:

  • Instead of AddRasterRenderPass, AddComputePass
  • Instead of RasterGraphContext, ComputeGraphContext

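Concept code:

```csharp
public override void RecordRenderGraph(RenderGraph renderGraph, ContextContainer frameData)
{
    using (var builder = renderGraph.AddComputePass<PassData>("MyComputePass", out var passData))
    {
        // Declare every texture/buffer the kernel touches here
        // (builder.UseTexture / builder.UseBuffer with AccessFlags).

        builder.SetRenderFunc((PassData data, ComputeGraphContext ctx) =>
        {
            // Record compute commands through ctx.cmd
            // (SetComputeTextureParam, SetComputeBufferParam, DispatchCompute, ...).
        });
    }
}
```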

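Important differences
In a raster pass, the render target is bound with SetRenderAttachment.
In a compute pass, UAVs and other resources are bound inside the render function, usually with SetComputeTextureParam/SetComputeBufferParam, after being declared on the builder with UseTexture/UseBuffer.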

17.2.1 A compute pass is also a “contract”

Like a raster pass, a compute pass must follow RenderGraph's rules:

  • Every texture/buffer the pass reads is declared as Read
  • Every texture/buffer the pass writes is declared as Write

From these declarations, RenderGraph determines:

  • Resource lifetime (when to create and release each resource)
  • Barriers (synchronization between passes)
  • Execution order and pass culling

In practice, compute passes rarely break because of render-target binding mistakes (the typical raster failure). They break because of UAV declaration/binding mistakes, such as binding a texture with SetComputeTextureParam without declaring it via UseTexture, or declaring it as Read while the kernel writes to it.
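To make the contract concrete, here is a toy model in plain C# (not RenderGraph's actual implementation) of what can be derived from Read/Write declarations alone: each resource's lifetime as the first and last pass that touches it, and the points where a later pass reads what an earlier pass wrote and therefore needs a barrier. The pass and resource names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not RenderGraph's implementation: passes in recording order,
// each with the resources it declares as Read and Write.
var passes = new (string Name, string[] Reads, string[] Writes)[]
{
    ("AO Compute", new[] { "Depth", "Normals" }, new[] { "AO_UAV" }),
    ("AO Blur", new[] { "AO_UAV" }, new[] { "AO_Blurred" }),
    ("Composite", new[] { "AO_Blurred", "Color" }, new[] { "Color" }),
};

// Lifetime: index of the first and last pass touching each resource.
// A transient texture only has to exist between those two passes.
static Dictionary<string, (int First, int Last)> Lifetimes(
    (string Name, string[] Reads, string[] Writes)[] ps)
{
    var result = new Dictionary<string, (int First, int Last)>();
    for (int i = 0; i < ps.Length; i++)
        foreach (var res in ps[i].Reads.Concat(ps[i].Writes))
            result[res] = result.TryGetValue(res, out var span) ? (span.First, i) : (i, i);
    return result;
}

// Barrier: a pass that reads a resource written by an earlier pass must wait
// for that write (on the GPU, a UAV-write -> shader-read transition).
static List<string> Barriers((string Name, string[] Reads, string[] Writes)[] ps)
{
    var lastWriter = new Dictionary<string, int>();
    var barriers = new List<string>();
    for (int i = 0; i < ps.Length; i++)
    {
        foreach (var res in ps[i].Reads)
            if (lastWriter.TryGetValue(res, out int writer))
                barriers.Add($"{ps[writer].Name} -> {ps[i].Name} ({res})");
        foreach (var res in ps[i].Writes)
            lastWriter[res] = i;
    }
    return barriers;
}

foreach (var (name, life) in Lifetimes(passes))
    Console.WriteLine($"{name}: passes {life.First}..{life.Last}");
foreach (var barrier in Barriers(passes))
    Console.WriteLine(barrier);
```

A real RenderGraph additionally culls passes whose outputs nobody reads, which is why a missing or wrong declaration can silently remove a compute pass rather than produce an error.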

17.3 UAV(RandomWrite) texture design: enableRandomWrite

To write to a texture as a UAV (write target) from compute, RandomWrite must be enabled when the texture is created.

RenderGraph texture desc (concept):

  • desc.enableRandomWrite = true

UAV textures are also subject to platform-dependent format restrictions, so prefer formats already verified on URP and your target platforms (for example by querying SystemInfo.IsFormatSupported with a load/store usage).

Common choices:

  • SSAO: R8, R16, RHalf (balance of precision and bandwidth)
  • SSR: RGBAHalf-class formats (intermediate results/history)
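The precision/bandwidth trade-off is easy to quantify. This plain C# sketch computes how many bytes one texture occupies (and therefore roughly how many bytes a pass moves when it writes or reads it once) for the formats listed above. The format names are this section's shorthand, and the sizes assume uncompressed formats at 1 byte per 8-bit channel and 2 bytes per 16-bit channel.

```csharp
using System;

// Bytes per texel for the uncompressed formats mentioned above.
static int BytesPerTexel(string format) => format switch
{
    "R8" => 1,        // 1 channel x 8 bit
    "R16" => 2,       // 1 channel x 16 bit
    "RHalf" => 2,     // 1 channel x 16-bit float
    "RGBAHalf" => 8,  // 4 channels x 16-bit float
    _ => throw new ArgumentException($"Unknown format: {format}"),
};

// Size of one texture, i.e. roughly the traffic of one full write or read of it.
static long TextureBytes(int width, int height, string format) =>
    (long)width * height * BytesPerTexel(format);

foreach (var fmt in new[] { "R8", "RHalf", "RGBAHalf" })
{
    Console.WriteLine(
        $"{fmt}: full 1920x1080 = {TextureBytes(1920, 1080, fmt)} B, " +
        $"half 960x540 = {TextureBytes(960, 540, fmt)} B");
}
```

A half-res R8 AO target (518,400 bytes) is 32 times smaller than a full-res RGBAHalf target (16,588,800 bytes), which is why SSAO intermediates rarely need a wide format.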

17.4 BufferHandle / GraphicsBuffer: Input/output buffer pattern

As the examples in the URP documentation show, a compute pass often writes its results to a buffer that is then read back by the CPU or consumed by another pass.

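17.4.1 Creating an output buffer (structured buffer)

Concept:

  1. Create a GraphicsBuffer (Structured)
  2. Import it into RenderGraph (renderGraph.ImportBuffer) and carry the resulting BufferHandle in the pass data
  3. Declare it with builder.UseBuffer(..., AccessFlags.Write) and write to it from the compute kernel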

Caution
For CPU reads, use an asynchronous path such as AsyncGPUReadback; a synchronous readback (e.g., GraphicsBuffer.GetData) stalls the CPU until the GPU has finished.
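An asynchronous readback avoids the stall by delivering results a few frames late. The toy simulation below (plain C#, not the AsyncGPUReadback API) models that trade-off: the CPU issues one request per frame, each request completes `latency` frames later, and the CPU always uses the newest completed value without waiting. The latency and per-frame values are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of non-blocking readback. Each frame requests that frame's GPU value;
// a request completes `latency` frames later. The CPU never waits: it uses the
// newest completed result, or nothing while the pipeline is still filling.
static List<int?> Simulate(int frames, int latency, Func<int, int> gpuValueForFrame)
{
    var pending = new Queue<(int Frame, int Value)>();
    int? latest = null;
    var seenPerFrame = new List<int?>();
    for (int frame = 0; frame < frames; frame++)
    {
        pending.Enqueue((frame, gpuValueForFrame(frame)));
        // Collect every request that has completed by this frame.
        while (pending.Count > 0 && pending.Peek().Frame + latency <= frame)
            latest = pending.Dequeue().Value;
        seenPerFrame.Add(latest);
    }
    return seenPerFrame;
}

var seen = Simulate(frames: 5, latency: 2, gpuValueForFrame: f => f * 10);
Console.WriteLine(string.Join(", ", seen.ConvertAll(v => v?.ToString() ?? "none")));
```

With a latency of 2, the CPU sees nothing for the first two frames and afterwards always sees data two frames old, so the consumer has to be designed around that delay.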

17.4.2 Five design questions when using buffers

  1. Is this buffer per-frame temporary data or per-camera history?
  2. Is the element count fixed? (If it varies, as with an SSR tile list, you need a counter or a prefix sum.)
  3. Which buffer type is needed: Structured, Raw, Append, or Consume?
  4. Does the CPU need to read it? (If so, design the read timing, frequency, and latency.)
  5. Do XR (per-eye) rendering and multiple cameras need separate buffers?
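For question 2, a common way to build a variable-length list (such as "tiles that need SSR") is an exclusive prefix sum over per-tile flags, followed by a scatter into the compacted list. The CPU reference below shows the logic only. On the GPU the flags would come from a classification kernel and the scan would run in parallel (for example in groupshared memory); the flag values here are made up.

```csharp
using System;

// Exclusive prefix sum: offsets[i] = number of selected tiles before tile i.
static (int[] Offsets, int Total) ExclusivePrefixSum(int[] flags)
{
    var offsets = new int[flags.Length];
    int running = 0;
    for (int i = 0; i < flags.Length; i++)
    {
        offsets[i] = running;
        running += flags[i];
    }
    return (offsets, running);
}

// Scatter each selected tile index to its slot. Total is the element count,
// e.g. what a later pass would need for an indirect dispatch.
static int[] Compact(int[] flags)
{
    var (offsets, total) = ExclusivePrefixSum(flags);
    var list = new int[total];
    for (int tile = 0; tile < flags.Length; tile++)
        if (flags[tile] != 0)
            list[offsets[tile]] = tile;
    return list;
}

int[] tileNeedsSsr = { 0, 1, 1, 0, 0, 1, 0, 1 };
Console.WriteLine(string.Join(", ", Compact(tileNeedsSsr)));
```

The alternative is an atomic counter (an AppendStructuredBuffer, or InterlockedAdd on a counter), which is simpler but does not preserve tile order.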

17.5 Pattern 1: SSAO (half resolution) compute design steps

When implementing SSAO as a RenderGraph compute pass, the minimal design usually looks like this.

Inputs (required)

  • Depth (Scene Depth)
  • Normals (or normal reconstruction)
  • Random/Noise (Blue Noise/Rotation Vector)
  • Camera parameters (projection, near plane, etc.)

Outputs

  • AO texture (half resolution)
  • Blurred and/or upsampled (full-resolution) result, if needed

RenderGraph pass configuration (recommended)

  1. AO creation (Compute, half-res UAV write)
  2. AO Blur (Compute or Raster)
  3. Upsample + composite (usually Raster)

Practical tip
A common hybrid generates the AO with compute and performs blur/compositing with full-screen (raster) passes.
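For the upsample step, a plain bilinear upsample blurs AO across depth discontinuities (dark halos around silhouettes). One common alternative is nearest-depth upsampling: among the four half-res texels surrounding a full-res pixel, use the AO of the texel whose depth best matches the full-res depth. The plain C# reference below sketches that selection for a single pixel, assuming row-major arrays and linear (view-space) depth; it is one option, not necessarily what URP's own SSAO does.

```csharp
using System;

// Nearest-depth upsample for one full-res pixel (px, py). halfAo/halfDepth are
// row-major half-res arrays; depths are assumed to be linear (view-space).
static float NearestDepthUpsample(
    float[] halfAo, float[] halfDepth, int halfW, int halfH,
    int px, int py, float fullDepth)
{
    // The full-res pixel center lies at (p + 0.5) / 2 in half-res texel units, so the
    // 2x2 half-res texels whose centers surround it start at floor(p / 2 - 0.25).
    // (p - 1) / 2 gives the same value for p >= 1; the clamp handles the borders.
    int x0 = Math.Clamp((px - 1) / 2, 0, halfW - 1);
    int y0 = Math.Clamp((py - 1) / 2, 0, halfH - 1);

    float bestAo = 0f;
    float bestDiff = float.MaxValue;
    for (int dy = 0; dy <= 1; dy++)
    {
        for (int dx = 0; dx <= 1; dx++)
        {
            int x = Math.Min(x0 + dx, halfW - 1);
            int y = Math.Min(y0 + dy, halfH - 1);
            float diff = Math.Abs(halfDepth[y * halfW + x] - fullDepth);
            if (diff < bestDiff) // keep the texel whose depth matches best
            {
                bestDiff = diff;
                bestAo = halfAo[y * halfW + x];
            }
        }
    }
    return bestAo;
}

// A 2x1 half-res strip: near object (depth 1) on the left, background (depth 10) on the right.
float[] ao = { 0.2f, 0.9f };
float[] depth = { 1.0f, 10.0f };
Console.WriteLine(NearestDepthUpsample(ao, depth, 2, 1, 2, 0, 9.5f)); // background pixel
Console.WriteLine(NearestDepthUpsample(ao, depth, 2, 1, 2, 0, 1.2f)); // object pixel
```

The same full-res pixel position picks different AO values depending on its own depth, which is exactly what keeps the object's occlusion from bleeding onto the background.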

17.6 Pattern 2: SSR (with temporal history) compute design steps

Screen-space reflections (SSR) need far more resources than SSAO, because they combine tracing against screen-space geometry with temporal accumulation.

Inputs (typical)

  • Color (current frame)
  • Depth
  • Normals
  • Roughness/Metallic (material parameters)
  • Motion Vectors
  • History Color (previous frame)

Outputs (typical)

  • Reflection color (current frame)
  • Temporal accumulation result (history update)
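Temporal outputs need the previous frame's result, so history is typically double-buffered per camera (and per eye in XR): each frame reads the slot written last frame and writes the other. The sketch below is hypothetical bookkeeping in plain C# (not a URP API). It also reports whether the read slot holds valid history, since the first frame for a new camera/eye has none.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-(camera, eye) ping-pong state: which slot to write next and
// whether a previous frame has already written the other slot.
var state = new Dictionary<(int CameraId, int Eye), (int Write, bool HasHistory)>();

// Returns this frame's read/write slots and flips them for the next frame.
(int Read, int Write, bool HistoryValid) NextSlots(int cameraId, int eye)
{
    var key = (cameraId, eye);
    var (write, hasHistory) = state.TryGetValue(key, out var s) ? s : (0, false);
    state[key] = (1 - write, true);   // next frame writes the other slot
    return (1 - write, write, hasHistory);
}

var f0 = NextSlots(1, 0);     // first frame: no valid history, skip the temporal blend
var f1 = NextSlots(1, 0);     // slots swapped, history now valid
var other = NextSlots(1, 1);  // the other eye keeps its own state
Console.WriteLine($"{f0} {f1} {other}");
```

Besides the first frame, history should also be treated as invalid after a resolution change or a camera cut.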

RenderGraph Design Points

17.6.1 “Minimum pass decomposition” in SSR (practical perspective)

SSR implementations usually combine some of the following passes:

  1. Ray march (or Hierarchical Z): search for candidate hit points (Compute)
  2. Resolve: sample color and apply fades at the hit points (Compute or Raster)
  3. Temporal accumulation: accumulate into history (Compute)
  4. Denoise/Blur: noise removal (Compute or Raster)

Of these, step 3 is what makes history texture and motion vector design essential.
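Step 3 usually reduces to an exponential moving average with reprojection and rejection. The scalar reference below assumes motion vectors stored in UV units as (current - previous), so the previous position is uv - motion, and assumes the history value has already been fetched at that position; `alpha` is a hypothetical blend weight for the new frame.

```csharp
using System;

// One pixel's temporal accumulation: reproject, reject, then blend.
static float Accumulate(float current, float history, bool historyValid,
                        float u, float v, float motionU, float motionV, float alpha)
{
    // Reproject: where this pixel was in the previous frame.
    float prevU = u - motionU;
    float prevV = v - motionV;
    bool onScreen = prevU >= 0f && prevU <= 1f && prevV >= 0f && prevV <= 1f;

    // Reject missing or off-screen history and restart from the current frame.
    if (!historyValid || !onScreen)
        return current;

    // Exponential moving average: lerp(history, current, alpha).
    return history + alpha * (current - history);
}

// A static pixel whose history starts at 0 while every new frame is 1.
float h = 0f;
for (int frame = 0; frame < 10; frame++)
    h = Accumulate(1f, h, true, 0.5f, 0.5f, 0f, 0f, 0.1f);
Console.WriteLine(h); // about 0.65, i.e. 1 - 0.9^10

// History that reprojects off-screen is rejected.
Console.WriteLine(Accumulate(1f, 0f, true, 0.05f, 0.5f, 0.1f, 0f, 0.1f)); // 1
```

Production SSR/TAA usually adds further rejection (for example clamping history to the current neighborhood's color range) to limit ghosting; the sketch only covers the invalid-history and off-screen cases.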

17.7 Practical Code Skeleton: URP RendererFeature + Compute Pass

This code is a skeleton that shows the structure. Exact API signatures may differ between URP/RenderGraph versions,
so compile it in your Unity 6.3 project and adjust as needed.

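```csharp
using UnityEngine;
using UnityEngine.Experimental.Rendering;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;
using UnityEngine.Rendering.RenderGraphModule;

public sealed class ComputeAoFeature : ScriptableRendererFeature
{
    [System.Serializable]
    public sealed class Settings
    {
        public ComputeShader computeShader;
        public string kernelName = "CSMain";
    }

    public Settings settings = new();

    sealed class ComputeAoPass : ScriptableRenderPass
    {
        // Must match [numthreads(8,8,1)] in the compute shader.
        const int ThreadGroupSize = 8;

        static readonly int DepthId = Shader.PropertyToID("_CameraDepthTexture");
        static readonly int NormalsId = Shader.PropertyToID("_CameraNormalsTexture");
        static readonly int AoId = Shader.PropertyToID("_AOTexture");

        readonly ComputeShader _cs;
        readonly int _kernel;

        class PassData
        {
            public ComputeShader cs;
            public int kernel;
            public TextureHandle depth;
            public TextureHandle normals;
            public TextureHandle aoUav;
            public int groupsX;
            public int groupsY;
        }

        public ComputeAoPass(ComputeShader cs, int kernel)
        {
            _cs = cs;
            _kernel = kernel;
            renderPassEvent = RenderPassEvent.AfterRenderingOpaques;
            // Ask URP to produce the depth and normals textures read below.
            ConfigureInput(ScriptableRenderPassInput.Depth | ScriptableRenderPassInput.Normal);
        }

        public override void RecordRenderGraph(RenderGraph renderGraph, ContextContainer frameData)
        {
            var resources = frameData.Get<UniversalResourceData>();

            // Inputs (may be invalid depending on renderer/project settings).
            TextureHandle depth = resources.cameraDepthTexture;
            TextureHandle normals = resources.cameraNormalsTexture;
            if (!depth.IsValid() || !normals.IsValid())
                return;

            // Output UAV texture: start from the camera color desc, then override what AO needs.
            var aoDesc = renderGraph.GetTextureDesc(resources.activeColorTexture);
            aoDesc.name = "AO_UAV";
            aoDesc.enableRandomWrite = true;              // required for RWTexture2D writes
            aoDesc.msaaSamples = MSAASamples.None;        // UAVs cannot be multisampled
            aoDesc.colorFormat = GraphicsFormat.R8_UNorm; // single channel; check the field name and UAV support for your version/platform
            aoDesc.width = Mathf.Max(1, (aoDesc.width + 1) / 2);   // half res, rounded up
            aoDesc.height = Mathf.Max(1, (aoDesc.height + 1) / 2);
            TextureHandle ao = renderGraph.CreateTexture(aoDesc);

            using (var builder = renderGraph.AddComputePass<PassData>("Compute AO", out var passData))
            {
                passData.cs = _cs;
                passData.kernel = _kernel;
                passData.depth = depth;
                passData.normals = normals;
                passData.aoUav = ao;
                // Round up so partially covered edge tiles still get a thread group.
                passData.groupsX = (aoDesc.width + ThreadGroupSize - 1) / ThreadGroupSize;
                passData.groupsY = (aoDesc.height + ThreadGroupSize - 1) / ThreadGroupSize;

                // The contract: everything the kernel binds is declared here.
                builder.UseTexture(passData.depth, AccessFlags.Read);
                builder.UseTexture(passData.normals, AccessFlags.Read);
                builder.UseTexture(passData.aoUav, AccessFlags.Write);

                // No later pass reads the AO in this skeleton, so RenderGraph would cull
                // this pass. Remove once a consumer declares a read of the AO texture.
                builder.AllowPassCulling(false);

                // Optional: expose the AO to later passes, e.g.
                // builder.SetGlobalTextureAfterPass(ao, Shader.PropertyToID("_ScreenSpaceAOTexture"));
                // or store the handle in a ContextItem in frameData.

                builder.SetRenderFunc((PassData data, ComputeGraphContext ctx) =>
                {
                    var cmd = ctx.cmd;
                    cmd.SetComputeTextureParam(data.cs, data.kernel, DepthId, data.depth);
                    cmd.SetComputeTextureParam(data.cs, data.kernel, NormalsId, data.normals);
                    cmd.SetComputeTextureParam(data.cs, data.kernel, AoId, data.aoUav);
                    cmd.DispatchCompute(data.cs, data.kernel, data.groupsX, data.groupsY, 1);
                });
            }
        }
    }

    ComputeAoPass _pass;

    public override void Create()
    {
        _pass = null;
        var cs = settings.computeShader;
        if (cs != null && cs.HasKernel(settings.kernelName))
            _pass = new ComputeAoPass(cs, cs.FindKernel(settings.kernelName));
    }

    public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
    {
        if (_pass == null || !SystemInfo.supportsComputeShaders)
            return;
        renderer.EnqueuePass(_pass);
    }
}
```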

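17.7.1 HLSL (compute) example: using the UAV texture in the shader

```hlsl
// Compute shader snippet
#pragma kernel CSMain

RWTexture2D<float> _AOTexture;         // half-res UAV output
Texture2D<float> _CameraDepthTexture;  // full-res depth input

[numthreads(8,8,1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // The dispatch rounds up to whole 8x8 groups, so skip threads outside the texture.
    uint width, height;
    _AOTexture.GetDimensions(width, height);
    if (id.x >= width || id.y >= height)
        return;

    // Each half-res AO texel covers a 2x2 block of full-res depth; read its top-left texel.
    float d = _CameraDepthTexture[id.xy * 2];
    _AOTexture[id.xy] = saturate(d); // placeholder: writes depth as-is instead of real AO
}
```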

17.8 Performance design points (Compute)

17.8.1 Tile/thread group size

  • numthreads(8,8,1) is a common default, but it is not always optimal.
  • The best size depends on memory access patterns (coherence), cache behavior, and whether groupshared memory is used.

Practical routine:

  1. First, get correct results with 8x8
  2. Then try other sizes such as 16x16 and measure on each target platform
  3. Check whether the pass is bandwidth-bound or ALU-bound (Profiler/GPU capture)
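Whatever size you pick, the group counts must match numthreads, and rounding up leaves idle threads in the edge groups (which is why kernels early-out on out-of-range ids). The plain C# sketch below compares a few candidate sizes for a 683x384 target (a 1366x768 screen at half resolution, chosen only as an example). It counts wasted threads, not performance, which still has to be measured.

```csharp
using System;

// Group counts for a 2D dispatch (rounded up) and the number of launched threads
// that fall outside the texture and must early-out in the kernel.
static (int GroupsX, int GroupsY, long IdleThreads) Dispatch(int width, int height, int tx, int ty)
{
    int gx = (width + tx - 1) / tx;
    int gy = (height + ty - 1) / ty;
    long launched = (long)gx * tx * gy * ty;
    return (gx, gy, launched - (long)width * height);
}

foreach (var (sx, sy) in new[] { (8, 8), (16, 16), (32, 8) })
{
    var d = Dispatch(683, 384, sx, sy);
    Console.WriteLine($"{sx}x{sy}: {d.GroupsX}x{d.GroupsY} groups, {d.IdleThreads} idle threads");
}
```

Here 8x8 and 16x16 both launch 1,920 idle threads, while 32x8 launches 8,064, because 683 is a poor fit for a 32-wide group.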

17.8.2 Reduce bandwidth (texture read/write)

A poorly designed compute chain reads and writes too many textures and becomes bandwidth-bound. To reduce bandwidth:

  • Use half resolution aggressively (SSAO, blur, some SSR intermediate results)
  • Use smaller formats (R8/R16, etc., where precision allows)
  • Remove unnecessary intermediate textures (keep descs identical so RenderGraph can reuse them)
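Texture reuse depends on descriptors matching. The toy pool below is a simplification, not RenderGraph's actual allocator: it models a chain in which pass i writes texture i while reading texture i - 1. Once pass i has finished, texture i - 1 is released, and a later request can reuse it only if its descriptor is identical. The descriptor fields are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Toy pool: count how many textures a linear pass chain needs when released
// textures are reused only by requests with an identical descriptor.
static int AllocationsForChain((int W, int H, string Format, bool Uav)[] chain)
{
    var free = new Dictionary<(int W, int H, string Format, bool Uav), int>();
    int allocations = 0;
    for (int i = 0; i < chain.Length; i++)
    {
        var desc = chain[i];
        if (free.TryGetValue(desc, out int n) && n > 0)
            free[desc] = n - 1;   // reuse a released texture with an identical desc
        else
            allocations++;        // no exact match: allocate a new texture

        if (i > 0)
        {
            var prev = chain[i - 1]; // its last reader (pass i) has finished
            free[prev] = free.TryGetValue(prev, out int m) ? m + 1 : 1;
        }
    }
    return allocations;
}

var halfR8 = (W: 960, H: 540, Format: "R8", Uav: true);
var halfR16 = (W: 960, H: 540, Format: "R16", Uav: true);
Console.WriteLine(AllocationsForChain(new[] { halfR8, halfR8, halfR8 }));  // 2
Console.WriteLine(AllocationsForChain(new[] { halfR8, halfR8, halfR16 })); // 3
```

With identical half-res R8 descs the three-pass chain ping-pongs between two textures; changing only the last format to R16 forces a third allocation.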

Related: 04.9 Blit optimization

17.9 Debugging checklist (Compute)

  • Is enableRandomWrite = true set in the texture desc?
  • Did you declare the write on the builder? (UseTexture(..., AccessFlags.Write))
  • Do the kernel's numthreads and the group counts passed to DispatchCompute agree?
  • Does the input texture actually exist? (Check ConfigureInput requirements and the Render Graph Viewer)
  • Is the pass culled because no later pass reads its output? (builder.AllowPassCulling)
  • Does the platform support compute shaders? (SystemInfo.supportsComputeShaders)

17.10 Official documentation (recommended)