Implementation · 10 min read

Building a WebGL2 Renderer for an NLE

Textured quads, zIndex sorting, context-loss recovery, and the 2D-canvas-to-texture pipeline for text. A walkthrough of the GpuRenderer architecture.

The GpuRenderer has exactly one job: turn a Scene into a sorted list of textured-quad draw calls, using GPU memory that is pooled and async work that never blocks the render path. Everything under the gpu/ folder collaborates around that single idea. This is a walkthrough of how the pieces fit.

Everything is a textured quad

There is no per-element shader zoo. Video frames, static images, and text all become the same primitive: a textured quad drawn with one shared quad shader. Three layers feed it — VideoLayer pulls frames from the decode pipeline, ImageLayer loads static bitmaps, and TextLayer rasterizes glyphs to a 2D canvas and uploads that as a texture. Once a layer has a texture, the draw path is identical for all three.

Placement math — object-fit contain, per-clip transforms, text layout — lives in pure helpers (drawRect.ts, objectFit.ts, textLayout.ts) that the export path reuses verbatim. The renderer does not invent geometry; it consumes it.
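
As a rough illustration of what such a pure placement helper can look like, here is a minimal object-fit contain sketch. The name containRect and the Rect shape are assumptions made for this example, not the actual contents of objectFit.ts; the point is only that the math needs no GL and no DOM, which is what lets an export path reuse it.

```typescript
// Hypothetical helper: the real objectFit.ts may use different names and shapes.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

/**
 * Largest rectangle with the source's aspect ratio that fits inside the
 * destination, centered on both axes (CSS "object-fit: contain" semantics).
 */
function containRect(srcW: number, srcH: number, dstW: number, dstH: number): Rect {
  // A degenerate source has no aspect ratio; return an empty rect instead of NaN.
  if (srcW <= 0 || srcH <= 0) {
    return { x: 0, y: 0, width: 0, height: 0 };
  }
  // The smaller of the two scale factors guarantees both dimensions fit.
  const scale = Math.min(dstW / srcW, dstH / srcH);
  const width = srcW * scale;
  const height = srcH * scale;
  return { x: (dstW - width) / 2, y: (dstH - height) / 2, width, height };
}
```

For a 200x100 source in a 100x100 viewport, containRect(200, 100, 100, 100) yields a 100x50 rect at x = 0, y = 25: letterboxed top and bottom.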

RenderGraph: diff, acquire, release, sort

On each tick, the RenderGraph diffs the active clips in the new Scene against the previous one. Clips that left are released (their textures go back to the pool); clips that entered are acquired (each takes a texture from the pool). Then it builds one global draw list and sorts it by zIndex ascending, so the last element drawn lands on top.

```ts
// zIndex comes from the resolver, derived from track order:
// zIndex = (maxOrder - track.order) * 1000
// track.order 0 (topmost in UI) -> highest zIndex -> front-most on screen.
// The * 1000 reserves room for sub-layer offsets (e.g. text +100 later).
drawList.sort((a, b) => a.zIndex - b.zIndex)
```
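
The diff step that runs before the sort can be sketched in a few lines. Everything below is an assumption for illustration (the ActiveClip shape, diffClips, zIndexFor, buildDrawList); only the zIndex formula and the ascending sort come from the snippet above.

```typescript
// Hypothetical shapes: the real Scene and RenderGraph types are not shown in this post.
interface ActiveClip {
  clipId: string;
  trackOrder: number; // 0 = topmost track in the UI
}

interface SceneDiff {
  entered: string[]; // clips to acquire
  left: string[];    // clips to release
}

/** Compare the sets of active clip ids between the previous tick and this one. */
function diffClips(prev: ReadonlySet<string>, next: ReadonlySet<string>): SceneDiff {
  const entered = Array.from(next).filter((id) => !prev.has(id));
  const left = Array.from(prev).filter((id) => !next.has(id));
  return { entered, left };
}

/** zIndex = (maxOrder - track.order) * 1000, as in the resolver comment above. */
function zIndexFor(trackOrder: number, maxOrder: number): number {
  return (maxOrder - trackOrder) * 1000;
}

/** Build the global draw list, sorted ascending so the front-most clip draws last. */
function buildDrawList(clips: readonly ActiveClip[]): { clipId: string; zIndex: number }[] {
  const maxOrder = clips.reduce((max, clip) => Math.max(max, clip.trackOrder), 0);
  return clips
    .map((clip) => ({ clipId: clip.clipId, zIndex: zIndexFor(clip.trackOrder, maxOrder) }))
    .sort((a, b) => a.zIndex - b.zIndex);
}
```

With one clip on track order 0 and another on track order 2, the draw list comes out as the order-2 clip (zIndex 0) followed by the order-0 clip (zIndex 2000), so the topmost UI track is painted last.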

The render tick is strictly synchronous

Called once per RAF tick, render(scene) runs top to bottom with no awaits. If scene === lastScene or the context is lost, it is a no-op. Otherwise it clears, asks the RenderGraph to execute, and for each draw entry the layer pulls its current frame and uploads it:

```ts
// VideoLayer.draw, per clip, inside the synchronous tick
provider.setPlayhead(sourceFrame)        // fire-and-forget; drives decode out-of-band
const frame = provider.getCurrent(sourceFrame)  // synchronous cache read
if (frame) {
  videoTexture.upload(gl, frame)         // borrow; never closes the frame
} else {
  // cache miss: keep the last texture content -> no flicker
}
```

Two invariants this enforces: render() never awaits, and render() never throws on a missed frame — on a cache miss it simply draws the last upload, so a decoder that is still warming up produces a held frame rather than a black flash.
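
To make the held-frame behavior concrete, here is a GL-free sketch of the same per-clip step with a fake texture that only records what was uploaded. Frame, FrameProviderLike, FakeTexture, and drawClip are hypothetical stand-ins invented for this example; the real VideoTexture.upload also takes the GL context.

```typescript
// All names below are illustrative stand-ins, not the renderer's real types.
interface Frame {
  id: number;
}

interface FrameProviderLike {
  setPlayhead(frame: number): void;        // fire-and-forget
  getCurrent(frame: number): Frame | null; // synchronous cache read
}

class FakeTexture {
  content: Frame | null = null;
  uploads = 0;

  upload(frame: Frame): void {
    this.content = frame; // borrow only: the cache still owns the frame
    this.uploads += 1;
  }
}

/** One clip's share of the tick: no awaits, and no throw on a cache miss. */
function drawClip(
  provider: FrameProviderLike,
  texture: FakeTexture,
  sourceFrame: number,
): Frame | null {
  provider.setPlayhead(sourceFrame);
  const frame = provider.getCurrent(sourceFrame);
  if (frame) {
    texture.upload(frame);
  }
  // On a miss the texture keeps its previous content, so the quad shows a held frame.
  return texture.content;
}
```

If frame 0 is cached and frame 1 is not, drawing frame 0 and then frame 1 performs one upload and shows frame 0 both times.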

Decode happens out-of-band, push-based

The frame provider is the boundary between the synchronous render path and the asynchronous decode work. The contract is push-based: VideoLayer calls setPlayhead(N) before getCurrent(N) on every tick, and the provider drives decoding internally. The render path never schedules individual frame requests.

A contiguous advance (|delta| <= 1) feeds only the new tail to the decoder, which stays warm; there is no per-frame flush(). A discontinuity (a seek with |delta| > 1, or the first call) triggers a reset() to the nearest preceding keyframe and re-feeds the window. The decoder is reset only when the playhead actually jumps.
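
The contiguous-versus-discontinuity decision reduces to a comparison against the previous playhead. The sketch below shows only that classification; PlayheadTracker and the action names are invented for this example, and the actual feeding and reset logic inside the provider is not shown in this post.

```typescript
// Hypothetical names: a sketch of the classification rule only.
type PlayheadAction = "feed-tail" | "reset";

class PlayheadTracker {
  private last: number | null = null;

  /** Classify a setPlayhead(n) call by its distance from the previous call. */
  update(n: number): PlayheadAction {
    const prev = this.last;
    this.last = n;
    if (prev === null) {
      return "reset"; // first call: nothing is decoded yet
    }
    if (Math.abs(n - prev) > 1) {
      return "reset"; // seek: restart from a keyframe and re-feed the window
    }
    // |delta| <= 1: the decoder stays warm. A repeat of the same frame
    // (delta 0) simply has no new tail to feed.
    return "feed-tail";
  }
}
```

Feeding the tracker the sequence 10, 11, 11, 40 produces reset, feed-tail, feed-tail, reset: one reset to start, and one more only when the playhead jumps.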

Frame ownership, one rule

The FrameCache is the single owner and only closer of every cached frame. On the real decode path the cache holds ImageBitmap copies; the decoded VideoFrame is closed in onFrame the instant the copy exists. VideoTexture.upload borrows and never closes. Violate this and you either leak GPU memory or freeze playback.
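
The single-owner rule is easy to state in code. Below is a minimal sketch, assuming a bounded cache that evicts in insertion order; the class name, the capacity parameter, and the eviction policy are all assumptions made for illustration. Closable stands in for ImageBitmap, which exposes close().

```typescript
// Closable stands in for ImageBitmap. The eviction policy here is an assumption.
interface Closable {
  close(): void;
}

class FrameCacheSketch<T extends Closable> {
  private frames = new Map<number, T>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Takes ownership: from here on, only the cache may close this frame. */
  put(index: number, frame: T): void {
    const existing = this.frames.get(index);
    if (existing !== undefined && existing !== frame) {
      existing.close(); // a replaced frame is closed by its owner
    }
    this.frames.delete(index);
    this.frames.set(index, frame);
    // Evict the oldest insertions until the cache is back within capacity.
    while (this.frames.size > this.capacity) {
      const oldest = this.frames.keys().next().value as number;
      const evicted = this.frames.get(oldest);
      if (evicted !== undefined) {
        evicted.close();
      }
      this.frames.delete(oldest);
    }
  }

  /** Lends the frame: callers borrow it and must never close it. */
  get(index: number): T | null {
    return this.frames.get(index) ?? null;
  }

  /** Closes everything the cache owns. */
  clear(): void {
    this.frames.forEach((frame) => frame.close());
    this.frames.clear();
  }
}
```

Because get() only lends, a texture upload can read a frame on every tick without ever affecting its lifetime; close() is called from exactly one place.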

The GL-free line and context-loss recovery

A WebGL context can be lost at any moment — driver reset, laptop sleep, a tab backgrounded too long. When it happens, every GPU object is instantly dead: textures, shaders, the pool. Plain JavaScript memory is untouched. So the system draws a hard line: everything above it (VideoTexture, ShaderProgram, TexturePool) is rebuildable; everything below it (StreamingFrameProducer, VideoDecoderManager, demuxer, FrameCache) holds no GL and keeps running.

```text
GPU ZONE: wiped on context loss
  VideoTexture · ShaderProgram · TexturePool      -> rebuilt on restore
=================== GL-free line ===================
SAFE MEMORY ZONE: survives a GPU reset
  StreamingFrameProducer · VideoDecoderManager
  Demuxer · FrameCache (ImageBitmap)              -> keeps running
```

On webglcontextlost the renderer nulls its GL handles, releases all active items, and sets lastScene = null. On webglcontextrestored it re-acquires the context and re-runs GL state init. The next render() treats every clip as entering, rebuilds the shader program and VAO, and re-uploads from the surviving frame cache, with no re-decode and no stutter. This is a second reason the copy-and-close rule from the ownership section matters: the cached ImageBitmap copies can be uploaded to a brand-new context just as well as the original VideoFrame could have been.
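
The event wiring for this can be sketched without a browser. CanvasLike, RecoveryHooks, and ContextLossGuard below are hypothetical names, and the minimal interfaces exist only so the example runs outside a page; in a real renderer the listeners would be registered on the canvas element. One detail not mentioned above but required by WebGL: the webglcontextlost handler must call preventDefault(), otherwise the browser will not fire webglcontextrestored.

```typescript
// Minimal stand-ins so the sketch runs without a DOM. All names are hypothetical.
interface CancelableEvent {
  preventDefault(): void;
}

interface CanvasLike {
  addEventListener(type: string, listener: (event: CancelableEvent) => void): void;
}

interface RecoveryHooks {
  releaseAll(): void; // null GL handles and release all active items
  initGl(): void;     // re-acquire the context and re-run GL state init
}

class ContextLossGuard {
  contextLost = false;
  lastScene: object | null = null;
  private hooks: RecoveryHooks;

  constructor(canvas: CanvasLike, hooks: RecoveryHooks) {
    this.hooks = hooks;

    canvas.addEventListener("webglcontextlost", (event) => {
      event.preventDefault(); // opt in to restoration
      this.contextLost = true;
      this.hooks.releaseAll();
      this.lastScene = null; // the next render() treats every clip as entering
    });

    canvas.addEventListener("webglcontextrestored", () => {
      this.hooks.initGl();
      this.contextLost = false;
    });
  }
}
```

Nothing in either handler touches the decode side, which is the whole point of the GL-free line: the frame cache is neither cleared nor rebuilt.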

Construction and disposal order is load-bearing

Dispose runs in reverse of construction for a reason: tearing down the texture pool before the render graph would leak acquired textures. So mount builds context, then pool, then layers, then graph; dispose releases the graph first (returning textures to the pool free-list), then deletes every pooled texture, then loses the context. Order is not stylistic here — it is correctness.
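
One common way to make reverse-order teardown hard to get wrong is a disposer stack: each construction step registers its own teardown, and dispose pops them off in reverse. This is a generic pattern offered as a sketch, not a description of how GpuRenderer is actually written; DisposerStack and mountSketch are hypothetical names, and the log strings stand in for real construction work.

```typescript
// A generic pattern, not the renderer's actual code.
class DisposerStack {
  private disposers: Array<() => void> = [];

  /** Register a teardown immediately after its construction step succeeds. */
  push(dispose: () => void): void {
    this.disposers.push(dispose);
  }

  /** Run every teardown in reverse construction order. */
  disposeAll(): void {
    while (this.disposers.length > 0) {
      const dispose = this.disposers.pop()!;
      dispose();
    }
  }
}

/** Builds context, pool, layers, graph (in that order) and records each step. */
function mountSketch(log: string[]): DisposerStack {
  const stack = new DisposerStack();
  for (const name of ["context", "pool", "layers", "graph"]) {
    log.push(`build ${name}`);
    stack.push(() => {
      log.push(`dispose ${name}`);
    });
  }
  return stack;
}
```

Calling disposeAll() on the returned stack logs graph, layers, pool, context: the graph returns its textures before the pool deletes them, and the context goes last.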

A good renderer is mostly bookkeeping: who owns this texture, is this frame still borrowed, did the context just die. Get the ownership rules right and the pixels take care of themselves.
