
UOps — Intermediate Representation

Stage 2 of 6 · tinygrad/uop/
UOp dataclass uop/ops.py:128

UOp is the single IR node that represents every operation in the entire pipeline — from high-level tensor ops through scheduling to compiled kernels. All Tensor computation is a DAG of UOps.

@dataclass(eq=False, slots=True)
class UOp(OpMixin, metaclass=UOpMetaClass):
  op:       Ops                  # what this node does
  dtype:    DType  = dtypes.void # output type
  src:      tuple[UOp,...] = ()  # input nodes (the DAG edges)
  arg:      Any    = None        # op-specific extra data
  tag:      Any    = None        # optional tracing metadata
  metadata: Metadata = None      # user-supplied metadata
uop/ops.py:128 — class UOp
Deduplication: UOpMetaClass (line 85) caches instances, so constructing two UOps with identical (op, dtype, src, arg) returns the same object. This keeps the DAG compact and reduces equality checks to an O(1) identity comparison.
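The caching idea can be sketched with a toy metaclass. This is a simplified model, not tinygrad's code: InternMeta and Node are hypothetical names, dtypes are plain strings, and the real UOpMetaClass may key its cache differently.

```python
import weakref
from dataclasses import dataclass
from typing import Any


class InternMeta(type):
    """Toy stand-in for UOpMetaClass: at most one live instance per key."""
    # Weak values: an entry disappears once nothing else references the node.
    _cache = weakref.WeakValueDictionary()

    def __call__(cls, op, dtype="void", src=(), arg=None):
        key = (op, dtype, src, arg)
        cached = InternMeta._cache.get(key)
        if cached is not None:
            return cached                      # reuse the existing node
        node = super().__call__(op, dtype, src, arg)
        InternMeta._cache[key] = node
        return node


@dataclass(eq=False)                           # eq=False keeps identity-based == and hash
class Node(metaclass=InternMeta):
    op: str
    dtype: str = "void"
    src: tuple = ()
    arg: Any = None


one_a = Node("CONST", "int", (), 1)
one_b = Node("CONST", "int", (), 1)
assert one_a is one_b                          # same key, same object

add_1 = Node("ADD", "int", (one_a, one_b))
add_2 = Node("ADD", "int", (one_a, one_a))
assert add_1 is add_2                          # shared subgraphs collapse too
assert Node("CONST", "int", (), 2) is not one_a
```

Because sources are already deduplicated, comparing the key tuple only compares object identities of the children, which is why equality stays cheap no matter how deep the graph is.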

Key properties

toposort(): returns an ordered dict of all ancestor UOps; respects control-flow entry ordering
_shape: recursively infers shape from src nodes; may contain symbolic sint values
ssimplify(): constant-folds symbolic expressions where possible
sym_infer(var_vals): substitutes symbolic variables with concrete ints
uop/ops.py:174 — toposort() uop/ops.py:212 — _shape
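To make the toposort() contract concrete, here is a minimal sketch of an iterative post-order walk over a toy DAG. Node and this standalone toposort function are hypothetical stand-ins; the control-flow ordering mentioned above is not modelled.

```python
from dataclasses import dataclass


@dataclass(eq=False)          # identity-based hash, so nodes can be dict keys
class Node:
    op: str
    src: tuple = ()


def toposort(root):
    """Return a dict (used as an ordered set) with every source before its users."""
    order = {}
    stack = [(root, False)]   # (node, sources_already_pushed)
    while stack:
        node, expanded = stack.pop()
        if node in order:
            continue          # reached again through another edge
        if expanded:
            order[node] = None
        else:
            stack.append((node, True))
            for s in reversed(node.src):
                stack.append((s, False))
    return order


a, b = Node("BUFFER"), Node("BUFFER")
mul = Node("MUL", (a, b))
red = Node("REDUCE", (mul,))
sink = Node("SINK", (red, mul))   # mul is reachable along two paths

ops_in_order = [n.op for n in toposort(sink)]
```

The explicit stack avoids Python's recursion limit on deep graphs, and the dict keeps insertion order while making the "already visited" check O(1).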
Ops enum — all operation codes uop/__init__.py:13

The Ops enum enumerates every operation a UOp can represent. The ordering controls toposort priority.

Defines / Params
DEFINE_VAR BIND SPECIAL DEFINE_LOCAL DEFINE_REG
Program Structure
SINK LINEAR PROGRAM SOURCE BINARY PARAM FUNCTION CALL
Memory
BUFFER BUFFER_VIEW LOAD STORE INDEX COPY
Arithmetic (Unary)
CAST BITCAST EXP2 LOG2 SIN SQRT RECIPROCAL NEG TRUNC
Arithmetic (Binary)
ADD MUL SUB FDIV MAX SHL SHR XOR OR AND CMPLT CMPNE CMPEQ POW
Ternary & Reduce
WHERE MULACC REDUCE ALLREDUCE WMMA
Control Flow
RANGE END IF ENDIF BARRIER WAIT
Tensor-only (pre-schedule)
RESHAPE PERMUTE EXPAND PAD SHRINK FLIP CONTIGUOUS DEVICE
uop/__init__.py:13 — class Ops(FastEnum)
PatternMatcher — AST rewriting uop/ops.py:1299

All optimization and lowering passes are expressed as PatternMatcher rules. Each rule pairs a UPat pattern, which describes a UOp subgraph to match, with a function that receives the named captures and returns the replacement UOp.

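# import paths assumed from the file references on this page
from tinygrad.uop import Ops
from tinygrad.uop.ops import UOp, UPat, PatternMatcher, graph_rewrite

pm = PatternMatcher([
  # constant folding: ADD(CONST a, CONST b) → CONST(a+b)
  (UPat(Ops.ADD, src=(UPat(Ops.CONST, name="a"),
                      UPat(Ops.CONST, name="b"))),
   lambda a, b: UOp.const(a.dtype, a.arg + b.arg)),
  # ... further (pattern, rewrite function) rules
])

# uop is the root of an existing UOp graph
new_uop = graph_rewrite(uop, pm)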
uop/ops.py:1299 — class PatternMatcher
Bottom-up rewriting: graph_rewrite applies the pattern matcher bottom-up, repeatedly until no more rules match. All codegen, scheduling, and optimization passes are built from these rules.
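The "rewrite sources first, then retry rules until none fires" loop can be illustrated without tinygrad. The sketch below is a toy model under stated assumptions: Node, fold_add, add_zero and rewrite are hypothetical names, rules are plain functions that return a replacement or None, and there is no caching as a real implementation would likely have.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Node:
    op: str
    src: tuple = ()
    arg: Any = None


def fold_add(n):
    # ADD(CONST a, CONST b) -> CONST(a + b)
    if n.op == "ADD" and len(n.src) == 2 and all(s.op == "CONST" for s in n.src):
        return Node("CONST", arg=n.src[0].arg + n.src[1].arg)
    return None


def add_zero(n):
    # ADD(x, CONST 0) -> x
    if n.op == "ADD" and len(n.src) == 2 and n.src[1] == Node("CONST", arg=0):
        return n.src[0]
    return None


def rewrite(n, rules):
    # 1. rewrite the sources first (bottom-up)
    new_src = tuple(rewrite(s, rules) for s in n.src)
    if new_src != n.src:
        n = Node(n.op, new_src, n.arg)
    # 2. try each rule on this node; on a hit, rewrite the result again
    for rule in rules:
        out = rule(n)
        if out is not None:
            return rewrite(out, rules)
    return n                   # fixed point: no rule matched


rules = [fold_add, add_zero]
c = lambda v: Node("CONST", arg=v)
x = Node("DEFINE_VAR", arg="x")

folded = rewrite(Node("ADD", (Node("ADD", (c(1), c(2))), c(4))), rules)
mixed = rewrite(Node("ADD", (Node("ADD", (c(1), c(2))), Node("ADD", (x, c(0))))), rules)
```

Here folded collapses to a single CONST 7, while mixed becomes ADD(CONST 3, x): the inner nodes are simplified before the outer ADD is examined, so each rule only ever needs to look one level deep.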
Kernel-related UOp containers uop/ops.py:1037–1097

Three frozen dataclasses wrap kernel-level information and are stored in UOp arg fields.

KernelInfo: local sizes, dont_use_locals, tensor_core config. Stored in PROGRAM arg.
ProgramInfo: device, name, globals/locals shape, op counts. Attached to PROGRAM after compilation.
CallInfo: argument buffer list for a CALL node, i.e. which Buffers the kernel reads/writes and their roles.
uop/ops.py:1037 — KernelInfo uop/ops.py:1050 — ProgramInfo uop/ops.py:1097 — CallInfo
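One plausible benefit of making these containers frozen is that they become immutable and hashable, so they can sit in an arg field and take part in a deduplication key. The sketch below shows that property with a hypothetical ToyKernelInfo; its field names are illustrative and are not the real KernelInfo fields.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class ToyKernelInfo:
    name: str = "kernel"
    local_dims: int = 0
    dont_use_locals: bool = False


info = ToyKernelInfo(name="mul_reduce", local_dims=2)
same = ToyKernelInfo(name="mul_reduce", local_dims=2)

# frozen + default eq=True gives value-based equality and hashing
assert info == same and hash(info) == hash(same)

# in-place mutation is rejected
try:
    info.local_dims = 3
    mutated = True
except FrozenInstanceError:
    mutated = False

# changes go through a copy instead
no_locals = replace(info, dont_use_locals=True)

# a value-equal container finds the same cache entry
cache = {("PROGRAM", (), info): "cached node"}
hit = cache[("PROGRAM", (), same)]
```

Deriving a modified copy with replace leaves the original untouched, which matters when the same container may be referenced from a cached node.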
UOp lifecycle through the pipeline stages
# Stage 1 — Tensor graph (high-level, movement ops present)
BUFFER ─ RESHAPE ─ PERMUTE ─ EXPAND ─┐
                                      ├─ MUL ─ REDUCE ─ SINK
BUFFER ─────────────────────────────┘

# Stage 2 — After scheduling (movement ops eliminated, CALL tree)
LINEAR(
  CALL(PROGRAM(kernel_ast), buf_a, buf_b, buf_out)
  CALL(COPY, buf_src, buf_dst)
  ...
)

# Stage 3 — After codegen (lowered to indexed loops)
SINK(
  RANGE(0..N) ─┐
               ├─ STORE(INDEX(buf_out, i), LOAD(buf_a,i) * LOAD(buf_b,i))
               └─ END
)

# Stage 4 — After render (device source string)
SOURCE("kernel void k(float* a, float* b, float* c) { ... }")

# Stage 5 — After compile
BINARY(bytes_of_ptx_or_metallib)
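As a reading aid for the Stage 3 graph, the following plain-Python function mirrors what that loop computes, with each line annotated by the op it stands for. It is an interpretation of the diagram, not generated output; run_stage3 and the list-based buffers are hypothetical.

```python
def run_stage3(buf_a, buf_b):
    """Elementwise product written the way the Stage 3 loop describes it."""
    n = len(buf_a)
    buf_out = [0.0] * n
    for i in range(n):        # RANGE(0..N)
        a = buf_a[i]          # LOAD(buf_a, i)
        b = buf_b[i]          # LOAD(buf_b, i)
        buf_out[i] = a * b    # STORE(INDEX(buf_out, i), a * b)
    return buf_out            # END closes the loop; SINK collects the stores


result = run_stage3([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```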