
Tensor & Operations

Stage 1 of 6 · tinygrad/tensor.py · tinygrad/mixin/
Tensor class · tensor.py:80

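The Tensor class is the user-facing entry point. Every operation builds a lazy computation graph (a UOp DAG). Nothing executes until the tensor is realized, either explicitly with .realize() or implicitly by methods that need the data, such as .numpy() or .tolist().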
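To make "lazy" concrete, here is a toy model in plain Python. `Lazy` is a hypothetical class, not tinygrad's Tensor: arithmetic only records nodes, and values are computed for the first time inside realize().

```python
# Toy model of lazy evaluation (hypothetical; not tinygrad's classes).
# Arithmetic only records graph nodes; realize() is the first point
# where any value is computed.

class Lazy:
    evaluations = 0  # counts how many nodes have actually been computed

    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value

    def __add__(self, other):
        return Lazy("ADD", (self, other))

    def __mul__(self, other):
        return Lazy("MUL", (self, other))

    def realize(self):
        # Walk the graph bottom-up, computing each node at most once.
        if self.value is None:
            vals = [s.realize() for s in self.srcs]
            Lazy.evaluations += 1
            if self.op == "ADD":
                self.value = [x + y for x, y in zip(*vals)]
            elif self.op == "MUL":
                self.value = [x * y for x, y in zip(*vals)]
        return self.value

a = Lazy("BUFFER", value=[1, 2, 3])
b = Lazy("BUFFER", value=[4, 5, 6])
c = (a + b) * b
print(Lazy.evaluations)  # 0: only the graph exists
print(c.realize())       # [20, 35, 54]
print(Lazy.evaluations)  # 2: ADD and MUL ran inside realize()
```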

Tensor inherits from OpMixin, which pulls in ElementwiseMixin, MovementMixin, and ReduceMixin.

tensor.py:80 — class Tensor(OpMixin)

Key attributes

uop (UOp): the computation graph node for this tensor; the root of its DAG
device (str): target device string, e.g. "CUDA", "CPU", "METAL"
dtype (DType): numeric type (dtypes.float32, dtypes.int32, etc.)
shape (tuple[sint, ...]): dimension sizes; may contain symbolic ints
grad (Tensor | None): gradient tensor after the backward pass

Constructor inputs

scalar / list: a Python number or nested list, converted to a UOp via CONST/BUFFER
numpy array (ndarray): allocates a Buffer and copies the data in via copyin()
UOp: wraps an existing UOp node directly (used internally)
None: allocates an uninitialized BUFFER on the given device
tensor.py:94 — Tensor.__init__
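A minimal sketch of how such a constructor might dispatch on its input type. Everything here is hypothetical: `Node` stands in for UOp, `describe_init` is an illustrative name, and sending scalars to CONST and lists to BUFFER is an assumption about how the CONST/BUFFER split is divided. The numpy branch is left out so the sketch needs only the standard library.

```python
# Hypothetical constructor dispatch; names and return tuples are illustrative.

class Node:  # stand-in for an existing UOp
    def __init__(self, op):
        self.op = op

def _flatten(x):
    # Flatten a nested list into a flat list of scalars.
    return [y for item in x for y in _flatten(item)] if isinstance(x, list) else [x]

def describe_init(data, device="CPU"):
    if isinstance(data, Node):
        return ("wrap", data.op)                    # reuse the node as-is
    if data is None:
        return ("BUFFER", device, "uninitialized")  # storage only, no data
    if isinstance(data, (int, float, bool)):
        return ("CONST", device, data)              # assumed: scalars become constants
    if isinstance(data, list):
        return ("BUFFER", device, _flatten(data))   # assumed: lists are copied into a buffer
    raise TypeError(f"unsupported input: {type(data).__name__}")

print(describe_init(3.0))               # ('CONST', 'CPU', 3.0)
print(describe_init([[1, 2], [3, 4]]))  # ('BUFFER', 'CPU', [1, 2, 3, 4])
print(describe_init(None))              # ('BUFFER', 'CPU', 'uninitialized')
```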
Operation mixins · mixin/

Operations are split into three mixin classes. Each method eventually calls _apply_uop() to build a new UOp node.

ElementwiseMixin
Pointwise ops: add, sub, mul, div, neg, exp2, log2, sin, sqrt, cast, where, …
UOps: ADD, MUL, EXP2, LOG2, WHERE, CAST
mixin/elementwise.py:10
MovementMixin
Shape/layout ops: reshape, permute, expand, pad, shrink, flip, transpose, slice, …
UOps: RESHAPE, PERMUTE, EXPAND, PAD, SHRINK, FLIP
mixin/movement.py
ReduceMixin
Reduction ops: sum, max, min, mean, prod, std, argmax, argmin, …
UOps: REDUCE (ADD, MAX)
mixin/reduce.py
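The mixin pattern can be sketched in a few lines. This is a hypothetical toy, not tinygrad's code: only the mixin class names and `_apply_uop` come from the text, while `ToyTensor`, the method signatures, and the string op names are assumptions. Each mixin only describes an op; the concrete class supplies `_apply_uop`, the single place where new (unevaluated) nodes are created.

```python
# Hypothetical sketch of the mixin composition; signatures are illustrative.

class ElementwiseMixin:
    def add(self, other):
        return self._apply_uop("ADD", self, other)

    def exp2(self):
        return self._apply_uop("EXP2", self)

class MovementMixin:
    def reshape(self, *shape):
        return self._apply_uop("RESHAPE", self, arg=shape)

class ReduceMixin:
    def sum(self, axis):
        return self._apply_uop("REDUCE", self, arg=("ADD", axis))

class OpMixin(ElementwiseMixin, MovementMixin, ReduceMixin):
    pass

class ToyTensor(OpMixin):
    def __init__(self, op, srcs=(), arg=None):
        self.op, self.srcs, self.arg = op, srcs, arg

    def _apply_uop(self, op, *srcs, arg=None):
        # Every op funnels through here and gets back a new graph node.
        return ToyTensor(op, srcs, arg)

x = ToyTensor("BUFFER")
y = x.reshape(2, 3).exp2().sum(axis=1)
print(y.op, y.arg)      # REDUCE ('ADD', 1)
print(y.srcs[0].op)     # EXP2
```

Keeping node creation in one method means the mixins stay independent of how nodes are stored, which is the point of splitting the op surface this way.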
_apply_uop: lazy graph construction · tensor.py:147

Every op calls _apply_uop, which creates the new UOp and returns a new Tensor object wrapping that DAG node. No computation happens yet.

from tinygrad import Tensor

a = Tensor([1, 2, 3])   # BUFFER uop holding int32 data
b = Tensor([4, 5, 6])   # BUFFER uop
c = a + b               # creates an ADD uop: src=(a.uop, b.uop)
# roughly: c.uop = UOp(Ops.ADD, dtypes.int32, src=(a.uop, b.uop))
# nothing has executed yet; c is just a graph node
tensor.py:147 — _apply_uop
The DAG is immutable and deduplicated: if two operations produce identical UOps (same op, dtype, src, and arg), UOp's metaclass returns the existing cached node instead of creating a new one.
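The deduplication described here is a form of hash-consing. Below is a minimal sketch under stated assumptions: `NodeMeta`, `Node`, and the dict cache are hypothetical stand-ins, not tinygrad's UOp implementation. Because every `src` entry is itself a cached node, identity hashing of sources is enough to recognize structurally identical graphs.

```python
# Hypothetical hash-consing via a metaclass; not tinygrad's actual UOp code.

class NodeMeta(type):
    def __call__(cls, op, dtype, src=(), arg=None):
        key = (op, dtype, src, arg)  # src nodes hash by identity
        node = cls._cache.get(key)
        if node is None:
            node = super().__call__(op, dtype, src, arg)
            cls._cache[key] = node
        return node

class Node(metaclass=NodeMeta):
    _cache = {}

    def __init__(self, op, dtype, src=(), arg=None):
        # object.__setattr__ bypasses the freeze below during construction
        for name, val in (("op", op), ("dtype", dtype), ("src", src), ("arg", arg)):
            object.__setattr__(self, name, val)

    def __setattr__(self, name, value):
        # Shared nodes must never change, or every user of the cache would see it.
        raise AttributeError("nodes are immutable")

a = Node("BUFFER", "int32", arg=0)
b = Node("BUFFER", "int32", arg=1)
c1 = Node("ADD", "int32", (a, b))
c2 = Node("ADD", "int32", (a, b))
print(c1 is c2)                            # True: same op, dtype, src, arg
print(c1 is Node("ADD", "int32", (b, a)))  # False: src order differs
```

Immutability is what makes the sharing safe: a cached node can be handed to any number of consumers because none of them can modify it.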
realize() & linear_with_vars() · tensor.py:228–241

These are the two exits from the lazy graph. linear_with_vars() runs scheduling and returns the execution plan; realize() builds that plan and then executes it.

1. realize(*lst)
Materializes this tensor (and optionally the others passed in lst). Calls linear_with_vars(), then run_linear(). Returns self, so calls can be chained.
tensor.py:241 — realize()
2. linear_with_vars(*lst)
Builds a SINK UOp over all tensor outputs, then calls create_linear_with_vars() in the scheduler. Returns (linear_uop, var_vals): the compiled execution plan, without running it.
tensor.py:228 — linear_with_vars()
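The split between "build a plan" and "run the plan" can be modeled with a toy. All names here (`N`, `linearize`, `run`, `realize`) are hypothetical, and the post-order topological sort is an assumed simplification: tinygrad's real scheduler does far more, such as grouping ops into kernels. The sketch only shows the shape of the two exits: join outputs under one SINK, flatten the DAG so every source precedes its consumer, then execute the list.

```python
# Toy model of the two exits (hypothetical; not tinygrad's scheduler).

class N:
    def __init__(self, op, *src, value=None):
        self.op, self.src, self.value = op, src, value

def linearize(*outputs):
    # Join all outputs under one SINK and flatten the DAG (post-order DFS).
    sink = N("SINK", *outputs)
    order, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for s in n.src:
            visit(s)
        order.append(n)  # every source is appended before its consumer

    visit(sink)
    return order

def run(order):
    # Execute the plan in order; BUFFER and SINK need no work here.
    for n in order:
        if n.op == "ADD":
            n.value = n.src[0].value + n.src[1].value
        elif n.op == "MUL":
            n.value = n.src[0].value * n.src[1].value

def realize(*outputs):
    run(linearize(*outputs))
    return outputs[0]  # like realize() returning self, so it chains

a, b = N("BUFFER", value=2), N("BUFFER", value=5)
s = N("ADD", a, b)
p = N("MUL", s, b)
plan = linearize(s, p)
print([n.op for n in plan])          # ['BUFFER', 'BUFFER', 'ADD', 'MUL', 'SINK']
print(realize(s, p).value, p.value)  # 7 35
```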
Example: a.matmul(b) walkthrough
from tinygrad import Tensor

a = Tensor.randn(4, 8)    # shape (4,8); randn itself builds a small lazy graph
b = Tensor.randn(8, 16)   # shape (8,16)
c = a.matmul(b)           # shape (4,16), still lazy

# Internally, matmul does roughly:
#   a2 = a.reshape(4, 1, 8)                      → RESHAPE uop
#   b2 = b.reshape(1, 8, 16).transpose(-1, -2)   → RESHAPE + PERMUTE uops, shape (1,16,8)
#   mul = a2 * b2                                → EXPAND both to (4,16,8), then MUL uop
#   c = mul.sum(axis=-1)                         → REDUCE(ADD) uop over the last axis, shape (4,16)
#
# No FLOPs yet, just a DAG:
#
#   a.uop ─ RESHAPE ─ EXPAND ──────────────┐
#                                          ├─ MUL ─ REDUCE(ADD) ── c.uop
#   b.uop ─ RESHAPE ─ PERMUTE ─ EXPAND ────┘

c.realize()   # ← triggers scheduling → codegen → execution
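The key idea in this walkthrough is that matmul needs no dedicated op: a broadcast multiply followed by a sum reduction is enough. The pure-Python sketch below checks that on small integer matrices without tinygrad. It uses the variant where b is transposed so the shared axis k comes last; `matmul_by_broadcast` and `matmul_naive` are hypothetical helper names.

```python
import random

def matmul_by_broadcast(a, b):
    m, k, n = len(a), len(b), len(b[0])
    # PERMUTE: b (k,n) -> bt (n,k), so the shared axis k is last
    bt = [[b[j][i] for j in range(k)] for i in range(n)]
    # Broadcast multiply: mul[i][j][t] = a[i][t] * bt[j][t], shape (m,n,k)
    mul = [[[a[i][t] * bt[j][t] for t in range(k)] for j in range(n)] for i in range(m)]
    # REDUCE(ADD) over the last axis -> shape (m,n)
    return [[sum(row) for row in plane] for plane in mul]

def matmul_naive(a, b):
    # Textbook triple loop, used as the reference result.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

random.seed(0)
a = [[random.randint(-3, 3) for _ in range(8)] for _ in range(4)]
b = [[random.randint(-3, 3) for _ in range(16)] for _ in range(8)]
c = matmul_by_broadcast(a, b)
print(len(c), len(c[0]))        # 4 16
print(c == matmul_naive(a, b))  # True
```

Integer inputs keep the comparison exact; with floats, the two summation orders could differ in the last bits.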