PromQL Query Engine
How a PromQL expression is parsed, planned, and evaluated against the TSDB to produce a result vector or matrix.
▶ Query Execution Flow
HTTP GET /api/v1/query?query=rate(http_requests_total[5m])&time=...
│
▼
┌─────────────────────────────────────────────────────────────┐
│ web/api/v1/api.go │
│ queryHandler → engine.NewInstantQuery(queryable, expr, t) │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ promql.Engine │ promql/engine.go:348
│ │
│ 1. parser.ParseExpr(exprStr) → AST │
│ 2. Validate AST (functions, types) │
│ 3. activeQueryTracker.Insert() (concurrency limit) │
│ 4. newEvalNodeHelper per-step │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ evaluator.eval(AST) │
│ │
│ Walk AST nodes recursively: │
│ • VectorSelector → select from storage │
│ • MatrixSelector → select range from storage │
│ • BinaryExpr → join two vectors (label matching) │
│ • AggregateExpr → sum/avg/topk/… over label sets │
│ • Call (function) → rate/increase/histogram_quantile/… │
└──────────────────────────┬──────────────────────────────────┘
│ (for each VectorSelector)
▼
┌─────────────────────────────────────────────────────────────┐
│ storage.Querier.Select() │
│ │
│ FanoutQuerier fans reads across: │
│ • headQuerier (in-memory Head block) │
│ • blockQueriers[] (one per on-disk Block in time range) │
└──────────────────────────┬──────────────────────────────────┘
│ SeriesSet
▼
┌────────────────────┐ ┌────────────────────────────────────┐
│ headQuerier │ │ blockQuerier │
│ tsdb/head_read.go│ │ tsdb/querier.go │
│ │ │ │
│ • stripeSeries map │ │ • index.Reader (postings search) │
│ • memChunks │ │ • chunks.Reader (decompress) │
└────────────────────┘ └────────────────────────────────────┘
│ []chunks.Meta → Iterator
▼
XOR-decode samples at each step timestamp
│
▼
*Result{Value, Err}
(Scalar | Vector | Matrix | String)
▶ Engine — Configuration & Limits
type Engine struct {
logger *slog.Logger
timeout time.Duration // max query duration
maxSamplesPerQuery int // OOM guard
lookbackDelta time.Duration // default 5m
activeQueryTracker QueryTracker // cap on concurrent queries
queryLogger QueryLogger // slow query log
noStepSubqueryIntervalFn func(rangeMillis int64) int64
enableAtModifier bool // @timestamp modifier
enableNegativeOffset bool // offset -5m
enablePerStepStats bool // per-step sample counts
enableDelayedNameRemoval bool // __name__ removal timing
enableTypeAndUnitLabels bool // unit/type label propagation
parser parser.Parser
}
Key Engine Options
| Option | Default | Guards against |
|---|---|---|
| timeout | 2m | Long-running query denial-of-service |
| maxSamplesPerQuery | 50 000 000 | OOM from huge result sets |
| lookbackDelta | 5m | Returning old values for series that have stopped reporting |
| maxConcurrentQueries | 20 | CPU saturation from concurrent queries |
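The concurrency cap can be pictured as a counting semaphore that each query must acquire before evaluation starts. The sketch below is a minimal illustration under that assumption; `queryGate`, `newQueryGate`, and `admit` are hypothetical names, and the real tracker does more (for example recording active queries) than is modeled here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// queryGate is a hypothetical stand-in for the engine's concurrency limit:
// a buffered channel used as a counting semaphore.
type queryGate struct {
	slots chan struct{}
}

func newQueryGate(maxConcurrent int) *queryGate {
	return &queryGate{slots: make(chan struct{}, maxConcurrent)}
}

// Insert blocks until a slot is free or ctx is done (for example on timeout).
func (g *queryGate) Insert(ctx context.Context) error {
	select {
	case g.slots <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Delete releases a slot previously taken by Insert.
func (g *queryGate) Delete() { <-g.slots }

// admit starts `queries` queries that never finish and reports how many got
// a slot and how many gave up after waiting waitMillis milliseconds.
func admit(maxConcurrent, queries, waitMillis int) (admitted, timedOut int) {
	gate := newQueryGate(maxConcurrent)
	for i := 0; i < queries; i++ {
		timeout := time.Duration(waitMillis) * time.Millisecond
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		if err := gate.Insert(ctx); err != nil {
			timedOut++
		} else {
			admitted++
		}
		cancel()
	}
	return admitted, timedOut
}

func main() {
	admitted, timedOut := admit(2, 5, 20)
	fmt.Println("admitted:", admitted, "timed out:", timedOut)
}
```

Tying the wait to the query's context means a queued query also counts its waiting time against the timeout, which is one plausible way the two limits in the table can interact.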
▶ Parsing — PromQL AST
The PromQL lexer and parser live in promql/parser. The parser produces an Expr interface value representing the AST.
// Example: rate(http_requests_total{job="api"}[5m])
//
// AST:
// Call{
// Func: "rate",
// Args: [
// MatrixSelector{
// VectorSelector{
// Name: "http_requests_total",
// LabelMatchers: [{job="api"}],
// },
// Range: 5m,
// }
// ]
// }
type VectorSelector struct {
Name string
LabelMatchers []*labels.Matcher
Offset time.Duration
Timestamp *int64 // @modifier
StartOrEnd ItemType
Series []storage.Series // populated during eval
...
}
type BinaryExpr struct {
Op ItemType // +, -, *, /, ==, and, or, …
LHS, RHS Expr
VectorMatching *VectorMatching
ReturnBool bool
}
type AggregateExpr struct {
Op ItemType // sum, avg, topk, count, …
Expr Expr
Grouping []string // by/without label list
Without bool
Param Expr // for topk(N, …) etc.
}
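To make the tree shape concrete, the following sketch walks a hand-built AST and collects the metric names that would need to be fetched from storage. The node types are simplified, hypothetical mirrors of the structs above (the real ones live in promql/parser and carry more fields), and `selectorNames` is an illustrative helper, not a parser API.

```go
package main

import "fmt"

// Expr is a simplified stand-in for the parser's expression interface.
type Expr interface{}

// Hypothetical, trimmed-down node types mirroring the shapes above.
type VectorSelector struct{ Name string }

type MatrixSelector struct {
	VectorSelector *VectorSelector
	RangeSeconds   int64
}

type Call struct {
	Func string
	Args []Expr
}

type BinaryExpr struct {
	Op       string
	LHS, RHS Expr
}

// selectorNames walks the tree depth-first and returns the metric name of
// every selector it finds, in left-to-right order.
func selectorNames(e Expr) []string {
	switch n := e.(type) {
	case *VectorSelector:
		return []string{n.Name}
	case *MatrixSelector:
		return selectorNames(n.VectorSelector)
	case *Call:
		var out []string
		for _, arg := range n.Args {
			out = append(out, selectorNames(arg)...)
		}
		return out
	case *BinaryExpr:
		return append(selectorNames(n.LHS), selectorNames(n.RHS)...)
	}
	return nil // literals and unknown nodes reference no series
}

func main() {
	// rate(errors_total[5m]) / rate(requests_total[5m])
	rateOf := func(metric string) Expr {
		return &Call{Func: "rate", Args: []Expr{
			&MatrixSelector{VectorSelector: &VectorSelector{Name: metric}, RangeSeconds: 300},
		}}
	}
	ast := &BinaryExpr{Op: "/", LHS: rateOf("errors_total"), RHS: rateOf("requests_total")}
	fmt.Println(selectorNames(ast))
}
```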
▶ Storage Select — Index Lookup
A VectorSelector is resolved by calling querier.Select(). The fanout querier merges results from all blocks that overlap the query time range.
// Label matcher lookup in the inverted index:
// 1. For each Matcher, call indexReader.Postings(name, value)
//    → returns a sorted list of series references (posting list)
// 2. AND matchers → Intersect(postings...)
// 3. OR matchers → Merge(postings...)
// 4. NOT matchers → subtract via AllPostings() \ match
// 5. Iterate refs → load labels from index
// 6. Return as SeriesSet (lazy; chunks loaded on iteration)
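Because posting lists are sorted, the set operations in steps 2 to 4 reduce to linear merges. The sketch below shows the idea on plain `[]uint64` slices; this is an assumption made for brevity, since the index package works on iterators rather than slices, and the function names `intersect`, `merge`, and `without` are illustrative.

```go
package main

import "fmt"

// intersect returns refs present in both sorted lists (AND of two matchers).
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// merge returns the sorted union of both lists without duplicates (OR).
func merge(a, b []uint64) []uint64 {
	out := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// without returns refs in a that do not appear in b (NOT).
func without(a, b []uint64) []uint64 {
	var out []uint64
	j := 0
	for _, ref := range a {
		for j < len(b) && b[j] < ref {
			j++
		}
		if j < len(b) && b[j] == ref {
			continue
		}
		out = append(out, ref)
	}
	return out
}

func main() {
	jobAPI := []uint64{1, 3, 5, 7}  // postings for job="api"
	code500 := []uint64{3, 4, 7, 9} // postings for code="500"
	methodGET := []uint64{1, 3}     // postings for method="GET"

	fmt.Println(intersect(jobAPI, code500)) // {job="api", code="500"}
	fmt.Println(merge(jobAPI, code500))     // either matcher
	fmt.Println(without(jobAPI, methodGET)) // {job="api", method!="GET"}
}
```

Each operation is a single pass over its inputs, so matching cost grows with the length of the posting lists rather than with the total number of series in the block.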
Index Structure
tsdb/index/index.go — Reader / Writer
Block index file layout:
┌─────────────────────────────────────────────┐
│ Symbol table (label names + values)         │
│ Series table (ref → labels + chunk metas)   │
│ Postings (label=value → []seriesRef)        │
│ Postings offset table                       │
│ TOC (offsets of above sections)             │
└─────────────────────────────────────────────┘
Implementation: tsdb/index/index.go
▶ Chunk Reading & Decoding
Once series references are resolved, the evaluator iterates their chunks for the query time range.
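// ChunkedSeriesIterator iterates chunks for a series (simplified):
for _, meta := range series.chunks {
	if meta.MaxTime < mint || meta.MinTime > maxt {
		continue // skip chunks outside the query time range
	}
	chk, err := chunkReader.Chunk(meta)
	if err != nil {
		return err // chunk missing or unreadable
	}
	it := chk.Iterator(reuse) // XOR decoder for float chunks
	for it.Next() == chunkenc.ValFloat {
		ts, val := it.At() // decode next (t, v) pair
		// feed (ts, val) into the evaluator matrix
	}
	if err := it.Err(); err != nil {
		return err // decoding failed partway through the chunk
	}
}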
Iterator Value Types
| ValType | Method | Data |
|---|---|---|
| ValFloat | At() | (int64 ts, float64 v) |
| ValHistogram | AtHistogram() | (int64 ts, *histogram.Histogram) |
| ValFloatHistogram | AtFloatHistogram() | (int64 ts, *histogram.FloatHistogram) |
▶ Expression Evaluation
The evaluator walks the AST bottom-up. At each step timestamp it evaluates each node:
func (ev *evaluator) eval(ctx context.Context, expr parser.Expr) (parser.Value, annotations.Annotations) {
switch e := expr.(type) {
case *parser.AggregateExpr:
// Evaluate inner expression, then group + aggregate.
return ev.evalAggregation(ctx, e)
case *parser.Call:
// Evaluate arguments, then call the registered function.
// e.g. "rate" → funcRate (looked up in FunctionCalls)
return ev.evalCall(ctx, e)
case *parser.BinaryExpr:
// Evaluate LHS + RHS, then vector matching + binary op.
return ev.evalBinaryExpr(ctx, e)
case *parser.VectorSelector:
// Already pre-populated with series; sample lookup by timestamp.
return ev.evalVectorSelector(ctx, e, ...)
case *parser.MatrixSelector:
// Return a matrix (series → []sample window).
return ev.evalMatrixSelector(ctx, e, ...)
}
}
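The "group + aggregate" step for an AggregateExpr can be sketched as bucketing samples by the values of the grouping labels and folding each bucket. The example below imitates `sum by (job) (...)` on a toy vector. The `sample` type, `groupKey`, and `sumBy` are hypothetical; building a string key is an assumption made for readability and would collide if label values contained `,` or `=`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a toy instant-vector element: a label set and a float value.
type sample struct {
	Labels map[string]string
	F      float64
}

// groupKey builds a deterministic key from the grouping labels only,
// in the order given, so samples that agree on them share a bucket.
func groupKey(lbls map[string]string, by []string) string {
	parts := make([]string, 0, len(by))
	for _, name := range by {
		parts = append(parts, name+"="+lbls[name])
	}
	return strings.Join(parts, ",")
}

// sumBy imitates `sum by (<by...>) (v)`: one output value per group.
func sumBy(v []sample, by []string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range v {
		out[groupKey(s.Labels, by)] += s.F
	}
	return out
}

func main() {
	v := []sample{
		{Labels: map[string]string{"job": "api", "instance": "a"}, F: 3},
		{Labels: map[string]string{"job": "api", "instance": "b"}, F: 4},
		{Labels: map[string]string{"job": "db", "instance": "c"}, F: 5},
	}
	sums := sumBy(v, []string{"job"})

	// Sort keys so the output order is stable.
	keys := make([]string, 0, len(sums))
	for k := range sums {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("{%s} %g\n", k, sums[k])
	}
}
```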
rate() / increase() Function
// rate(v[d]) = extrapolated per-second increase over window d.
// Implemented in: promql/functions.go — funcRate()
//
// Algorithm:
// 1. Take samples in [t-d, t] from the range vector.
// 2. Compute (last - first) considering counter resets.
// 3. Extrapolate to exact window boundaries.
// 4. Divide by window duration in seconds.
Function registry: promql/functions.go
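Steps 2 and 4 of the algorithm can be illustrated in isolation. The sketch below computes a counter-reset-corrected increase and divides it by the window length; it deliberately omits the extrapolation of step 3, so its output will generally differ from what `rate()` returns. The `sample` type and both function names are hypothetical.

```go
package main

import "fmt"

// sample is a toy (timestamp, value) pair; T is in milliseconds.
type sample struct {
	T int64
	F float64
}

// resetAwareIncrease returns last - first, corrected for counter resets.
// Any drop in value is treated as a restart from zero, so the value seen
// just before the drop is added back.
func resetAwareIncrease(samples []sample) float64 {
	if len(samples) < 2 {
		return 0 // simplification: the real function yields no result here
	}
	increase := samples[len(samples)-1].F - samples[0].F
	prev := samples[0].F
	for _, s := range samples[1:] {
		if s.F < prev {
			increase += prev
		}
		prev = s.F
	}
	return increase
}

// simpleRate divides the corrected increase by the window length.
// Assumption: no extrapolation to the window boundaries is applied.
func simpleRate(samples []sample, windowSeconds float64) float64 {
	return resetAwareIncrease(samples) / windowSeconds
}

func main() {
	// The counter climbs 100 → 170, restarts, then climbs 20 → 80.
	window := []sample{
		{T: 0, F: 100},
		{T: 60_000, F: 170},
		{T: 120_000, F: 20},
		{T: 180_000, F: 80},
	}
	fmt.Println(resetAwareIncrease(window)) // 70 + 20 + 60
	fmt.Println(simpleRate(window, 300))
}
```

Adding back the pre-reset value is what keeps a process restart from showing up as a large negative rate.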
▶ Result Types
| Type | PromQL produces | API resultType |
|---|---|---|
| Vector | instant query on metric selector or aggregation | "vector" |
| Matrix | range query, or an instant query whose top-level expression is a range selector (e.g. foo[5m]) | "matrix" |
| Scalar | numeric literal or scalar() function | "scalar" |
| String | string literal, e.g. "some text" | "string" |
// Vector — instant query result
type Vector []Sample // one Sample per matching series
type Sample struct {
Metric labels.Labels
T int64
F float64
H *histogram.FloatHistogram
}
// Matrix — range query result
type Matrix []Series
type Series struct {
Metric labels.Labels
Floats []FPoint // (t, float64) pairs
Histograms []HPoint // (t, *FloatHistogram) pairs
}
▶ Staleness & lookbackDelta
For an instant query at time t, a series must have a sample in (t - lookbackDelta, t] (default 5 min) to appear in the result. This prevents showing stale gauge values from targets that went away.
lookbackDelta can be overridden globally (--query.lookback-delta) or per-query via the lookback_delta query parameter. Range selectors use their explicit window instead.
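A staleness marker, written when a series stops being scraped, removes the series from results immediately instead of waiting out lookbackDelta; it short-circuits the window check.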
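The window rule for instant selectors can be sketched as "take the newest sample at or before t, and accept it only if it is newer than t - lookbackDelta". The example below assumes samples sorted by timestamp in milliseconds; `sample` and `latestWithinLookback` are hypothetical names, and staleness markers are not modeled.

```go
package main

import "fmt"

// sample is a toy (timestamp, value) pair; T is in milliseconds.
type sample struct {
	T int64
	F float64
}

// latestWithinLookback returns the newest sample whose timestamp lies in
// (t-lookback, t]. samples must be sorted by T in ascending order.
func latestWithinLookback(samples []sample, t, lookback int64) (sample, bool) {
	for i := len(samples) - 1; i >= 0; i-- {
		s := samples[i]
		if s.T > t {
			continue // newer than the evaluation time, ignore
		}
		if s.T <= t-lookback {
			break // newest candidate is already too old
		}
		return s, true
	}
	return sample{}, false
}

func main() {
	series := []sample{{T: 0, F: 1}, {T: 60_000, F: 2}, {T: 120_000, F: 3}}
	const lookback = 300_000 // 5m in milliseconds

	// Last sample is 280s old at t=400s, so it is still visible.
	fmt.Println(latestWithinLookback(series, 400_000, lookback))
	// At t=430s the last sample is 310s old and the series drops out.
	fmt.Println(latestWithinLookback(series, 430_000, lookback))
}
```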