🗺 Query Execution Flow
  HTTP GET /api/v1/query?query=rate(http_requests_total[5m])&time=...
        │
        ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                  web/api/v1/api.go                          │
  │  queryHandler → engine.NewInstantQuery(queryable, expr, t)  │
  └──────────────────────────┬──────────────────────────────────┘
                             │
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                   promql.Engine                             │  promql/engine.go:348
  │                                                             │
  │  1. parser.ParseExpr(exprStr)  →  AST                       │
  │  2. Validate AST (functions, types)                         │
  │  3. activeQueryTracker.Insert()  (concurrency limit)        │
  │  4. newEvalNodeHelper per-step                              │
  └──────────────────────────┬──────────────────────────────────┘
                             │
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                    evaluator.eval(AST)                      │
  │                                                             │
  │  Walk AST nodes recursively:                                │
  │  • VectorSelector  → select from storage                   │
  │  • MatrixSelector  → select range from storage             │
  │  • BinaryExpr      → join two vectors (label matching)     │
  │  • AggregateExpr   → sum/avg/topk/… over label sets        │
  │  • Call (function) → rate/increase/histogram_quantile/…    │
  └──────────────────────────┬──────────────────────────────────┘
                             │  (for each VectorSelector)
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                 storage.Querier.Select()                    │
  │                                                             │
  │  FanoutQuerier fans reads across:                           │
  │  • headQuerier  (in-memory Head block)                      │
  │  • blockQueriers[] (one per on-disk Block in time range)    │
  └──────────────────────────┬──────────────────────────────────┘
                             │  SeriesSet
                             ▼
  ┌────────────────────┐    ┌────────────────────────────────────┐
  │   headQuerier      │    │      blockQuerier                  │
  │   tsdb/head_read.go│    │      tsdb/querier.go               │
  │                    │    │                                    │
  │ • stripeSeries map │    │ • index.Reader (postings search)   │
  │ • memChunks        │    │ • chunks.Reader (decompress)       │
  └────────────────────┘    └────────────────────────────────────┘
                             │  []chunks.Meta → Iterator
                             ▼
                    XOR-decode samples at each step timestamp
                             │
                             ▼
                       *Result{Value, Err}
                    (Scalar | Vector | Matrix | String)
Engine — Configuration & Limits
promql/engine.go — Engine struct L348
type Engine struct {
    logger             *slog.Logger
    timeout            time.Duration     // max query duration
    maxSamplesPerQuery int               // OOM guard
    lookbackDelta      time.Duration     // default 5m

    activeQueryTracker QueryTracker      // cap on concurrent queries
    queryLogger        QueryLogger       // query log (records executed queries)

    noStepSubqueryIntervalFn func(rangeMillis int64) int64
    enableAtModifier         bool        // @timestamp modifier
    enableNegativeOffset     bool        // offset -5m
    enablePerStepStats       bool        // per-step sample counts
    enableDelayedNameRemoval bool        // __name__ removal timing
    enableTypeAndUnitLabels  bool        // unit/type label propagation
    parser                   parser.Parser
}

Key Engine Options

Option                Default      Guards against
timeout               2m           Long-running query denial-of-service
maxSamplesPerQuery    50 000 000   OOM from huge result sets
lookbackDelta         5m           Stale series extrapolation window
maxConcurrentQueries  20           CPU saturation from concurrent queries
🌳 Parsing — PromQL AST

The PromQL lexer and parser live in promql/parser. The parser produces an Expr interface value representing the AST.

promql/parser/ast.go — key Expr types
// Example: rate(http_requests_total{job="api"}[5m])
//
// AST:
//   Call{
//     Func: "rate",
//     Args: [
//       MatrixSelector{
//         VectorSelector{
//           Name: "http_requests_total",
//           LabelMatchers: [{job="api"}],
//         },
//         Range: 5m,
//       }
//     ]
//   }

type VectorSelector struct {
    Name          string
    LabelMatchers []*labels.Matcher
    Offset        time.Duration
    Timestamp     *int64           // @ modifier
    StartOrEnd    ItemType         // set when @ is used with start() or end()
    Series        []storage.Series // populated during eval
    // ... remaining fields elided
}

type BinaryExpr struct {
    Op             ItemType        // +, -, *, /, ==, and, or, …
    LHS, RHS       Expr
    VectorMatching *VectorMatching // label-matching rules for vector/vector operations
    ReturnBool     bool            // comparison returns 0/1 instead of filtering
}

type AggregateExpr struct {
    Op       ItemType // sum, avg, topk, count, …
    Expr     Expr
    Grouping []string // by/without label list
    Without  bool     // true for without(...), false for by(...)
    Param    Expr     // for topk(N, …) etc.
}
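
To make the tree shape concrete, here is a minimal sketch that builds the AST for the example query by hand and walks it recursively. The types are simplified stand-ins defined in the snippet itself (equality matchers only), and `selectorNames` and `exampleAST` are hypothetical helpers, not functions of the parser package.

```go
package main

import (
	"fmt"
	"time"
)

// Expr is a stand-in for parser.Expr in this sketch.
type Expr interface{}

type VectorSelector struct {
	Name     string
	Matchers map[string]string // simplified: equality matchers only
}

type MatrixSelector struct {
	VectorSelector *VectorSelector
	Range          time.Duration
}

type Call struct {
	Func string
	Args []Expr
}

// selectorNames walks the tree depth-first and collects metric names.
func selectorNames(e Expr) []string {
	switch n := e.(type) {
	case *VectorSelector:
		return []string{n.Name}
	case *MatrixSelector:
		return selectorNames(n.VectorSelector)
	case *Call:
		var out []string
		for _, arg := range n.Args {
			out = append(out, selectorNames(arg)...)
		}
		return out
	}
	return nil
}

// exampleAST builds rate(http_requests_total{job="api"}[5m]) by hand.
func exampleAST() Expr {
	return &Call{
		Func: "rate",
		Args: []Expr{
			&MatrixSelector{
				VectorSelector: &VectorSelector{
					Name:     "http_requests_total",
					Matchers: map[string]string{"job": "api"},
				},
				Range: 5 * time.Minute,
			},
		},
	}
}

func main() {
	fmt.Println(selectorNames(exampleAST()))
}
```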
🔎 Storage Select — Index Lookup

A VectorSelector is resolved by calling querier.Select(). The fanout querier merges results from all blocks that overlap the query time range.

tsdb/querier.go — blockQuerier.Select()
// Label matcher lookup in the inverted index.
// All matchers of one selector are ANDed together:
//  1. For each Matcher, call indexReader.Postings(name, value)
//     → returns a sorted list of series references (posting list)
//  2. Several matchers → Intersect(postings...)
//  3. Regex matcher hitting several values → Merge(postings...) of the per-value lists
//  4. Negative matchers (!=, !~) → subtract the matching postings,
//     starting from AllPostings() when there is no positive matcher
//  5. Iterate refs → load labels from index
//  6. Return as SeriesSet (lazy; chunks loaded on iteration)
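
Because posting lists are sorted, the set operations in the steps above reduce to linear merges. The following is a sketch on plain `[]uint64` slices; `intersect`, `merge`, and `without` are illustrative helpers written for this example, and the real index package works on iterators rather than slices.

```go
package main

import "fmt"

// intersect returns refs present in both sorted lists (AND of two matchers).
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// merge returns the sorted union of two sorted lists without duplicates.
func merge(a, b []uint64) []uint64 {
	out := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// without returns the refs of a that do not appear in b (both sorted).
func without(a, b []uint64) []uint64 {
	var out []uint64
	j := 0
	for _, ref := range a {
		for j < len(b) && b[j] < ref {
			j++
		}
		if j == len(b) || b[j] != ref {
			out = append(out, ref)
		}
	}
	return out
}

func main() {
	jobAPI := []uint64{1, 3, 5, 7}
	code500 := []uint64{3, 4, 7}
	code503 := []uint64{5, 9}
	envDev := []uint64{7}

	// Selector: {job="api", code=~"500|503", env!="dev"}
	codes := merge(code500, code503)
	matched := without(intersect(jobAPI, codes), envDev)
	fmt.Println(matched) // [3 5]
}
```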

Index Structure

tsdb/index/index.go — Reader / Writer

Block index file layout:
  ┌─────────────────────────────────────────────┐
  │  Symbol table  (label names + values)       │
  │  Series table  (ref → labels + chunk metas) │
  │  Postings      (label=value → []seriesRef)  │
  │  Postings offset table                      │
  │  TOC           (offsets of above sections)  │
  └─────────────────────────────────────────────┘
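
The postings section is conceptually a map from each label pair to the sorted references of the series carrying it. A minimal in-memory sketch of building such a map follows; `buildPostings` is a hypothetical helper, and the on-disk encoding (symbol references, offset table) is deliberately not reproduced.

```go
package main

import (
	"fmt"
	"sort"
)

// buildPostings maps each "name=value" pair to the sorted refs of the
// series that carry that label pair.
func buildPostings(series map[uint64]map[string]string) map[string][]uint64 {
	postings := make(map[string][]uint64)
	for ref, lbls := range series {
		for name, value := range lbls {
			key := name + "=" + value
			postings[key] = append(postings[key], ref)
		}
	}
	// Map iteration order is random, so sort every list afterwards.
	for _, refs := range postings {
		sort.Slice(refs, func(i, j int) bool { return refs[i] < refs[j] })
	}
	return postings
}

func main() {
	series := map[uint64]map[string]string{
		1: {"__name__": "http_requests_total", "job": "api"},
		2: {"__name__": "http_requests_total", "job": "api", "env": "dev"},
		3: {"__name__": "up", "job": "db"},
	}
	p := buildPostings(series)
	fmt.Println(p["job=api"], p["env=dev"])
}
```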

Implementation: tsdb/index/index.go

📦 Chunk Reading & Decoding

Once series references are resolved, the evaluator iterates their chunks for the query time range.

tsdb/chunkenc/xor.go — XorIterator
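// Simplified sketch of iterating one series' chunks within [mint, maxt]:
for _, meta := range series.chunks {
    if meta.MaxTime < mint || meta.MinTime > maxt {
        continue // skip chunks outside time range
    }
    chk, err := chunkReader.Chunk(meta)
    if err != nil {
        return err // chunk missing or unreadable
    }
    it := chk.Iterator(reuse) // XOR decoder for float chunks
    for it.Next() == chunkenc.ValFloat {
        ts, val := it.At() // decode next (t, v) pair
        // feed (ts, val) into evaluator matrix
    }
    if err := it.Err(); err != nil {
        return err // decoding failed part-way through the chunk
    }
}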
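
Why XOR encoding compresses well can be seen from the bit patterns of consecutive values. The sketch below shows only the core idea (XOR against the previous value, then count leading and trailing zero bits); `xorDelta` and `decode` are hypothetical helpers, and the actual bit layout used by the chunk encoder is not reproduced here.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorDelta XORs the bit patterns of two consecutive values and reports how
// many leading and trailing zero bits the result has. Only the bits in
// between need to be stored.
func xorDelta(prev, cur float64) (x uint64, leading, trailing int) {
	x = math.Float64bits(prev) ^ math.Float64bits(cur)
	return x, bits.LeadingZeros64(x), bits.TrailingZeros64(x)
}

// decode reverses the step: XOR the stored delta onto the previous value.
func decode(prev float64, x uint64) float64 {
	return math.Float64frombits(math.Float64bits(prev) ^ x)
}

func main() {
	x, lead, trail := xorDelta(12.0, 24.0)
	fmt.Printf("xor=%#016x leading=%d trailing=%d meaningful=%d\n",
		x, lead, trail, 64-lead-trail)
	fmt.Println(decode(12.0, x))

	// An unchanged value XORs to zero, which needs no payload bits at all.
	same, _, _ := xorDelta(12.0, 12.0)
	fmt.Println(same == 0)
}
```

For slowly changing series most of the 64 bits cancel out, which is why a decoded sample usually costs far less than eight bytes.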

Iterator Value Types

ValType            Method              Data
ValFloat           At()                (int64 ts, float64 v)
ValHistogram       AtHistogram()       (int64 ts, *histogram.Histogram)
ValFloatHistogram  AtFloatHistogram()  (int64 ts, *histogram.FloatHistogram)
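
A consumer that must handle mixed chunks dispatches on the value type returned by each Next() call. The sketch below imitates that pattern with self-contained stand-ins: `sliceIterator`, `point`, and `drain` are hypothetical, and the histogram payload is reduced to a string placeholder.

```go
package main

import "fmt"

type ValueType int

const (
	ValNone ValueType = iota // iterator exhausted
	ValFloat
	ValHistogram
)

// point is one decoded sample; hist is a placeholder for a histogram payload.
type point struct {
	typ  ValueType
	t    int64
	f    float64
	hist string
}

// sliceIterator is an in-memory stand-in for a chunk iterator.
type sliceIterator struct {
	pts []point
	i   int
}

// Next advances the iterator and reports the type of the current sample.
func (it *sliceIterator) Next() ValueType {
	if it.i >= len(it.pts) {
		return ValNone
	}
	it.i++
	return it.pts[it.i-1].typ
}

func (it *sliceIterator) At() (int64, float64) {
	p := it.pts[it.i-1]
	return p.t, p.f
}

func (it *sliceIterator) AtHistogram() (int64, string) {
	p := it.pts[it.i-1]
	return p.t, p.hist
}

// drain reads every sample, calling the accessor that matches its type.
func drain(it *sliceIterator) (floatSum float64, histograms int) {
	for vt := it.Next(); vt != ValNone; vt = it.Next() {
		switch vt {
		case ValFloat:
			_, v := it.At()
			floatSum += v
		case ValHistogram:
			_, _ = it.AtHistogram()
			histograms++
		}
	}
	return floatSum, histograms
}

func main() {
	it := &sliceIterator{pts: []point{
		{typ: ValFloat, t: 1000, f: 1.5},
		{typ: ValHistogram, t: 2000, hist: "h1"},
		{typ: ValFloat, t: 3000, f: 2.5},
	}}
	fmt.Println(drain(it))
}
```

Looping until the "none" value instead of looping while the type is float keeps the reader from stopping silently at the first histogram sample.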
🧮 Expression Evaluation

The evaluator walks the AST bottom-up. At each step timestamp it evaluates each node:

promql/engine.go — evaluator.eval() dispatch
func (ev *evaluator) eval(ctx context.Context, expr parser.Expr) (parser.Value, annotations.Annotations) {
    switch e := expr.(type) {

    case *parser.AggregateExpr:
        // Evaluate inner expression, then group + aggregate.
        return ev.evalAggregation(ctx, e)

    case *parser.Call:
        // Evaluate arguments, then call registered function.
        // e.g. "rate" → FunctionCalls["rate"], implemented by funcRate
        return ev.evalCall(ctx, e)

    case *parser.BinaryExpr:
        // Evaluate LHS + RHS, then vector matching + binary op.
        return ev.evalBinaryExpr(ctx, e)

    case *parser.VectorSelector:
        // Already pre-populated with series; sample lookup by timestamp.
        return ev.evalVectorSelector(ctx, e, ...)

    case *parser.MatrixSelector:
        // Return a matrix (series → []sample window).
        return ev.evalMatrixSelector(ctx, e, ...)
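    }
    // Any node type not handled above is a programming error.
    panic(fmt.Errorf("unhandled expression of type: %T", expr))
}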
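
The same dispatch pattern can be run end to end on a toy expression language. This is a sketch under heavy simplification: values are plain float slices without labels, the only aggregation is a group-less sum, and binary operators accept single-element operands only. All types here are hypothetical stand-ins, not the engine's.

```go
package main

import (
	"errors"
	"fmt"
)

// Expr is a stand-in for parser.Expr (simplified).
type Expr interface{}

type NumberLiteral struct{ Val float64 }

type VectorSelector struct{ Name string }

// AggregateExpr supports only a group-less sum in this sketch.
type AggregateExpr struct{ Expr Expr }

type BinaryExpr struct {
	Op       byte // '+' or '/'
	LHS, RHS Expr
}

// evaluator holds the samples visible at the evaluation timestamp.
type evaluator struct{ samples map[string][]float64 }

// eval returns a vector of values; scalars are one-element vectors.
// Children are evaluated first, then combined by the parent node.
func (ev *evaluator) eval(e Expr) ([]float64, error) {
	switch n := e.(type) {
	case *NumberLiteral:
		return []float64{n.Val}, nil
	case *VectorSelector:
		return ev.samples[n.Name], nil
	case *AggregateExpr:
		in, err := ev.eval(n.Expr)
		if err != nil {
			return nil, err
		}
		sum := 0.0
		for _, v := range in {
			sum += v
		}
		return []float64{sum}, nil
	case *BinaryExpr:
		l, err := ev.eval(n.LHS)
		if err != nil {
			return nil, err
		}
		r, err := ev.eval(n.RHS)
		if err != nil {
			return nil, err
		}
		if len(l) != 1 || len(r) != 1 {
			return nil, errors.New("sketch only combines single-element operands")
		}
		switch n.Op {
		case '+':
			return []float64{l[0] + r[0]}, nil
		case '/':
			return []float64{l[0] / r[0]}, nil
		}
	}
	return nil, fmt.Errorf("unhandled expression of type %T", e)
}

func main() {
	ev := &evaluator{samples: map[string][]float64{
		"errors":   {1, 2},
		"requests": {10, 20},
	}}
	// sum(errors) / sum(requests)
	q := &BinaryExpr{
		Op:  '/',
		LHS: &AggregateExpr{&VectorSelector{"errors"}},
		RHS: &AggregateExpr{&VectorSelector{"requests"}},
	}
	fmt.Println(ev.eval(q))
}
```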

rate() / increase() Function

// rate(v[d]) = extrapolated per-second increase over window d.
// Implemented in: promql/functions.go — funcRate()
//
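// Algorithm:
//  1. Take the samples of the range vector that fall inside the window
//     (t-d, t]; at least two samples are needed.
//  2. Compute (last - first), adding back the pre-reset value at each counter reset.
//  3. Extrapolate toward the window boundaries; the extrapolation is limited
//     when the first or last sample lies far from a boundary.
//  4. Divide by window duration in seconds.
//
// increase() runs steps 1-3 and skips the division in step 4.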
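
The reset handling and the final division can be sketched as follows. This deliberately omits the extrapolation step, so it is not equivalent to funcRate and will generally report lower values; `sample`, `counterIncrease`, and `simpleRate` are hypothetical names for this illustration.

```go
package main

import "fmt"

type sample struct {
	T int64 // timestamp in milliseconds
	V float64
}

// counterIncrease returns last-first, corrected for counter resets: whenever
// a value drops below its predecessor, the counter is assumed to have
// restarted from zero, so the pre-reset value is added back.
func counterIncrease(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	inc := s[len(s)-1].V - s[0].V
	for i := 1; i < len(s); i++ {
		if s[i].V < s[i-1].V {
			inc += s[i-1].V
		}
	}
	return inc
}

// simpleRate divides the increase by the window length in seconds.
// No extrapolation to the window boundaries is performed.
func simpleRate(s []sample, windowSeconds float64) float64 {
	return counterIncrease(s) / windowSeconds
}

func main() {
	window := []sample{
		{T: 0, V: 10},
		{T: 60_000, V: 20},
		{T: 120_000, V: 5}, // counter reset between 20 and 5
		{T: 180_000, V: 15},
	}
	fmt.Println(counterIncrease(window)) // 25
	fmt.Println(simpleRate(window, 250))
}
```

In the example the raw difference is 5, and adding back the pre-reset value 20 yields 25, which matches the sum of the individual increases (10 + 5 + 10).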

Function registry: promql/functions.go

📤 Result Types
Type    PromQL produces                                             API resultType
Vector  instant query on metric selector or aggregation             "vector"
Matrix  range query, or instant query on a bare range selector      "matrix"
Scalar  numeric literal or scalar() function                        "scalar"
String  string literal, e.g. "foo"                                  "string"
promql/value.go — result structs
// Vector — instant query result
type Vector []Sample      // one Sample per matching series

type Sample struct {
    Metric labels.Labels
    T      int64
    F      float64
    H      *histogram.FloatHistogram
}

// Matrix — range query result
type Matrix []Series

type Series struct {
    Metric     labels.Labels
    Floats     []FPoint // (t, float64) pairs
    Histograms []HPoint // (t, *FloatHistogram) pairs
}
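
On the client side, the resultType field tells a consumer how to decode the result payload. A minimal sketch for the "vector" case using only encoding/json follows; `apiResponse`, `vectorEntry`, and `decodeVector` are hypothetical names, and the sample body is hand-written for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse mirrors the envelope of a query response. The result is kept
// raw until resultType has been inspected.
type apiResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string          `json:"resultType"`
		Result     json.RawMessage `json:"result"`
	} `json:"data"`
}

// vectorEntry is one element of a "vector" result.
type vectorEntry struct {
	Metric map[string]string `json:"metric"`
	Value  [2]interface{}    `json:"value"` // [unix seconds, "value as string"]
}

// decodeVector parses a response body and rejects non-vector results.
func decodeVector(body []byte) ([]vectorEntry, error) {
	var resp apiResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected resultType %q", resp.Data.ResultType)
	}
	var out []vectorEntry
	if err := json.Unmarshal(resp.Data.Result, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector",
		"result":[{"metric":{"__name__":"up","job":"api"},"value":[1700000000,"1"]}]}}`)
	entries, err := decodeVector(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e.Metric["job"], e.Value[0], e.Value[1])
	}
}
```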
Staleness & lookbackDelta

For an instant query at time t, a series must have a sample in (t - lookbackDelta, t] (default 5 min) to appear in the result. This prevents showing stale gauge values from targets that went away.

lookbackDelta can be overridden globally (--query.lookback-delta) or per-query via the lookback_delta query parameter. Range selectors use their explicit window instead.
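A stale marker (a special NaN that the scrape loop writes when a series or its target disappears) makes the series invisible immediately: if the most recent sample at or before t is a stale marker, the series is dropped without waiting for lookbackDelta to expire.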