Prometheus Data Flow — Alerting & Rules

▶🗺 Alerting Pipeline

  rule files (*.yml / *.yaml)
        │
        ▼
  ┌───────────────────────────────────────────────┐
  │           rules.Manager                       │  rules/manager.go:99
  │  • ParseFiles() → []Group                     │
  │  • one goroutine per Group                    │
  │  • ticker fires every group.interval          │
  └──────────────────────┬────────────────────────┘
                         │ per Group
                         ▼
  ┌───────────────────────────────────────────────┐
  │           rules.Group.Eval()                  │  rules/group.go:504
  │  • evaluates all rules in the group           │
  │  • respects query_offset                      │
  └───────────┬───────────────────────┬───────────┘
              │                       │
              ▼                       ▼
  ┌───────────────────┐    ┌────────────────────────────┐
  │  AlertingRule     │    │  RecordingRule             │
  │  rules/alerting.go│    │  rules/recording.go        │
  │                   │    │                            │
  │ 1. exec PromQL    │    │ 1. exec PromQL             │
  │ 2. each result    │    │ 2. set __name__ to record  │
  │    sample →       │    │ 3. Append() to storage     │
  │    alert instance │    └────────────────────────────┘
  │ 3. state machine  │
  │    inactive→      │
  │    pending→       │
  │    firing         │
  │ 4. write ALERTS{} │
  │    to storage     │
  └───────────┬───────┘
              │  firing + resolved alerts
              ▼
  ┌───────────────────────────────────────────────┐
  │           notifier.Manager                    │  notifier/manager.go:53
  │  • Send() called after each alerting rule     │
  │  • batches alerts (default 256 per request)   │
  │  • discovers Alertmanager endpoints via SD    │
  │  • HTTP POST /api/v2/alerts                   │
  └───────────────────────────────────────────────┘
              │
              ▼
  Alertmanager  (dedup, group, route, silence, inhibit)

▶📋 rules.Manager — Lifecycle

rules/manager.go — Manager struct L99

type Manager struct {
    opts   *ManagerOptions
    groups map[string]*Group  // key: "filepath;groupname"
    mtx    sync.RWMutex
    block  chan struct{}   // closed by Run(); groups wait on it so rules never query a still-starting storage
    done   chan struct{}
    ...
}

// ManagerOptions wires dependencies:
type ManagerOptions struct {
    Appendable  storage.Appendable // write recording rules + ALERTS
    Queryable   storage.Queryable  // read ALERTS_FOR_STATE to restore "for:" state
    QueryFunc   QueryFunc          // runs rule PromQL via promql.Engine
    NotifyFunc  NotifyFunc         // deliver alerts to notifier
    Context     context.Context
    ExternalURL *url.URL
    Logger      *slog.Logger
    ...
}

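On config reload, Manager.Update() diffs old and new groups by their "filepath;groupname" key. A group whose rules are unchanged keeps running untouched, so its evaluation schedule is preserved. A changed group is stopped and replaced by a new one that inherits state (active alerts, series seen in the previous evaluation) through CopyState() for rules that still exist. New groups start fresh; groups that disappeared are stopped and their series are marked stale.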
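
The reload diff can be sketched in plain Go. This is a minimal model, not the rules package: Group, groupKey, equalRules and update are hypothetical names, rules are reduced to strings, and State is a map from rule to alert state standing in for what CopyState() carries over.

```go
package main

import "fmt"

// Group is a hypothetical stand-in for rules.Group: identity, rules (as
// plain strings) and the per-rule state a reload should carry over.
type Group struct {
	File, Name string
	Rules      []string
	State      map[string]string // rule -> alert state (hypothetical)
	Running    bool
}

// groupKey mirrors the "filepath;groupname" key of Manager.groups.
func groupKey(file, name string) string { return file + ";" + name }

func equalRules(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// update keeps unchanged groups, replaces changed ones (copying state for
// rules that still exist) and stops groups that were removed.
func update(old map[string]*Group, next []*Group) map[string]*Group {
	result := make(map[string]*Group, len(next))
	seen := make(map[string]bool, len(next))
	for _, ng := range next {
		key := groupKey(ng.File, ng.Name)
		seen[key] = true
		og, existed := old[key]
		if existed && equalRules(og.Rules, ng.Rules) {
			result[key] = og // unchanged: keep the running group as-is
			continue
		}
		ng.State = map[string]string{}
		if existed {
			og.Running = false // stop the old copy, then inherit its state
			for _, r := range ng.Rules {
				if st, ok := og.State[r]; ok {
					ng.State[r] = st
				}
			}
		}
		ng.Running = true
		result[key] = ng
	}
	for key, og := range old {
		if !seen[key] {
			og.Running = false // removed (Prometheus also stale-marks its series)
		}
	}
	return result
}

func main() {
	old := map[string]*Group{
		groupKey("a.yml", "g"): {File: "a.yml", Name: "g", Rules: []string{"up"},
			State: map[string]string{"up": "firing"}, Running: true},
	}
	next := update(old, []*Group{{File: "a.yml", Name: "g", Rules: []string{"up", "down"}}})
	fmt.Println(next["a.yml;g"].State["up"], next["a.yml;g"].Running)
}
```

Keying on file plus group name is what lets an unchanged group survive a reload with its schedule intact.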

▶⏱ Group.Eval() — Per-Interval Evaluation

rules/group.go — Group struct L45

type Group struct {
    name          string
    file          string
    interval      time.Duration      // group "interval:" (default: global evaluation_interval)
    queryOffset   *time.Duration     // query_offset
    rules         []Rule
    seriesInPreviousEval []map[string]labels.Labels
    staleSeries   []labels.Labels
    opts          *ManagerOptions
    mtx           sync.Mutex
    evaluationTime time.Duration     // last eval duration
    ...
}

rules/group.go — Group.Eval() L504

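func (g *Group) Eval(ctx context.Context, ts time.Time) {
    for _, rule := range g.rules {
        // Execute the rule's PromQL via QueryFunc at ts - query_offset.
        vector, err := rule.Eval(ctx, g.QueryOffset(), ts,
                                 g.opts.QueryFunc, g.opts.ExternalURL, g.Limit())
        if err != nil {
            // Record the error on the rule (health, metrics) and skip it.
            continue
        }

        // Alerting rules: Eval() has already advanced the state machine;
        // hand alerts that are due for (re-)sending to NotifyFunc now.
        if ar, ok := rule.(*AlertingRule); ok {
            ar.sendAlerts(ctx, ts, g.opts.ResendDelay, g.interval, g.opts.NotifyFunc)
        }

        // Both rule types: append vector to storage (recording-rule output,
        // or ALERTS / ALERTS_FOR_STATE), then stale-mark series that were
        // produced in the previous evaluation but are missing now.
    }
}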

Evaluation Timestamp

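The evaluation timestamp ts is the scheduled tick, not the moment the goroutine happens to wake up. Ticks are spaced group.interval apart, and each group's ticks are shifted by a fixed per-group offset derived from a hash of the group, so groups sharing an interval do not all evaluate at the same instant. The PromQL queries then run at ts minus query_offset, which lets rules read "slightly in the past" and avoid data that has not been fully scraped or ingested yet.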

▶🔔 AlertingRule — State Machine

rules/alerting.go — AlertingRule struct L116

type AlertingRule struct {
    name          string
    vector        parser.Expr       // the PromQL expression
    holdDuration  time.Duration     // "for:" clause
    keepFiringFor time.Duration     // "keep_firing_for:" clause
    labels        labels.Labels     // extra labels to attach
    annotations   labels.Labels     // annotation templates
    externalURL   string
    active        map[uint64]*Alert // fingerprint → alert state
    mtx           sync.Mutex
    ...
}

type Alert struct {
    State       AlertState   // StateInactive / StatePending / StateFiring
    Labels      labels.Labels
    Annotations labels.Labels
    Value       float64      // the sample value that triggered
    ActiveAt    time.Time    // when alert first became Pending
    FiredAt     time.Time    // when alert transitioned to Firing
    ResolvedAt  time.Time    // when alert resolved
    LastSentAt  time.Time    // last time sent to Alertmanager
    ValidUntil  time.Time    // expiry if not re-evaluated
    ...
}

Alert State Machine

  ┌─────────────┐      PromQL returns sample     ┌─────────────┐
  │  Inactive   │  ────────────────────────────▶ │   Pending   │
  │  (no data   │                                │  (waiting   │
  │   or false) │  ◀──────────────────────────── │  for: dur)  │
  └─────────────┘   PromQL returns no sample     └──────┬──────┘
                                                        │ holdDuration elapsed
                                                        ▼
                                                 ┌─────────────┐
                                                 │   Firing    │
                                                 │ (sent to    │
                                                 │  Alertmgr)  │
                                                 └──────┬──────┘
                                                        │ no sample and keep_firing_for elapsed
                                                        ▼
                                                 ┌─────────────┐
                                                 │  Inactive   │
                                                 │ (resolved)  │
                                                 └─────────────┘

With for: 0 (the default) an alert moves from Inactive to Firing within a single evaluation. Resolved alerts stay in the active map for 15 minutes and are re-sent to Alertmanager on each resend cycle during that window.

Synthetic Metrics Written to Storage

// ALERTS metric (one series per pending or firing alert instance):
ALERTS{
    alertname="MyAlert",
    alertstate="firing",    // or "pending"
    <alert labels>,         // query result labels (minus __name__) + rule labels
} = 1

// ALERTS_FOR_STATE metric (tracks when the "for:" timer started):
ALERTS_FOR_STATE{
    alertname="MyAlert",
    <alert labels>,
} = <unix timestamp (seconds) when the alert became pending>

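These are written to fanoutStorage through the same Appendable as scraped samples, which makes alert state queryable via PromQL. After a restart, Prometheus reads ALERTS_FOR_STATE back to restore the for: timers of alerts that were pending or firing.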

▶🖊 RecordingRule — Precomputed Aggregates

rules/recording.go

// Example rule:
// - record: job:http_requests:rate5m
//   expr: sum by (job) (rate(http_requests_total[5m]))

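func (rule *RecordingRule) Eval(ctx context.Context, queryOffset time.Duration, ts time.Time,
    query QueryFunc, _ *url.URL, limit int) (promql.Vector, error) {
    vector, err := query(ctx, rule.vector.String(), ts.Add(-queryOffset))
    if err != nil {
        return nil, err
    }
    // Set __name__ to the "record:" name and apply the rule's extra labels.
    for i := range vector {
        lb := labels.NewBuilder(vector[i].Metric)
        lb.Set(labels.MetricName, rule.name)
        rule.labels.Range(func(l labels.Label) { lb.Set(l.Name, l.Value) })
        vector[i].Metric = lb.Labels()
    }
    if vector.ContainsSameLabelset() { // renaming made two series identical
        return nil, errors.New("vector contains metrics with the same labelset after applying rule labels")
    }
    return vector, nil // (limit check omitted)
}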

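// The caller (Group.Eval) appends the vector to storage (simplified):
app := g.opts.Appendable.Appender(ctx)
for _, s := range vector {
    if s.H != nil {
        app.AppendHistogram(0, s.Metric, s.T, nil, s.H) // native histogram sample
    } else {
        app.Append(0, s.Metric, s.T, s.F) // float sample
    }
}
if err := app.Commit(); err != nil {
    // log the failed commit; stale-marking also omitted here
}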

Recording rules are the primary way to pre-aggregate expensive queries. The stored result is an ordinary series, but it receives only one sample per evaluation, so it cannot be queried at a finer resolution than the group's evaluation interval.
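
The renaming step in RecordingRule.Eval has one failure mode worth seeing concretely: if input series differ only in their metric name, setting __name__ to the record name makes them identical, and Prometheus rejects an output vector with duplicate label sets. A minimal sketch with map-based label sets; relabel, canonical and the error text are hypothetical, not Prometheus code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonical renders a label set with sorted names so equal sets compare equal.
func canonical(ls map[string]string) string {
	keys := make([]string, 0, len(ls))
	for k := range ls {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%q", k, ls[k])
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// relabel sets __name__ to the record name on every result series and
// fails if two output series end up with the same label set.
func relabel(record string, series []map[string]string) ([]map[string]string, error) {
	seen := make(map[string]bool, len(series))
	out := make([]map[string]string, 0, len(series))
	for _, s := range series {
		ls := make(map[string]string, len(s)+1)
		for k, v := range s {
			ls[k] = v
		}
		ls["__name__"] = record
		key := canonical(ls)
		if seen[key] {
			return nil, fmt.Errorf("duplicate output series %s", key)
		}
		seen[key] = true
		out = append(out, ls)
	}
	return out, nil
}

func main() {
	_, err := relabel("job:requests:rate5m", []map[string]string{
		{"__name__": "http_requests_total", "job": "api"},
		{"__name__": "grpc_requests_total", "job": "api"},
	})
	fmt.Println(err)
}
```

Aggregating with an explicit `sum by (...)` in the expression, as in the job:http_requests:rate5m example, avoids this because the aggregation already drops __name__ and merges the inputs.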

▶📣 Notifier — Alertmanager Delivery

notifier/manager.go — Manager struct L53

type Manager struct {
    opts  *Options
    queue []*Alert          // buffered alerts pending delivery
    more  chan struct{}      // signal: new alerts in queue
    mtx   sync.RWMutex

    // alertmanagers discovered via SD (separate discovery manager).
    alertmanagers map[string]*alertmanagerSet
    ...
}

// Delivery loop (simplified):
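func (n *Manager) run() {
    for {
        select {
        case <-n.ctx.Done():
            return
        case <-n.more:
        }
        alerts := n.nextBatch() // at most maxBatchSize alerts
        if !n.sendAll(alerts...) { // false if no Alertmanager accepted the batch
            n.metrics.dropped.Add(float64(len(alerts)))
        }
        if n.queueLen() > 0 {
            n.setMore() // alerts still queued: go around again
        }
    }
}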

// sendAll sends the batch to all discovered Alertmanager endpoints.
// Each endpoint is tried independently; failure for one does not
// block delivery to others.

Alert Enrichment

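Between rule evaluation and the HTTP request, each alert picks up the following:

• Annotations are rendered with Go text/template ($labels, $value and $externalLabels available) when the rule is evaluated, before the alert reaches the notifier
• GeneratorURL is set to a link back to the rule's expression in the Prometheus UI, built from --web.external-url
• External labels from global.external_labels are added, without overriding labels the alert already has
• alert_relabel_configs, if configured, are applied
• Alerts are batched into chunks of maxBatchSize (default 256)
• Each batch is POSTed to http(s)://<alertmanager>/api/v2/alerts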
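
Two of these enrichment steps are easy to sketch. The merge rule (external labels never override) follows the text above; the exact "/graph?g0.expr=<expr>&g0.tab=1" shape of GeneratorURL is an assumption about Prometheus's link format, and addExternalLabels and generatorURL are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
)

// addExternalLabels returns a copy of the alert's labels with each external
// label added only if the alert does not already define that name.
func addExternalLabels(alert, external map[string]string) map[string]string {
	out := make(map[string]string, len(alert)+len(external))
	for k, v := range alert {
		out[k] = v
	}
	for k, v := range external {
		if _, exists := out[k]; !exists {
			out[k] = v
		}
	}
	return out
}

// generatorURL builds a link from an alert back to its expression in the
// Prometheus UI (assumed URL shape, see lead-in).
func generatorURL(externalURL, expr string) string {
	return externalURL + "/graph?g0.expr=" + url.QueryEscape(expr) + "&g0.tab=1"
}

func main() {
	merged := addExternalLabels(
		map[string]string{"alertname": "HighErrorRate", "cluster": "eu-1"},
		map[string]string{"cluster": "global", "replica": "a"},
	)
	fmt.Println(merged)
	fmt.Println(generatorURL("http://prometheus:9090", `up == 0`))
}
```

Because alert labels win, a rule can deliberately pin a label such as cluster even when the server's external labels set a different value.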

Alertmanager Discovery

The notifier uses its own discovery.Manager instance to watch the alerting.alertmanagers config. Alertmanager endpoints can be discovered via any SD mechanism, just like scrape targets.

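The alert queue has a maximum size (QueueCapacity, set by --alertmanager.notification-queue-capacity, default 10 000). If alerts arrive faster than they can be delivered and the queue overflows, the oldest queued alerts are dropped. A batch that no Alertmanager accepts is also dropped rather than retried; still-active alerts are re-sent by the rule manager on its next resend cycle. Dropped alerts are counted in prometheus_notifications_dropped_total.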

▶📄 Rule File Format

groups:
  - name: example                   # group name (must be unique per file)
    interval: 1m                    # override global evaluation_interval
    query_offset: 30s               # evaluate 30s in the past
    limit: 10                       # max alerts/series per rule; exceeding it fails the rule (0 = no limit)
    rules:

      # Alerting rule
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 5m                     # must be true for 5m before firing
        keep_firing_for: 2m         # keep firing for 2m after the condition clears
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Recording rule
      - record: job:http_request_duration:p99
        expr: histogram_quantile(0.99, sum by (job, le) (
                rate(http_request_duration_seconds_bucket[5m])
              ))
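
The limit: setting is all-or-nothing per rule, which is easy to miss from the YAML comment alone. A sketch, assuming results are reduced to a slice of strings; applyLimit is a hypothetical helper and its error wording only approximates Prometheus's message.

```go
package main

import "fmt"

// applyLimit models the group-level "limit": if one rule produces more
// series (recording) or alerts (alerting) than allowed, that rule's whole
// evaluation fails and none of its output is kept. 0 disables the check.
func applyLimit(results []string, limit int) ([]string, error) {
	if limit > 0 && len(results) > limit {
		return nil, fmt.Errorf("exceeded limit of %d with %d results", limit, len(results))
	}
	return results, nil
}

func main() {
	out, err := applyLimit([]string{"s1", "s2", "s3"}, 2)
	fmt.Println(out, err)
}
```

Failing the whole rule, rather than truncating, avoids silently storing or alerting on an arbitrary subset of series.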