β–ΆπŸ—Ί Remote Write Pipeline
  Scraper β†’ fanoutStorage.Append()
                 β”‚
                 β”œβ”€β”€β”€ Primary (local TSDB) writes WAL
                 β”‚
                 └─── remote.Storage.Appender() [no-op shim]
                           β”‚
                           β”‚  WAL is the source of truth for remote write.
                           β”‚  Samples are not re-buffered; the WAL is tailed.
                           β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                  remote.WriteStorage                     β”‚  storage/remote/write.go:60
  β”‚  β€’ one QueueManager per remote_write endpoint           β”‚
  β”‚  β€’ notified when WAL has new data                       β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                  wlog.Watcher                            β”‚  tsdb/wlog/watcher.go
  β”‚  β€’ tails WAL segments sequentially                      β”‚
  β”‚  β€’ decodes records: SERIES, SAMPLES, HISTOGRAMS, …      β”‚
  β”‚  β€’ sends to QueueManager.Append()                       β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚  []record.RefSample batches
                       β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                  QueueManager                            β”‚  storage/remote/queue_manager.go:421
  β”‚  β€’ resolves series labels (ref β†’ labels.Labels)         β”‚
  β”‚  β€’ applies remote relabeling                            β”‚
  β”‚  β€’ shards samples across N parallel senders             β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚   dynamic sharding (numShards adapts)
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β–Ό       β–Ό          β–Ό
      shard[0] shard[1] … shard[N-1]       storage/remote/queue_manager.go
          β”‚
          β”‚  batch full OR maxShards flush timeout
          β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚      WriteClient            β”‚  storage/remote/client.go
  β”‚  β€’ serialize β†’ proto/JSON   β”‚
  β”‚  β€’ compress (snappy/zstd)   β”‚
  β”‚  β€’ HTTP POST /api/v1/write  β”‚
  β”‚  β€’ retry on 429/5xx         β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–ΆπŸ— WriteStorage β€” Manager
storage/remote/write.go β€” WriteStorage struct L60
type WriteStorage struct {
    logger *slog.Logger
    reg    prometheus.Registerer
    mtx    sync.Mutex

    watcherMetrics *wlog.WatcherMetrics
    liveReaderMetrics *wlog.LiveReaderMetrics

    // walDir is the TSDB WAL directory being tailed.
    walDir     string
    queues     []*QueueManager
    samplesIn  *ewmaRate  // EMA rate of incoming samples
    flushDeadline time.Duration
    interner   labels.Interner
    ...
}

On ApplyConfig(), WriteStorage diffs the old and new remote_write configs. Unchanged endpoints keep their WAL tail position (no re-replay). New or changed endpoints start a fresh QueueManager from the current WAL head.

β–ΆπŸ‘ WAL Watcher async

The WAL Watcher is a persistent reader that follows the TSDB WAL tail in real-time and decodes records to feed the QueueManager.

tsdb/wlog/watcher.go β€” Watcher key method watcher.go
// Watcher tails WAL segments, decoding each record type:
func (w *Watcher) readSegment(r *LiveReader, segmentNum int, tail bool) error {
    for r.Next() {
        rec := r.Record()
        switch dec.Type(rec) {
        case record.Series:
            series, err := dec.Series(rec, nil)
            w.writer.StoreSeries(series, segmentNum)  // cache ref→labels

        case record.Samples:
            samples, err := dec.Samples(rec, nil)
            w.writer.Append(samples)  // β†’ QueueManager

        case record.HistogramSamples:
            hSamples, err := dec.HistogramSamples(rec, nil)
            w.writer.AppendHistograms(hSamples)

        case record.MmapMarkers:
            // Head chunks were flushed to disk; update watcher position.
        }
    }
}

The watcher maintains a segmentNum bookmark so it can resume after crashes. When Prometheus restarts, each QueueManager's watcher replays unacknowledged WAL records.

The WAL watcher uses a live reader that can read the currently-active (incomplete) WAL segment, not just completed segments.
β–ΆπŸ“¦ QueueManager β€” Sharded Sender
storage/remote/queue_manager.go β€” QueueManager struct (key fields) L421
type QueueManager struct {
    logger        *slog.Logger
    cfg           config.QueueConfig
    externalLabels []labels.Label
    relabelConfigs []*relabel.Config

    watcher *wlog.Watcher  // WAL tail reader

    // seriesLabels maps HeadSeriesRef β†’ labels.Labels.
    // Populated when SERIES records are decoded by the watcher.
    seriesLabels   map[chunks.HeadSeriesRef]labels.Labels
    seriesMetadata map[chunks.HeadSeriesRef]*metadata.Metadata
    droppedSeries  map[chunks.HeadSeriesRef]struct{}

    shards    *shards        // current active shards
    numShards int            // dynamically adjusted
    reshardChan chan int      // signals resharding
    ...
}

Dynamic Sharding

The QueueManager monitors queue depth and send rate, then continuously adjusts numShards to saturate the remote endpoint without overloading it:

// Sharding logic (simplified):
//  desired = samplesPerSec * batchSendDeadline / samplesPerBatch
//  clamp to [minShards, maxShards]
//  if desired β‰  numShards: reshard(desired)

Shard β†’ HTTP Batch

  1. Each shard maintains a []timeSeries buffer
  2. When buffer reaches MaxSamplesPerSend (default 500) or BatchSendDeadline elapses: send
  3. Samples serialized to prompb.WriteRequest (protobuf)
  4. Compressed with Snappy (RW v1) or Zstd (RW v2)
  5. HTTP POST; on 4xx/5xx: exponential backoff and retry
  6. On 429 Too Many Requests: respect Retry-After header

Key Queue Config Parameters

ParameterDefaultEffect
capacity2500Per-shard buffer size (samples)
max_samples_per_send500Batch size before flush
batch_send_deadline5sMax time before partial batch is sent
min_shards1Minimum parallel senders
max_shards200Maximum parallel senders
min_backoff30msInitial retry delay
max_backoff5sMax retry delay
β–ΆπŸŒ WriteClient β€” HTTP Sender
storage/remote/client.go client.go
// Remote Write v1 request (legacy):
POST /api/v1/write
Content-Type: application/x-protobuf
Content-Encoding: snappy
X-Prometheus-Remote-Write-Version: 0.1.0
Body: snappy( proto.Marshal(prompb.WriteRequest{
    Timeseries: []TimeSeries{
        {Labels: [...], Samples: [{Timestamp, Value}, ...]},
        ...
    },
}))

// Remote Write v2 request (preferred since Prometheus 2.54):
POST /api/v1/write
Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request
Content-Encoding: snappy
X-Prometheus-Remote-Write-Version: 2.0.0

The client is configured per remote_write block. It supports TLS, Basic Auth, Bearer tokens, OAuth2, and custom HTTP headers.

Remote write is at-least-once. On network partition the QueueManager will buffer and retry. If the WAL is rotated before delivery succeeds, old records may be lost β€” sized by capacity Γ— numShards.
β–ΆπŸ“Š Key Remote Write Metrics
MetricWhat it measures
prometheus_remote_storage_samples_in_totalSamples received from WAL watcher
prometheus_remote_storage_samples_pendingSamples queued, waiting to send
prometheus_remote_storage_samples_dropped_totalSamples dropped (queue overflow)
prometheus_remote_storage_samples_failed_totalHTTP POST failures
prometheus_remote_storage_shardsCurrent number of shards per queue
prometheus_remote_storage_queue_highest_sent_timestamp_secondsLag vs. now indicates backpressure
prometheus_remote_storage_sent_batch_duration_secondsRound-trip latency per batch