Remote Write
How Prometheus durably forwards samples to external long-term storage systems via the remote_write protocol.
βΆ Remote Write Pipeline
Scraper β fanoutStorage.Append()
β
ββββ Primary (local TSDB) writes WAL
β
ββββ remote.Storage.Appender() [no-op shim]
β
β WAL is the source of truth for remote write.
β Samples are not re-buffered; the WAL is tailed.
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β remote.WriteStorage β storage/remote/write.go:60
β β’ one QueueManager per remote_write endpoint β
β β’ notified when WAL has new data β
ββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β wlog.Watcher β tsdb/wlog/watcher.go
β β’ tails WAL segments sequentially β
β β’ decodes records: SERIES, SAMPLES, HISTOGRAMS, β¦ β
β β’ sends to QueueManager.Append() β
ββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β []record.RefSample batches
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QueueManager β storage/remote/queue_manager.go:421
β β’ resolves series labels (ref β labels.Labels) β
β β’ applies remote relabeling β
β β’ shards samples across N parallel senders β
βββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ
β dynamic sharding (numShards adapts)
βββββββββΌβββββββββββ
βΌ βΌ βΌ
shard[0] shard[1] β¦ shard[N-1] storage/remote/queue_manager.go
β
β batch full OR maxShards flush timeout
βΌ
βββββββββββββββββββββββββββββββ
β WriteClient β storage/remote/client.go
β β’ serialize β proto/JSON β
β β’ compress (snappy/zstd) β
β β’ HTTP POST /api/v1/write β
β β’ retry on 429/5xx β
βββββββββββββββββββββββββββββββ
βΆ WriteStorage β Manager
type WriteStorage struct {
logger *slog.Logger
reg prometheus.Registerer
mtx sync.Mutex
watcherMetrics *wlog.WatcherMetrics
liveReaderMetrics *wlog.LiveReaderMetrics
// walDir is the TSDB WAL directory being tailed.
walDir string
queues []*QueueManager
samplesIn *ewmaRate // EMA rate of incoming samples
flushDeadline time.Duration
interner labels.Interner
...
}
On ApplyConfig(), WriteStorage diffs the old and new remote_write configs. Unchanged endpoints keep their WAL tail position (no re-replay). New or changed endpoints start a fresh QueueManager from the current WAL head.
βΆ WAL Watcher async
The WAL Watcher is a persistent reader that follows the TSDB WAL tail in real-time and decodes records to feed the QueueManager.
// Watcher tails WAL segments, decoding each record type:
func (w *Watcher) readSegment(r *LiveReader, segmentNum int, tail bool) error {
for r.Next() {
rec := r.Record()
switch dec.Type(rec) {
case record.Series:
series, err := dec.Series(rec, nil)
w.writer.StoreSeries(series, segmentNum) // cache refβlabels
case record.Samples:
samples, err := dec.Samples(rec, nil)
w.writer.Append(samples) // β QueueManager
case record.HistogramSamples:
hSamples, err := dec.HistogramSamples(rec, nil)
w.writer.AppendHistograms(hSamples)
case record.MmapMarkers:
// Head chunks were flushed to disk; update watcher position.
}
}
}
The watcher maintains a segmentNum bookmark so it can resume after crashes. When Prometheus restarts, each QueueManager's watcher replays unacknowledged WAL records.
βΆ QueueManager β Sharded Sender
type QueueManager struct {
logger *slog.Logger
cfg config.QueueConfig
externalLabels []labels.Label
relabelConfigs []*relabel.Config
watcher *wlog.Watcher // WAL tail reader
// seriesLabels maps HeadSeriesRef β labels.Labels.
// Populated when SERIES records are decoded by the watcher.
seriesLabels map[chunks.HeadSeriesRef]labels.Labels
seriesMetadata map[chunks.HeadSeriesRef]*metadata.Metadata
droppedSeries map[chunks.HeadSeriesRef]struct{}
shards *shards // current active shards
numShards int // dynamically adjusted
reshardChan chan int // signals resharding
...
}
Dynamic Sharding
The QueueManager monitors queue depth and send rate, then continuously adjusts numShards to saturate the remote endpoint without overloading it:
// Sharding logic (simplified): // desired = samplesPerSec * batchSendDeadline / samplesPerBatch // clamp to [minShards, maxShards] // if desired β numShards: reshard(desired)
Shard β HTTP Batch
- Each shard maintains a
[]timeSeriesbuffer - When buffer reaches
MaxSamplesPerSend(default 500) orBatchSendDeadlineelapses: send - Samples serialized to
prompb.WriteRequest(protobuf) - Compressed with Snappy (RW v1) or Zstd (RW v2)
- HTTP POST; on 4xx/5xx: exponential backoff and retry
- On
429 Too Many Requests: respectRetry-Afterheader
Key Queue Config Parameters
| Parameter | Default | Effect |
|---|---|---|
| capacity | 2500 | Per-shard buffer size (samples) |
| max_samples_per_send | 500 | Batch size before flush |
| batch_send_deadline | 5s | Max time before partial batch is sent |
| min_shards | 1 | Minimum parallel senders |
| max_shards | 200 | Maximum parallel senders |
| min_backoff | 30ms | Initial retry delay |
| max_backoff | 5s | Max retry delay |
βΆ WriteClient β HTTP Sender
// Remote Write v1 request (legacy):
POST /api/v1/write
Content-Type: application/x-protobuf
Content-Encoding: snappy
X-Prometheus-Remote-Write-Version: 0.1.0
Body: snappy( proto.Marshal(prompb.WriteRequest{
Timeseries: []TimeSeries{
{Labels: [...], Samples: [{Timestamp, Value}, ...]},
...
},
}))
// Remote Write v2 request (preferred since Prometheus 2.54):
POST /api/v1/write
Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request
Content-Encoding: snappy
X-Prometheus-Remote-Write-Version: 2.0.0
The client is configured per remote_write block. It supports TLS, Basic Auth, Bearer tokens, OAuth2, and custom HTTP headers.
capacity Γ numShards.
βΆ Key Remote Write Metrics
| Metric | What it measures |
|---|---|
| prometheus_remote_storage_samples_in_total | Samples received from WAL watcher |
| prometheus_remote_storage_samples_pending | Samples queued, waiting to send |
| prometheus_remote_storage_samples_dropped_total | Samples dropped (queue overflow) |
| prometheus_remote_storage_samples_failed_total | HTTP POST failures |
| prometheus_remote_storage_shards | Current number of shards per queue |
| prometheus_remote_storage_queue_highest_sent_timestamp_seconds | Lag vs. now indicates backpressure |
| prometheus_remote_storage_sent_batch_duration_seconds | Round-trip latency per batch |