Elasticsearch — Runtime & Errors

8. Error handling and edge cases

Exception hierarchy

Most failures extend ElasticsearchException, carrying HTTP status via status(). REST layer maps exceptions in RestController / ElasticsearchException to JSON error bodies with type, reason, stack_trace (if enabled).

ElasticsearchStatusException — explicit REST status
ClusterBlockException — index/cluster blocked (read/write/metadata)
IndexNotFoundException, DocumentMissingException
VersionConflictEngineException — OCC failure at engine
CircuitBreakingException — JVM heap / request breaker tripped

ActionListener pattern

Async throughout: transport actions take ActionListener<Response>. Composed via ActionListener#delegateFailure, SubscribableListener. Failures propagate to listener without blocking threads.

Validation layers

REST param parsing (RestRequest)
ActionRequest#validate() → ActionRequestValidationException
Cluster blocks check (master actions, search, bulk)
Shard-level engine exceptions (per bulk item)

Retries

Clients expected to retry 429/503 with backoff (not automatic in server)
Internal: peer recovery retries, snapshot retries, repository blob retries
Master task resubmission on NotMasterException in some paths

Circuit breakers

CircuitBreakerService tracks in-flight bytes (fielddata, request, accounting). RestController checks in-flight HTTP breaker before dispatch. Search aggregations use BreakerService on big structures.

9. Concurrency and lifecycle

Thread pools

Pool	Typical work
`write`	Indexing, bulk shard ops
`search`	Query/fetch phases
`search_coordination`	Search coordinator merge
`management`	Cluster state tasks (master)
`generic`	Misc async work
`snapshot`	Snapshot threads
`refresh`	Scheduled refresh

Rule: TransportService threads must not execute blocking Lucene IO; shard operations use Engine with internal locking / AsyncIOProcessor for translog fsync.

Locking & consistency

Primary shard — single writer; seq_no ordering
Engine — KeyedLock per id for concurrent index/delete same doc
Cluster state — single-threaded master service; immutable published states
IndexShard — ReentrantLock for engine lifecycle
Translog — fsync serialisation via AsyncIOProcessor

Cancellation

TaskManager registers tasks with parent links. Search uses CancellableTask; HTTP channel close triggers cancel. SearchTaskWatchdog kills long searches.

Shutdown

Node.close() → reverse start order
  HTTP stop → cluster leave → indices close → thread pool shutdown
  Plugins LifecycleComponent.stop() → NodeEnvironment.close()

Shard flush on close attempts to persist translog + Lucene commit.

Security-sensitive paths

x-pack Security plugin — ActionFilter for authz before actions
EntitlementBootstrap — limits plugin native/file/network access (JDK 24+)
SecureSettings — keystore secrets never logged
Scripting sandbox — Painless whitelist; other langs restricted

10. Important code walkthroughs

Walkthrough 1 — TransportAction.execute

Purpose: Universal entry for all named actions (index, search, admin).

execute(task, request, listener) calls handleExecution.
Registers task with TaskManager (for cancellation, headers).
Runs each ActionFilter in order (security, logging).
Dispatches doExecute on the action's Executor thread pool.
Listener receives response or exception; resources released via Releasables.

Why it matters: Adding a new API requires a new TransportAction subclass + registration in ActionModule + optional RestHandler. Forgetting registration → "No handler for action".

Break risk: Running doExecute on wrong executor can block transport threads and stall the cluster.

TransportAction.execute

Walkthrough 2 — InternalEngine.index

Purpose: Apply a single index operation on the primary with Lucene + translog semantics.

indexingStrategyForOperation decides: index, update, duplicate, skip (soft delete).
On primary origin: assign seqNo via generateSeqNoForOperationOnPrimary.
Add to translog for durability.
Update Lucene via IndexWriter (add/update document).
Return IndexResult with version, seqNo, failure.

Edge cases: Version conflicts return failure without indexing; soft deletes use SoftDeletesRetentionMergePolicy; append-only mode skips update path.

Break risk: Incorrect seq_no assignment corrupts replica consistency; translog/Lucene ordering bugs lose durability.

InternalEngine.index

Walkthrough 3 — MasterService.executeAndPublishBatch

Purpose: Atomically run queued cluster-state tasks and publish one new state.

Drain task batch from priority queue.
Call executor's execute(BatchExecutionContext) — tasks mutate builder starting from previous state.
patchVersions bumps cluster state version and term metadata.
If changed, call publishClusterStateUpdate → Coordinator.publish.
On success, run task listeners; on failure, onBatchFailure.

Why batched: Amortizes publication cost; related tasks (e.g. multiple mapping updates) merge into one state version.

Break risk: Non-deterministic executor → divergent states across master re-elections; must be pure function of inputs + previous state.

MasterService.executeAndPublishBatch

Walkthrough 4 — RestController.dispatchRequest

Purpose: Route HTTP request to handler with filters and error handling.

Resolve route from PathTrie by method + path.
Apply RestInterceptor chain (security, product checks).
Parse RestApiVersion from headers/params.
Call handler.prepareRequest → returns RestChannelConsumer.
Consumer runs with NodeClient, writes to RestChannel.
Uncaught exceptions → JSON error response with appropriate status.

Break risk: Consuming request body twice; missing ref-count release on chunked responses.

RestController.dispatchRequest

11. Non-obvious insights

Hidden coupling & conventions

Everything is an action. Even single-doc index goes through bulk machinery (TransportIndexAction → TransportBulkAction).
Cluster state is the source of truth for routing; data nodes cache it via applier thread — stale cache window is bounded by publication latency.
Guice injector is built once; services are singletons. Testing uses NodeConstruction with test overrides.
NamedWriteable registry must list every custom type serialized over transport — forgetting breaks wire compatibility.
TransportVersion vs Version — wire protocol evolves separately from release version.
Project metadata — ES 9.x multi-project work adds ProjectMetadata layer inside ClusterState (@FixForMultiProject annotations mark migration).
System indices — hard-coded descriptors; Kibana/ML depend on exact names and mappings.
RandomizedTesting — tests run with random seeds, time zones, Lucene codecs; failures need -Dtests.seed to reproduce.
Entitlements — new plugin sandbox; policy patches loaded from es.entitlements.policy.* resources in phase 2 bootstrap.
Performance hot paths — avoid object allocation in bulk indexing; Lucene merge scheduling affects write latency; search uses IndexSearcher thread pool per shard.

12. Architecture diagrams

Request flow (index)

Client │ HTTP PUT ▼ Netty HTTP ──► RestController ──► RestIndexAction │ │ │ ▼ │ TransportBulkAction │ │ │ ┌─────────────┴─────────────┐ │ ▼ ▼ │ [coordinating node] [primary shard node] │ route by ClusterState TransportShardBulkAction │ IndexShard → InternalEngine │ replicate → replica nodes ▼ JSON response ◄── RestToXContentListener

Cluster state lifecycle

[Master node] MasterService task queue │ execute executor ▼ new ClusterState (version N+1) │ Coordinator.publish ▼ transport broadcast ──────────────────────┐ │ │ ▼ ▼ [Data node A] [Data node B] ClusterApplierService.applyChanges │ ▼ IndicesClusterStateService (create shard / update routing / delete index)

← Data & Interfaces · Tests & Learning Path →