Elasticsearch — Execution Flows

End-to-end traces with call chains

4. Key execution flows

Flow A — Node startup
Triggerbin/elasticsearch or ./gradlew run
EntryElasticsearch.main

Call chain

ServerLauncher → ServerCli → Elasticsearch.main
  initPhase1() → Bootstrap, Environment, LogConfigurator
  initPhase2() → PluginsLoader, JarHell, EntitlementBootstrap
  initPhase3() → new Node(env, pluginsLoader); node.start()

Key structures

  • ServerArgs — config path, daemonize, pid file
  • PluginBundle list — modules + plugins
  • Injector — all singleton services
  • GatewayMetaState — reads metadata/ under data path

Side effects

Transport port bound; cluster join initiated; metadata recovered from disk; HTTP opened; PID file written; CLI receives SERVER_READY_MARKER.

Failure modes

  • Bootstrap checks fail → NodeValidationException → exit before HTTP
  • Plugin load / entitlement failure → startup abort in phase 2
  • Lucene version mismatch → assertion in checkLucene()
  • No master within timeout → warning, node may stay yellow
Flow B — Index document (PUT /{index}/_doc/{id})
TriggerHTTP PUT with JSON body
ValidationRestIndexAction.prepareRequest parses params; IndexRequest.validate() on transport side

Call chain

HttpServerTransport.dispatchRequest
  → RestController.dispatchRequest (PathTrie route match)
  → RestIndexAction.prepareRequest → IndexRequest
  → NodeClient.index → TransportIndexAction
  → TransportBulkAction (single-item bulk wrapper)
  → BulkOperation.groupRequestsByShards (uses ClusterState routing)
  → TransportShardBulkAction (TransportWriteAction)
  → IndexShard.applyIndexOperationOnPrimary
  → InternalEngine.index(Index)
  → replicate to replicas → RestToXContentListener

Key structures

  • IndexRequest — index, id, routing, source, version, if_seq_no
  • ShardId — (index UUID, shard number)
  • ParsedDocument — mapped Lucene fields from JSON
  • Engine.IndexResult — CREATED / UPDATED / CONFLICT

Side effects

Lucene document added; translog append; seq_no assigned; optional refresh; dynamic mapping may trigger async master task; auto-create index if allowed.

Failure modes

  • ClusterBlockException — write block on index/cluster
  • VersionConflictEngineException — optimistic concurrency
  • MapperParsingException — bad document for mapping
  • Replica not acked — depends on wait_for_active_shards
Flow C — Search (GET /{index}/_search)
TriggerHTTP GET/POST with query DSL body
CoordinatorNode receiving request (often same as HTTP node)

Call chain

RestSearchAction.prepareRequest → SearchRequest
  → RestCancellableNodeClient.execute(TransportSearchAction)
  → TransportSearchAction.executeRequest
      resolve indices (ClusterState + IndexNameExpressionResolver)
      build SearchShardIterator list
  → SearchQueryThenFetchAsyncAction
  → SearchTransportService.sendExecuteQuery (per shard, parallel)
  → SearchService.executeQueryPhase
  → QueryPhase.execute(SearchContext)
  → [optional] fetch phase
  → SearchPhaseController.merge
  → RestRefCountedChunkedToXContentListener

Key structures

  • SearchSourceBuilder — query, aggs, sort, from/size, timeout
  • ShardSearchRequest — per-shard slice of the search
  • SearchContext — IndexSearcher, collectors, aggregations
  • SearchResponse — merged hits, aggs, profile, shard failures

Side effects

Opens/acquires index readers; may populate request/query cache; registers cancellable SearchTask; telemetry via SearchResponseMetrics.

Failure modes

  • Read blocks → ClusterBlockException before fan-out
  • Partial shard failures — controlled by allow_partial_search_results
  • Timeout / cancel via HTTP channel close
  • Circuit breaker on large aggs
Flow D — Cluster state update (e.g. create index)

Call chain

REST PUT /{index} → MetadataCreateIndexService
  → createIndexTaskQueue.submitTask(CreateIndexClusterStateUpdateTask)
  → MasterService.executeAndPublishBatch
      executor.execute(BatchExecutionContext) → new ClusterState
      patchVersions(...)
      clusterStatePublisher.publish
  → Coordinator.publish → transport to all nodes
  → ClusterApplierService.applyChanges
      IndicesClusterStateService.applyClusterState
  → BatchedRerouteService.reroute (often follows)

Key structures

  • ClusterStateTaskExecutor — batch mutator
  • ClusterChangedEvent — diff for appliers
  • Metadata / IndexMetadata — index settings, mappings
  • RoutingTable — shard assignments after reroute

Side effects

Cluster state version++; persisted to disk on master via PersistedClusterStateService; appliers create indices/shard folders; ack listeners complete REST response.

Failure modes

  • NotMasterException — submitter lost mastership
  • ProcessClusterEventTimeoutException — queue wait exceeded
  • Publication failure → master may step down
  • acknowledged: false — timeout waiting for node acks
Flow E — Shard allocation & recovery

Master path (decision)

BatchedRerouteService.reroute
  → AllocationService.reroute
  → BalancedShardsAllocator.allocate
      allocateUnassigned / moveShards / balance
  → new RoutingTable in ClusterState → publish

Data node path (materialization)

ClusterApplierService → IndicesClusterStateService.applyClusterState
  → createOrUpdateShard → indicesService.createShard
  → RecoverySource (empty store | peer | snapshot)
  → PeerRecoverySourceService / RecoveryTarget
  → ShardStateAction.shardStarted → master marks STARTED

Key structures

  • RoutingAllocation — mutable workspace for deciders
  • AllocationDeciders — disk watermark, awareness, throttling
  • UnassignedInfo — reason, delayed allocation timer
  • RecoveryState — phase tracking for recovery API

Failure modes

  • No valid shard copy — shard stays UNASSIGNED (cluster RED)
  • Disk threshold decider — prevents allocation
  • Recovery failure — reroute with new failure count
  • Shard lock contention — retry in createShardWhenLockAvailable

Shared REST dispatch

All REST handlers converge on RestController.dispatchRequest, which matches MethodHandlers in a PathTrie, applies RestInterceptor filters (security, product origin), dispatches to RestHandler.handleRequest, and maps exceptions to HTTP status via ExceptionHandling.

RestController.dispatchRequest

Architecture · Core Modules →