4. Key execution flows
Flow A — Node startup
| Trigger | bin/elasticsearch or ./gradlew run |
|---|---|
| Entry | Elasticsearch.main |
Call chain
ServerLauncher → ServerCli → Elasticsearch.main initPhase1() → Bootstrap, Environment, LogConfigurator initPhase2() → PluginsLoader, JarHell, EntitlementBootstrap initPhase3() → new Node(env, pluginsLoader); node.start()
Key structures
ServerArgs— config path, daemonize, pid filePluginBundlelist — modules + pluginsInjector— all singleton servicesGatewayMetaState— readsmetadata/under data path
Side effects
Transport port bound; cluster join initiated; metadata recovered from disk; HTTP opened; PID file written; CLI receives SERVER_READY_MARKER.
Failure modes
- Bootstrap checks fail →
NodeValidationException→ exit before HTTP - Plugin load / entitlement failure → startup abort in phase 2
- Lucene version mismatch → assertion in
checkLucene() - No master within timeout → warning, node may stay yellow
Flow B — Index document (PUT /{index}/_doc/{id})
| Trigger | HTTP PUT with JSON body |
|---|---|
| Validation | RestIndexAction.prepareRequest parses params; IndexRequest.validate() on transport side |
Call chain
HttpServerTransport.dispatchRequest → RestController.dispatchRequest (PathTrie route match) → RestIndexAction.prepareRequest → IndexRequest → NodeClient.index → TransportIndexAction → TransportBulkAction (single-item bulk wrapper) → BulkOperation.groupRequestsByShards (uses ClusterState routing) → TransportShardBulkAction (TransportWriteAction) → IndexShard.applyIndexOperationOnPrimary → InternalEngine.index(Index) → replicate to replicas → RestToXContentListener
Key structures
IndexRequest— index, id, routing, source, version, if_seq_noShardId— (index UUID, shard number)ParsedDocument— mapped Lucene fields from JSONEngine.IndexResult— CREATED / UPDATED / CONFLICT
Side effects
Lucene document added; translog append; seq_no assigned; optional refresh; dynamic mapping may trigger async master task; auto-create index if allowed.
Failure modes
ClusterBlockException— write block on index/clusterVersionConflictEngineException— optimistic concurrencyMapperParsingException— bad document for mapping- Replica not acked — depends on
wait_for_active_shards
Flow C — Search (GET /{index}/_search)
| Trigger | HTTP GET/POST with query DSL body |
|---|---|
| Coordinator | Node receiving request (often same as HTTP node) |
Call chain
RestSearchAction.prepareRequest → SearchRequest
→ RestCancellableNodeClient.execute(TransportSearchAction)
→ TransportSearchAction.executeRequest
resolve indices (ClusterState + IndexNameExpressionResolver)
build SearchShardIterator list
→ SearchQueryThenFetchAsyncAction
→ SearchTransportService.sendExecuteQuery (per shard, parallel)
→ SearchService.executeQueryPhase
→ QueryPhase.execute(SearchContext)
→ [optional] fetch phase
→ SearchPhaseController.merge
→ RestRefCountedChunkedToXContentListener
Key structures
SearchSourceBuilder— query, aggs, sort, from/size, timeoutShardSearchRequest— per-shard slice of the searchSearchContext— IndexSearcher, collectors, aggregationsSearchResponse— merged hits, aggs, profile, shard failures
Side effects
Opens/acquires index readers; may populate request/query cache; registers cancellable SearchTask; telemetry via SearchResponseMetrics.
Failure modes
- Read blocks →
ClusterBlockExceptionbefore fan-out - Partial shard failures — controlled by
allow_partial_search_results - Timeout / cancel via HTTP channel close
- Circuit breaker on large aggs
Flow D — Cluster state update (e.g. create index)
Call chain
REST PUT /{index} → MetadataCreateIndexService
→ createIndexTaskQueue.submitTask(CreateIndexClusterStateUpdateTask)
→ MasterService.executeAndPublishBatch
executor.execute(BatchExecutionContext) → new ClusterState
patchVersions(...)
clusterStatePublisher.publish
→ Coordinator.publish → transport to all nodes
→ ClusterApplierService.applyChanges
IndicesClusterStateService.applyClusterState
→ BatchedRerouteService.reroute (often follows)
Key structures
ClusterStateTaskExecutor— batch mutatorClusterChangedEvent— diff for appliersMetadata/IndexMetadata— index settings, mappingsRoutingTable— shard assignments after reroute
Side effects
Cluster state version++; persisted to disk on master via PersistedClusterStateService; appliers create indices/shard folders; ack listeners complete REST response.
Failure modes
NotMasterException— submitter lost mastershipProcessClusterEventTimeoutException— queue wait exceeded- Publication failure → master may step down
acknowledged: false— timeout waiting for node acks
Flow E — Shard allocation & recovery
Master path (decision)
BatchedRerouteService.reroute
→ AllocationService.reroute
→ BalancedShardsAllocator.allocate
allocateUnassigned / moveShards / balance
→ new RoutingTable in ClusterState → publish
Data node path (materialization)
ClusterApplierService → IndicesClusterStateService.applyClusterState → createOrUpdateShard → indicesService.createShard → RecoverySource (empty store | peer | snapshot) → PeerRecoverySourceService / RecoveryTarget → ShardStateAction.shardStarted → master marks STARTED
Key structures
RoutingAllocation— mutable workspace for decidersAllocationDeciders— disk watermark, awareness, throttlingUnassignedInfo— reason, delayed allocation timerRecoveryState— phase tracking for recovery API
Failure modes
- No valid shard copy — shard stays UNASSIGNED (cluster RED)
- Disk threshold decider — prevents allocation
- Recovery failure — reroute with new failure count
- Shard lock contention — retry in
createShardWhenLockAvailable
Shared REST dispatch
All REST handlers converge on RestController.dispatchRequest, which matches MethodHandlers in a PathTrie, applies RestInterceptor filters (security, product origin), dispatches to RestHandler.handleRequest, and maps exceptions to HTTP status via ExceptionHandling.