SQLite Data Flow
An interactive walkthrough of how SQLite transforms a SQL string into rows on disk — from the tokenizer through the virtual machine down to the filesystem.
Complete Data Flow Pipeline
SQL Text
e.g. "SELECT * FROM users WHERE id=1"
↓ characters
tokenize.c · sqlite3RunParser():600
↓ token stream (TK_SELECT, TK_FROM…)
parse.y · sqlite3Parser(), called from sqlite3RunParser():600
↓ AST — Select, Expr, SrcList…
select.c · insert.c · where.c · expr.c
↓ VDBE opcode array (OP_OpenRead, OP_Next…)
vdbe.c · sqlite3VdbeExec():881
↓ BtCursor calls
btree.c · sqlite3BtreeCursor():4775
↓ page reads/writes
pager.c · sqlite3PagerGet():5726
↓ VFS calls
os.c · os_unix.c · sqlite3OsRead():88
↓ syscalls (read/write/fsync)
Filesystem / Disk
database.db (4096-byte pages)
Legend: Parse phase · Code generation · Execution · Storage · I/O
Public API Entry Points
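Applications drive the pipeline through three primary functions. sqlite3_exec() wraps the whole pipeline, while sqlite3_prepare_v2() and sqlite3_step() expose its compile and execute halves separately: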
| Function | File | Role |
|---|---|---|
| sqlite3_exec() | legacy.c:30 | One-shot: prepare → step loop → finalize, with row callback |
| sqlite3_prepare_v2() | prepare.c:943 | Compile SQL text → prepared statement (Vdbe*) |
| sqlite3_step() | vdbeapi.c:913 | Advance VM one step; returns SQLITE_ROW, SQLITE_DONE, or an error code |
legacy.c:30 — sqlite3_exec() loops prepare → step → finalize
int sqlite3_exec(
  sqlite3 *db,                /* The database on which the SQL executes */
  const char *zSql,           /* The SQL to be executed */
  sqlite3_callback xCallback, /* Invoke this callback routine */
  void *pArg,                 /* First argument to xCallback() */
  char **pzErrMsg             /* Write error messages here */
){
  int rc = SQLITE_OK;
  const char *zLeftover;
  sqlite3_stmt *pStmt = 0;
  sqlite3_mutex_enter(db->mutex);
  while( rc==SQLITE_OK && zSql[0] ){
    rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, &zLeftover);
    if( rc!=SQLITE_OK ) continue;
    if( !pStmt ){
      /* this happens for a comment or white-space */
      zSql = zLeftover;
      continue;
    }
    while( 1 ){
      rc = sqlite3_step(pStmt);  // ← drives the VDBE
      if( xCallback && SQLITE_ROW==rc ){
        /* invoke user callback with row data */
      }
      if( rc!=SQLITE_ROW ) break;
    }
    rc = sqlite3_finalize(pStmt);  /* resets rc so the next statement runs */
    pStmt = 0;
    zSql = zLeftover;
  }
  ...
}
Component Overview
Stage 1
Tokenizer
tokenize.c
Scans SQL characters and emits a stream of typed tokens (keywords, identifiers, literals). Uses a hand-coded DFA for performance.
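To make the scanning step concrete, here is a minimal sketch of a hand-written tokenizer. It is not SQLite's code: the names toy_get_token, toy_count_tokens, and the TOK_* classes are invented for illustration, and the token classes are far coarser than the TK_* codes the real tokenizer emits. It follows the same general shape, though: inspect the first character, consume as many characters as belong to that token, and return the length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token classes for this sketch. SQLite's real codes are
** the TK_* constants generated from parse.y. */
typedef enum {
  TOK_WORD, TOK_NUMBER, TOK_STRING, TOK_PUNCT, TOK_SPACE, TOK_ILLEGAL
} TokType;

/* Classify the token that starts at z[0] and return its length in bytes.
** The caller must not pass an empty string. */
size_t toy_get_token(const char *z, TokType *type){
  size_t i = 1;
  unsigned char c = (unsigned char)z[0];
  assert( c!=0 );
  if( isspace(c) ){
    while( isspace((unsigned char)z[i]) ) i++;
    *type = TOK_SPACE;
  }else if( isalpha(c) || c=='_' ){
    /* Keyword or identifier; this sketch does not tell them apart. */
    while( isalnum((unsigned char)z[i]) || z[i]=='_' ) i++;
    *type = TOK_WORD;
  }else if( isdigit(c) ){
    while( isdigit((unsigned char)z[i]) ) i++;
    *type = TOK_NUMBER;
  }else if( c=='\'' ){
    /* String literal: scan to the closing quote; '' is an escaped quote. */
    for(;; i++){
      if( z[i]==0 ){ *type = TOK_ILLEGAL; return i; }
      if( z[i]=='\'' ){
        if( z[i+1]=='\'' ){ i++; continue; }
        break;
      }
    }
    i++;  /* include the closing quote */
    *type = TOK_STRING;
  }else{
    *type = TOK_PUNCT;  /* single-character operator or punctuation */
  }
  return i;
}

/* Count the non-whitespace tokens in a statement. */
int toy_count_tokens(const char *zSql){
  int n = 0;
  while( *zSql ){
    TokType t;
    zSql += toy_get_token(zSql, &t);
    if( t!=TOK_SPACE ) n++;
  }
  return n;
}
```

Calling toy_count_tokens() on the example statement from the pipeline diagram walks the string one token at a time, which is roughly how a driver loop would feed tokens to the parser.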
Stage 2
Parser
parse.y → parse.c
LEMON-generated LALR(1) parser. Grammar rules build AST nodes (Select, Expr, SrcList) and invoke code generator callbacks immediately.
Stage 3
Code Generator
select.c · insert.c · where.c
Walks the AST and emits VDBE bytecode. WHERE clause optimization happens here, via the query planner in where.c.
Stage 4
VDBE
vdbe.c · vdbeaux.c
Register-based virtual machine. Interprets the opcode array in a tight switch loop. Calls into B-tree for all data access.
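The "opcode array plus switch loop" design can be illustrated with a toy interpreter. Everything here is an assumption made for the sketch: the TOY_* opcodes, the ToyOp layout, and the function names are hypothetical, and the instruction set only loosely echoes the operand convention of real opcodes (for example, an integer load writing P1 into register P2). The sample program sums the integers 1 through 5 in registers.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch, not real VDBE opcodes. */
typedef enum { TOY_INTEGER, TOY_ADD, TOY_IFLT, TOY_HALT } ToyOpcode;

typedef struct ToyOp {
  ToyOpcode opcode;
  int p1, p2, p3;   /* operands; meaning depends on the opcode */
} ToyOp;

/* Interpret aOp[] until TOY_HALT; return the register named by its p1. */
long toy_vm_exec(const ToyOp *aOp, int nOp, long *aReg){
  int pc = 0;  /* program counter */
  for(;;){
    const ToyOp *pOp;
    assert( pc>=0 && pc<nOp );
    pOp = &aOp[pc];
    switch( pOp->opcode ){
      case TOY_INTEGER:  /* r[p2] = p1 */
        aReg[pOp->p2] = pOp->p1;
        pc++;
        break;
      case TOY_ADD:      /* r[p3] = r[p1] + r[p2] */
        aReg[pOp->p3] = aReg[pOp->p1] + aReg[pOp->p2];
        pc++;
        break;
      case TOY_IFLT:     /* if r[p1] < r[p3] then jump to p2 */
        pc = (aReg[pOp->p1] < aReg[pOp->p3]) ? pOp->p2 : pc+1;
        break;
      case TOY_HALT:
        return aReg[pOp->p1];
    }
  }
}

/* Sample program: r[0] accumulates 1+2+3+4+5 while r[1] counts up. */
long toy_sum_1_to_5(void){
  static const ToyOp prog[] = {
    { TOY_INTEGER, 0, 0, 0 },  /* 0: r[0] = 0   (running sum)   */
    { TOY_INTEGER, 1, 1, 0 },  /* 1: r[1] = 1   (loop counter)  */
    { TOY_INTEGER, 1, 2, 0 },  /* 2: r[2] = 1   (step)          */
    { TOY_INTEGER, 6, 3, 0 },  /* 3: r[3] = 6   (exclusive end) */
    { TOY_ADD,     0, 1, 0 },  /* 4: r[0] = r[0] + r[1]         */
    { TOY_ADD,     1, 2, 1 },  /* 5: r[1] = r[1] + r[2]         */
    { TOY_IFLT,    1, 4, 3 },  /* 6: if r[1] < r[3] goto 4      */
    { TOY_HALT,    0, 0, 0 },  /* 7: return r[0]                */
  };
  long aReg[4];
  return toy_vm_exec(prog, (int)(sizeof(prog)/sizeof(prog[0])), aReg);
}
```

A register machine like this names its operands explicitly in each instruction, so the loop body needs no push or pop traffic. The real VDBE differs in scale rather than in kind: many more opcodes, typed registers, and opcodes that open and advance B-tree cursors.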
Stage 5
B-Tree Engine
btree.c
Manages tables and indexes as B-trees of fixed-size pages. Provides cursor-based traversal and CRUD operations on sorted key/value pairs.
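Cursor-based traversal over sorted keys can be sketched on a single sorted array, which stands in for the cells of one leaf page. This is a deliberate simplification: a real B-tree cursor also descends through interior pages and keeps a stack of them. ToyCell, ToyCursor, and both functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One key/value entry, standing in for a cell on a leaf page. */
typedef struct ToyCell { long key; const char *value; } ToyCell;

/* Cursor over a sorted cell array; ix is the current cell index. */
typedef struct ToyCursor {
  const ToyCell *aCell;
  int nCell;
  int ix;
} ToyCursor;

/* Position the cursor on the first cell whose key is >= the target,
** using binary search. Returns 1 if the cursor points at a cell, or 0
** if every key is smaller than the target. */
int toy_cursor_seek_ge(ToyCursor *p, long key){
  int lo = 0, hi = p->nCell;
  while( lo<hi ){
    int mid = lo + (hi-lo)/2;
    if( p->aCell[mid].key<key ){
      lo = mid+1;
    }else{
      hi = mid;
    }
  }
  p->ix = lo;
  return lo<p->nCell;
}

/* Advance to the next cell in key order. Returns 1 while the cursor
** still points at a cell, 0 once it has moved past the last one. */
int toy_cursor_next(ToyCursor *p){
  assert( p->ix<p->nCell );
  p->ix++;
  return p->ix<p->nCell;
}
```

A range scan then becomes "seek once, call next until it returns 0", which is the access pattern that bytecode such as OP_Next drives against the B-tree layer.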
Stage 6
Pager & WAL
pager.c · wal.c
Page cache manager. Handles journaling for crash recovery, atomic commits, and the Write-Ahead Log (WAL) for concurrent readers.
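Because pages are fixed-size and numbered from 1, locating a page in the database file is plain arithmetic. The helper below is a hypothetical sketch (toy_page_offset is not a SQLite function); it also asserts the constraint that a page size is a power of two between 512 and 65536 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of page pgno within the database file. Page numbers
** start at 1, so page 1 begins at offset 0. */
int64_t toy_page_offset(uint32_t pgno, uint32_t pageSize){
  assert( pgno>=1 );
  assert( pageSize>=512 && pageSize<=65536
          && (pageSize & (pageSize-1))==0 );  /* power of two */
  return (int64_t)(pgno-1) * (int64_t)pageSize;
}
```

With the 4096-byte pages shown in the pipeline diagram, page 3 starts at byte 8192. The 64-bit result matters because the page number times the page size can exceed the 32-bit range on large databases.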
Stage 7
OS / VFS Layer
os.c · os_unix.c
Virtual File System abstraction. Decouples storage from the platform. Each VFS implementation handles open/read/write/fsync/lock for one OS.
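The decoupling rests on a table of function pointers attached to each open file: upper layers call through the table and never see which implementation is behind it. The sketch below shows that pattern with an in-memory "file". All names are hypothetical (ToyFile, ToyIoMethods, toy_mem_open, toy_os_read, toy_os_write), and the table has only two methods, whereas a real VFS also covers sync, locking, truncation, and more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyFile ToyFile;

/* Hypothetical method table; each backend supplies its own instance. */
typedef struct ToyIoMethods {
  int (*xRead)(ToyFile*, void *buf, size_t amt, size_t offset);
  int (*xWrite)(ToyFile*, const void *buf, size_t amt, size_t offset);
} ToyIoMethods;

struct ToyFile {
  const ToyIoMethods *pMethods;  /* which backend serves this file */
  unsigned char data[8192];      /* in-memory backing store */
  size_t size;                   /* bytes written so far */
};

/* In-memory backend: returns 0 on success, 1 on a short read. */
static int memRead(ToyFile *p, void *buf, size_t amt, size_t offset){
  if( offset+amt > p->size ) return 1;
  memcpy(buf, p->data+offset, amt);
  return 0;
}

/* In-memory backend: returns 0 on success, 1 if the store is full. */
static int memWrite(ToyFile *p, const void *buf, size_t amt, size_t offset){
  if( offset+amt > sizeof(p->data) ) return 1;
  memcpy(p->data+offset, buf, amt);
  if( offset+amt > p->size ) p->size = offset+amt;
  return 0;
}

static const ToyIoMethods memMethods = { memRead, memWrite };

/* "Open" an in-memory file: zero the store and attach the backend. */
void toy_mem_open(ToyFile *p){
  memset(p, 0, sizeof(*p));
  p->pMethods = &memMethods;
}

/* Platform-neutral wrappers: the only entry points upper layers use. */
int toy_os_read(ToyFile *p, void *buf, size_t amt, size_t offset){
  return p->pMethods->xRead(p, buf, amt, offset);
}
int toy_os_write(ToyFile *p, const void *buf, size_t amt, size_t offset){
  return p->pMethods->xWrite(p, buf, amt, offset);
}
```

Swapping in a different backend means supplying another ToyIoMethods instance; callers of toy_os_read() and toy_os_write() do not change.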
Key Data Structures
| Structure | Defined in | Purpose |
|---|---|---|
| sqlite3 | sqliteInt.h | Database connection; holds the attached databases (each with its schema and B-tree handle), flags, mutex |
| Parse | sqliteInt.h | Parse/compile context; threads through tokenizer → parser → codegen |
| Select | sqliteInt.h | AST for a SELECT statement; linked list for compound queries |
| Expr | sqliteInt.h | Expression tree node (binary tree); carries type, operator, literal value |
| Vdbe | vdbeInt.h | Prepared statement / VM instance; holds opcode array, register bank |
| BtCursor | btreeInt.h | Cursor positioned within a B-tree; tracks page stack and cell index |
| Pager | pager.c (opaque typedef in pager.h) | Page cache and journal manager; one instance per open database file |