LEMON Parser Generator
SQLite uses LEMON, its own LALR(1) parser generator (similar to yacc/bison but with a cleaner interface and thread-safe output). The grammar is written in parse.y. The LEMON tool compiles this into parse.c (a large generated C file) which is then compiled into SQLite.
LEMON generates a table-driven shift-reduce parser. Grammar rule actions in parse.y are C code fragments that fire when a rule is completely matched. These actions construct AST nodes and — critically — also call code-generation functions directly.
sqlite3Select()) incrementally as rules reduce. The "AST" is short-lived.
Grammar Structure (parse.y)
The grammar file has roughly 2,100 lines and covers all of SQL that SQLite supports. Rules are written in the Lemon format: result ::= components . { action }
/* Top-level rule */
input ::= cmdlist.
cmdlist ::= cmdlist ecmd.
cmdlist ::= ecmd.
ecmd ::= SEMI.
ecmd ::= cmdx SEMI.
/* All statement types */
cmdx ::= cmd.
cmd ::= BEGIN transtype trans_opt. { sqlite3BeginTransaction(pParse, ...); }
cmd ::= COMMIT|END trans_opt. { sqlite3CommitTransaction(pParse); }
cmd ::= ROLLBACK trans_opt. { sqlite3RollbackTransaction(pParse); }
cmd ::= select(X). {
SelectDest dest = {SRT_Output, 0, 0, 0, 0, 0, 0};
sqlite3Select(pParse, X, &dest); /* ← code generator called here */
sqlite3SelectDelete(pParse->db, X);
}
cmd ::= createkw TABLE ... /* DDL */
cmd ::= insert_stmt(X). { sqlite3Insert(pParse, ...); }
cmd ::= update_stmt(X). { sqlite3Update(pParse, ...); }
cmd ::= delete_stmt(X). { sqlite3DeleteFrom(pParse, ...); }
select(A) ::= WITH wqlist(W) selectnowith(X). {
A = attachWithToSelect(pParse, X, W);
}
select(A) ::= selectnowith(A).
selectnowith(A) ::= selectnowith(X) multiselect_op(Y) oneselect(Z). {
/* Build linked list for UNION / INTERSECT / EXCEPT */
A = sqlite3SelectNew(pParse, 0, 0, 0, 0, 0, 0, Y, 0);
A->pPrior = X;
A->pNext = Z;
}
oneselect(A) ::= SELECT distinct(D) selcollist(W) from(X)
where_opt(Y) groupby_opt(P) having_opt(Q)
orderby_opt(Z) limit_opt(L). {
A = sqlite3SelectNew(pParse, W, X, Y, P, Q, Z, D, L);
}
expr(A) ::= expr(X) AND expr(Y). { A = sqlite3ExprAnd(pParse,X,Y); }
expr(A) ::= expr(X) OR expr(Y). { BINARY(TK_OR, X, Y); }
expr(A) ::= expr(X) EQ expr(Y). { BINARY(TK_EQ, X, Y); }
expr(A) ::= expr(X) NE expr(Y). { BINARY(TK_NE, X, Y); }
expr(A) ::= expr(X) LT expr(Y). { BINARY(TK_LT, X, Y); }
expr(A) ::= nm(X) DOT nm(Y). { A = sqlite3PExpr(pParse, TK_DOT, X, Y); }
expr(A) ::= nm(X) LP exprlist(Y) RP. { /* function call */
A = sqlite3ExprFunction(pParse, Y, &X, 0);
}
AST Node Types
Grammar actions build lightweight structures that live on the heap and are freed promptly after code generation. The key ones:
| Structure | Created by | Purpose |
|---|---|---|
Select |
sqlite3SelectNew() | Represents one SELECT statement; linked list for compound queries (UNION etc.) |
Expr |
sqlite3PExpr() | Binary tree for expressions; each node carries op code, optional left/right children, and literal value |
ExprList |
sqlite3ExprListAppend() |
Dynamic array of Expr* for SELECT column list, ORDER BY, etc. |
SrcList |
sqlite3SrcListAppend() |
FROM clause: list of tables/subqueries with aliases |
IdList |
sqlite3IdListAppend() |
List of bare identifiers (e.g. column names in INSERT) |
Token |
tokenizer | Lightweight: just pointer + length into the original SQL string (no copy) |
struct Expr {
u8 op; /* TK_AND, TK_EQ, TK_INTEGER, TK_COLUMN … */
u8 affExpr; /* column affinity */
u32 flags; /* EP_* flags */
union {
char *zToken; /* TK_STRING, TK_ID, TK_VARIABLE */
int iValue; /* TK_INTEGER when EP_IntValue is set */
} u;
Expr *pLeft; /* left subtree */
Expr *pRight; /* right subtree */
union {
ExprList *pList; /* function arguments / IN operator list */
Select *pSelect; /* subquery */
} x;
int iTable; /* TK_COLUMN: cursor number of source table */
int iColumn; /* TK_COLUMN: column index */
...
};
Name Resolution (resolve.c)
After the AST is built, a name-resolution pass in resolve.c walks every Expr node and converts symbolic column references (like users.id) into table-cursor numbers and column indices. This is where "column not found" errors are raised.
The key function is sqlite3ResolveExprNames() which is called from within the parser actions for WHERE, ORDER BY, GROUP BY, and HAVING clauses.
Compilation Context: the Parse struct
The Parse struct threads through the entire tokenize → parse → codegen pipeline. It carries:
struct Parse {
sqlite3 *db; /* database connection */
Vdbe *pVdbe; /* the VDBE being built */
int nErr; /* error count */
char *zErrMsg; /* error message */
Table *pNewTable; /* table being CREATE'd */
int nTab; /* next available cursor number */
int nMem; /* next available register number */
int nLabel; /* next available label number */
...
};
When the parser fires a grammar action and calls, say, sqlite3Select(pParse, ...), the code generator uses pParse->pVdbe to emit opcodes and pParse->nMem to allocate registers.
Next Stage
Grammar actions directly invoke the code generator. sqlite3Select(), sqlite3Insert(), etc. walk the AST structures and emit VDBE opcodes.