Context-free languages are closed under substitution

Statement

A substitution replaces each terminal symbol a by an entire language f a: a word a₁…aₙ is mapped to the set of all concatenations w₁…wₙ with wᵢ ∈ f aᵢ, and this is extended to a language L.subst f. The theorem is that if L is context-free and every f a is context-free, then L.subst f is context-free.

Substitution is the master closure operation for context-free languages: union, concatenation, homomorphism, and Kleene star all fall out of it as one-line corollaries (below), which is why Langlib proves it as a standalone development.

In Lean

The core result:

Corollaries (each reduces to substitution)

(Direct, substitution-free constructions for union and concatenation are also given as “bonus” grammars, bonus_CF_of_CF_u_CF and bonus_CF_of_CF_c_CF.)

Proof idea

Given a context-free grammar g for L and grammars F a for each f a, the construction ContextFreeGrammar.subst g f builds a single grammar whose nonterminals are the disjoint union g.NT ⊕ (Σ a, (F a).NT). Every g-rule is kept but each terminal a on a right-hand side is redirected to the start symbol of F a; all rules of every F a are lifted in (tagged with a). A derivation in the combined grammar first runs the g-phase down to a string of F-start-symbols and then runs each F-phase independently — the two phases commute because the F-rules never touch g-nonterminals (derives_commute_of_not_mem_output). Matching these two phases against the definition of L.subst f gives the language equality.

Keywords / also known as

context-free closed under substitution, CFL substitution theorem, context-free closure properties, union concatenation homomorphism Kleene star context-free, substitution master closure operation, Lean formal proof.

Formalized in Lean 4 with Mathlib, in Langlib.