Langlib: formal-language theory in Lean 4
Langlib is a Lean 4 + Mathlib library of fully machine-checked results about grammars, automata, and language classes across the Chomsky hierarchy — regular, deterministic context-free, context-free, indexed, context-sensitive, recursive (decidable) and recursively enumerable languages. It formalizes the standard hierarchy theorems (PDA = CFG, CSG = LBA, TM = RE), closure properties (including the hard result that deterministic context-free languages are closed under complement), and decidability/computability results (including every context-sensitive language is recursive and Post’s theorem). Source and build instructions are on GitHub.
This page is a catalog of the library’s results.
Main results
- Deterministic context-free languages are closed under complement
- DPDA totalization: every DPDA has an equivalent always-halting deciding DPDA
- Every context-sensitive language is recursive — and strictly so (CS ⊊ Recursive)
- Membership in context-sensitive languages is computable
- Post’s theorem (RE ∩ co-RE = recursive)
- Tape vs. state acceptance for recursive languages
- PDA = CFG: pushdown automata recognize exactly the context-free languages
- TM = RE: Turing machines recognize exactly the recursively enumerable languages
- CSL = LBA: context-sensitive languages are exactly the linear-bounded-automaton languages
- The pumping lemma for context-free languages
- Recursive ⊊ RE: recursive is a strict subset of recursively enumerable
Grammar ⇔ automaton equivalences
- Regular grammars = DFA languages — and left-regular ⇔ right-regular grammars.
- PDA = CFG — pushdown automata recognize exactly the context-free languages; final-state vs. empty-stack acceptance agree.
- CSL = LBA — context-sensitive grammars and linear bounded automata recognize exactly the same languages (the Myhill–Kuroda theorem); plus NLBA languages ⊆ Turing-machine languages.
- TM = RE — Turing machines recognize exactly the recursively enumerable (unrestricted-grammar) languages.
- Langlib’s [
is_CF](https://nielstron.github.io/langlib/api/Langlib/Classes/ContextFree/Definition.html#is_CF) = Mathlib’sIsContextFree— plus a verified Chomsky normal form.
Hierarchy: strict inclusions
- The Chomsky hierarchy is strict: Regular ⊊ DCFL ⊊ CFL ⊊ Indexed, Regular ⊊ Linear ⊊ CFL, CF ⊆ CS ⊊ Recursive ⊊ RE.
- CS ⊊ Recursive: context-sensitive languages are a strict subset of the recursive languages — by diagonalization (no closure shortcut exists).
- {aⁿbⁿcⁿ} is indexed and context-sensitive but not context-free — the classic separating example.
Regular languages
- Regular grammars = DFA languages.
- Right-regular and left-regular grammars — both generate exactly the regular languages.
- Regular language membership is computable.
- Top and Bottom are regular:
isRegular_topandisRegular_bot. - Closure: union, intersection, complement, concatenation, Kleene star, homomorphism, substitution, inverse homomorphism, reverse, quotient — all Yes (see the closure table).
Deterministic context-free languages (DCFL)
- DCFL is closed under complement — the marquee result.
- DPDA totalization — the always-halting deciding-DPDA construction that powers it.
- DCFL ⊊ CFL:
DCF_subclass_CF. - Closure properties (full picture) — closed under complement (and ∩/∪ with a regular language), but not under union, intersection, concatenation, star, homomorphism, substitution, or quotient.
Context-free languages (CFL)
- PDA = CFG.
- Pumping lemma and Ogden’s lemma.
- Chomsky normal form & Mathlib compatibility.
- Closed under substitution — with union, concatenation, homomorphism and Kleene star derived as corollaries.
- Not closed under intersection — via the pumping lemma; non-closure under complement follows as a corollary.
- Not closed under right quotient — a concrete powers-of-two construction (but closed under right quotient with a regular language).
- Membership (CYK) and emptiness are decidable.
- Closure: union, concatenation, star, homomorphism, substitution, inverse homomorphism, reverse, intersection-with-regular, quotient-with-regular — Yes; complement, intersection, quotient — No. Also terminal bijections, permutations, prefix and suffix (see the closure table).
Linear languages
- Linear grammars and the linear languages — regular ⊊ linear ⊆ context-free;
{aⁿbⁿ}is linear. - Regular ⊊ Linear ⊊ CFL: see strict inclusions.
Indexed languages
{aⁿbⁿcⁿ}is indexed:is_Indexed_lang_eq_eq(see the separating-example page).- CFL ⊊ Indexed:
CF_strict_subclass_Indexed. - Closure: union, concatenation, homomorphism, inverse homomorphism, reverse — Yes (see the closure table).
Context-sensitive languages (CSL)
- Every context-sensitive language is recursive, strictly (CS ⊊ Recursive) — recursive via bounded derivations, and strictly so by diagonalization.
- Membership in context-sensitive languages is computable.
- Non-contracting and non-erasing context-sensitive grammars — Langlib defines CS as non-contracting; the equivalence with the non-erasing form is only partially formalized.
- CSL = LBA: context-sensitive grammars and linear bounded automata recognize the same languages (the Myhill–Kuroda theorem).
- CF ⊆ CS:
CF_subclass_CS. - Closure: ε-free homomorphism —
CS_closedUnderEpsFreeHomomorphism; reverse —CS_closedUnderReverse; terminal bijections —CS_bijemap_iff_CS.
Recursive (decidable) languages
- Post’s theorem (RE ∩ co-RE = recursive).
- Recursive languages are closed under complement.
- Tape vs. state acceptance for recursive languages.
- Recursive ⊊ RE.
- Closure: reverse —
Recursive_closedUnderReverse.
Recursively enumerable languages (RE)
- TM = RE.
- Compiling search procedures to Turing machines — the reusable semi-decision → TM bridge behind TM = RE and the RE closures.
- Closure properties (grammar & TM constructions) — closed under union, intersection, concatenation, star, homomorphism, inverse homomorphism, reverse, quotient and substitution; not complement.
{aⁿbⁿcⁿ}is RE:lang_eq_eq_is_RE.- Membership, emptiness, universality and equivalence are all undecidable:
RE_membership_undecidable,RE_emptiness_undecidable,RE_universality_undecidable,RE_equivalence_undecidable.
Closure properties
A machine-checked closure table across all seven classes is in the project
README. Abstract closure predicates
(ClosedUnderUnion, ClosedUnderHomomorphism, …) are defined as standalone
predicates. Highlights:
- DCFL closure profile — closed under complement, but not union/intersection/concatenation/star/homomorphism.
- CFL closed under substitution (and its corollaries); not closed under intersection or right quotient (nor complement).
- Recursive closed under complement; RE closed under all standard operations except complement.
Decidability and computability
Membership, emptiness, universality and equivalence are stated as uniform
class-level predicates (ComputableMembership, ComputableEmptiness, …): each bundles
that the encoding is adequate (Characterizes C — its range is exactly the class),
effective (MembershipSemiDecidable), and that the relevant decision is computable.
This makes both the positive and negative results genuine statements about the class,
not artefacts of the encoding.
- Regular membership is computable.
- Context-free membership (CYK) and emptiness are decidable.
- Context-sensitive membership is computable (and every CSL is recursive — strictly).
- Post’s theorem underpins the recursive/RE decidability results.
Formalized in Lean 4 with Mathlib, in Langlib.