The Chomsky hierarchy is strict
Statement
Each language class in the Chomsky hierarchy strictly contains the one below it. Langlib already formalizes the following strict inclusions:
- Regular ⊊ Deterministic context-free ⊊ Context-free ⊊ Context Sensitive ⊊ Recursive ⊊ Recursively enumerable
- Context-free ⊊ Indexed
- Regular ⊊ Linear ⊊ CFL
In Lean
- Regular ⊊ DCFL:
RG_strict_subclass_DCF. - Regular ⊊ Linear:
RG_strict_subclass_Linear; Linear ⊊ CF:Linear_strict_subclass_CF, separated by{0ⁿ1ⁿ2ᵐ3ᵐ}via the linear pumping lemma. - DCFL ⊊ CFL:
is_CF_of_is_DCF/DCF_subclass_CF(strictness viaDPDA_strict_subclass_PDA). - CFL ⊊ Indexed:
CF_strict_subclass_IndexedandCF_subclass_Indexed_and_exists_strict. - CF ⊆ CS:
CF_subclass_CS. - CS ⊊ Recursive:
CS_strict_subclass_Recursive— by diagonalization; see the dedicated page. - Recursive ⊊ RE: see the dedicated page.
Proof idea
Each strict inclusion combines an inclusion (every grammar/automaton of the lower class is one of the upper class) with strictness witnessed in one of two ways — a separating language in the upper class but provably not the lower, or a closure mismatch where the two classes differ on a closure operation.
- Regular ⊊ DCFL (
RG_strict_subclass_DCF) and Regular ⊊ Linear (RG_strict_subclass_Linear): the separating language is{aⁿbⁿ}(anbn), which is deterministic context-free (anbn_is_DCF) and linear but not regular (anbn_not_isRegular, via the regular pumping lemma); transported to a nontrivial alphabet by an injective letter map. - DCFL ⊊ CFL (
DCF_strict_subclass_CF): a closure mismatch, not a witness language. OverFin 3the DCF languages are closed under complement (DCF_closedUnderComplement) but the CF languages are not (CF_notClosedUnderComplement);strict_subset_of_subset_different_propertyturns this differing closure property into proper containment.DPDA_strict_subclass_PDAtransfers this to the automaton classes. - Linear ⊊ CFL (
Linear_strict_subclass_CF): the separating language is{0ⁿ1ⁿ2ᵐ3ᵐ}overFin 4(L4), context-free (L4_is_CF, a concatenation of two{aⁿbⁿ}blocks) but not linear (L4_not_is_Linear, via the linear pumping lemma). - CFL ⊊ Indexed (
CF_strict_subclass_Indexed): the separating language is{aⁿbⁿcⁿ}, indexed (an indexed grammar with a stack-bottom marker forcing each nonterminal to consume exactly as many flags as were pushed) but not context-free. - CS ⊊ Recursive (
CS_strict_subclass_Recursive): by diagonalization over an effective enumeration of context-sensitive grammars; see the dedicated page. - Recursive ⊊ RE: a closure mismatch — recursive languages are closed under complement, RE languages are not; see the dedicated page.
Keywords / also known as
Chomsky hierarchy strict, regular proper subset context-free, DCFL proper subset CFL, context-free proper subset indexed, recursive proper subset recursively enumerable, language class separations, proper containment Chomsky hierarchy.
Formalized in Lean 4 with Mathlib, in Langlib.