Context-Sensitive Languages = LBA Languages (CS = LBA) #
This file states the Myhill–Landweber–Kuroda theorem in its set-level form,
for the canonical endmarker LBA (is_LBA/LBA, Automata/LinearBounded/Definition.lean).
It is assembled in two layers:
- The genuinely space-bounded core,
is_LBA_pos_iff : is_LBA_pos L ↔ is_CS L ∧ [] ∉ L: the marker-free|w|-cell model (Automata/LinearBounded/Positive.lean) recognizes exactly the ε-free context-sensitive languages. This ismyhill_language_eq(Myhill's bisimulation,→,Equivalence/LBAToCSG.lean) together with the Kuroda construction restricted to the non-contracting core (←,Equivalence/CSGToLBA.lean). - The empty word, carried by the endmarker model on the genuine two-cell tape
⊢⊣— so the machine itself decidesε, no external flag.CS_subset_LBAruns the Kuroda LBA on⊢ w ⊣viasimMachine(Equivalence/EndmarkerTape.lean);LBA_subset_CSfolds an endmarker LBA onto|w|cells viaflagMachine(Equivalence/EndmarkerToFlag.lean) and adjoinsεwhen accepted.
This makes the two classes equal (the endmarker's ⊢⊣ cell is exactly what upgrades "ε-free CSL"
to all of CS), rather than merely "equal up to ε".
References #
- Kuroda, S.-Y. (1964), "Classes of languages and linear-bounded automata".
- Myhill, J. (1960), "Linear bounded automata".
- Hopcroft, Motwani, Ullman, Introduction to Automata Theory, Languages, and Computation, Chapter 9.
LBA ⊆ CS (Myhill) #
Myhill's construction (core). The context-sensitive grammar built from an LBA in
LBAToCSG.lean generates exactly the non-empty language recognised by the automaton.
This is the genuine content of the LBA → CS direction: the bisimulation between
cell-sentential-forms and LBA configurations (the grammar and its rule membership lemmas are
established in LBAToCSG.lean):
- completeness (
⊇): an accepting LBA run onwis mirrored by a derivationstart ⇒* map terminal w— the start rules lay down onecellper input symbol, the simulation rules track each LBA step (interior moves use the 3-stepcellPendingprotocol), and the cleanup rules recover the terminals once an accepting state appears; - soundness (
⊆): conversely every terminal string derivable fromstartarises this way, by an invariant stating that every reachable sentential form ofcells decodes to a configuration reachable byMfrom the initial configuration onw.
Every bounded-tape LBA language is context-sensitive (Myhill's direction).
A bounded-tape LBA never runs on the empty input, so its language is ε-free.
ε-free CS ⊆ LBA_pos (Kuroda) #
Given a context-sensitive L with [] ∉ L, restrict to the non-contracting core
(Grammars/NonContracting) with finitely many nonterminals
(Grammars/ContextSensitive/Basic/FiniteNT.lean), whose language is L \ {[]} = L. On input w
of length n the Kuroda LBA carries the frozen input and a working sentential form over the finite
grammar-symbol alphabet, nondeterministically runs the grammar backwards, and accepts once the form
reduces to [S]. Non-contraction bounds every form of a derivation of w by n, so the |w|-cell
tape suffices.
The characterization #
The bounded-tape LBA languages are exactly the ε-free context-sensitive languages.
CS = LBA (the canonical endmarker model) #
Promoting the ε-free core to the full theorem: the empty word is carried by the endmarker model,
which runs on the genuine two-cell tape ⊢⊣. The two simulations live in
Equivalence/EndmarkerTape.lean (simMachine/language_simMachine_eq) and
Equivalence/EndmarkerToFlag.lean (flagMachine/language_flagMachine_eq).
CS ⊆ LBA. Every context-sensitive language is recognized by an endmarker LBA: the Kuroda
LBA for the ε-free core is run on ⊢ w ⊣ via simMachine, and ε is decided on ⊢⊣ by the bit
decide (ε ∈ L).
LBA ⊆ CS. Every endmarker-LBA language is context-sensitive: its non-empty part is the
bounded-tape language LanguageViaEmbed (flagMachine M') (context-sensitive via Myhill), and ε
(if accepted on ⊢⊣) is adjoined by is_CS_insert_empty_of_CS_grammar.