Context-Sensitive Languages = LBA Languages (`CS = LBA`) #

This file states the Myhill–Landweber–Kuroda theorem in its set-level form,

CS = LBA (CS_eq_LBA),

for the canonical endmarker LBA (is_LBA/LBA, Automata/LinearBounded/Definition.lean).

It is assembled in two layers:

The genuinely space-bounded core, is_LBA_pos_iff : is_LBA_pos L ↔ is_CS L ∧ [] ∉ L: the marker-free |w|-cell model (Automata/LinearBounded/Positive.lean) recognizes exactly the ε-free context-sensitive languages. This is myhill_language_eq (Myhill's bisimulation, →, Equivalence/LBAToCSG.lean) together with the Kuroda construction restricted to the non-contracting core (←, Equivalence/CSGToLBA.lean).
The empty word, carried by the endmarker model on the genuine two-cell tape ⊢⊣ — so the machine itself decides ε, no external flag. CS_subset_LBA runs the Kuroda LBA on ⊢ w ⊣ via simMachine (Equivalence/EndmarkerTape.lean); LBA_subset_CS folds an endmarker LBA onto |w| cells via flagMachine (Equivalence/EndmarkerToFlag.lean) and adjoins ε when accepted.

This makes the two classes equal (the endmarker's ⊢⊣ cell is exactly what upgrades "ε-free CSL" to all of CS), rather than merely "equal up to ε".

References #

Kuroda, S.-Y. (1964), "Classes of languages and linear-bounded automata".
Myhill, J. (1960), "Linear bounded automata".
Hopcroft, Motwani, Ullman, Introduction to Automata Theory, Languages, and Computation, Chapter 9.

LBA ⊆ CS (Myhill) #

source

theorem myhill_language_eq {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] [DecidableEq T] [DecidableEq Γ] [DecidableEq Λ] (M : LBA.Machine Γ Λ) (embed : T ↪ Γ) :

CS_language (MyhillConstruction.myhillGrammar M embed) = LBA.LanguageViaEmbed M ⇑embed

Myhill's construction (core). The context-sensitive grammar built from an LBA in LBAToCSG.lean generates exactly the non-empty language recognised by the automaton.

This is the genuine content of the LBA → CS direction: the bisimulation between cell-sentential-forms and LBA configurations (the grammar and its rule membership lemmas are established in LBAToCSG.lean):

completeness (⊇): an accepting LBA run on w is mirrored by a derivation start ⇒* map terminal w — the start rules lay down one cell per input symbol, the simulation rules track each LBA step (interior moves use the 3-step cellPending protocol), and the cleanup rules recover the terminals once an accepting state appears;
soundness (⊆): conversely every terminal string derivable from start arises this way, by an invariant stating that every reachable sentential form of cells decodes to a configuration reachable by M from the initial configuration on w.

`LBA_pos` ⊆ ε-free CS (Myhill) #

source

theorem is_LBA_pos_imp_isCS {T : Type} [Fintype T] [DecidableEq T] {L : Language T} (h : is_LBA_pos L) :

is_CS L

Every bounded-tape LBA language is context-sensitive (Myhill's direction).

source

theorem not_nil_mem_of_is_LBA_pos {T : Type} [Fintype T] [DecidableEq T] {L : Language T} (h : is_LBA_pos L) :

[] ∉ L

A bounded-tape LBA never runs on the empty input, so its language is ε-free.

ε-free CS ⊆ `LBA_pos` (Kuroda) #

Given a context-sensitive L with [] ∉ L, restrict to the non-contracting core (Grammars/NonContracting) with finitely many nonterminals (Grammars/ContextSensitive/Basic/FiniteNT.lean), whose language is L \ {[]} = L. On input w of length n the Kuroda LBA carries the frozen input and a working sentential form over the finite grammar-symbol alphabet, nondeterministically runs the grammar backwards, and accepts once the form reduces to [S]. Non-contraction bounds every form of a derivation of w by n, so the |w|-cell tape suffices.

source

theorem is_LBA_pos_of_isCS_not_nil {T : Type} [Fintype T] [DecidableEq T] {L : Language T} (hCS : is_CS L) (hne : [] ∉ L) :

is_LBA_pos L

The characterization #

source

theorem is_LBA_pos_iff {T : Type} [Fintype T] [DecidableEq T] {L : Language T} :

is_LBA_pos L ↔ is_CS L ∧ [] ∉ L

The bounded-tape LBA languages are exactly the ε-free context-sensitive languages.

CS = LBA (the canonical endmarker model) #

Promoting the ε-free core to the full theorem: the empty word is carried by the endmarker model, which runs on the genuine two-cell tape ⊢⊣. The two simulations live in Equivalence/EndmarkerTape.lean (simMachine/language_simMachine_eq) and Equivalence/EndmarkerToFlag.lean (flagMachine/language_flagMachine_eq).

source

theorem CS_subset_LBA {T : Type} [Fintype T] [DecidableEq T] :

CS ⊆ LBA

CS ⊆ LBA. Every context-sensitive language is recognized by an endmarker LBA: the Kuroda LBA for the ε-free core is run on ⊢ w ⊣ via simMachine, and ε is decided on ⊢⊣ by the bit decide (ε ∈ L).

source

theorem LBA_subset_CS {T : Type} [Fintype T] [DecidableEq T] :

LBA ⊆ CS

LBA ⊆ CS. Every endmarker-LBA language is context-sensitive: its non-empty part is the bounded-tape language LanguageViaEmbed (flagMachine M') (context-sensitive via Myhill), and ε (if accepted on ⊢⊣) is adjoined by is_CS_insert_empty_of_CS_grammar.

The theorem #

source

theorem CS_eq_LBA {T : Type} [Fintype T] [DecidableEq T] :

CS = LBA

Context-sensitive languages are exactly the LBA languages: CS = LBA. The headline form of the Myhill–Landweber–Kuroda theorem, for the canonical endmarker LBA model (is_LBA/LBA, Automata/LinearBounded/Definition.lean).

Langlib

Langlib.Automata.LinearBounded.Equivalence.ContextSensitive

Context-Sensitive Languages = LBA Languages (`CS = LBA`) #

References #

LBA ⊆ CS (Myhill) #

`LBA_pos` ⊆ ε-free CS (Myhill) #

ε-free CS ⊆ `LBA_pos` (Kuroda) #

The characterization #

CS = LBA (the canonical endmarker model) #

The theorem #

Context-Sensitive Languages = LBA Languages (CS = LBA) #

References #

LBA ⊆ CS (Myhill) #

LBA_pos ⊆ ε-free CS (Myhill) #

ε-free CS ⊆ LBA_pos (Kuroda) #

The characterization #

CS = LBA (the canonical endmarker model) #

The theorem #

Context-Sensitive Languages = LBA Languages (`CS = LBA`) #

`LBA_pos` ⊆ ε-free CS (Myhill) #

ε-free CS ⊆ `LBA_pos` (Kuroda) #