LBA → CS: soundness of the Myhill grammar #
This file develops the soundness half of myhill_language_eq
(Automata/LinearBounded/Equivalence/ContextSensitive.lean): every terminal word derivable
from the Myhill grammar's start symbol is accepted by the automaton,
CS_derives g [start] (map terminal w) → LanguageViaEmbed M embed w.
Proof strategy #
The argument is an induction on the derivation establishing a structural invariant on
sentential forms. Writing g = myhillGrammar M embed, every form s with
CS_derives g [start] s falls into one of the phases of the construction, and each phase
carries the semantic guarantee that ties it back to an actual computation of M:
- start phase —
sis a (possibly partial) row ofcells, optionally terminated by thestartAuxnonterminal. The cells laid down so far fix a prefix of a wordworig, the first carries(some M.initial), and completing the row yieldsencode worig (initCfg worig). - simulation phase —
s = encode worig cfgfor some configurationcfgwithReaches M (initCfg worig) cfg; exactly one cell carries the head state. - pending phase —
sis a simulation row in the middle of the 3-step interior-move protocol: it contains exactly onecellPending, and resolving the protocol lands on a rowencode worig cfg'withReaches M (initCfg worig) cfg'. - cleanup phase — an accepting state has appeared (so
Maccepts frominitCfg worig);sis a mixture of leading/trailingterminals and statelesscell nones, all carrying the frozenworig.
The frozen terminal component worig of every cell/cellPending is invariant under every
rule (start lays it down; simulation, pending and accept rules all preserve it; cleanup emits
exactly it), so an all-terminal form must be worig.map terminal, i.e. w = worig. Reaching
an all-terminal form requires the accept rule to have fired, which is gated on
M.accept q = true for a reachable state q; that is precisely Accepts M (initCfg worig),
hence LanguageViaEmbed M embed w.
This is the genuine content of the soundness direction of the LBA ⊆ CS inclusion.
Rule inversion. Every rule of the Myhill grammar is one of the seventeen shapes laid
down in myhillAllRules. This is the backbone of the soundness closure step: given a single
transformation s ⇒ s', knowing the rule's exact form pins down how s' relates to s.
The disjuncts, in order, mirror the rule families of myhillAllRules:
start (single / first / middle / last), simulation (stay / right-boundary / left-boundary),
right-interior (step1-boundary / step1-interior / step2), left-interior
(step1-boundary / step1-interior / step2), pending resolution, accept, and propagation
(left / right). The interior step-1 families are split into a boundary variant (head cell at
the very end/start, no blocking neighbour) and an interior variant (a stateless neighbour is
required as context, blocking cellPending stacking).
Config-conversion bridge. The list-loaded initial configuration initCfgList equals
the canonical positional initial configuration ⟨M.initial, embed ∘ worig, head 0⟩, when the
frozen word worig reads off the (length-derived) input positions. This is the plumbing
myhill_complete used inline (its hcfg); factoring it out lets the soundness extraction step
turn an abstract Accepts (canonical …) into the list-level LanguageViaEmbed. The
length-derived domain of worig makes both sides share the tape length (w.map embed).length-1
definitionally, so this is a homogeneous equality (no transport).
The soundness invariant #
We prove soundness by induction on the derivation [start] ⇒* s, maintaining a structural
invariant SoundInv s that classifies every reachable sentential form into one of the
construction's phases. With the right-to-left start grammar there are no head-states until the
row is complete, so the start phase is purely structural (startAux followed by stateless
cells); the simulation phase is exactly an encode of a reachable configuration; the cleanup
phase directly carries the target LanguageViaEmbed fact (keyed on the frozen word), so the
final extraction needs no configuration cast.
The frozen terminal carried by a symbol: the original input letter recorded in every cell and recovered verbatim by the cleanup rules.
Equations
- MyhillConstruction.frozenSym (symbol.terminal t) = some t
- MyhillConstruction.frozenSym (symbol.nonterminal (MyhillConstruction.MyhillNT.cell lb rb q a t)) = some t
- MyhillConstruction.frozenSym (symbol.nonterminal (MyhillConstruction.MyhillNT.cellPending lb rb dir q' a t)) = some t
- MyhillConstruction.frozenSym x✝ = none
Instances For
The frozen word read off a sentential form. On a fully-cleaned form w.map terminal it is
exactly w.
Instances For
Start phase: startAux followed by the stateless none-cells laid so far (auxCells of
some nonempty input). No cell carries a head state yet.
Equations
- MyhillConstruction.SP_start embed s = ∃ (tl : List T), tl ≠ [] ∧ s = symbol.nonterminal MyhillConstruction.MyhillNT.startAux :: MyhillConstruction.auxCells embed tl
Instances For
Simulation phase: s encodes a configuration reachable from the initial one.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Dead phase: a complete row of stateless none-cells — no head, no cellPending, no
terminal. The form left when an interior-move cellPending is resolved before step2 hands off
its state. soundInv_step_stuck shows no rule fires on it, and it is never all-terminal.
Equations
- MyhillConstruction.SP_stuck s = ∀ x ∈ s, ∃ (lb : Bool) (rb : Bool) (a : Γ) (t : T), x = MyhillConstruction.cellSym lb rb none a t
Instances For
Cleanup phase: an accepting state has appeared (so M accepts the frozen word worig),
and every position is either the recovered terminal worig i or a still-stateless cell
carrying worig i. The remaining rules (propagation) only convert cells to their terminals.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Pending phase: mid an interior-move protocol, three sub-cases each with one cellPending at
the old head index. P1 (s = (encode cfg).set head …, no head state); GEN
(s = (encode cur).set k …, head pinned at the neighbour k±1); CLEANUP (a cleanup row
with one position replaced by a pending whose hand-off neighbour is already a terminal).
Equations
- One or more equations did not get rendered due to their size.
Instances For
The soundness invariant: every form reachable from [start] is in one of the phases.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The invariant holds at the start.
A symbol from w.map terminal is a terminal, never a nonterminal.
If M accepts the canonical initial configuration loaded with worig, the word
List.ofFn worig is in the embedded language. Bridges the abstract Accepts recovered when
the accept rule fires (simulation → cleanup) back to LanguageViaEmbed.
A non-contracting step does not shorten the form (outputs are non-empty).
A derivation does not shorten the form; reachable forms are non-empty.
Extraction. A fully-terminal form satisfying the invariant must be in the cleanup phase, which directly yields the language membership.
Setting at index u.length + 1 of u ++ p :: x :: v replaces the second cons head x,
keeping the first (p). Used to express a step2 rewrite (the pending stays, its neighbour
becomes the new head cell) at the absolute index head + 1.
Every symbol of an encoded configuration is a cell nonterminal.
In an encode of a configuration, the only cell carrying a head state (some q) sits at
the head position. Hence any decomposition exposing such a cell pins its index to the head.
From a head-cell decomposition of encode, recover every parameter of the cell.
Closure, start phase. From a start-phase form, only the middle-cell and head-cell rules
apply: the former stays in the start phase, the latter completes the row to the (reachable)
initial configuration, entering the simulation phase. All other rules are vacuous because their
input/context symbols (a start symbol, a head-state cell, a cellPending, or a terminal) do
not occur in a start-phase form.
Closure, simulation phase. A rule applied to encode worig cfg must act on the unique
head-state cell (all others are stateless none cells, and no start/startAux/cellPending/
terminal symbols occur). The head-stationary moves give the next encode (simulation), the
interior step-1 moves introduce a cellPending (pending phase), and the accept rule turns the
head cell into a terminal — at which point M accepts, entering the cleanup phase.
Closure, cleanup phase. From a cleanup row only the propagation rules apply (all
symbols are terminals or stateless cells, so no start/startAux/cellPending/head-state
cell occurs). Each propagation turns a stateless cell into its frozen terminal, keeping the
cleanup row (and the accepted word) intact.
No rule fires on a dead (all-stateless) row. If every symbol of b is a stateless
cell … none …, then no grammar rule applies: every rule needs a head-state cell, a
cellPending, a terminal, or a start/startAux symbol as its input or context, none of which
occur. This is the closure of the "stuck" phase reached when an interior-move cellPending is
resolved before step2 hands off its state. A building block for the SP_pending closure.
Closure, pending phase. From a SP_pending form (P1 / GEN / CLEANUP), one rule step keeps
the soundness invariant. In P1 only resolution (→ SP_stuck) and the matching step2
(→ GEN) fire; in GEN the pinned head can stay/clamp (→ GEN), accept (→ CLEANUP), or have the
pending resolved (→ SP_sim); in CLEANUP only propagation (→ CLEANUP) and resolution
(→ SP_cleanup) fire. The producers (soundInv_step_sim's step1 arms) build P1; the rest of
the protocol is closed here.
Soundness. Every terminal word derived by the Myhill grammar is accepted by the LBA.
The converse of myhill_complete; together they give myhill_language_eq. The proof is an
induction on the derivation establishing SoundInv, then soundInv_extract.