Langlib

Langlib.Automata.LinearBounded.Equivalence.LBAToCSG.Soundness

LBA → CS: soundness of the Myhill grammar #

This file develops the soundness half of myhill_language_eq (Automata/LinearBounded/Equivalence/ContextSensitive.lean): every terminal word derivable from the Myhill grammar's start symbol is accepted by the automaton,

CS_derives g [start] (map terminal w) → LanguageViaEmbed M embed w.

Proof strategy #

The argument is an induction on the derivation establishing a structural invariant on sentential forms. Writing g = myhillGrammar M embed, every form s with CS_derives g [start] s falls into one of the phases of the construction, and each phase carries the semantic guarantee that ties it back to an actual computation of M:

The frozen terminal component worig of every cell/cellPending is invariant under every rule (start lays it down; simulation, pending and accept rules all preserve it; cleanup emits exactly it), so an all-terminal form must be worig.map terminal, i.e. w = worig. Reaching an all-terminal form requires the accept rule to have fired, which is gated on M.accept q = true for a reachable state q; that is precisely Accepts M (initCfg worig), hence LanguageViaEmbed M embed w.

This is the genuine content of the soundness direction of the LBA ⊆ CS inclusion.

theorem MyhillConstruction.myhill_rule_inv {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) (r : csrule T (MyhillNT T Γ Λ)) (hr : r myhillAllRules M embed) :
(∃ (t : T), r = { context_left := [], input_nonterminal := MyhillNT.start, context_right := [], output_string := [cellSym true true (some M.initial) (embed t) t] }) (∃ (t : T), r = { context_left := [], input_nonterminal := MyhillNT.start, context_right := [], output_string := [symbol.nonterminal MyhillNT.startAux, cellSym false true none (embed t) t] }) (∃ (t : T), r = { context_left := [], input_nonterminal := MyhillNT.startAux, context_right := [], output_string := [symbol.nonterminal MyhillNT.startAux, cellSym false false none (embed t) t] }) (∃ (t : T), r = { context_left := [], input_nonterminal := MyhillNT.startAux, context_right := [], output_string := [cellSym true false (some M.initial) (embed t) t] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t : T) (lb : Bool) (rb : Bool), (q', a', DLBA.Dir.stay) M.transition q a r = { context_left := [], input_nonterminal := MyhillNT.cell lb rb (some q) a t, context_right := [], output_string := [cellSym lb rb (some q') a' t] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t : T) (lb : Bool), (q', a', DLBA.Dir.right) M.transition q a r = { context_left := [], input_nonterminal := MyhillNT.cell lb true (some q) a t, context_right := [], output_string := [cellSym lb true (some q') a' t] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t : T) (rb : Bool), (q', a', DLBA.Dir.left) M.transition q a r = { context_left := [], input_nonterminal := MyhillNT.cell true rb (some q) a t, context_right := [], output_string := [cellSym true rb (some q') a' t] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t₁ : T) (t₂ : T) (rb₂ : Bool) (hi : Option Λ) (b : Γ), (q', a', DLBA.Dir.right) M.transition q a r = { context_left := [], input_nonterminal := MyhillNT.cell true false (some q) a t₁, context_right := [cellSym false rb₂ hi b t₂], output_string := [cellPendingSym true false true q' a' t₁] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t₁ : T) (t₂ : T) (rb₂ : Bool) (hi : Option Λ) (b : Γ) (lb₀ : Bool) (a₀ : Γ) (t₀ : T), (q', a', DLBA.Dir.right) M.transition q a r = { context_left := [cellSym lb₀ false none a₀ t₀], input_nonterminal := MyhillNT.cell false false (some q) a t₁, context_right := [cellSym false rb₂ hi b t₂], output_string := [cellPendingSym false false true q' a' t₁] }) (∃ (q' : Λ) (a' : Γ) (t₁ : T) (t₂ : T) (lb₁ : Bool) (rb₂ : Bool) (b : Γ), r = { context_left := [cellPendingSym lb₁ false true q' a' t₁], input_nonterminal := MyhillNT.cell false rb₂ none b t₂, context_right := [], output_string := [cellSym false rb₂ (some q') b t₂] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t₁ : T) (t₂ : T) (lb₁ : Bool) (hi : Option Λ) (b : Γ), (q', a', DLBA.Dir.left) M.transition q a r = { context_left := [cellSym lb₁ false hi b t₁], input_nonterminal := MyhillNT.cell false true (some q) a t₂, context_right := [], output_string := [cellPendingSym false true false q' a' t₂] }) (∃ (q : Λ) (q' : Λ) (a : Γ) (a' : Γ) (t₁ : T) (t₂ : T) (lb₁ : Bool) (hi : Option Λ) (b : Γ) (rb₀ : Bool) (a₀ : Γ) (t₀ : T), (q', a', DLBA.Dir.left) M.transition q a r = { context_left := [cellSym lb₁ false hi b t₁], input_nonterminal := MyhillNT.cell false false (some q) a t₂, context_right := [cellSym false rb₀ none a₀ t₀], output_string := [cellPendingSym false false false q' a' t₂] }) (∃ (q' : Λ) (a' : Γ) (t₁ : T) (t₂ : T) (lb₁ : Bool) (rb₂ : Bool) (b : Γ), r = { context_left := [], input_nonterminal := MyhillNT.cell lb₁ false none b t₁, context_right := [cellPendingSym false rb₂ false q' a' t₂], output_string := [cellSym lb₁ false (some q') b t₁] }) (∃ (q' : Λ) (a' : Γ) (t : T) (lb : Bool) (rb : Bool) (dir : Bool), r = { context_left := [], input_nonterminal := MyhillNT.cellPending lb rb dir q' a' t, context_right := [], output_string := [cellSym lb rb none a' t] }) (∃ (q : Λ) (a : Γ) (t : T) (lb : Bool) (rb : Bool), M.accept q = true r = { context_left := [], input_nonterminal := MyhillNT.cell lb rb (some q) a t, context_right := [], output_string := [symbol.terminal t] }) (∃ (t₁ : T) (a : Γ) (t₂ : T) (lb : Bool) (rb : Bool), r = { context_left := [symbol.terminal t₁], input_nonterminal := MyhillNT.cell lb rb none a t₂, context_right := [], output_string := [symbol.terminal t₂] }) ∃ (a : Γ) (t₁ : T) (t₂ : T) (lb : Bool) (rb : Bool), r = { context_left := [], input_nonterminal := MyhillNT.cell lb rb none a t₁, context_right := [symbol.terminal t₂], output_string := [symbol.terminal t₁] }

Rule inversion. Every rule of the Myhill grammar is one of the seventeen shapes laid down in myhillAllRules. This is the backbone of the soundness closure step: given a single transformation s ⇒ s', knowing the rule's exact form pins down how s' relates to s.

The disjuncts, in order, mirror the rule families of myhillAllRules: start (single / first / middle / last), simulation (stay / right-boundary / left-boundary), right-interior (step1-boundary / step1-interior / step2), left-interior (step1-boundary / step1-interior / step2), pending resolution, accept, and propagation (left / right). The interior step-1 families are split into a boundary variant (head cell at the very end/start, no blocking neighbour) and an interior variant (a stateless neighbour is required as context, blocking cellPending stacking).

theorem MyhillConstruction.initCfgList_eq_canonical {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (w : List T) (hw : List.map (⇑embed) w []) (hlen : (List.map (⇑embed) w).length - 1 + 1 = w.length) :
LBA.initCfgList M (List.map (⇑embed) w) hw = { state := M.initial, tape := { contents := fun (i : Fin ((List.map (⇑embed) w).length - 1 + 1)) => embed (w.get (Fin.cast hlen i)), head := 0, } }

Config-conversion bridge. The list-loaded initial configuration initCfgList equals the canonical positional initial configuration ⟨M.initial, embed ∘ worig, head 0⟩, when the frozen word worig reads off the (length-derived) input positions. This is the plumbing myhill_complete used inline (its hcfg); factoring it out lets the soundness extraction step turn an abstract Accepts (canonical …) into the list-level LanguageViaEmbed. The length-derived domain of worig makes both sides share the tape length (w.map embed).length-1 definitionally, so this is a homogeneous equality (no transport).

The soundness invariant #

We prove soundness by induction on the derivation [start] ⇒* s, maintaining a structural invariant SoundInv s that classifies every reachable sentential form into one of the construction's phases. With the right-to-left start grammar there are no head-states until the row is complete, so the start phase is purely structural (startAux followed by stateless cells); the simulation phase is exactly an encode of a reachable configuration; the cleanup phase directly carries the target LanguageViaEmbed fact (keyed on the frozen word), so the final extraction needs no configuration cast.

def MyhillConstruction.frozenSym {T Γ Λ : Type} :
symbol T (MyhillNT T Γ Λ)Option T

The frozen terminal carried by a symbol: the original input letter recorded in every cell and recovered verbatim by the cleanup rules.

Equations
Instances For
    def MyhillConstruction.frozenWord {T Γ Λ : Type} (s : List (symbol T (MyhillNT T Γ Λ))) :

    The frozen word read off a sentential form. On a fully-cleaned form w.map terminal it is exactly w.

    Equations
    Instances For
      def MyhillConstruction.SP_start {T Γ Λ : Type} (embed : TΓ) (s : List (symbol T (MyhillNT T Γ Λ))) :

      Start phase: startAux followed by the stateless none-cells laid so far (auxCells of some nonempty input). No cell carries a head state yet.

      Equations
      Instances For
        def MyhillConstruction.SP_sim {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (s : List (symbol T (MyhillNT T Γ Λ))) :

        Simulation phase: s encodes a configuration reachable from the initial one.

        Equations
        • One or more equations did not get rendered due to their size.
        Instances For
          def MyhillConstruction.SP_stuck {T Γ Λ : Type} (s : List (symbol T (MyhillNT T Γ Λ))) :

          Dead phase: a complete row of stateless none-cells — no head, no cellPending, no terminal. The form left when an interior-move cellPending is resolved before step2 hands off its state. soundInv_step_stuck shows no rule fires on it, and it is never all-terminal.

          Equations
          Instances For
            def MyhillConstruction.SP_cleanup {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (s : List (symbol T (MyhillNT T Γ Λ))) :

            Cleanup phase: an accepting state has appeared (so M accepts the frozen word worig), and every position is either the recovered terminal worig i or a still-stateless cell carrying worig i. The remaining rules (propagation) only convert cells to their terminals.

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              def MyhillConstruction.SP_pending {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (s : List (symbol T (MyhillNT T Γ Λ))) :

              Pending phase: mid an interior-move protocol, three sub-cases each with one cellPending at the old head index. P1 (s = (encode cfg).set head …, no head state); GEN (s = (encode cur).set k …, head pinned at the neighbour k±1); CLEANUP (a cleanup row with one position replaced by a pending whose hand-off neighbour is already a terminal).

              Equations
              • One or more equations did not get rendered due to their size.
              Instances For
                def MyhillConstruction.SoundInv {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (s : List (symbol T (MyhillNT T Γ Λ))) :

                The soundness invariant: every form reachable from [start] is in one of the phases.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  theorem MyhillConstruction.soundInv_base {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) :

                  The invariant holds at the start.

                  theorem MyhillConstruction.terminal_of_mem_map {T Γ Λ : Type} {x : symbol T (MyhillNT T Γ Λ)} {w : List T} (hx : x List.map symbol.terminal w) :
                  ∃ (t : T), x = symbol.terminal t

                  A symbol from w.map terminal is a terminal, never a nonterminal.

                  theorem MyhillConstruction.accepts_initCfgOf_to_lve {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) {k : } (worig : Fin (k + 1)T) (h : LBA.Accepts M { state := M.initial, tape := { contents := fun (i : Fin (k + 1)) => embed (worig i), head := 0, } }) :
                  LBA.LanguageViaEmbed M (⇑embed) (List.ofFn worig)

                  If M accepts the canonical initial configuration loaded with worig, the word List.ofFn worig is in the embedded language. Bridges the abstract Accepts recovered when the accept rule fires (simulation → cleanup) back to LanguageViaEmbed.

                  theorem MyhillConstruction.cs_transforms_length_le {T : Type} (g : CS_grammar T) {w₁ w₂ : List (symbol T g.nt)} (h : CS_transforms g w₁ w₂) :
                  w₁.length w₂.length

                  A non-contracting step does not shorten the form (outputs are non-empty).

                  theorem MyhillConstruction.cs_derives_length_le {T : Type} (g : CS_grammar T) {w₁ w₂ : List (symbol T g.nt)} (h : CS_derives g w₁ w₂) :
                  w₁.length w₂.length

                  A derivation does not shorten the form; reachable forms are non-empty.

                  theorem MyhillConstruction.soundInv_extract {T Γ Λ : Type} (M : LBA.Machine Γ Λ) (embed : T Γ) (w : List T) (hne : w []) (hinv : SoundInv M embed (List.map symbol.terminal w)) :
                  LBA.LanguageViaEmbed M (⇑embed) w

                  Extraction. A fully-terminal form satisfying the invariant must be in the cleanup phase, which directly yields the language membership.

                  theorem MyhillConstruction.mem_auxCells {T Γ Λ : Type} (embed : TΓ) {x : symbol T (MyhillNT T Γ Λ)} {tl : List T} (hx : x auxCells embed tl) :
                  ∃ (rb : Bool) (a : Γ) (t : T), x = cellSym false rb none a t

                  Every cell produced in the start phase (auxCells) is stateless with lb = false.

                  theorem MyhillConstruction.decomp_head {α : Type} {x : α} {u v rest : List α} (h : u ++ [x] ++ v = x :: rest) (hx : xrest) :
                  u = [] v = rest

                  List surgery: if x occurs uniquely as the head of x :: rest, any decomposition u ++ [x] ++ v = x :: rest must have u = [] and v = rest.

                  theorem MyhillConstruction.set_at_length {α : Type} (u : List α) (x y : α) (v : List α) :
                  (u ++ x :: v).set u.length y = u ++ y :: v

                  Setting at the join index u.length of u ++ x :: v replaces the cons head x.

                  theorem MyhillConstruction.set_at_succ_length {α : Type} (u : List α) (p x y : α) (v : List α) :
                  (u ++ p :: x :: v).set (u.length + 1) y = u ++ p :: y :: v

                  Setting at index u.length + 1 of u ++ p :: x :: v replaces the second cons head x, keeping the first (p). Used to express a step2 rewrite (the pending stays, its neighbour becomes the new head cell) at the absolute index head + 1.

                  theorem MyhillConstruction.mem_encode {T Γ Λ : Type} {n : } (worig : Fin (n + 1)T) (cfg : DLBA.Cfg Γ Λ n) {x : symbol T (MyhillNT T Γ Λ)} (hx : x encode worig cfg) :
                  ∃ (lb : Bool) (rb : Bool) (st : Option Λ) (a : Γ) (t : T), x = symbol.nonterminal (MyhillNT.cell lb rb st a t)

                  Every symbol of an encoded configuration is a cell nonterminal.

                  theorem MyhillConstruction.encode_some_pos {T Γ Λ : Type} {n : } (worig : Fin (n + 1)T) (cfg : DLBA.Cfg Γ Λ n) {u v : List (symbol T (MyhillNT T Γ Λ))} {lb rb : Bool} {q : Λ} {a : Γ} {t : T} (h : encode worig cfg = u ++ cellSym lb rb (some q) a t :: v) :
                  u.length = cfg.tape.head

                  In an encode of a configuration, the only cell carrying a head state (some q) sits at the head position. Hence any decomposition exposing such a cell pins its index to the head.

                  theorem MyhillConstruction.frozenWord_encode {T Γ Λ : Type} {n : } (worig : Fin (n + 1)T) (cfg : DLBA.Cfg Γ Λ n) :
                  frozenWord (encode worig cfg) = List.ofFn worig

                  The frozen word of an encoded configuration is the original input worig.

                  theorem MyhillConstruction.encode_head_cell {T Γ Λ : Type} {n : } (worig : Fin (n + 1)T) (cfg : DLBA.Cfg Γ Λ n) {u v : List (symbol T (MyhillNT T Γ Λ))} {lb rb : Bool} {q : Λ} {a : Γ} {t : T} (h : encode worig cfg = u ++ cellSym lb rb (some q) a t :: v) :
                  u.length = cfg.tape.head lb = decide (cfg.tape.head = 0) rb = decide (cfg.tape.head = n) q = cfg.state a = cfg.tape.contents cfg.tape.head t = worig cfg.tape.head

                  From a head-cell decomposition of encode, recover every parameter of the cell.

                  theorem MyhillConstruction.soundInv_step_start {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) {b c : List (symbol T (MyhillNT T Γ Λ))} (hb : SP_start (⇑embed) b) (hstep : CS_transforms (myhillGrammar M embed) b c) :
                  SoundInv M embed c

                  Closure, start phase. From a start-phase form, only the middle-cell and head-cell rules apply: the former stays in the start phase, the latter completes the row to the (reachable) initial configuration, entering the simulation phase. All other rules are vacuous because their input/context symbols (a start symbol, a head-state cell, a cellPending, or a terminal) do not occur in a start-phase form.

                  theorem MyhillConstruction.soundInv_step_sim {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) {b c : List (symbol T (MyhillNT T Γ Λ))} (hb : SP_sim M embed b) (hstep : CS_transforms (myhillGrammar M embed) b c) :
                  SoundInv M embed c

                  Closure, simulation phase. A rule applied to encode worig cfg must act on the unique head-state cell (all others are stateless none cells, and no start/startAux/cellPending/ terminal symbols occur). The head-stationary moves give the next encode (simulation), the interior step-1 moves introduce a cellPending (pending phase), and the accept rule turns the head cell into a terminal — at which point M accepts, entering the cleanup phase.

                  theorem MyhillConstruction.soundInv_step_cleanup {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) {b c : List (symbol T (MyhillNT T Γ Λ))} (hb : SP_cleanup M embed b) (hstep : CS_transforms (myhillGrammar M embed) b c) :
                  SoundInv M embed c

                  Closure, cleanup phase. From a cleanup row only the propagation rules apply (all symbols are terminals or stateless cells, so no start/startAux/cellPending/head-state cell occurs). Each propagation turns a stateless cell into its frozen terminal, keeping the cleanup row (and the accepted word) intact.

                  theorem MyhillConstruction.soundInv_step_stuck {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) {b c : List (symbol T (MyhillNT T Γ Λ))} (hb : xb, ∃ (lb : Bool) (rb : Bool) (a : Γ) (t : T), x = cellSym lb rb none a t) (hstep : CS_transforms (myhillGrammar M embed) b c) :

                  No rule fires on a dead (all-stateless) row. If every symbol of b is a stateless cell … none …, then no grammar rule applies: every rule needs a head-state cell, a cellPending, a terminal, or a start/startAux symbol as its input or context, none of which occur. This is the closure of the "stuck" phase reached when an interior-move cellPending is resolved before step2 hands off its state. A building block for the SP_pending closure.

                  theorem MyhillConstruction.soundInv_step_pending {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) {b c : List (symbol T (MyhillNT T Γ Λ))} (hb : SP_pending M embed b) (hstep : CS_transforms (myhillGrammar M embed) b c) :
                  SoundInv M embed c

                  Closure, pending phase. From a SP_pending form (P1 / GEN / CLEANUP), one rule step keeps the soundness invariant. In P1 only resolution (→ SP_stuck) and the matching step2 (→ GEN) fire; in GEN the pinned head can stay/clamp (→ GEN), accept (→ CLEANUP), or have the pending resolved (→ SP_sim); in CLEANUP only propagation (→ CLEANUP) and resolution (→ SP_cleanup) fire. The producers (soundInv_step_sim's step1 arms) build P1; the rest of the protocol is closed here.

                  theorem MyhillConstruction.myhill_sound {T Γ Λ : Type} [Fintype T] [Fintype Γ] [Fintype Λ] (M : LBA.Machine Γ Λ) (embed : T Γ) (w : List T) (h : CS_derives (myhillGrammar M embed) [symbol.nonterminal MyhillNT.start] (List.map symbol.terminal w)) :
                  LBA.LanguageViaEmbed M (⇑embed) w

                  Soundness. Every terminal word derived by the Myhill grammar is accepted by the LBA.

                  The converse of myhill_complete; together they give myhill_language_eq. The proof is an induction on the derivation establishing SoundInv, then soundInv_extract.