Langlib

Langlib.Classes.ContextFree.Basics.Ogden

Ogden's Lemma for Context-Free Languages #

This file states and proves Ogden's lemma, a strengthened version of the pumping lemma for context-free languages. While the pumping lemma guarantees that long enough strings in a CFL can be "pumped", Ogden's lemma allows the user to mark certain positions of the string and guarantees that the pump pieces contain marked positions.

Main declarations #

Implementation notes #

The proof follows the standard Ogden argument:

  1. Convert to Chomsky Normal Form (CNF).
  2. Navigate along the "marked path" to find a subtree with bounded marked height.
  3. Use the pigeonhole principle on branching nonterminals to find a repeat.
  4. The branching property ensures marked positions in the pump parts (v, y).
  5. The bounded marked height ensures the marked-count bound on vxy.

The core technical lemma ogdens_marked_path_decomp implements the marked-path pigeonhole argument. It tracks branching nonterminals in a finite set and detects collisions to construct the Ogden decomposition.

References #

Counting marked positions #

noncomputable def countMarkedIn (P : Prop) [DecidablePred P] (start len : ) :

Count how many natural numbers in the interval [start, start + len) satisfy P.

Equations
Instances For
    @[simp]
    theorem countMarkedIn_zero (P : Prop) [DecidablePred P] (start : ) :
    countMarkedIn P start 0 = 0
    theorem countMarkedIn_le (P : Prop) [DecidablePred P] (start len : ) :
    countMarkedIn P start len len
    theorem countMarkedIn_add (P : Prop) [DecidablePred P] (start len₁ len₂ : ) :
    countMarkedIn P start (len₁ + len₂) = countMarkedIn P start len₁ + countMarkedIn P (start + len₁) len₂
    theorem countMarkedIn_mono_len (P : Prop) [DecidablePred P] (start : ) {len₁ len₂ : } (h : len₁ len₂) :
    countMarkedIn P start len₁ countMarkedIn P start len₂

    Parse tree marked count #

    noncomputable def ChomskyNormalFormGrammar.parseTree.mc {T : Type uT} {g : ChomskyNormalFormGrammar T} (P : Prop) [DecidablePred P] (offset : ) {n : g.NT} (t : parseTree n) :

    The number of marked positions in a parse tree's yield.

    Equations
    Instances For
      theorem ChomskyNormalFormGrammar.parseTree.mc_node {T : Type uT} {g : ChomskyNormalFormGrammar T} (P : Prop) [DecidablePred P] (offset : ) {n c₁ c₂ : g.NT} (t₁ : parseTree c₁) (t₂ : parseTree c₂) (h : ChomskyNormalFormRule.node n c₁ c₂ g.rules) :
      mc P offset (t₁.node t₂ h) = mc P offset t₁ + mc P (offset + t₁.yield.length) t₂

      Marked height #

      noncomputable def ChomskyNormalFormGrammar.parseTree.markedHeight {T : Type uT} {g : ChomskyNormalFormGrammar T} (P : Prop) [DecidablePred P] (offset : ) {n : g.NT} :

      The "marked branching depth": maximum number of branching points (where both children have marked descendants) on any root-to-leaf path.

      Equations
      Instances For

        Key bounds #

        theorem ChomskyNormalFormGrammar.parseTree.mc_le_pow_markedHeight {T : Type uT} {g : ChomskyNormalFormGrammar T} (P : Prop) [DecidablePred P] (offset : ) {n : g.NT} (t : parseTree n) :
        mc P offset t 2 ^ markedHeight P offset t

        mc ≤ 2^markedHeight

        theorem ChomskyNormalFormGrammar.parseTree.mc_implies_markedHeight {T : Type uT} {g : ChomskyNormalFormGrammar T} (P : Prop) [DecidablePred P] (offset : ) {n : g.NT} (t : parseTree n) (h : ) (hmc : 2 ^ h mc P offset t) :
        h markedHeight P offset t

        2^h ≤ mc → h ≤ markedHeight

        markedHeight ≤ height

        Derivation helper lemmas #

        From a derivation of the left child, build a derivation of the node.

        From a derivation of the right child, build a derivation of the node.

        Helper lemmas for extending Ogden results through nodes #

        theorem ChomskyNormalFormGrammar.parseTree.ogden_extend_left_via_left {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) {u v x y z : List T} (hyield : t₁.yield = u ++ v ++ x ++ y ++ z) (hmark : 0 < countMarkedIn P (offset + u.length) v.length + countMarkedIn P (offset + u.length + v.length + x.length) y.length) (hbound : countMarkedIn P (offset + u.length) (v.length + x.length + y.length) 2 ^ g.generators.card) (hpump : ∀ (i : ), g.Derives [Symbol.nonterminal c₁] (List.map Symbol.terminal (u ++ v ^+^ i ++ x ++ y ^+^ i ++ z))) :
        ∃ (u' : List T) (v' : List T) (x' : List T) (y' : List T) (z' : List T), (t₁.node t₂ hnc).yield = u' ++ v' ++ x' ++ y' ++ z' 0 < countMarkedIn P (offset + u'.length) v'.length + countMarkedIn P (offset + u'.length + v'.length + x'.length) y'.length countMarkedIn P (offset + u'.length) (v'.length + x'.length + y'.length) 2 ^ g.generators.card ∀ (i : ), g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal (u' ++ v' ^+^ i ++ x' ++ y' ^+^ i ++ z'))
        theorem ChomskyNormalFormGrammar.parseTree.ogden_extend_left_via_right {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) {u v x y z : List T} (hyield : t₂.yield = u ++ v ++ x ++ y ++ z) (hmark : 0 < countMarkedIn P (offset + t₁.yield.length + u.length) v.length + countMarkedIn P (offset + t₁.yield.length + u.length + v.length + x.length) y.length) (hbound : countMarkedIn P (offset + t₁.yield.length + u.length) (v.length + x.length + y.length) 2 ^ g.generators.card) (hpump : ∀ (i : ), g.Derives [Symbol.nonterminal c₂] (List.map Symbol.terminal (u ++ v ^+^ i ++ x ++ y ^+^ i ++ z))) :
        ∃ (u' : List T) (v' : List T) (x' : List T) (y' : List T) (z' : List T), (t₁.node t₂ hnc).yield = u' ++ v' ++ x' ++ y' ++ z' 0 < countMarkedIn P (offset + u'.length) v'.length + countMarkedIn P (offset + u'.length + v'.length + x'.length) y'.length countMarkedIn P (offset + u'.length) (v'.length + x'.length + y'.length) 2 ^ g.generators.card ∀ (i : ), g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal (u' ++ v' ^+^ i ++ x' ++ y' ^+^ i ++ z'))
        theorem ChomskyNormalFormGrammar.parseTree.ogden_extend_right_via_left {T : Type uT} {g : ChomskyNormalFormGrammar T} {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) (s : Finset g.NT) {n' : g.NT} (hn's : n' s) {p' : parseTree n'} {u₁ z₁ : List T} (hyield : t₁.yield = u₁ ++ p'.yield ++ z₁) (hmc' : 0 < mc P (offset + u₁.length) p') (hderiv : g.Derives [Symbol.nonterminal c₁] (List.map Symbol.terminal u₁ ++ [Symbol.nonterminal n'] ++ List.map Symbol.terminal z₁)) :
        n''s, ∃ (p'' : parseTree n'') (u₂ : List T) (z₂ : List T), (t₁.node t₂ hnc).yield = u₂ ++ p''.yield ++ z₂ 0 < mc P (offset + u₂.length) p'' g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal u₂ ++ [Symbol.nonterminal n''] ++ List.map Symbol.terminal z₂)
        theorem ChomskyNormalFormGrammar.parseTree.ogden_extend_right_via_right {T : Type uT} {g : ChomskyNormalFormGrammar T} {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) (s : Finset g.NT) {n' : g.NT} (hn's : n' s) {p' : parseTree n'} {u₁ z₁ : List T} (hyield : t₂.yield = u₁ ++ p'.yield ++ z₁) (hmc' : 0 < mc P (offset + t₁.yield.length + u₁.length) p') (hderiv : g.Derives [Symbol.nonterminal c₂] (List.map Symbol.terminal u₁ ++ [Symbol.nonterminal n'] ++ List.map Symbol.terminal z₁)) :
        n''s, ∃ (p'' : parseTree n'') (u₂ : List T) (z₂ : List T), (t₁.node t₂ hnc).yield = u₂ ++ p''.yield ++ z₂ 0 < mc P (offset + u₂.length) p'' g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal u₂ ++ [Symbol.nonterminal n''] ++ List.map Symbol.terminal z₂)
        theorem ChomskyNormalFormGrammar.parseTree.ogden_pump_from_left {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) (hmc₂ : 0 < mc P (offset + t₁.yield.length) t₂) (hmc_bound : mc P offset (t₁.node t₂ hnc) 2 ^ g.generators.card) {p' : parseTree n} {u₁ z₁ : List T} (hyield₁ : t₁.yield = u₁ ++ p'.yield ++ z₁) (_hmc' : 0 < mc P (offset + u₁.length) p') (hderiv₁ : g.Derives [Symbol.nonterminal c₁] (List.map Symbol.terminal u₁ ++ [Symbol.nonterminal n] ++ List.map Symbol.terminal z₁)) :
        ∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), (t₁.node t₂ hnc).yield = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P (offset + u.length) v.length + countMarkedIn P (offset + u.length + v.length + x.length) y.length countMarkedIn P (offset + u.length) (v.length + x.length + y.length) 2 ^ g.generators.card ∀ (i : ), g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal (u ++ v ^+^ i ++ x ++ y ^+^ i ++ z))
        theorem ChomskyNormalFormGrammar.parseTree.ogden_pump_from_right {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {n c₁ c₂ : g.NT} {t₁ : parseTree c₁} {t₂ : parseTree c₂} (hnc : ChomskyNormalFormRule.node n c₁ c₂ g.rules) (P : Prop) [DecidablePred P] (offset : ) (hmc₁ : 0 < mc P offset t₁) (hmc_bound : mc P offset (t₁.node t₂ hnc) 2 ^ g.generators.card) {p' : parseTree n} {u₁ z₁ : List T} (hyield₂ : t₂.yield = u₁ ++ p'.yield ++ z₁) (_hmc' : 0 < mc P (offset + t₁.yield.length + u₁.length) p') (hderiv₂ : g.Derives [Symbol.nonterminal c₂] (List.map Symbol.terminal u₁ ++ [Symbol.nonterminal n] ++ List.map Symbol.terminal z₁)) :
        ∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), (t₁.node t₂ hnc).yield = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P (offset + u.length) v.length + countMarkedIn P (offset + u.length + v.length + x.length) y.length countMarkedIn P (offset + u.length) (v.length + x.length + y.length) 2 ^ g.generators.card ∀ (i : ), g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal (u ++ v ^+^ i ++ x ++ y ^+^ i ++ z))

        Core Ogden marked-path pigeonhole #

        theorem ChomskyNormalFormGrammar.parseTree.ogdens_marked_path_decomp {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {n : g.NT} (p : parseTree n) (P : Prop) [DecidablePred P] (offset : ) (s : Finset g.NT) (hs_sub : s g.generators) (hp : g.generators.card markedHeight P offset p + s.card) (hmc : 0 < mc P offset p) (hmc_bound : mc P offset p 2 ^ g.generators.card) :
        (∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), p.yield = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P (offset + u.length) v.length + countMarkedIn P (offset + u.length + v.length + x.length) y.length countMarkedIn P (offset + u.length) (v.length + x.length + y.length) 2 ^ g.generators.card ∀ (i : ), g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal (u ++ v ^+^ i ++ x ++ y ^+^ i ++ z))) n's, ∃ (p' : parseTree n') (u₁ : List T) (z₁ : List T), p.yield = u₁ ++ p'.yield ++ z₁ 0 < mc P (offset + u₁.length) p' g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal u₁ ++ [Symbol.nonterminal n'] ++ List.map Symbol.terminal z₁)
        theorem ChomskyNormalFormGrammar.parseTree.ogdens_restrict_mh {T : Type uT} {g : ChomskyNormalFormGrammar T} {n : g.NT} (p : parseTree n) (P : Prop) [DecidablePred P] (offset k : ) (hmh : k markedHeight P offset p) (hmc : 0 < mc P offset p) :
        ∃ (n' : g.NT) (q : parseTree n') (u₀ : List T) (z₀ : List T), p.yield = u₀ ++ q.yield ++ z₀ markedHeight P (offset + u₀.length) q = k 0 < mc P (offset + u₀.length) q g.Derives [Symbol.nonterminal n] (List.map Symbol.terminal u₀ ++ [Symbol.nonterminal n'] ++ List.map Symbol.terminal z₀)

        Ogden's lemma for CNF grammars #

        theorem ChomskyNormalFormGrammar.parseTree.ogdens_cnf {T : Type uT} {g : ChomskyNormalFormGrammar T} [DecidableEq g.NT] {w : List T} (hwg : w g.language) (P : Prop) [DecidablePred P] (hw : countMarkedIn P 0 w.length 2 ^ g.generators.card) :
        ∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), w = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P u.length v.length + countMarkedIn P (u.length + v.length + x.length) y.length countMarkedIn P u.length (v.length + x.length + y.length) 2 ^ g.generators.card ∀ (i : ), u ++ v ^+^ i ++ x ++ y ^+^ i ++ z g.language

        Ogden's lemma for general context-free languages #

        theorem Language.IsContextFree.ogdens_lemma {T : Type} {L : Language T} (hL : L.IsContextFree) :
        ∃ (p : ), wL, ∀ (P : Prop) [inst : DecidablePred P], p countMarkedIn P 0 w.length∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), w = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P u.length v.length + countMarkedIn P (u.length + v.length + x.length) y.length countMarkedIn P u.length (v.length + x.length + y.length) p ∀ (i : ), u ++ v ^+^ i ++ x ++ y ^+^ i ++ z L

        Ogden's lemma for context-free languages (Mathlib formulation).

        theorem CF_ogdens_lemma {T : Type} {L : Language T} (cf : is_CF L) :
        ∃ (p : ), wL, ∀ (P : Prop) [inst : DecidablePred P], p countMarkedIn P 0 w.length∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), w = u ++ v ++ x ++ y ++ z 0 < countMarkedIn P u.length v.length + countMarkedIn P (u.length + v.length + x.length) y.length countMarkedIn P u.length (v.length + x.length + y.length) p ∀ (i : ), u ++ v ^+^ i ++ x ++ y ^+^ i ++ z L

        Ogden's lemma for context-free languages (project formulation).