Langlib

Langlib.Classes.ContextFree.Decidability.Membership

Decidability and Computability of Membership #

This file proves that membership is decidable—and indeed computable—for context-free languages (represented by context-free grammars).

The proof proceeds via the CYK algorithm on Chomsky-normal-form grammars.

Main results #

Part 1: Context-Free Languages – CNF Decidability #

@[irreducible]
noncomputable def ChomskyNormalFormGrammar.canDerive {T : Type} (g : ChomskyNormalFormGrammar T) [DecidableEq g.NT] (n : g.NT) :
List TProp

CYK-style predicate: can nonterminal n derive word w in CNF grammar g? Quantifies over rules (a Finset) instead of nonterminals, so does NOT require Fintype g.NT.

Equations
Instances For
    @[irreducible]
    def ChomskyNormalFormGrammar.cykDecideAux {T : Type} [DecidableEq T] {NT : Type} [DecidableEq NT] (rulesList : List (ChomskyNormalFormRule T NT)) (n : NT) (w : List T) :

    Bool-valued CYK decision function. Takes an explicit list of rules so that the function is genuinely computable (not noncomputable).

    Equations
    Instances For
      theorem ChomskyNormalFormGrammar.cykDecideAux_iff_canDerive {T : Type} [DecidableEq T] (g : ChomskyNormalFormGrammar T) [DecidableEq g.NT] (rulesList : List (ChomskyNormalFormRule T g.NT)) (hrules : ∀ (r : ChomskyNormalFormRule T g.NT), r rulesList r g.rules) (n : g.NT) (w : List T) :
      cykDecideAux rulesList n w = true g.canDerive n w

      canDerive is equivalent to Derives (derivation in the CNF grammar).

      Bitvector operations #

      def ntInSet (bv idx : ) :

      Check if bit idx is set in bv.

      Equations
      Instances For
        def ntSetBit (bv idx : ) :

        Set bit idx in bv.

        Equations
        Instances For
          theorem ntInSet_ntSetBit_self (bv idx : ) :
          ntInSet (ntSetBit bv idx) idx = true
          theorem ntInSet_ntSetBit_of_true (bv idx idx' : ) (h : ntInSet bv idx = true) :
          ntInSet (ntSetBit bv idx') idx = true

          Primrec proofs for bitvector operations #

          theorem primrec₂_ntInSet :
          Primrec₂ fun (bv idx : ) => ntInSet bv idx
          theorem primrec₂_ntSetBit :
          Primrec₂ fun (bv idx : ) => ntSetBit bv idx

          Fold over fixed lists is Primrec #

          theorem primrec_foldl_fixed {α σ : Type} [Primcodable α] [Primcodable σ] {β : Type} (l : List β) (step : βασσ) (h_step : ∀ (b : β), Primrec₂ (step b)) :
          Primrec₂ fun (a : α) (init : σ) => List.foldl (fun (acc : σ) (b : β) => step b a acc) init l

          Folding over a fixed list with a Primrec step function is Primrec.

          CYK Table Definitions (Bitvector-based) #

          def cykLeafBV {T : Type} [DecidableEq T] (leafData : List ( × T)) (t : T) :

          Bitvector lookup for terminal rules: given terminal t, compute the bitvector of nonterminals with leaf rule nt → t.

          Equations
          Instances For
            def cykBuildTable {T : Type} [DecidableEq T] (leafData : List ( × T)) (nodeData : List ( × × )) (w : List T) :

            Build the CYK table bottom-up.

            The result is a flat List where entry at index l * n + i (with n = w.length) is the bitvector of nonterminals that derive the substring w[i .. i+l] (length l+1 starting at position i).

            extraRows is the number of rows beyond the base row.

            Equations
            Instances For
              def cykMemCheck {T : Type} [DecidableEq T] (leafData : List ( × T)) (nodeData : List ( × × )) (startIdx : ) (w : List T) :

              Full membership check via CYK bitvector table.

              Equations
              Instances For

                CYK Bitvector is Primitive Recursive #

                theorem primrec₂_ruleStep (ntR lR rR : ) :
                Primrec₂ fun (ctx : List × × × × ) (bv : ) => if (ntInSet (ctx.1.getD (ctx.2.2.2.2 * ctx.2.1 + ctx.2.2.1) 0) lR && ntInSet (ctx.1.getD ((ctx.2.2.2.1 - ctx.2.2.2.2) * ctx.2.1 + (ctx.2.2.1 + ctx.2.2.2.2 + 1)) 0) rR) = true then ntSetBit bv ntR else bv
                theorem primrec₂_allRulesStep (nd : List ( × × )) :
                Primrec₂ fun (ctx : List × × × × ) (bv : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet (ctx.1.getD (ctx.2.2.2.2 * ctx.2.1 + ctx.2.2.1) 0) r.2.1 && ntInSet (ctx.1.getD ((ctx.2.2.2.1 - ctx.2.2.2.2) * ctx.2.1 + (ctx.2.2.1 + ctx.2.2.2.2 + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd

                The fold over all rules (fixed nodeData) is Primrec₂.

                theorem primrec_splitFold (nd : List ( × × )) :
                Primrec fun (ctx : List × × × ) => List.foldl (fun (bv j : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet (ctx.1.getD (j * ctx.2.1 + ctx.2.2.1) 0) r.2.1 && ntInSet (ctx.1.getD ((ctx.2.2.2 - j) * ctx.2.1 + (ctx.2.2.1 + j + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd) 0 (List.range (ctx.2.2.2 + 1))
                theorem primrec₂_cellValue (nd : List ( × × )) :
                Primrec₂ fun (ctx : List × × ) (i : ) => if i + ctx.2.2 + 2 > ctx.2.1 then 0 else List.foldl (fun (bv j : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet (ctx.1.getD (j * ctx.2.1 + i) 0) r.2.1 && ntInSet (ctx.1.getD ((ctx.2.2 - j) * ctx.2.1 + (i + j + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd) 0 (List.range (ctx.2.2 + 1))
                theorem primrec₂_cykStep {T : Type} [Primcodable T] (nd : List ( × × )) :
                Primrec₂ fun (w : List T) (p : × List ) => p.2 ++ List.map (fun (i : ) => if i + p.1 + 2 > w.length then 0 else List.foldl (fun (bv j : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet (p.2.getD (j * w.length + i) 0) r.2.1 && ntInSet (p.2.getD ((p.1 - j) * w.length + (i + j + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd) 0 (List.range (p.1 + 1))) (List.range w.length)
                theorem primrec_cykBuildTable {T : Type} [DecidableEq T] [Primcodable T] (ld : List ( × T)) (nd : List ( × × )) :
                Primrec fun (w : List T) => cykBuildTable ld nd w (w.length - 1)
                theorem cykMemCheck_eq {T : Type} [DecidableEq T] (ld : List ( × T)) (nd : List ( × × )) (si : ) (w : List T) :
                cykMemCheck ld nd si w = ntInSet ((cykBuildTable ld nd w (w.length - 1)).getD ((w.length - 1) * w.length) 0) si

                The full membership check is Primrec.

                theorem primrec_cykMemCheck {T : Type} [DecidableEq T] [Primcodable T] (ld : List ( × T)) (nd : List ( × × )) (si : ) :
                Primrec fun (w : List T) => cykMemCheck ld nd si w

                The full membership check is Primrec.

                CYK Bitvector Correctness #

                theorem ntInSet_ntSetBit_ne (bv idx idx' : ) (h : idx idx') :
                ntInSet (ntSetBit bv idx') idx = ntInSet bv idx
                theorem cykLeafBV_correct {T : Type} [DecidableEq T] (leafData : List ( × T)) (t : T) (idx : ) :
                ntInSet (cykLeafBV leafData t) idx = true (idx, t) leafData
                theorem cykBuildTable_length {T : Type} [DecidableEq T] (ld : List ( × T)) (nd : List ( × × )) (w : List T) (k : ) :
                (cykBuildTable ld nd w k).length = (k + 1) * w.length
                theorem cykBuildTable_getD_prev {T : Type} [DecidableEq T] (ld : List ( × T)) (nd : List ( × × )) (w : List T) (k idx : ) (h : idx < (k + 1) * w.length) :
                (cykBuildTable ld nd w (k + 1)).getD idx 0 = (cykBuildTable ld nd w k).getD idx 0
                theorem cykBuildTable_getD_stable {T : Type} [DecidableEq T] (ld : List ( × T)) (nd : List ( × × )) (w : List T) (k k' : ) (hk : k k') (idx : ) (h : idx < (k + 1) * w.length) :
                (cykBuildTable ld nd w k').getD idx 0 = (cykBuildTable ld nd w k).getD idx 0
                theorem foldl_rules_ntInSet (nd : List ( × × )) (left_bv right_bv init idx : ) :
                ntInSet (List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet left_bv r.2.1 && ntInSet right_bv r.2.2) = true then ntSetBit bv' r.1 else bv') init nd) idx = true ntInSet init idx = true rnd, r.1 = idx ntInSet left_bv r.2.1 = true ntInSet right_bv r.2.2 = true
                theorem foldl_splits_ntInSet (nd : List ( × × )) (table : List ) (n i k idx : ) :
                ntInSet (List.foldl (fun (bv j : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet (table.getD (j * n + i) 0) r.2.1 && ntInSet (table.getD ((k - j) * n + (i + j + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd) 0 (List.range (k + 1))) idx = true jList.range (k + 1), rnd, r.1 = idx ntInSet (table.getD (j * n + i) 0) r.2.1 = true ntInSet (table.getD ((k - j) * n + (i + j + 1)) 0) r.2.2 = true
                theorem cykBuildTable_correct_base {T : Type} [DecidableEq T] (g : ChomskyNormalFormGrammar T) [DecidableEq g.NT] (enc : g.NT) (_enc_inj : Function.Injective enc) (leafData : List ( × T)) (h_leaf : ∀ (nt : g.NT) (t : T), (enc nt, t) leafData ChomskyNormalFormRule.leaf nt t g.rules) (nodeData : List ( × × )) (w : List T) (_hw : w []) (i : ) (hi : i < w.length) (nt : g.NT) :
                ntInSet ((cykBuildTable leafData nodeData w 0).getD i 0) (enc nt) = ChomskyNormalFormGrammar.cykDecideAux g.rules.toList nt (List.take 1 (List.drop i w))

                Base case of CYK correctness: single characters.

                theorem cykBuildTable_entry_step {T : Type} [DecidableEq T] (ld : List ( × T)) (nd : List ( × × )) (w : List T) (l i : ) (hi : i + l + 2 w.length) :
                (cykBuildTable ld nd w (l + 1)).getD ((l + 1) * w.length + i) 0 = List.foldl (fun (bv j : ) => List.foldl (fun (bv' : ) (r : × × ) => if (ntInSet ((cykBuildTable ld nd w l).getD (j * w.length + i) 0) r.2.1 && ntInSet ((cykBuildTable ld nd w l).getD ((l - j) * w.length + (i + j + 1)) 0) r.2.2) = true then ntSetBit bv' r.1 else bv') bv nd) 0 (List.range (l + 1))
                theorem cykDecideAux_long {T : Type} [DecidableEq T] {NT : Type} [DecidableEq NT] (rules : List (ChomskyNormalFormRule T NT)) (nt : NT) (w : List T) (hw : w.length 2) :
                ChomskyNormalFormGrammar.cykDecideAux rules nt w = (List.finRange (w.length - 1)).any fun (x : Fin (w.length - 1)) => match x with | j, _hj => rules.any fun (r : ChomskyNormalFormRule T NT) => match r with | ChomskyNormalFormRule.node n' c₁ c₂ => decide (n' = nt) && ChomskyNormalFormGrammar.cykDecideAux rules c₁ (List.take (j + 1) w) && ChomskyNormalFormGrammar.cykDecideAux rules c₂ (List.drop (j + 1) w) | x => false

                cykDecideAux on a word of length ≥ 2 checks split points and node rules.

                theorem cykBuildTable_correct_step {T : Type} [DecidableEq T] (g : ChomskyNormalFormGrammar T) [DecidableEq g.NT] (enc : g.NT) (enc_inj : Function.Injective enc) (leafData : List ( × T)) (h_leaf : ∀ (nt : g.NT) (t : T), (enc nt, t) leafData ChomskyNormalFormRule.leaf nt t g.rules) (nodeData : List ( × × )) (h_node : ∀ (nt l r : g.NT), (enc nt, enc l, enc r) nodeData ChomskyNormalFormRule.node nt l r g.rules) (h_node_range : rnodeData, ∃ (nt : g.NT) (l : g.NT) (r' : g.NT), r = (enc nt, enc l, enc r')) (h_leaf_range : pleafData, ∃ (nt : g.NT), p.1 = enc nt) (w : List T) (hw : w []) (l i : ) (hi : i + l + 2 w.length) (nt : g.NT) (ih : l' < l + 1, ∀ (i' : ), i' + l' + 1 w.length∀ (nt' : g.NT), ntInSet ((cykBuildTable leafData nodeData w l').getD (l' * w.length + i') 0) (enc nt') = ChomskyNormalFormGrammar.cykDecideAux g.rules.toList nt' (List.take (l' + 1) (List.drop i' w))) :
                ntInSet ((cykBuildTable leafData nodeData w (l + 1)).getD ((l + 1) * w.length + i) 0) (enc nt) = ChomskyNormalFormGrammar.cykDecideAux g.rules.toList nt (List.take (l + 2) (List.drop i w))

                Step case of CYK correctness: longer substrings. Requires h_node_range to ensure all nodeData entries are in range of enc.

                theorem cykBuildTable_correct {T : Type} [DecidableEq T] (g : ChomskyNormalFormGrammar T) [DecidableEq g.NT] (enc : g.NT) (enc_inj : Function.Injective enc) (leafData : List ( × T)) (h_leaf : ∀ (nt : g.NT) (t : T), (enc nt, t) leafData ChomskyNormalFormRule.leaf nt t g.rules) (nodeData : List ( × × )) (h_node : ∀ (nt l r : g.NT), (enc nt, enc l, enc r) nodeData ChomskyNormalFormRule.node nt l r g.rules) (h_node_range : rnodeData, ∃ (nt : g.NT) (l : g.NT) (r' : g.NT), r = (enc nt, enc l, enc r')) (h_leaf_range : pleafData, ∃ (nt : g.NT), p.1 = enc nt) (w : List T) (hw : w []) (k l : ) (hl : l k) (hk : k < w.length) (i : ) (hi : i + l + 1 w.length) (nt : g.NT) :
                ntInSet ((cykBuildTable leafData nodeData w k).getD (l * w.length + i) 0) (enc nt) = ChomskyNormalFormGrammar.cykDecideAux g.rules.toList nt (List.take (l + 1) (List.drop i w))

                Key correctness lemma: CYK bitvector table entries correspond to cykDecideAux. Proved by strong induction on l.

                Main theorem: CF membership is ComputablePred #

                theorem cykMemCheck_correct_cnf {T : Type} [DecidableEq T] [Primcodable T] (cnf : ChomskyNormalFormGrammar T) [DecidableEq cnf.NT] (enc : cnf.NT) (enc_inj : Function.Injective enc) (leafData : List ( × T)) (h_leaf : ∀ (nt : cnf.NT) (t : T), (enc nt, t) leafData ChomskyNormalFormRule.leaf nt t cnf.rules) (nodeData : List ( × × )) (h_node : ∀ (nt l r : cnf.NT), (enc nt, enc l, enc r) nodeData ChomskyNormalFormRule.node nt l r cnf.rules) (h_node_range : rnodeData, ∃ (nt : cnf.NT) (l : cnf.NT) (r' : cnf.NT), r = (enc nt, enc l, enc r')) (h_leaf_range : pleafData, ∃ (nt : cnf.NT), p.1 = enc nt) (w : List T) (hw : w []) :
                cykMemCheck leafData nodeData (enc cnf.initial) w = true w cnf.language

                Membership in a context-free language is a computable predicate.

                Membership is uniformly computable for the context-free languages: encoded context-free grammars are an adequate, effective presentation (contextFreeLanguageOf_characterizes) with uniformly decidable membership (ComputableMembership).