Non-contracting and non-erasing context-sensitive grammars
There are two classical ways to define the context-sensitive (Chomsky type-1) grammars, and they are provably the same class only via a real construction (Kuroda normal form). This page records exactly which definition Langlib uses and how much of the bridge to the other one is formalized.
The two definitions
- Non-contracting (monotone): no rule shortens the sentential form — every rule
α → βhas|β| ≥ |α|. A single rule may rewrite a whole block, e.g.A B → C D. - Non-erasing / context-preserving: every rule has the restricted shape
α A β → α γ βwithγ ≠ ε— only one nonterminal is rewritten, and only in a fixed left/right context.
Langlib’s context-sensitive languages are defined via the non-contracting form.
is_CS quantifies over grammar_context_sensitive grammars, whose core condition is
grule_noncontracting (|output| ≥ |input|), together with an optional distinguished
S → ε rule guarded by initial_not_on_rhs (S may not appear on any right-hand
side when S → ε is present).
In Lean
The definition:
is_CSviagrammar_context_sensitiveandgrule_noncontracting— the non-contracting definition.- The non-erasing form is a separate notion:
csrule/CS_grammar(withoutput_nonempty), surfaced asis_CS_via_csg.
What is proven (and what isn’t)
- ✅ Non-erasing ⇒ non-contracting:
CS_is_noncontracting— everyCS_grammarinduces a non-contracting grammar with the same language. - ✅ Non-contracting ⇒
is_CS:is_CS_of_is_noncontracting— but this is essentially by definition, sinceis_CSalready means non-contracting (the proof is a one-liner). - ◑ Non-contracting ⇒ non-erasing (the hard direction): the “lock” construction builds a non-erasing
CS_grammarfrom a non-contracting grammar, and one language inclusion is proven unconditionally (grammar_language_subset_csg_of_noncontracting). - ⬜ The reverse inclusion is not yet established: the full correctness of the construction (and hence a genuine non-erasing ⇔ non-contracting equivalence) is currently conditional on an unproven property,
NC_locked_dirty_interval_macro_property, which appears only as a hypothesis and is never discharged.
So Langlib uses the non-contracting definition, fully proves that non-erasing grammars are a special case of it, and has built — but not yet completely verified — the harder converse construction.
Proof idea (the lock construction)
The remaining work is to simulate an arbitrary block-rewriting non-contracting rule
X₁…Xₘ → Y₁…Yₙ by single-symbol, context-preserving (non-erasing) steps. Fresh
“lock” nonterminals fire the rule one position at a time: a marker sweeps across
the m positions, replacing each Xᵢ while carrying the surrounding context along,
and only unlocks once the whole block has become Y₁…Yₙ. Each individual step is
genuinely context-sensitive. The missing piece is the bookkeeping lemma
(NC_locked_dirty_interval_macro_property) needed to show the locked grammar derives
exactly the same terminal words, not a superset.
Keywords / also known as
context-sensitive grammar definition, non-contracting vs non-erasing grammars, monotone grammars, Kuroda normal form, type-1 grammar normal form, length-increasing grammars, Lean formalization.
Formalized in Lean 4 with Mathlib, in Langlib.