Non-Contracting and Context-Sensitive #
This file contains class-level comparison results between non-contracting grammars and context-sensitive grammars.
main results
is_noncontracting_implies_is_CS— every non-contracting language is context-sensitiveCS_is_noncontracting— every CS grammar induces a non-contracting unrestricted grammarnoncontracting_transforms_length_le— non-contracting steps don't decrease lengthcsg_of_noncontracting— translate a non-contracting grammar into a context-sensitive grammargrammar_language_subset_csg_of_noncontracting— forward correctness of the translation
The unrestricted grammar obtained from a context-sensitive grammar (via grammar_of_csg)
is non-contracting.
Every non-contracting language is context-sensitive in the broader definition.
A single transformation step in a non-contracting grammar does not decrease the length of the sentential form.
Non-contracting → CS #
The conversion from a non-contracting grammar to a context-sensitive grammar is a classical
construction. Given a non-contracting rule L ++ [A] ++ R → output with
|output| ≥ |L| + 1 + |R|, we simulate it using context-sensitive rules with fresh
auxiliary nonterminals.
Construction #
The nonterminal type of the constructed CS grammar is g.nt ⊕ (T ⊕ (ℕ × ℕ)):
Sum.inl n: original nonterminalnSum.inr (Sum.inl t): a "lifted" nonterminal representing terminaltSum.inr (Sum.inr (i, j)): auxiliary nonterminal for rule indexi, stepj
The sentential forms in the CS grammar always consist entirely of nonterminals
(either Sum.inl n or Sum.inr (Sum.inl t)), except at the very end when
un-lifting rules convert Sum.inr (Sum.inl t) to terminal t.
The rules are:
Convert a symbol to a nonterminal in the new grammar (lifting terminals).
Equations
- nc_symToNT (symbol.terminal t) = Sum.inr (Sum.inl t)
- nc_symToNT (symbol.nonterminal n) = Sum.inl n
Instances For
Lift a symbol: all symbols become nonterminal symbols in the new grammar.
Equations
Instances For
Collect all terminals from a list of symbols.
Equations
- nc_collectTerminals [] = []
- nc_collectTerminals (symbol.terminal t :: rest) = t :: nc_collectTerminals rest
- nc_collectTerminals (symbol.nonterminal a :: rest) = nc_collectTerminals rest
Instances For
All terminals appearing in the grammar's rules (inputs and outputs).
Equations
- nc_grammarTerminals g = List.flatMap (fun (r : grule T g.nt) => nc_collectTerminals (r.input_L ++ r.input_R ++ r.output_string)) g.rules
Instances For
All CS rules: un-lifting rules plus simulation rules for each grammar rule.
Equations
- nc_allRules g = nc_unliftRules g ++ List.flatMap (fun (x : ℕ × grule T g.nt) => match x with | (i, r) => nc_simRules i r) ((List.range g.rules.length).zip g.rules)
Instances For
Every rule in nc_unliftRules has nonempty output.
Every rule in nc_simRules has nonempty output, provided the non-contracting property.
Construct a context-sensitive grammar equivalent to a given non-contracting grammar.
Equations
- csg_of_noncontracting g hg = { nt := NC_NT g, initial := Sum.inl g.initial, rules := nc_allRules g, output_nonempty := ⋯ }
Instances For
Helper lemmas for language equivalence #
A single grammar step can be simulated by CS derivation on lifted forms.
Forward correctness of the non-contracting-to-CS construction.
Every terminal word generated by the original non-contracting unrestricted grammar is generated by the constructed context-sensitive grammar. The reverse containment is the remaining hard direction because derivations in the constructed grammar may pass through auxiliary nonterminals before returning to clean sentential forms.
Locked non-contracting → CS construction #
The first construction above is convenient for forward simulation, but its phase-2 cleanup rules use clean output symbols as right context. That is too permissive for the reverse direction: a cleanup rule can match symbols from the surrounding sentential context and skip the locked output phase. The locked construction below uses fresh output markers for every produced output position, and only erases those markers to clean lifted symbols after the simulated rule has completed.
Nonterminals for the locked CS construction.
- orig {T : Type} {g : grammar T} : g.nt → NC_LockedNT g
- term {T : Type} {g : grammar T} : T → NC_LockedNT g
- mark {T : Type} {g : grammar T} : ℕ → ℕ → NC_LockedNT g
- out {T : Type} {g : grammar T} : ℕ → ℕ → symbol T g.nt → NC_LockedNT g
Instances For
Convert an original symbol to a locked-construction nonterminal.
Equations
Instances For
Lift an original symbol into the locked construction.
Equations
Instances For
Input-position marker for a simulated rule.
Equations
- nc_locked_mark ri k = symbol.nonterminal (NC_LockedNT.mark ri k)
Instances For
Output-position marker for a simulated rule.
Equations
- nc_locked_out ri k s = symbol.nonterminal (NC_LockedNT.out ri k s)
Instances For
Marked output suffix beginning at position k.
Equations
- nc_locked_outputSuffix ri out dflt k = List.map (fun (off : ℕ) => nc_locked_out ri (k + off) (out.getD (k + off) dflt)) (List.range (out.length - k))
Instances For
Marked output suffix beginning at position k, forced syntactically nonempty.
For non-contracting rules the head index is in bounds; the default only keeps the
definition total.
Equations
- nc_locked_outputFrom ri out dflt k = nc_locked_out ri k (out.getD k dflt) :: nc_locked_outputSuffix ri out dflt (k + 1)
Instances For
Un-lifting rules for the locked construction.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Simulation rules for one original non-contracting rule in the locked construction.
Equations
- One or more equations did not get rendered due to their size.
Instances For
All locked-construction CS rules.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Locked CS grammar for a non-contracting grammar.
Equations
- locked_csg_of_noncontracting g _hg = { nt := NC_LockedNT g, initial := NC_LockedNT.orig g.initial, rules := nc_locked_allRules g, output_nonempty := ⋯ }
Instances For
A locked simulation rule is a rule of the locked constructed CS grammar.
An un-lifting rule is a rule of the locked constructed CS grammar.
Un-lift one locked terminal marker to the actual terminal.
Un-lift all locked terminal markers in a terminal word.
Clean up one locked output marker into the corresponding lifted output symbol.
Forward correctness of the locked non-contracting-to-CS construction.
Projection from locked-construction symbols back to original grammar symbols.
Equations
- nc_locked_proj_sym g (symbol.terminal t) = symbol.terminal t
- nc_locked_proj_sym g (symbol.nonterminal (NC_LockedNT.orig n)) = symbol.nonterminal n
- nc_locked_proj_sym g (symbol.nonterminal (NC_LockedNT.term t)) = symbol.terminal t
- nc_locked_proj_sym g (symbol.nonterminal (NC_LockedNT.mark a a_1)) = symbol.nonterminal g.initial
- nc_locked_proj_sym g (symbol.nonterminal (NC_LockedNT.out a a_1 a_2)) = symbol.nonterminal g.initial
Instances For
Project a locked-construction sentential form back to original grammar symbols.
Equations
- nc_locked_proj g s = List.map (nc_locked_proj_sym g) s
Instances For
A locked sentential form is clean if it contains no input or output markers.
Equations
- nc_locked_is_clean g s = ∀ x ∈ s, (∀ (ri k : ℕ), x ≠ symbol.nonterminal (NC_LockedNT.mark ri k)) ∧ ∀ (ri k : ℕ) (a : symbol T g.nt), x ≠ symbol.nonterminal (NC_LockedNT.out ri k a)
Instances For
If a transform newly creates marker k, then the rule that creates it is a
phase-1 marking rule and all lower markers for that same simulation occur in the
source. This records the context dependency of the locked input-marking phase.
If a step whose source contains a single phase-0 marker creates the phase-1
marker for that same simulation, then the step is exactly the first continuation
marking rule for the corresponding original production.
Exact source and target shape for the first continuation marking step of a
locked simulation. Once the boundary phase-1 marker is newly created, the
whole step is forced to replace the second lifted input symbol by that marker,
with the original surrounding context unchanged.
Counted locked derivations #
Step-counted derivations for the locked construction. This is equivalent to
CS_derives, but exposes split points needed by the dirty-phase grouping proof.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} (s : List (symbol T (NC_LockedNT g))) : NC_locked_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ : List (symbol T (locked_csg_of_noncontracting g hg).nt)} {s₃ : List (symbol T (NC_LockedNT g))} : CS_transforms (locked_csg_of_noncontracting g hg) s₁ s₂ → NC_locked_derives_n g hg n s₂ s₃ → NC_locked_derives_n g hg (n + 1) s₁ s₃
Instances For
Counted locked derivations whose every boundary state is dirty.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s : List (symbol T (NC_LockedNT g))} : ¬nc_locked_is_clean g s → NC_locked_dirty_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ : List (symbol T (locked_csg_of_noncontracting g hg).nt)} {s₃ : List (symbol T (NC_LockedNT g))} : CS_transforms (locked_csg_of_noncontracting g hg) s₁ s₂ → ¬nc_locked_is_clean g s₁ → ¬nc_locked_is_clean g s₂ → NC_locked_dirty_derives_n g hg n s₂ s₃ → NC_locked_dirty_derives_n g hg (n + 1) s₁ s₃
Instances For
In a dirty interval that starts from a clean form, the unique output marker cleaned by the final dirty-to-clean step must have been introduced during the dirty part of the interval.
A step-split form of
nc_locked_dirty_interval_final_cleanup_added_out_event: the output marker cleaned
by the final return-to-clean step has a concrete introduction step inside the
dirty interval.
A dirty interval cannot return to a clean form immediately after its clean-to-dirty start. The start state contains a single input marker and no output marker, while a dirty-to-clean step must clean a unique output marker.
Nondegenerate dirty intervals expose their first dirty-to-dirty step.
Packaged consumed-boundary branch for the first dirty-to-dirty step: if the marker opened by the clean-to-dirty boundary disappears, then the simulated input pattern had length one, the next state has no input markers, and the same step produced an output marker for that simulation.
In the consumed-boundary branch, the first dirty-to-dirty step has exactly the clean macro target for the opened original rule.
Locked clean reverse traces #
Clean macro steps in the locked construction. The sim constructor is the
macro endpoint for one complete locked simulation of an original
non-contracting rule under arbitrary surrounding context.
- clean {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s₁ s₂ : List (symbol T (locked_csg_of_noncontracting g hg).nt)} : CS_transforms (locked_csg_of_noncontracting g hg) s₁ s₂ → nc_locked_is_clean g s₁ → nc_locked_is_clean g s₂ → NC_locked_clean_macro_step g hg s₁ s₂
- sim {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {ri : ℕ} {r : grule T g.nt} {u v : List (symbol T (NC_LockedNT g))} : (ri, r) ∈ (List.range g.rules.length).zip g.rules → NC_locked_clean_macro_step g hg (u ++ List.map nc_locked_liftSym (r.input_L ++ [symbol.nonterminal r.input_N] ++ r.input_R) ++ v) (u ++ List.map nc_locked_liftSym r.output_string ++ v)
Instances For
Every locked clean macro step projects to an unrestricted derivation in the original grammar.
Step-counted traces of locked clean macro steps.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s : List (symbol T (NC_LockedNT g))} : nc_locked_is_clean g s → NC_locked_clean_macro_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ s₃ : List (symbol T (NC_LockedNT g))} : NC_locked_clean_macro_step g hg s₁ s₂ → NC_locked_clean_macro_derives_n g hg n s₂ s₃ → NC_locked_clean_macro_derives_n g hg (n + 1) s₁ s₃
Instances For
Locked clean macro traces project to unrestricted derivations in the original grammar.
A finite locked CS derivation trace whose adjacent endpoints are all clean.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s : List (symbol T (NC_LockedNT g))} : nc_locked_is_clean g s → NC_locked_clean_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ : List (symbol T (locked_csg_of_noncontracting g hg).nt)} {s₃ : List (symbol T (NC_LockedNT g))} : CS_transforms (locked_csg_of_noncontracting g hg) s₁ s₂ → nc_locked_is_clean g s₁ → nc_locked_is_clean g s₂ → NC_locked_clean_derives_n g hg n s₂ s₃ → NC_locked_clean_derives_n g hg (n + 1) s₁ s₃
Instances For
Ordinary locked clean traces are a special case of locked clean macro traces.
The remaining local grouping property for the locked construction: any dirty interval bracketed by one clean-to-dirty step and one dirty-to-clean step compresses to some clean macro trace.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Abstract compression principle for locked traces. Once every dirty interval between clean endpoints can be replaced by a clean macro trace, every clean-to-clean locked derivation can be compressed to a clean macro trace.
Reverse language inclusion for the locked construction, assuming the local dirty-interval grouping property.
Language equality for the locked construction, reduced to the local dirty-interval grouping property.
The consumed-boundary first-step case has the expected macro endpoint: the dirty state produced by consuming the boundary marker is the locked output block, that block cleans to the macro target, and the original clean boundary step already determines the corresponding clean macro simulation.
Counted clean-macro pref for the consumed-boundary branch. This is the canonical first macro segment that will later be concatenated with whatever clean behavior remains after all dirty markers have disappeared.
First-step dichotomy for dirty intervals, in a form suitable for the final grouping induction. After the first dirty-to-dirty step, either the marker opened by the clean-to-dirty boundary is still present, or that first step has already completed the boundary simulation and yields a canonical one-step clean macro pref.
First-step dichotomy with the retained branch expanded to its eventual output event. If the marker opened by the clean-to-dirty boundary survives the first dirty step, then the remaining dirty interval must later emit an output marker for the same original rule before it can return to a clean form.
Step-split version of
nc_locked_dirty_interval_first_step_event_or_macro_prefix. In the retained
branch this records the exact later transform that removes the start marker and
emits an output marker for the same original rule.
If a step removes the phase-0 marker for a fixed original rule, then it is
one of exactly two emit steps. For a unit input pattern it is the emitLast
rule and exposes the whole marked output block; otherwise it is the emitRest
rule at index 0 and requires the already-emitted output suffix as right
context.
First-step dichotomy with the retained branch expanded to the local shape of the later emit step. This is the form needed by the eventual grouping argument: the branch where the boundary marker survives the first dirty step now carries the exact source and target decomposition of the step that finally removes that marker.
First-step dichotomy plus the marker-introduction invariant for that same
first dirty step. In the retained branch this keeps the later emit-step shape,
while also recording that the first dirty step can only introduce a fresh
phase-0 marker or the phase-1 marker for the boundary simulation.
Dirty-interval first-step data specialized to the case where the step creates
the boundary simulation's phase-1 marker. This identifies that branch as the
first continuation marking rule of the same original production.
Full first-step trichotomy for a dirty interval opened from a clean form.
Either the first dirty-to-dirty step already gives the complete one-step clean
macro pref, or it advances the boundary simulation from phase 0 to phase 1
with exact target shape, or it leaves the boundary phase-1 marker absent and
any newly introduced marker is a nested phase-0 marker.
Retained-branch version of the first-step trichotomy with fresh markers
normalized. If the boundary simulation has not advanced to phase 1, then any
marker created by the first dirty step is a phase-0 marker for a genuine
simulation rule whose input pattern is nonempty.
Terminal words reached by locked clean macro traces are generated by the original non-contracting grammar.
Backward direction helpers #
Projection from extended grammar symbols to original grammar symbols. For clean forms (no aux nonterminals), this is the natural "un-lifting" map.
Equations
- nc_proj_sym g (symbol.terminal t) = symbol.terminal t
- nc_proj_sym g (symbol.nonterminal (Sum.inl n)) = symbol.nonterminal n
- nc_proj_sym g (symbol.nonterminal (Sum.inr (Sum.inl t))) = symbol.terminal t
- nc_proj_sym g (symbol.nonterminal (Sum.inr (Sum.inr val))) = symbol.nonterminal g.initial
Instances For
proj ∘ liftSym = id: projecting a lifted symbol recovers the original.
Cleanliness is preserved by taking a sublist in the membership sense.
Lifted original symbols are clean.
Lifted original sentential forms are clean.
Projection of the initial symbol.
Projection of a terminal string.
If a CS step creates a symbol that was not present before the step, that symbol must come from the selected rule's output string.
The only rules of the non-contracting simulation whose output string contains an auxiliary nonterminal are phase-1 marking rules.
Stronger form of nc_rule_output_aux_is_phase1: if a constructed rule outputs
an auxiliary, the whole selected rule is exactly the corresponding phase-1 rule.
A clean-to-dirty step is specifically phase 0 of a simulated rule. Positive
phase-1 steps have an auxiliary in their left context, so they cannot apply to a
clean source.
If a selected constructed rule rewrites an auxiliary nonterminal, then it is one of the right-to-left phase-2 cleanup rules.
A dirty-to-clean step must be the final phase-2 cleanup rule at phase 0.
Positive phase-2 cleanup rules preserve earlier auxiliaries in their left context,
so their target cannot be clean.
Inductive step-count version of CS_derives.
- zero {T : Type} {g : CS_grammar T} (s : List (symbol T g.nt)) : CS_derives_n g 0 s s
- step {T : Type} {g : CS_grammar T} {n : ℕ} {s₁ s₂ s₃ : List (symbol T g.nt)} : CS_transforms g s₁ s₂ → CS_derives_n g n s₂ s₃ → CS_derives_n g (n + 1) s₁ s₃
Instances For
Clean reverse traces #
The final reverse direction has two conceptual parts:
- clean-to-clean steps project to derivations in the original grammar;
- dirty auxiliary phases can be grouped into clean-to-clean macro steps.
The first part is captured below. The remaining work is the phase-grouping
argument for derivations that temporarily contain nc_aux symbols.
Clean macro steps in the constructed grammar. The second constructor is the intended endpoint of dirty-phase grouping: one complete auxiliary simulation of an original non-contracting rule, under arbitrary surrounding context.
- clean {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s₁ s₂ : List (symbol T (csg_of_noncontracting g hg).nt)} : CS_transforms (csg_of_noncontracting g hg) s₁ s₂ → nc_is_clean g s₁ → nc_is_clean g s₂ → NC_clean_macro_step g hg s₁ s₂
- sim {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {ri : ℕ} {r : grule T g.nt} {u v : List (symbol T (NC_NT g))} : (ri, r) ∈ (List.range g.rules.length).zip g.rules → NC_clean_macro_step g hg (u ++ List.map nc_liftSym (r.input_L ++ [symbol.nonterminal r.input_N] ++ r.input_R) ++ v) (u ++ List.map nc_liftSym r.output_string ++ v)
Instances For
Every clean macro step projects to an unrestricted derivation in the original grammar.
Step-counted traces of clean macro steps.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s : List (symbol T (NC_NT g))} : nc_is_clean g s → NC_clean_macro_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ s₃ : List (symbol T (NC_NT g))} : NC_clean_macro_step g hg s₁ s₂ → NC_clean_macro_derives_n g hg n s₂ s₃ → NC_clean_macro_derives_n g hg (n + 1) s₁ s₃
Instances For
Clean macro traces project to unrestricted derivations in the original grammar.
A finite CS derivation trace all of whose adjacent endpoints are clean.
- zero {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {s : List (symbol T (NC_NT g))} : nc_is_clean g s → NC_clean_derives_n g hg 0 s s
- step {T : Type} {g : grammar T} {hg : grammar_noncontracting g} {n : ℕ} {s₁ s₂ : List (symbol T (csg_of_noncontracting g hg).nt)} {s₃ : List (symbol T (NC_NT g))} : CS_transforms (csg_of_noncontracting g hg) s₁ s₂ → nc_is_clean g s₁ → nc_is_clean g s₂ → NC_clean_derives_n g hg n s₂ s₃ → NC_clean_derives_n g hg (n + 1) s₁ s₃
Instances For
Clean CS traces project to unrestricted derivations in the original grammar.