Context-Free Language Closure Properties #
This file proves that context-free languages are closed under substitution, and derives closure under concatenation, union, and Kleene star as corollaries. The proof follows the approach by Ullman and Hopcroft in "Introduction to Automata Theory, Languages and Computation".
The substitution grammar ContextFreeGrammar.subst g f is constructed by replacing each terminal
a in g with the start symbol of f a, and adding all rules from every f a (for terminals
actually used in g). The key technical lemma is that derivation steps from the two components
commute: since F-rules (from the substituting grammars) never introduce nonterminals targeted by
G-rules (from the original grammar), any mixed derivation can be reordered into a G-phase followed
by an F-phase (ContextFreeGrammar.derives_commute_of_not_mem_output).
Main definitions #
ContextFreeGrammar.subst: The substitution of a context-free grammargby a family of context-free grammarsf.
Main theorems #
Language.IsContextFree.subst: Context-free languages are closed under substitution.Language.IsContextFree.mul: Context-free languages are closed under concatenation.Language.IsContextFree.add: Context-free languages are closed under union.Language.IsContextFree.kstar: Context-free languages are closed under Kleene star.
The set of terminals used in a context-free grammar g is the set of all terminals appearing in the
right-hand side of any rule in g.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The rules from the substituting grammars f a are lifted to the combined non-terminal type
g.NT ⊕ (Σ a, (f a).NT). We only include rules for terminals a that are actually used in g.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The rules of the original grammar g are transformed. Non-terminals n become Sum.inl n, and
terminals a are replaced by the start symbol of the substituting grammar f a, which is
Sum.inr ⟨a, (f a).initial⟩.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The substitution grammar is constructed by taking the disjoint union of non-terminals and the union
of the transformed rules from g and the lifted rules from f.
Equations
Instances For
liftSymbolG maps symbols from g to the substitution grammar. Non-terminals are mapped to
the left component of the sum, and terminals are mapped to the start symbol of the corresponding
substituting grammar.
Equations
- g.liftSymbolG f (Symbol.nonterminal n) = Symbol.nonterminal (Sum.inl n)
- g.liftSymbolG f (Symbol.terminal a) = Symbol.nonterminal (Sum.inr ⟨a, (f a).initial⟩)
Instances For
liftSymbolF maps symbols from f a to the substitution grammar. Non-terminals are mapped to
the right component of the sum, and terminals are kept as terminals.
Equations
- g.liftSymbolF f a (Symbol.nonterminal n) = Symbol.nonterminal (Sum.inr ⟨a, n⟩)
- g.liftSymbolF f a (Symbol.terminal b) = Symbol.terminal b
Instances For
If r is in g.rules, then the lifted rule (non-terminals to Sum.inl, terminals to the start
symbol of f a) is in the rules of g.subst f.
If g produces v from u in one step, then g.subst f produces the lifted version of v
from the lifted version of u.
If g derives v from u, then g.subst f derives the lifted version of v from the
lifted version of u.
If a is a used terminal in g and r is a rule in f a, then the lifted rule (where
non-terminals are tagged with a) is in the substitution grammar.
If a substituting grammar f a produces v from u, then the substitution grammar g.subst f
produces the lifted version of v from the lifted version of u.
If a substituting grammar f a derives v from u, then the substitution grammar g.subst f
derives the lifted version of v from the lifted version of u.
If each W[i] is in (f u[i]).language, then g.subst f derives the concatenation of W
from the lifted terminals of u.
If a terminal appears in the output of a rule in the grammar, it is in the set of used terminals.
If g produces v from u, then any terminal in v is either in u or is a used terminal
of g.
If g derives v from u, then any terminal in v is either in u or is a used terminal
of g.
Any terminal in a string in the language of g must be a used terminal of g.
The substitution of the languages is a subset of the language of the substitution grammar.
If a non-terminal symbol appears in a string lifted from f a, it must be of the form Sum.inr ⟨a, n⟩.
A rule is a G-rule if its input non-terminal comes from G (left side of the sum).
Instances For
A rule is an F-rule if its input non-terminal comes from one of the F grammars (Sum.inr).
Instances For
If a rule rewrites a string lifted from f a, its input must be of the form Sum.inr ⟨a, n⟩.
The function liftSymbolF is injective.
If g.subst f produces v' from a lifted string of f a symbols, then v' is a lifting of
some v produced by f a.
ProducesG is the relation of single-step productions using only G-rules.
Equations
- g.ProducesG f u v = ∃ r ∈ g.subst_rules_g f, r.Rewrites u v
Instances For
ProducesF is the relation of single-step productions using only F-rules.
Equations
- g.ProducesF f u v = ∃ r ∈ g.subst_rules_f f, r.Rewrites u v
Instances For
DerivesG is the relation of derivations using only G-rules.
Equations
- g.DerivesG f u v = Relation.ReflTransGen (g.ProducesG f) u v
Instances For
DerivesF is the relation of derivations using only F-rules.
Equations
- g.DerivesF f u v = Relation.ReflTransGen (g.ProducesF f) u v
Instances For
If we have a sequence of F-productions followed by a sequence of G-productions, we can move all
F-productions to the end. This is a corollary of derives_commute_of_not_mem_output:
F-rules output only Sum.inr nonterminals, so they never introduce the Sum.inl inputs that
G-rules target.
A production in the substitution grammar is either a G-production or an F-production.
Any derivation in the substitution grammar can be rearranged into a sequence of G-rules followed by a sequence of F-rules.
If f a produces v from u, then g.subst f produces the lifted v from the lifted u
using an F-rule.
If g.subst f produces v' from a lifted u via a G-rule, then v' is a lifting of some v
produced by g from u.
If g.subst f derives v' from a lifted u using only G-rules, then v' is a lifting of some
v derived by g from u.
If the lifted string has no Sum.inl non-terminals, then the original string consists only of
terminals.
If g.subst f produces v' from a lifted u (from component a) via an F-rule, then v' is
a lifting of some v produced by f a from u.
If g.subst f derives v' from a lifted u (from component a) using only F-rules, then v'
is a lifting of some v derived by f a from u.
If an F-production results in a string with no Sum.inl non-terminals, then the input string
also had no Sum.inl non-terminals.
The substitution of the languages is a subset of the language of the substitution grammar.
If an F-derivation results in a string with no Sum.inl non-terminals, then the input string also
had no Sum.inl non-terminals.
If ProducesF transforms u ++ v to w, then the transformation occurs entirely within u
or entirely within v.
If DerivesF transforms u ++ v to w, then w splits into u' and v' such that u
derives u' and v derives v' using only F-rules.
If u derives w using F-rules, then w can be split into parts corresponding to each symbol
in u.
If the lifted start symbol of f a F-derives a lifted terminal string w,
then w ∈ (f a).language.
If lifted start symbols of u F-derive a terminal string w, then w is in the product of the
languages (f a).language for each a ∈ u.
The language of the substitution grammar is a subset of the substitution of the languages.
The language of the substitution grammar is exactly the substitution of the languages of the component grammars. This proves that context-free languages are closed under substitution.
Context-free languages are closed under substitution.
Main corollaries: Closure under concatenation, union, and Kleene star #
Singleton language is context-free
The universal language over Unit is context-free
Context free languages are closed under concatenation / multiplication
Context free languages are closed under addition / union
Context free languages are closed under the Kleene Star operation