Context-sensitive languages and recursiveness (CS ⊆ Recursive, strictly)

Statement

Every context-sensitive language (CSL) is recursive (decidable): there is an algorithm that, given a word, always halts and correctly answers whether the word is in the language. As a class inclusion, CSRecursive.

Moreover the inclusion is strict: there is a recursive language that is not context-sensitive, so CSRecursive.

Nonemptiness of the alphabet is necessary for strictness — over an empty alphabet there are only two languages and both classes coincide.

Why CS ⊆ Recursive (the easy inclusion)

A context-sensitive grammar is non-contracting: every rule has a right-hand side at least as long as its left-hand side, so a derivation can never shrink. Therefore a word of length n can only be derived through sentential forms of length ≤ n, drawn from a finite alphabet — finitely many in all, bounding both how long and how many derivation sequences need to be examined.

So membership is decided by a terminating bounded derivation-sequence search, which is in fact primitive recursive in the word, giving a genuine computable membership predicate; a computable decider is recursive. This is exactly what makes the diagonal language below recursive.

In Lean

The inclusion CSRecursive:

The strictness CSRecursive:

Proof idea

The argument is the classical diagonal construction, made effective.

1. An effective enumeration of context-sensitive languages. A context-sensitive grammar is coded as data with its nonterminals fixed to (Code T = List (grule T ℕ) × ℕ), which is Primcodable. A word u is decoded to a grammar by its length (List.length is onto for a nonempty alphabet, so every coded grammar is hit). The language enumLang u is the set of words accepted by a bounded search over derivation sequences: replay candidate sequences of rule applications from the start symbol and check whether one yields exactly the target word, searching all sequences up to a length/position bound that is large enough for non-contracting grammars.

2. The oracle is computable. memOracle u v — decode u to a grammar, then run the bounded search of step 1 — is computable as a function of both arguments (memOracle_computable): the search is primitive recursive in the rule list treated as runtime data (primrec_applyRuleSeq_uniform, the project’s membership search), and decoding the index word is primitive recursive. The diagonal applies the oracle to the pair (v, v), so computability in the code and the word together is exactly what the construction consumes.

3. The bounded search is correct. It is sound (memCode_sound: a found sequence is a real derivation) and complete within the bound (memCode_complete). Completeness is the heart of the argument: take a shortest witnessing derivation sequence; if any intermediate sentential form repeated, splicing out the loop would give a shorter witness, so the forms are duplicate-free. For a non-contracting grammar every such form has length at most |v| over the finitely many symbols occurring in the grammar, so a counting bound (nodup_forms_card_bound) caps the number of distinct forms — hence the sequence length — within the search bound. The optional erasing rule S → ε is handled separately: the start symbol occurs only in the initial form, so that rule can fire at most once and never enlarges the workspace.

4. Coverage. Every context-sensitive language is enumLang u for some u (code_of_CS): relabel the finitely many nonterminals of a context-sensitive grammar to (preserving the language and context-sensitivity), then surjectivity of the word-to-code map finds an index.

5. Diagonalize. Let e : List T → Language T be the enumeration of step 4 (every context-sensitive language equals e u for some u) and let mem : List T → List T → Bool be the total computable oracle of step 2, satisfying mem u v = true ↔ v ∈ e u. Indexing languages by words allows the index u and the candidate word v to range over the same set, so the diagonal is well-typed. Define

\[D = \{\, v \mid v \notin e\,v \,\} \qquad (\text{in Lean: } f\,v := \texttt{cond (mem v v) false true},\ D = \{v \mid f\,v = \texttt{true}\}).\]

D is recursive. The map v ↦ mem v v is computable as the composition of mem with v ↦ (v, v), and Boolean negation is computable, so f is computable. Hence the characteristic predicate of D is computable, which is the definition of is_Recursive D.

D is not context-sensitive. Assume for contradiction that D is context-sensitive. By coverage (step 4) there exists u with e u = D. Then

\[u \in D \iff u \notin e\,u \iff u \notin D,\]

where the first equivalence is the definition of D instantiated at v = u and the second substitutes e u = D. The resulting equivalence u ∈ D ↔ u ∉ D is contradictory. Therefore no u satisfies e u = D, so D is not in the range of e, i.e. D is not context-sensitive.

Thus D is recursive but not context-sensitive, so the inclusion CSRecursive is proper: CSRecursive.

Why the argument does not extend to Recursive. The construction requires mem to be total and computable. This holds for the context-sensitive languages because each has a decidable membership problem and the deciders are uniformly computable from the (finite, enumerable) grammar codes. The analogous step for Recursive would require a computable enumeration of all recursive languages together with a uniform total membership oracle. No such enumeration exists: the recursive languages are not recursively enumerable, since the set of Turing machines that halt on every input is not recursively enumerable. The diagonal set defined from a partial oracle is still well-defined, but its characteristic function is not computable, so it is not recursive. This asymmetry — a uniform total decider exists for CS but not for Recursive — is why the diagonal construction separates CS from Recursive but cannot separate Recursive from RE.

Keywords / also known as

context-sensitive languages are recursive, CS ⊆ Recursive, every context-sensitive language is decidable, CSL decidability, type-1 languages are decidable, LBA languages are decidable; context-sensitive strictly contained in recursive, CSL proper subset of decidable languages, CS ⊊ Recursive, diagonalization over context-sensitive grammars, recursive but not context-sensitive language, NSPACE(n) ⊊ recursive, effective enumeration of context-sensitive grammars.

Formalized in Lean 4 with Mathlib, in Langlib.