1. Statement

For every context-free language L

There is an integer n, such that

For every string $z$ in $L$ of length $\geq n$

There exists $z = u\blue{v}\orange{w}\blue{x}y$ such that:

  1. $|\blue{v}\orange{w}\blue{x}| \leq n$.
  2. $|\blue{v}\blue{x}| > 0$.
  3. For all i > 0, $u\blue{v^i}\orange{w}\blue{x^i} y$ is in L.

2. Proof

<aside> 📖 Theorem (on book)

Suppose we have a parse tree according to a CNF grammar $G = (V,T,P,S )$, and suppose that the yield of the tree is a terminal string $w$.

If the length of the longest path is $n$, then $|w| \leq 2^{n-1}$.

Start with a CNF grammar for $L – \{ε\}$.

Let the grammar have $m$ variables. Pick $n = 2^m$.

Let $z$, of length $\geq n$, be in $L$.

We claim Lemma 1:

a parse tree with yield $z$ must have a path of length $m+2$ or more.

Untitled

Now we know that the parse tree for $z$ has a path with at least $m+1$ variables.

Consider some longest path. There are only $m$ different variables, so among the lowest $m+1$ we can find two nodes with the same label, say $A$.

The parse tree thus looks like:

Untitled

Untitled

Example