_______________________ CONTEXT FREE GRAMMARS Leon Rische _______________________ [2020-05-11 Mon 16:15] Table of Contents _________________ Context free grammars are a concept originating in linguistics but widely used in (theoretical) computer science. For this section, I'm going to use a bunch of mathematical notation, but I'll try to explain everything as it comes along. A *set* is collection of elements where each element appears only once. The set of the number from one to four can be written as $\{1, 2, 3, 4\}$. An *alphabet* is a set with a finite number of elements (the natural numbers 1, 2, 3, ..., would be an example for an infinite set). A context free grammar has four components: 1. An alphabet $T$ of *terminals* (read: “things where the expansion terminates”) 2. An alphabet $N$ of *non-terminals* (read: “things that can expand into other things”) 3. A set $P$ of *production rules* 4. The *start symbol* $S \in N$, an element of the set of *non-terminals* It's important that the terminal and non-terminal alphabets are *disjoint*, i.e. there is no element that is part of both sets. Production rules have the form $N \to (N \cup T)^\star$. $(N \cup T)$ can be read as “Combine the sets $T$ and $N$”, $^\star$ is called *Kleene star* and can be read as “repeated zero or more times”. The part on the right of the $\to$ is called *right hand side*, or *RHS*, the part on the left *left hand side* or *LHS*. A grammar is expanded starting with the start symbol and then replacing each non-terminal with the RHS of one of its rules. Time for a simple example: - $T = \{a\}$ - $N = \{A\}$ - $P = \{A \to BBa, B \to bb\}$ - $S = A$ Starting from $S$, the grammar expands into $BBa$, then each $B$ expands into $bb$, so the end result is $bbbba$. There can be multiple rules with the same *right hand side*, so there are (possibly infinitely) many sequences a grammar can expand to. - $T = \{a\}$ - $N = \{A\}$ - $P = \{A \to Aa, A \to \varepsilon\}$ - $S = A$ $\varepsilon$ is the notation for an empty sequence, so when expanding $A \to \varepsilon$, the $A$ just disappears. Expanding this grammar, we have two options, either expand $A$ to $Aa$ or to $\varepsilon$. Expanding $Aa$ further, we have two options again, either expand it to $Aaa$ or to $\varepsilon a = a$. This can be repeated infinitely many times, so the grammar produces sequences of $a$ of any length. (Remember the Kleene star? We can use it to write that as $a^\star$ instead).