In formal language theory, a contextfree language (CFL) is a language generated by some contextfree grammar (CFG). Different CF grammars can generate the same CF language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties).
The set of all contextfree languages is identical to the set of languages accepted by pushdown automata, which makes these languages amenable to parsing. Indeed, given a CFG, there is a direct way to produce a pushdown automaton for the grammar (and corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.
Contextfree languages have many applications in programming languages; for example, the language of all properly matched parentheses is generated by the grammar S\to SS ~~ (S) ~~ \varepsilon. Also, most arithmetic expressions are generated by contextfree grammars.
Contents

Examples 1

Languages that are not contextfree 2

Closure properties 3

Nonclosure under intersection and complement and difference 3.1

Decidability properties 4

Parsing 5

See also 6

Notes 7

References 8
Examples
An archetypal contextfree language is L = \{a^nb^n:n\geq1\}, the language of all nonempty evenlength strings, the entire first halves of which are a's, and the entire second halves of which are b's. L is generated by the grammar S\to aSb ~~ ab. This language is not regular. It is accepted by the pushdown automaton M=(\{q_0,q_1,q_f\}, \{a,b\}, \{a,z\}, \delta, q_0, z, \{q_f\}) where \delta is defined as follows:^{[note 1]}
\delta(q_0, a, z) = (q_0, az)
\delta(q_0, a, a) = (q_0, aa)
\delta(q_0, b, a) = (q_1, \varepsilon)
\delta(q_1, b, a) = (q_1, \varepsilon)
Unambiguous CFLs are a proper subset of all CFLs: there are inherently ambiguous CFLs. An example of an inherently ambiguous CFL is the union of \{a^n b^m c^m d^n  n, m > 0\} with \{a^n b^n c^m d^m  n, m > 0\}. This set is contextfree, since the union of two contextfree languages is always contextfree. But there is no way to unambiguously parse strings in the (noncontextfree) subset \{a^n b^n c^n d^n  n > 0\} which is the intersection of these two languages.
Languages that are not contextfree
The set \{a^n b^n c^n d^n  n > 0\} is a contextsensitive language, but there does not exist a contextfree grammar generating this language. So there exist contextsensitive languages which are not contextfree. To prove that a given language is not contextfree, one may employ the pumping lemma for contextfree languages^{[3]} or a number of other methods, such as Ogden's lemma or Parikh's theorem.^{[4]}
Closure properties
Contextfree languages are closed under the following operations. That is, if L and P are contextfree languages, the following languages are contextfree as well:
Contextfree languages are not closed under complement, intersection, or difference. However, if L is a contextfree language and D is a regular language then both their intersection L\cap D and their difference L\setminus D are contextfree languages.
Nonclosure under intersection and complement and difference
The contextfree languages are not closed under intersection. This can be seen by taking the languages A = \{a^n b^n c^m \mid m, n \geq 0 \} and B = \{a^m b^n c^n \mid m,n \geq 0\}, which are both contextfree.^{[note 2]} Their intersection is A \cap B = \{ a^n b^n c^n \mid n \geq 0\}, which can be shown to be noncontextfree by the pumping lemma for contextfree languages.
Contextfree languages are also not closed under complementation, as for any languages A and B: A \cap B = \overline{\overline{A} \cup \overline{B}} .
Contextfree language are also not closed under difference: L^{C} = Σ^{*} \ L
Decidability properties
The following problems are undecidable for arbitrary contextfree grammars A and B:

Equivalence: Given two contextfree grammars A and B, is L(A)=L(B)?

Intersection Emptiness: Given two contextfree grammars A and B, is L(A) \cap L(B) = \emptyset ? However, the intersection of a contextfree language and a regular language is contextfree,^{[5]} and the variant of the problem where B is a regular grammar is decidable.

Containment: Given a contextfree grammar A, is L(A) \subseteq L(B) ? Again, the variant of the problem where B is a regular grammar is decidable.

Universality: Given a contextfree grammar A, is L(A)=\Sigma^* ?
The following problems are decidable for arbitrary contextfree languages:

Emptiness: Given a contextfree grammar A, is L(A) = \emptyset ?

Finiteness: Given a contextfree grammar A, is L(A) finite?

Membership: Given a contextfree grammar G, and a word w, does w \in L(G) ? Efficient polynomialtime algorithms for the membership problem are the CYK algorithm and Earley's Algorithm.
According to Hopcroft, Motwani, Ullman (2003),^{[6]} many of the fundamental closure and (un)decidability properties of contextfree languages were shown in the 1961 paper of BarHillel, Perles, and Shamir^{[3]}
Parsing
Determining an instance of the membership problem; i.e. given a string w, determine whether w \in L(G) where L is the language generated by a given grammar G; is also known as recognition. Contextfree recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to boolean matrix multiplication, thus inheriting its complexity upper bound of O(n^{2.3728639}).^{[7]}^{[8]}^{[note 3]} Conversely, Lillian Lee has shown O(n^{3ε}) boolean matrix multiplication to be reducible to O(n^{33ε}) CFG parsing, thus establishing some kind of lower bound for the latter.^{[9]}
Practical uses of contextfree languages require also to produce a derivation tree that exhibits the structure that the grammar associates with the given string. The process of producing this tree is called parsing. Known parsers have a time complexity that is cubic in the size of the string that is parsed.
Formally, the set of all contextfree languages is identical to the set of languages accepted by pushdown automata (PDA). Parser algorithms for contextfree languages include the CYK algorithm and the Earley's Algorithm.
A special subclass of contextfree languages are the deterministic contextfree languages which are defined as the set of languages accepted by a deterministic pushdown automaton and can be parsed by a LR(k) parser.^{[10]}
See also parsing expression grammar as an alternative approach to grammar and parser.
See also
Notes

^ meaning of \delta's arguments and results: \delta(\mathrm{state}_1, \mathrm{read}, \mathrm{pop}) = (\mathrm{state}_2, \mathrm{push})

^ A contextfree grammar for the language A is given by the following production rules, taking S as the start symbol: S → Sc  aTb  ε; T → aTb  ε. The grammar for B is analogous.

^ In Valiant's papers, O(n^{2.81}) given, the then best known upper bound. See Matrix multiplication#Algorithms for efficient matrix multiplication and Coppersmith–Winograd algorithm for bound improvements since then.
References

^ ^{a} ^{b} Yehoshua BarHillel, Micha Asher Perles, Eli Shamir (1961). "On Formal Properties of Simple PhraseStructure Grammars". Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14 (2): 143–172.

^ How to prove that a language is not contextfree?

^ Salomaa (1973), p. 59, Theorem 6.7

^ John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman (2003). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. Here: Sect.7.6, p.304, and Sect.9.7, p.411

^ Leslie Valiant (Jan 1974). General contextfree recognition in less than cubic time (Technical report). Carnegie Mellon University. p. 11.

^ Leslie G. Valiant (1975). "General contextfree recognition in less than cubic time". Journal of Computer and System Sciences 10 (2): 308–315.

^ Lillian Lee (2002). "Fast ContextFree Grammar Parsing Requires Fast Boolean Matrix Multiplication". JACM 49 (1): 1–15.

^


Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation (1st ed.). AddisonWesley.

Arto Salomaa (1973). Formal Languages. ACM Monograph Series.

Chapter 2: ContextFree Languages, pp. 91–122.

JeanMichel Autebert, Jean Berstel, Luc Boasson, ContextFree Languages and PushDown Automata, in: G. Rozenberg, A. Salomaa (eds.), Handbook of Formal Languages, Vol. 1, SpringerVerlag, 1997, 111174.




Each category of languages, except those marked by a ^{*}, is a proper subset of the category directly above it. Any language in each category is generated by a grammar and by an automaton in the category in the same line.


This article was sourced from Creative Commons AttributionShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, EGovernment Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a nonprofit organization.