2009 May 11 / |j|d|a|v|i|s|@|c|a|r|l|e|t|o|n|.|e|d|u

Assignment 6

Carleton College CS 254, Prof. Joshua R. Davis

There are three questions on this assignment. They are due Friday May 15 at 5:00 PM. Hand in problems A and B on paper; hand in problem C electronically, either by hsp or by putting your file on COURSES manually.

A. Prove that if A is a CFL and R is a regular language, then their intersection is a CFL. (This is page 308, problem 2.)

B. Prove that the set {a^p : p is prime} is not a CFL. (This is page 335, problem 82.)

C. I want to implement a certain programming language. It is essentially a simple calculator, like this:

Press Control-D to exit.
>> x = 2.718
   2.718
>> 13 + x
   15.718
>> x = 3 * 2 ^ 4
   48.0
>> x
   48.0
>> jim = x / 2
   24.0
>> 17 - jim * 2.43
   -41.32

The interpreter for my programming language performs the following steps in a never-ending loop.

Read a string of input from the user.
Scan the input string into a list of tokens, according to the word-formation rules of the language.
Parse the list of tokens into a tree, according to the grammar of the language.
Evaluate the parse tree, using the environment to look up the values of any variables.
Print the result for the user, or an error message.

(Interpreted languages, such as Python and Lisp, work just like this. Compiled languages, such as C and Java, work similarly.) I have written the evaluator and the user interface; I want you to write the scanner and parser. Download stack.py, binarytree.py, and interpreter.py to a single folder on your computer. You will write your code in interpreter.py; do not edit or hand in the other files.

Here are the scanning rules of our language. Word breaks occur at spaces, at parentheses, and at the operators, which are +, *, -, /, ^, and =. For example, the string

'jan = 3.14xy + (3 - rewgy3-1)'

scans to a token list of

['jan', '=', '3.14xy', '+', '(', '3', '-', 'rewgy3', '-', '1', ')']

Write a Python function scan() that takes in an input string and returns the list of tokens (which are themselves strings). Use a regular expression.

Here are the parsing rules of our language. All of the operators are binary, meaning that they take two arguments. I leave it to you, to figure out what their precedences should be. Write a Python function parse() that takes in a token list, parses using our parsing algorithm from class (with precedence ties broken to the left), and returns a parse tree. The nodes of the parse tree should store tokens — that is, strings — not the values associated to those strings. For example, 3 + 5 parses to the tree

[+, [3, [], []], [5, [], []]]

where +, 3, and 5 are strings, not numbers or functions.

Once you have written scan() and parse() you should have a working interpreter. That is, they should connect perfectly to the evaluator and user interface. Test your interpreter thoroughly. Also make sure that the code is clean, concise, and appropriately commented.