2012 April 12,

CS 254: Assignment G

Carleton College, Spring 2012

Prof. Joshua R. Davis, , Goodsell 106B, x4473

This assignment is due Friday at noon. You will hand in Problem A electronically, and Problems B-D on paper.

Problems

A. Do the Interpreter mini-assignment described below.

B. Write a context-free grammar for the language described in the Interpreter section below. Your terminal alphabet should be all characters that can be typed reasonably on a keyboard, including white space and parentheses.

C. Do Problem 2.26.

D. Let A = {aⁱb^jc^k : i, j, k ≥ 0}. Let B = {aⁿbⁿcⁿ : n ≥ 0}. Construct a PDA for A - B.

Interpreter

In this problem, we're going to implement a simple programming language together. I've already written the evaluator and the user interface; see interpreter.py. You're going to write the scanner and parser. The language has an extremely simple syntax. A valid expression is either an atom or a list of expressions.

An atom is any sequence of (one or more) characters, not including any parentheses or white space ((, ), \t, \r, \n, and the space ). Any atom that can be interpreted as a floating-point number (by Python's underlying float cast) is interpreted as such; all other atoms are keywords or available variable names.
A list of expressions is enclosed in parentheses. The first expression should evaluate to a function (or keyword). The remaining expressions are arguments to that function, and the value of the expression is the output value of the function.

The language has only two data types: floating-point numbers, and functions. The user can add and multiply floating-point numbers using the + and * functions. The user can construct her own functions using the fun keyword. The user can assign values to variables using the set keyword. Functions are "first class" in this language; they can be assigned to variables, they can be arguments to other functions, and they can be returned by other functions.

Here's a transcript. The user input lines begin with the prompt :>. For each user input, the interpreter prints its response, or an error message. The user may enter more than one expression on a single input line.

Desktop > python interpreter.py 
:> (set x 4)
4.0
:> (* 3 x)
12.0
:> (set mult3 (fun (x) (* 3 x)))
(fun (x) (* 3 x))
:> mult3
(fun (x) (* 3 x))
:> (mult3 4)
12.0
:> (set y (mult3 4))
12.0
:> y
12.0
:> ((fun (x) (* 3 x)) 5)
15.0
:> (((fun (y) (* x y)) 4) 2)
error: function: 16.0
:> (((fun (y) (fun (x) (* x y))) 7) 9)
63.0
:> (set applybackwards (fun (x f) (f x)))
(fun (x f) (f x))
:> (applybackwards -2.1 (fun (x) (+ x 5)))
2.9
:> (set - (fun (x y) (+ x (* -1.0 y))))
(fun (x y) (+ x (* -1.0 y)))
:> (- 5 2)
3.0
:> (* 15 40) (- 100 7) x
600.0
93.0
4.0
:> (- x y
error: syntax: input ended unexpectedly
:>

This language looks a lot like Lisp and its derivatives, but it is definitely not Lisp. Knowing Lisp may help you or hinder you, in understanding this language.

Your job is to write the scanner tokenList and the parser parseTree. Stubs of these functions can already be found in interpreter.py.

The tokenList function takes as input a single line of user input, as a string. It outputs a list of strings, namely all of the tokens in the user input. In our language, there are three kinds of tokens: left parentheses, right parentheses, and atoms. The gathering of atoms is done greedily; each atom is made as large as possible. In other words, it is never the case that two atoms abut each other in the input, with no intervening parentheses or white space. Implement tokenList using a regular expression; my solution is one line of code.

The parseTree function takes as input a list of tokens. If the parsing fails, then it prints an informative error message and returns None. If the parsing succeeds, then it outputs a TreeNode representing the root of the parse tree. The TreeNode class is already defined in interpreter.py. It is a simple tree data structure, where each child node possesses a pointer back to its parent. The parent of the root is None. Each node can store an arbitrary Python object as its data "payload". All of the leaf nodes in the parse tree have strings as their data; these strings are exactly the atoms from the token list. The branch nodes' data objects are ignored by the rest of the interpreter, so you may configure them as you like.

The children of the root of the parse tree correspond to the expressions that the user types on the input line. Otherwise, the structure of the parse tree represents the nesting of the parentheses in each expression. For example, the input line (+ x 14) is parsed into

        ROOTNODE
        |
BRANCHNODE
|   |   |
+   x   14

while the input line ((fun (x) (* x x)) 3) (set y 4) is parsed into

             ROOTNODE
             |      |
    BRANCHNODE      BRANCHNODE
    |        |       |   |   |
BRANCHNODE   3      set  y   4
 |   |   |
fun  BR  BRANCHNODE
     |    |   |   |
     x    *   x   x

The parser does not need to know or check any details of the semantics of the language, such as how many arguments a given function or keyword takes. That check is performed later, by the evaluator. My solution to the parseTree function is 16 lines of code. It does not explicitly use context-free grammars, pushdown automata, or stacks. But the ideas are quite similar to the context-free language ideas that we've been exploring in class.

Once you complete tokenList and parseTree, test the interpreter thoroughly. In particular, it should work on all of the example code given above. (Let me know if you find errors or bad behavior. But I already know that there should be several more error checks in the evaluator.) Submit your work electronically, by mounting the COURSES file server and dropping your edited interpreter.py file in your CS 254 handin folder.