2008 September 29 / jdavis@carleton.edu

Checking XML Well-Formedness

Carleton College CS 201, Prof. Joshua R. Davis

In this assignment you will use stacks to check whether an XML document is well-formed. You will also investigate the role that stacks play in debugging more generally.

0. Prepare Yourself

Download the following files to a single folder on your computer: playlist.xml, xmlchecker.py, and sillypayroll.py.

1. Acquaint Yourself With playlist.xml

Examine the playlist.xml file in Smultron. This is the XML description of an actual iTunes playlist. The first two lines are special. The first line indicates that this is an XML file and names the intended version of XML. The second line describes what kind of XML document it is, using a document type definition (DTD) stored on an Apple web server. In this assignment we're going to ignore these two special lines.

After the first two lines, the file takes on a rigid, "stacky" structure, as we've discussed in class: it is a nest of opening and closing tags. Closing tags begin with / (as in </dict>), while opening tags contain no / at all. Each closing tag must match the most recently opened tag that has not yet been closed.

There is actually one more kind of tag: an empty tag, which ends with /. An empty tag opens and immediately closes itself. The only example in this file is <true/>; it is equivalent to <true></true>.
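The three kinds of tags can be told apart by where the / appears. The sketch below assumes that each tag has already been reduced to the text between < and >, with any attributes removed (so <dict> becomes "dict", </dict> becomes "/dict", and <true/> becomes "true/"), which matches how tags are printed in the sample runs later in this assignment. The names tag_kind() and closes() are hypothetical and are not part of xmlchecker.py.

```python
def tag_kind(tag):
    """Classify a tag string such as "dict", "/dict", or "true/".

    Assumes the surrounding < and > (and any attributes) were already stripped.
    """
    if tag.startswith("/"):
        return "closing"   # e.g. "/dict" closes the most recently opened "dict"
    if tag.endswith("/"):
        return "empty"     # e.g. "true/" opens and immediately closes itself
    return "opening"       # e.g. "dict"


def closes(closing_tag, opening_tag):
    """Return True if closing_tag (e.g. "/dict") matches opening_tag (e.g. "dict")."""
    return closing_tag == "/" + opening_tag


for tag in ["plist", "dict", "true/", "/dict", "/plist"]:
    print(tag, tag_kind(tag))
```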

In order to determine whether an XML file is well-formed, you don't need to know what the tags actually mean; you just need to check that they are correctly nested. That said, the tags here should look familiar. Notice that there is a <dict> tag that contains <key> tags, each followed by a value tag such as <integer> or <string>. Where have we seen this sort of thing before in our course?

2. Examine The Helper Functions

xmlchecker.py already contains two helper functions: tagFromString() and tagsForXMLFileName(). Read them carefully; if necessary, insert some test code into main() to try them out.

tagFromString() has a single comment explaining what it does. To understand how it works, you'll probably need to look up the Python string method partition(). Nonetheless, this function is so short, and its variable names are so descriptive, that it probably doesn't need any more comments. (On the other hand, if you think that it does, then feel free to add them.)
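Here is a quick look at partition(), run on sample strings. The helper tag_text() at the end is hypothetical: it only illustrates how partition() can pull a tag out of a string, and it is not the tagFromString() in xmlchecker.py.

```python
# str.partition(sep) splits a string at the FIRST occurrence of sep and
# always returns a 3-tuple: (text before sep, sep itself, text after sep).
before, sep, after = "<key>Name</key>".partition(">")
print(before)  # <key
print(sep)     # >
print(after)   # Name</key>

# If sep does not occur, the whole string comes first, then two empty strings.
print("integer".partition(" "))  # ('integer', '', '')


def tag_text(s):
    """Hypothetical helper: the tag name between the first "<" and the next ">"."""
    _, _, rest = s.partition("<")        # drop everything up to the first "<"
    inside, _, _ = rest.partition(">")   # keep everything before the next ">"
    name, _, _ = inside.partition(" ")   # drop attributes such as version="1.0"
    return name


print(tag_text('<plist version="1.0">'))  # plist
```

Because partition() always returns three items, it can be unpacked directly without first checking whether the separator was found.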

Like tagFromString(), tagsForXMLFileName() has a single comment explaining what it does, but this function is long and complicated enough that it deserves more. Edit it to add a comment before every line, explaining what that line does. You may need to look up certain Python string methods, as well as the concept of "list comprehensions" in Python. (Ordinarily a comment on every line is overkill, but do comment every line in this function so that I'm sure you understand it.)
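As a reminder of how list comprehensions work, the sketch below builds the same list twice, once with an explicit loop and once with a comprehension, and then uses a comprehension with split() and partition() to pull tag names out of a small XML string. The sample text and the tag-extraction line are illustrative assumptions, not the code in tagsForXMLFileName().

```python
text = "<plist>\n  <dict>\n\n    <key>Name</key>\n  </dict>\n</plist>\n"

# The long way: an explicit loop that collects the non-blank lines, stripped.
lines = []
for piece in text.split("\n"):
    if piece.strip() != "":
        lines.append(piece.strip())

# The same list as a comprehension: [expression for item in iterable if condition]
lines_again = [piece.strip() for piece in text.split("\n") if piece.strip() != ""]
print(lines == lines_again)  # True
print(lines_again)  # ['<plist>', '<dict>', '<key>Name</key>', '</dict>', '</plist>']

# Hypothetical tag extraction: every "<" starts a tag, whose text runs up to
# the next ">". Slicing with [1:] drops whatever precedes the first "<".
tags = [piece.partition(">")[0] for piece in text.split("<")[1:]]
print(tags)  # ['plist', 'dict', 'key', '/key', '/dict', '/plist']
```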

3. Edit main()

In main() there is a spot for you to insert some code. Here, write code that uses a stack to check whether the given XML file is well-formed, as we've discussed in class. There are two ways that the XML file could fail to be well-formed: a tag could be closed incorrectly (a closing tag could fail to match the most recently opened tag, or could appear when no tag is open), or the file could end with one or more tags still open. In either case, your program should print an error message that describes exactly how the XML file is ill-formed; it should also print the tags that are still open, from the inside out, so that the user has some idea of what went wrong. This is called "debugging information". You will want to create multiple versions of playlist.xml, with different errors in them, so that you can test your program.
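A minimal sketch of the stack-based check follows. It assumes the tags arrive as a list of strings like those in the sample output ("dict", "/dict", "true/"); the name check_nesting() and its (status, bad_tag, open_tags) return convention are hypothetical, and your main() still has to turn the result into printed messages.

```python
def check_nesting(tags):
    """Check whether tag strings such as "dict", "/dict", and "true/" nest correctly.

    Returns a triple (status, bad_tag, open_tags):
      ("ok", None, [])              every tag was closed correctly
      ("mismatch", tag, open_tags)  tag did not close the most recently opened tag
      ("open", None, open_tags)     the tags ran out while open_tags were still open
    open_tags is listed from the inside out (top of the stack first).
    """
    stack = []
    for tag in tags:
        if tag.endswith("/"):
            # Empty tag such as "true/": it opens and closes itself, so skip it.
            continue
        if tag.startswith("/"):
            # Closing tag: it must match the tag on top of the stack.
            if stack and stack[-1] == tag[1:]:
                stack.pop()
            else:
                return ("mismatch", tag, stack[::-1])
        else:
            # Opening tag: push it until its closing tag shows up.
            stack.append(tag)
    if stack:
        return ("open", None, stack[::-1])
    return ("ok", None, [])


print(check_nesting(["plist", "dict", "key", "/key", "true/", "/dict", "/plist"]))
# ('ok', None, [])
print(check_nesting(["plist", "dict", "integer", "/sminteger"]))
# ('mismatch', '/sminteger', ['integer', 'dict', 'plist'])
print(check_nesting(["plist", "dict", "dict"]))
# ('open', None, ['dict', 'dict', 'plist'])
```

A closing tag that arrives when the stack is empty is reported as a mismatch with no open tags, which covers the "closed when nothing is open" case.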

For example, here's what my program does when I hand it a well-formed XML file:

jdavis$ python xmlchecker.py
The file playlist.xml correctly nests XML tags.

Here's what my program does with an XML file that closes tags incorrectly:

jdavis$ python xmlchecker.py
The file playlistincorrect.xml does not correctly nest XML tags.
The tag /sminteger was found where the following tags were expected 
to be closed (from the inside out):
integer
dict
dict
dict
plist

Here's what my program does when I hand it an XML file that leaves tags open:

jdavis$ python xmlchecker.py
The file playlistleftopen.xml leaves some XML tags open; from the 
inside out, they are:
dict
dict
plist
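One way to produce messages in the format of the sample runs above is sketched below. The helper name report() and its arguments (a status of "ok", "mismatch", or "open"; the offending closing tag; and the still-open tags listed from the inside out) are hypothetical and assume the stack check has already run. The sample runs wrap long lines, whereas this sketch keeps each sentence on one line.

```python
def report(filename, status, bad_tag, open_tags):
    """Return the checker's message as a string (hypothetical helper).

    status is "ok", "mismatch", or "open"; bad_tag is the offending closing
    tag (used only for "mismatch"); open_tags lists the still-open tags from
    the inside out (top of the stack first).
    """
    if status == "ok":
        return f"The file {filename} correctly nests XML tags."
    if status == "mismatch":
        lines = [
            f"The file {filename} does not correctly nest XML tags.",
            f"The tag {bad_tag} was found where the following tags were "
            "expected to be closed (from the inside out):",
        ]
    else:
        lines = [
            f"The file {filename} leaves some XML tags open; "
            "from the inside out, they are:"
        ]
    # One open tag per line, innermost first, as in the sample runs.
    lines.extend(open_tags)
    return "\n".join(lines)


print(report("playlistleftopen.xml", "open", None, ["dict", "dict", "plist"]))
```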

4. Discuss The Behavior Of calculateBonus()

Read sillypayroll.py, which we've discussed in class, and then execute it. When it makes the call

calculateBonus(50000, 16, 6.0)

the call returns 0.30487804878. (Don't worry too much about what this number means; it's a silly calculation.) When it makes the call

calculateBonus(50000, 16, "jimmy")

Python raises an error (an exception) instead. Do this yourself, and figure out where and why the error occurred. Does Python's error message help you locate the error in the code?
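To see what information a Python error message is built from, the sketch below triggers a similar error in a chain of function calls and asks the standard traceback module which calls were in progress. The functions hypothetical_bonus() and hypothetical_ratio() are illustrative stand-ins, not the code in sillypayroll.py; the only assumption carried over is that a string such as "jimmy" ends up in arithmetic that expects a number.

```python
import traceback


# Hypothetical stand-ins; these are NOT the functions in sillypayroll.py.
def hypothetical_ratio(years, rate):
    return years / rate  # raises TypeError when rate is a str such as "jimmy"


def hypothetical_bonus(salary, years, rate):
    return salary * hypothetical_ratio(years, rate)


def calls_leading_to_error():
    """Return the names of the calls that were in progress when the error occurred."""
    try:
        hypothetical_bonus(50000, 16, "jimmy")
    except TypeError as err:
        # Each FrameSummary records one call, listed from the outermost call
        # down to the innermost one (the call that raised the error).
        return [frame.name for frame in traceback.extract_tb(err.__traceback__)]
    return []


print(calls_leading_to_error())
# ['calls_leading_to_error', 'hypothetical_bonus', 'hypothetical_ratio']
```

This is the same ordering Python uses when it prints "Traceback (most recent call last):" for an uncaught error.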

Compare the error message here to the ones your XML checker generates for ill-formed XML. How are they similar? Why? How are they different? Discuss. Save your responses to these prompts in a text file called discussion.txt.

5. Submit Your Work Electronically

By Saturday at 11:59 PM, submit your xmlchecker.py and discussion.txt files using

hsp xmlchecker.py cs201-00-f08
hsp discussion.txt cs201-00-f08

Don't forget to put your name at the top of each one. They will be graded based on these criteria: