2008 September 29 / jdavis@carleton.edu
Carleton College CS 201, Prof. Joshua R. Davis
In this assignment you will use stacks to check whether an XML document is well-formed. You will also investigate the role of stacks in general debugging.
Download the following files to a single folder on your computer.
Examine the playlist.xml file in Smultron. This is the XML description of an actual iTunes playlist. The first two lines are special. The first line indicates that this is an XML file and names the intended version of XML. The second line describes what kind of XML document it is, using a document type definition (DTD) stored on an Apple web server. On this assignment we're going to ignore these two special lines.
After the first two lines, the file takes on a rigid, "stacky" structure, as we've discussed in class. It is a nest of opening and closing tags. The closing tags begin with </, while the opening tags do not have a / at all. Each closing tag matches the most-recently-opened-but-not-yet-closed tag.
There is actually one more kind of tag: an empty tag, which ends in />. An empty tag opens and immediately closes itself. The only example in this file is <true/>; it is equivalent to <true></true>.
In order to determine whether an XML file is well-formed, you don't need to know what the tags actually mean; you just need to check that they are correctly nested. That said, the tags here should be saying something to you. Notice that there is a <dict> tag that contains <value> tags. Where have we seen this sort of thing before, in our course?
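The three kinds of tag described above can be told apart purely by where the slash sits. The following is a minimal sketch, not part of xmlchecker.py; the function name kind_of_tag is hypothetical, and it assumes each tag is given as a string that still has its angle brackets.

```python
def kind_of_tag(tag):
    """Classify a tag such as '<dict>', '</dict>', or '<true/>'.

    Hypothetical helper for illustration; assumes the angle brackets
    are still attached to the tag string.
    """
    inner = tag.strip()[1:-1]      # drop the leading '<' and trailing '>'
    if inner.startswith("/"):
        return "closing"           # e.g. </dict>
    if inner.endswith("/"):
        return "empty"             # e.g. <true/>, opens and closes itself
    return "opening"               # e.g. <dict>


for tag in ["<dict>", "</dict>", "<true/>"]:
    print(tag, kind_of_tag(tag))
```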
xmlchecker.py already contains two helper functions: tagFromString() and tagsForXMLFileName(). Read them carefully; if necessary, insert some test code into main() to try them out.
tagFromString() is commented once to explain what it does. To understand how it does it, you'll probably need to look up the Python string method partition(). Nonetheless, this function is so short, and its variable names are so descriptive, that it probably doesn't need any more comments. (On the other hand, if you think that it does, then feel free to add them.)
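As a quick reminder of how partition() behaves, here is a small standalone example. It is not the body of tagFromString(); the sample strings are made up for illustration.

```python
# partition() splits a string at the FIRST occurrence of a separator and
# always returns three pieces: (before, separator, after).
text = 'plist version="1.0"'
before, separator, after = text.partition(" ")
print(before)   # the part before the first space
print(after)    # everything after the first space

# If the separator never occurs, the whole string lands in the first
# piece and the other two pieces are empty strings.
head, sep, tail = "dict".partition(" ")
print((head, sep, tail))
```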
tagsForXMLFileName() is commented once to explain what it does, but the function is long enough and complicated enough that it deserves more comments. Edit this function to add a comment before every line, explaining what that line does. You may need to look up certain Python string methods; you may also need to look up the concept of "list comprehensions" in Python. (Ordinarily having a comment on every line is overkill, but do comment every line in this function so that I'm sure you understand it.)
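If list comprehensions are new to you, the sketch below shows one next to the equivalent loop. It is only an illustration of the syntax on a made-up string; it is not the code in tagsForXMLFileName(), and the way it represents tags (names without angle brackets) is an assumption.

```python
text = "<dict><key>Name</key><true/></dict>"

# Splitting on '<' leaves each tag at the start of its own piece.
pieces = text.split("<")

# List comprehension: keep the text before '>' from every piece that has one.
tags = [piece.partition(">")[0] for piece in pieces if ">" in piece]

# The same computation written as an ordinary loop.
tags_from_loop = []
for piece in pieces:
    if ">" in piece:
        tags_from_loop.append(piece.partition(">")[0])

print(tags)
print(tags == tags_from_loop)
```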
In main() there is a spot for you to insert some code. Here, write code that uses a stack to check whether the given XML file is well-formed, as we've discussed in class. There are two ways that the XML file could fail to be well-formed: a tag could be closed incorrectly (with the wrong closing tag) or the file could end with one or more tags left open. In either case, your program should print out an error message that describes exactly how the XML file is ill-formed; it should also print out the tags that weren't closed correctly, so that the user has some idea of what went wrong. This is called "debugging information". You will want to create multiple versions of playlist.xml, with different errors in them, so that you can test your program.
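The stack idea can be sketched as follows. This is one possible outline under stated assumptions, not the reference solution: the function name check_nesting is hypothetical, a plain Python list stands in for the stack, and tags are assumed to arrive as strings such as "dict", "/dict", and "true/". The tags that tagsForXMLFileName() actually returns may look different, so check before adapting anything.

```python
def check_nesting(tags):
    """Check nesting of a list of tag strings (hypothetical helper).

    Returns a tuple (status, offending_tag, open_tags), where open_tags
    lists the still-open tags from the inside out.
    """
    stack = []                              # a list used as a stack
    for tag in tags:
        if tag.endswith("/"):
            continue                        # empty tag: opens and closes itself
        if tag.startswith("/"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1] != tag[1:]:
                return ("mismatch", tag, stack[::-1])
            stack.pop()
        else:
            stack.append(tag)               # opening tag: remember it
    if stack:
        return ("left open", None, stack[::-1])
    return ("ok", None, [])


print(check_nesting(["plist", "dict", "true/", "/dict", "/plist"]))
print(check_nesting(["plist", "dict", "integer", "/sminteger"]))
print(check_nesting(["plist", "dict"]))
```

Returning the reversed stack is what makes the "from the inside out" debugging listing possible: the top of the stack is the innermost open tag.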
For example, here's what my program does when I hand it a well-formed XML file:
    jdavis$ python xmlchecker.py
    The file playlist.xml correctly nests XML tags.
Here's what my program does with an XML file that closes tags incorrectly:
    $ python xmlchecker.py
    The file playlistincorrect.xml does not correctly nest XML tags.
    The tag /sminteger was found where the following tags were expected to be closed (from the inside out):
    integer dict dict dict plist
Here's what my program does when I hand it an XML file that leaves tags open:
    jdavis$ python xmlchecker.py
    The file playlistleftopen.xml leaves some XML tags open; from the inside out, they are:
    dict dict plist
Read sillypayroll.py. We've discussed this code in class before. Execute sillypayroll.py. When it performs calculateBonus(50000, 16, 6.0), Python returns 0.30487804878. (Don't worry too much about what this number means; it's a silly calculation.) When it performs calculateBonus(50000, 16, "jimmy"), Python raises an error. Do this yourself, and figure out where and why the error occurred. Does Python's error message help you locate the error in the code?
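To look at an error message more closely, you can capture it in code instead of letting the program crash. The sketch below is written for Python 3 and uses made-up functions (divide_share, compute_bonus, frames_of_failure), not the ones in sillypayroll.py; it records the names of the functions that were active when a TypeError occurred, in the order the traceback lists them.

```python
import traceback


def divide_share(amount, parts):
    # Raises TypeError when parts is a string rather than a number.
    return amount / parts


def compute_bonus(salary, years, rating):
    # Hypothetical stand-in for a payroll calculation.
    return divide_share(salary * years, rating)


def frames_of_failure(func, *args):
    """Call func(*args); if it raises TypeError, return the names of the
    functions in the traceback, outermost call first."""
    try:
        func(*args)
    except TypeError as error:
        frames = traceback.extract_tb(error.__traceback__)
        return [frame.name for frame in frames]
    return []


names = frames_of_failure(compute_bonus, 50000, 16, "jimmy")
print(names)
```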
Compare the error message here to the ones your XML checker generates for ill-formed XML. How are they similar? Why? How are they different? Discuss. Save your responses to these prompts in a text file called discussion.txt.
By Saturday at 11:59 PM, submit your xmlchecker.py and discussion.txt files using
    hsp xmlchecker.py cs201-00-f08
    hsp discussion.txt cs201-00-f08
Don't forget to put your name at the top of each one. They will be graded based on these criteria:
tagsForXMLFileName() is commented as described above. (3 points)
main() works correctly on well-formed XML. (3 points)
main() works correctly when given XML with tags incorrectly closed. (3 points)
main() works correctly when given XML with tags left open. (3 points)
main() is clear and appropriately commented. (1 point)
The discussion of the calculateBonus() error in discussion.txt is insightful, thorough, well-written, and concise. (3 points)