2010 October 20 / |j|d|a|v|i|s|@|c|a|r|l|e|t|o|n|.|e|d|u
Carleton College CS 201, Prof. Joshua R. Davis
The purpose of this assignment is to understand hash functions, the resolution of collisions using chains, and dictionaries in general. You may complete this assignment with one partner. It is due Tuesday at 11:59 PM.
Dictionary
In class we have created an IntDictionary
class that behaves much like Python's built-in dictionary, except that it accepts only integer keys. In this section we will use hashing and chaining to overcome that limitation. That is, we'll build a Dictionary
class that supports keys of type int
, float
, str
, and nested tuple
s of those. For example, ("jane", ((3.12, "jane"), 2))
will be a legitimate key. In the end, Dictionary
will be similar to, but not identical to, Python's built-in dictionary type. Here are Dictionary
's methods.
def __init__(self): """Initializes an empty dictionary.""" def __len__(self): """Returns the number of key-value pairs in the dictionary.""" def __delitem__(self, key): """Deletes the key-value pair. If the key is not in the dictionary, then does nothing.""" def __setitem__(self, key, value): """If the key is not already in the dictionary, then puts the key-value pair into the dictionary. If the key is already in the dictionary, then changes its associated value. Returns nothing.""" def __getitem__(self, key): """Returns the value associated to the given key, or None if the key is not present.""" def getFirstKey(self): """The dictionary maintains its keys in a certain order (about which the user should make no assumptions). If the dictionary is empty, then this method returns None. Otherwise, this method returns the first key in the dictionary, according to the dictionary's key order. This key can be used to start a chain of calls to getNextKey, that traverse all of the keys in the dictionary. (The user should not add or remove key-value pairs during the traversal.)""" def getNextKey(self, key): """The dictionary maintains its keys in a certain order (about which the user should make no assumptions). The given key should be a key from the dictionary, typically obtained from getFirstKey or getNextKey. If there is no key after the given key, according to the dictionary's order, then this method returns None. Otherwise, this method returns the next key after the given key. In combination with getFirstKey, this method lets the user traverse all of the keys in the dictionary. (The user should not add or remove key-value pairs during the traversal.)""" def __str__(self): """Returns some string representation of the dictionary. May be implementation-dependent.""" def getCost(self): """Returns some measure of the current worst-case cost of accessing items in the dictionary. May be implementation-dependent.""" def hashKey(self, key): """Private method. The input key is an int, float, string, or nested tuple of those. Returns an integer that can be used as the key to an integer-key dictionary."""
In a new file dictionary_bstn.py
, declare the Dictionary
class and implement these methods. I have some specific requirements.
Dictionary
by hashing all keys to integers (in hashKey
), using those integers as keys into an IntDictionary
, and resolving hash collisions using chaining.
Dictionary
as a subclass of IntDictionary
; however, I think it's better to declare Dictionary
as a subclass of object
, and give it an IntDictionary
as an instance variable. We'll talk about this in class.
__str__
and getCost
by just passing these messages along to the underlying IntDictionary
. In some sense this is bad software design — we're not hiding the implementation from the user — but for educational purposes these methods are more valuable if they reveal the underlying implementation, than if they hide it. (Update: I made a mistake here. The getCost
function should really include both the height of the tree and the length of the chains in the tree. However, I didn't catch this error early enough, so proceed with getCost
as described above; it doesn't matter much.)
A matrix is a rectangular grid of numbers. Matrices are used to do linear algebra calculations, and linear algebra is used heavily in many scientific and technological fields, so computing with matrices is important. What kinds of data structure could you use to store a matrix in a computer program? Here are two ideas. First, you could store the matrix as a list of lists of numbers. For example, a 3 by 5 matrix could be stored as a list of 3 lists, each of which consists of 5 numbers:
Second, you could use a dictionary to store just the matrix entries that are nonzero. For example,matrix = [[4.1, 2.0, -1.1, 0.0, 0.5], [-1.6, 0.0, 1.6, 3.3, 0.0], [0.0, 0.0, 1.1, 1.5, -1.0]]
Which is better? It depends on the problem. If you're working with small matrices, then the list-of-lists representation is probably simpler and faster. On the other hand, if you're working with large matrices that happen to have a lot of zeros, then the dictionary representation may take much less room than the list-of-lists representation.matrix = {(0, 0):4.1, (0, 1):2.0, (0, 2):-1.1, (0, 4): 0.5, (1, 0):-1.6, (1, 2):1.6, ...}
For example, Google's page-ranking algorithm uses matrices that are roughly 1,000,000,000 by 1,000,000,000, but with only maybe 100 nonzero entries in each row. That is, only about 0.00001% of the entries are nonzero. The list-of-lists representation requires storing all 1,000,000,000,000,000,000 entries, while the dictionary representation requires storing only about 0.00003% as many numbers (three numbers per nonzero entry).
Large matrices with mostly zero entries are called sparse matrices. What we've learned is that the dictionary representation of sparse matrices uses far less space than the list-of-lists representation. For the kinds of calculations that one does with matrices — which we won't get into, because this isn't a linear algebra course — the dictionary representation is often also much faster for sparse matrices.
Download the latest version of Bopagopa from my web site. Inside the demos
folder you'll find a file gameoflife.py
. Run this program; you'll get a black screen. Use the right mouse button to select the "Acorn"; now there are some white squares on the black background. Press the space bar; the pattern of white squares starts changing.
You're looking at Conway's Game of Life. It's not really a game, although it is kind of fun. The "game" is played on an infinitely large two-dimensional grid of squares. At any given time, each square is either holding a living cell or not. The cell either survives to the next generation or dies off, depending on the cells in the eight squares around it. Also, new cells are born in some squares, depending on the cells nearby. Here are the precise rules:
Read the instructions at the top of gameoflife.py
to learn about the user interface, and then play around a bit while pondering this question: What does Conway's Game of Life have to do with sparse matrices?
Dictionary
The gameoflife.py
program uses Python dictionaries to store sparse matrices. In this part of the assignment, you are asked to edit gameoflife.py
to replace all uses of Python's dictionary with your Dictionary
. The purpose of this exercise is to make sure you know how to use Dictionary
and Python's dictionary, to make sure you recognize their similarities and differences, and to stress-test your Dictionary
for bugs.
This shouldn't be too difficult or time-consuming. There are only five chunks of code in gameoflife.py
that touch dictionaries: GameOfLife
's methods __init__
, activateCell
, setCell
, incrementDictionary
, and performStep
. You shouldn't need to change or add more than 10 lines of code. You should not need to alter Dictionary
at all, unless you discover bugs in it.
After testing them thoroughly, submit your two files dictionary_bstn.py
and gameoflife.py
electronically as usual. They will be graded according to these criteria:
Dictionary
works (6 points).