2010 October 20 / |j|d|a|v|i|s|@|c|a|r|l|e|t|o|n|.|e|d|u

Assignment: Dictionaries

Carleton College CS 201, Prof. Joshua R. Davis

Introduction

The purpose of this assignment is to understand hash functions, the resolution of collisions using chains, and dictionaries in general. You may complete this assignment with one partner. It is due Tuesday at 11:59 PM.

Write `Dictionary`

In class we have created an IntDictionary class that behaves much like Python's built-in dictionary, except that it accepts only integer keys. In this section we will use hashing and chaining to overcome that limitation. That is, we'll build a Dictionary class that supports keys of type int, float, str, and nested tuples of those. For example, ("jane", ((3.12, "jane"), 2)) will be a legitimate key. In the end, Dictionary will be similar to, but not identical to, Python's built-in dictionary type. Here are Dictionary's methods.

def __init__(self):
	"""Initializes an empty dictionary."""

def __len__(self):
	"""Returns the number of key-value pairs in the dictionary."""

def __delitem__(self, key):
	"""Deletes the key-value pair. If the key is not in the dictionary, then 
	does nothing."""

def __setitem__(self, key, value):
	"""If the key is not already in the dictionary, then puts the key-value 
	pair into the dictionary. If the key is already in the dictionary, then 
	changes its associated value. Returns nothing."""

def __getitem__(self, key):
	"""Returns the value associated to the given key, or None if the key is not 
	present."""

def getFirstKey(self):
	"""The dictionary maintains its keys in a certain order (about which the 
	user should make no assumptions). If the dictionary is empty, then this 
	method returns None. Otherwise, this method returns the first key in the 
	dictionary, according to the dictionary's key order. This key can be used 
	to start a chain of calls to getNextKey, that traverse all of the keys in 
	the dictionary. (The user should not add or remove key-value pairs during 
	the traversal.)"""

def getNextKey(self, key):
	"""The dictionary maintains its keys in a certain order (about which the 
	user should make no assumptions). The given key should be a key from the 
	dictionary, typically obtained from getFirstKey or getNextKey. If there is 
	no key after the given key, according to the dictionary's order, then this 
	method returns None. Otherwise, this method returns the next key after the 
	given key. In combination with getFirstKey, this method lets the user 
	traverse all of the keys in the dictionary. (The user should not add or 
	remove key-value pairs during the traversal.)"""

def __str__(self):
	"""Returns some string representation of the dictionary. May be 
	implementation-dependent."""

def getCost(self):
	"""Returns some measure of the current worst-case cost of accessing items 
	in the dictionary. May be implementation-dependent."""

def hashKey(self, key):
	"""Private method. The input key is an int, float, string, or nested tuple 
	of those. Returns an integer that can be used as the key to an integer-key 
	dictionary."""

In a new file dictionary_bstn.py, declare the Dictionary class and implement these methods. I have some specific requirements.

You must implement Dictionary by hashing all keys to integers (in hashKey), using those integers as keys into an IntDictionary, and resolving hash collisions using chaining.
You could declare Dictionary as a subclass of IntDictionary; however, I think it's better to declare Dictionary as a subclass of object, and give it an IntDictionary as an instance variable. We'll talk about this in class.
Implement __str__ and getCost by just passing these messages along to the underlying IntDictionary. In some sense this is bad software design — we're not hiding the implementation from the user — but for educational purposes these methods are more valuable if they reveal the underlying implementation, than if they hide it. (Update: I made a mistake here. The getCost function should really include both the height of the tree and the length of the chains in the tree. However, I didn't catch this error early enough, so proceed with getCost as described above; it doesn't matter much.)

Think About Sparse Matrices

A matrix is a rectangular grid of numbers. Matrices are used to do linear algebra calculations, and linear algebra is used heavily in many scientific and technological fields, so computing with matrices is important. What kinds of data structure could you use to store a matrix in a computer program? Here are two ideas. First, you could store the matrix as a list of lists of numbers. For example, a 3 by 5 matrix could be stored as a list of 3 lists, each of which consists of 5 numbers:

matrix = [[4.1, 2.0, -1.1, 0.0, 0.5],
          [-1.6, 0.0, 1.6, 3.3, 0.0],
          [0.0, 0.0, 1.1, 1.5, -1.0]]

Second, you could use a dictionary to store just the matrix entries that are nonzero. For example,

matrix = {(0, 0):4.1, (0, 1):2.0, (0, 2):-1.1, (0, 4): 0.5, (1, 0):-1.6, (1, 2):1.6, ...}

Which is better? It depends on the problem. If you're working with small matrices, then the list-of-lists representation is probably simpler and faster. On the other hand, if you're working with large matrices that happen to have a lot of zeros, then the dictionary representation may take much less room than the list-of-lists representation.

For example, Google's page-ranking algorithm uses matrices that are roughly 1,000,000,000 by 1,000,000,000, but with only maybe 100 nonzero entries in each row. That is, only about 0.00001% of the entries are nonzero. The list-of-lists representation requires storing all 1,000,000,000,000,000,000 entries, while the dictionary representation requires storing only about 0.00003% as many numbers (three numbers per nonzero entry).

Large matrices with mostly zero entries are called sparse matrices. What we've learned is that the dictionary representation of sparse matrices uses far less space than the list-of-lists representation. For the kinds of calculations that one does with matrices — which we won't get into, because this isn't a linear algebra course — the dictionary representation is often also much faster for sparse matrices.

Think About Conway's Game of Life

Download the latest version of Bopagopa from my web site. Inside the demos folder you'll find a file gameoflife.py. Run this program; you'll get a black screen. Use the right mouse button to select the "Acorn"; now there are some white squares on the black background. Press the space bar; the pattern of white squares starts changing.

You're looking at Conway's Game of Life. It's not really a game, although it is kind of fun. The "game" is played on an infinitely large two-dimensional grid of squares. At any given time, each square is either holding a living cell or not. The cell either survives to the next generation or dies off, depending on the cells in the eight squares around it. Also, new cells are born in some squares, depending on the cells nearby. Here are the precise rules:

If a cell has more than three neighbors, then it's overcrowded and dies.
If a cell has fewer than two neighbors, then it's lonely and dies.
If an empty square has exactly three neighbors, then the neighbors reproduce to form a new cell in that square.

Read the instructions at the top of gameoflife.py to learn about the user interface, and then play around a bit while pondering this question: What does Conway's Game of Life have to do with sparse matrices?

Implement Conway's Game of Life atop `Dictionary`

The gameoflife.py program uses Python dictionaries to store sparse matrices. In this part of the assignment, you are asked to edit gameoflife.py to replace all uses of Python's dictionary with your Dictionary. The purpose of this exercise is to make sure you know how to use Dictionary and Python's dictionary, to make sure you recognize their similarities and differences, and to stress-test your Dictionary for bugs.

This shouldn't be too difficult or time-consuming. There are only five chunks of code in gameoflife.py that touch dictionaries: GameOfLife's methods __init__, activateCell, setCell, incrementDictionary, and performStep. You shouldn't need to change or add more than 10 lines of code. You should not need to alter Dictionary at all, unless you discover bugs in it.

Submit Your Work

After testing them thoroughly, submit your two files dictionary_bstn.py and gameoflife.py electronically as usual. They will be graded according to these criteria:

Dictionary works (6 points).
Conway's Game of Life works atop your dictionary, with no Python dictionaries (3 points).
The code exhibits Good Python Style (3 points).