2010 November 18 / jj|d|a|v|i|s|@|c|a|r|l|e|t|o|n|.|e|d|u

Project

Carleton College CS 201, Fall 2010, Prof. Joshua R. Davis

Introduction

In this course there is not a final exam but rather a course project that counts for 20% of your course grade. The goal of the project is for you to complete a significant task using one or more data structures from the course in depth. A rough guideline is that the project should require at least three times as much work as a typical assignment. I encourage you to work with one partner.

I give several project ideas below. You can try one of these, invent your own variation, or invent your own project entirely. The ideas are deliberately open-ended, so that you have room for creativity. You also have room for getting lost in too many options; to avoid getting off track, you will give Dan and I frequent progress reports.

Your entire project is due at 5:00 PM on Monday November 22. You should submit your program and any auxiliary files that it uses (data files, image files, etc.) electronically, in a folder project. You must submit a typed report of some kind also. You may hand in your report on paper or electronically. If it's on paper, then staple it and place it in my mailbox in the Math/CS department offices. If it's submitted electronically, then place it in your project folder with the rest of your project. The file format must be plain text (.txt) or PDF (.pdf). Microsoft formats (.doc, .docx, .xls, etc.) are not acceptable. On a Mac you can create a PDF of any document by selecting "Print", then "PDF", then "Save as PDF".

I will read your report when I start grading your project. Therefore the report is your opportunity to frame your work — to influence how I view it all. The following topics should probably be addressed.

Depending on your project, those last two items might be small or large. For example, two students are using graph algorithms to study social networks; all of their interesting sociological findings will show up in their report. One group is creating its own programming language; there should be lots of discussion of their design choices.

Ideas Resembling Interpretation

Computer Algebra

Mathematica and Maple are computer algebra systems — high-powered calculators that are capable of not just numerical calculation but also algebraic calculation. In our course we have studied how an algebraic expression can be represented as a tree, with operators at branch nodes and operands at leaf nodes. This core idea is all you need to write to write your own basic computer algebra system. In fact, I wrote one a couple of years ago to help my calculus students practice differentiation; you can try it out yourself.

I propose that you write your own computer algebra system, capable for example of simplifying complicated expressions, expanding out complicated products, differentiating functions, etc. You can use your Interpretation assignment as a starting point.

Design Your Own Programming Language

In our Interpretation assignment we implemented a simple programming language that, while it contained the key ideas of language implementation, ended up being little more than a glorified calculator. Conspicuously missing features were:

(A Boolean variable has value True or False; to implement if-then-else statements you need some sort of Boolean type, because that's what comes between the if and the then.)

I propose that you design your own programming language that includes these features and whatever else you desire. You get to design the syntax however you like (within reason). You could build on the Interpretation assignment, or you could strike out in a different direction. For example, if you require all syntax to be in prefix form, then your parser can be kept extremely simple even while you add new features to the language. It's up to you.

Other Data Structures

String Matching

A suffix tree is a data structure that enables fast searching for substrings in a given string. It is used in analysis of DNA, analysis of written texts, and compression of images.

I propose that you implement a suffix tree (or related data structure, such as a trie) and use it to for some interesting application related to DNA, natural language text, or images.

Huffman Coding

Huffman coding is a slick method for lossless data compression that uses binary trees heavily. For example, it is one tiny part of the JPEG image compression algorithm. The details of Huffman coding are not easy, and thus interesting.

I propose that you study Huffman coding, implement it in Python using binary trees, and then apply it to some interesting data compression problem.

Better Heaps

There are many variations on the binary heap idea: binomial heaps, Fibonacci heaps, etc. I propose that you study one of these, implement it in Python, and use it for something awesome.

Analysis of Real-World Graph Data

Social Networks

A social network such as Facebook is a graph in which the nodes are people and the edges are friendships among those people. There are publicly available data on such social networks, for example at the INSNA. In recent years, sociological analysis of these networks has exploded. There are many interesting questions to ask. For example, if someone has two friends, then what is the probability that they are friends with each other? Is there any objective way to define groups of friends? To what extent are friendships determined by age, gender, and other classifications? To what extent do they cut across classifications?

I propose that you build these data into a graph and use graph algorithms to answer interesting questions about these social networks, of your own invention.

Road Networks

A road network is a graph in which the nodes are intersections and the edges are roads. At DIMACS you can download the entire road network of any state. For example, Minnesota's has 547028 nodes and 670443 edges (fewer than two edges per node?!).

I propose that you build these data into a graph and use graph algorithms to answer interesting questions about roads. For example, you could make a trip planner with a simple graphical user interface written in Bopagopa.

Web Crawling

Python comes with a module urllib for fetching web pages off the World Wide Web. Once you have the text of a web page, it's not hard to find all of the web pages it links to. You can build part of the World Wide Web into a graph data structure. Then you can ask all kinds of structural questions about that graph, to get insight into the culture and economics of the web. For example, how many links does a typical web page have? Does a web site preferentially link to itself (to keep you stuck there, viewing its ads) or does it let you go to other places? Are commerical web sites different from academic/non-profit/personal web sites in this regard? If there are 24 web sites devoted to some topic, then do they all link to each other equally, or are some more connected than others? Does the answer vary by topic?

I propose that you write some kind of web crawling program and use it to analyze social/economic/cultural questions of your own design.

Simulations on Graphs

Building Virtual Worlds

The Crispy Hexagons program that I showed you in class is a prototype for a large-scale virtual world or game (that I will never write). The landscape is a graph of patches, each of which is essentially a graph of hexagons. There are many opportunities for graph algorithms to be applied. For example, if you make a stream by pouring water onto the landscape at some location, then where will the water collect? What will the shape of the resulting pond be?

I propose that you develop algorithms for such virtual world-building, based on my Crispy Hexagons code. The code is already about 1000 lines long, so a lot of mundane tasks have been handled for you.

Molecular Simulation

In chemistry, a molecule is essentially a graph of atoms, which attract and repel each other according to their electrical charges. In principle, you can numerically simulate the electrical forces, accelerations, velocities, and positions of all of the atoms over time. Does a given starting configuration of atoms settle into a stable molecule? Exactly what rules do you have to specify? How large of molecules can you simulate?

I propose that you write this simulation and animate it in 3D using Bopagopa. Warning: I have never written this, so it might be hard.

Circuit Simulation

An electrical circuit is a graph in which the nodes are components (resistors, batteries, etc.) and the edges are wires. There are basic rules, such as Ohm's law and the Kirchhoff current law, for computing voltages, currents, etc. throughout a circuit.

I propose that you write a program that automatically computes the voltages, currents, etc. throughout a circuit. Note: This requires solving a big system of linear equations; there are Python modules to do this for you. Warning: Students have tried this project in earlier terms, and failed; but I believe they got hung up on user interface.

Traffic Simulation

Automobiles move along roads in a road network in a manner similar to how electrons move through wires in a circuit. I propose that you develop a traffic simulation system, and study how the addition of new routes and new sources of traffic (e.g. suburbs) affects the network flow.