2012 May 1,
Carleton College, Spring 2012
Prof. Joshua R. Davis, , Goodsell 106B, x4473
In this assignment, you will collect experimental data on the speed of some algorithms, and compare those data to our theoretical understanding of those algorithms. This work will be submitted on paper, by each student individually. It is due at the start of class on Monday.
Python's time module contains many utility functions for measuring and managing time. We'll discuss only two of these functions, and use only one of them.
The first function is time, which returns the real time elapsed, in seconds, since a particular date called the epoch. On Mac OS X and other Unix systems, the epoch is January 1, 1970, at 12:00 AM. You can verify this by calling the time function, and then converting its results from seconds to years, using the following code. (Why does the following code work? Does the answer make sense?)
import time
time.time() / (60 * 60 * 24 * 365.25)
We want to measure how much time a Python command (or sequence of commands) takes. To do this, we could take the time, then run the operation, then take the time again, and then subtract the two time measurements. But this approach doesn't measure time exactly as we want. The problem is that a computer is typically running many programs at once. (To see these programs on a Mac OS X machine, enter the command top in bash. Press Q to quit.) The computer doesn't actually run these programs all at once; it just appears to. It maintains this illusion by running one program for a fraction of a second, and then running another for a fraction of a second, and then another, and so on. So, even if your Python program takes 3.6 seconds of real time to execute, it may have been actively running for only 1.4 of those 3.6 seconds. This makes real time, as measured by the time function, a bad measure of how slow the operation is.
The second function is clock, which returns the amount of processor time elapsed, in seconds, since the program started. That is, it measures just the time used by your program — not by other programs running "at the same time" on the computer. We can measure how many seconds an operation takes by taking the clock, then running the operation, then taking the clock, and then subtracting the two clock measurements:
start = time.clock() # Put some commands here, that you'd like to time. finish = time.clock() print finish - start
There's a caveat: The clock function is not perfectly precise. It probably doesn't report times of less than one microsecond accurately. It may not report times of less than one millisecond accurately. To get reliable time measurements, your two calls to clock should occur at least 0.1 seconds apart from each other. In other words, you shouldn't try to time really small commands.
In class, we have discussed two algorithms for computing the greatest common divisor of two numbers a ≥ b ≥ 0:
def gcd(a, b): for d in range(b, 0, -1): if a % d == 0 and b % d == 0: return d def euc(a, b): while b != 0: rem = a % b a = b b = rem return a
We have deduced that the Euclidean algorithm should be significantly faster, at least for large inputs. I would like you to collect and present timing data, to verify this. That is, you'll collect various timing data, using the clock function described above, to show how the running time depends on the size of a and/or b. I recommend this approach:
Once you have some good timing data, that illustrate the theoretical running time estimates that we derived in class, make a graph (either by hand, or using a spreadsheet, or Mathematica, or whatever you like). Also write or type a short explanation of how your data illustrate the theory.
In class, we have discussed three sorting algorithms: bubble sort, selection sort, and merge sort. They are implemented in sorting.py. Collect timing data that illustrates how these three sorting algorithms compare. Present your data graphically. Explain how the data illustrate our theoretical results. Be sure to explain your testing process in detail.
Submit your analysis on paper. It will be graded according to these criteria: