2012 May 1

CS 111: Assignment: Efficiency

Carleton College, Spring 2012

Prof. Joshua R. Davis, Goodsell 106B, x4473

Introduction

In this assignment, you will collect experimental data on the speed of some algorithms, and compare those data to our theoretical understanding of those algorithms. This work will be submitted on paper, by each student individually. It is due at the start of class on Monday.

Timing Python

Python's time module contains many utility functions for measuring and managing time. We'll discuss only two of these functions, and use only one of them.

The first function is time, which returns the real time elapsed, in seconds, since a particular date called the epoch. On Mac OS X and other Unix systems, the epoch is January 1, 1970, at midnight UTC. You can verify this by calling the time function and then converting its result from seconds to years, using the following code. (Why does the following code work? Does the answer make sense?)

import time

print time.time() / (60 * 60 * 24 * 365.25)

We want to measure how much time a Python command (or sequence of commands) takes. To do this, we could take the time, then run the operation, then take the time again, and then subtract the two time measurements. But this approach doesn't measure time exactly as we want. The problem is that a computer is typically running many programs at once. (To see these programs on a Mac OS X machine, enter the command top in bash. Press Q to quit.) The computer doesn't actually run these programs all at once; it just appears to. It maintains this illusion by running one program for a fraction of a second, and then running another for a fraction of a second, and then another, and so on. So, even if your Python program takes 3.6 seconds of real time to execute, it may have been actively running for only 1.4 of those 3.6 seconds. This makes real time, as measured by the time function, a bad measure of how slow the operation is.
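
For concreteness, here is a minimal sketch of that wall-clock approach. The loop is just a hypothetical workload, included so that there is something taking time to measure.

import time

before = time.time()
total = 0
for i in range(1000000):
    total = total + i    # a made-up workload, standing in for the commands being timed
after = time.time()
print after - before     # real (wall-clock) seconds, including time spent on other programs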

The second function is clock, which returns the amount of processor time elapsed, in seconds, since the program started. That is, it measures just the time used by your program — not by other programs running "at the same time" on the computer. We can measure how many seconds an operation takes by taking the clock, then running the operation, then taking the clock, and then subtracting the two clock measurements:

start = time.clock()
# Put the commands that you'd like to time here.
finish = time.clock()
print finish - start

There's a caveat: The clock function is not perfectly precise. It probably doesn't report times of less than one microsecond accurately. It may not report times of less than one millisecond accurately. To get reliable time measurements, your two calls to clock should occur at least 0.1 seconds apart from each other. In other words, you shouldn't try to time really small commands.
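
As a rough illustration (a sketch only, using an arbitrary fast operation, and with the time module imported as above): a single multiplication is far too quick to time on its own, but a million of them take long enough for clock to measure.

start = time.clock()
for i in range(1000000):
    x = 12345 * 6789                  # some very fast operation, repeated many times
finish = time.clock()
print finish - start                  # total seconds for all of the repetitions
print (finish - start) / 1000000.0    # rough seconds for one repetition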

First Task: Greatest Common Divisor

In class, we have discussed two algorithms for computing the greatest common divisor of two numbers a ≥ b ≥ 0:

def gcd(a, b):
    # Brute force: try every candidate divisor d, from b down to 1.
    for d in range(b, 0, -1):
        if a % d == 0 and b % d == 0:
            return d

def euc(a, b):
    # Euclidean algorithm: repeatedly replace (a, b) with (b, a % b).
    while b != 0:
        rem = a % b
        a = b
        b = rem
    return a

We have deduced that the Euclidean algorithm should be significantly faster, at least for large inputs. I would like you to collect and present timing data, to verify this. That is, you'll collect various timing data, using the clock function described above, to show how the running time depends on the size of a and/or b. I recommend this approach:

  1. Write a function gcdTest that takes as input a, b, and n, computes gcd(a, b) n times, and returns the total clock time that those n computations took. (By running the same computation over and over again, we are essentially "magnifying" it, so that the time is large enough to be measured accurately by clock. A sketch of such a function appears after this list.)
  2. Write a function eucTest that does the same thing, but using euc instead of gcd.
  3. Pick a large value of n. Also, determine a range of a- and b-values to test. The ranges should be large enough that you can observe how running time depends on the sizes of a and/or b. That is, you should use many different sizes for a and for b: 1-digit, 2-digit, 3-digit, ..., 20-digit? Also, because the time required to compute the greatest common divisor depends on the particular interaction of a and b, you should consider many different combinations within each size of a and b.
  4. For each chosen combination of a and b, record the results of gcdTest and eucTest, for your chosen n.
  5. If any timing results are smaller than 0.1 seconds, then go back and choose a larger n. If there is not enough range in the timing results, then choose a wider range of sizes of a- and b-values. If the timing results seem to vary chaotically, then try more values of a and b within each size.
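
For reference, here is a minimal sketch of how gcdTest might look, assuming that gcd has been defined and that the time module has been imported as above; eucTest would be identical except that it calls euc instead of gcd.

def gcdTest(a, b, n):
    # Return the total processor time, in seconds, used by n computations of gcd(a, b).
    start = time.clock()
    for i in range(n):
        gcd(a, b)
    finish = time.clock()
    return finish - start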

Once you have good timing data that illustrate the theoretical running-time estimates we derived in class, make a graph (either by hand, or using a spreadsheet, or Mathematica, or whatever you like). Also write or type a short explanation of how your data illustrate the theory.

Second Task: Sorting

In class, we have discussed three sorting algorithms: bubble sort, selection sort, and merge sort. They are implemented in sorting.py. Collect timing data that illustrate how these three sorting algorithms compare. Present your data graphically. Explain how the data illustrate our theoretical results. Be sure to explain your testing process in detail. (A sketch of one possible testing harness appears below.)
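
As a starting point, here is a sketch of one possible timing harness. The function names bubbleSort, selectionSort, and mergeSort are guesses at what sorting.py defines; use whatever names actually appear in that file. Each trial sorts a freshly shuffled copy of the list, since an already-sorted list can behave quite differently, and the shuffling itself is kept outside the timed portion.

import random
import time
from sorting import bubbleSort, selectionSort, mergeSort   # assumed names; check sorting.py

def sortTest(sorter, length, n):
    # Return the total processor time, in seconds, used by n sorts of
    # freshly shuffled lists of the given length.
    total = 0.0
    for i in range(n):
        data = range(length)      # the list 0, 1, ..., length - 1
        random.shuffle(data)      # scramble it, so that there is something to sort
        start = time.clock()
        sorter(data)
        finish = time.clock()
        total = total + (finish - start)
    return total

print sortTest(bubbleSort, 1000, 10)   # for example: 10 sorts of 1000-element lists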

Submit Your Work

Submit your analysis on paper. It will be graded according to these criteria: