2016 October 25,
In this assignment, you will collect experimental data on the speed of some algorithms, and compare those data to our theoretical understanding of those algorithms. This work will be submitted on paper, by each student individually. It is due at the start of class on Wednesday.
Python's time module contains many utility functions for measuring and managing time. We'll discuss only two of these functions, and use only one of them.
The first function that we'll discuss is time, which returns the real time elapsed, in seconds, since a particular date called the epoch. On macOS and other Unix systems, the epoch is January 1, 1970, at 12:00 AM. You can verify this by calling the time function, and then converting its results from seconds to years, using the following code.
import time
time.time() / (60 * 60 * 24 * 365.25)
We want to measure how much time a Python command (or sequence of commands) takes. To do this, we could take the time, then run the operation, then take the time again, and then subtract the two measurements. But this approach doesn't measure time exactly as we want. The problem is that a computer is typically running many programs at once. (To see these programs on a macOS machine, enter the command top in bash. Press Q to quit.) The computer doesn't actually run these programs all at once; it just appears to. It maintains this illusion by running one program for a fraction of a second, and then running another for a fraction of a second, and then another, and so on. So, even if your Python program takes 3.6 seconds of real time to execute, it may have been actively running for only 1.4 of those 3.6 seconds. This makes real time, as measured by the time function, a bad measure of how slow the operation is.
The second function that we'll discuss is clock. It returns the amount of processor time used by your program, since the program started. That is, it automatically disregards the time used by other programs running on your computer. We can measure how many seconds a Python command takes by taking the clock, then running the operation, then taking the clock again, and then subtracting the two measurements:
start = time.clock() # Put some commands here, that you'd like to time. finish = time.clock() print(finish - start)
Unfortunately, the clock function is not perfectly precise, especially with very small time differences. To get reliable time measurements, your two calls to clock should occur at least 0.1 seconds apart from each other. If you want to time a really small command, then run the command 1000 times (or more, as needed), measure the total time required, and divide that time by 1000 to get the time per command.
Suppose that you want to choose an integer randomly. Then you import random and do something like random.randint(13, 72), right? But what if you want to randomly choose an integer from among all integers whose binary representations are a certain length? Then you can use this function:
def randomIntOfBits(numBits): return random.randint(2 ** (numBits - 1), 2 ** numBits - 1)
Or maybe you want a list of numbers in a mixed-up order, to try out some sorting algorithms. Then you can use this function:
def randomList(cap): unmixed = list(range(cap)) mixed = [] while len(unmixed) > 0: j = random.randint(0, len(unmixed) - 1) mixed += [unmixed[j]] unmixed = unmixed[:j] + unmixed[(j + 1):] return mixed
By the way, whenever I give you code, I expect you to read it carefully and think about how it works. For example, on a future exam I might ask you to do something similar.
In class, we have discussed two algorithms for computing the greatest common divisor of two integers a ≥ 0 and b ≥ 0. As they are written here, these functions do not assume that a ≥ b. In other words, they work even if a < b.
def gcd(a, b): if min(a, b) == 0: return max(a, b) else: for d in range(min(a, b), 0, -1): if a % d == 0 and b % d == 0: return d def euc(a, b): while b != 0: rem = a % b a = b b = rem return a
In class we have deduced that the second algorithm should be dramatically faster than the first, at least for large inputs. I would like you to collect and present data to verify this.
I recommend that you write a function gcdTest. As input it takes an integer numTrials and another integer numBits. As output it returns a floating-point number, to be described in a moment. Essentially, gcdTest randomly chooses an a and b, each of numBits bits, and computes their gcd. It repeats this experiment numTrials times. It measures the clock before starting the trials and after finishing the trials, and it returns the clock difference as a measure of how much processor time the trials required.
Unfortunately, we can't code the algorithm as it was just given, because a lot of the processor time would be spent randomly generating the numbers a and b. Instead we want to choose all of these random numbers before we start the clock. So we use this algorithm for gcdTest:
Once you've written gcdTest, you should test various values of numBits, starting from 1 and going as high as you can, until the tests become impractically large. For each value of numBits, tune numTrials to keep the test not-too-big and not-too-small. Each test should use at least 100 trials and last at least 1 second. Divide each test's clock difference by its numTrials to estimate the time required by each call to gcd.
Similarly, write a function eucTest to test the time required by the Euclidean algorithm. Run eucTest on each value of numBits that you used above, so that you can directly compare the timing results between the two algorithms.
Make a graph with numBits on the horizontal axis and time-required-per-call on the vertical axis. Plot the time required by both gcd and euc on a single graph, so that they are directly comparable. You can make the graph by hand, or in a spreadsheet, or in Mathematica, or whatever you like.
Also write or type a short explanation of how your data illustrate the theory. That is, do your timing experiments corroborate or refute the arguments that we've made in class?
In class, we have discussed three sorting algorithms: bubble sort, selection sort, and merge sort. They are implemented in sorting.py. Collect timing data that illustrates how these three sorting algorithms compare on lists of varying length. Present your data graphically. Explain how the data illustrate our theoretical results.
Perhaps you should mimic what you did for the GCD analysis. Instead of the number of bits, the size of the problem is now the length of the list to be sorted. As in the GCD analysis, you should strive to ensure that your timing data reflect just the time required to sort, not the time required to randomly generate sorting problems.
Print out your code and staple it into the same packet as your graphs and explanations. Your analysis will be graded according to these criteria: