2010 February 4 / j d a v i s @ c a r l e t o n . e d u

Assignment 6: Elevation Models

The purpose of this assignment is to practice with files, loops, and lists. You will write functions that load landscape elevation data from a file, for display by a 3D graphics module. The end result is a kind of baby Google Earth. Here is a screen shot of my version of the program. It is displaying part of North America, centered on Hudson Bay.

You are REQUIRED to work with a partner; if you want help finding one, then e-mail me. Also, because this assignment uses a variety of Python modules that are difficult to install, you are expected to complete the work in the computer labs, rather than on your own computer. It is due Friday at the start of class.

Start Your Engines (Metaphorically)

Download these files to your computer.

Execute the file landscape.py. It displays a sliver of landscape and prints some instructions about keyboard and mouse controls. Play around a bit. The control-click and alt-click controls will not work, because you have not yet implemented them.

You are not supposed to understand the code in landscape.py at this point in the course, but do examine the bottom of the file, where the demonstration code is. You will find that the demonstration program consists of only two commands. The first one makes a grid (a list of lists) of elevation data. The second one creates a window to display that grid as a landscape, along with some brightly colored shapes (just for fun). There is no code after the call to createWindow(); this function takes over the flow of the program in order to run the user interface, and once it is finished the entire program ends.

Open the file 48by60.tsv in TextWrangler. You will find that it contains 60 lines of text, each containing 48 integers separated by tabs. The ".tsv" suffix stands for tab-separated values. This is a standard format for storing grids of data. In fact, you can open this file in a spreadsheet such as Microsoft Excel, if you like. The numbers in this file represent elevations above sea level in meters — except for the ones that have value -9999. That -9999 is a special code indicating that the datum is missing, because that point on the Earth's surface is underwater.

You will not edit any of these files; rather, all of your work will take place in a file elevation.py of your own creation.

Write loadGridFile()

In this section you will write a function to load a 48-by-60 grid of elevation data from a file. This function probably will not make it into your final program, but it is an important step toward writing more difficult functions.

First, a file is essentially a chunk of information stored on a computer's disk. (We'll discuss what a disk is later.) The file has a file name, such as "48by60.tsv", and it also has a location within the hierarchy of all of the files and folders on the computer's disk. Together, the location and the file name form a path string that uniquely names the file among all files on the computer. For example, if the file 48by60.tsv is sitting on my desktop, then its path is "/Users/jdavis/Desktop/48by60.tsv".
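As a small illustration of how a path string combines a location and a file name, here is a sketch that takes the example path apart and puts it back together. It uses the standard posixpath module, which applies forward-slash path rules like those on the lab Macs; the variable names are made up for this example.

```python
import posixpath  # forward-slash path rules, as in the example path

path = "/Users/jdavis/Desktop/48by60.tsv"

# Split the path string into its location and its file name.
location, fileName = posixpath.split(path)
print(location)   # /Users/jdavis/Desktop
print(fileName)   # 48by60.tsv

# Joining the two pieces rebuilds the original path string.
rebuilt = posixpath.join(location, fileName)
print(rebuilt == path)   # True
```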

The first file operation that we need is the Python function open(), which we have already encountered in the Frequency Analysis lab. It takes in a path string, opens the file at that path, and returns an object of type file. The open() function (like the Python interpreter, bash, and most other programs) allows you to use relative paths instead of absolute paths, if you prefer. A relative path specifies the location of a file relative to the folder where you're currently working. For example, if my program and the file 48by60.tsv are both on my desktop, then either of these lines will work in my program:

myfile = open("/Users/jdavis/Desktop/48by60.tsv")
myfile = open("48by60.tsv")

In contrast, the paths that we discussed earlier, such as "/Users/jdavis/Desktop/48by60.tsv", are absolute paths; they specify the file's location without reference to any particular current location.
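To make the relative vs. absolute distinction concrete, here is a sketch that asks Python which kind each path is, and then resolves the relative one against a current folder. The current folder is an assumption for illustration (it matches the desktop example above); on your machine it will be wherever you are working.

```python
import posixpath  # forward-slash path rules, as in the example paths

absolutePath = "/Users/jdavis/Desktop/48by60.tsv"
relativePath = "48by60.tsv"

print(posixpath.isabs(absolutePath))   # True
print(posixpath.isabs(relativePath))   # False

# Assumed current folder, for illustration only.
currentFolder = "/Users/jdavis/Desktop"

# A relative path is interpreted starting from the current folder.
resolved = posixpath.join(currentFolder, relativePath)
print(resolved == absolutePath)   # True
```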

The second file operation that we need is the readline() method of the file class. This method takes no inputs and returns a single line from the file, as a string of text. The first time you call readline() it returns the first line of the file; subsequent calls to readline() return subsequent lines. Thus you can march through the file by repeatedly calling readline(). You stop when there are no more lines in the file; you detect this based on readline() returning an empty string.
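Here is a minimal sketch of that readline() pattern. So that it runs anywhere, it first creates a small throwaway file (not part of the assignment) and then counts its lines. Note that a blank line in the middle of a file comes back as "\n", not as the empty string, so only the true end of the file stops the loop.

```python
import os
import tempfile

# Create a small throwaway file so that this sketch is self-contained.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
sample = open(path, "w")
sample.write("first\n\nthird\n")   # the second line is blank
sample.close()

# March through the file one line at a time.
myfile = open(path)
lineCount = 0
line = myfile.readline()
while line != "":                  # empty string means no more lines
    lineCount = lineCount + 1
    line = myfile.readline()
myfile.close()
os.remove(path)

print(lineCount)   # 3
```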

Here's a quick warm-up exercise: Write a little Python code that opens the file 48by60.tsv, prints every line from the file, and finishes by printing "All done!".

Here's the real exercise: In your file elevation.py, write a function loadGridFile(), as follows. The function takes as input a path string (relative or absolute) indicating a grid file. It loads the grid data from the file into a grid (a list of lists) and returns the grid. The first line of the file is loaded as the first list in the grid, the second line in the file is loaded as the second list in the grid, and so on.

The data in the returned grid must be integers, not strings. For example, grid[0][0] should be -9999 (an int), not "-9999" (a string). A string that represents an integer is called a numeral. You can convert numerals to integers using the int() function; you can convert integers to numerals using the str() function.
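For instance, here is a sketch of converting one tab-separated line of numerals into a list of integers. The line of text is invented for illustration. The string method split() breaks the line at each tab, and int() tolerates the newline character left on the last numeral.

```python
line = "-9999\t12\t305\n"      # hypothetical line of a .tsv file

numerals = line.split("\t")     # ['-9999', '12', '305\n']
numbers = []
for numeral in numerals:
    # int() ignores whitespace (such as "\n") around the numeral.
    numbers.append(int(numeral))

print(numbers)                  # [-9999, 12, 305]
print(numbers[0] + 1)           # -9998, because these are ints now
print(str(numbers[1]) + "!")    # 12!, because str() gives a string back
```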

Write processGrid()

Once you've written loadGridFile() you can pass the resulting grid straight into createWindow(), but it will look like garbage. There are two reasons.

First, there are a lot of -9999 data in the grid. These do not really indicate points at elevation -9999, but rather points where there is no datum because the surface of the Earth is underwater. Let's replace all of these points with an elevation of 0 (sea level).

Second, the vertical and horizontal scales are all messed up. The elevations are in meters, but the grid points are not spaced one meter away from each other. In the file 48by60.tsv, they're spaced roughly 100,000 meters away from each other. To get a more reasonable looking picture, but one in which we can still detect the topography, let's divide the elevations by something like 1000. Actually, let's divide them by 1000.0, so that we get floating-point numbers rather than rounded-off integers.
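Here is a sketch of those two fixes applied to a single datum, to show why the divisor is 1000.0 rather than 1000. The helper name scaledElevation() is hypothetical; it is not part of landscape.py or of the required interface.

```python
def scaledElevation(meters):
    # Hypothetical helper: applies both fixes to one elevation datum.
    if meters == -9999:
        meters = 0              # missing datum becomes sea level
    return meters / 1000.0      # floating-point division keeps fractions

print(scaledElevation(1500))    # 1.5
print(scaledElevation(-9999))   # 0.0

# Whole-number division throws the fractional part away.
print(1500 // 1000)             # 1
```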

Write a function processGrid() that takes in a grid, modifies it to fix the two issues just described, and returns nothing. Once you have written this function you are ready to display your landscape, like this:

grid = loadGridFile("48by60.tsv")
processGrid(grid)
landscape.createWindow(grid, None, None)

If everything is working correctly, then this code displays a 3D model of northeastern North America, in shades of gray.
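A function such as processGrid() works by changing the caller's grid in place, rather than by returning a new grid. Here is a sketch of that idea on a toy problem; the function doubleAll() is made up for this example. Notice that assigning to grid[i][j] changes the grid, whereas assigning to a loop variable does not.

```python
def doubleAll(grid):
    # Hypothetical example: modifies the grid in place, returns nothing.
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            grid[i][j] = 2 * grid[i][j]

grid = [[1, 2], [3, 4]]
result = doubleAll(grid)
print(grid)     # [[2, 4], [6, 8]]
print(result)   # None

# Pitfall: this loop rebinds the variable value, leaving row untouched.
row = [1, 2]
for value in row:
    value = 2 * value
print(row)      # [1, 2]
```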

Add Color

In this section we add color to our 3D model, which for some reason makes it about a gazillion times more attractive.

There are various systems for representing color in physics, in art, in the computer industry, in the television industry, in the publishing industry, etc. We'll use the RGB system, which is perhaps the most popular in computing. In the RGB system, a color is described using three numbers — the amounts of red light, green light, and blue light in the color. The numbers take values between 0.0 (indicating none of the color) and 1.0 (indicating maximum intensity of the color). Here are some common colors described in RGB:

(1.0, 0.0, 0.0) is bright red; it has the full allotment of red, and no green or blue.
(0.0, 1.0, 0.0) is bright green.
(0.0, 0.0, 1.0) is bright blue.
(1.0, 1.0, 1.0) is white; white light is actually the combination of all colors of light.
(0.0, 0.0, 0.0) is black; black is no light at all.
(0.5, 0.5, 0.5) is a neutral color halfway between white and black — that is, gray.
(1.0, 0.5, 0.5) is halfway between red and white — that is, pink.
(0.5, 0.0, 0.0) is halfway between red and black — sort of a dark, grayed-out red.
(0.0, 1.0, 1.0) is bright cyan (green plus blue).
(1.0, 0.0, 1.0) is bright magenta (blue plus red).
(1.0, 1.0, 0.0) is bright yellow (red plus green).
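The "halfway between" descriptions above amount to averaging two colors component by component. Here is a sketch of that calculation; the helper blend() is hypothetical and exists only for this illustration.

```python
def blend(colorA, colorB, t):
    # Hypothetical helper: t = 0.0 gives colorA, t = 1.0 gives colorB,
    # and values in between mix the two colors proportionally.
    mixed = []
    for k in range(3):
        mixed.append((1.0 - t) * colorA[k] + t * colorB[k])
    return tuple(mixed)

red = (1.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)
black = (0.0, 0.0, 0.0)

print(blend(red, white, 0.5))   # (1.0, 0.5, 0.5), which is pink
print(blend(red, black, 0.5))   # (0.5, 0.0, 0.0), which is dark red
```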

That yellow example may surprise you. In art class I was taught that the primary colors are red, yellow, and blue, that one mixes yellow and blue to make green, and that red mixed with green yields a kind of brown — certainly not yellow. Indeed, that is how paint works, but it is not how light works. We'll discuss this in class.

Returning to our elevation model project, I need to tell you that the createWindow() function accepts grid data in two formats. In the first format, each datum is a single number indicating elevation; then the datum is displayed in white. This is what we've been doing. In the second format, each datum is a tuple or list of four numbers, indicating elevation, red, green, and blue; then the datum is displayed with the indicated RGB color instead of white. This is what we want now.

Modify processGrid() so that it replaces each elevation datum with a four-tuple of elevation, red, green, and blue. You can choose any color scheme you want, within the following requirements.

For example, when I did the assignment myself I colored low elevations using colors between green and yellow, and high elevations using colors between yellow and red.
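Here is one possible sketch of a green-to-yellow-to-red scheme like the one just described. It is not my actual code, and the assignment's requirements take precedence over it. The function name colorForFraction() and the maximum elevation of 3.0 are assumptions made for illustration.

```python
def colorForFraction(f):
    # Hypothetical helper: f runs from 0.0 (lowest) to 1.0 (highest).
    if f < 0.5:
        # Green (0, 1, 0) toward yellow (1, 1, 0): red ramps up.
        return (2.0 * f, 1.0, 0.0)
    else:
        # Yellow (1, 1, 0) toward red (1, 0, 0): green ramps down.
        return (1.0, 2.0 * (1.0 - f), 0.0)

elevation = 1.5       # a scaled elevation, as an example
maxElevation = 3.0    # assumed highest elevation in the grid

red, green, blue = colorForFraction(elevation / maxElevation)
datum = (elevation, red, green, blue)   # the four-number format
print(datum)          # (1.5, 1.0, 1.0, 0.0)
```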

Once you've modified processGrid() to do this, test it out on the 48by60.tsv data set. The elevation model should have exactly the same shape as it did earlier, but it should now appear in glorious color.

Interlude: Memory vs. Storage

Recall from earlier in the assignment that a file is a named location for storing information on disk. Also recall from Getting Started with Python that a variable is a named location for storing information in memory. These are similar concepts. To distinguish between them, we need to understand the distinction between memory and disk.

Memory or RAM is a device inside the computer that holds information as electrical charges in electronic circuitry. It is volatile, meaning that the information it stores vanishes when the computer's power is turned off. The computer can access information anywhere in memory very rapidly — it takes only a few nanoseconds (ns; billionths of a second). A computer's memory capacity is typically measured in gigabytes (GB; billions of bytes).

Storage or disk is another device inside a computer that holds information by magnetizing parts of a metal platter (disk). It is nonvolatile, so it doesn't lose its contents when the computer's power is turned off. A computer's storage capacity is measured in terabytes (TB; trillions of bytes). Accessing information on the disk takes as long as a few milliseconds (ms; thousandths of a second).

In short, memory is fast (about a million times as fast as storage), but storage is big (about a thousand times as big as memory) and nonvolatile. Storage is used to store information long-term, while memory is used to hold information as it's being worked on. Here's a good analogy for a student. You keep your frequently used books on your desk, so that you can access them quickly. You leave less frequently used books in the library; accessing them takes much longer because you have to wait for the library to open and then trudge over there and find them. When you move out of your room at the end of the year, your books will no longer be on your desk, but the library's books will still be in the library. Your desk is like memory, the library is like storage, and moving out is like shutting off the computer.
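The "million" and "thousand" figures come straight from the units. Here is the arithmetic as a quick sketch, taking "a few" to mean 5 in both cases (an arbitrary choice; any equal values give the same ratio).

```python
NANOSECONDS_PER_MILLISECOND = 1000000

memoryAccessNs = 5                                 # a few nanoseconds
diskAccessNs = 5 * NANOSECONDS_PER_MILLISECOND     # a few milliseconds
speedRatio = diskAccessNs // memoryAccessNs

GIGABYTE = 10 ** 9     # bytes
TERABYTE = 10 ** 12    # bytes
sizeRatio = TERABYTE // GIGABYTE

print(speedRatio)   # 1000000
print(sizeRatio)    # 1000
```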

(The description of memory vs. storage that I've presented here is oversimplified in several ways. The speed and capacity figures are realistic as of 2010, but they change every year. In fact, a disk's access time can vary dramatically from one moment to the next, for reasons that I won't explain right now. There are nonvolatile forms of memory that are rapidly increasing in popularity and may replace disks in the near future. A computer holds information in more places than just memory and disk, including caches, registers, optical drives, and locations across a network, and these all interact in various ways. Also, no program that you write uses these resources in isolation; at any given time there are many programs running on your computer, all vying for the same memory and storage resources, with another program, the operating system, coordinating all of the activity. But the simple memory vs. storage dichotomy described here suffices for our purposes.)

Write loadGrid()

In this section we will write a fancier grid-loading function called loadGrid(). Whereas loadGridFile() loaded all of the file's data from disk into memory, loadGrid() will load just a sampling of the data. Here's the first line of the function definition:

def loadGrid(path, x, y, xSkip, ySkip, xCount, yCount):

The function takes seven inputs. The path input is a relative or absolute path, just as in loadGridFile(). The x input indicates the column of the file where sampling should begin, while y indicates the row where sampling should begin. Starting from there, the function samples the file by advancing xSkip columns and ySkip rows at a time (so a skip of 1 hits every column or row, and a skip of 2 hits every second one), until it has built up a grid that is xCount samples wide and yCount samples tall.

For example, suppose that we want to sample from the file shown below. There are eight columns and ten rows, but we want only the nine entries marked with brackets. The first entry we want is in column 2 and row 1 (counting from 0, of course, and from the top left), so x is 2 and y is 1. We want to sample every second column and every third row, so xSkip is 2 and ySkip is 3. We want our resulting grid to be three samples wide and three samples tall, so xCount is 3 and yCount is 3. In this case, loadGrid() should return [[44, 41, 45], [23, 15, 64], [75, 44, 23]].

 34   11   68   69   44   41   38   38
 71   73  [44]  41  [41]  72  [45]  54
 36   11   68   68   63   92   86   78
 63   22   32   86   36   12   38   87
 55   61  [23]  36  [15]  82  [64]  37
 19   16   22   63   33   99   89   39
 49   46   66   87   55   37   42   91
 64   64  [75]  87  [44]  96  [23]  75
 64   18   55   78   77   66   76   11
 61   11   57   69   95   14   24   88
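The index arithmetic in this example can be checked with a short sketch. It works on a list of lists already in memory, not on a file, so it is not loadGrid() itself; the function name sampleRows() is made up for this illustration. Sample number i along a row comes from column x + i * xSkip, and sample row number j comes from row y + j * ySkip.

```python
# The example file's contents, as a list of lists already in memory.
fileRows = [
    [34, 11, 68, 69, 44, 41, 38, 38],
    [71, 73, 44, 41, 41, 72, 45, 54],
    [36, 11, 68, 68, 63, 92, 86, 78],
    [63, 22, 32, 86, 36, 12, 38, 87],
    [55, 61, 23, 36, 15, 82, 64, 37],
    [19, 16, 22, 63, 33, 99, 89, 39],
    [49, 46, 66, 87, 55, 37, 42, 91],
    [64, 64, 75, 87, 44, 96, 23, 75],
    [64, 18, 55, 78, 77, 66, 76, 11],
    [61, 11, 57, 69, 95, 14, 24, 88],
]

def sampleRows(rows, x, y, xSkip, ySkip, xCount, yCount):
    # Hypothetical helper: samples from a grid that is already in memory.
    grid = []
    for j in range(yCount):
        sourceRow = rows[y + j * ySkip]
        newRow = []
        for i in range(xCount):
            newRow.append(sourceRow[x + i * xSkip])
        grid.append(newRow)
    return grid

print(sampleRows(fileRows, 2, 1, 2, 3, 3, 3))
# [[44, 41, 45], [23, 15, 64], [75, 44, 23]]
```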

Here's another example. Earlier we loaded 48by60.tsv by doing loadGridFile("48by60.tsv"). Now we can load the same file using loadGrid(). The following call loads the entire file, because it starts at the top-left corner (as indicated by the 0, 0), hits every column and row (as indicated by the 1, 1), and samples from the entire file (because there are 48 columns and 60 rows in the file).

loadGrid("48by60.tsv", 0, 0, 1, 1, 48, 60)

Write your loadGrid() function now. Test it extensively on the 48by60.tsv file, with various kinds of sampling, to make sure that it's working correctly, before proceeding to the next step. You may assume that whoever is using your function knows not to overrun the size of the file; you do not need to handle bad calls such as

loadGrid("48by60.tsv", 0, 0, 1, 1, 70, 80)
loadGrid("48by60.tsv", 0, 0, 2, 2, 48, 60)
loadGrid("48by60.tsv", 43, 43, 1, 1, 48, 60)
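To see why those three calls are bad, here is a sketch that computes the last column and last row each call would touch. You are not required to write such a check; the function fitsInFile() is hypothetical and is here only to explain what "overrun" means.

```python
def fitsInFile(x, y, xSkip, ySkip, xCount, yCount, columns, rows):
    # Hypothetical check: the last sample must still lie inside the file.
    lastColumn = x + (xCount - 1) * xSkip
    lastRow = y + (yCount - 1) * ySkip
    return lastColumn < columns and lastRow < rows

# The file 48by60.tsv has 48 columns and 60 rows.
print(fitsInFile(0, 0, 1, 1, 48, 60, 48, 60))     # True: the whole file
print(fitsInFile(0, 0, 1, 1, 70, 80, 48, 60))     # False: too many samples
print(fitsInFile(0, 0, 2, 2, 48, 60, 48, 60))     # False: skips too far
print(fitsInFile(43, 43, 1, 1, 48, 60, 48, 60))   # False: starts too late
```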

The next step is to use real elevation data obtained from the GTOPO30 database of the U.S. Geological Survey. I have already downloaded a file, pre-processed it a bit, and placed it at the path "/Users/cs/cs111/northeast.tsv". The file has 4800 columns and 6000 rows. It's about 138 MB in size, so you do not want to copy it to your computer, load it into memory, and draw the entire thing at once. That is technically possible, but it would be a waste of our storage and network resources, and the drawing would be painfully slow. Instead, you want to sample from the file, leaving most of the data on disk and keeping only a tiny fraction of it in memory.

Specifically, use your loadGrid() function now to sample a 48-by-60 grid out of the file, by sampling every 100th datum. (Starting where?) The picture you get should be identical to the one you got from 48by60.tsv, because I constructed 48by60.tsv by sampling from /Users/cs/cs111/northeast.tsv in exactly this manner.
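To appreciate how tiny that fraction is, here is the arithmetic as a quick sketch, using the file dimensions given above.

```python
fileColumns = 4800
fileRows = 6000
totalData = fileColumns * fileRows    # every datum in the big file

sampledData = 48 * 60                 # the data we actually keep

print(totalData)                      # 28800000
print(sampledData)                    # 2880
print(totalData // sampledData)       # 10000, so we keep 1 datum in 10,000
```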

Once you've reproduced the 48by60.tsv picture using /Users/cs/cs111/northeast.tsv, play around a bit more. For example, suppose that you're interested in examining the islands in the southern end of Hudson Bay. There are plenty of data in the file to give you a much more detailed picture of these islands. Figure out a 480-by-600 region of the data set that contains the islands. Sample this 480-by-600 region at every 10th datum to build a new 48-by-60 grid. This should give you a picture that is 10 times as detailed as the preceding one, but covering only a tiny portion of North America; it's as if you've zoomed in on the landscape by a factor of 10. Then pick a feature on one of the islands, figure out a 48-by-60 region of the data that contains that feature, and sample that region at every datum. This should give you a picture that is 10 times as detailed as the preceding one, but covering an even smaller area; it's as if you've zoomed in again by a factor of 10.
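When zooming, it can help to pick the column and row of the feature you care about and then compute where sampling should start so that the feature lands in the middle of the picture. Here is a sketch of that calculation. The function regionCorner() and the center point (2400, 3000) are invented for illustration (that point is simply the middle of the file, not the location of any island), and the sketch assumes that the region stays inside the file.

```python
def regionCorner(centerColumn, centerRow, xSkip, ySkip, xCount, yCount):
    # Hypothetical helper: top-left corner of a sampled region that is
    # centered (as nearly as possible) on the given column and row.
    x = centerColumn - (xCount * xSkip) // 2
    y = centerRow - (yCount * ySkip) // 2
    return (x, y)

# Zoom in by a factor of 10: a 480-by-600 region, every 10th datum.
print(regionCorner(2400, 3000, 10, 10, 48, 60))   # (2160, 2700)

# Zoom in again: a 48-by-60 region, every datum.
print(regionCorner(2400, 3000, 1, 1, 48, 60))     # (2376, 2970)
```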

Once you've tired of northeastern North America, try /Users/cs/cs111/southeast.tsv, which is also 4800 by 6000.

Submit Your Work

Submit your work electronically by Friday at the start of class. It will be graded according to these criteria.

Just to clarify: Your program must have a demo section demarcated by if __name__ == "__main__":. When the grader runs your file as a program, this demo code will run. When the grader imports your file into his/her own grading program, which might look like this...

import landscape
import elevation

grid = elevation.loadGrid("mysteryfile.tsv", ...)
print grid
print len(grid[0])
print len(grid)
raw_input("Press Enter to continue...")
elevation.processGrid(grid)
print grid
print len(grid[0])
print len(grid)
raw_input("Press Enter to continue...")
landscape.createWindow(grid, None, None)

...the demo code will not run.
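Here is a minimal sketch of how such a demo section is laid out. The function describeDemo() is a made-up stand-in for your real functions. Python sets the variable __name__ to "__main__" only when the file is run as a program; when the file is imported, __name__ is the module's name instead, so the indented demo code is skipped.

```python
def describeDemo():
    # Hypothetical stand-in for functions such as loadGrid().
    return "demo ran"

if __name__ == "__main__":
    # This runs when the file is executed as a program,
    # but not when another program imports the file.
    print(describeDemo())
```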