2016 September 12

CS 111: All Data are Numeric

We've already seen how Python can be used as a glorified calculator, to do arithmetic on numbers. In fact, manipulation of numbers is really all that a computer does. This is not obvious, because we use our computers to do so many things — send messages to friends, watch videos, and so on — that look nothing like arithmetic. But, fundamentally, all data inside a computer are numeric.

To begin investigating this point, let's make some graphics. First, execute the following line of code to load the turtle graphics system into memory. (Don't worry about what that means right now.)

import turtle

Then execute the following lines. You'll see a graphics window appear. Then a metaphorical turtle will draw a square in that window. (Don't worry about understanding or memorizing each line of code, although you can probably figure out what some of them do.)

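fred = turtle.getturtle()        # get the turtle that draws in the window
fred.fillcolor(1.0, 0.0, 0.0)    # fill color as (red, green, blue) intensities
fred.begin_fill()                # start tracing a shape to be filled
fred.forward(200)                # move forward 200 units, drawing a line
fred.left(90)                    # turn 90 degrees to the left
fred.forward(200)
fred.left(90)
fred.forward(200)
fred.left(90)
fred.forward(200)
fred.end_fill()                  # fill the traced shape with the fill color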

In computer graphics, it is common to specify color as a combination of three primary colors: red, green, and blue. The amount of each primary is measured on a scale from 0.0 (none of the primary) to 1.0 (full intensity). The fred.fillcolor(1.0, 0.0, 0.0) line specifies that the fill color should be "full red, no green, no blue" — in other words, bright red.
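Because a color is just three numbers, we can convert it between number formats. Many programs store each intensity as a whole number from 0 to 255, which fits in one byte, rather than as a number from 0.0 to 1.0. Colors are also often written as hex strings like #ff0000. The following is a minimal sketch that assumes this 0-to-255 convention. The helper names to_byte and rgb_to_hex are my own and are not part of the turtle module.

```python
def to_byte(intensity):
    """Scale an intensity from the range 0.0..1.0 to a whole number 0..255."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0.0 and 1.0")
    return round(intensity * 255)


def rgb_to_hex(red, green, blue):
    """Write three 0.0..1.0 intensities as a '#rrggbb' hex color string."""
    return "#{:02x}{:02x}{:02x}".format(to_byte(red), to_byte(green), to_byte(blue))


print(to_byte(1.0))               # 255: full intensity fits in one byte
print(rgb_to_hex(1.0, 0.0, 0.0))  # prints #ff0000: full red, no green, no blue
```

The turtle module also accepts color strings of this form, so fred.fillcolor("#ff0000") should produce the same bright red.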

Question 04A: Starting from fred = turtle.getturtle(), re-execute the code above, but this time make a bright green square. Then make a bright blue square. Then make a bright magenta (purple) square.

Question 04B: What happens if you mix red and green?

Question 04C: How do you make a black square? A white square? A pink square? A dark red square?

At this point, here's what my window looks like. (Yours might look different, if you've closed your window along the way or you chose different colors.)

A computer screen is made of tiny squarish dots called pixels. When you view an image on a computer, you're viewing a grid of colors, each of which can be described as a combination of red, green, and blue. The picture I made above is basically a 2x2 image. For comparison, here is a 300x225 image.
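In Python, one natural way to hold such a grid is a list of rows. Each row is a list of pixels, and each pixel is a tuple of three numbers. This sketch is for illustration only. The colors below are made up, and real image file formats arrange their numbers differently.

```python
# A made-up 2x2 image: a list of rows; each pixel is a (red, green, blue) tuple.
image = [
    [(1.0, 0.0, 0.0), (0.6, 0.65, 1.0)],  # top row: red, sky blue
    [(0.5, 0.5, 0.5), (1.0, 0.6, 0.2)],   # bottom row: gray, orange
]


def count_numbers(image):
    """Count every number stored in an image given as rows of RGB tuples."""
    return sum(len(pixel) for row in image for pixel in row)


print(image[1][0])           # row 1, column 0: (0.5, 0.5, 0.5)
print(count_numbers(image))  # 2 * 2 pixels * 3 numbers = 12
print(300 * 225 * 3)         # numbers in a 300x225 image: 202500
```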

A video is essentially a bunch of images, presented at 24 or 30 frames per second, say. So a video is a sequence of images, in which each image is a grid of pixels, in which each pixel's color is specified by three numbers. A video is a big list of numbers.
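Taking that description literally, this sketch flattens a video into one long list of numbers. Here a video is a list of frames, and each frame is a grid of pixels. The tiny two-frame video and the function name flatten_video are hypothetical. Storing frame by frame, then row by row, is just one reasonable ordering.

```python
def flatten_video(frames):
    """Turn a list of frames (grids of RGB pixels) into one flat list of numbers."""
    numbers = []
    for frame in frames:                # frames in playback order
        for row in frame:               # rows from top to bottom
            for pixel in row:           # pixels from left to right
                numbers.extend(pixel)   # append red, green, blue
    return numbers


# A hypothetical two-frame video, 2 pixels wide and 1 pixel tall:
# a red dot moves one pixel to the right on a gray background.
red = (1.0, 0.0, 0.0)
gray = (0.5, 0.5, 0.5)
video = [
    [[red, gray]],   # frame 0
    [[gray, red]],   # frame 1
]

numbers = flatten_video(video)
print(len(numbers))  # 2 frames * 2 pixels * 3 numbers = 12
print(numbers)
```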

This is roughly how video exists inside your computer as it is being shown on your screen. A 1000x1000-pixel video at 30 frames per second that lasts 20 minutes requires...

20 * 60 * 30 * 1000 * 1000 * 3

...numbers total, or 108 billion numbers. If each number occupies a single byte (a unit of computer memory, which we'll learn about later), then that's about 100 gigabytes total. That's a lot to transmit over the Internet or store on a computer, even in 2016. So in practice the video is kept in a compressed format until just before it's displayed.
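Python can check this arithmetic for us. Like the text, this sketch assumes one byte per number. It also counts in decimal gigabytes of 10**9 bytes each.

```python
minutes = 20
frames_per_second = 30
width = height = 1000
numbers_per_pixel = 3            # red, green, blue

total_numbers = minutes * 60 * frames_per_second * width * height * numbers_per_pixel
bytes_per_number = 1             # assumption from the text: one byte per number
total_gigabytes = total_numbers * bytes_per_number / 10**9   # decimal gigabytes

print(total_numbers)     # 108000000000
print(total_gigabytes)   # 108.0
```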

Perhaps the simplest kind of image compression is run-length encoding. Consider the following image. Notice how there are large swathes of unvarying color: the blue in the sky, the orange in the skin, the blue in the hair, etc. Such simple coloration is typical of illustrations and cartoons.

The idea of run-length encoding is: Instead of storing the same sky-blue pixel 130 times in a row, just store the command "130 sky blue", or rather the four numbers 130, 0.6, 0.65, 1.0. So, instead of requiring 390 numbers for those pixels (130 pixels times 3 numbers each), you need only 4 numbers. That's a compression ratio of nearly 100 to 1. If you could maintain that ratio for an entire 100-gigabyte video, then the video would compress to about 1 gigabyte.
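Here is a minimal sketch of run-length encoding and decoding. It assumes that each pixel is a (red, green, blue) tuple and that each run is stored as a [count, color] pair. The function names are my own. The orange color is a made-up value for illustration.

```python
def rle_encode(pixels):
    """Run-length encode a list of pixels into a list of [count, color] runs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1          # same color as the previous pixel: lengthen the run
        else:
            runs.append([1, pixel])   # a new color: start a new run of length 1
    return runs


def rle_decode(runs):
    """Expand [count, color] runs back into the original list of pixels."""
    pixels = []
    for count, color in runs:
        pixels.extend([color] * count)
    return pixels


def numbers_stored(runs):
    """Each run stores one count plus the color's three numbers."""
    return sum(1 + len(color) for _, color in runs)


sky_blue = (0.6, 0.65, 1.0)
orange = (1.0, 0.6, 0.2)              # hypothetical skin color, for illustration
row = [sky_blue] * 130 + [orange] * 20

runs = rle_encode(row)
print(runs)                    # [[130, (0.6, 0.65, 1.0)], [20, (1.0, 0.6, 0.2)]]
print(len(row) * 3)            # 450 numbers without compression
print(numbers_stored(runs))    # 8 numbers with run-length encoding
print(rle_decode(runs) == row) # True: decoding recovers the original row
```

Each run costs 1 + 3 = 4 numbers no matter how long it is, and that is where the savings come from.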

Question 04D: What kind of image would not be well-compressed by run-length encoding?

In practice, run-length encoding is rarely used on its own to compress images, because there are far more effective methods, which are unfortunately harder to explain. As a final note, the image above is compressed using the JPEG method, which introduces some visible defects, especially where one color changes sharply to another.