2010 February 19 / j d a v i s @ c a r l e t o n . e d u

Binary Numerals

A week or two ago we discussed how all information inside a computer (numbers, text, images, video, etc.) is ultimately represented as numbers. In this tutorial, we learn how numbers themselves are represented inside computers in binary. Concretely, we practice conversion among binary, decimal, and hexidecimal numerals, and arithmetic with binary numerals. We learn a little about floating point numbers and RGB colors. We also glimpse a recurring theme in the design of computers and algorithms: the tradeoff between time efficiency and space (memory) efficiency.


Recall from Assignment 6: Elevation Models the distinction between numbers and numerals. Numbers are mathematical objects subject to arithmetic operations (addition, subtraction, multiplication, and division), whereas numerals are the strings that we use to write numbers down. The distinction between numbers and numerals is important to any computer programmer, because numeric types (int, float in Python) have different capabilities than string types (str). For example, in Python 123 + 123 produces 246, while "123" + "123" produces "123123".

Throughout history, various cultures around the world have used various systems for representing numbers as numerals. For example, these three numerals all represent the same number:

When you carry out arithmetic calculations with numbers on paper, you probably use the standard algorithms that you learned in elementary school — "carry the 1" and all that. Those algorithms actually operate on the numerals that represent the numbers, rather than on the numbers themselves. Therefore your culture's choice of numeral system influences your culture's algorithms for arithmetic. It turns out that some numeral systems are more convenient — that is, produce more efficient algorithms &mdash than others. For example, isn't adding numbers in Arabic numerals ("1402" + "713") easier than adding numbers in plain English ("one thousand four hundred two" + "seven hundred thirteen")?

Modern digital computers represent numbers as binary numerals. Binary means base-2, as opposed to the decimal or base-10 Arabic numeral system that we use in everyday life. Of course, when a computer performs an arithmetic calculation, it doesn't actually write the numerals as symbols on paper; it represents them using different voltage levels in electrical circuits. In order to understand what the computer is doing, we'll perform the same calculations, but on paper.

Converting Between Decimal and Binary

In this section we learn how to convert between decimal (base-10) and binary (base-2) numerals. Although technically I should enclose every numeral in quotation marks, because it is a string, I will omit the quotation marks so that we don't go crazy.

For starters, it is useful to know the first few powers of 2 and of 10.


When you write a number as a numeral in base 10, you are expressing it as a weighted sum of powers of 10. For example,

1402=1000 + 400 + 2
=1 * 103 + 4 * 102 + 2 * 100
=1 * 103 + 4 * 102 + 0 * 101 + 2 * 100.

The digits of the numeral 1402 appear as the weights in these terms, in order: 1, 4, 0, 2. Each digit is between 0 and 10, including 0 but not including 10. That is, the possible digits in base 10 are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Similarly, when you write a number as a numeral in base 2, you express it as a weighted sum of powers of 2. The digits in the numeral are between 0 and 2, including 0 but not including 2. That is, the possible digits in base 2 are just 0 and 1. These are called the binary digits, or bits for short. For example,

1402=1024 + 256 + 64 + 32 + 16 + 8 + 2
=1 * 210 + 1 * 28 + 1 * 26 + 1 * 25 + 1 * 24 + 1 * 23 + 1 * 2 1
=1 * 210 + 0 * 29 + 1 * 28 + 0 * 27 + 1 * 26 + 1 * 25 + 1 * 24 + 1 * 23 + 0 * 22 + 1 * 21 + 0 * 20.

To obtain the base-2 representation of 1402, we just read off the digits: 10101111010. In order to keep things straight between different bases, we attach subscripts to the numerals, indicating which base they're written in. For example,

140210 = 101011110102.

Some remarks: First, when we say that 140210 = 101011110102, we are not saying that the numerals 1402 and 10101111010 are equal (as strings); we're saying that the numbers that they represent are equal, in their respective bases. Second, when we write a numeral without a subscript, it will be understood to be in base 10, by default. Third, the subscripts are themselves numerals, which we interpret to be in base 10 (because they don't have subscripts themselves).

To convert from base-2 to base-10 we just reverse the process. For example,

10110012=1 * 26 + 0 * 25 + 1 * 24 + 1 * 23 + 0 * 22 + 0 * 21 + 1 * 20
=64 + 16 + 8 + 1

Addition in Binary

Recall the standard algorithm for addition in base 10. You add the digits by column, going from right to left. As long as each column sum is less than 10, there is no problem. Whenever a column sum is 10 or greater, you carry a 1 into the next colum, and put the leftover in the current column. For example, here I'm adding 1402 and 713, and I have to carry a 1:




Addition in base 2 is similar. You add the bits by column from right to left. Whenever a column sum is 2 or greater, you carry a 1 and put the leftover in the current column. For example, here I add 010011102 and 001111012 and get an answer of 100010112:




You can add more than two numerals at a time in base 2, just as you do in base 10. When you add more than two numbers together, you sometimes have to carry multiple digits. In order to make this clear, let's do a big base-10 example first. In this example, I have to carry a 7 from the first column into the second column, I have to carry an 11 from the second column into the third and fourth columns, and I have to carry a 2 from the third column into the fourth column. Try it yourself, to make sure you understand how the carries work.




Here's an example in binary. I'm adding four numbers. I have to carry a 12 from the second column to the third, a 102 from the third column to the fourth and fifth, and a 12 from the fourth column to the fifth. Try it yourself.




Multiplication in Binary

Recall the standard algorithm for multiplying two numerals in base 10. You multiply the first numeral by each digit of the second numeral, lining up the products according to which digit of the second numeral produced them. Then you add the products by column. For example,




Multiplication in base 2 is similar. You multiply the first numeral by each bit of the second numeral, lining up the products according to the bit that produced them. In fact, base-2 multiplication is easier than base-10 multiplication in one aspect. Each bit is either 0 or 1, so multiplication by a bit means that you simply copy the number or not. For example,




Word Size and Overflow

In base 10, small numbers have numerals with only a few digits, while large numbers have numerals with many digits. The same is true in base 2; the number 6 can be stored in three bits (6 = 1102), while the number 338 requires nine bits (338 = 1010100102). Suppose that I have a program with two variables, like this:

x = 6
y = 338

Remember that a variable is a named location in the computer's memory. That is, the 6 and 338 have to be stored in memory somewhere. Storing them requires 12 bits of memory, if we pack the two numerals together:


However, packing numerals together like this causes problems. When we read 110101010010 out of memory, how do we know that it should be split up as 110 followed by 101010010, as opposed to 110101 followed by 010010 or 110101010 followed by 010? There is no way of knowing, unless we have somehow recorded information about how the numerals are packed in memory, in addition to the numerals themselves. And what if later in my program I change x to have value 12 = 11002 instead of 6 = 1102?

x = 6
y = 338
x = 12

Now I need four bits to store the value for x, but I've only allocated three bits to the task in memory, because that's all I used to need. I'm going to have to shift the nine bits of y to the right by one, to make room for the larger x:


In a tiny program with only two variables, moving the memory around is not a big deal. But in a realistic program, which might have a million variables, you don't want to be moving memory around every time you change the value of a variable. That would be horrendously slow. We need a better strategy.

The standard way to deal with numerals of varying length is simply to force all numerals to have the same length, by padding them out with zeros. This fixed length is called the word size of the computer. For example, in my home I have several computers of various ages. Two of my computers store numbers as 32-bit words; on these computers, the 6 and 338 would be stored in memory as


My newest computer uses 64-bit words, so that in memory the 6 and 338 look like


This extra padding is obviously wasteful of memory, but it provides a systematic way of storing numbers, which allows the computer to write and read numbers very quickly. That is, it sacrifices some space efficiency for the sake of time efficiency. Space-vs-time tradeoffs are quite common in the design of computers and algorithms.

For the sake of argument, suppose now that we're working on a 4-bit computer (such as the Intel 4004, which was state-of-the-art in 1971). Suppose that we want to add 9 = 10012 and 5 = 01012. (Notice that I've padded 1012 out to four bits, since I'm on a 4-bit computer.) Here's the answer:



No problem; 11012 = 14 is the correct answer. But now suppose that we want to add 9 and 12 = 11002:



The answer 21 = 101012 cannot be stored in four bits; the answer has overflowed the word size of the computer. When this happens, the answer is truncated down to the word size — in our example, 101012 is truncated down to 01012 — and the computer signals that an overflow error has occurred.

Overflow is always a problem when doing numerical calculations on computers, but you and I don't run into it very often in our daily lives. This is because we typically use 32-bit computers, which overflow only when the numbers exceed 232 - 1 = 4294967295, and we rarely do calculations with such large numbers. (A 64-bit computer will overflow even less often — only when a number exceeds 264 - 1 = 18446744073709551615.)

But what do we do, if we need to perform a calculation with extremely large numbers, that don't fit into our computer's word size? We have to adopt some more complicated system, such as representing large numbers as lists of word-size numbers and writing functions to perform arithmetic on large numbers represented as such lists. Python comes with this functionality built-in, and uses it automatically as it's needed. For example, there's no way I should be able to compute 2100 on my 64-bit computer, but Python lets me:

>>> 2**21
>>> 2**100

Notice the L at the end of the second result. This signals that the result is too large for the word size of the computer and is being handled by special algorithms for manipulating large numbers. For the most part we don't care that Python is handling large numbers in a special way; we just want it to give us the answers to all of our questions. But I can think of two reasons why Python should let us know. First, calculations that fit into the underlying computer's word size are handled by the underlying computer's arithmetic system, which is screamingly fast compared to the large-number algorithms. So we might care, if speed is an issue. Second, if we want to copy and paste the result into another program, such as a spreadsheet, then we should be aware that the other program might have problems handling such big numbers.

Disclaimer: The preceding discussion of word size has oversimplified several points and omitted many other points. However, for an introductory computer science that uses the Python programming language it's roughly what you need to know. If we were using the C programming language, for example, then we would have to go into memory issues in much greater detail. Essentially, Carleton teaches CS 111 using Python because Python lets us do interesting things without worrying about a lot of low-level issues.

Negative Numbers: Two's Complement

How can we represent negative numbers in a computer? Well, our Arabic numeral system represents negative numbers by preceding them with a negation sign "-"; the negation of 89 is -89. On paper, we can use similar notation for negative numbers in binary: the negation of 10100012 is -10100012. But deep inside a digital computer there isn't any symbol "-"; there are just bits. So in order to represent negative numbers in a computer, we need to represent "-" using bits somehow. Here's an idea. Let "+" be represented by the bit 0 and "-" by the bit 1. Then we can represent +89 and -89 as 010100012 and 110100012. This is called the sign bit system, because it reserves the first bit in a numeral for recording the sign.

The sign bit system is not crazy — one could build a computer this way — but nobody does, because there is a better way, called two's complement. In two's complement, numerals representing positive numbers (and 0) begin with 0, and numerals representing negative numbers begin with 1 — just as in the sign bit system — but the rest of the bits are organized very differently. The details depend on how many bits are in play — that is, on the word size of the computer. For the moment, suppose that we're working on a 4-bit computer.

To understand two's complement, start counting from 0 in binary. You can do this by starting at 00002 and repeatedly adding 00012, or by counting in decimal and converting decimal numerals to binary:

00002 = 0,
00012 = 1,
00102 = 2,
00112 = 3,
01002 = 4,
01012 = 5,
01102 = 6,
01112 = 7,
10002 = 8,
10012 = 9,
10102 = 10,
10112 = 11,
11002 = 12,
11012 = 13,
11102 = 14,
11112 = 15,
00002 = 0.

What happened at the end? We added 00012 to 11112, overflowed, and truncated back to 00002. We can view this as an error or as an opportunity. The opportunistic view is that 00002 is the number after 11112, so 11112 is the number before 00002. That is, 11112 is the number before 0. So 11112 must be -1. We just pulled negative numbers out of the air.

Continuing the idea, 11102 must be -2, because it is the number before -1, and 11012 must be -3, and so on. In fact, we can count back all the way to 00012 and decree that it is -15, or to 00002 and decree that it is -16. But standard practice is to push the negative numbers back only halfway through the sequence, so that all of the numerals starting with 1 represent negative numbers and all of the numerals starting with 0 represent nonnegative numbers.

To clarify: There are actually two ways to encode numbers into 4-bit words. The first way regards the binary numerals from 00002 to 11112 as encoding the numbers from 0 to 15; it is unable to represent any negative numbers. The second way regards the binary numerals from 00002 to 00072 as encoding the numbers from 0 to 7 and the binary numerals from 10002 to 11112 as encoding the numbers from -8 to -1. These two systems, called unsigned and two's complement, are summarized for 4-bit words in the following table.

BinaryUnsignedTwo's Complement

Here is (some of) the corresponding table for 8-bit words:

BinaryUnsignedTwo's Complement

Negation and Subtraction in Binary

There is a simple procedure for computing the negation of a numeral:

  1. Flip all of the bits; that is, change all 0s to 1s and all 1s to 0s.
  2. Add 1 to the resulting number with truncation.
For example, with an 8-bit word size 89 = 010110012 and -89 is

-010110012=101001102 + 12

You can check this by adding 010110012 and 101001112; you get the 9-bit answer 1000000002, which after truncation is the 8-bit numeral 000000002. This makes sense, because the sum of a number and its negation must be 0.

Subtraction is easy, once we have negation. To compute a - b, you simply compute -b and then add the result to a with truncation. This works because of the algebraic rule a + -b = a - b. For example,

010001112 - 010100012=010001112 + -010100012
=010001112 + 101011112

Suppose now that we want to convert 111101102 to decimal. First we notice that 111101102 is negative, because the first bit is 1. If we negate it, we get

-111101102=000010012 + 12

which is 10. Therefore 111101102 = -10.

By the way, the rule a + -b = a - b holds in two's complement only as long as no overflow occurs. Also, this fundamental algebraic rule does not hold in the sign bit system at all; this is a major reason why the sign bit system is not used.

Conversion Between Binary and Hexadecimal (and Decimal)

Hexadecimal numerals are written not in base 10 or base 2 but in base 16. They require 16 different digits; to write in hexadecimal we need six more digits than we're used to. By convention, the six extra digits are A, B, C, D, E, F. The following table summarizes the conversion among binary, decimal, and hexadecimal for the numbers from 0 to 15.


Conversion from binary to hexadecimal is really easy: You just convert each block of four bits to the corresponding hexadecimal digit. For example,

10110010000011112 = 1011 0010 0000 11112 = B20F16.

Similarly, to convert from hexadecimal to binary, you just expand each hexadecimal digit as four bits:

EB6A16 = 1110 1011 0110 10102 = 11101011011010102.

Converting between base-2 and base-16 numerals is a lot easier than converting between base-2 and base-10 numerals. The reason is that 16 is a power of 2 — 16 = 24, so four base-2 digits correspond perfectly to one base-16 digit — while 10 is not a power of 2, so there is no simple correspondence between decimal and binary digits.

There are two common approaches for converting base-10 numerals to base-16 numerals. First, you could expand the base-10 number as a weighted sum of powers of 16, and then read off the hexadecimal digits. For example,

1402=5 * 256 + 7 * 16 + 10 * 1
=5 * 162 + 7 * 161 + 10 * 160

Second, you could convert base-10 to base-2, and then base-2 to base-16. For example,

=0101 0111 10102

I usually follow the second approach, because I'm better at powers of 2 than powers of 16, and the extra step of converting binary to hexadecimal is not much work.

Floating-Point Numbers

Thus far we have talked exclusively about integers; we don't know anything about representing other real numbers such as 3.14159. In order to understand these numbers, consider this base-10 example:

3.14159=3 + 0.1 + 0.04 + 0.001 + 0.0005 + 0.00009
=3 * 100 + 1 * 10-1 + 4 * 10-2 + 1 * 10-3 + 5 * 10-4 + 9 * 10-5.

The digits after the decimal point are simply coefficients on negative powers of 10. Now you can probably guess how to compute the binary expansion of 3.14159; you just need to use negative powers of 2. Here's a table of the first few.


Unfortunately, the binary expansion of 3.14159 requires infinitely many bits; I'll just show the first ten fractional bits.

3.14159=2 + 1 + 0.125 + 0.0078125 + ...
=1 * 21 + 1 * 20 + 0 * 2-1 + 0 * 2-2 + 1 * 2-3 + 0 * 2-4 + 0 * 2-5 + 1 * 2-6 + 0 * 2-7 + 0 * 2-8 + 0 * 2-9 + 0 * 2-10 + ...

Now we know how to write non-integer numbers as binary numerals. But how do we store these numerals as bits in a computer? The problem is that there is no "." inside the computer; there are just bits. To overcome this obstacle we use scientific notation. In scientific notation, a number is represented in three parts:

For example, the numeral -73220.5 is written in base-10 scientific notation as -7.32205 * 104. Here the sign is "-", the significand is "7.32205", and the exponent is "4". Similarly, the base-2 numeral 1011.012 is written in base-2 scientific notation as +1.011012 * 23. For another example, 0.01112 is written as +1.112 * 2-2. (The number 0 must be treated as a special case in scientific notation, because it has no nonzero digits at all.)

Many (but not all) computers in use today store floating-point numbers according to one of two binary formats, that are defined by the IEEE 754 standard. The single-precision format uses 32 bits:

The double-precision format uses 64 bits:

For example,

5.8125=4 + 1 + 0.5 + 0.25 + 0.0625
=1 * 22 + 0 + 21 + 1 * 20 + 1 * 2-1 + 1 * 2-2 + 0 * 2-3 + 1 * 2-4
=+1.0111012 * 22.

In single-precision floating point, this is stored as the 32-bit numeral

0 10000001 01110100000000000000000

The first bit indicates +, the next eight bits are the unsigned integer 129 (which is 2 plus 127), and the last 23 bits store the fractional part 0111012 of the significand (with a bunch of trailing 0s). If you'd like to experiment with the floating-point representations of other numbers, try this automatic converter.

The IEEE 754 standard goes on to define how arithmetic operations are performed on numbers written on these formats. Because rounding is inevitable and can have serious (and subtle) effects on the results of calculations, there is extensive discussion in the standard of how and when to perform rounding in floating-point calculations. We will not cover the topic any more in this course.

Application: 24-Bit RGB Color

There are infinitely many different colors in the world, but experimental studies of human vision have determined that the human eye is sensitive to only about 10 million of them. The eye is not able to detect the difference between two minutely different shades of lime green, for example. Now consider how this might affect the design of a computer's graphics system. 10 million is less than 224, so 24 bits should be able to encode all of the different colors visible to the human eye.

We want to store colors in 24 bits and we want to store colors in RGB format. So we have eight bits to use for each channel R, G, and B. The unsigned integers that can be represented in eight bits are all of the integers between 0 and 255. So we end up measuring R, G, and B each on a scale of 0 to 255. You've already worked with RGB colors like this, in the cImage module (from Assignment 7: Inking); it uses 24-bit RGB color.

In contrast, 3D graphics systems such as the Bopagopa module (which we used in Assignment 6: Elevation Models) typically measure R, G, and B using floating-point numbers on a scale from 0.0 to 1.0. How much memory is used if these floating-point numbers are single-precision? If they're double-precision? Either way, it's much more memory than the 24 bits that are theoretically sufficient. So why do 3D graphics systems use floating-point? It's because they do a great deal of computation with colors. For example, the final color of a 3D-rendered pixel may depend on the color of the object being rendered, the color of the virtual light in the scene, the angle at which the light is striking the surface of the object, and the angle at which the virtual camera is viewing the surface. These complicated calculations are more quickly and conveniently done in floating-point. In short, 3D graphics systems choose their RGB representation to favor time-efficiency over space-efficiency.

Recall that web pages are written in a language called HTML. In HTML, one can specify various colors on a web page: the background color of the page, the color of the text, the color of a particular feature on the page, etc. These colors are specified in HTML as 24-bit RGB colors, and they are written in hexadecimal rather than binary. That is, each channel is specified using two hexadecimal digits, for a total of six hexadecimal digits. I'll give you two examples now.

Storing RGB colors as 24-bit blocks doesn't align very well with the 32-bit word size of a typical computer. In fact, on many systems a 24-bit RGB color is actually stored as a 32-bit word. The eight extra bits essentially constitute a fourth channel of information, which is often called the alpha channel. An RGB color with an attached alpha channel is called an RGBA color. Sometimes the alpha channel is simply ignored; the eight bits are wasted. When it is not ignored, the alpha channel is frequently used to encode opacity. That is, an alpha value of 0 represents a completely transparent color, while an alpha value of 255 represents a completely opaque color. Software can use this extra opacity information to create interesting effects, such as the blending of one image into another.