2016 October 3,

CS 111: Binary Numerals

Introduction

Recently in class, we have discussed how all information inside a computer (numbers, text, images, video, etc.) is ultimately represented as numbers. In this tutorial, we learn how numbers themselves are represented inside computers in binary. Concretely, we practice conversion among binary, decimal, and hexidecimal numerals, and arithmetic with binary numerals. We learn a little about floating point numbers and RGB colors. We also glimpse a recurring theme in the design of computers and algorithms: the tradeoff between time efficiency and space (memory) efficiency.

In understanding binary notation, it is important to recognize the distinction between numbers and numerals. Numbers are mathematical objects subject to arithmetic operations (addition, subtraction, multiplication, and division), whereas numerals are the strings that we use to write numbers down. The distinction between numbers and numerals is important to any computer programmer, because numeric types (int, float in Python) have different capabilities than string types (str). For example, in Python 123 + 123 produces 246, while "123" + "123" produces "123123".

Throughout history, various cultures around the world have used various systems for representing numbers as numerals. For example, these three numerals all represent the same number:

When you carry out arithmetic calculations with numbers on paper, you probably use the standard algorithms that you learned in elementary school — "carry the 1" and all that. Those algorithms actually operate on the numerals that represent the numbers, rather than on the numbers themselves. Therefore your culture's choice of numeral system influences your culture's algorithms for arithmetic. It turns out that some numeral systems are more convenient — that is, produce more efficient algorithms — than others. For example, isn't adding numbers in Arabic numerals ("1402" + "713") easier than adding numbers in plain English ("one thousand four hundred two" + "seven hundred thirteen")?

Modern digital computers represent numbers as binary numerals. Binary means base-2, as opposed to the decimal or base-10 Arabic numeral system that we use in everyday life. Of course, when a computer performs an arithmetic calculation, it doesn't actually write the numerals as symbols on paper; it represents them using different voltage levels in electrical circuits. In order to understand what the computer is doing, we'll perform the same calculations, but on paper.

Converting Between Decimal and Binary

In this section we learn how to convert between decimal (base-10) and binary (base-2) numerals. Although technically I should enclose every numeral in quotation marks, because it is a string, I will omit the quotation marks so that we don't go crazy.

For starters, it is useful to know the first few powers of 2 and of 10.

n012345678910...
2n12481632641282565121024...
10n110100100010000100000100000010000000100000000100000000010000000000...

When you write a number as a numeral in base 10, you are expressing it as a weighted sum of powers of 10. For example,

1402=1000 + 400 + 2
=1 * 103 + 4 * 102 + 2 * 100
=1 * 103 + 4 * 102 + 0 * 101 + 2 * 100.

The digits of the numeral 1402 appear as the weights in these terms, in order: 1, 4, 0, 2. Each digit is between 0 and 10, including 0 but not including 10. That is, the possible digits in base 10 are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Similarly, when you write a number as a numeral in base 2, you express it as a weighted sum of powers of 2. The digits in the numeral are between 0 and 2, including 0 but not including 2. That is, the possible digits in base 2 are just 0 and 1. These are called the binary digits, or bits for short. For example,

1402=1024 + 256 + 64 + 32 + 16 + 8 + 2
=1 * 210 + 1 * 28 + 1 * 26 + 1 * 25 + 1 * 24 + 1 * 23 + 1 * 2 1
=1 * 210 + 0 * 29 + 1 * 28 + 0 * 27 + 1 * 26 + 1 * 25 + 1 * 24 + 1 * 23 + 0 * 22 + 1 * 21 + 0 * 20.

To obtain the base-2 representation of 1402, we just read off the digits: 10101111010. In order to keep things straight between different bases, we attach subscripts to the numerals, indicating which base they're written in. For example,

140210 = 101011110102.

Some remarks: First, when we say that 140210 = 101011110102, we are not saying that the numerals 1402 and 10101111010 are equal (as strings); we're saying that the numbers that they represent are equal, in their respective bases. Second, when we write a numeral without a subscript, it will be understood to be in base 10, by default. Third, the subscripts are themselves numerals, which we interpret to be in base 10 (because they don't have subscripts themselves).

To convert from base-2 to base-10 we just reverse the process. For example,

10110012=1 * 26 + 0 * 25 + 1 * 24 + 1 * 23 + 0 * 22 + 0 * 21 + 1 * 20
=64 + 16 + 8 + 1
=89.

Addition in Binary

Recall the standard algorithm for addition in base 10. You add the digits by column, going from right to left. As long as each column sum is less than 10, there is no problem. Whenever a column sum is 10 or greater, you carry a 1 into the next colum, and put the leftover in the current column. For example, here I'm adding 1402 and 713, and I have to carry a 1:

1

1402
+0713

2115

Addition in base 2 is similar. You add the bits by column from right to left. Whenever a column sum is 2 or greater, you carry a 1 and put the leftover in the current column. For example, here I add 010011102 and 001111012 and get an answer of 100010112:

1111

01001110
+00111101

10001011

You can add more than two numerals at a time in base 2, just as you do in base 10. When you add more than two numbers together, you sometimes have to carry multiple digits. In order to make this clear, let's do a big base-10 example first. In this example, I have to carry a 7 from the first column into the second column, I have to carry an 11 from the second column into the third and fourth columns, and I have to carry a 2 from the third column into the fourth column. Try it yourself, to make sure you understand how the carries work.

2
117

277
288
195
497
575
094
186
298
293
179
492
099
+150

3623

Here's an example in binary. I'm adding four numbers. I have to carry a 12 from the second column to the third, a 102 from the third column to the fourth and fifth, and a 12 from the fourth column to the fifth. Try it yourself.

1
101

1011
0110
1100
+1110

101011

Multiplication in Binary

Recall the standard algorithm for multiplying two numerals in base 10. You multiply the first numeral by each digit of the second numeral, lining up the products according to which digit of the second numeral produced them. Then you add the products by column. For example,

1402
*0713

4206
1402
9814
0000

0999626

Multiplication in base 2 is similar. You multiply the first numeral by each bit of the second numeral, lining up the products according to the bit that produced them. In fact, base-2 multiplication is easier than base-10 multiplication in one aspect. Each bit is either 0 or 1, so multiplication by a bit means that you simply copy the number or not. Once you have all of the products lined up, you add them by column. For example,

1101
*0110

0000
1101
1101
0000

1001110

Word Size

In base 10, small numbers have numerals with only a few digits, while large numbers have numerals with many digits. The same is true in base 2; the number 6 can be stored in three bits (6 = 1102), while the number 338 requires nine bits (338 = 1010100102). Suppose that I have a program with two variables, like this:

x = 6
y = 338

Remember that a variable is a named location in the computer's memory. That is, the 6 and 338 have to be stored in memory somewhere. Storing them requires 12 bits of memory, if we pack the two numerals together:

110101010010

However, packing numerals together like this causes problems. When we read 110101010010 out of memory, how do we know that it should be split up as 110 followed by 101010010, as opposed to 110101 followed by 010010 or 110101010 followed by 010? There is no way of knowing, unless we have somehow recorded information about how the numerals are packed in memory, in addition to the numerals themselves. And what if later in my program I change x to have value 12 = 11002 instead of 6 = 1102?

x = 6
y = 338
...
x = 12

Now I need four bits to store the value for x, but I've only allocated three bits to the task in memory, because that's all I used to need. I'm going to have to shift the nine bits of y to the right by one, to make room for the larger x:

1100101010010

In a tiny program with only two variables, moving the memory around is not a big deal. But in a realistic program, which might have a million variables, you don't want to be moving memory around every time you change the value of a variable. That would be horrendously slow. We need a better strategy.

The standard way to deal with numerals of varying length is simply to force all numerals to have the same length, by padding them out with zeros. This fixed length is called the word size of the computer. For example, in my home I have several computers of various ages. Two of my older computers store numbers as 32-bit words; on these computers, the 6 and 338 would be stored in memory as

0000000000000000000000000000011000000000000000000000000101010010

My newer computers use 64-bit words, so that in memory the 6 and 338 look like

00000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000000000000000000000000000000101010010

If I change x from 6 to 12, then the memory looks like this:

00000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000000000000000000101010010

This extra padding is obviously wasteful of memory, but it provides a systematic way of storing numbers, which allows the computer to write and read numbers very quickly. When a variable's value is changed, we don't have to shift the rest of memory, to make room for that variable; we just have to alter the chunk where that variable is stored. That is, the system sacrifices some space efficiency for the sake of time efficiency. Space-vs-time tradeoffs are quite common in the design of computers and algorithms.

Overflow

For the sake of argument, suppose now that we're working on a 4-bit computer (such as the Intel 4004, which was state-of-the-art in 1971). Suppose that we want to add 9 = 10012 and 5 = 01012. (Notice that I've padded 1012 out to four bits, since I'm on a 4-bit computer.) Here's the answer:

1001
+0101

1101

No problem; 11012 = 14 is the correct answer. But now suppose that we want to add 9 and 12 = 11002:

1001
+1100

10101

The answer 21 = 101012 cannot be stored in four bits; the answer has overflowed the word size of the computer. When this happens, the answer is truncated down to the word size. In our example, 101012 is truncated down to 01012.

Overflow is always a problem when doing numerical calculations on computers, but you and I don't run into it very often in our daily lives. This is because we typically use 64- or 32-bit computers. A 32-bit computer overflows only when the numbers exceed 232 - 1 = 4294967295, and we rarely do calculations with such large numbers. A 64-bit computer overflows even less often — only when a number exceeds 264 - 1 = 18446744073709551615.

But what do we do, if we need to perform a calculation with extremely large numbers, that don't fit into our computer's word size? We have to adopt some more complicated system, such as representing large numbers as lists of word-size numbers and writing functions to perform arithmetic on large numbers represented as such lists. Python comes with this functionality built-in, and uses it automatically as it's needed. For example, there's no way I should be able to compute 2100 on my 64-bit computer, but Python automatically handles it for me:

>>> 2**21
2097152
>>> 2**100
1267650600228229401496703205376

Disclaimer: The preceding discussion of word size has oversimplified several points and omitted many other points. However, for an introductory computer science that uses the Python programming language it's roughly what you need to know. If we were using the C programming language, for example, then we would have to go into memory issues in much greater detail. Essentially, Carleton teaches CS 111 using Python because Python lets us do interesting things without worrying about a lot of low-level issues.

Negative Numbers: Two's Complement

How can we represent negative numbers in a computer? Well, our Arabic numeral system represents negative numbers by preceding them with a negation sign "-"; the negation of 89 is -89. On paper, we can use similar notation for negative numbers in binary: the negation of 10100012 is -10100012. But deep inside a digital computer there isn't any symbol "-"; there are just bits. So in order to represent negative numbers in a computer, we need to represent "-" using bits somehow. Here's an idea. Let "+" be represented by the bit 0 and "-" by the bit 1. Then we can represent +89 and -89 as 010100012 and 110100012. This is called the sign bit system, because it reserves the first bit in a numeral for recording the sign.

The sign bit system is not crazy — one could build a computer this way — but nobody does, because there is a better way, called two's complement. In two's complement, numerals representing positive numbers (and 0) begin with 0, and numerals representing negative numbers begin with 1 — just as in the sign bit system — but the rest of the bits are organized very differently. The details depend on how many bits are in play — that is, on the word size of the computer. For the moment, suppose that we're working on a 4-bit computer.

To understand two's complement, start counting from 0 in binary. You can do this by starting at 00002 and repeatedly adding 00012, or by counting in decimal and converting decimal numerals to binary:

00002 = 0,
00012 = 1,
00102 = 2,
00112 = 3,
01002 = 4,
01012 = 5,
01102 = 6,
01112 = 7,
10002 = 8,
10012 = 9,
10102 = 10,
10112 = 11,
11002 = 12,
11012 = 13,
11102 = 14,
11112 = 15,
00002 = 0.

What happened at the end? We added 00012 to 11112. The answer overflowed and truncated back to 00002. We can view this as an error or as an opportunity. The opportunistic view is that 00002 is the number after 11112, so 11112 is the number before 00002. That is, 11112 is the number before 0. So 11112 must be -1. We just pulled negative numbers out of the air.

Continuing the idea, 11102 must be -2, because it is the number before -1, and 11012 must be -3, and so on. In fact, we can count back all the way to 00012 and decree that it is -15, or to 00002 and decree that it is -16. But standard practice is to push the negative numbers back only halfway through the sequence, so that all of the numerals starting with 1 represent negative numbers and all of the numerals starting with 0 represent nonnegative numbers.

To clarify: There are actually two ways to encode numbers into 4-bit words. The first way regards the binary numerals from 00002 to 11112 as encoding the numbers from 0 to 15; it is unable to represent any negative numbers. The second way regards the binary numerals from 00002 to 01112 as encoding the numbers from 0 to 7 and the binary numerals from 10002 to 11112 as encoding the numbers from -8 to -1. These two systems, called unsigned and two's complement, are summarized for 4-bit words in the following table.

BinaryUnsigned DecimalTwo's Complement Decimal
000000
000111
001022
001133
010044
010155
011066
011177
10008-8
10019-7
101010-6
101111-5
110012-4
110113-3
111014-2
111115-1
000000

Here is (some of) the corresponding table for 8-bit words:

BinaryUnsigned DecimalTwo's Complement Decimal
0000000000
0000000111
0000001022
0000001133
.........
01111100124124
01111101125125
01111110126126
01111111127127
10000000128-128
10000001129-127
10000010130-126
10000011131-125
.........
11111100252-4
11111101253-3
11111110254-2
11111111255-1
0000000000

Negation and Subtraction in Two's Complement

There is a simple procedure for computing the negation of a numeral:

  1. Flip all of the bits; that is, change all 0s to 1s and all 1s to 0s.
  2. Add 1 to the resulting number with truncation.

For example, with an 8-bit word size, 89 = 010110012, and we can compute -89 as

-010110012=101001102 + 12
=101001112.

You can check this by adding 010110012 and 101001112; you get the 9-bit answer 1000000002, which after truncation is the 8-bit numeral 000000002. This makes sense, because the sum of a number and its negation must be 0.

Subtraction is easy, once we have negation. To compute a - b, you simply compute -b and then add the result to a with truncation. This works because of the algebraic rule a + -b = a - b. For example,

010001112 - 010100012=010001112 + -010100012
=010001112 + 101011112
=111101102.

Suppose now that we want to convert 111101102 to decimal. First we notice that 111101102 is negative, because the first bit is 1. If we negate it, we get

-111101102=000010012 + 12
=000010102,

which is 10. Therefore 111101102 = -10.

By the way, the rule a + -b = a - b holds in two's complement only as long as no overflow occurs. If overflow occurs, then certain incorrect results show up, instead of the correct result. Also, this fundamental algebraic rule does not hold in the sign bit system at all; this is a major reason why the sign bit system is not used.

Conversion Between Binary and Hexadecimal (and Decimal)

Hexadecimal numerals are written not in base 10 or base 2 but in base 16. They require 16 different digits; to write in hexadecimal we need six more digits than we're used to. By convention, the six extra digits are A, B, C, D, E, F. The following table summarizes the conversion among binary, decimal, and hexadecimal for the numbers from 0 to 15.

BinaryDecimalHexadecimal
000000
000111
001022
001133
010044
010155
011066
011177
100088
100199
101010A
101111B
110012C
110113D
111014E
111115F

Conversion from binary to hexadecimal is really easy: You just convert each block of four bits to the corresponding hexadecimal digit. For example,

10110010000011112 = 1011 0010 0000 11112 = B20F16.

Similarly, to convert from hexadecimal to binary, you just expand each hexadecimal digit as four bits:

EB6A16 = 1110 1011 0110 10102 = 11101011011010102.

Converting between base-2 and base-16 numerals is a lot easier than converting between base-2 and base-10 numerals. The reason is that 16 is a power of 2 — 16 = 24, so four base-2 digits correspond perfectly to one base-16 digit — while 10 is not a power of 2, so there is no simple correspondence between decimal and binary digits.

There are two common approaches for converting base-10 numerals to base-16 numerals. First, you could expand the base-10 number as a weighted sum of powers of 16, and then read off the hexadecimal digits. For example,

1402=5 * 256 + 7 * 16 + 10 * 1
=5 * 162 + 7 * 161 + 10 * 160
=57A16.

Second, you could convert base-10 to base-2, and then base-2 to base-16. For example,

1402=101011110102
=0101 0111 10102
=57A16.

I usually follow the second approach, because I'm better at powers of 2 than powers of 16, and the extra step of converting binary to hexadecimal is not much work.

Floating-Point Numbers

Thus far we have talked exclusively about integers; we don't know anything about representing other real numbers such as 3.14159. In order to understand these numbers, consider this base-10 example:

3.14159=3 + 0.1 + 0.04 + 0.001 + 0.0005 + 0.00009
=3 * 100 + 1 * 10-1 + 4 * 10-2 + 1 * 10-3 + 5 * 10-4 + 9 * 10-5.

The digits after the decimal point are simply coefficients on negative powers of 10. Now you can probably guess how to compute the binary expansion of 3.14159; you just need to use negative powers of 2. Here's a table of the first few.

n-1-2-3-4-5-6-7-8-9-10...
2n0.50.250.1250.06250.031250.0156250.00781250.003906250.0019531250.0009765625...
10n0.10.010.0010.00010.000010.0000010.00000010.000000010.0000000010.0000000001...

Unfortunately, the binary expansion of 3.14159 requires infinitely many bits; I'll just show the first ten fractional bits.

3.14159=2 + 1 + 0.125 + 0.0078125 + ...
=1 * 21 + 1 * 20 + 0 * 2-1 + 0 * 2-2 + 1 * 2-3 + 0 * 2-4 + 0 * 2-5 + 1 * 2-6 + 0 * 2-7 + 0 * 2-8 + 0 * 2-9 + 0 * 2-10 + ...
=11.0010010000...2.

Now we know how to write non-integer numbers as binary numerals. But how do we store these numerals as bits in a computer? The problem is that there is no "." inside the computer; there are just bits. To overcome this obstacle we use scientific notation. In scientific notation, a number is represented in three parts:

For example, the numeral -73220.5 is written in base-10 scientific notation as -7.32205 * 104. Here the sign is "-", the significand is "7.32205", and the exponent is "4". Similarly, the base-2 numeral 1011.012 is written in base-2 scientific notation as +1.011012 * 23. For another example, 0.01112 is written as +1.112 * 2-2. (The number 0 must be treated as a special case in scientific notation, because it has no nonzero digits at all.)

Many (but not all) computers in use today store floating-point numbers according to one of two binary formats, that are defined by the IEEE 754 standard. The single-precision format uses 32 bits:

The double-precision format uses 64 bits:

For example,

5.8125=4 + 1 + 0.5 + 0.25 + 0.0625
=1 * 22 + 0 + 21 + 1 * 20 + 1 * 2-1 + 1 * 2-2 + 0 * 2-3 + 1 * 2-4
=101.11012
=+1.0111012 * 22.

In single-precision floating point, this is stored as the 32-bit numeral

0 10000001 01110100000000000000000

The first bit indicates +, the next eight bits are the unsigned integer 129 (which is 2 plus 127), and the last 23 bits store the fractional part 0111012 of the significand (with a bunch of trailing 0s). If you'd like to experiment with the floating-point representations of other numbers, try this automatic converter.

The IEEE 754 standard goes on to define how arithmetic operations are performed on numbers written on these formats. Because rounding is inevitable and can have serious (and subtle) effects on the results of calculations, there is extensive discussion in the standard of how and when to perform rounding in floating-point calculations. We will not cover the topic any more in this course.

Application: 24-Bit RGB Color

There are infinitely many different colors in the world, but experimental studies of human vision have determined that the human eye is sensitive to only about 10 million of them. The eye is not able to detect the difference between two minutely different shades of lime green, for example. Now consider how this might affect the design of a computer's graphics system. 10 million is less than 224, so 24 bits should be able to encode all of the different colors visible to the human eye.

As we have discussed in class, computers commonly represent colors as combinations of three primary colors, and the most prevalent primary system is Red Green Blue (RGB). So we want to store colors in 24 bits and we want to store colors in RGB format. The obvious solution to use eight bits for each channel R, G, and B. The unsigned integers that can be represented in eight bits are all of the integers between 0 and 255. So we end up measuring R, G, and B each on a scale of 0 to 255.

Recall that web pages are written in a language called HTML. In HTML, one can specify various colors on a web page: the background color of the page, the color of the text, the color of a particular feature on the page, etc. These colors are specified in HTML as 24-bit RGB colors, and they are written in hexadecimal rather than binary. That is, each channel is specified using two hexadecimal digits, for a total of six hexadecimal digits.

For an example, go to Wikipedia and view the page source. (In your web browser, this feature is probably in a menu called something like "View".) Search through this HTML for the word "background". You'll see an RGB color in hexadecimal. If you scroll down a little from there, you'll see more RGB colors. Try to figure out what each one represents on the page, by determining what color it is (e.g., pale green) and finding objects of that color on the displayed web page.

Storing RGB colors as 24-bit blocks doesn't align very well with the 32-bit word size of a typical computer. In fact, on many systems a 24-bit RGB color is actually stored as a 32-bit word. The eight extra bits essentially constitute a fourth channel of information, which is often called the alpha channel. An RGB color with an attached alpha channel is called an RGBA color. Sometimes the alpha channel is simply ignored; the eight bits are wasted. When it is not ignored, the alpha channel is frequently used to encode opacity. That is, an alpha value of 0 represents a completely transparent color, while an alpha value of 255 represents a completely opaque color. Software can use this extra opacity information to create interesting effects, such as the blending of one image into another.

In 3D computer graphics systems, it is common to measure each channel R, G, and B on a floating-point scale, from 0.0 to 1.0. How much memory is used if these floating-point numbers are single-precision? If they're double-precision? Either way, it's much more memory than the 24 bits that are theoretically sufficient. So why do 3D graphics systems use floating-point? It's because they do a great deal of computation with colors. For example, the final color of a 3D-rendered pixel may depend on the color of the object being rendered, the color of the virtual light in the scene, the angle at which the light is striking the surface of the object, and the angle at which the virtual camera is viewing the surface. These complicated calculations are more quickly and conveniently done in floating-point. In short, 3D graphics systems choose their RGB representation to favor time-efficiency over space-efficiency.