Nov 28, 2020

At some point in our lives, we’ve all come across the idea that computers “speak” binary — a cryptic wall of 0s and 1s that somehow builds up to everything we see and do in our digital lives. Even as you read these words, the device you are using is somehow manipulating 0s and 1s to make it possible.

As a software engineer, and more generally as a curious person, I want to bridge the gap between the polished interfaces that we interact with on a daily basis and the underlying mechanisms at work to enable them. This is a broad overview of how the elements we encounter as we navigate operating systems, apps, games, etc. could be distilled down to 0s and 1s, without getting too granular.

This raises the obvious question: what *is* binary? We know that it’s 0s and 1s, but how exactly can these two digits even begin to hold the complexity we encounter in our daily lives? The answer lies in number bases. The base of a counting system is the number of distinct digits it uses to represent values. It’s easy to take our standard system of counting for granted, or even to assume that it’s the only one, but that’s far from the truth. The conventional system of counting that uses the digits 0–9 is called base ten, while binary is called base two.

Take a random number like 423. In our conventional counting system, the digit 3 is in the ones place, the digit 2 is in the tens place, and the digit 4 is in the hundreds place. Each place contributes its place value times the digit in that place. For example, since we have the digit 4 in the hundreds place, we are saying “4 times 100.” Each place is a power of ten: 1, 10, 100, 1,000, etc. The same logic applies to binary, but we only have two digits, 0 and 1. When counting in binary, each place is a power of two (1, 2, 4, 8, 16, etc.).
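The place-value idea can be sketched in a few lines of Python. This is only an illustration: the helper names `place_values` and `total` are made up for this example and are not part of any library.

```python
def place_values(digits, base):
    """Pair each digit with the value of the place it sits in."""
    pairs = []
    # Walk the digits from right to left: ones place first.
    for position, char in enumerate(reversed(digits)):
        pairs.append((int(char, base), base ** position))
    return list(reversed(pairs))


def total(digits, base):
    """Add up digit * place value for every place."""
    return sum(digit * place for digit, place in place_values(digits, base))


print(place_values("423", 10))     # [(4, 100), (2, 10), (3, 1)]
print(total("423", 10))            # 423
print([2 ** n for n in range(5)])  # [1, 2, 4, 8, 16]
print(total("101", 2))             # 5, because 1*4 + 0*2 + 1*1
```

The same function handles both bases because the only thing that changes is which number gets raised to a power.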

Each place is a power of two rather than a power of ten. If we wanted to represent one, it would still be 1: we have a 1 in the ones place, which is telling us “one digit with the value of one.”

Two in binary would be 10; we have a 1 in the twos place, telling us “one digit with the value of two.”

Three would be written as 11: we have a 1 in the twos place plus a 1 in the ones place, telling us “one digit with the value of two plus one digit with the value of one.”

Four would be 100 because we only need a 1 in the fours place to say “one digit with the value of four”.
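One way to reproduce the examples above is repeated division by two, collecting the remainders. The function name `to_binary` is just a label chosen for this sketch; Python's built-in `format(n, "b")` does the same job.

```python
def to_binary(n):
    """Convert a non-negative integer to a string of binary digits."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # remainder is the next digit, right to left
        n //= 2                  # move on to the next place
    return "".join(reversed(bits))


for n in range(1, 5):
    print(n, to_binary(n))
# 1 1
# 2 10
# 3 11
# 4 100

# Leading zeros do not change the value, so 011 and 11 both mean three.
print(int("011", 2) == int("11", 2))  # True
```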

If this isn’t making total sense, that’s ok. There are endless resources detailing how to count in binary, but the main takeaway here is that with just these two digits we have the same potential for complexity that we do in base ten or any other base. It’s just a different way of representing values that we happen to be less familiar with.

Now that we have some understanding of what binary is, and how that wall of 0s and 1s could possibly hold meaning, why would we choose binary for our computers, anyway? Why don’t they just represent information in base ten, or any other base? It’s because of the *binary* nature of binary. A digit in binary can only be one of two things: a 0 or a 1, on or off, true or false. Because of this, a binary digit, or bit, is kind of like a switch that can be on or off. We can think of these bits as corresponding to transistors in the CPU of your computer (or phone, tablet, etc.). In modern CPUs, transistors are microscopic and there are billions of them. For the purposes of this article, we can think of a transistor as a switch that can be off or on, corresponding to a bit being 0 or 1. When many bits are grouped together (8 bits make a byte), the pattern of switches that are on and off represents a number in binary, and that is where complexity and meaning begin to emerge.
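To make the switch analogy concrete, here is a small sketch that treats a byte as eight on/off switches and reads off the number they represent. The list of switch states and the name `switches_to_int` are invented for illustration; real hardware is far more involved.

```python
# Eight switches, leftmost is the highest place (128), rightmost is the ones place.
switches = [False, True, False, False, False, False, False, True]


def switches_to_int(states):
    """Read a row of on/off switches as a binary number."""
    value = 0
    for is_on in states:
        # Shift everything one place to the left, then add the new bit.
        value = value * 2 + (1 if is_on else 0)
    return value


print(switches_to_int(switches))  # 65, i.e. 64 + 1
print(2 ** 8)                     # 256 different patterns fit in one byte
```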

So we have 0s and 1s, they’re bits, they represent numbers (much larger than 0 and 1), and they are a representation of the hardware on our devices. What does that have to do with my experience of this cat gif? (And what about the cat’s digital experience?!)

## Text and Binary

The final missing part of the puzzle is that everything we do on our devices can be (and is!) represented in numbers. Text, images, video, sound, the pixels you are looking at, are all being read by your device as numbers. One way English characters can be converted to numbers is by using the American Standard Code for Information Interchange, or ASCII. In this form of encoding, every character corresponds to a number, which of course can be represented in binary.
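As a rough sketch of that mapping, Python can show the ASCII number behind each character and the eight bits that store it. The helper names below are hypothetical and exist only for this example.

```python
def text_to_ascii_bits(text):
    """Return one 8-bit string per character of an ASCII-only text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


def ascii_bits_to_text(bit_strings):
    """Reverse the process: turn 8-bit strings back into text."""
    return bytes(int(bits, 2) for bits in bit_strings).decode("ascii")


print(ord("A"))                  # 65
print(text_to_ascii_bits("Hi"))  # ['01001000', '01101001']
print(ascii_bits_to_text(["01001000", "01101001"]))  # Hi
```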

However, ASCII only covers 128 characters, essentially English letters, digits and punctuation, so if you’re using a different language, or even an emoji, your message may be encoded using Unicode. Every time you feel sassy 💅 or sad 😢, or send text through one of the countless methods available, you are actually transmitting numbers using one of the possible encoding methods like ASCII or Unicode.
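The same experiment works for an emoji. The sketch below assumes the text is stored as UTF-8, which is one common Unicode encoding (the article does not say which one a given app uses), and `describe` is a name made up for this example.

```python
def describe(char):
    """Return a character's Unicode code point and its UTF-8 bytes as bits."""
    code_point = ord(char)
    utf8_bits = [format(byte, "08b") for byte in char.encode("utf-8")]
    return code_point, utf8_bits


sad = "\U0001F622"  # the crying-face emoji from the text

code_point, bits = describe(sad)
print(hex(code_point))  # 0x1f622
print(bits)             # ['11110000', '10011111', '10011000', '10100010']

# A plain English letter still fits in a single byte.
print(describe("A"))    # (65, ['01000001'])
```

Under this encoding an emoji takes four bytes while an English letter takes one, but both end up as 0s and 1s.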

## Sound and Binary

We can take the idea of using numbers to represent characters and apply it to sound as well. Audio encoding transforms sound into a digital format by recording sound waves and breaking them down into small segments or “samples.” Each sample is then measured and converted into a digital value, so the recording becomes a series of numbers. These numbers can then be represented in binary. When you play an audio file, your device reads this binary data and reconstructs the sound waves for you to hear. This is a very simple explanation of audio encoding, but the core concept is the same: all kinds of data, including audio data, can be distilled down to binary and manipulated or transmitted by our digital systems.
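Here is a minimal sketch of sampling, using a pure tone generated with `math.sin` in place of a real recording. The sample rate, the 440 Hz frequency, and the 16-bit sample size are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000     # samples per second (assumed)
FREQUENCY = 440.0      # pitch of the tone in Hz (assumed)
MAX_AMPLITUDE = 32767  # largest value of a signed 16-bit sample


def sample_tone(num_samples):
    """Measure the wave at evenly spaced moments and round to integers."""
    samples = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE                            # time of this sample
        level = math.sin(2 * math.pi * FREQUENCY * t)  # between -1.0 and 1.0
        samples.append(round(level * MAX_AMPLITUDE))
    return samples


def to_bits(sample):
    """Show a signed 16-bit sample as 16 binary digits (two's complement)."""
    return format(sample & 0xFFFF, "016b")


for sample in sample_tone(4):
    print(sample, to_bits(sample))
```

Each printed line is one measurement of the wave next to the 16 bits that would be stored for it.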

## Color and Binary

But what about images, and the overall visual experience of interfacing with a device? As you may already know, the screens we look at are made up of pixels, each of which combines three colors: red, green and blue. By mixing these three colors we make up the color gamut of the display. Most modern displays use 24-bit color, meaning 8 bits for each of the three colors. That is a detail we don’t need to focus on, but it is what allows us to see 16,777,216 color variations. Below is a picture of all of those colors. Note that the color gamut of screens is a subset of the visible light spectrum.

Each individual pixel has three channels (red, green and blue), and each channel has a value that says how much of that color should be used. In this way, we can represent colors as numbers, which, of course, can be represented in binary. Below is a small sample of some colors and their RGB encodings. Note that each color has three values, one for each channel.
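A short sketch of that encoding, assuming 24-bit color with 8 bits (values 0 to 255) per channel. The function name `rgb_to_bits` and the sample colors are chosen for illustration only.

```python
def rgb_to_bits(red, green, blue):
    """Show a color as three 8-bit channels: red, green, blue."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return " ".join(format(channel, "08b") for channel in (red, green, blue))


print(2 ** 24)                     # 16777216 possible colors with 24 bits
print(rgb_to_bits(255, 0, 0))      # 11111111 00000000 00000000  (pure red)
print(rgb_to_bits(255, 165, 0))    # 11111111 10100101 00000000  (an orange)
print(rgb_to_bits(255, 255, 255))  # 11111111 11111111 11111111  (white)
```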

Just as colors, and therefore images, can be represented as numbers, so can videos and gifs. They both have a frame rate: they are a sequence of images flashed before our eyes at a speed that looks like movement. Each frame is a still image that is itself made up of pixels, each pixel has a color that can be represented numerically, and the software on our devices fetches these numbers from places like YouTube where they are hosted. Every visual you see is being displayed to you using pixels that receive signals about what color they should be, perhaps to form a letter or an emoji, a selfie, or a vlog.
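To get a feel for how many numbers that is, the sketch below models a frame as a grid of RGB pixels and counts the bytes in a short uncompressed clip. The resolution, frame rate, and 3 bytes per pixel are assumed figures for illustration, and real video files are compressed, so they are much smaller than this raw count.

```python
def make_frame(width, height, color):
    """A frame is a grid of pixels; each pixel is a (red, green, blue) tuple."""
    return [[color for _ in range(width)] for _ in range(height)]


def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Bytes needed to store every pixel of every frame, uncompressed."""
    frames = fps * seconds
    return width * height * bytes_per_pixel * frames


# A tiny 2x2 all-red frame, just to show the structure.
print(make_frame(2, 2, (255, 0, 0)))
# [[(255, 0, 0), (255, 0, 0)], [(255, 0, 0), (255, 0, 0)]]

# One second of 1920x1080 video at 30 frames per second.
print(raw_video_bytes(1920, 1080, 30, 1))  # 186624000
```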

Social media posts, image searches, videos and songs are all transmitted to us because deep down, at the machine code level, our devices are computing 0s and 1s that hold the values for the pixels that will flash before our eyes. That work is dictated by software that was itself compiled down to machine code, which provides instructions on what to do with all the information we input or request. At the end of the day, the countless complex things we do on a daily basis and take for granted are, at their core, being performed by a multitude of tiny switches that can only turn on and off. From the simplest of components, the complex virtual worlds we live in are formed.
