Yesterday I introduced you to the idea of words as numbers. There are many ways to create a map between words and numbers. For example, we could assign them the number that represents their position in the dictionary. That would make words that start with “A” have smaller numbers while words that start with “Z” would have the largest numbers.
There are also ways to treat the words themselves as numbers. We can interpret the letters the same way we do digits. Each letter has an assigned numeric value, and then a string of letters—just like string of digits—forms a number. The scheme I showed you yesterday allows us to treat (only!) single words as numbers.
Now let’s extend this so that entire sentences—or even entire books—become numbers!
First, a review of the basics:
• Positional notation is a way of expressing numbers where the position of the each digit has a multiplication factor associated with it. To get the value of the number, multiple the digit in each position by the factor for that position, and then take the sum of those products.
• In our common numbering system, decimal, we use ten symbols (the digits “0” through “9”) and assign them values (“0″=0, “1”=1, “2”=2, etc.). The fact that we use ten symbols—the base or radix of our number scheme—means our multiplication factors go up by ten times for each position.
• We can do this with any set of symbols; for example, we can use the 26 letters of the alphabet. We can assign them the obvious values (“A”=1, “B”=2, “C”=3, etc.). As there are 26 of them, our multiplication factors go up by 26 times for each position.
It’s important to understand that the word is the number. The L26 scheme gives us just one way to assign a value to that number. There are other ways we can do that, and they will give us different values for each word. The key is that the word, the string of letters, is the actual number.
Each word is distinct, so each word value is distinct within a particular scheme. That the values differ from scheme to scheme isn’t important. (It just means we need to agree on a particular scheme.)
Yesterday, I cheated a little in order to simply things. Clearing up the cheat leads us directly to our next step in going beyond just words, so let me explain:
In the decimal system we have ten symbols, the digits “0” through “9”. Since there are ten symbols, we work in factors of ten. In the L26 system, there are 26 symbols (“A” through “Z”), so we work in factors of 26. The cheat was how I assigned the numeric values to the symbols. I used a “natural” mapping where “A”=1, “B”=2, and so on up to—one might assume—”Z”=26.
In base 10, with ten symbols, there is no symbol with the numeric value of ten! Remember that the multiplier for the second digit position will always be the base of the system. In base 10, it’s ten. In base 2, it’s two. In base 26, it’s 26! (This means that, in any numbering system, the string “10” always has the same value as the number base. In decimal, “10” is 10. In binary, “10” is 2; in octal it’s 8; and in hexadecimal it’s 16.
This all means we’re not allowed to have a symbol with a numeric value of 26 in L26 (just like we don’t have a symbol with the value of ten in decimal). In the natural numbering for the alphabet, “Z” would have the value 26. And that would be wrong!
What I didn’t explain yesterday is that, in L26, “Z” would have to have the value of zero (not 26). For one thing, in a set of number symbols, one of them has to be the zero! In L26, “Z” would be our zero (since I started with “A”=1). Having zero at the end may seem odd, but notice how “0” comes after “9” on your keyboard. And zero starts with “Z”, so it kinda works out.
You might wonder what the L26 equivalent of “10” is; how do we express the value 26 in L26? There is neither “1” nor “0” in L26, so we would write “AZ” for 26 (for reasons you should be able to figure out by now).
But what if we didn’t want to treat “Z” as zero, but as the 26th symbol? That means two important things: Firstly, we need something else to play the role of “zero.” Secondly, we end up with 27 symbols (due to our new “zero”), so now our base is 27!
An obvious choice for our new “zero” is the space character. A space is the lack of a letter, so it’s a natural candidate. And there’s a bonus; including the space allows us to treat a string of words as a number. Now we can treat, for example, “Wyrd Smythe” as a single (surprisingly large) number!
The list of L26 position multiplication factors I gave you yesterday shows how big the values get. In L27, they’re a little bit larger. Compare them to this list:
And now let’s see what kind of number we get for “Wyrd Smythe” (an 11-symbol number that happens to have a “zero” (space) in it; think of it as being similar to something like “89640759321” if it helps).
Add up the multiplication products to get: 4,717,452,194,045,806.
Hi! My name is 4,717,452,194,045,806, but you can just call me four-quadrillion.
You see how a small phrase amounts to a fairly large number. If I express my actual full name (including my middle name) in L27, I get:
I’ll leave decoding that back into text as an exercise for the reader! (Yes, I lied about there being no homework.)
Now we’re ready to make the final jumps that allow us to convert sentences, blog articles and full books into single numbers.
You may have noticed we’re using only UPPER case letters. In L27 (or L26) we use the alphabet kind of as an abstraction. We’d get the same result using only lower case letters. But an “A” is the same as an “a”, we can’t tell the difference. To do that, we need to include both the UPPER and lower case alphabet in our numeric symbol set. That immediately bumps us up to L53 (26 UPPER + 26 lower + 1 space).
Sentences can include numbers (an address, a phone number, a postal code), so we definitely should include the ten digits. Now we’re up to L63.
And the thing about sentences is they have punctuation. They have periods (stops) at the very least, and we really can’t do without commas and a handful of other punctuation symbols. We also need some way to indicate the end of a line (basically the hitting of the [Enter] or [Return] key). We might also like some of the special characters such as TAB.
So now our coding scheme is something like L80 or whatever we decide is enough. The next trick is deciding how to assign numeric values to our set of symbols. Basically, we’ll just count them off, but we need to decide on their order. Then their numeric value is just the position in the order.
The only thing is, the numbers start getting really huge now. Just consider just the first half-dozen factors for an L80 system:
In L80, short words will be big numbers, and longer phrases will be astronomical! In one possible L80 scheme, “Wyrd” evaluates to 16,778,760 (compare that to 421,620 in L26!), and “Wyrd Smythe” turns out to be 351,876,110,446,173,033,961. My full name ends up being a 47-digit number (the L27 version above has only 35 digits). That’s just a short phrase. Are you ready to get huge?
The first sentence of this blog article (“Yesterday I introduced…“), in the L80 scheme I’m using, has the 112-digit numeric value:
You can imagine how huge the number for the entire paragraph must be. I won’t show it to you; it has 655 digits! The number for the entire article would have tens of thousands of digits!
The bottom line is simply this: By using some sort of L-scheme, any text can be interpreted as a single (very, very large) number. We’re talking numbers that are really too huge to imagine. These numbers make the count of atoms in the universe look like small change. Keep in mind that, so far, we’re only up to an L80 system, which excludes non-English characters and a lot of punctuation. A universal system is going to require an L-big-big-big system, and that makes the text numbers truly gigantic (imagine a system in which the first two-“digit” number has a value of tens of thousands)!
Look at it this way: The canonical “long novel,” War and Peace, can be viewed as a number so huge, it took 1,225 pages to write down in L-whatever form. The number for War and Peace would take many thousands of pages to write.
Every book, every blog article, every piece of text can be viewed as a number!
Conversely, any number can be converted into text. Nearly all of them will form gibberish, of course. In the vast landscape of huge numbers, only a very tiny fraction are worth reading (or even capable of being read)!