Last week, when I posted about the Mathematical Universe Hypothesis (MUH), I noted that it has the same problem as the Block Universe Hypothesis (BUH): It needs to account for its apparent out-of-the-box complexity. In his book, Tegmark raises the issue, but doesn’t put it to bed.
He invokes the notion of Kolmogorov complexity, which, in a very general sense, is like comparing things based on the size of their ZIP file. It’s essentially a measure of the size of information content. Unfortunately, his examples raised my eyebrows a little.
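The ZIP-file analogy is easy to see in action. Here's a quick sketch using Python's `zlib` (the byte counts are illustrative, not Tegmark's):

```python
import random
import zlib

# A highly regular 16 KB string compresses to almost nothing...
regular = b'01' * 8_192

# ...while 16 KB of random bytes barely compresses at all.
rand = bytes(random.getrandbits(8) for _ in range(16_384))

print(len(zlib.compress(regular)))  # a few dozen bytes
print(len(zlib.compress(rand)))     # roughly the full 16 KB (plus overhead)
```

Regular data has low information content, so its "ZIP file" is tiny; random data is incompressible, so its "ZIP file" is about as big as the data itself.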
Today I thought I’d explore why. (Turns out I’m glad I did.)
Honestly, I think Tegmark’s example is fine for those not familiar with mathematical complexity, but unfortunately it takes a swing at a detail and completely whiffs it. I doubt it matters — it’s way-down-in-the-weeds stuff — but it connects to something I’ve been laying the groundwork to talk about: whether real numbers are real.
His overall point is fine. I just think he winged something he should have thought about more. The bottom line is that two things that are supposed to be quite different are a bit closer than Tegmark imagined.
He starts with two images that look very much like these. Both images consist of 128×128 grids of black and white “pixels” (making 16,384 pixels in all). Can you tell which one is random?
Here’s what makes it worse: Any test we apply shows that both patterns have apparently random occurrences of black and white pixels. Measure them up, down, left, right; there is no regularity to find.
But one of them is genuinely random (within the ability of my computer to be random, which is a whole other topic), and the other, in which the bits do indeed occur randomly, comes from a specific sequence of bits.
One of them is the binary bits of the number: √2.
Can you tell which? (Yes, I have given you a big clue. (Rhymes with clue. Also ‘R’ stands for…))
The point of this is that, if we want to create a computer program that produces that specific random image (rather than a random image), the only way to do so is to store all 16,384 bits in the program and then, very simply, spit them out.
This means the size of the total program is large — at least 16,384 bits (2,048 bytes) — but the process involved, just spitting out stored data, is simple and short. The total size is not much more than the stored data size.
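As a toy sketch of that extreme (hypothetical; the random bits here stand in for the specific 16,384-bit pattern the program would store):

```python
import random

# The "data" end of the spectrum: the program IS mostly its stored bits.
# (Random bits stand in for the specific 16,384-bit pattern.)
stored_bits = [random.getrandbits(1) for _ in range(16_384)]

def spit_out(bits):
    # The code part is trivial: just emit what was stored.
    return ''.join(str(b) for b in bits)

image = spit_out(stored_bits)
```

Nearly all of the program's size is the `stored_bits` data; the code that emits it is a couple of lines.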
The other extreme is a program that has little data and lots of code. This obviously still results in a large program overall. (Most complex programs, such as Word or Excel, fall on this side of things. A program like Google Earth is an exception; its database of Earth data is likely larger than its code, and most map programs are similar exceptions.)
On the other hand, it’s possible for a program to be small but take lots of time to run. This doesn’t count against its size, which is all we’re talking about here.
The Kolmogorov complexity of something is a measure of the size of the smallest program that can create it.
One would think that highly complex objects require highly complex programs — long programs — to create them.
This turns out to not always be the case. The Mandelbrot set, one of my favorite mathematical objects ever, is an awesomely complex (and stunningly beautiful) object that arises from a very simple program:
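The core of it is the standard escape-time iteration, which might be sketched like this (my reconstruction, not the post's original snippet):

```python
def mandelbrot(c, max_iter=100):
    """Count iterations of z -> z*z + c before |z| escapes past 2."""
    z = 0
    for n in range(max_iter):
        if abs(z) > 2:
            return n        # escaped: c is outside the set
        z = z*z + c
    return max_iter         # never escaped: c is (likely) in the set
```

Coloring each pixel by its escape count for the corresponding complex point c is what produces the familiar images.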
There’s a little more to it than that, but that’s the core. The basics of the amazing Mandelbrot are in that simple bit of code. From that we get images like the one at the top of this post.
Which brings us back to Tegmark and his images.
Here’s a more colorful version (same basic idea, but color-coding the decimal digits of the square root of two versus a random string of digits). He wants to make the point that the random image takes at least 16K-bits, whereas he opines it takes only 100 bits to generate the square root of two. That’s the part that threw a flag on the play. One hundred bits seems way too small; it’s only 12.5 bytes.
In fact, my gut sense was that the code size probably wasn’t terribly different from the data size of the random example. After looking into it… it’s close. It kinda depends on what we include.
In many programming languages we can write something like:

sqrt(2)

Which is short (only 7 bytes) and does return the value we seek. In most languages it’s a library function we have to import, not a native capability. There are languages, though, ones that specialize in math, in which the function is native.
Spitting out the random bits doesn’t use these more advanced functions, and even in languages where they are native, they are sophisticated pieces of code. If we do include them, they’ll make our code rather large.
There’s a deeper problem here. Only in the most specialized languages does sqrt(2) give us what we need.
The problem is the bits — there are way too few of them. We need 16,384 of them — 2048 bytes worth. (Which is also our ceiling. If the square root code gets near that, it’s game over.)
Most systems use IEEE floating point, either single precision or double precision (often called double in programming languages). These give us a measly 24 or 53 bits, nowhere near enough. Not even close.
The biggest version, octuple precision, has 237 bits, but I don’t know any system that uses it, and it’s still chump change for what we need. This is one of those places where we’re gonna have to roll our own.
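A quick check of what a stock double buys us (Python floats are IEEE doubles):

```python
import math
import sys

# math.sqrt returns an IEEE double: 53 significand bits, good for
# about 16 decimal digits. Nowhere near the 16,384 bits we need.
x = math.sqrt(2)
print(x)                        # 1.4142135623730951
print(sys.float_info.mant_dig)  # 53
```

Sixteen decimal digits, and that's the end of the road for the built-in type.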
So how to calculate the square root of two?
Let’s start by figuring out exactly what we need to end up with. We know we need 16K-bits. How many digits of the number are we even talking here?
We find that with: digits = bits × log10(2)
So: 16,384 × 0.30103 ≈ 4,932
We need to calculate to over 4900 digits!
That’s some serious precision; none of the usual algorithms will work.
There is a definition of the value that might be usable. It’s the recursive (continued) fraction:

√2 = 1 + 1/(2 + 1/(2 + 1/(2 + …)))
It’s simple enough; if we could code that, we might obtain a precise enough value for the number. What we get is a really large rational fraction that approximates the value. If we can get the approximation close enough, we’ll have what we need.
As it turns out, it’s pretty easy to code (albeit with a caveat):
function sqrt_tail (depth, level):
    if (depth <= level): return (1, 2)    // innermost term: 1/2
    (p, q) = sqrt_tail(depth, level+1)
    return (q, (2*q)+p)                   // 1/(2 + p/q) = q/(2q+p)

function sqrt_of_two (depth):
    (p, q) = sqrt_tail(depth, 1)
    return (p+q, q)                       // 1 + p/q = (p+q)/q

// Get value from 17000 levels deep!
sqrt2 = sqrt_of_two(17000)
It’s not important to understand this code. Suffice to say it treats (p, q) pairs as rational numbers (p/q). It implements the recursive fraction above. The sqrt_tail function calls itself until it reaches the requested depth. Then it returns, starting with 1/2, and improves the accuracy each time it returns through a level.
The size of the fraction increases in each level. This creates a very large final number — hundreds of digits in both p and q. That’s not too hard to deal with. What’s more of an issue is how many levels of recursion are necessary. The code above calls for 17,000 levels, which I suspect is required to get the 4900+ digits of precision.
I say suspect because Python barfs after 900+ levels, and at that point the accuracy is only 2,287 of the 16,384 bits we need. The fraction has grown to 345 decimal digits by then (in both numerator and denominator).
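If the barf is Python's default recursion limit (roughly 1,000 frames, which matches a failure after 900+ levels), one workaround, assuming the machine has stack to spare, is simply raising it:

```python
import sys

# Python's default recursion limit is about 1,000 frames, consistent
# with a failure after 900+ levels of recursion. Raise it to let the
# recursion reach the full 17,000 levels.
sys.setrecursionlimit(20_000)
```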
It’s possible to rewrite the code so it isn’t recursive, but then it will be larger. As you can see, it’s already above the 12 bytes Tegmark imagined, and we still have to turn the big fraction into bits.
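An iterative rewrite might look like this (my sketch; it builds the continued fraction from the inside out as a (p, q) pair):

```python
def sqrt_of_two(depth):
    """Approximate sqrt(2) as a rational p/q via its continued fraction."""
    p, q = 1, 2                # innermost term: 1/2
    for _ in range(depth - 1):
        p, q = q, 2*q + p      # wrap another level: 1/(2 + p/q) = q/(2q+p)
    return p + q, q            # leading term: 1 + p/q = (p+q)/q
```

Each level adds roughly 2.5 bits of precision (consistent with 2,287 bits at 900 levels), and Python's big integers keep the fraction exact the whole way.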
We’ll do that with an integer division algorithm:
function divider (p, q, precision):
    if (q == 0): error('Divide by zero!')
    quotient = [0]                        // start with the integer bit
    while (0 < p):
        if (q <= p):
            quotient[-1] += 1
            p -= q
        else:
            if (precision < len(quotient)): break
            p *= 2                        // we're working in base 2
            quotient.append(0)
    return (quotient, p)
This function does a division to a specified level of precision. It returns a list of that many bits (the quotient) along with any remainder. Essentially we’re just subtracting q from p very much as in long division (except we don’t use multiples of q).
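Rendered as working Python (my version of the pseudocode; note it assumes p/q < 2, which holds for our square root of two):

```python
def divider(p, q, precision):
    """Long-divide p/q into a list of binary digits plus a remainder."""
    if q == 0:
        raise ZeroDivisionError('Divide by zero!')
    quotient = [0]                    # start with the integer bit
    while p > 0:
        if q <= p:
            quotient[-1] += 1         # found another unit of q in p
            p -= q
        else:
            if precision < len(quotient):
                break                 # enough bits; p holds the remainder
            p *= 2                    # we're working in base 2
            quotient.append(0)
    return quotient, p
```

For example, 3/2 comes out as the bits [1, 1] with no remainder, i.e. binary 1.1 = 1.5.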
It’s not important to understand these algorithms. What’s important is that, combined, they are a lot more than 12 bytes.
I have a working version in Python. It’s got a few extras, like statements to display results and some timing instrumentation. The program is just under 900 bytes (7200 bits).
Which is certainly not 16K-bits, and Tegmark’s key point is that, while a truly random pattern pretty much requires just storing the pattern and spitting it out, apparently random patterns may have a much smaller Kolmogorov complexity — a much smaller program might create them.
That said, we are sanding off some corners here. Our code assumes the system can handle arbitrarily large integers. Python can because it has algorithms to do so. Including those makes our program larger.
We arguably should include them. As with the
sqrt() function, handling arbitrarily large numbers is special — not something most systems supply natively.
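Python's native support is easy to take for granted (a quick illustration; the machinery behind it lives in the interpreter, not in our 900 bytes):

```python
# Python integers are arbitrary precision out of the box.
big = 2 ** 16_384          # a 16,385-bit integer, no special library needed
print(big.bit_length())    # 16385
```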
To do this right we’d need to create a Turing Machine to generate the desired bits. Kolmogorov complexity really is in reference to the smallest TM that emits the desired pattern. That’s the only way to compare apples to apples.
The lesson here, I think, is that generating something that looks random still requires code size, even if it doesn’t require data size. One way or another, apparent randomness demands code or data to generate it.
In contrast, the Mandelbrot images are rich with repeating patterns. Despite their greater resulting complexity, I think the regularity of those patterns reduces the code size.
This ran long, so I have to leave the real number connections for another time. Very briefly: the MUH and BUH require extremely complex structures, which contradicts the view that the Big Bang had zero, or extremely low, entropy. There seems to be a problem with where all the information came from.
As for why I’m glad I got into this, I’ve been putting off implementing a division algorithm for an arbitrary precision number class I’ve been working on. Division is the trickier of the standard four operations (+, −, ×, ÷), and I thought coding division for base-256 arbitrary precision was gonna be a challenge.
But this all made me realize (duh) I can just treat it as integer division and use the slow algorithm. Better than nothing!
Stay squared away, my friends!