Last post I wrote about a simple substitution cipher Robert J. Sawyer used in his 2012 science fiction political thriller, Triggers. This post I’m writing about a completely different cool thing from a different book by Sawyer, The Terminal Experiment. Published in 1995, it’s one of his earlier novels. It won both a Nebula and a Hugo.
I described the story when I posted about Sawyer, and I’ll let that suffice. As with the previous post, this post isn’t about the plot or theme of the novel. It’s about a single thing mentioned in the book — something that made me think, “Oh! That would be fun to try!”
It’s about a very simple simulation of evolution using random mutations and a “most fit” filter to select a desired final result.
The secret code in Triggers turns out to be a Chekhov — something that initially appears trivial and in passing but which later plays an important role in the plot. The string evolution in The Terminal Experiment is genuinely trivial and in passing. It caught my eye because it sounded like it would be fun and interesting to code. (And it was.)
Although it plays no active role in the plot, the explanation of the string evolution idea helps the reader appreciate the fish evolution simulation created by one of the characters. One of the scanned brain AIs central to the story notices the fish simulation and uses it as the basis for a much more sophisticated experiment.
What caught my eye was the idea of evolving strings. As described in the book, it can happen surprisingly quickly, and it sounded like a fun thing to see (and pretty simple to code). True on both counts.
To begin: strings. Not the kind that tie packages, get attached, make music, beautify lawns, or are cosmic, super, silly, or cheese. I mean the kind found in computers — strings of characters. They can be any length from zero (an empty string) to whatever the computer allows.
So, a (computer) string is any text from nothing (“”) to as much as allowed in the circumstances (for instance, all the text of this post).
The game of String Evolution starts with a target string (a word, phrase, quote, or whatever) and then uses generations of offspring mutation to evolve a same-length random starting string into the target string.
For example, given the target string ‘The Terminal Experiment, Robert J. Forward, 2012’, a random starting string required only 76 generations to evolve into the target string.
It did it using an effective mutation protocol (which I’ll get to in a minute).
For now, suffice to say a less effective protocol requires just over 800 generations, best case, and hundreds more, worst case. (All the protocols here eventually do evolve a start string to the target given a few caveats regarding configuration parameters.)
Below is a listing of every other string from the random starting point to the target (the full list would have all 76 strings):
iVdLHY9IAKiAz0GCIvUxVbMGNwptuhjCxTdF6carcHoo9uXi iVdLHY9IAKiAB0GCIvUxVbM9NwptuhjCxTdF6carcHoo9uXi iVdLHY9IAKiABGGCIvUxVbM9NwptuhjCx,dF6carcHoo9uXi iVdLHY9IAKiABGxCIvUxVbM9NwptuhjCI,dF6carcHoo9uXi iVdLHY9IAKiABGxpIvUxVbM9NwptuhjCI,dF6carcHo 9uXi iVdLHY9IiKiiBGxpIvUxVbM9NwptuhjCI,dF6carcHo 9uXi iVdLHY9IiKiiBGxpIvUxVbs9NRptuhjCI,dF6carcHo 9uXi iVdLHY9miKiiBGxpIvUxVbs9NRptuhjCI,dF6carcHo 9u2i iVdLHY9mioiiBGxpIvUxVbs9NRptuhjCI, F6carcHo 9u2i iVdLHY9mioiiBGxpIvUxVbs9NRptuhjCI, F6cwrcH9 9u2i iVdLHY9mioiiBGxpfvUxVbs9NRptuhjCI, F6cwrcc9 9u2i iVdLHY9mioiiBGxpfvUxVbs9NRptuhjCI, Focwrcc9 9u20 iVdLHYqmioiiBGxpfvUxVbs9NRpauhjCI, Focwrcc9 9u20 iVdLHYqmioiiBGxpfvUxVbs9NRpadhjCI, Focwbcc9 9u20 UVdLHYqmioiiBGxpfvUxVbs9NRpadhjCI, Forwbcc9 9u20 UVdLTYqmioiiBGxpfvixVbs9NRpadhjCI, Forwbcc9 9u20 UVdLTYqmioiiBGxpfvixVbs9 RpadhjCI, Forwbrc9 9u20 UidLTYqmioiiBGxpfvixVns9 RpadhjCI, Forwbrc9 9u20 UidATYqmioiiBGxpfvimVns9 RpadhjCI, Forwbrc9 9u20 UidATYqmioaiBGxpfvimVns9 RpadqjCI, Forwbrc9 9u20 UidATYqmioaiBGxpfvimVns9 RpadqtCI, Forwbrc9 2u20 UidATdqmioaiBGxpfvimens9 RpadqtCI, Forwbrc9 2u20 UidATdqmioaiBGxpfrimens9 RpadqtCI, Forwbrc9 2220 UidATdqmioal Gxpfrimens9 RpadqtCI, Forwbrc9 2220 UidATdqmioal Gxpfrimens9 RpaeqtCI, Forwbrc9 2222 UidATdqmioal Dxpfrimens9 Rpaeqt I, Forwbrc9 2222 UidATdqmioal Dxpfrimens9 Rpaeqt I, Forwbrd9 2122 UidATeqmioal Dxperimens9 Rpaeqt I, Forwbrd9 2122 Uid Teqmioal Dxperimens9 Rpaeqt I. Forwbrd9 2122 Uid Teqmioal Dxperimens9 Rpaeqt J. Forwbrd9 2112 Uid Termioal Dxperimens9 Rpaert J. Forwbrd9 2112 Uid Termioal Dxperiment9 Rpbert J. Forwbrd9 2112 Uid Termioal Experiment9 Rpbert J. Forwbrd, 2112 Uid Termioal Experiment, Robert J. Forwbrd, 2112 Tid Termioal Experiment, Robert J. Forward, 2112 Thd Termioal Experiment, Robert J. Forward, 2012 Thd Termioal Experiment, Robert J. Forward, 2012 The Termioal Experiment, Robert J. Forward, 2012 The Terminal Experiment, Robert J. Forward, 2012
It’s surprising how quickly even long strings evolve to their targets. At least initially. The process typically has a long tail. As the current pattern gets closer to the target, progress slows.
Consider the difference between Figure 1, above, and Figure 3, below:
Both show examples of the same evolution configuration, but Figure 1 ended the process at 200 generations (with the strings not fully evolved). In Figure 3 the strings evolved until they reached the target string.
Note how (at least with this mutation mode), a large fraction of progress occurs early, but reaching an exact match takes a huge fraction of the time. That turns out to depend on the mutation “depth”. Progress can be essentially linear if the depth is one (or very low).
The basic process is pretty simple. Given some target string as input, generate a string of random characters the same length as the target. Call that the current pattern. We want to evolve it into the target.
Do so with the following loop:
- Use the current pattern to create a new generation of “child” patterns, each with a random mutation. Include a copy of the current pattern as an unmutated offspring.
- Test each child pattern to determine its “distance” from the target. This distance is a numeric value that depends on how different two strings are. A distance of zero indicates the strings match exactly.
- From among the children, pick the child with the lowest distance value (the one most similar to the target string). Note that if all the mutations are more distant then their parent, the unmutated offspring is closest and is selected (and no change occurs that generation).
- If a child has zero distance from the target, it’s a match, and we’re done. Otherwise repeat the loop with the selected child as the new current pattern.
We can parameterize the loop in several ways. A crucial one is how we mutate the pattern string into a new string. Another is how many children occur each generation. We should also have a max-generations setting in case the loop never matches the target and wants to go forever!
In String Evolution, mutating a string of characters involves altering one or more of the characters in a string.
The first mutation parameter is how many characters to alter. I was surprised to find that having more than one mutation per offspring actually reduces evolutionary progress. Upon reflection, it’s apparent why. Multiple mutations make a string jump a larger distance. A pattern selected as the closest can produce offspring that are more distant to the target than the parent. In some cases, the search can dither, jumping closer and farther, back and forth, running forever.
Bottom line, a single mutation per string works best by far.
The more critical mutation parameter is the mutation depth. This controls how much the character being changed can change. With a depth of 1, the character can only change to the previous or next character. For instance, an ‘S’ can only change to an ‘R’ (-1) or a ‘T’ (+1). On the other hand, if the mutation depth is 5, an ‘S’ can change to any of: ‘N’, ‘O’, ‘P’, ‘Q’, ‘R’, ‘T’, ‘U’, ‘V’, ‘W’, ‘X’ (-5 to +5, respectively).
Alternately, the character may change to any other legal character. (As if the depth was infinite.)
In my code, this led to three distinct mutation modes: (1) Random character; (2) Depth forced to ±1; (3) Depth specified by depth parameter.
As illustrated in Figure 4, mutation mode 2 results in linear evolution. (Because the pattern can only evolve per a ±1 change to a single character.)
Mutation mode 3 offers a spectrum between mode 2 (if depth=1) and mode 1 (if depth is set to a large value). The former makes mode 3 exactly like mode 2. The latter differs in that mode 1 assumes a flat distribution — the changed character can be any character with equal probability. Mode 3 with a large depth setting implies a distribution centered on the original character. (That distribution can even be biased rather than flat.)
Figure 6 below shows the effect of the mode 3 mutation depth parameter:
Note the poor performance at depth=1. It’s what we’d expect to see for mode 2, around 2500 generations. This drops rapidly to a sweet spot from around 7 to 15 or so (only 500 generations) and then begins to rise. How high turns out to depend on how the offspring setting.
I believe the “cliff” just before 100 is from alphabet wraparound during mutation.
One obvious parameter is how many offspring occur each generation. So long as enough offspring occur (which depends on target length), the pattern evolves towards the target. Depending on the mutation mode, the offspring setting can sometimes improve (or diminish) evolution progress.
The size of the target string obviously has a strong effect on how long it takes (how many generations it takes) for the pattern to evolve to the target. Figure 2 and Figure 5 show how target string size correlates with how many generations it takes to match the target. Figure 7 below shows something similar:
Note that mutation mode 2 is generally slower than the 1 and 3 modes.
Combined with varying the offspring size, the mode 3 depth setting allows tweaking for very good performance. Even so, mode 1 is generally competitive.
But in Figure 8 note how the same six string segments take about ten times as long using mode 2 linear evolution.
One thing about the way this works is that each generation produces a set of children, each of them mutated. A copy of the parent is included as an unmutated child.
In nature, most offspring are not mutated, and progress is much slower. It can involve millions or billions of generations. Using a group of mutated children (each with a separate mutation) massively speeds things up. Selecting the offspring with the best score (closest pattern) and discarding the rest amplifies and distills the progress.
Therefore, it’s not so surprising we can evolve from:
To the well-known Frank Zappa quote:
You can’t be a real country unless you have a beer and an airline. It helps if you have some kind of a football team, or some nuclear weapons, but at the very least you need a beer.
In only 1178 generations (mode 3, depth 15, offspring per generation: 50). The original random string had a distance of 6303 from the target string.
I’ve had a lot of fun exploring this and poking around in how it all works. For those interested, I expect to post the code on my computer programming blog. RSN
Stay evolving, my friends! Go forth and spread beauty and light.
 A string with a single character is sometimes referred to as just a character. In some programming languages there is no difference, but in others a one-character string and a character are different data types. In such cases, generally the string can have zero, one, or many characters, whereas the character always has the value of some single character (it’s never zero or many).