Among my second tier interests are murder mysteries, detective stories, and cryptography. The first typically includes the second, but there are many detective stories that don’t involve murder. Two of my favorite detectives, Spenser (by Robert B. Parker) and V.I. Warshawski (by Sara Paretsky), often have cases not involving murder.
The third interest I listed, cryptography, doesn’t usually coincide with the first two, but it did play a role in a recent locked-room murder mystery involving the delightful amateur detective Lord Peter Wimsey (by Dorothy L. Sayers). While I’ve always enjoyed secret codes, I’d never heard of the cipher Sayers used — the Playfair cipher.
It dates back to 1854, and is kind of cool, so I thought I’d share it.
The novel in question is Have His Carcase (1932). If you like murder mysteries and have never read Sayers’ Lord Peter stories, I highly recommend them. They’re delightful early twentieth century slices of British life, and Lord Peter is a hoot. (Check out the links to both the character and the author for details.)
The story is a little unusual for a locked-room mystery, since no actual room is involved. The body is discovered lying on a large rock on a beach, but the circumstances make it seem that it must be suicide (in which the dead man slit his own throat with a straight-edge razor — ew!).
Of course, it’s really a murder, and those same circumstances make it seem impossible for the murderer to have gotten away unseen. (There’s a clue not discovered until much later that puts a whole new spin on things.)
It’s various other clues in plain sight that convince Lord Peter that it must be murder (and, of course, he’s right).
Which is all neither here nor there.
The important thing today is that one of the clues is a coded message found on the body. The novel devotes several pages tracing how Lord Peter and Harriet Vane (Lord Peter’s love interest and eventual wife) decipher the message.
It’s a pretty good example of how cryptographers go about cracking codes.
The name, Playfair cipher, is due to Lord Playfair (1818–1898), who promoted its use.
The way it works is pretty clever, and I’m tempted to have a go at writing some Python code to automate it. [Update 10/27/2019: Did that.]
Without automation, it’s rather a bit of manual effort, both to encrypt and to decrypt.
The Wiki article goes into great detail, so I’ll just touch on the basics here.
We start with an empty 5×5 grid (which we’re going to fill in with letters).
Then we pick a word or short phrase that, ideally, has no repeated letters — what is known as an isogram. For instance, “importance,” “playgrounds,” or “incomputable” — for fun, we’ll use “cornflakes.”
The key isn’t required to be an isogram. If letters are repeated, the later occurrences are just ignored. For instance, “wordpress blog” would become “wordpesblg.”
[As an aside, a puzzle for word fans is trying to find am isogram sentence that uses all the letters of the alphabet. A popular one is, “Mr. Jock, TV quiz PhD, bags few lynx.”]
The next step is to fill in the grid with the keyword (or phrase).
Our chosen keyword, “cornflakes,” happens to fill the first two rows, but there is no requirement that it does or doesn’t (it shouldn’t be too short).
The main requirement, of course, is that both parties know the word.
To improve security, a new word should be used for each message. In the novel, the victim had marked isogram words in a dictionary to create a stockpile of keywords.
The characters used a protocol in which each encrypted message started with the word to be used in the next message. Both sides could share a list of words to be used, but this creates a security issue if the list is found. Each side selecting their own is more secure.
(Not that this is a secure code. Any cryptographer can crack it fairly easily.)
The final step in creating the key is to fill in the remainder of the square with the unused letters of the alphabet.
Since the 26 letters of the English alphabet won’t fit in a 5×5 grid, the usual trick is to include ‘I’ and ‘J’ in the same square.
A variation is to not use ‘Q’ or, in some cases, ‘J’.
I like having the full alphabet, so I’ve used the ‘IJ’ combo.
Regardless if the variation used, as you’ll see, it does require some interpolation when decoding the message.
And, obviously, both sides need to agree on the variation as well as how to interpolate any ambiguities.
As with a lot of codes, the encoding and decoding process is similar (but not identical).
Let’s encode the phrase:
“Subscribe to my WordPress blog!”
The first thing is that, as should be clear from the key grid, there is no difference between upper and lower case, and there is no punctuation or spaces (although there is a trick to encode a space if necessary).
OR COMMA ONE CAN USE THE TELEGRAPH PROTOCOL STOP
The first step in encoding (or decoding) a message is to break it into pairs of letters, so, given everything so far, our message becomes:
SU BS CR IB ET OM YW OR DP RE SS BL OG
The basic trick is to take each pair of letters, find them on the grid, and create a rectangle using the letters as opposing corners.
Figure 3 shows the rectangle for the first two letters, SU, but sometimes doing that doesn’t result in a rectangle that includes both multiple rows and multiple columns. There is a protocol to handle that, which I’ll come back to in a moment.
For now, let’s consider the second pair of letters, BS, as shown in figure 4.
These show the sort of multi-row, multi-column rectangle we’ll get from most pairs of letters.
We take the encoded pair of letters from the opposing corners, and the rule is that the letters come from the same row, so BS becomes IL (or we could use JL — it’ll decode the same).
Now we can return to the first pair, and there are two rules that apply, respectively, to letters in the same row or the same column.
¶ If they are in the same row, take the letters immediately to the right (wrap around to the first column if the letter is in the last column).
¶ If they are in the same column (as in figure 3), take the letters immediately below (wrap around to the top row if the the letter is in the bottom row).
Therefore, our SU becomes IZ (or JZ).
Notice that when decoding, these two rules work the opposite way: The coded JZ would take the letters immediately above (wrapping to the bottom if necessary). Likewise coded letters in the same row take the letters immediately to their left.
Now that you know these rules, it should be apparent that the third pair, CR, becomes ON.
So, so far, our encoded message is: IZ JL ON
There is one more rule to know. You’ll notice that our pairs of plaintext letters include the double-letter SS, which doesn’t define a row or column, let alone a multi-row, multi-column rectangle.
Here what we must do is insert a ‘Q’ (or ‘X’) to break up the pair. This turns the original plaintext pairs to:
SU BS CR IB ET OM YW OR DP RE SQ SB LO GQ
And since there are an odd number of letters, we add a ‘Q’ (or ‘X’) to the end to even things out. (Adding a ‘Q’ can also be used to insert a space or break up words when necessary.)
And that’s about all there is to it. Applying these rules to the original message gives us the ciphertext:
IZ JL ON BD HY CP ZX RN PW NK KU LI AC QX
(Assuming I didn’t screw up the encryption.)
The final step is to dress it up a bit by forming “words” and even throwing in some spurious punctuation:
I ZJLON, BDHY CPZXR NPWNK: KULIACQX!
The recipient ignores the spaces and punctuation to re-create the two-letter ciphertext, which allows decoding to the plaintext.
Then it requires a bit of disambiguation to restore the original words. It’s generally obvious what’s required. It often helps to remove all the spaces and then figure out where the word-breaks should be.
(As mentioned above, the letter ‘Q’ can be inserted to break up confusing word combination. When a ‘Q’ appears between matching letters, it’s removed.)
What makes this polygraphic substitution cipher a little better than an ordinary substitution cipher (aka “decoder ring”) is that individual letters have different substitutions, depending on the other letter of the pair. (But identical pairs always result in the same two-letters!)
As Lord Peter and Harriet demonstrate, this code is still fairly easily cracked.
I’ll leave you with this message (using “cornflakes”) to decode if you want to try your hand at this:
GFS: KKMEV DNTAT VUB, R TEHAF PEPY,
HFLFE OHLEPY. TAHLPENE OESSL HFYTS KABKU!
Stay mysterious, my friends!