Back in 1918, John F. Byrne invented an encryption machine, which he called Chaocipher. He tried unsuccessfully to sell it to the US government until his death in 1960 while keeping it a secret. He published some samples of its output in his memoirs, mystifying a whole generation of cryptanalysts. Then, in 2010 his son’s widow decided to release the secret papers describing the inner workings of the machine. It turned out to consist of two rotors with movable letters, which shifted according to a simple pattern. The key was the initial position of the letters in both rotors. Simple and surprisingly effective, although it is somewhat doubtful that Byrne ever built a working machine (the only working prototype was allegedly destroyed (?), and only a cardboard mockup and a blueprint of the original have survived). I ran into the concept a couple weeks ago and I haven’t been able to stop thinking on how to improve it, and I believe I’ve found something as powerful and quite a bit simpler to use. I call it the Scrabble cipher because you can run it with the help of letter tiles.
Before I introduce the Scrabble cipher, let me start citing the main reason why I think Chaocipher runs so well: unlike ciphers based on a straight alphabet that gets shifted around based on entropy collected from a pseudo-random number generator or the plaintext itself (as in the Autokey cipher, and those in the Serpentacci family), the Chaocipher alphabets get internally jumbled, so all permutations are possible. In Chaocipher, the entropy added by each new plaintext letter keeps the alphabets from repeating.
But Chaocipher is more complicated than it needs to be, based on what it does. For instance:
- Why disks with moveable letters? It is easier to straighten out the alphabets and shift the letters around. Easily done with letter tiles placed in a row.
- Also, the motion of the disks is such that they end up shifting by just one position relative to each other, before the letter swap step is taken. It would be a lot easier to shift a single alphabet placed in a row by simply moving the first tile to the last position, or vice-versa.
- In Chaocipher, the letter swaps are made at fixed positions relative to the last letter enciphered, for each alphabet. But the initial and final positions don’t seem to have been optimized. Perhaps we can do that if we apply some statistical tools running on modern computers, which Byrne did not have access to.
- Why two mixed alphabets, one for the plaintext and one for the ciphertext? A substitution can be defined equally well with a single mixed alphabet placed against a straight alphabet.
- Are two letter swaps really needed? Maybe just one will do, or some sort of combination?
- And then, it turns out that the Chaocipher is easily vulnerable to a known plaintext attack. Perhaps there is a way to fix this.
- The final positions of the letters swapped are variable.
- You can decide whether or not to do the rotation (cut, if you are using tiles placed in a row) for the plaintext or ciphertext alphabets.
- A transposition step, under a separate key, has been added in order to combat the known plaintext attack.
- A number of statistical tests are applied to the output. Most sensitive among them are the Chi-squared test of randomness applied to single letters, and Chi-squared test of independence applied to contiguous letter pairs (no reason to believe there is stronger dependence between letters further apart).
Playing with this while encrypting large pieces of Dickens’s “Oliver Twist” and doing some reading on Chaocipher I discovered the following:
- Encryptions done on the same plaintext with two different ciphertext keys are related to each other by a simple substitution. This means that one could use a straight alphabet on the ciphertext wheel, and then perform a substitution on the Chaocipher output and get the same result as if the substitution key had been used as the initial ciphertext alphabet in Chaocipher.
- It is not the same with the plaintext key, so that the output produced from the same plaintext and ciphertext key but different plaintext key is different, but then there is another problem: if you decrypt a ciphertext with the correct ciphertext key but the wrong plaintext key the result, which is related to the correct plaintext through a simple substitution, has the typical statistical properties of plaintext, which makes it possible to mount an attack to recover the ciphertext key without regard for the plaintext key. After this, the jumbled plaintext obtained this way can be attacked with the typical methods for simple substitutions. In other words, the plaintext key does not add any security at all and should not be considered when calculating the key space size.
- Take your key phrase and turn it into a mixed alphabet by doing the following: take each key and write the different letters of the alphabet in the order they appear in the key, if a letter has been used already, write instead the immediately preceding letter in the normal alphabet not yet chosen. If there are letters that did not appear in the key, write them now in reverse alphabetical order. If you are going to do a transposition, do the same with the transposition key, except that you don’t add the unused letters at the end. Then arrange a set of letter tiles to make the encryption alphabet above a ruler, and then place above it another set of tiles making a straight alphabet, as shown in the picture above. The top alphabet will be used for plaintext, the bottom one for ciphertext.
- Optionally but highly recommended, especially if you are not going to do a transposition, generate a set of twenty random letters and add them to the beginning of plaintext. This will provide protection against a known plaintext attack.
- For each letter of the plaintext thus modified, ignoring spaces and punctuation (number can be written out in words or encoded as letters), look it up in the top alphabet and write down as output the letter below it. Then take out those two tiles and swap the group with the two letters preceding them. Finally, take the first letter of the top alphabet, move it to the end, and shift that complete alphabet one step to the left so it lines up again with the bottom alphabet. Repeat step 2 until you have encrypted all the plaintext letters. The result is already a valid ciphertext.
- If you are doing a transposition, this is the time for it. Write the result of the previous step by rows under the (perhaps incomplete) transposition alphabet, then read it off by columns in alphabetical order of the letters in the transposition alphabet. This is the final ciphertext.
To decrypt, do step 1 as above so the generated alphabets are the same as for encryption, then do a reverse transposition using the appropriate alphabet, if a transposition was done for encryption. Then do step 3 except that you’ll be looking up ciphertext letters in the bottom alphabet and writing out as output the corresponding plaintext letters in the top alphabet. If the result starts with twenty gibberish characters, you can just ignore them.
Without transposition, the key space has 26! possibilities, which is equivalent to 88.7 binary bits. Not huge, but adequate for many situations. As I said earlier, scrambling the plaintext alphabet with an additional key does not increase security at all, so it’s better to start with a straight alphabet for the plaintext. Adding a transposition doubles that, for a relatively small increase in the total amount of work. This works because the output of the letter tile process is already indistinguishable from random, and so is the “plaintext” obtained with a wrong ciphertext alphabet, even if off by a single letter. Thus, it is not possible to tell when the correct transposition key has been used in a trial decryption, unless the ciphertext alphabet set with the tiles is correct as well. A second transposition under a different key adds another 88.7 bits, because successive transpositions do not combine into a simple transposition. Substitutions do combine, however, so that adding more substitutions on top of the one built into the ciphertext alphabet would not increase the key space, even if separated by transpositions.
In fact, you can obtain exactly the same ciphertext if you set a straight alphabet on the bottom row of the setup, and then apply the substitution represented by the key at the end of everything. The randomization of the ciphertext, therefore, is entirely due to the plaintext itself, not to the initial position of the ciphertext alphabet. The process works because common text contains some randomness (usually measured in “bits of entropy”), which is constantly being added to randomize the alphabets. English contains about 1.56 bits of entropy per letter, which is approximately one-third of what a perfectly random series of letters would contain. The process involving the tiles etc. randomizes the plaintext by itself while remaining reversible.
We have encountered a similar situation before. The Visionnaire cipher combines each plaintext letter with a previous one by subtraction using a Tabula Recta. The result, however, is less than perfectly random. It is remarkable that the Scrabble cipher manages to do so well, and this involving the entropy supplied by only one letter at a time instead of two. I think this is because it actually disturbs the relative order of the letters within the ciphertext alphabet, rather than simply shifting it around. The swap step is what does the trick.
The original Chaocipher is vulnerable to a known plaintext attack, but it is easy to defeat it simply by prepending a number of random characters to the plaintext prior to encryption, and this also applies to Scrabble. An attacker will only be able to obtain the ciphertext alphabet at the point where the proper plaintext begins, but this is not the key. To move one step backward he will have to guess the previous plaintext letter, which now is random so there is no way to get it except by guessing. The possibilities multiply as he goes further back, and they become larger than the number of possible keys when 26^n = 26!, where n is the number of gibberish letters. Solving this equation gives n = 18.8, so nineteen gibberish letters are enough. The spec for the Scrabble cipher is twenty, for a little added security.
Of course, adding a transposition after the main encryption has the same effect since then an attacker won’t be able to match the ciphertext letters to those in the known plaintext, so that in this case the gibberish letters at the beginning wouldn’t really add any security and are better skipped. But a transposition, although fast compared to the moving tile process, still would take more effort than simply adding a few extra letters to a long text.
The program allows you to swap two groups of letters different from those in the description above (besides allowing you to shift the top alphabet forward rather than backward, if you so choose), but this works well only if the distance between the groups to be swapped is an odd number other than 13. The reason is that 2 nd 13 are the factors that make up 26, the length of the alphabets. Letters separated by a multiple of 2 or 13 positions will never swap with letters not in those sets, leading to imperfect mixing of the alphabets. If the alphabet were to contain 27 letters, as is common in some Western languages, then the bad intervals would be all multiples of three.
Let me now address point 4 on the first list. Can we achieve decent security working with a single alphabet rather than two? It turns out we do, almost, and this is what I’m going to address next. In this case you write out the plaintext alphabet, which is fixed, right on the rules, and use tiles just for the ciphertext alphabet. Here’s a picture of the setup for what I call “Half Scrabble” cipher:
|Letter 1||Letter 2||Alphabet shift|
In all cases, one of the letters to be swapped is always the one just written, and it swaps with a letter next to it, whether on the right (1) or on the left (25). Sometimes this leads to the letter just written ending up at the same absolute position, so that a repeated plaintext letter will produce a repeated ciphertext letter. This artifact is undesirable because it leaks some information about the plaintext and could be used by an attacker, for instance, to mount a chosen plaintext attack. It would work this way: generate some chatter that includes some names containing double letters separated by known intervals. When the enemy uses those names in their transmission, it becomes easy to spot their location on the ciphertext because of the double letters, which allows the attacker to obtain some of the ciphertext alphabet in use at that point in the message. If this happens a few times, there may be enough to complete the alphabet, which will allow the decryption of the rest of the message from the point where the alphabet is complete.
Another quirk is that the output can be less than perfectly random even with optimized parameters, especially if the plaintext is long (over 20,000 letters). You can always check this if using the program, but obviously this cannot be done easily if encrypting by hand. But again, likely we’re not going to be encrypting by hand anything that long, so this might bot be much of a problem in practice.