A while ago, I got very paranoid about my computer being bugged, and so developed a series of paper and pencil ciphers that I published on this blog. After letting them sit for a while, my favorites from the bunch are FibonaRNG, a stream cipher seeded from three words which has formed the basis for the Human encryption mode in PassLok, and Snake, which achieves something close to the absolute impregnability of the one time pad by using a simple piece of text. In this article, I go through additional calculations related to Snake, and find that this cipher appears to be even better than previously thought.
First, a bit of theory. We’re going to see how close to perfect “random” is the output of Snake. Perfect randomness is achieved when it is impossible to predict which letter comes next even if you know all the letters that come before it. This implies two things, at least:
- All letters appear with the same frequency.
- Any letter is as likely as any other to come after a certain letter, not just immediately after, but with some distance between them as well.
The independence property defined in the second point must also hold for groups of three, four, and more letters, but we’ll stop at two to avoid making things too complicated. A relatively easy way to measure two-letter independence is through the two-parameter chi-squared statistic of two letters separated by a given distance, defined by this function. When the value drops below 671, which is typical of truly random text with 26 different letters, we can assume that statistical independence has been reached. Applied to English text (I used a corpus of Dickens’s novels), this happens for letters separated by 10 positions or more. This is likely to be different for other languages, especially if they use a lot of compound letter groups such as the English ch, sh, th, and qu, though probably not by a lot.
The same test applied to the output of FibonaRNG or Snake (using 3 pieces of keytext), yields statistical independence for letters separated one (contiguous) or more spaces, so long as the pieces of keytext are themselves independent of each other (easy to achieve if the plaintext is longer than 10 characters). So the question remains whether letter frequency is uniform.
As expected, the result is not very uniform for sums of two, but it gets progressively better as you increase the number of letters involved. When you reach four letters, which is what TripleText and Snake with a triple keytext use (three for the keytext, plus the plaintext), you get this distribution of frequencies:
Observe that all the frequencies are already quite close to 1/26 = 0.03846, which is what you’d get if all the letters came up with exactly the same probability. As you add more pieces of text, the frequencies get closer and closer to this number, but they never become quite exactly that because they are still the combination of frequencies that are not equally probable, no matter how many texts are involved. In any case, just three pieces of keytext plus plaintext get so close to the distribution of random letters, that it is impossible in practice to say it isn’t random.
If the ciphertext were thousands of characters long, it might be possible to work out the statistics for each letter A to Z and discover the telltale frequency distribution resulting from four independent pieces of English added together (shown above), for whatever that might be worth because that by itself gives no clue as to what was the source of the keytext. But it is unlikely that anyone will use paper and pencil encryption for more than a couple hundred characters, in which case the small sample size will make it impossible to see these statistics (the differences in frequency shown above are typically less than one in a thousand). The same would happen with a truly random text, which might end up showing more deviation from the ideal 1/26 frequency than an encrypted text.
So, say that I’m a spy, and I wonder whether Snake is a good substitute for a true one-time pad. I’d say it is, but I still have to be careful about not making the source for the keytext easy to guess. If I use a widely available book, like the Bible, then the enemy might guess it. If I use a very rare source, say, the book of my award-winning poems (assuming it exists), then someone could sneak into my house, make a record of all the books in my library, and figure out that this might be the one. The best solution is likely to be in-between: neither a must-have book, nor something so obscure that its mere presence gives it away. Maybe some recent best-seller, or a textbook, or an old classic.
So now I don’t have one-time pads lying around that might give me away and, better yet, it was very cheap to replace them. Soviet one-time pads used to require a small army of women typing more or less randomly on a small army of typewriters. Had they had access to the Snake cipher, they might have placed those daughters of the Motherland in more interesting spying jobs. We can only guess what night have happened.