Chaos from Order, Take 2

A while ago, I got very paranoid about my computer being bugged, and so developed a series of paper and pencil ciphers that I published on this blog. After letting them sit for a while, my favorites from the bunch are FibonaRNG, a stream cipher seeded from three words which has formed the basis for the Human encryption mode in PassLok, and Snake, which achieves something close to the absolute impregnability of the one time pad by using a simple piece of text. In this article, I go through additional calculations related to Snake, and find that this cipher appears to be even better than previously thought.

First, a bit of theory. We’re going to see how close to perfect “random” is the output of Snake. Perfect randomness is achieved when it is impossible to predict which letter comes next even if you know all the letters that come before it. This implies two things, at least:

1. All letters appear with the same frequency.
2. Any letter is as likely as any other to come after a certain letter, not just immediately after, but with some distance between them as well.

The independence property defined in the second point must also hold for groups of three, four, and more letters, but we’ll stop at two to avoid making things too complicated. A relatively easy way to measure two-letter independence is through the two-parameter chi-squared statistic of two letters separated by a given distance, defined by this function. When the value drops below 671, which is typical of truly random text with 26 different letters, we can assume that statistical independence has been reached. Applied to English text (I used a corpus of Dickens’s novels), this happens for letters separated by 10 positions or more. This is likely to be different for other languages, especially if they use a lot of compound letter groups such as the English ch, sh, th, and qu, though probably not by a lot.

The same test applied to the output of FibonaRNG or Snake (using 3 pieces of keytext), yields statistical independence for letters separated one (contiguous) or more spaces, so long as the pieces of keytext are themselves independent of each other (easy to achieve if the plaintext is longer than 10 characters). So the question remains whether letter frequency is uniform.

This is no problem for FibonaRNG which, as a lagged Fibonacci generator (LFG), has been widely studied and confirmed to be uniform except in rare, easy to detect degenerate cases. So I made a new JavaScript program, available here, to compute this for Snake, TripleText, and their derivatives. When you add or subtract letters using a Tabula Recta, the probability of each resulting letter is the sum of the probabilities of all combinations of two letters that yield the same result. The probability of each combination is the product of the probabilities of each letter involved, provided the letters are independent of each other (which they are, if they are separated by 10 or more spaces in the same English text). The program starts with the measured frequency of each letter in English, as published in Wikipedia, and applies the formula just described to sums of 1 to 6 letters. It also applies it to “snake” operations on a Tabula Recta where additions and subtractions alternate.

As expected, the result is not very uniform for sums of two, but it gets progressively better as you increase the number of letters involved. When you reach four letters, which is what TripleText and Snake with a triple keytext use (three for the keytext, plus the plaintext), you get this distribution of frequencies:

ABCDEFGHIJKLMNOPQRSTUVWXYZ