Sorry about the title. This post is motivated by Steven’s comments to the “What is Randomness?” post, where he describes a way that the current paper-and-pencil cipher champion, FibonaRNG, could be broken. Rather than responding with more comments, I thought a whole new post on the issue would make more sense, since it’s going to be rather long. For those who prefer the short version: yes, what Steven says would work, but not very well, although it looks like it should. Read on if you prefer the long version.

Let me begin by inserting what Steven (didn’t get a last name) said on the other post, with minor edits:

While yes, the continuation of the calculation is dependent on the characters in between the two characters and the third generated character, that’s not what I meant by a repeated state. While a short periodic repeat is essentially death for a PRNG, having a deterministic outcome of one part of the stream based on previous parts of the stream is also death for a PRNG. Any repeated pair results in a deterministically repeated output, even if there is stuff between the pair and the output.

Because of the birthday paradox math, it actually wouldn’t take more than about 30 characters of keystream to find a repeated pair. In your example, the pair WD occurs twice. If the message were just three characters longer, there would be a predictable V right there in the keystream, and that’s without constructing a tabula recta. And if a message were say, 500 characters long, then enough information from the plaintext would leak into the ciphertext through this property that we could likely determine the seed length from the ciphertext alone (assuming we know the language of the message). I could attack it essentially like I would attack a broken Vigenère cipher.

So even if the keystream appeared to be good, with no periodic repeats, there is still the fundamental property that it is deterministic from previous outputs, regardless of your seed choice.

The way I interpret this is the following: The lagged Fibonacci generator (LNG) step combines two consecutive letters from the keystream to generate a third, which ends up located a distance after the first letter equal to the seed length. Now, the LNG passes independence tests for any two keystream letters separated by any distance, but will it pass independence tests for groups of *three* letters? It doesn’t seem like it should, since any pair of consecutive letters forces a particular third letter a little distance away due to the way the keystream is being generated, and therefore those three letters are not independent. In other words, it should not be true that P(x,y,z) = P(x)*P(y)*P(z) for certain combinations of x, y, and z, where the function P(something) denotes the probability of encountering that something in the keystream.

One way to start breaking the cipher, therefore, would be this:

- Run a battery of tests of independence for groups of three letters, the first two next to each other, the third some distance down from them, which you change from test to test.
- When that distance is equal to the seed length minus one, the test should fail (and possibly also for the multiples of the seed length, minus one), but likely not for any other distance. This will tell us what the seed length is.
- Now that we know the seed length, cryptanalysis has begun. Steven would do something similar to attacking the Vigenère cipher, which essentially consists on getting a histogram on each group containing all the letters separated by the seed distance; since in the Vigenère cipher those would have been encrypted with the same offset on a Tabula Recta, the original statistics of the language would still be present (as in “E” is most frequent, and so forth), which would quickly lead to the repeating key, and a complete solution. I don’t think this is going to work for FibonaRNG, where cipher letters separated by any regular distance do not use the same keystream letter, but I’m sure greater intelligences than mine will be able to find some prying leverage here.

In order to test this, I wrote a little JavaScript function that I can run as soon as the ciphertext is generated, using a variation of the FibonaRNG demo page that you can find here. This is the function:

function corrAtDistance3(array,base,distance1,distance2){ var length = array.length, highIndex = length - distance1 - distance2, result = 0, operand, expected, freqTable = new Array(base); for(var i = 0; i < base; i++){ freqTable[i] = new Array(base); for(var j = 0; j < base; j++) freqTable[i][j] = new Array(base).fill(0) } for(var k = 0; k < highIndex; k++){ //fill the table with data freqTable[array[k]][array[k + distance1]][array[k + distance1 + distance2]]++ } for(var i = 0; i < base; i++){ //each first character for(var j = 0; j < base; j++){ //each second character for(var k = 0; k < base; k++){ //each third character expected = freqArray[i] * freqArray[j] * freqArray[k]/length/length; //expected P(x,y,z) = P(x)*P(y)*P(z) if(expected > 0){ //in case a letter does not appear at all operand = freqTable[i][j][k] - expected; result += (operand * operand) / expected } } } } return result }

Where “array” is the keystream or whatever, containing numbers from 0 to 25, “base” is 26, “distance1” is going to be 1, “distance2” is going to be variable, and “freqArray” is the histogram containing single-letter frequencies, previously calculated by a separate function. The function performs a Chi-square test on the hypothesis that P(x,y,z) = P(x)*P(y)*P(z), where x, y, and z are items separated by the given distances within the array. With the parameters given, the test passes with 10% probability of error for a value of 18000.

And now some tests. The function works by making a 26x26x26 array to count the frequencies of letter triads. There are 26^3 = 17576 such triads, so in order to get decent statistics we need a large chunk of text, at least 10 times that value. I fed it the first 10 chapters of “Oliver Twist,” gutenberg.org version, which produces a keystream containing 115110 characters. Key material was: Key 1 = “marvelous”, Key 2 = “wonderful”, Key 3 = ‘N’ (not used), Seed = “awesome” (length = 7). Result: the function corrAtDistance3(stream,26,1,6) returns a value of 2897569, which is a solid fail of the test applied on the keystream, whereas the returned value is 17923 (pass) when the last parameter is 5, and 17918 (also pass) when the last parameter is 7. This is precisely as expected, and reveals the seed length quite well.

Ah, but his was if the test is applied to the keystream, which is not the same as the ciphertext. A reasonable expectation (I believe Steven would agree with this) is that such strong correlation within keystream triads would still remain in some way after the keystream is combined with the plaintext in order to produce the ciphertext, since the plaintext is itself far from random. Here are the numbers resulting when the same test is applied to the ciphertext: when the second distance parameter is 5 (seed length -2), the test returns 17455 (pass), when it is 6 it returns 18334 (fail, but not by much), when it is 7 it returns 17336 (pass). In other words, the correlation is still there, but it is barely detectable after the keystream combines with the plaintext. This was with a particularly “bad” initial random seed; in a majority of cases the result returned for all distances is below 18000. For shorter texts, the probability of a pass increases and it is all but assured for less than 65,000 characters (5 chapters of “Oliver Twist”). And who encrypts messages this long just by hand?

In other words: it should work in theory, but it doesn’t in practice because we won’t have the keystream even if we do have the plaintext. Separating the two is likely equivalent (got to check this) to finding the two permutations of 26 elements involved in combining plaintext and keystream, which is a harder problem than finding the seed length.

So perhaps it would have worked if there hadn’t been any substitutions (or possibly just the second, controlled by Key 1). It turns out, however, that we get the same results if we use straight alphabets on the Tabula Recta. It seems that the triad correlation is so weak by itself that just adding common text manages to dissolve it down to undetectable levels.

If we apply the same test to the plaintext, we obtain values for the first 10 chapters of Oliver Twist that are around 90000, decreasing slowly as the distance between the first two letters and the third increases, since the result is mostly controlled by the strong correlation between the first two letters. The plaintext fails the test of independence, as it should, but not as badly as the keystream, and it does not exhibit the same sharp peak of dependence when the distance between the first and third letters involved in the test is equal to the seed length. After this is combined with the keystream using the Tabula Recta, their very different patterns of dependence seem to fight, rather than reinforce, each other with the result that the ciphertext does not longer show a detectable dependence, at least for messages not inconceivably long.

Does this mean that FibonaRNG cannot be broken even by computer analysis? By no means, but certainly we must be cleverer next time.

That’s a nice analysis man! I totally agree on a lot of it! And yes, it would be very difficult get the information out of the cipher text, however, the math I would use would be a bit more complicated.

I would to a triad analysis on the expected language and use the data from that to combine with the triad analysis on the cipher text (frequency of “similar” letters being those distances from each other combined with value ranges, etc. techincal stuff). This is why I specified knowing the language would help the best.

However, this is a wonderful analysis! Great work man! I look forward to more!