What is randomness?

In this article, I go over the meaning of randomness and end up with yet another couple of ciphers, based on the Fibonacci series, which become the new champions for combining paper-and-pencil strength with simplicity.

One of the interesting points raised by the Scrabble cipher (a simplification of the classical Chaocipher where only the ciphertext alphabet is scrambled) is that its alphabet-mixing algorithm produces statistically random ciphertext even if no key is involved. It uses the plaintext’s own entropy to randomize it, and does it so efficiently that the entropy appears to increase, from an average value (in English) of 1.58 bits per letter to its maximum of 4.7 bits per letter (for a 26-letter alphabet), effectively a factor of three. The result appears to be quite random as far as statistics go, and yet the operation that produced it can be easily reversed, leading back to the lower-entropy plaintext. So where did all that entropy come from and where did it go? Isn’t this a violation of some rule of informational thermodynamics? And what about other things that we think of as random? Are they really so?

Consider a coin toss. If the coin is fair, everyone will consider the result random, meaning that they have no way to tell whether any toss will result in heads or tails, since either outcome will happen just about an equal number of times (but not necessarily exactly equal, because of the next condition) and any toss will not be affected at all by previous tosses (this is the only way we could ensure that heads and tails come out exactly the same number of times, so it won’t happen if the process is truly random). But if we just drop the coin flat from half an inch off the ground, the outcome is all but assured. So the “randomness” depends on how we toss the coin or, rather, on our ability to predict what the final position will be given the initial conditions of coin orientation, height above the ground, and initial velocity and spin of the coin. Usually we’ll begin to predict a wrong outcome about half the time way before Heisenberg’s uncertainty principle kicks in, and then we say the toss was fair. It is our inability to make an accurate prediction that makes the process “random,” but the coin is still subject to the same physical laws, and it is possible that a computer having more accurate numbers for those initial conditions might still be able to predict the outcome correctly more than half the time. They say that experienced croupiers have a pretty good idea of what number on the roulette wheel will be the winner as soon as they put the ball in motion: instinctive computing that most of us can only hope for, plus extra data coming from sensors embedded inside their muscles.

It is the same with modern encryption algorithms, which are designed to produce statistically random output—with very stringent conditions for randomness—even if the password is simply “password.” The immense majority of them do not extract any entropy from the plaintext, but rather “expand” the little entropy contained in the key (which typically maxes out at 128 or 256 bits) through a pseudo-random number generator (PRNG) to provide what appears to be 1 bit of entropy per binary bit of plaintext. This is not an infinite source of information entropy, though, since the pseudo-random sequences they produce begin to repeat when the internal state of the algorithm returns to its original value, but it can be very large: a 128-bit internal state has 3.4E38 different values, and it is theoretically possible that the algorithm may run through all of them before the initial state is reached again, at which point everything would repeat. If we are using 8-bit encoding for characters, that’s 1.33E36 characters, or roughly 1E33 pages of printed text, which as a book would be as thick as our galaxy is wide.

Pseudo-random sources are not really “random” if we mean by that word that the outcome is unpredictable. Quite the contrary: the outcome is perfectly predictable (at least, to some people) and this is what makes it useful, for instance, to encrypt a message. Just add the “random” output to the message, and the result will also look random. Those who know how to produce the “random” stuff, however, will be able to retrieve the original message simply by subtracting back the random stuff, which they can reproduce exactly. Another example: the digits of an irrational number such as the square root of two never repeat (otherwise the number would not be irrational) and each value or particular sequence of digits appears roughly with the same frequency given its length, which is usually termed being a normal number. As statistically random as you can have it, and yet completely predictable by running the square root algorithm that some of us learned in school (I guess that dates me badly, since it hasn’t been taught for a while 😉)
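If you want to see this without pencil and paper, here’s a minimal JavaScript sketch that reproduces the digits of the square root of two with exact integer arithmetic (Newton’s method rather than the longhand school procedure, but the point stands: perfectly reproducible “randomness”):

// Digits of sqrt(2): statistically random-looking, completely reproducible.
// Newton's method on BigInt gives an exact integer square root.
function isqrt(n) {
  if (n < 2n) return n;
  let x = n, y = (x + 1n) / 2n;
  while (y < x) { x = y; y = (x + n / x) / 2n; }
  return x;
}
function sqrt2Digits(count) {
  // floor(sqrt(2 * 10^(2*(count-1)))) = first `count` digits of sqrt(2)
  return isqrt(2n * 10n ** (2n * BigInt(count - 1))).toString();
}
console.log(sqrt2Digits(50));  // 14142135623730950488...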

So, the square root algorithm is, in fact, a pseudo-random number generator. There are many of them, some simpler than the square root. Some popular ones for binary streams (not for cryptography, though) are the linear congruential generator, the linear-feedback shift register, the Mersenne Twister, and the lagged Fibonacci generator. For crypto use, the PRNG needs to be non-reversible as well, meaning that knowing the complete sequence of bits produced up to a certain point should give no hint as to what the next bit will be. Generators for non-binary strings, such as written language, could be produced from binary ones by a change of base, or directly by doing the necessary operations in a base other than 2. Some take binary code in chunks; RC4, for instance, takes 8-bit chunks (0 to 255), and is still used a lot today. Some are so simple that they can be done with paper and pencil. I’d like to talk about those next.
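To give an idea of how simple these generators can be, here’s a minimal sketch of an additive lagged Fibonacci generator, with lags 24 and 55 and working mod 2^32 (the specific constants are just a common choice, not anything from the ciphers below); fine for statistics, useless for crypto on its own:

// Additive lagged Fibonacci generator: s[n] = s[n-24] + s[n-55] mod 2^32,
// kept in a circular buffer of 55 values. Statistically decent, but linear
// and therefore predictable: not cryptographically secure by itself.
function makeLFG(seedValues) {           // seedValues: at least 55 integers
  const state = seedValues.slice(0, 55).map(x => x >>> 0);
  let i = 0;                             // index of the oldest value, s[n-55]
  return function next() {
    const v = (state[(i + 31) % 55] + state[i]) >>> 0;  // s[n-24] + s[n-55]
    state[i] = v;                        // the new value replaces the oldest
    i = (i + 1) % 55;
    return v;
  };
}
// Toy usage (a real seed should come from somewhere less predictable):
const next = makeLFG(Array.from({ length: 55 }, (_, k) => (k * 2654435761) >>> 0));
console.log(next(), next(), next());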

In a recent article, I introduced the Scrabble cipher, which is a simplification of the almost-classic Chaocipher. This cipher uses two sets of letter tiles to encrypt any text, drawing from the entropy of the text itself to randomize the tile positions. It turns out you can construct a pretty good PRNG simply by feeding each output letter as the next input letter. Start with two mixed alphabets as key, plus any single letter, for a total of 26!^2*26 = 4.23E54 different streams. Then you can add the stream to any message with the help of a Tabula Recta, which can have two built-in substitutions (input and output), for another 26!^2 combinations. The result will be quite securely encrypted.

If you don’t want to mess around with tiles, you can obtain statistically random keystreams with a lagged Fibonacci generator operating on letters. In this case, the whole thing can be done on a Tabula Recta. You start with two mixed alphabets to do substitutions at the input and output sides of a Tabula Recta, plus a seed. Then look up the first seed letter on the input alphabet and follow that row or column until you find the second seed letter, then move orthogonally to the output alphabet, and write the letter you see there as the first keystream letter, directly to the right of the seed. Then you take the second and third seed letters and do the same, and so on with all the letter pairs that follow, which will never run out because you are extending the seed with the keystream.

If you want to use this to encrypt a message, add another pair of mixed alphabets (the Tabula Recta has four sides, so all four alphabets fit at the same time) and use them to subtract each plaintext letter (input letter) from the keystream (2nd letter you look for), in order to generate each ciphertext letter. Decryption is the same, except that the last two alphabets must reverse positions (if they are different, that is). I have set this up as a JavaScript program and given it the name FibonaRNG (read as “fibonaring”) because it is based on a lagged Fibonacci PRNG. Even with the straight alphabet for all keys, it manages to produce a statistically random keystream. I have also made a base64 version (last two alphabets are the same so it works identically for encryption and decryption), which I call FiboFile.
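For those who’d rather see code than follow the table with a finger, here’s a rough JavaScript sketch of the keystream part. I’m modeling the Tabula Recta lookup as “value of the second letter, minus the position of the first letter in the input alphabet, read out through the output alphabet”; the actual program may resolve the lookup with a slightly different convention, so treat this as an illustration of the lagged Fibonacci idea rather than a letter-for-letter clone:

// Letter-based lagged Fibonacci keystream. inAlpha and outAlpha are the two
// mixed alphabets (26-letter uppercase strings); seed is a string of at
// least two letters. The lookup convention below is an assumption, not
// necessarily the exact one used by the FibonaRNG program.
const A2N = c => c.charCodeAt(0) - 65;                      // letter -> 0..25

function keystream(inAlpha, outAlpha, seed, length) {
  const s = seed.toUpperCase().split('');   // seed followed by the keystream
  while (s.length < seed.length + length) {
    const i = s.length - seed.length;       // the pair slides along the stream
    const row = A2N(s[i + 1]) - inAlpha.indexOf(s[i]);      // Tabula Recta step
    s.push(outAlpha[((row % 26) + 26) % 26]);
  }
  return s.slice(seed.length).join('');
}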

Here’s a quick description of how to use FibonaRNG, after we have made a couple of scrambled alphabets, which we have placed at the top and sides of a Tabula Recta (instructions are for encryption; to decrypt, reverse the roles of plaintext and ciphertext, and also reverse the input and output alphabets in step 3). We are assuming that the seed is used straight, without the trick that I discuss further down the article, so set it this way in the program if you want to check what follows:

  1. Take the seed and write it on the line below the plaintext and to the left of it, so it ends right before the plaintext begins on the line above.
  2. Extend the seed into a keystream so all spaces below the plaintext are filled, this way: Look up the earliest keystream (or seed) letter not yet used at the top or bottom of the Tabula Recta, then go down until you find the letter that follows it in the keystream, then go straight across to read a letter on the right or left alphabet, which you will write in the next available position on the keystream. Mark the letter you looked up so next time you start with the next one.
  3. Now do the following for each pair of letters consisting of a plaintext letter and the keystream letter right below it: Look up the plaintext letter at the top or bottom of the Tabula Recta, then go up or down until you find the keystream letter, then go straight across to read a letter on the right or left alphabet, which you will write below the pair of letters involved in this operation, forming the ciphertext. Alphabet order is reversed for decryption. (A code sketch of steps 2 and 3 follows this list.)
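Continuing the sketch from earlier (same assumed lookup convention), steps 2 and 3 look like this in code. The encrypt/decrypt pair below round-trips correctly, but it may not reproduce the worked example further down letter for letter if the program’s table convention differs from my assumption:

// Step 3: combine text with keystream through the Tabula Recta. Swapping the
// two alphabets undoes the operation, which is exactly how decryption works.
function combine(inAlpha, outAlpha, text, ks) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const row = A2N(ks[i]) - inAlpha.indexOf(text[i]);
    out += outAlpha[((row % 26) + 26) % 26];
  }
  return out;
}
function encrypt(inAlpha, outAlpha, seed, plaintext) {
  const ks = keystream(inAlpha, outAlpha, seed, plaintext.length);
  return combine(inAlpha, outAlpha, plaintext, ks);
}
function decrypt(inAlpha, outAlpha, seed, ciphertext) {
  const ks = keystream(inAlpha, outAlpha, seed, ciphertext.length);   // same keystream
  return combine(outAlpha, inAlpha, ciphertext, ks);                  // alphabets swapped
}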

Let’s say your keys are “marvelous” and “wonderful” and your seed “awesome”. The mixed alphabets will be these:

MARVELOUSZYXWTQPNKJIHGFDCB
WONDERFULZYXVTSQPMKJIHGCBA

If your plaintext is “To be or not to be, that is the question,” the process produces this working table:

        TOBEORNOTTOBETHATISTHEQUESTION
AWESOME DEQMXTBAWDOSQRDVXKRRCJXBBWDWEN
        NVJSFQBBZNTHWESFYNVEULSHDSNRVX

where the last row is the ciphertext. To decrypt, put the ciphertext at the top row and generate the same keystream in step 2, then combine the two as in step 3 but reversing the positions of the mixed alphabets.

FibonaRNG is at this point the most powerful paper-and-pencil encryption method that I’ve come up with so far, once you score the different features, as this article does. It does not require any props, so you can use it on a desert island if you like, and it uses simple operations involving only two letters at a time. It has a large key space of size 26!^2, equivalent to 177 binary bits even without throwing in the seed and optional transposition. This version uses the same two keys for keystream generation and for combining it with the plaintext, whereas an earlier version used two sets of two keys each, which would seem more appropriate given that the two steps can in fact use different keys, which can also be placed on the sides of the Tabula Recta, thus apparently giving an even larger key space. But there are weaknesses that make this an illusion.

Against a ciphertext-only attack, I have found that using the correct keys and seed for keystream generation but incorrect ones for the second operation combining the keystream with the ciphertext yields a “plaintext” that is less than random, effectively separating the process of finding the keystream keys from that of finding the combination keys. Therefore, a brute-force attack only needs to go through 26!^2 permutations of the first two keys, times 26 raised to the seed length, in order to find the correct set, and then again a comparable amount of time to find the other two keys, so that using a different set of keys for keystream generation and combination with plaintext merely doubles the work involved rather than multiplying it. That is, unless a transposition is added, which makes it very hard to see that you are on the right track for the last two substitution keys unless the transposition key is almost perfect.
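For reference, the 177-bit figure is just the base-2 logarithm of (26!)^2, which a couple of lines of code will confirm:

// Key space of two independent mixed alphabets, expressed in bits.
let bitsPerAlphabet = 0;
for (let k = 2; k <= 26; k++) bitsPerAlphabet += Math.log2(k);   // log2(26!)
console.log((2 * bitsPerAlphabet).toFixed(1));                   // ~176.8, i.e. about 177 bits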

When faced with a known-plaintext attack, an enemy who obtains the keystream will quickly find the keystream keys and seed, since the LFG algorithm is linear. He/she/it would only need a minimum of 26×2 = 52 keystream letters in order to find keys 1 and 2, by setting up equations similar to those described in this article. Therefore, the PRNG is not cryptographically secure, according to the normal definition of the term. The difficulty for the attacker, however, lies in finding the keystream to begin with. He/she/it would have to test all combinations of the keys used in combining keystream and plaintext (plus the transposition key, if any) to find trial keystreams, which would then lead to the keys involved in making the keystream and thus the rest of the keystream. Detecting whether or not a given set of keys is correct is as simple as checking whether the keystream they generate (outside of the positions used to find the keys) matches the actual keystream. There are no shortcuts like those available when looking for plaintext, since all trial keystreams are going to look just as statistically random, so an attacker would have to go through a significant portion of the 26!^2 possibilities, giving the scheme an equivalent resistance of 177 binary bits, comparable to that against a ciphertext-only attack with known seed.

Therefore, there is little harm in using the same two keys for both steps in the process. We can’t get much beyond a 26!^2 effective key space size, so it’s simpler to use the same two keys for both phases. This is why the final JavaScript program only has two boxes for substitution keys, plus the optional transposition, which multiplies the key space by another large factorial (not 26!, since it is a weakness to always use the same transposition key length), plus the seed.

Now, if the attacker has two different sets of plaintext and ciphertext for the same set of keys, then he/she/it can pull off a niftier trick. In other cryptosystems, subtracting the two ciphertexts would eliminate the keystream, leaving a combination of two plaintexts that can be separated by guess-and-match methods (even if the plaintexts are not known at all). In our case, this approach would not produce a usable “mixed plaintext,” but there are things that can be done if the plaintexts are known. Suppose that for a certain position one of the plaintexts has “A” and the other “B”, which correspond respectively to ciphertexts “Y” and “Z,” and let’s further suppose the keystream at that position is “K,” the same for both messages. If the substitution defined by the first mixed alphabet is f(x) and that defined by the second is g(x), then we’ll have (all operations mod 26):

Y = g⁻¹(K – f(A))

Z = g⁻¹(K – f(B))

which can be simplified into this:

g(Y) = K – f(A)

g(Z) = K – f(B)

Subtracting the two equations, we get:

g(Y) – g(Z) = K – f(A) – (K – f(B)) = f(B) – f(A)

The resulting equation links the values of the substitution functions at four different points. There are a total of 25 independent values for f(x) and another 25 for g(x) (the last value of each is forced by the rest), for a total of 50 unknowns needed to determine the substitution alphabets completely. If we repeat this process at 50 carefully chosen points, we will obtain enough equations to solve the system. Now, the system of equations is “homogeneous,” meaning that there is no term that does not contain one of the unknowns. Solving this kind of system requires knowing one of the values. No problem: just make a guess for, say, f(A) and obtain all the other values from that. Guess 26 times if you have to. When you substitute the values you obtained into the equation for a position outside the original set, only one of the guesses will satisfy it, no matter what position you choose.
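If you want to convince yourself that the differenced relation holds, here’s a quick numerical check under the encryption model above, with f and g standing in as random permutations of 0 to 25; it only verifies the equation, it is not the equation-solving attack itself:

// Check that g(Y) - g(Z) = f(B) - f(A) mod 26 when both letters are
// encrypted with the same keystream letter K, given Y = g^-1(K - f(A)).
const mod26 = n => ((n % 26) + 26) % 26;
function randomPerm() {                      // Fisher-Yates shuffle of 0..25
  const a = [...Array(26).keys()];
  for (let i = 25; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}
const f = randomPerm(), g = randomPerm();
const gInv = [];
g.forEach((value, index) => gInv[value] = index);    // g^-1 as a lookup table

const K = 17, A = 0, B = 1;                           // arbitrary test values
const Y = gInv[mod26(K - f[A])];
const Z = gInv[mod26(K - f[B])];
console.log(mod26(g[Y] - g[Z]) === mod26(f[B] - f[A]));   // always true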

Therefore, a known-plaintext attack with two sets of plaintext and ciphertext for the same set of keys can yield those keys. But there is a way to fix this problem, and this is by producing a different keystream for each message, which could not possibly be subtracted out as shown above. In this case, the sender comes up with a random string of the same length as the seed, and produces the keystream by running an LFG starting from this string rather than from the seed itself. The random string itself is encrypted by combining it with the seed using the Tabula Recta supplemented with the substitutions, and is sent along with the ciphertext. The recipient begins by decrypting the random string by undoing the combination on the Tabula Recta, then he/she generates the original keystream from it, which is then subtracted from the ciphertext. This trick only adds a few operations on the Tabula Recta, but effectively eliminates any vulnerability to a known-plaintext attack (so long as the sender does use a fairly random string every time). Now, some seed choices are known to produce keystreams of poor statistical quality, but since the method is designed to be performed by hand, the sender hopefully will detect this right away and choose a different random string. The JavaScript program uses this process by default.
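In code, reusing keystream() and combine() from the earlier sketches, the trick looks roughly like this. I’m assuming the random string (a nonce, to give it a name) is masked with the regular seed exactly the way a plaintext is combined with a keystream; the actual program may do this masking a bit differently:

// Per-message random seed. The nonce must have the same length as the seed;
// only its masked form travels with the ciphertext.
function encryptWithNonce(inAlpha, outAlpha, seed, nonce, plaintext) {
  const ks = keystream(inAlpha, outAlpha, nonce, plaintext.length);
  return {
    maskedNonce: combine(inAlpha, outAlpha, nonce, seed),   // sent in the clear
    ciphertext: combine(inAlpha, outAlpha, plaintext, ks)
  };
}
function decryptWithNonce(inAlpha, outAlpha, seed, maskedNonce, ciphertext) {
  const nonce = combine(outAlpha, inAlpha, maskedNonce, seed);   // unmask first
  const ks = keystream(inAlpha, outAlpha, nonce, ciphertext.length);
  return combine(outAlpha, inAlpha, ciphertext, ks);
}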

Lack of security could also arise from using low-entropy passwords to generate the scrambled alphabets that serve as keys, but there is an easy way to concentrate the entropy of a longer passphrase so the scrambled alphabets have an entropy approaching that of truly random text. Each mixed alphabet consists effectively of 25 letters (the value of the last letter is forced by the others in order to complete the set), so let us take a piece of normal text containing 75 letters in order to generate those, this way: write the text in three rows of 25 letters, then take each of the resulting columns and do a “snake” operation on a Tabula Recta with straight alphabets at all the headings; look up the first letter at the top, then go down until you find the second letter, then right or left until you find the third letter, then back up to read the result at the heading alphabet. Here’s a JavaScript program that will do this.

When doing this for several keys, just take a longer piece of text. Let’s say you want to generate two mixed alphabets plus a seed. Then you’ll start from a text containing more than 150 letters. Divide the length by three and add one to the integer result; this will be the length of each row. Write your entire text into these rows (the last one may be shorter) and do snake operations on each resulting column. You will end up with at least two groups of 25 random-looking letters, which can be used to make scrambled alphabets, plus a few extra letters that can be used as seed.
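Here’s a sketch of that condensation in code, following the general recipe for longer texts. On a Tabula Recta with straight alphabets at every heading, and assuming the usual layout where each row is the alphabet shifted one place further than the row above, the snake lookup reduces to the arithmetic a − b + c mod 26, which is what the code uses:

// Condense a passphrase into random-looking letters: write it in three rows
// and "snake" each complete column, which under the assumed table layout is
// just a - b + c mod 26.
const val = c => c.charCodeAt(0) - 65;
const letter = n => String.fromCharCode(((n % 26) + 26) % 26 + 65);

function condense(passphrase) {
  const text = passphrase.toUpperCase().replace(/[^A-Z]/g, '');
  const rowLen = Math.floor(text.length / 3) + 1;   // row length, as described above
  let out = '';
  for (let c = 0; c + 2 * rowLen < text.length; c++) {
    const a = text[c], b = text[c + rowLen], d = text[c + 2 * rowLen];
    out += letter(val(a) - val(b) + val(d));        // the snake, as arithmetic
  }
  return out;   // use the first 25 letters per alphabet, leftovers as seed
}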
