In this article, I go over the meaning of randomness and end up with yet another couple ciphers, based on the Fibonacci series, which become the new champions for paper-and-pencil strength with simplicity.
One of the interesting points raised by the Scrabble cipher (a simplification of the classical Chaocipher where only the ciphertext alphabet is scrambled) is that its alphabet-mixing algorithm produces statistically random ciphertext even if no key is involved. It uses the plaintext’s own entropy to randomize it, and does so efficiently that it appears as if the entropy is increased, from an average value (in English) of 1.58 bit per letter to its maximum of 4.7 bits per letter (for a 26-letter alphabet), effectively a factor of three. The result appears to be quite random as far as statistics go, and yet the operation that produced it can be easily reversed, leading to the lower-entropy plaintext. So where did all that entropy come from and where did it go? Isn’t this a violation of some rule of informational thermodynamics? And what about other things that we think of as random? Are they really so?
Consider a coin toss. If he coin is fair, everyone will consider the result as random, meaning that they have no way to tell whether any toss will result in heads or tails since either outcome will happen just about an equal number of times (but not necessarily equal because of the next condition) and any toss will not be affected at all by previous tosses (this is the only way we could ensure that heads and tails come out exactly the same number of times, so it won’t happen if the process is truly random). But if we just drop the coin flat from half an inch off the ground, the outcome is all but assured. So the “randomness” depends on how we toss the coin or, rather, on our ability to predict what the final position will be given the initial conditions of coin orientation, height above the ground, and initial velocity and spin of the coin. Usually we’ll begin to predict a wrong outcome about half the time way before Heisenberg’s uncertainty principle kicks in, and then we say the toss was fair. It is our inability to make an accurate prediction that makes the process “random” but the coin is still subjected to the same physical laws, and it is possible that a computer having more accurate numbers for those initial conditions might still be able to predict the outcome correctly more than half the time. They say that experienced croupiers have a pretty good idea of what number on the roulette will the winner as soon as they they put the ball in motion: instinctive computing that most of us can only hope for, plus extra data coming from sensors embedded inside their muscles.
It is the same with modern encryption algorithms, which are designed to produce statistically random output—with very stringent conditions for randomness—even if the password is simply “password.” The immense majority of them do not extract any entropy from the plaintext, but rather “expand” the little entropy contained in the key (which typically maxes out at 128 or 256 bits) through a pseudo-random number generator (PRNG) to provide what appears to be 1 bit of entropy per binary bit of plaintext. This is not an infinite source of information entropy, though, since the pseudo-random sequences they produce begin to repeat when the internal state of the algorithm returns to its original value, but it can be very large: a 128-bit internal state has 3.4E38 different values, and it is theoretically possible that the algorithm may run through all of them before the initial state is reached again, at which point everything would repeat. If we are using 8-bit encoding for characters, that’s 1.33E36 characters, or roughly 1E33 pages of printed text, which as a book would be as thick as our galaxy is wide.
Pseudo-random sources are not really “random” if we mean by that word that the outcome is unpredictable. Quite the contrary, the outcome is perfectly predictable (at least, to some people) and this is what makes it useful, for instance, to encrypt a message. Just add the “random” output to the message, and the result will also look random. Those who know how to produce the “random” stuff, however, will be able to retrieve the original message simply by subtracting back the random stuff, which they can reproduce exactly. Another example: the digits of an irrational number such as the square root of two never repeat (otherwise the number would not be irrational) and each value or particular sequence of digits appears roughly with the same frequency given its length, what is usually termed as being a normal number. As statistically random as you can have it, and yet completely predictable by doing the square root algorithm that some of us learned in school (I guess that dates me badly since it hasn’t been taught for a while 😉
So, the square root algorithm is, in fact, a pseudo-random number generator. There are many of them, some simpler than the square root. Some popular ones for binary streams (not for cryptography, though) are the linear congruential generator, the linear-feedback shift register, the Mersenne Twister, and the lagged Fibonacci generator. For crypto use, the PNRG needs to be non-reversible as well, meaning that knowing the complete sequence of bits produced up to a certain point should give no hint as to what the next bit will be. Generators for non-binary strings, such as written language could be produced from binary ones, by a change of base, or directly by doing the necessary operations in a base other than 2. Some take binary code in chunks; RC4, for instance, takes 8-bit chunks (0 to 255), and is still used a lot today. Some are so simple that they can be done with paper and pencil. I’d like to talk about those next.
In a recent article, I introduced the Scrabble cipher, which is a simplification of the almost-classic Chaocipher. This cipher uses two sets of letter tiles to encrypt any text, drawing from the entropy of the text itself to randomize the tile positions. It turns out you can construct a pretty good PRNG simply by feeding each output letter as the next input letter. Start with two mixed alphabets as key, plus any single letter, for a total of 26!^2*26 = 4.23E54 different streams. Then you can add the stream to any message with the help of a Tabula Recta, which can have two built-in substitutions (input and output), for another 26!^2 combinations. The result will be quite securely encrypted.
Here’s a quick description of how to use FibonaRNG, after we have made a couple of scrambled alphabets, which we have placed at the top and sides of a Tabula Recta (instructions are for encryption; to decrypt, reverse the roles of plaintext and ciphertext, and also reverse the input and output alphabets in step 3). We are assuming that the seed is used straight, without the trick that I discuss further down the article, so set it this way in the program if you want to check what follows:
- Take the seed and write it on the line below the plaintext and to the left of it, so it ends right before the plaintext begins on the line above.
- Extend the seed into a keystream so all spaces below the plaintext are filled, this way: Look up the first keystream letter still available at the top or bottom of the Tabula Recta, then down until you find the letter that follows it in the keystream, then go left to read a letter on the right or left alphabet, which you will write in the next available position on the keystream. Mark the first keystream letter you looked up so next time you start with the next letter.
- Now do the following for each pair of letters consisting of a plaintext letter and the keystream letter right below it: Look up the plaintext letter at the top or bottom of the Tabula Recta, then go up until you find the keystream letter, then right to read a letter on the right or left alphabet, which you will write below the pair of letters you involved in this operation, forming the ciphertext. Alphabet order is reversed for decryption.
Let’s say your keys are “marvelous” and “wonderful” and your seed “awesome”. The mixed alphabets will be these:
If your plaintext is “To be or not to be, that is the question,” the process produces this working table:
TOBEORNOTTOBETHATISTHEQUESTION AWESOME DEQMXTBAWDOSQRDVXKRRCJXBBWDWEN NVJSFQBBZNTHWESFYNVEULSHDSNRVX
where the last row is the ciphertext. To decrypt, put the ciphertext at the top row and generate the same keystream in step 2, then combine the two as in step 3 but reversing the positions of the mixed alphabets.
FibonaRNG is at this point the most powerful paper-and-pencil encryption method that I’ve come up with so far, after you give scores for different features, as this article does. It does not require any props so you can use it in a desert island if you like, it uses simple operations involving only two letters at a time. It has a large key space of size 26!^2, equivalent to 177 binary bits even without throwing in the seed and optional transposition. This version uses the same two keys for keystream generation and combining it with the plaintext, whereas an earlier version used two sets of two keys each, which is what seems most appropriate given that the two steps can in fact use different keys, which can also be placed on the sides of the Tabula Recta, thus apparently giving an even larger key space. But there are bugs that make this an illusion. Against a ciphertext-only attack, I have found that using the correct keys and seed for keystream generation but incorrect ones for the second operation combining the keystream with the ciphertext yields a “plaintext” that is less than random, effectively separating the process of finding the keystream keys from that of finding the combination keys. Therefore, a brute force attack only needs to go through 26!^2 permutations of the first two keys, times 26 raised to seed length in order to find the correct set, and then again a comparable amount of time to find the other two keys, so that using a different set of keys for keystream generation and combination with plaintext merely doubles the work involved rather than multiplying it. That is, unless a transposition is added, which makes it very hard to see that you are on the right track for the last two substitution keys unless the transposition key is almost perfect.
Now, if the attacker has two different sets of plaintext and ciphertext for the same set of keys, then he can pull off a niftier trick. In other cryptosystems, subtracting the two ciphertexts would eliminate the keystream, leaving a combination of two plaintexts that can be separated by guess and match methods (even if the plaintexts are not known at all). In our case, this approach would not produce a usable “mixed plaintext,” but there are things that can be done if the plaintext is known. Suppose that for a certain position one of the plaintexts has “A” and the other “B”, which correspond respectively to ciphertexts “Y” and “Z,” and let’s further suppose the keystream at that position is “K,” the same for both messages. If the substitution defined by the first mixed alphabet is f(x) and that defined by the second is g(x), then we’ll have (all operations mod 26):
Y = g-1(K – f(A))
Z = g-1(K – f(B))
which can be simplified into this:
g(Y) = K – f(A)
g(Z) = K – f(B)
Subtracting the two equations, we get:
g(Y) – g(Z) = K – f(A) – (K – f(B)) = f(B) – f(A)
The resulting equation links the values of the substitution functions at four different points. There is a total of 25 of those for f(x) and another 25 for g(x) (the last value is forced by the rest), for a total of 50 unknowns in order to determine the substitution alphabets completely. If we repeat this process at 50 carefully chosen points, we will obtain enough equations to solve it. Now, the system of equations is “homogeneous,” meaning that there is no term that does not contain one of the unknowns. Solving this kind of system requires knowing one of the values. No problem, just make a guess for, say, f(A) and obtain all the other values from that. Guess 26 times if you have to. When you replace the values you obtained into the equation obtained for a position different from those in the set, only one of the sets will satisfy the equation no matter what position you choose.