Since there are so many updates, perhaps it is better to group them by categories, and then mention which apps got each update. The main categories are Cryptography, Steganography, and Compatibility.
Here’s what Manuel Blum suggests you do, which is explained in some detail in this article:
The reason why this works is that you are applying a nonlinear operation (the alphabet mapping) for every input letter, and then another nonlinear operation (the digit permutation) for every output digit. If we call the letters x_{1}, x_{2}, and so forth all the way to x_{n}, the alphabet mapping f(x), the permutation g(x), and the output numbers y_{1}, y_{2}, etc., we have the following equations:
y_{1} = g(f(x_{1}) + f(x_{n}) mod 10)
y_{2} = g(f(x_{2}) + y_{1} mod 10)
.............
y_{n} = g(f(x_{n}) + y_{n-1} mod 10)
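The chained equations can be coded in a few lines of JavaScript. The alphabet map f and digit permutation g below are made-up examples for illustration, not Blum's actual parameters:

```javascript
// Illustrative parameters only (NOT Blum's published ones):
const f = ch => (ch.toUpperCase().charCodeAt(0) - 65) % 10; // letter -> digit
const g = [7, 2, 9, 0, 4, 6, 8, 1, 3, 5];                   // digit permutation

function blumPassword(challenge) {
  const x = challenge.toUpperCase().replace(/[^A-Z]/g, "");
  const y = [];
  // y1 chains the first and last letters; every later digit chains
  // the previous output digit, exactly as in the equations above.
  y.push(g[(f(x[0]) + f(x[x.length - 1])) % 10]);
  for (let i = 1; i < x.length; i++) {
    y.push(g[(f(x[i]) + y[i - 1]) % 10]);
  }
  return y.join("");
}
```

With these made-up parameters, `blumPassword("AMAZON")` produces "095726"; any change in the challenge propagates through all subsequent digits.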
This is quite similar to the operations involved in the FibonaRNG cipher, described not long ago on this blog, except that in FibonaRNG the functions f(x) and g(x) are alphabet permutations that convert letters into other letters, and everything is mod 26 rather than mod 10. So here’s the simplification of Blum’s method that I propose: use letters in all the operations, which means using a Tabula Recta to help with the base-26 operations, but one whose headers have been modified by means of secret permutations (a “Tabula Prava,” as described in this article). Since the user will likely end up carrying a printed Tabula Recta in his or her wallet, there is little harm in writing those headers on the table itself (assuming one keeps enough security on one’s person), thus avoiding memorization altogether. I will illustrate the steps with an example:
AMONGOTHERPUBLICBUILDINGSINACERTAINTOWNWHICHFORMAN
YREASONSITWILLBEPRUDENTTOREFRAINFROMMENTIONINGANDT
OWHICHIWILLASSIGNNOFICTITIOUSNAMETHEREISONEANCIENT
--------------------------------------------------
QRRVQHOLEJEMISPEZQCNHXNVXZXPDRJSZKGLTWIVNHTZFKZDKN
from which I obtain the substrings “QRRVQHOLEJEMISPEZQCNHXNVX” and “ZXPDRJSZKGLTWIVNHTZFKZDKN.” The first yields this alphabet: “QRPVOHNLEJDMISKCZGBFAXYUWT”, and the second this one: “ZXPDRJSYKGLTWIVNHQUFEOCBMA” resulting in the Tabula Prava below, which I will use from now on for all operations:
    Z X P D R J S Y K G L T W I V N H Q U F E O C B M A
    ---------------------------------------------------
Q | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | Q
R | B C D E F G H I J K L M N O P Q R S T U V W X Y Z A | R
P | C D E F G H I J K L M N O P Q R S T U V W X Y Z A B | P
V | D E F G H I J K L M N O P Q R S T U V W X Y Z A B C | V
O | E F G H I J K L M N O P Q R S T U V W X Y Z A B C D | O
H | F G H I J K L M N O P Q R S T U V W X Y Z A B C D E | H
N | G H I J K L M N O P Q R S T U V W X Y Z A B C D E F | N
L | H I J K L M N O P Q R S T U V W X Y Z A B C D E F G | L
E | I J K L M N O P Q R S T U V W X Y Z A B C D E F G H | E
J | J K L M N O P Q R S T U V W X Y Z A B C D E F G H I | J
D | K L M N O P Q R S T U V W X Y Z A B C D E F G H I J | D
M | L M N O P Q R S T U V W X Y Z A B C D E F G H I J K | M
I | M N O P Q R S T U V W X Y Z A B C D E F G H I J K L | I
S | N O P Q R S T U V W X Y Z A B C D E F G H I J K L M | S
K | O P Q R S T U V W X Y Z A B C D E F G H I J K L M N | K
C | P Q R S T U V W X Y Z A B C D E F G H I J K L M N O | C
Z | Q R S T U V W X Y Z A B C D E F G H I J K L M N O P | Z
G | R S T U V W X Y Z A B C D E F G H I J K L M N O P Q | G
B | S T U V W X Y Z A B C D E F G H I J K L M N O P Q R | B
F | T U V W X Y Z A B C D E F G H I J K L M N O P Q R S | F
A | U V W X Y Z A B C D E F G H I J K L M N O P Q R S T | A
X | V W X Y Z A B C D E F G H I J K L M N O P Q R S T U | X
Y | W X Y Z A B C D E F G H I J K L M N O P Q R S T U V | Y
U | X Y Z A B C D E F G H I J K L M N O P Q R S T U V W | U
W | Y Z A B C D E F G H I J K L M N O P Q R S T U V W X | W
T | Z A B C D E F G H I J K L M N O P Q R S T U V W X Y | T
    ---------------------------------------------------
    Z X P D R J S Y K G L T W I V N H Q U F E O C B M A
The process above gives you the equivalent of 177 bits of entropy, which may be overkill in many cases. You can simplify things, at the expense of some of that surplus security, if you use your memorized key phrase (perhaps a simple collection of words) directly to form each alphabet. Again, the algorithm is: write any new letter as it appears, and if the letter has already been written, write the first one preceding it in the alphabet that is still available (cycle to the end if necessary); when you run out of key letters, write the rest of the alphabet in reverse order. Example: “among other public” gives the alphabet “AMONGLTHERPUBKICZYXWVSQJFD”. This process gives you an average of 1.58 bits of entropy per letter, or 10 bits per dictionary word.
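The alphabet-forming rule just described is easy to automate; here is a minimal JavaScript sketch (the function name is mine, not the one in the author's program):

```javascript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

function mixedAlphabet(phrase) {
  const used = new Set();
  let out = "";
  for (const raw of phrase.toUpperCase()) {
    if (!ALPHABET.includes(raw) || used.size === 26) continue; // skip spaces etc.
    let c = raw;
    // Repeated letter: take the nearest preceding letter still
    // available, cycling past A back to Z if necessary.
    while (used.has(c)) c = c === "A" ? "Z" : String.fromCharCode(c.charCodeAt(0) - 1);
    used.add(c);
    out += c;
  }
  // When the key letters run out, append the unused letters in reverse order.
  for (let i = 25; i >= 0; i--) if (!used.has(ALPHABET[i])) out += ALPHABET[i];
  return out;
}
```

Running it on “among other public” reproduces the example alphabet above, and on “marvelous” and “wonderful” it reproduces the alphabets used in the FibonaRNG example further down.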
AMAZON
NKIJJKX
I have made a little JavaScript program implementing this process, so you can experiment to your heart’s content. The program also gives some statistical properties of the output password, so you can check just how “random” it is (and therefore how hard to crack). This won’t be very significant for a typical password 10 to 20 characters long, but if you paste in a long challenge (say, a piece of text containing a few thousand characters), you will see that, even though all letters appear with the same frequency, which is good, the string fails the independence test (the last one), meaning that some pairs of letters are more frequent than others, a statistical artifact deriving from the challenge being a piece of real text. So don’t use this process to generate a “hash” of the challenge, but it should be fine for passwords. I haven’t checked this, but it is highly likely that Blum’s algorithm suffers from the same defect, so it should not be called a “hash” either.
I have also made an 8.5″ x 11″ graphic containing a Tabula Recta that you can print any time you want, and where you can write in your mixed alphabets. If you want to carry it in your wallet, cut out the extra paper as in the picture below, then fold it so it fits.
Would an attacker who obtains a password so derived be able to compromise other passwords? This would involve coming up with the two mixed alphabets, which is like obtaining the functions f(x) and g(x) from the equations near the top of this article, starting from a knowledge of the result and, presumably, the challenge text from which it is derived. As we saw in this other article, this converts into a system of linear equations where the unknowns are f(“a”) to f(“z”) and g^{-1}(“a”) to g^{-1}(“z”), that is, 50 of them (the last letter in each alphabet is always forced by the others). So 50 equations are needed, which likely would require four or five compromised passwords, not just one. This would buy you time to change the passwords in all your logins, starting from a different set of alphabets.
In the case of Blum’s algorithm, a hacker would need 36 letters, which might come from three or four compromised passwords if everything lines up right. This by itself is not enough of a problem to prefer the algorithm presented here, however. The clincher is that you need to memorize the parameters for Blum’s method, whereas the Tabula Prava you can carry in your pocket. If the mixed alphabets are of the kind that you make directly from key words, you will still have perfect security if you write them in right before using the Tabula, and then erase them. Otherwise you can leave them written in, in which case you should treat that piece of paper as you would treat your house keys. You can also use the other side of the paper to write down some frequently used passwords and do away with the whole algorithm until you want to change the passwords again.
May 20, 2016. lifehacker.com Editor-in-Chief Alan Henry likes PassLok enough to entitle his article “PassLok Simplifies Email Encryption so Anyone Can Use It.” If you google some of the words you will find a lot of other articles that derive from it.
January 14, 2017. Mihir Patkar names PassLok as one of the essential “5 Privacy Protecting Apps You Need to Use Right Now” in his article at makeuseof.com. He does call it “PassLock” with an extra c, and says it is a client for PGP (it is not), but we forgive him for the slight inaccuracies because he nails PassLok’s ease of use. Again, a little googling will lead you to other articles obviously inspired by it, like this one, and this other one in Spanish.
To encipher something, you need to have a longish key phrase and a Tabula Recta like the one in the picture (after you straighten it, of course). You can make it by hand, but if you feel lazy you can print this file, which contains the Tabula Recta on a gridded 8.5×11 inch sheet. The key phrase should be long so that it contains enough entropy. English text has about 1.58 bits of entropy per letter, and since random text contains 4.7 bits of entropy per letter, this means that you need three letters of common text for each letter of random output. We are going to make two mixed alphabets, each of which is characterized by 25 letters (the last letter is forced by the others), so we need a minimum of 150 text letters. Since we also need to provide a seed containing at least three letters so the keystream does not repeat for a message of regular length, the minimum number of letters is 159. I have written a JavaScript program that automates this, where the minimum is set at 180 letters; if the text is too short, it repeats until 180 letters are gathered; if too long, it repeats until the length is a multiple of three.
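The letter-count arithmetic in the paragraph above can be sketched in a few lines (a sketch of the reasoning, not the program's actual code):

```javascript
// ~1.58 bits of entropy per letter of English text vs. 4.7 bits per
// fully random letter means about three text letters per random letter.
const lettersPerRandom = Math.ceil(4.7 / 1.58); // = 3
const forAlphabets = 2 * 25 * lettersPerRandom; // two 25-letter alphabets -> 150
const forSeed = 3 * lettersPerRandom;           // three-letter seed -> 9
const minimumLetters = forAlphabets + forSeed;  // 159
```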
This is the complete process (as done in the JavaScript program):
I could give you a worked-out example, but it is easier to run the JavaScript program using whatever key phrase and plaintext you want, and look at the intermediate steps and the output. Notice that, since the process differs slightly for encryption and decryption, there is a radio button on the program page to tell it what we want to do. Try doing it by hand as well; the most time-consuming part is likely to be steps 1 to 3, since you will be generating the same high-entropy mixed alphabets even if the actual plaintext is short.
I have covered the security of this algorithm in this other article, so I won’t repeat it here. I will add a couple of points that I haven’t covered before:
If you follow the steps correctly, you will have a cipher of security comparable to that of the best of today’s computer-based ciphers, but involving a minimum of calculations that can actually be done by hand. This cipher came up as the clear winner in a recent article on high-powered low-tech encryption methods.
Hint: it has everything to do with the quotes in the first two sentences.
Quick recap first. On January 13th, 2017, articles like this one revealed that WhatsApp, the popular messaging app, seems to have the capability to add encryption keys for offline users to any given conversation. This is the sanitized, technical language. In practice, it means that there is a way for messages to be decrypted by a specific third party, simply by adding their key to the encrypted message, without the main parties knowing about it. This is commonly known as a backdoor, which the FBI’s Director Comey pushed for on every encryption product available in the US just last year, and which is enshrined in a new law in the UK, to say nothing of places with a less stellar record on citizen privacy. But WhatsApp’s end-to-end encryption is based on the critically acclaimed Signal protocol, and it was presented with much fanfare and plenty of oohs and aahs a couple of years ago. The creators of Signal, of course, vehemently deny that this is a backdoor in WhatsApp. What happened here?
Well. . . perhaps the “unthinkable,” which I’m placing in quotes because it is actually not so hard to think of: code shown in a display case is one thing, and code actually running on a computer is quite another. The code shown to the public has no backdoors, but the running code might have them. “Open source” doesn’t necessarily mean “actual code.” Surprised?
I’ve said it before, but it bears repeating here given the seriousness of this breach of trust. I’ll put it in capitals to help it sink in: SERVERS ARE NOT TO BE TRUSTED. You can’t see what’s running in a server; it could be running precisely what you don’t want and you’ll never know about it. The moment you involve a “trusted” server in an app or program, and you give it something that needs to be secured, you have in fact forfeited all security.
This is a flaw shared by all server-based solutions. Even the Signal app itself, which so far has been cleared of the charges leveled against its cousin WhatsApp, can be subverted just as easily. Just enter their server by legal or illegal means (software is agnostic about human ethics), and you can make it do exactly what WhatsApp has been doing, or worse. It could be its X3DH key agreement protocol, which relies on a server to store ephemeral public keys (authenticated by a separate channel) and delete them after use so others can’t impersonate their owners. It could be a software update that changes the processing on your very machine (can you read that, either?) so it adds a backdoor before anything else. It could be a dozen other things.
I know this because I was tempted to do it to PassLok only a couple of years back. This was when terrorists in San Bernardino had been using a locked iPhone and the FBI was all in a fuss about reading what they might have said. PassLok uses pretty much the same cryptographic primitives as WhatsApp, and it is therefore possible to add a trusted recipient (law enforcement?) to all encrypted messages, even without involving any servers. The trusted recipient could then use its private key to decrypt the messages, just as if they had been encrypted for them. I thought about doing this while in a bout of concerned citizenship, but in the end I concluded that it would be a very bad idea, and so PassLok remains backdoor-free to this day and will remain so as long as I’m in charge of it and the code is public. And, since there is no server and the running code is visible to anyone who may want to look under the hood, you’ve got a pretty good assurance even beyond that.
PassLok is also more secure than Signal because it does not store secret material beyond the message where it needs to be used, which also lessens the “need” for a server (there’s never any real need for a server; everybody else uses a server so they can make money off you). In PassLok’s Read-once mode, a new Diffie-Hellman ephemeral key pair is generated for each outgoing message, while Signal does that only after a message is received. In PassLok, the Diffie-Hellman operation itself is of the old-fashioned Alice-and-Bob-and-nobody-else sort, rather than Signal’s X3DH protocol with a server holding extra keys.
Practical conclusion: use PassLok. You can find links to all its varieties at passlok.com, including Chrome and Firefox extensions that integrate it directly into your web-based email so it is available any time you want to use it but otherwise stays out of the way.
One of the interesting points raised by the Scrabble cipher (a simplification of the classical Chaocipher where only the ciphertext alphabet is scrambled) is that its alphabet-mixing algorithm produces statistically random ciphertext even if no key is involved. It uses the plaintext’s own entropy to randomize it, and does so so efficiently that it appears as if the entropy is increased, from an average value (in English) of 1.58 bits per letter to its maximum of 4.7 bits per letter (for a 26-letter alphabet), effectively a factor of three. The result appears to be quite random as far as statistics go, and yet the operation that produced it can be easily reversed, leading back to the lower-entropy plaintext. So where did all that entropy come from and where did it go? Isn’t this a violation of some rule of informational thermodynamics? And what about other things that we think of as random? Are they really so?
Consider a coin toss. If the coin is fair, everyone will consider the result random, meaning that they have no way to tell whether any toss will result in heads or tails, since either outcome will happen just about an equal number of times (but not necessarily exactly equal, because of the next condition) and any toss will not be affected at all by previous tosses (this is the only way we could ensure that heads and tails come out exactly the same number of times, so it won’t happen if the process is truly random). But if we just drop the coin flat from half an inch off the ground, the outcome is all but assured. So the “randomness” depends on how we toss the coin or, rather, on our ability to predict what the final position will be given the initial conditions of coin orientation, height above the ground, and initial velocity and spin of the coin. Usually we’ll begin to predict a wrong outcome about half the time way before Heisenberg’s uncertainty principle kicks in, and then we say the toss was fair. It is our inability to make an accurate prediction that makes the process “random,” but the coin is still subject to the same physical laws, and it is possible that a computer having more accurate numbers for those initial conditions might still be able to predict the outcome correctly more than half the time. They say that experienced croupiers have a pretty good idea of which number on the roulette wheel will be the winner as soon as they put the ball in motion: instinctive computing that most of us can only hope for, plus extra data coming from sensors embedded in their muscles.
It is the same with modern encryption algorithms, which are designed to produce statistically random output—with very stringent conditions for randomness—even if the password is simply “password.” The immense majority of them do not extract any entropy from the plaintext, but rather “expand” the little entropy contained in the key (which typically maxes out at 128 or 256 bits) through a pseudo-random number generator (PRNG) to provide what appears to be 1 bit of entropy per binary bit of plaintext. This is not an infinite source of information entropy, though, since the pseudo-random sequences they produce begin to repeat when the internal state of the algorithm returns to its original value, but it can be very large: a 128-bit internal state has 3.4E38 different values, and it is theoretically possible that the algorithm may run through all of them before the initial state is reached again, at which point everything would repeat. If we are using 8-bit encoding for characters, that’s 1.33E36 characters, or roughly 1E33 pages of printed text, which as a book would be as thick as our galaxy is wide.
Pseudo-random sources are not really “random” if we mean by that word that the outcome is unpredictable. Quite the contrary: the outcome is perfectly predictable (at least, to some people), and this is what makes it useful, for instance, to encrypt a message. Just add the “random” output to the message, and the result will also look random. Those who know how to produce the “random” stuff, however, will be able to retrieve the original message simply by subtracting back the random stuff, which they can reproduce exactly. Another example: the digits of an irrational number such as the square root of two never repeat (otherwise the number would not be irrational), and each particular sequence of digits appears roughly with the same frequency given its length, which is what is usually termed a normal number. As statistically random as you can have it, and yet completely predictable by carrying out the square root algorithm that some of us learned in school (I guess that dates me badly, since it hasn’t been taught for a while).
So the square root algorithm is, in fact, a pseudo-random number generator. There are many of them, some simpler than the square root. Some popular ones for binary streams (not for cryptography, though) are the linear congruential generator, the linear-feedback shift register, the Mersenne Twister, and the lagged Fibonacci generator. For crypto use, the PRNG needs to be non-reversible as well, meaning that knowing the complete sequence of bits produced up to a certain point should give no hint as to what the next bit will be. Generators for non-binary strings, such as written language, can be produced from binary ones by a change of base, or directly by doing the necessary operations in a base other than 2. Some take binary code in chunks; RC4, for instance, takes 8-bit chunks (0 to 255), and is still used a lot today. Some are so simple that they can be done with paper and pencil. I’d like to talk about those next.
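To get a feel for how simple these generators are, here is a toy additive lagged Fibonacci generator in JavaScript; the lags (5, 17) and the byte modulus are illustrative choices of mine, not a recommended parameter set:

```javascript
// Toy additive lagged Fibonacci generator on bytes: each output is the
// sum (mod 256) of the values j and k places back in the stream.
function lfg(seed, j = 5, k = 17) {
  const s = seed.slice(); // needs at least k starting values
  return () => {
    const next = (s[s.length - j] + s[s.length - k]) % 256;
    s.push(next);
    return next;
  };
}
```

The letter-based FibonaRNG generator described below works on the same principle, only with mod-26 letters and mixed-alphabet substitutions in place of plain addition.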
In a recent article, I introduced the Scrabble cipher, which is a simplification of the almost-classic Chaocipher. This cipher uses two sets of letter tiles to encrypt any text, drawing from the entropy of the text itself to randomize the tile positions. It turns out you can construct a pretty good PRNG simply by feeding each output letter as the next input letter. Start with two mixed alphabets as key, plus any single letter, for a total of 26!^2*26 = 4.23E54 different streams. Then you can add the stream to any message with the help of a Tabula Recta, which can have two built-in substitutions (input and output), for another 26!^2 combinations. The result will be quite securely encrypted.
If you don’t want to mess around with tiles, you can obtain statistically random keystreams with a lagged Fibonacci generator operating on letters. In this case, the whole thing can be done on a Tabula Recta. You start with two mixed alphabets to do substitutions at the input and output sides of a Tabula Recta, plus a seed. Then look up the first seed letter on the input alphabet and follow that row or column until you find the second seed letter, then orthogonally to the output, and write the letter you see there as the first keystream letter, directly to the right of the seed. Then you take the second and third seed letters and do the same, and do the same with all the letter pairs that follow, which will never run out because you are extending the seed with the keystream. If you want to use this to encrypt a message, add another pair of mixed alphabets (the Tabula Recta has four sides, so all four alphabets fit at the same time) and use them to subtract each plaintext letter (input letter) from the keystream (2nd letter you look for), in order to generate each ciphertext letter. Decryption is the same, except that the last two alphabets must reverse positions (if they are different, that is). I have set this up as a JavaScript program and given it the name FibonaRNG (read as “fibonaring”) because it is based on a lagged Fibonacci PRNG. Even with the straight alphabet for all keys, it manages to produce a statistically random keystream. I have also made a base64 version (last two alphabets are the same so it works identically for encryption and decryption), which I call FiboFile.
Here’s a quick description of how to use FibonaRNG, after we have made a couple of scrambled alphabets, which we have placed at the top and sides of a Tabula Recta (instructions are for encryption; to decrypt, reverse the roles of plaintext and ciphertext, and also reverse the input and output alphabets in step 3). We are assuming that the seed is used straight, without the trick that I discuss further down the article, so set it this way in the program if you want to check what follows:
Let’s say your keys are “marvelous” and “wonderful” and your seed “awesome”. The mixed alphabets will be these:
MARVELOUSZYXWTQPNKJIHGFDCB
WONDERFULZYXVTSQPMKJIHGCBA
If your plaintext is “To be or not to be, that is the question,” the process produces this working table:
       TOBEORNOTTOBETHATISTHEQUESTION
AWESOMEDEQMXTBAWDOSQRDVXKRRCJXBBWDWEN
       NVJSFQBBZNTHWESFYNVEULSHDSNRVX
where the last row is the ciphertext. To decrypt, put the ciphertext at the top row and generate the same keystream in step 2, then combine the two as in step 3 but reversing the positions of the mixed alphabets.
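The working table above can be reproduced in a short JavaScript sketch. I am reading each Tabula Recta “snake” lookup algebraically as result = SIDE[(pos(b) − posTop(a)) mod 26], where TOP and SIDE are the two mixed alphabets; this reading reproduces the example exactly, but the helper names are mine, not those of the FibonaRNG program:

```javascript
const A2Z  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const SIDE = "MARVELOUSZYXWTQPNKJIHGFDCB"; // mixed alphabet from "marvelous"
const TOP  = "WONDERFULZYXVTSQPMKJIHGCBA"; // mixed alphabet from "wonderful"
const mod26 = n => ((n % 26) + 26) % 26;

// One Tabula Recta "snake": find a on the TOP alphabet, travel to the
// straight letter b in the body, read the result off the SIDE alphabet.
const snake = (a, b) => SIDE[mod26(A2Z.indexOf(b) - TOP.indexOf(a))];

// Lagged Fibonacci keystream: each new letter comes from a pair of
// consecutive letters, and the stream extends the seed as it grows.
function keystream(seed, n) {
  const s = seed.split("");
  for (let i = 0; s.length < seed.length + n; i++) s.push(snake(s[i], s[i + 1]));
  return s.slice(seed.length).join("");
}

function encrypt(plain, seed) {
  const k = keystream(seed, plain.length);
  return plain.split("").map((p, i) => snake(p, k[i])).join("");
}

function decrypt(cipher, seed) {
  const k = keystream(seed, cipher.length);
  // Same lookup with the two alphabets' roles reversed.
  return cipher.split("")
    .map((c, i) => TOP[mod26(A2Z.indexOf(k[i]) - SIDE.indexOf(c))])
    .join("");
}
```

With seed "AWESOME", `keystream` yields the middle row of the table and `encrypt` the bottom row, and `decrypt` recovers the plaintext.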
FibonaRNG is at this point the most powerful paper-and-pencil encryption method that I’ve come up with so far, once you give scores for the different features, as this article does. It does not require any props, so you can use it on a desert island if you like, and it uses simple operations involving only two letters at a time. It has a large key space of size 26!^2, equivalent to 177 binary bits, even without throwing in the seed and the optional transposition. This version uses the same two keys for generating the keystream and for combining it with the plaintext, whereas an earlier version used two sets of two keys each, which is what seems most appropriate given that the two steps can in fact use different keys, which can also be placed on the sides of the Tabula Recta, thus apparently giving an even larger key space. But there are flaws that make this an illusion. Against a ciphertext-only attack, I have found that using the correct keys and seed for keystream generation but incorrect ones for the second operation combining the keystream with the ciphertext yields a “plaintext” that is less than random, effectively separating the process of finding the keystream keys from that of finding the combination keys. Therefore, a brute-force attack only needs to go through 26!^2 permutations of the first two keys, times 26 raised to the seed length, in order to find the correct set, and then again a comparable amount of time to find the other two keys, so that using a different set of keys for keystream generation and combination with plaintext merely doubles the work involved rather than multiplying it. That is, unless a transposition is added, which makes it very hard to see that you are on the right track for the last two substitution keys unless the transposition key is almost perfect.
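The 177-bit figure quoted above is just the base-2 logarithm of 26!^2, which is quick to confirm:

```javascript
// Key space of two independent mixed alphabets: 26! x 26!.
let log2Fact26 = 0;
for (let k = 2; k <= 26; k++) log2Fact26 += Math.log2(k); // log2(26!) ≈ 88.4
const keyBits = 2 * log2Fact26;                           // ≈ 176.8, quoted as 177
```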
When faced with a known-plaintext attack, an enemy who obtains the keystream will quickly find the keystream keys and seed, since the LFG algorithm is linear. He/she/it would only need a minimum of 26×2 = 52 keystream letters in order to find keys 1 and 2, by setting up equations similar to those described in this article. Therefore, the PRNG is not cryptographically secure, according to the normal definition of the term. The difficulty for the attacker, however, lies in finding the keystream to begin with. He/she/it would have to test all combinations of the keys used in combining keystream and plaintext (plus the transposition key, if any) to find trial keystreams, which would then lead to the keys involved in making the keystream, and thus to the rest of the keystream. Detecting whether or not a given set of keys is correct is as simple as checking whether the keystream they generate (outside of the positions used to find the keys) matches the actual keystream. There are no shortcuts as there are for finding the plaintext, since all trial keystreams are going to be just as statistically random, so an attacker would have to go through a significant portion of the 26!^2 possibilities, giving the scheme an equivalent resistance of 177 binary bits, comparable to that against a ciphertext-only attack with known seed. Therefore, there is little harm in using the same two keys for both steps in the process. We can’t get much beyond a 26!^2 effective key space size, so it’s simpler to use the same two keys for both phases. This is why the final JavaScript program only has two boxes for substitution keys, plus the optional transposition, which multiplies the key space by another large factorial (not 26!, since it is a weakness to always use the same transposition key length), plus the seed.
Now, if the attacker has two different sets of plaintext and ciphertext for the same set of keys, then he can pull off a niftier trick. In other cryptosystems, subtracting the two ciphertexts would eliminate the keystream, leaving a combination of two plaintexts that can be separated by guess and match methods (even if the plaintexts are not known at all). In our case, this approach would not produce a usable “mixed plaintext,” but there are things that can be done if the plaintext is known. Suppose that for a certain position one of the plaintexts has “A” and the other “B”, which correspond respectively to ciphertexts “Y” and “Z,” and let’s further suppose the keystream at that position is “K,” the same for both messages. If the substitution defined by the first mixed alphabet is f(x) and that defined by the second is g(x), then we’ll have (all operations mod 26):
Y = g^{-1}(K – f(A))
Z = g^{-1}(K – f(B))
which can be simplified into this:
g(Y) = K – f(A)
g(Z) = K – f(B)
Subtracting the two equations, we get:
g(Y) – g(Z) = K – f(A) – (K – f(B)) = f(B) – f(A)
The resulting equation links the values of the substitution functions at four different points. There is a total of 25 of those for f(x) and another 25 for g(x) (the last value is forced by the rest), for a total of 50 unknowns in order to determine the substitution alphabets completely. If we repeat this process at 50 carefully chosen points, we will obtain enough equations to solve it. Now, the system of equations is “homogeneous,” meaning that there is no term that does not contain one of the unknowns. Solving this kind of system requires knowing one of the values. No problem, just make a guess for, say, f(A) and obtain all the other values from that. Guess 26 times if you have to. When you replace the values you obtained into the equation obtained for a position different from those in the set, only one of the sets will satisfy the equation no matter what position you choose.
Therefore, a known-plaintext attack with two sets of plaintext and ciphertext for the same set of keys can yield those keys. But there is a way to fix this problem, and this is by producing a different keystream for each message, which could not possibly be subtracted out as shown above. In this case, the sender comes up with a random string of the same length as the seed, and produces the keystream by running the LFG starting from this string rather than from the seed itself. The random string itself is encrypted by combining it with the seed using the Tabula Recta supplemented with the substitutions, and sent along with the ciphertext. The recipient begins by decrypting the random string by undoing the combination on the Tabula Recta, then he/she generates the original keystream from it, which is then subtracted from the ciphertext. This trick only adds a few operations on the Tabula Recta, but effectively eliminates any vulnerability to a known-plaintext attack (so long as the sender does use a fairly random string every time). Now, some seed choices are known to produce keystreams of poor statistical quality, but since the method is designed to be performed by hand, the sender hopefully will detect this right away and choose a different random string. The JavaScript program uses this process by default.
Lack of security could also arise from using low-entropy passwords to generate the scrambled alphabets that serve as keys, but there is an easy way to concentrate the entropy of a longer passphrase so the scrambled alphabets have an entropy approaching that of truly random text. Each mixed alphabet consists effectively of 25 letters (the value of the last letter is forced by the others in order to complete the set), so let us take a piece of normal text containing 75 letters in order to generate those, this way: write the text in three rows of 25 letters, then take each of the resulting columns and do a “snake” operation on a Tabula Recta with straight alphabets at all the headings; look up the first letter at the top, then go down until you find the second letter, then right or left until you find the third letter, then back up to read the result at the heading alphabet. Here’s a Javascript program that will do this. When doing this for several keys, just take a longer piece of text. Let’s say you want to generate two mixed alphabets plus a seed. Then you’ll start from a text containing more than 150 letters. Divide the length by three and add one to the integer result; this will be the length of each row. Write your entire text into these rows (the last one may be shorter) and do snake operations in each resulting column. You will end up with at least two groups of 25 random-looking letters, which can be used to make scrambled alphabets, plus a few extra letters that can be used as seed.
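The “snake” on a straight Tabula Recta can be read algebraically: going down from the first letter to the second, across to the third, and back up to the heading works out to (a − b + c) mod 26. That is my reading of the lookup described above (verify against your own table); with it, the condensing step sketches out like this, assuming the cleaned text length is a multiple of three:

```javascript
const pos = ch => ch.charCodeAt(0) - 65;
const letter = n => String.fromCharCode(65 + ((n % 26) + 26) % 26);

// One "snake" on a straight Tabula Recta: down from a to b, across to c,
// back up to the heading, i.e. (a - b + c) mod 26.
const snake3 = (a, b, c) => letter(pos(a) - pos(b) + pos(c));

// Condense three rows of text into one row of random-looking letters,
// one snake per column.
function condense(text) {
  const t = text.toUpperCase().replace(/[^A-Z]/g, "");
  const rowLen = t.length / 3; // assumes length is a multiple of 3
  let out = "";
  for (let i = 0; i < rowLen; i++) out += snake3(t[i], t[rowLen + i], t[2 * rowLen + i]);
  return out;
}
```

Each output letter draws on three text letters, which is how the 1.58 bits per letter of English text get concentrated to nearly the 4.7 bits per letter of random text.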
So let’s start with the desirable features of this kind of cipher. A few come to mind right away:
And then, there are some other desirable traits that are not so obvious. For instance:
Of course, not all the properties listed have equal importance. Resistance against a ciphertext-only attack is paramount, but it can be further decomposed into the properties on the second list. Speed and simplicity are very important too, for otherwise people won’t use the cipher. Resistance to error is important because no machines are supposed to be involved, which means errors will happen. This is related to simplicity: the more steps, the greater the opportunity for error. I have attempted to weigh these properties in a reasonable manner when making the table below. You may disagree with these weights, but then you can recalculate the results using weights more to your liking.
A couple of notes before introducing the contestants. The first is that only symmetric ciphers are considered: I don’t know of any asymmetric cipher that can be done by hand and still give decent security. Second: I am not considering one-time pads and their variants. These are a category apart, for two reasons:
Ladies and gentlemen, here are the contestants for this comparison: Bruce Schneier’s Solitaire, a 26-letter version of the RC4 algorithm, the Aguilar cipher (3 or 6 transpositions), the classic VIC cipher used during the Cold War, Handy Cipher by Bruce Kallick, Chaocipher by John F. Byrne, and my own SuperSkink, Scrabble, and FibonaRNG ciphers. Let’s take a look at them one by one (links are on the titles):
This cipher was first published as an appendix to the novel “Cryptonomicon” by Neal Stephenson. In the novel it takes the name Pontifex, and is used by some characters to communicate while in jail. It is a stream cipher that uses a French card deck, containing 52 numbered cards plus two Jokers, to generate a base-26 keystream that is added to the plaintext (or maybe subtracted from it, so the process is reversible). It has been looked at by experts, who have uncovered significant bias in the supposedly random output of the keystream generation algorithm, but this flaw is only of concern for long messages, so I will give Solitaire a full security score. It is, however, a very slow process involving counting of cards, so that one can produce only about one keystream character per minute. Additionally, the deck must be scrupulously kept in sync on both ends of the encryption, or further communication is impossible; there is no way to go back if a mistake is made.
Like all stream ciphers, Solitaire must use a new key for each new message, or both the old and the new messages would be easy prey for an attacker. The key consists of the state of a shuffled deck, which is hard to describe simply and all but impossible to memorize. It does have, however, a significant key space of size 26!*26*26 = 2.7E29, corresponding to about 98 binary bits (only two suits are used, rather than four).
RC4 (Shevek’s or Mindflare’s version)
In some very recent posts, people have taken the RC4 stream cipher, originally designed to work with a 256-character “alphabet,” and modified it so it uses the regular 26-letter English alphabet. The result is a stream cipher with an excellent track record, assuming the qualities of RC4 survive the conversion. Once you get to the algorithm itself, it looks a lot like a simplification of Solitaire, which makes me suspect that it might share its undesirable bias. In essence, a mixed alphabet containing all 26 characters is used as key, and this is shifted through a few simple operations, after which a single pseudo-random character is produced as output. Like Solitaire, RC4 is a stream cipher and must use a new key for each new message. The key space is slightly smaller than Solitaire’s, since the pointers (the Jokers, in Solitaire) always start at the beginning, leading to a total of 26! possibilities, which is about 88.4 binary bits.
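For reference, here is what a direct transcription of RC4 to a 26-letter alphabet looks like. The published manual versions mentioned above may differ in details such as key setup, so treat this as a sketch of the idea rather than their exact algorithm:

```javascript
// RC4 transcribed to a 26-letter alphabet (a sketch; the manual versions
// may differ in details such as key setup).
const N = 26;
const idx = c => c.charCodeAt(0) - 65;
const chr = n => String.fromCharCode(65 + n);

// Key setup: start from the straight alphabet and scramble it with the
// key, exactly as in RC4's key-scheduling algorithm but mod 26.
function ksa(key) {
  const S = [...Array(N).keys()];
  let j = 0;
  for (let i = 0; i < N; i++) {
    j = (j + S[i] + idx(key[i % key.length])) % N;
    [S[i], S[j]] = [S[j], S[i]];
  }
  return S;
}

// Keystream generation: again straight RC4, with both pointers starting
// at the beginning, as noted above.
function keystream(key, length) {
  const S = ksa(key);
  let i = 0, j = 0, out = '';
  for (let k = 0; k < length; k++) {
    i = (i + 1) % N;
    j = (j + S[i]) % N;
    [S[i], S[j]] = [S[j], S[i]];
    out += chr(S[(S[i] + S[j]) % N]);
  }
  return out;
}
```

The keystream is then added to the plaintext mod 26, as with any other stream cipher.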
This cipher has three-stage and six-stage versions that are conceptually quite similar. The plaintext is written down by rows in a special form containing blanked-out spaces, and is read out by consecutive columns, sometimes after the letters have been subtracted modulo 43 from the key. The cipher uses one 43-character “mixed alphabet” as permanent key, and a similar one as initialization vector, which is transmitted with the ciphertext. The permanent key is meant to be random-looking and written on a strip of paper so it can be easily swallowed in case of need. The size of the key space is, therefore, 43! = 6E52, or about 175 binary bits, which is quite respectable.
The Aguilar cipher consists essentially of straight unkeyed transpositions that are quite easy to reverse, though sometimes the key or the initialization vector are subtracted mod 43, as many times as necessary to complete multiples of 43 characters. This is quite similar to the standard Vigenère cipher, which today is well known to be insecure. Because of this, I am reducing its security score to about half of the maximum. Speed is better than the previous two, but not stellar because of the relatively large number of steps required. These steps are different from each other, which is less than ideal from the simplicity viewpoint. In addition, the cipher needs special forms, which could not have other uses. A good feature, however, is that the key can be reused, especially since the random initialization vector is changed for each message. Mutability is assured by using forms with different patterns of blocked-out cells.
This cipher was in use by Soviet spies during the Cold War. Even though it is a classical cipher, it defied all of the US intelligence agencies’ attacks until it was revealed by a defector. It is, however, quite complicated, involving several steps of key-stretching by way of a not particularly secure lagged Fibonacci generator, plus a couple of transpositions, one of them of the disrupted type. Its security lay mostly in the fact that the method was not known to the intelligence agencies until it was revealed. The keys used in the cipher were short, so its key space is also small.
From the description of the cipher, we know that it is a stream cipher, with all the advantages and disadvantages of the type, and that its key space is of size 31!, or about 113 bits. It does include a homophonic nomenclator with 3045 entries, but this is likely to remain fixed, so it does not constitute key material (though it adds some mutability if those entries are changed). The encryption process is quite complicated, partly due to the fact that two different messages can be encrypted at the same time, under two separate keys. Being a stream cipher, it cannot reuse keys, and there is no allowance for initialization vectors that might make this somewhat better. Apparently no one has studied its security in a serious way, but the definition paper includes some theoretical attacks. It states that the security of the cipher against hill-climbing attacks depends on the randomness of nulls that are supposed to be inserted at various times in the process. Therefore, I am giving it a high but less than perfect score in the security area.
This was invented by John Byrne in 1918 and kept secret until 2010, fifty years beyond Byrne’s death. The description that has come down to us involves two wheels with the alphabet around them, the letters placed in such a way that they can be switched by pairs. One wheel contains the alphabet used for the plaintext, the other for the ciphertext. Users look up each letter on one wheel, then read off the ciphertext letter on the other wheel, and then both wheels are rotated and two letters switched according to a fixed pattern. Byrne’s working prototype, if it did exist, has not survived to be analyzed, but the underlying cipher gleaned from the papers is quite strong and relatively simple. It is, however, vulnerable to a known plaintext attack, unless some simple precautions are taken. Keyspace size is claimed to be 26!^2 = 1.6E53, or about 177 binary bits, but analysis reveals that the key encoded in one of the wheels adds hardly any security, so it is more like 26!, or 88.4 bits.
This is one of my ciphers based on “snake” operations on a Tabula Recta, plus a regular substitution after that in order to combat a known plaintext attack. The snake operations involve four letters: three from the plaintext and one from the ciphertext, with most of the effort being invested in looking up those letters as they are entered, which is quite fast in any case. It uses two substitution keys plus a transposition key, for a maximum key space of size 26!^3, equivalent to about 265 binary bits. It can be done on a piece of paper with a standard Tabula Recta printed on it.
This is also mine, deriving from Byrne’s Chaocipher but greatly simplified, so that you only need a set of two alphabets made of letter tiles. Unlike Super Skink, Scrabble can be made impervious to known plaintext attacks without the transposition step, simply by prepending a number of random characters to the true plaintext prior to encryption, so it’s a one-step cipher. In this configuration, the key space is a smallish 26!, or 88.4 bits, which doubles (in bits) if a transposition is added. This cipher involves very little mental effort, but it does require a set of alphabet tiles to be executed properly, which lets Super Skink take the lead in this survey.
My most recent paper-and-pencil cipher, it is based on a lagged Fibonacci pseudo-random number generator rather than on the entropy of the plaintext itself, which makes it more suitable for low-entropy plaintexts, like lists of numbers or even files. It has a basic computational complexity similar to that of Skink (Super Skink minus the transposition) since it does not need a transposition step in order to avert a known plaintext attack, which ends up making it simpler. It is simply amazing that this works at all. The key space is a respectable 26!^2*26^n, where n is the length of the seed. Like Super Skink, it does not need any props beyond a Tabula Recta, which can be made up by hand if necessary.
And here is the summary of all scores:
| | maximum | Solitaire | Manual RC4 | Aguilar | VIC | HandyCipher | Chaocipher | SuperSkink | Scrabble | FibonaRNG |
|---|---|---|---|---|---|---|---|---|---|---|
| Ciphertext-only resistance | 50 | 50 | 50 | 30 | 40 | 40 | 50 | 50 | 50 | 50 |
| Known-plaintext resistance | 10 | 10 | 10 | 5 | 5 | 10 | 0 | 10 | 10 | 10 |
| Simplicity | 10 | 5 | 10 | 5 | 0 | 0 | 5 | 10 | 10 | 10 |
| Speed | 30 | 0 | 5 | 10 | 10 | 10 | 20 | 20 | 30 | 30 |
| Key Space | 20 | 5 | 5 | 15 | 5 | 10 | 10 | 20 | 10 | 20 |
| Resistance to error | 20 | 0 | 0 | 0 | 0 | 10 | 5 | 10 | 10 | 15 |
| Key reusability | 20 | 0 | 0 | 20 | 20 | 0 | 20 | 20 | 20 | 20 |
| Multiplicity | 10 | 0 | 0 | 10 | 0 | 10 | 5 | 10 | 10 | 10 |
| Memorizable key | 10 | 0 | 0 | 0 | 10 | 0 | 10 | 10 | 10 | 10 |
| Mutability | 10 | 0 | 0 | 10 | 5 | 5 | 5 | 10 | 10 | 10 |
| Prop-free | 10 | 0 | 10 | 5 | 10 | 10 | 0 | 10 | 0 | 10 |
| Total Score | 200 | 70 | 90 | 110 | 105 | 105 | 130 | 180 | 170 | 195 |
Unsurprisingly, my own ciphers come out on top. They do so well because they were designed with all of the above requirements in mind. Maybe you can come up with a different list of requirements where a different cipher will do better (human-computable only, of course). FibonaRNG comes out as the clear winner, but in a fairly quiet situation, where you have been able to procure a set of letter tiles (two complete alphabets), Scrabble may be just as good because it requires less mental effort, although its key space is not as large. You can always increase the key space by adding transpositions.
Before I introduce the Scrabble cipher, let me start by citing the main reason why I think Chaocipher performs so well: unlike ciphers based on a straight alphabet that gets shifted around based on entropy collected from a pseudo-random number generator or the plaintext itself (as in the Autokey cipher, and those in the Serpentacci family), the Chaocipher alphabets get internally jumbled, so all permutations are possible. In Chaocipher, the entropy added by each new plaintext letter keeps the alphabets from repeating.
But Chaocipher is more complicated than it needs to be, based on what it does. For instance:
To test all this, I started by building a Javascript program to simulate the operation of the Chaocipher, with a few additions:
Playing with this while encrypting large pieces of Dickens’s “Oliver Twist,” and doing some reading on Chaocipher, I discovered the following:
I made a number of Javascript prototypes so I could encrypt large chunks of text (usually taken from the Gutenberg project, in several Western languages) in a second or two, and made little changes here and there as I also played with a set of Bananagrams tiles (similar to Scrabble, but without number points) on top of a blank ruler. Here’s a picture of my setup, with key “marvelous” set on the ciphertext alphabet.
Initially I moved the tiles in large groups, much like Chaocipher, which tended to be tricky because the tiles did not want to stay together. Eventually I discovered that I could obtain pretty good randomness by swapping two contiguous vertical groups plus shifting one alphabet by just one tile, which minimized both the work and the chances of making a mistake. So here is the final Javascript app for the Scrabble cipher, and now the description, which uses the default values in the program:
To decrypt, do step 1 as above, so the generated alphabets are the same as for encryption; then, if a transposition was done for encryption, undo it using the appropriate alphabet. Then do step 3, except that you’ll be looking up ciphertext letters in the bottom alphabet and writing out the corresponding plaintext letters from the top alphabet. If the result starts with twenty gibberish characters, you can just ignore them.
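Here is a sketch of my reading of the core process: a straight plaintext alphabet above a keyed ciphertext alphabet; after each letter is written, the column of tiles just used swaps with the column to its left, and the top alphabet shifts forward one tile. The keyed-alphabet construction, swap offsets, and shift direction are all assumptions (the defaults table for Half Scrabble later on gives 0/25/fwd for English):

```javascript
// Two-alphabet "Scrabble" sketch. Swap offsets, shift direction, and the
// keyed-alphabet construction are my assumptions, not necessarily the
// author's exact defaults.
const A = 26;

// Keyed alphabet: key letters first (duplicates dropped), rest in order.
function keyedAlphabet(key) {
  const seen = new Set();
  const out = [];
  for (const c of key.toUpperCase() + 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    if (/[A-Z]/.test(c) && !seen.has(c)) { seen.add(c); out.push(c); }
  return out;
}

// After each letter: swap the column just used with the one to its left
// (in both alphabets), then shift the top alphabet forward one position.
function permute(top, bottom, pos) {
  const left = (pos + 25) % A;
  [top[pos], top[left]] = [top[left], top[pos]];
  [bottom[pos], bottom[left]] = [bottom[left], bottom[pos]];
  top.push(top.shift()); // shift the top alphabet by one tile
}

function crypt(key, text, decrypting) {
  const top = [...'ABCDEFGHIJKLMNOPQRSTUVWXYZ']; // plaintext alphabet
  const bottom = keyedAlphabet(key);             // ciphertext alphabet
  let out = '';
  for (const ch of text) {
    const pos = decrypting ? bottom.indexOf(ch) : top.indexOf(ch);
    out += decrypting ? top[pos] : bottom[pos];
    permute(top, bottom, pos);
  }
  return out;
}
```

Because the permutation depends only on the column position, which both directions compute identically, decryption is the same loop reading the two alphabets in the opposite order. With key “marvelous,” as in the photo above, a first plaintext A enciphers to M in this sketch, since A sits at position 0 and the keyed ciphertext alphabet starts with M.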
Without transposition, the key space has 26! possibilities, which is equivalent to 88.4 binary bits. Not huge, but adequate for many situations. As I said earlier, scrambling the plaintext alphabet with an additional key does not increase security at all, so it’s better to start with a straight alphabet for the plaintext. Adding a transposition doubles that, for a relatively small increase in the total amount of work. This works because the output of the letter tile process is already indistinguishable from random, and so is the “plaintext” obtained with a wrong ciphertext alphabet, even one off by a single letter. Thus, it is not possible to tell when the correct transposition key has been used in a trial decryption unless the ciphertext alphabet set with the tiles is correct as well. A second transposition under a different key adds another 88.4 bits, because successive transpositions do not combine into a simple transposition. Substitutions do combine, however, so adding more substitutions on top of the one built into the ciphertext alphabet would not increase the key space, even if separated by transpositions.
In fact, you can obtain exactly the same ciphertext if you set a straight alphabet on the bottom row of the setup, and then apply the substitution represented by the key at the end of everything. The randomization of the ciphertext, therefore, is entirely due to the plaintext itself, not to the initial position of the ciphertext alphabet. The process works because common text contains some randomness (usually measured in “bits of entropy”), which is constantly being added to randomize the alphabets. English contains about 1.56 bits of entropy per letter, which is approximately one-third of what a perfectly random series of letters would contain. The process involving the tiles etc. randomizes the plaintext by itself while remaining reversible.
We have encountered a similar situation before. The Visionnaire cipher combines each plaintext letter with a previous one by subtraction using a Tabula Recta. The result, however, is less than perfectly random. It is remarkable that the Scrabble cipher manages to do so well while drawing on the entropy supplied by only one letter at a time instead of two. I think this is because it actually disturbs the relative order of the letters within the ciphertext alphabet, rather than simply shifting it around. The swap step is what does the trick.
The original Chaocipher is vulnerable to a known plaintext attack, but this is easy to defeat simply by prepending a number of random characters to the plaintext prior to encryption, and the same applies to Scrabble. An attacker will only be able to obtain the ciphertext alphabet at the point where the proper plaintext begins, but this is not the key. To move one step backward he will have to guess the previous plaintext letter, which now is random, so there is no way to get it except by guessing. The possibilities multiply as he goes further back, and they become larger than the number of possible keys when 26^n = 26!, where n is the number of gibberish letters. Solving this equation gives n = 18.8, so nineteen gibberish letters are enough. The spec for the Scrabble cipher is twenty, for a little added security.
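The arithmetic is easy to check:

```javascript
// Check the key-space arithmetic: the number of bits in 26!, and the
// number n of random prefix letters that solves 26^n = 26!.
let fact = 1n;
for (let i = 2n; i <= 26n; i++) fact *= i; // 26! as a BigInt

const bits = Math.log2(Number(fact)); // log2(26!), about 88.4 bits
const n = bits / Math.log2(26);       // about 18.8, so 19 letters suffice

console.log(bits.toFixed(1), n.toFixed(1)); // 88.4 18.8
```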
Of course, adding a transposition after the main encryption has the same effect since then an attacker won’t be able to match the ciphertext letters to those in the known plaintext, so that in this case the gibberish letters at the beginning wouldn’t really add any security and are better skipped. But a transposition, although fast compared to the moving tile process, still would take more effort than simply adding a few extra letters to a long text.
The program allows you to swap two groups of letters different from those in the description above (besides allowing you to shift the top alphabet forward rather than backward, if you so choose), but this works well only if the distance between the groups to be swapped is an odd number other than 13. The reason is that 2 and 13 are the factors that make up 26, the length of the alphabets. Letters separated by a multiple of 2 or 13 positions will never swap with letters outside those sets, leading to imperfect mixing of the alphabets. If the alphabet were to contain 27 letters, as is common in some Western languages, then the bad intervals would be all multiples of three.
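The closure argument behind the bad intervals can be checked directly: positions linked by jumps of d (mod 26) fall into gcd(d, 26) disjoint classes, and the tiles only mix fully when that gcd is 1. The same computation with 27 in place of 26 flags exactly the multiples of three:

```javascript
// Swapping tiles separated by d positions (mod n) only connects positions
// within the same residue class mod gcd(d, n); full mixing needs gcd = 1.
const gcd = (a, b) => (b ? gcd(b, a % b) : a);
const classes = (d, n = 26) => gcd(d, n);

console.log(classes(13));    // 13: thirteen closed pairs that never mix
console.log(classes(2));     // 2: even and odd positions stay apart
console.log(classes(25));    // 1: the whole alphabet mixes
console.log(classes(3, 27)); // 3: a bad interval for a 27-letter alphabet
```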
Let me now address point 4 on the first list. Can we achieve decent security working with a single alphabet rather than two? It turns out we can, almost, and this is what I’m going to address next. In this case you write out the plaintext alphabet, which is fixed, right on the ruler, and use tiles just for the ciphertext alphabet. Here’s a picture of the setup for what I call the “Half Scrabble” cipher:
The process is the same as above, except that you don’t swap groups of two tiles since there is only one alphabet, and shift the ciphertext alphabet instead. The key space has the same size, and there’s also a Javascript model of it, with optimized parameters. It turns out that Half Scrabble does not work well for certain sets of parameters, while two-alphabet Scrabble always works well so long as the distance between swapped letters is not a multiple of 2 or 13. The default values in the program are those that are easiest to use with good performance for English text. If you are encrypting Spanish, for instance, you will want to swap the letter tile you wrote with the next tile, rather than the preceding one. In French, you can do exactly as in English with good results. This difference is due to the letter frequency distributions proper to each language, which affect the alphabet mixing process. Here’s a list of optimal settings for several Western languages (other values also work with little performance impact):
| Language | Letter 1 | Letter 2 | Alphabet shift |
|---|---|---|---|
| English | 0 | 25 | fwd |
| Spanish | 0 | 1 | fwd |
| French | 0 | 25 | fwd |
| German | 0 | 1 | bkwd |
| Italian | 0 | 25 | bkwd |
| Portuguese | 0 | 25 | bkwd |
| Dutch | 0 | 25 | bkwd |
| Latin | 0 | 25 | bkwd |
In all cases, one of the letters to be swapped is always the one just written, and it swaps with a letter next to it, whether on the right (1) or on the left (25). Sometimes this leads to the letter just written ending up at the same absolute position, so that a repeated plaintext letter will produce a repeated ciphertext letter. This artifact is undesirable because it leaks some information about the plaintext and could be used by an attacker, for instance, to mount a chosen plaintext attack. It would work this way: generate some chatter that includes some names containing double letters separated by known intervals. When the enemy uses those names in their transmission, it becomes easy to spot their location on the ciphertext because of the double letters, which allows the attacker to obtain some of the ciphertext alphabet in use at that point in the message. If this happens a few times, there may be enough to complete the alphabet, which will allow the decryption of the rest of the message from the point where the alphabet is complete.
Another quirk is that the output can be less than perfectly random even with optimized parameters, especially if the plaintext is long (over 20,000 letters). You can always check this if using the program, but obviously it cannot be done easily when encrypting by hand. Then again, we’re not likely to be encrypting anything that long by hand, so this might not be much of a problem in practice.
Let me finish with a historical note. John F. Byrne spent his entire life trying to get the US Government to use his Chaocipher, which led to some correspondence between him and William F. Friedman, father of the modern statistical methods used in cryptography. The correspondence spanned several decades, but the last piece of it, a letter from Friedman dated March 3, 1957, contains his most conclusive indictment of the system. He said, according to Moshe Rubin, that he “will make no attempt at solving Exhibit #4, not because he feels he can’t do it, but because it would serve no purpose. Informs Byrne ‘hand-ciphers’ are passé, no government would be interested. Also advises belief that Chaocipher is not indecipherable, and suggests Byrne’s algorithm has been ‘thought of before’ by Engineers.” Since the Scrabble cipher derives directly from Chaocipher, it would seem that Friedman’s unfavorable opinion should also extend to it. Nevertheless, the Javascript version of Scrabble (and also of Chaocipher) manages to beat Friedman’s own detection statistics, and some even more sensitive ones, with considerable ease for any normal text. You be the judge. Test it by hand or by computer, and form your own opinion. Who knows, you may need it someday.
I don’t think anyone will argue seriously that this doesn’t matter because he/she/(it?) has got nothing to hide. If that were true, nobody would wear clothes in summer. Everyone has things that would at least be embarrassing to see aired in public. To say nothing of banking information, taxes, or medical stuff. This has always been the case since the invention of speech (that’s way before writing), and has given rise to jargon, code, and cipher of different kinds. After millennia of competition between those, cipher seems to have emerged victorious as the strongest tool for security in communications.
Here “cipher” is used to mean a perhaps mathematically-heavy manipulation of individual symbols (letters in classical ciphers, binary digits in modern computer ciphers) that renders the message unintelligible, but which nonetheless can be reversed by those possessing the knowledge of a special “key,” again made of letters or binary digits. Classical ciphers have evolved over time, from the very simple Caesar cipher, where a constant alphabetical shift is applied to all letters, which seems laughable today but apparently worked quite well for Julius Caesar, to pre-computer spy ciphers like VIC, which the NSA was unable to crack until the secret was revealed by a defector.
Except for a privileged few, all of those classical ciphers were broken by “the enemy,” meaning a group of highly intelligent and motivated individuals (usually because a war was going on at the time), often on the payroll of a major government. If a cipher can be compared to armor around a secret, those guys and their tools were the bullets. Traditionally, bullets always won against armor. These “bullets” were magnificent indeed, and went as far as building the first electronic computers as tools to crack ciphers such as the German Enigma.
But then something happened. Computers began to be used to produce better armor as well, and here math played on their side. Every time the cipher was complicated so that, for instance, an additional letter could be used in the key, this added a small percentage of extra effort when enciphering, but it multiplied the effort of those trying to decipher the message without the key by a factor of 26. Of course, computers use binary digits rather than letters, but the image still illustrates what is going on. Today we are living through the shift from 128 to 256 bits for most binary keys, which perhaps doubles the amount of enciphering computation, but makes the work of a brute-force cracker 2^128 = 3.4E38 times harder. That’s 38 zeros, folks; the English language does not have a word for a number that large, or even a fraction of it, so forget about it.
Quantum computing? No problem. Double or quadruple the encrypting effort, taking advantage of the new technology. Then cracking effort multiplies by a factor that I don’t dare to guess but is sure to be absolutely huge. Armor wins again, perhaps by a wider margin than ever.
Okay, so then where’s the But?
The big But (no pun intended, or maybe it is; you decide) is that we feel less private than ever, perhaps because in fact we are. We are being asked to supply all kinds of personal information in order to access services as trivial as storing vacation pictures or voting on who the coolest band is. Older folks feel out of control because they are uncomfortable with the technology. Younger folks feel just as out of control because they sense that all that data collection is not being done for their benefit. And indeed they are right. The largest beneficiaries are the services themselves who, under the pretense of a “more satisfying user experience,” are running social experiments (Facebook, are you listening?) or simply selling that data to advertisers. Whoever is not receiving “targeted” spam these days, please raise your hand. If that weren’t enough, major hacks happen every day, and the public is quickly losing the sense that there is any privacy left.
Because everybody, including the young and even the “experts” wearing Grateful Dead t-shirts day and night, is being forced to trust the untrustworthy, every minute spent online. Great security on paper, but running somewhere users cannot possibly verify it, and therefore cannot trust it with any level of assurance.
It doesn’t have to be this way.
Some of us still remember the time when computers were really expensive and all you could buy was a dumb terminal that connected you to a large (for the time) computer shared by many. Then hardware got cheaper and people moved processing to their own machines, where they could keep some control over it. And then the Internet got faster and everything moved back to “the Cloud.” The only real cloud here is the one inside users’ heads. That “Cloud” that users trust so implicitly (perhaps because increasingly they aren’t given any options) is in reality a big computer owned by a big corporation that needs to make money from your data. Your touch-screen, ultra-high-resolution computer is merely passing everything along, its mass storage filled with code that you know nothing about and its processor largely occupied with popping up ads at convenient times (for the advertiser) and collecting usage data to be sent to its big brother. Sound reassuring?
So am I advocating to go back to pigeon mail? No, just to educate yourself and act consequently. For instance:
In other words, stop being lazy and please drop that willful ignorance about what is happening to your data. If you want control, you can get it. Just don’t expect Big Brother to tell you how to do it.
Burned by the disappointment of finding übchi broken, the Germans decided to replace one of the two rounds of transposition with a classic Vigenère step, in which a repeating, constant “ABC” key was added to the letters before the remaining single transposition. Since the addition key was always the same, the step did not add any security, so the ABC cipher was in fact little more than a single transposition, much easier to break than a double transposition like übchi. They must have had a personnel problem back there, because otherwise it is hard to fathom why they would have made such a stupid change. Painvin lost no time, and before the end of January 1915 he had produced a method that would recover the transposition key from a single message encrypted with ABC. Once again, the Germans got wind of this, and this time they replaced ABC with something stronger, which Painvin and associates eventually broke as well, along with all its replacements, all the way to the end of the conflict.
But what if instead they had fixed the essential problems of the ABC cipher? These are two:
In a recent article, I discussed some extensions of Blaise de Vigenère’s classic autokey cipher, which uses the plaintext itself as a running key after an initial “seed” key. Using “snake” operations on a standard Tabula Recta, one can involve several letters in the calculation of each ciphertext letter, rather than just two, with little extra effort. Three is a lot better than two, and four already gives a result that is hard to distinguish from random. Additionally, one can replace the alphabets on the sides of the Tabula Recta with mixed alphabets based on a key, and this increases the difficulty of cryptanalysis enormously with, again, little extra effort, as this paper attests.
In this article, I show some results from combining the Skink cipher (4-letter operations, non-reciprocal) with a transposition, much like ABC combined the Vigenère cipher with a transposition. The Skink cipher by itself produces output that is close to random, but it has two lingering problems:
Adding a transposition step helps a lot with the second problem, since an attacker won’t know a priori which ciphertext letters correspond to which plaintext letters but, as we will see, it also helps with the first problem. To test this, I encrypted the first four chapters of Dickens’s “Oliver Twist” (Gutenberg.org edition) with key “wonderful” for all key and seed choices, and recorded the dependence chi-square measure of the ciphertext letters, both contiguous and separated by seed distance (9 characters in this case). This is what I got, when a single direct or inverse transposition was added at the beginning (plaintext) or end (ciphertext) of the process. In all cases, values below 671 denote less than 10% chance of not being random.
| Method | Contiguous-letter dependence | Dependence at seed distance |
|---|---|---|
| No transposition | 582.58 | 715.78 |
| Direct at start | 623.09 | 625.97 |
| Reverse at start | 631.10 | 778.58 |
| Direct at end | 717.98 | 579.63 |
| Reverse at end | 581.21 | 582.51 |
We can see that the transposition has an effect in all cases. A direct transposition at the end moves the point of highest dependence from seed distance to contiguous letters if the transposition key length is equal to the seed length (not otherwise); this is because the transposition joins letters originally separated by the key length. Likewise, a reverse transposition at the start places originally contiguous, highly correlated plaintext letters into positions separated by the key length; if this is equal to the seed length, the high correlation of ciphertext letters separated by this distance becomes all the greater. A reverse transposition at the end seems to lower all the correlations significantly, even for equal lengths of seed and transposition key, but then originally contiguous ciphertext letters end up placed largely at locations separated by a predictable distance, which is bad against a plaintext attack. The best choice overall, therefore, seems to be the direct transposition on the plaintext, which reduces the telltale statistics while protecting against a plaintext attack and does not introduce new problems. We have a winner so far, which from now on will be known as “Super Skink 1,” and which was already mentioned in this article.
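The dependence measure used in the table can be reproduced with a digram chi-square test: count pairs of letters separated by a given distance and compare against the counts expected if the positions were independent. This reconstruction of the statistic is mine; note that a 26×26 table has 625 degrees of freedom, whose 10% point sits near the 671 threshold quoted above:

```javascript
// Digram chi-square: dependence between letters separated by `dist`.
// Values near the degrees of freedom (625) look random; much larger
// values betray structure in the text.
function digramChi2(text, dist) {
  const A = 26, idx = c => c.charCodeAt(0) - 65;
  const counts = Array.from({length: A}, () => new Array(A).fill(0));
  const rows = new Array(A).fill(0), cols = new Array(A).fill(0);
  let n = 0;
  for (let i = 0; i + dist < text.length; i++) {
    const a = idx(text[i]), b = idx(text[i + dist]);
    counts[a][b]++; rows[a]++; cols[b]++; n++;
  }
  let chi2 = 0;
  for (let a = 0; a < A; a++)
    for (let b = 0; b < A; b++) {
      const exp = rows[a] * cols[b] / n; // expected count if independent
      if (exp > 0) chi2 += (counts[a][b] - exp) ** 2 / exp;
    }
  return chi2;
}
```

Run it on a ciphertext with dist = 1 and with dist equal to the seed length to reproduce the two columns of the table.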
Now, a large-scale war is not the same situation as two people secretly communicating with one another. In a large-scale application one has to worry about two extra considerations:
Specifically, a lazy operator may decide to skip the initial transposition in Super Skink. When the message is deciphered, the operators on the other end will obtain the plaintext before the last reverse transposition, at which point they will stop. Everybody has saved time, but the message has become vulnerable to a known-plaintext attack, thus endangering all messages transmitted that day. Because of this, if the cipher is to be used on a large scale, it is best to do the transposition at the end, on the ciphertext, rather than at the beginning, on the plaintext. Decryption will then have to start with a reverse transposition; if the enciphering clerk skips the final transposition on his end, those deciphering will end up scrambling the ciphertext, and the decryption will fail. This choice does introduce a statistical artifact in the ciphertext if the transposition key has the same length as the seed, but in this situation the keys are issued by a central authority, which can make sure that those keys have different lengths. Because the transposition is now done at the end, it becomes advantageous to prepend a few nulls to the plaintext right before the first operation, which the JavaScript file linked below also implements via a button.
There is another powerful reason why it is better to do the transposition on the ciphertext rather than on the plaintext. Since transpositions do not change the actual letters in a text, only their relative order, if the transposition is performed on the plaintext the partially decrypted message prior to the final reverse transposition will contain a highly biased set of characters. This makes it possible to tell when the correct set of substitution keys has been used, even if the transposition key is completely wrong. Therefore, the key space represented by the transposition (26! possibilities, give or take a factor of 2 to account for keys shorter than 26 characters) does not multiply the substitution key space, but merely adds to it. This is an immense loss of key space that is best avoided.
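The key-space argument rests on a simple invariant: a transposition permutes letters but never changes their counts, so single-letter frequency statistics survive any transposition whatsoever. A quick demonstration:

```python
from collections import Counter

# A transposition, however complicated, only reorders the letters; sorting
# the text is an extreme example of such a reordering. The single-letter
# counts, and therefore any frequency-based score for the substitution
# keys, are identical before and after.
plain = "AMONGOTHERPUBLICBUILDINGSINACERTAINTOWN"
scrambled = "".join(sorted(plain))
assert Counter(plain) == Counter(scrambled)
```

This is why an attacker can score candidate substitution keys on the pre-transposition text directly, and so search the two key spaces one after the other instead of jointly.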
So this is the final Super Skink cipher, specifically designed for the German High Command of World War I:
All in all, this is not a lot more complex than ABC already was. Finding mixed alphabets for the substitutions (step 1) will add perhaps one minute per key. The operations in step 3 are as complex as those in ABC, and those in step 4 take perhaps 50% longer per letter. There is only one transposition step, as in ABC. Super Skink could have replaced ABC if the Germans had had at their disposal the statistical tools that allowed us to optimize the method. Had they received it in time, the Allies would likely have seen a complete blackout in their main source of intelligence, arguably leading to a very different outcome for the whole war effort.