Snake cipher

439_1Not too long ago, I introduced the TripleText cipher, which gets dangerously close to perfect encryption while remaining very simple. Well, it just got even simpler, almost like playing a game. Because using this cipher is so much like playing the old “snake” video game, I’ve called it just that: Snake

Under the hood, Snake is still basically TripleText: take a piece of text three times the length of the plaintext you want to encrypt, split it into three parts of equal length, add them modulo 26 either by hand or with the help of a Tabula Recta, then subtract the plaintext from it so that it can be recovered by going through the exact same steps with the resulting ciphertext. The problem is that adding letters with a 26×26 Tabula Recta is a bit of an exercise in precision; you need to find one of the letters on the right, then the other at the top, and then follow their respective row and column until their intersection, which takes two fingers and maybe the chameleon-like ability to look at two places at the same time. Not very ergonomic, and on top of that this needs to be done twice for each letter. The last step, a subtraction, is actually easier on the eyes: find the letter to be subtracted on the right, follow that row until the first letter is found, then go up that column to read the result at the top.

So, what if we do only subtractions rather than additions? The result is Snake, which has the added feature that the operations can be chained so that you only “crash” to the top or one of the sides when you have involved all the letters for that operation. The keystream from which the plaintext is subtracted is actually the first piece of key text, minus the second, plus the third, rather than the sum of all three, but its statistical properties are just as good. I have made a JavaScript version of it that displays them along with the result, in case you are curious. The program allows you to use a factor other than three (though it must be an odd number) for the key text length, but here I will describe the process using the default triple size.

Here’s how it goes, step by step, with an example:

  1. Get the plaintext you want to encrypt and take out spaces and punctuation. Convert any numbers to letters using a standard code such as 1 = A, 2 = B, etc. The result is the processed plaintext, which contains only the letters A to Z. Write it on square-ruled paper so you can write four more lines of identical length below it. Let’s say your plaintext is “Hello world,” which converts to “HELLOWORLD” (10 letters).
  2. Now go to your shelf (electronic is fine) and select a book that your friend on the other side also has. Pick a paragraph at random that you have never used before for this purpose, and start writing the letters below your processed plaintext, also skipping spaces and punctuation and converting numbers to letters. Stop when you have filled three extra lines of the same length as the processed plaintext. Let’s say you are using Herman Melville’s “Moby Dick” starting from the first paragraph of chapter 1. This is what you will get:
    HELLOWORLD
    CALLMEISHM
    AELSOMEYEA
    RSAGONEVER
  3. encrypting-passwords-with-old-school-tabula-recta.w654 (1)Now get a conventional 26×26 Tabula Recta, just like the one on the right, or this one that you can print for yourself, to do the operations that follow: At this point you have an option: either keep the straight alphabet at the top and sides, or replace them with a scrambled alphabet based on the key text or something else you have agreed on with your friend. If you want to use the key text for this, start writing any new letter as they appear in the first sentence (up to the first period) of the key text, followed by the rest of the letters in reverse alphabetical order so that Z isn’t likely to be the last letter, and so forth. For our key text, this would be: “CALMEISHZYXWVUTRQPONKJGFDB” We are not going to do this in the example, but I recommend it because it hardly adds any extra work and you get very nice advantages from doing it. First, the encryption is even harder to reverse without the key because the letter frequencies of the plaintext will be different from those of the keystream (what is added to the plaintext so it becomes encrypted). Second, an attacker won’t be able to obtain the keystream even if the plaintext is known or compromised. Third, the encrypted message becomes non-malleable, meaning that an attacker won’t be able to modify the ciphertext so it decrypts to something of his/her/its choosing.
  4. And now it’s time to play Snake. Take each column in the plaintext plus key text you wrote earlier and do this with the letters: find the first letter on the right of the Tabula Recta and follow that row until you find the second letter; then follow that column up or down until you find the third letter; then follow that row left or right until you find the fourth letter; finally follow that column up and write down the letter at the top. In our example, we find H on the left and go to the right until we see C near the right edge, then up that column a couple of spaces to A, then left to R, and finally up to M, which we write down. For the second letter: E right to A, down to E, left to S, up to K. The third one is trickier: L right to L, which is on the left edge, then up to L, which means not to move, then right to A, and up to P. This makes things simpler, actually: when a letter is repeated, we don’t move and proceed to the next letter. Instead of starting on the right and ending at the top, you can start at the top and end on the right. Your pick, the result will be the same. In the end, we have written “MKPOYJUYWA”, which is the ciphertext. The picture below shows the “snakes” we made on the Tabula Recta with the first three groups of letters:

snake2 (1)

You can now send out the ciphertext along with the start location of the key text within the book (but NOT the title of the book, obviously). Your friend on the other end will follow exactly the same process, except that he/she will write the ciphertext in place of the plaintext. The result will be the processed plaintext.

As I mentioned above, I have written a little web app that does this automatically, should you want to accept it. But you can do it by hand if you don’t have a computer handy. I have put a standard Tabula Recta in this page, to save you some effort, and the same plus instructions, here. Just print them and make as many copies as you want. Make sure you destroy copies after using them.

Now, time to put on the hacker’s hat and try to break this cipher, which means at least one of these:

  1. I have only the ciphertext and I manage to obtain the plaintext, possibly obtaining the key text as well.
  2. I also have all or part of the plaintext, and this allows me to obtain the key text, and possibly the rest of the plaintext.
  3. I have some information about the plaintext, like where key words or numbers are located, and I manage to alter the ciphertext so it decrypts to something of my choosing.

Ciphertext-only analysis is often attempted by brute force, that is, trying every possible key until one works. While this often works with conventional user-chosen password, the entropy of the key text is just too high to attempt this. There’s just too many different texts that could be used as key. The only situation where this might work is if I know all the books on the sender’s or the recipient’s shelves, and so I start from a fairly small key space.

Another method is to make guesses of plaintext words, and see the result when they are subtracted at different points. If the cipher had used just one piece of key text that is combined with the plaintext, rather than three pieces, this would have revealed words or parts of words, and then I’d know I’m on the right track and I’d be able to guess the rest. Unfortunately, the snake cipher keystream (ciphertext minus plaintext) has an imperfect but still very good randomness so that I’d need a very long piece of it, amounting to tens of thousands of characters, so it begins to show some non-random characteristics, such as statistically significant greater frequencies for some letters than for the rest. I am not likely to guess right a piece of plaintext this long. Therefore, I’d be stuck not knowing if any of my guesses are right.

But what if I happen to know the plaintext of a particular message, which then I subtract from the ciphertext in order to obtain the random-looking keystream? Would I then be able to obtain the actual key text or a part of it, and thus deduce the source so that other messages that extract their key text from the same book would be compromised? This is equivalent to reversing the sums and subtractions done on the Tabula Recta. For each output letter, there would be 26 possibilities for the letter in the fourth row, and for each of those another 26 in the third row. The first row letter then becomes known. That is 26×26 = 676 possibilities to try. Now, those letters are not contiguous in the source, but rather separated by a distance equal to the plaintext length. The letters of text written in any language lose all correlation when they separated by fifteen to twenty letters, which is typically less than the interval we are dealing with here. We can expect, therefore, that the three letters combined to make any keystream letter are completely independent of one another, so that even after taking into account single-letter frequencies, we end up with a large set of three-letter groups that could have conceivably produced a given keystream letter.

Another way to look at the last problem: English text has an average entropy of 1.58 bits per character  after spaces are removed. Guessing two whole rows in order to obtain the third one from the keystream is equivalent to guessing passwords of length 1.58*2*(plaintext length) bits. Remember that each row is independent, and therefore guessing one row does not mean we have guessed any of the others. For a 20-character plaintext, that is already 63.2 bits. We reach the still-impossible-for-today level of 128 bits at about 40 plaintext characters. Shorter messages have difficulties of their own, too, since statistics run on the keystream will fail due to the small sample size, leaving only brute force as a possibility.

And if a scrambled alphabet has been used as recommended, then I won’t even know the real keystream, but one of 26! = 4 x 10^26 possibilities. And this effect does not really cancel out with the combination of letters, as it would if the sender had scrambled using itself as new alphabet the first row of the key text, because the substitution was not applied to the key text, but to the plaintext and then again to the ciphertext. In other words: forget about it.

The only attack where I might get somewhere is changing the ciphertext so it decrypts to something else. This is called malleability and it is a problem with all stream ciphers like this one. If I know the location of an “A” in the plaintext, I only need to increment that position in the ciphertext by one, so it decrypts to  a “B”. That is, assuming no substitution is in use. If it is, then likely every “A” has been turned into something else and every “B” into another letter that is no longer consecutive with the replacement for “A”. Since I cannot afford to make any guesses when doing this attack (wrong guesses lead to unreadable plaintext when decrypted), then I’m once again foiled.

Not bad for a cipher that you can do on a piece of paper and that feels like playing a game. Check it out and tell us your experience in the comments. Perhaps this is the beginning of the end for the age of surveillance.

Leave a Reply

Your email address will not be published. Required fields are marked *