Sounds like a play on words, doesn’t it? And yet, this is exactly what I mean. Sixty years ago, renowned mathematician John von Neumann, published a little trick that allowed using a biased coin, where heads and tails do not come out at 50-50, to generate a true, unbiased, 50-50 random sequence. It turns out that this trick can be extended to larger sets, such as alphabet letters, in order to generate what appears to be a true random sequence of letters (chaos) from common text (order, unless you’re starting from a political speech or your latest cellphone bill).

The result, as you probably have already guessed, is yet two more paper-and-pencil ciphers,**DicePad**, and **LetterPad**, that come dangerously close to perfect unbreakability.

This is how von Neumann fixes a bad coin, provided one throw does not affect the result of the next (that is, they are statistically independent):

- Throw the coin an even number of times and record the result. If we call Heads = 0 and Tails = 1, the result looks like a long binary number.
- Now take the digits in groups of two and make this conversion: 01 = 0, 10 = 1, 00 and 11 = no result.
- The resulting string, which will be at most one-quarter the length of the original, will have 0’s and 1’s in a truly random pattern and with frequency approaching 50%.

It is simple to understand how this works. If the throws are independent of each other, the sequences 01 and 10 will have the same probability since they only differ in the order in which the 0 and the 1 appear, which should be truly random if the throws are independent. This is true even if Heads is much more (or less) likely to appear than Tails. Say the probability of Heads is 90%, Tails 10%, then the probability of Heads-Tails is 0.9 x 0.1 = 0.09, and that of Tails-Heads is 0.1 x 0.9 = 0.09 also. The other two possibilities don’t work so well: Heads-Heads has probability 0.9 x 0.9 = 0.81, and Tails-Tails has probability 0.1 x 0.1 = 0.01, but this doesn’t matter because we are not using them.

Clément Pit-Claudel has extended this result to dice and published it on his blog. Now instead of taking the resulting numbers in groups of two, he takes them in groups of three and looks at the relative ordering of the numbers, rejecting any group where a number is repeated. For instance, the result 1,2,3 becomes 1, while 3,5,1 becomes 3. Since the final result depends on the order in which the low-middle-high numbers appear rather than the values themselves, and the tosses are supposed to be independent, then each possible permutation of three different low-middle-high numbers will appear with the same frequency, and there are 6 of them. Take only those and you’ve got perfect die performance.

Pit-Claudel does not mention permutations in that article, but this is what is at the heart of the process. Each permutation can be converted to a number by first converting it to a Lehmer code, which for a set of three throws would consist of three digits (0 to 2), counting the number of inversions from the natural order involving that throw. For instance, if the result is 3,5,1 (middle-high-low), the first digit of the Lehmer code would be 1 because the 3 is ahead of a smaller number once (5 is larger, but 1 is smaller, which would have appeared first in a natural ordering). The second digit is also 1 because the 5 is ahead of 1, which is smaller. The third and last digit is zero, which is always the case since nothing follows the last number. Thus the Lehmer code for 3,5,1 is 110.

Once the Lehmer code is found, it can be converted to decimal by multiplying the last digit by o! = 1, the second last by 1! = 1, third last by 2! = 2, fourth last by 3! = 6, and so on, and adding all the results. This is because Lehmer codes are factoradic numbers at heart. Thus for the example we get 1 x 2 + 1 x 1 + 0 = 3. The smallest result that can be obtained is 0, the largest 2 x 2 + 1 = 5, so these results work out exactly like dice whose faces are marked with numbers 0 to 5. Observe that numbering the permutations through Lehmer codes does not produce the same numbers as with Pit-Claudel’s scheme, but the expected frequencies are still the same.

The scheme can be extended to “coins” having more than two sides (say, fat coins that might fall edgewise) and “dice” with more than six sides, when used as source data. Suppose we have a three-sided coin like this, which can produce results heads=0, tails=1, and edge=2. On top of that, it is not a fair coin, so that the probability of heads is 70%, tails is 20%, and edge is 10%. We will take data in groups of two as before, but now we will assign the result “heads” if the assigned numerical value increases from first to second and “tails” if it decreases, and reject the pair if the two values are equal. Thus a “heads” result will be recorded for the sets heads-tails, tails-edge, and heads-edge, and “tails” will be recorded for tails-heads, edge-tails, and edge-heads. Then the probability of the result “heads” will be 0.7 x 0.2 + 0.2 x 0.1 + 0.7 x 0.1 = 0.23. The probability of the result “tails” will be 0.2 x 0.7 + 0.1 x 0.2 + 0.1 x 0.7 = 0.23 again, of course, because multiplication is commutative. Since the other results are rejected, “heads” will be recorded the same number of times as “tails” over a long run, which is a 50-50 split.

The Lehmer code scheme can also be extended to sets larger than 6. The next is permutations of four different symbols, which have 4! = 24 possible orderings, and then five symbols, with 5! = 120 possibilities. Four symbols comes close to alphabet size, and we will use it to encode messages with the help of a Tabula Recta. Next best is doing the dice trick twice so we have 6 x 6 = 36 different combinations, sufficient for 26 alphabet letters plus 10 decimal digits. Therefore I will discuss two ways to produce a near-random keystream starting from common text, which will be used as a one-time pad of sorts in order to encrypt a message with something approaching perfect unbreakability. The fact that there are 26 different letters, which is more than 6 or 24, shouldn’t bother us in light of what I discussed in the previous paragraph, but we have to worry about the strong correlation between letters.

This is because, if we start from common text as a source of randomness (which it does contain in relatively small amounts), we must deal not only with the fact that the letters appear with different frequencies (like imperfect coins or loaded dice), but also are not independent of each other. For instance, “Q” is nearly always followed by “U”, “X” almost never follows “K” or “B”, and so forth. There is a lot of order in speech, in the form of correlation between letters, which we must destroy first if we want to use the trick we’ve been dealing with.

Fortunately, the correlation between letters only goes so far. Can you predict what letter next sentence will begin with, even if you know the previous sentence perfectly? Can you predict the word that comes three words after the present one? If you could, you wouldn’t need to read hardly anything; all the meaning would be contained in the first few letters and the rest would follow from those. This correlation has been measured in different ways. This article, for instance, displays it in Figure 3 in the form of “conditional entropy,” which becomes smaller and smaller as we move away from the the root letter (“block length,” in that paper). After 8 to 12 characters, depending on the text being analyzed, the effect is zero within the bounds of statistics. Another very sensitive way to measure this correlation, which is implemented in the program shown below, is to calculate the chi-squared statistic of groups of two symbols separated by a certain distance, testing the hypothesis that the two symbols are independent of each other. If the hypothesis is correct, the statistic returns a low value that does not depend on the sample size. Here’s the program, in JavaScript language. It returns an array giving the chi-squared statistic (minus threshold value) with increasing distance between letters (first element: consecutive letters):

function indepTestArray(text,maxDistance){ text = text.toUpperCase().replace(/[^A-Z]/g,'').trim(); //the text should be clear of accents, etc. before this var base26 = 'ABCDEFGHIJKLMNOPQRTSUVWXYZ', output = [], freqArray = [], data,result; //first get frequencies for each letter for(var i=0; i<26; i++){ data = 0; for(var j=0; j<text.length; j++){ if(text.charAt(j) == base26.charAt(i)) data++ } freqArray[i] = data } //now get the chi-squared statistic, which will have a value smaller than 670.7 if the test passes at 10% error for(var l = 1; l <= maxDistance; l++){ //for each shift, do a 2-letter chi-squared result = 0; for(var i=0; i<26; i++){ //each first letter for(var j=0; j<26; j++){ //each second letter data = 0; var expected = freqArray[i]*freqArray[j]/text.length; //expected P(xy) = P(x)*P(y) if(expected > 0){ //in case a letter does not appear at all for(var k=0; k<text.length-l; k++){ if((text.charAt(k) == base26.charAt(i)) && (text.charAt(k+l) == base26.charAt(j))) data++ } result += Math.pow(data - expected,2) / expected } } } output[l-1] = result - 670.7 //negative means a pass at 10% error } return output }

I have applied this function to whole books from gutenberg.org (hundreds of thousands of characters), with the following result. The value for adjacent letters (shift 1) is always very high, meaning that adjacent letters are clearly not independent of one another. But the value drops as the distance between letters increases, and it typically dips below the threshold value for a shift of 14 and beyond, sometimes before. The same result is obtained both in English and Spanish. That is, letters that are 14 spaces apart from each other in regular text can be safely assumed to be statistically independent. Smaller spacings might work in practice, too, but this would have to be checked experimentally.

One way to ensure that we are using independent data is to take the source text in sufficiently large chunks. For instance, let’s say the source is “It was the best of times, it was the worst of times,” and we are going to take it in chunks of eight characters, ignoring spaces. We will wrap the text after eight letters until three rows are formed and then repeat with another block, obtaining the following:

ITWASTHE THEWORST BESTOFTI OFTIMES MESITWAS

where the second block is still incomplete. If we take the letters in groups of three vertically (I B M, T E E, and so forth), those letters are 8 spaces apart in the original text and stand a good chance at being independent of each other. Now, the letters in the first group have a fairly strong correlation with the letters in the second, since they are consecutive in the original text. This means that the result of evaluating a permutation in the first group might have a correlation with the result of doing the same in the second. In order to prevent this, a further step might be to scramble the letters in each row with different shifts. Let’s say we are going to leave the first row unscrambled, but that we will scramble the second by making 4 rows, and the third by making 2 (row numbers should divide the block length, 8 in this case). Therefore, we will write the second row over two columns and then read it off by columns, and the third row over four columns, and do the same. This is the process:

BE MESI ST TWAS OF TI

Resulting in “BSOTETFI” for the second row and “MTEWSAIS” for the third. In practice, this kind of shuffling can be carried out as the letters are first written down, by leaving spaces that are filled as more letters are added. The first block then becomes:

ITWASTHE BSOTETFI MTEWSAIS

And the groups, read vertically, are “I B M,” “T S T,” “W O E,” and so forth. Now we can look at the permutation within each group in order to obtain a “die toss” from each one. Thus, “I B M” is of the form middle-low-high, which gives Lehmer code 100, which translates to decimal 2 x 1 + 0 = 2. “T S T” has the letter “T” repeated, so it does not yield a die toss, “W O E” gives Lehmer code 210, which in decimal is 2 x 2 + 1 = 5, and so forth. In the end, this block gives the following base 6 number, where dashes represent the groups that had at least one repeated letter and were not used: 2-50–20.

Observe that we are using letters and alphabetic order rather than numbers from a die in order to run the scheme. Since the result is based on the relative order of the symbols rather than on their absolute values, it works just as well as we saw earlier. In fact, the larger the set from which the symbols are drawn (26, in the case of letters versus 6 for die numbers), the smaller the probability of a repeated symbol in the group and the more efficient the scheme becomes.

If we use groups of four symbols, then we can go all the way to base 24, affording the possibility of operating directly with text. The sentence in our example becomes this when arranged in groups of four lines of length 8 characters (only the first block is complete):

ITWASTHE OFTIMES BESTOFTI MESITWAS THEWORST

We’re not going to scramble the rows this time, but go directly to obtaining the base 24 permutation codes for each group of four letters, read vertically. Four-figure Lehmer codes are converted to decimal by multiplying the first figure by 6, the second by 2, and the third by 1 and adding all of this together. Thus the first vertical group “I B M T” gives Lehmer code 1000, which translates to decimal 1 x 6 + 0 x 2 + 0 = 6. The second and third groups include repeated letters so they won’t be used. The fourth group “A T I W” gives Lehmer code 0100, which in decimal is 0 x 6 + 1 x 2 + 0 = 2, and so forth. This is what the first block yields, with dashes again representing skipped groups: 6 – – 2 – 13 10 0.

And this is it. We have extracted chaos out of order. This is real uncertainty, not something that looks like it as in pseudo-random number generators because the source text does contain some information entropy that has been concentrated through the process. Some quick calculations: a base 6 random sequence contains log 6/log 2 = 2.585 bits/digit of entropy; the first block produced 5 digits, so that is 12.92 bits of entropy, was there enough entropy in the source text to justify this? Common English text averages 1.58 bits/character and we had 3 x 8 = 24 characters, for a total of 37.92 bits of entropy. So yes, we are not “creating” entropy that wasn’t there originally. The numbers for the base 24 example are: log 24 / log 2 = 4.585 bits/number for a total of 22.92 bits of entropy, and we started from 32 characters of text, which are expected to contain 50.56 bits of entropy. Even if no rejects had taken place, the entropy of the result would not have exceeded the input entropy, in both cases.

We can now move on to describe a couple new ciphers based on this algorithm. The first one, which I am calling DicePad, operates in base 6, and the second, called LetterPad, uses base 24 and a Tabula Recta for addition and subtraction. The links lead to Javascript versions of either one, which also explain the steps in detail. The only difference between these ciphers and what we have seen above is that the Lehmer codes obtained in the ciphers count the times that letters are in the normal order rather than in inverted order, which I feel is easier to do in the head. Statistical properties are not affected by this change.

Here’s how to encrypt a message with **DicePad**:

- Take a sufficiently long piece of key text (about five times the length of the message to be encrypted), which
*will never be used again*for another message, and strip everything except letters and numerals. Do the same with the message. - If key text-derived encoding is to be used (highly recommended, for it makes the ciphertext non-malleable), proceed to generate the encoding Polybius square this way:
- Take the first sentence of the key text and start writing different letters and numerals in the order they appear.
- When we come to the end of the sentence, write the rest of the numerals in normal order (0 is first), followed by the rest of the alphabet in reverse order.
- Fill the 6 x 6 Polybius Square with the result, row by row. Row and column headings are the numerals 0 to 5.

Example: for key text “It was the best of times, it was the worst of times.” the sequence obtained is: ITWASHEBOFMR0123456789ZYXVUQPNLKJGDC, leading to this encoding square:

0 1 2 3 4 5 ------------ 0 | I T W A S H 1 | E B O F M R 2 | 0 1 2 3 4 5 3 | 6 7 8 9 Z Y 4 | X V U Q P N 5 | L K J G D C

If the square is not to be based on the key text, fill the square by rows with the digits 0 to 9, followed by the straight alphabet.

- Encode the message using the square, which will result in two digits per character. If the plaintext message is “Eve”, this results in encoded message: 10 41 10.
- Break up the key text into blocks comprising three rows, as seen above. Let’s say we use rows of 8 characters each which yields the result shown above.
- Optionally, shuffle each row with a constant period, also as shown above. The row length (8 in our case) is the least common multiple of the shuffle periods. We won’t shuffle for this example but it is recommended because it improves the statistical randomness of the keystream. Usually one of the rows will be left unshuffled, and the rest shuffled by short, different periods that divide the row length exactly (say, 2 and 4 in our example).
- Now take the letters (and digits) in vertical groups of three letters and compute a result for each group this way, starting from an initial value of zero:
- If the second or third letters follow the first in alphabetical order (digits come before letters), add 2 for each
- If the third letter follows the second in alphabetical order, add 2.
- If at least two letters or digits are the same, skip that group.

- The result of the previous step is the keystream, which is in base 6. We repeat the steps above until the number of keystream digits produced is at least the length of the encoded message. The last step is to subtract the encoded message from the keystream, neglecting any carries so it can be done from left to right, and stopping when the last digit of the encoded message is used. Bear in mind that this is a base 6 operation, so we must add 6 every time the result would have been negative. Examples: 4 – 2 = 2, 5 – 1 = 5, 1 – 5 = 2, 2 – 4 = 4. The result is the base 6 ciphertext.
- Optionally, decode the base 6 ciphertext back to letters and digits in groups of two, using the encoding square.

To decrypt the ciphertext the recipient, who knows the key used for encryption, performs steps 1 through 7 exactly like the sender, except for encoding the message. If the ciphertext came as letters and numbers rather that in base 6, it must now be encoded into base 6 by means of the Polybius square. He/she now subtracts the base 6 ciphertext from the keystream, resulting in the encoded message, which is then decoded back to letters and digits.

And here’s how to encrypt a message with **LetterPad**:

- Take a sufficiently long piece of key text (about five times the length of the message to be encrypted), which
*will never be used again*for another message, and strip everything except letters and numerals. Do the same with the message. Additionally, the message must use a 24-letter alphabet, so convert all J’s to I’s and all Q’s to K’s, for instance. Any numerals in the message are to be encoded as letters; for instance, 0 = A, 1 = B, etc. - If a key text-derived alphabet is to be used (highly recommended, for it makes the ciphertext non-malleable), proceed to generate it this way:
- Take the first sentence of the key text and start writing different letters in the order they appear, using the substitutions above so the alphabet contains only 24 letters.
- When we come to the end of the sentence, write the rest of the letters in reverse order.
- Place that alphabet at the top, left, and right of an otherwise normal Tabula Recta.

Example: for key text “It was the best of times, it was the worst of times.” the alphabet obtained is: ITWASHEBOFMRZYXVUPNLKGDC, leading to this Tabula Recta, which also contains digits on the left in order to convert between decimal and base 24:

I T W A S H E B O F M R Z Y X V U P N L K G D C ----------------------------------------------- 00: I | A B C D E F G H I K L M N O P R S T U V W X Y Z | I 01: T | B C D E F G H I K L M N O P R S T U V W X Y Z A | T 02: W | C D E F G H I K L M N O P R S T U V W X Y Z A B | W 03: A | D E F G H I K L M N O P R S T U V W X Y Z A B C | A 04: S | E F G H I K L M N O P R S T U V W X Y Z A B C D | S 05: H | F G H I K L M N O P R S T U V W X Y Z A B C D E | H 06: E | G H I K L M N O P R S T U V W X Y Z A B C D E F | E 07: B | H I K L M N O P R S T U V W X Y Z A B C D E F G | B 08: O | I K L M N O P R S T U V W X Y Z A B C D E F G H | O 09: F | K L M N O P R S T U V W X Y Z A B C D E F G H I | F 10: M | L M N O P R S T U V W X Y Z A B C D E F G H I K | M 11: R | M N O P R S T U V W X Y Z A B C D E F G H I K L | R 12: R | N O P R S T U V W X Y Z A B C D E F G H I K L M | Z 13: Y | O P R S T U V W X Y Z A B C D E F G H I K L M N | Y 14: X | P R S T U V W X Y Z A B C D E F G H I K L M N O | X 15: V | R S T U V W X Y Z A B C D E F G H I K L M N O P | V 16: U | S T U V W X Y Z A B C D E F G H I K L M N O P R | U 17: P | T U V W X Y Z A B C D E F G H I K L M N O P R S | P 18: N | U V W X Y Z A B C D E F G H I K L M N O P R S T | N 19: L | V W X Y Z A B C D E F G H I K L M N O P R S T U | L 20: K | W X Y Z A B C D E F G H I K L M N O P R S T U V | K 21: G | X Y Z A B C D E F G H I K L M N O P R S T U V W | G 22: D | Y Z A B C D E F G H I K L M N O P R S T U V W X | D 23: C | Z A B C D E F G H I K L M N O P R S T U V W X Y | C

If the Tabula Recta is not to be based on the key text, place the straight 24-character alphabet at top, left and right.

- Break up the key text into blocks comprising four rows, as seen above. Let’s say we use rows of 8 characters each which yields the result shown above. There is no need to delete the digits or reduce the alphabet to 24 letters.
- Optionally, shuffle each row with a constant period, also as shown above. The row length (8 in our case) is the least common multiple of the shuffle periods. We won’t shuffle for this example but it is recommended because it improves the statistical randomness of the keystream. Usually one of the rows will be left unshuffled, and the rest shuffled by short, different periods that divide the row length exactly (say, 2, 2 and 4 in our example).
- Now take the letters (and digits) in vertical groups of four letters and compute a result for each group this way, starting from an initial value of zero:
- If the second, third, or fourth letters follow the first in alphabetical order (digits come before letters), add 6 for each.
- If the third or fourth letters follow the second in alphabetical order (digits come before letters), add 2 for each.
- If the fourth letter follows the third in alphabetical order, add 1.
- If at least two letters or digits are the same, skip that group.
- Convert each result to base 24 letters using the pattern on the left side of the Tabula Recta. For instance: 2 = W, 13 = Y, etc.

- The result of the previous step is the keystream, which is in base 24. We repeat the steps above until the number of keystream digits produced is at least the length of the message. The last step is to subtract the message from the keystream, letter by letter from left to right using the Tabula Recta, stopping when the last message letter is used. Subtraction with the Tabula Recta is done this way: find the second letter (the one being subtracted) on the top and go down that column until the first letter is found, then follow that row to left or right to find the result. Examples with the given Tabula Recta: T – R = E, O – I = Y, O – P = K, A – R = Y. The result is the ciphertext.

To decrypt the ciphertext the recipient, who knows the key used for encryption, performs steps 1 through 5 exactly like the sender, except for encoding the message. He/she now subtracts the ciphertext from the keystream using the Tabula Recta, resulting in the original message (after reduction to 24 characters and numbers encoded as letters).

As you can see, LetterPad involves fewer steps than DicePad, but keystream generation is a little more involved. Subtraction with a Tabula Recta is also a little more complicated. On the other hand, the length involved is half, which may more than make up for the additional complexity. In my tests, I have found that LetterPad seldom needs shuffling of the key text in order to produce a keystream of good randomness, while DicePad sometimes does (we’re talking about 2-character Chi-squared statistic for messages containing thousands of characters; no problems appear for messages in the hundreds of characters) probably due to involving a larger number of uncorrelated symbols, so LetterPad has an edge here.

The Javascript versions of DicePad and LetterPad include an optional step after the keystream has been generated, consisting of a single-row Lagged Fibonacci Generator (LFG) in order to further improve the randomness of the keystream. To do this, repeat the last digit or letter of the keystream below the first and add base 6 (DicePad) or subtract first from second (LetterPad) every two numbers or letters in the vertical direction, placing the result on the right of the bottom number or letter until the bottom row is filled. That bottom row becomes the new keystream. The LFG step helps to improve the statistical quality of the keystream if the key text used is too short so it it is repeated at least partially. Its effect, like that of shuffling, is most noticeable for very long messages.

Let me finish this post with some musings concerning the security of these two ciphers. In order to do this, I will put on my hacker’s hat a pretend that I’m trying to break an encrypted message in various ways. I know the encryption method used but I don’t know the key text. Sometimes I may actually know the plaintext, and then what I’m trying to do is change the ciphertext so it decrypts to something else, or try to guess the source of the key text so I can break other encrypted messages.

**Brute force**. This can be done in two different ways:- I try every possible key text until I find one that “works,” meaning that the plaintext resulting from subtracting the ciphertext from the keystream derived from the guess makes sense in English or whatever language I suspect it was written in.
- I try every possible plaintext until the keystream resulting from subtracting the plaintext from the ciphertext “makes sense” as something that has derived from common text, possibly because of flaws in the keystream-generation process.

Method 1 is not going to work because the key text space is extremely large, much larger than the plaintext space, and has more entropy than a random string of plaintext length. Per Shannon’s one-time pad theorem, I’ll always be able to find a key text that leads to any plaintext of the given length at all. Method 2 fares a little better because the keystream is generated by blocks. Say I know the block length; then I’ll try likely plaintext blocks of this length and obtain the corresponding piece of keystream by subtracting the plaintext from the ciphertext. That keystream piece likely will look like gibberish, but if it isn’t perfectly random and I know that the keystream-generating process has some flaws (for instance, some residual correlation between consecutive letters, as manifested by the two-letter chi-squared value), then I might be able to tell a bad guess, where the “keystream” piece obtained passed the 2-letter chi-squared test, from a decent one, where the result of the test is what I expect. The problem with this approach is that the blocks most likely won’t be long enough to provide meaningful statistics. Even in base 6, the minimum row length I’ll need is of the order of 6 x 6 x 5 = 180, which is unlikely to be used because it is too long. If care was taken to eliminate all correlation in the keystream (say, by appropriate shuffling of of sufficiently long key text blocks), then this method will never yield anything.

**Known plaintext**. Let’s say I know the whole thing; can I get the key text, and so find its source so I can crack any subsequent ciphertexts from the same source? This would first require obtaining the keystream, which would be trivial if I knew the Polybius square (DicePad) or the headings of the Tabula Recta (LetterPad), which is the situation if the straight alphabet is used instead of one derived from the key text. If the alphabet derives from the key text this would be quite difficult, since I have no way to know which encoding is the correct one until the key text is found. But let’s say I make a bunch of guesses and try them all. For DicePad, I’d have to make 36! = 3.72E41 guesses, for LetterPad only 24! = 6.2E23. In any case, these are very large numbers. Even if a computer manages to check a billion every second, checking all the LetterPad alphabets would take 620 trillion seconds, which even if it gets distributed over millions of CPUs is many times the age of the Universe. And then, after I get the keystream, I’d have to invert it in order to find the key text. But the keystream generation process loses a lot of entropy because of the skipped groups and the reduction from full alphabet to essentially three or four symbols subject to a permutation, meaning that I’d have to make further guesses concerning those in order to set up those permutations. Very quickly I’ll find myself trying every possible key text, which is equivalent to brute force and, as we saw above, doomed to fail.**Change plaintext**. But what if I don’t want to invert the keystream but only change the plaintext? This would be trivial if the straight alphabet has been used, but otherwise I won’t know how to subtract the plaintext, as we saw above. Even worse, I must be right on the first try, so guessing is not an option. Might as well forget it.**Reused key text**. This means that I manage to get the sender to use the exact same key text for two different messages, which is something that should never be done. The keystream, therefore, will be identical, and I can subtract it out by subtracting the two ciphertexts, leaving the sum of the two ciphertexts. If the default square or straight alphabet were used, I only have to make educated guesses as to what words might be in one of the messages. Then I place the word at different shifts from the start and subtract it from the combined plaintexts. If the word is there and I have guessed the right location for it, the result will also be a word, or part of it. Then I can complete the word and subtract it, giving me another word, and so forth until I have recovered both plaintexts. If, on the other hand, a key-derived square or alphabet was in use I would have a lot of trouble separating the two plaintexts, for I would not know how to encode my guesses (DicePad) or would not be able to make the subtraction correctly to begin with (LetterPad).**Errors**. I change a few letters here and there in the ciphertext so the recipient cannot decrypt it (actually, not much of an attack, but it may happen spontaneously; the sender or the recipient may also make mistakes that amount to the same). If this were a typical computer-based cipher, any change typically leads to a complete decryption failure, but here a single-letter error on the plaintext or ciphertext does not propagate to the rest, which remains unaffected. A switched letter on the key text will lead to a single-letter error if there is no change on whether the group using that letter gets used or rejected by the keystream-generating algorithm, but will garble the rest of the block if it does. An addition or omission may garble the rest of the message after that point, but the recipient will be able to detect it and possibly correct the error so the rest of the message can be decrypted.**Physical break-in**. I enter the sender’s and the recipient’s houses and record the names of all the books in their possession. Since both must have the same key text, any repeated books will give me an excellent indication of what book they are using to draw the key text from. Unless, of course, they are not using any of those but rather an equally available Internet source, a newspaper, or some other source I cannot detect so easily. I might have no choice but to actually tie one of them to a chair and apply some physical persuasion until the title is revealed, at which point I will have killed all future usefulness of my efforts.

I know I’m coming to this post 5 years later, but I think you should clarify that Clément Pit-Claudel’s approach is looking at generic inequalities, as your example of “1, 2, 3” becoming “1” and “3, 5, 1” becoming “3” could lead the reader of this post to assume that the leading toss determines the result, which isn’t true (assuming they didn’t click through to read Clément’s post).

Instead, it’s a mapping of the orderings of “high”, “mid”, and “low”:

low, mid, high = 1

low, high, mid = 2

mid, high, low = 3

high, mid, low = 4

high, low, mid = 5

mid, low, high = 6

Thus, “1, 2, 3” is 1, because “low, mid, high” = 1. A toss of “2, 3, 4” would also yield “1” for the outcome. And “3, 5, 1” is 3, because “mid, high, low = 3”. A toss of “4, 5, 1” would also yield “3” for the outcome.

In fact, Clément’s post is just a lookup table of the results of Lehmer code, and lookup tables are more efficient than calculations at the cost of storage. Thus, applying the Lehmer code approach of starting with the most significant die toss and comparing to the right-adjacent die tosses, I would have mapped Clément’s table as:

low, middle, high = 000 = 1

low, high, middle = 010 = 2

middle, low, high = 100 = 3

middle, high, low = 110 = 4

high, low, middle = 200 = 5

high, middle, low = 210 = 6

Then no calculations are needed. Anyway, just a thought.