BookPad encryption

© Francisco Ruiz, 2015

This page implements the paper-and-pencil "BookPad" cipher by F. Ruiz for those who wish to use a computer as a convenience. All steps can be performed by hand without excessive effort. The process is described in detail in this article.

BookPad uses normal text taken from a book or similar source to generate a one-time pad of sorts (hence the name), which is then used to encrypt a message. Theoretical unbreakability is approached when the key text has the same entropy as a random string of equal length as the plaintext message (after encoding). BookPad uses a straddling checkerboard for encoding into decimal digits, which have entropy log(10)/log(2) = 3.32193 bits per digit. The straddling checkerboard produces encoded decimal strings that are typically 1.33 times the length of the original English text (similar for other Western languages). Since English text, according to Shannon, has an average entropy of 1.58 bits per character, meeting the information theoretic criterion for perfect secrecy would require a key text that is 3.32193 x 1.33 / 1.58 = 2.8 times the length of the plaintext to be encrypted. To make sure that sufficient entropy is collected, BookPad takes a piece of encoded key text that is three times the length of the encoded plaintext, then hashes it to message length by adding the three parts before it optionally generates a keystream by a lagged Fibonacci generator.

The process is nearly identical for encryption and decryption. The difference is that ciphertext might have an extra digit that would not be there if it was encoded plaintext; this extra digit is decoded back as ! or ?. Numbers in plaintext are converted to letters and then encoded.

 

Step 1. Encoding set-up

To cover the case where the encoding pattern depends on the key text taken from a book or similar source, let us enter the key text ahead of everything else, in the box below, which is shaded blue like all the other boxes where you can enter something.

Key Text

We will encode this into decimal digits later on. For the time being, we need to decide whether the default "Arsenio" checkerboard encoding is to be used, or a similar one derived from the key text. This is the Arsenio checkerboard:

    1 2 3 4 5 6 7 8 9 0
   --------------------
  | A R S E N I O +
9 | B C D F G H J K L M
0 | P Q T U V W X Y Z =

where "+" designates a space and "=" a period or other important punctuation. To use the checkerboard, simply replace each letter, space or strong punctuation with a single-digit (top) or two-digit code (first left, then top digit), as shown on the checkerboard. Examples: s = 3, h = 96, p = 01.

A key-derived checkerboard would be made this way: count the number of letters in the first two words of the key text (mod 10); these become N1 and N2; if there is only one word or all the words have equal length, take N2=N1+1 mod 10. Then order the letters in "arsenio" plus the space, as they appear in the first sentence of the key text (before the first period) and assign to them the single numerals 1-9,0, skipping N1 and N2. Do the same for the rest of the letters in the English alphabet, placing them in the two rows headed by N1 and N2, then follow with the '=' character, and then the letters that are not in the key text, in reverse alphabetical order (if they are part of "arsenio", they go on the top row, in reverse "arsenio" order). The encoding pattern is displayed below (+ represents a space, = a generic punctuation mark).

Encoding Pattern

Settings

Let us now tell the program what we want to do by checking one of these two buttons:

     Encrypt     Decrypt

And whether or not we are basing the encoding on the key text (if we use key-based encoding, a MAC is not needed):

     Default encoding     Key text-derived encoding

The next setting determines whether or not a lagged Fibonacci generator will be used to make the keystream (should be yes if the key text is short)

   No LFG    LFG

Now that we are checking boxes, we might as well tell the program whether or not a shift and/or MAC are to be embedded in the ciphertext, plus the MAC length.

     Separate shift     Embedded shift   (the number of digits depends on message length)

     Separate MAC     Embedded MAC    Number of digits (1-9)

 

Step 2. Plaintext encoding / Ciphertext preparation

Now we write the plaintext that we wish to encrypt, which will be converted to lowercase. Punctuation marks other than periods, colons, exclamations and interrogations are ignored. Spaces are encoded as high-frequency letters. Diacritical marks (accents) are ignored. If there are any numbers, they are first converted to letters as in a = 1, ... i = 9, j = 0. Plaintext numbers are not decoded back upon decryption, but hopefully the user can spot them easily.

Plaintext / Ciphertext as letters

And this is the same text, encoded as decimal digits. If now you type into the Encoded Plaintext box, its contents are automatically decoded and the result placed in the Plaintext box. When decrypting a ciphertext already encoded as digits, we start with the next box.

Encoded Plaintext / Ciphertext as digits

If a key shift and/or MAC are embedded in ciphertext, they are now extracted and placed in the shift box below and the embeded MAC box at the bottom of the page. This is not done for encoded plaintext.

After shift and MAC extraction

 

Step 3. Keystream generation

The next step is to generate the keystream. Since the key text may be shifted, we need to enter the shift, if any, in the box below.

Key shift

 Key shift: Possible values are 0 to encoded plaintext length

This number will be used to shift the first and second thirds of the piece of key text we use in order to do the encryption. To generate the keystream, we first encode the key text into decimal digits and take a piece equal to three times the length of the encoded plaintext. Then we split the result into three parts of equal length, and shift the first and second cyclically to the left by as many digits as shift1 and shift2, respectively, where shift1 and shift2 are respectively the remainder and the integer result of dividing the number in the Key shift box by the number of digits of the encoded plaintext. Then we add the three parts digit by digit without carries in order to obtain the seed. If the key text is not long enough, it will be repeated and a warning will appear below this line.

This is where the warning will appear

Encoded Key Text

Encoded Key Text parts after shift

Seed

If the No LFG checkbox above is checked we skip the following step and use the seed as keystream. But if the LFG checkbox above is checked, the seed is used to initialize a lagged Fibonacci generator, where each digit after the seed is the carryless sum of the previous digit and the one located a number of spaces equal to the encoded plaintext length before the current digit. This is best done by rows, adding the numbers immediately above and to the left of the one we are filling. We stop when three rows are completed. The process is initialized by repeating the last digit of row one as the first in row two.

And take from the end a number of digits equal to the length of the encoded plaintext to obtain the keystream.

Keystream

Information about keystream quality will appear here

 

Step 4. Encrypted Ciphertext / Decrypted Plaintext

Finally, we subtract, again without carries, the encoded plaintext (encoded ciphertext, when decrypting) from the keystream, resulting in the raw ciphertext below (plaintext, when decrypting), which is ready to be sent out. The next couple steps only serve to harden the encryption further, or decode the encoded plaintext. Because this may be output, the cells are shaded green.

Raw Ciphertext / Encoded Plaintext

The shift, if used, can be hidden within the ciphertext rather than sent in the plain, as set by a radio button near the top of this page. The same thing goes for the MAC. We will assume they are inserted after a number of digits equal to the character count in the first two words of the key text (or the first only, if there is only one word), so those who know the key text can extract it easily. This step is not taken when decrypting; instead, the shift and MAC are found and extracted before generating the keystream.

Ciphertext with embedded shift and MAC

And finally, we decode the result back to letters using the encoding pattern. When encrypting, it is possible to find a single N1 or N2 digit at the end. In this case, we convert N1='!' and N2='?'

Text-based Ciphertext / Final Plaintext

 

Step 5 (optional). Message Authentication Code (MAC)

We were basically done in the previous step, but there is still something we can do in order to prevent an adversary from altering an encrypted message without knowledge of the key text. We can make a Message Authentication Code (MAC) based on the plaintext and a suitably large piece of the key text (three times the length of the MAC, as encoded into digits). To do this, we first pick a piece of key text that has not been used yet and encode it, resulting in what is in the box below.

A warning will appear here if there is not enough key text

Then we take the encoded plaintext and append the above to the end of it, and then the shift, if there is any, resulting in this:

Now we divide it into chunks of the length of the MAC (which is set near the top of the page; default is 3 digits), padding it with zeros at the end so all the chunks are of equal length, and then we add them all without carries. The result is below, and it should match the MAC attached to the ciphertext. If now we are decrypting and the MAC was embedded in the ciphertext, the embedded MAC is displayed in the last box so we can compare the two:

Calculated MAC:    

Embedded MAC: