The idea is the same as in BookPad, Letteracci, DicePad, and all the rest: use a large source of entropy (a piece of text, in the case of the other ciphers, a whole file, in FilePad) in order to obtain the Vernam-type encryption of another piece of data (Gilbert Vernam is the guy in the photo). Since common text is not perfectly random, additional manipulation is needed in order to turn it into something that will pass as random, which is what the Vernam cipher needs. Regular files are not random either, although many have a lot of compression and therefore are close to looking random, but we can manipulate them using the same tricks as for common text, operating on binary bits rather than regular numbers or letters.
Here are the tricks that are used in FilePad, which you can mix at will:
- Von Neumann extractor: take bits in pairs, if they are different, take the first (or the second, but do so consistently), if they are equal, ignore them and go to the next pair. This algorithm will produce unbiased bits—that is, ones are as likely to appear as zeroes—provided the original bits are uncorrelated, which means that any pairs of bits, separated by any distance, have the same probability. This operation reduces the number of bits to one-fourth the original or less.
- Columnar transposition: write down the bit stream in rows of a given length, then read the result by columns. This separates bits that are correlated because they come from certain bytes or characters, so that adjacent bits have little correlation (so that the von Neumann extractor can be applied to them). This operation conserves the number of bits. Separation is maximized if the row length is the square root of the string length.
- Binary inflation: since the von Neumann extractor only requires that the bits in each pair be uncorrelated, we can reuse some bits as long as the new pairs are different from the previous ones. One simple way to do this is to take the first (or second, but be consistent) bit of each pair and append it to the end of the stream, whether or not they are equal. This operation produces a string that is double the length of the original, less one.
- Lagged Fibonacci generator (LFG): take the last bit and write it under the first bit, and then make a new second row by writing down the result of XORing the top and bottom bits immediately to the left. The process can be repeated as many times as one desires, always resulting in a string of the same length as the original. The effect of this is that every original bit affects all bits that follow, if applied once, or all bits in the resulting string, if applied twice or more times.
There is a preferred order to these operations. The von Neumann extractor, if used, needs to be the last operation because it is the only one that is mathematically guaranteed to produce unbiased bits. Inflations are best applied right before the extractor. If there are two of them, then it is best to do the transposition, if any, between them, since otherwise there would be repetitions in the resulting stream. Finally, LFG operations tends to introduce spurious correlations that need to be ironed out by transposition; therefore the LFG steps, if any, should be done before the regular transposition. An additional transposition can be performed between LFG operations if there are two of them.
The program outputs two files. The first one, with extension .rnd, is the result of randomizing the bits of the key file. The second, with extension .filepad, is the result of encrypting a second file, using the randomized bits of the first as a keystream. The program recognizes these extensions in order to name the output appropriately.
Meanwhile, here are some insights on what the program does:
- As expected, using the von Neumann extractor produces highly unbiased output, provided bit independence has been assured first, which usually can be done with a single transposition. But then, the von Neumann extractor loses a lot of bits. Adding inflation steps helps to increase the output and does not damage the randomness of the output so long as the transposition is done between inflations.
- Surprisingly, doing a LFG followed by transposition twice produces keystreams of just as good quality with considerably less processing, even though the process is reversible, unlike the vonNeumann extractor. Every bit in the output depends on every bit in the input, which is a desirable property that would make this process something akin to a hash, if it wasn’t for its reversibility (but then, a hash would only take a portion of the output bits, so that the whole file could not be obtained from those).