© Francisco Ruiz, 2016

This page discusses a very simple way to extract entropy from any file so it can be used in a Vernam-style cipher. This could be very useful in practice. Consider this: a 4 TB drive (about $100 in mid-2016) can contain enough bits to encrypt a high-definition video feed (about 1000 kbits/s), continuously, for longer than a year! The trick is that those bits must be truly random, or at least appear to be random so that no cryptanalysis is possible, per Shannon's criterion. This page is all about taking any file and turning its non-random bits into bits that will pass stringent randomness tests so we can use it for encryption.

In order to do this, we will apply a choice of several simple algorithms to the binary data:

- Von Neumann extractor: take bits in pairs, if they are different, take the first (or the second, but do so consistently), if they are equal, ignore them and go to the next pair. This algorithm will produce unbiased bits—that is, ones are as likely to appear as zeroes—provided the original bits are uncorrelated, which means that any pairs of bits, separated by any distance, have the same probability. This operation reduces the number of bits to one-fourth the original or less.
- Columnar transposition: write down the bit stream in rows of a given length, then read the result by columns. This separates bits that are correlated because they come from certain bytes or characters, so that adjacent bits have little correlation (so that the von Neumann extractor can be applied to them). This operation conserves the number of bits. Separation is maximized if the row length is the square root of the string length.
- Binary inflation: since the von Neumann extractor only requires that the bits in each pair be uncorrelated, we can reuse some bits as long as the new pairs are different from the previous ones. One simple way to do this is to take the first (or second, but be consistent) bit of each pair and append it to the end of the stream, whether or not they are equal. This operation produces a string that is double the length of the original, less one.
- Lagged Fibonacci generator (LFG): take the last bit and write it under the first bit, and then make a new second row by writing down the result of XORing the top and bottom bits immediately to the left. The process can be repeated as many times as one desires, always resulting in a string of the same length as the original. The effect of this is that every original bit affects all bits that follow, if applied once, or all bits in the resulting string, if applied twice or more times.

There is a preferred order to these operations. The Von Neumann extractor, if used, needs to be the last operation because it is the only one that is mathematically guaranteed to produce unbiased bits. Inflations are best applied right before the extractor. If there are two of them, then it is best to do the transposition, if any, between them, since otherwise there would be repetitions in the resulting stream. Finally, LFG operations tends to introduce spurious correlations that need to be ironed out by transposition; therefore the LFG steps, if any, should be done before the regular transposition. An additional transposition can be performed between LFG operations if there are two of them.

We begin by taking a file from the computer and loging it in the box below. It is loaded as a link so that large files can be used (Chrome can take up to 1.5 MB, Firefox can take even more). Although invisible here, the file will be loaded as a text string consisting of base64 characters.

Typically, uncompressed files contain large regions filled with ones or zeroes. These "lumps" will detract from the randomness of the result and may be best to remove them. The optional algorithm we use here simply removes repeated base64 characters and replaces them with a single copy.

Leave repeats Remove repeats

And let us now decide what algorithms will be applied, and how many times. The default is: use Von Neumann with double inflation (so that the number of bits is roughly conserved), square transposition between the inflation steps, no LFG applied before those:

Apply Von Neumann No Von Neumann

Double inflation Single inflation No inflation

Apply Transposition No Transposition

No LFG Single LFG Double LFG Double LFG w/ Transposition in-between

If we use the result to encrypt another file, we can save computation by processing only the number of bits we need rather than the whole file. Since the von Neumann extractor might throw away more than three quarters of the bits fed into it, we'll be safe and add an extra 50% to the bits taken from the original, in all cases when this depends on the size of the second file:

Process only required bits Process whole file

Here we can optionally input a second file, which also loads as a link. If the key file is too small to encrypt this file, a warning will appear above. If the file is encrypted and the key file and all settings are the same as for encryption, it will be decrypted below.

The next step is to generate the "keystream file" by applying the operations selected above to the key file. We can encrypt a plaintext file (or decrypt an encrypted file) as a bonus, as soon as the button below is clicked. But there is still an extra setting to be aware of. We can keep reusing the key file so long as we "cut" it at a different point each time (especially if the whole file is processed every time), just as a deck of cards is cut and the bottom part is the placed at the top. The place for the cut will be entered as a percentage of the total, from 0% (no cut), to 100% (again, no cut), including decimals. The cut operation is applied right after removing repeats.

And now, the all important button (if you check the box an extra *long* test will be added):

base64 indep.And here is the resulting keystream file, followed by some analysis of its randomness. You can save it by right-clicking on it:

Information about keystream quality will appear here

And here is the second file after its bits have been XORed with an equal number of bits from the keystream file, starting from the beginning. Like the keystream file, it can be saved by right-clicking on it.