© Francisco Ruiz, 2016
This page discusses a very simple way to extract entropy from any file so it can be used in a Vernam-style cipher. This could be very useful in practice. Consider this: a 4 TB drive (about $100 in mid-2016) can hold enough bits to encrypt a high-definition video feed (about 1000 kbits/s) continuously for longer than a year! The catch is that those bits must be truly random, or at least appear random enough that no cryptanalysis is possible, per Shannon's criterion. This page is all about taking any file and turning its non-random bits into bits that will pass stringent randomness tests, so that we can use them for encryption.
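The "longer than a year" figure is easy to check. The short Python calculation below is only a back-of-the-envelope sketch; it assumes decimal terabytes (10^12 bytes) and a constant feed of exactly 1,000,000 bits per second, and the function name is ours.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions: 1 TB = 10**12 bytes, and the feed runs at a constant 1000 kbits/s.
DRIVE_BYTES = 4 * 10**12
FEED_BITS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400


def days_of_keystream(drive_bytes: int, bits_per_second: int) -> float:
    """Days of continuous encryption if every bit on the drive is used once."""
    seconds = drive_bytes * 8 / bits_per_second
    return seconds / SECONDS_PER_DAY


print(round(days_of_keystream(DRIVE_BYTES, FEED_BITS_PER_SECOND), 1))  # 370.4
```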
In order to do this, we will apply a choice of several simple algorithms to the binary data:
There is a preferred order to these operations. The Von Neumann extractor, if used, needs to be the last operation because it is the only one that is mathematically guaranteed to produce unbiased bits. Inflations are best applied right before the extractor. If there are two of them, it is best to do the transposition, if any, between them, since otherwise there would be repetitions in the resulting stream. Finally, LFG operations tend to introduce spurious correlations that need to be ironed out by transposition; therefore the LFG steps, if any, should be done before the regular transposition. An additional transposition can be performed between LFG operations if there are two of them.
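For reference, the classic Von Neumann extractor can be sketched in a few lines of Python. This is the textbook construction, not necessarily the exact code behind this page. It assumes the bits arrive as a string of '0' and '1' characters, and the convention of outputting the first bit of each unequal pair is one of two equivalent choices. Its guarantee of unbiased output holds when the input bits are independent of one another and share the same bias.

```python
def von_neumann_extract(bits: str) -> str:
    """Textbook Von Neumann extractor over a string of '0'/'1' characters.

    Bits are read in non-overlapping pairs: '01' yields '0', '10' yields '1',
    and the equal pairs '00' and '11' are discarded. A trailing unpaired bit
    is ignored.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return "".join(out)


print(von_neumann_extract("0110110010"))  # pairs 01 10 11 00 10 -> 011
```

Even for perfectly balanced input, half of the pairs are discarded and each surviving pair yields a single bit, so on average the output is at most a quarter of the input length.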
We begin by taking a file from the computer and loading it in the box below. It is loaded as a link so that large files can be used (Chrome can take up to 1.5 MB, Firefox can take even more). Although invisible here, the file will be loaded as a text string consisting of base64 characters.
Typically, uncompressed files contain large regions filled with ones or zeroes. These "lumps" detract from the randomness of the result, so it may be best to remove them. The optional algorithm we use here simply replaces each run of repeated base64 characters with a single copy.
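A minimal Python sketch of this repeat-removal step follows. It assumes the file is already available as a base64 string; the function name is ours, not the page's.

```python
import re


def remove_repeats(b64: str) -> str:
    """Collapse every run of identical consecutive characters to one copy."""
    return re.sub(r"(.)\1+", r"\1", b64)


print(remove_repeats("AAAAB//8AAAA"))  # AB/8A
```

Note that the result is in general no longer a well-formed base64 string (its length need not be a multiple of 4), so it is treated here simply as a sequence of base64 characters.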
Choices: Leave repeats | Remove repeats
And let us now decide what algorithms will be applied, and how many times. The default is: use Von Neumann with double inflation (so that the number of bits is roughly conserved), square transposition between the inflation steps, no LFG applied before those:
Von Neumann: Apply Von Neumann | No Von Neumann
Inflation: Double inflation | Single inflation | No inflation
Transposition: Apply Transposition | No Transposition
LFG: No LFG | Single LFG | Double LFG | Double LFG w/ Transposition in-between
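To give an idea of what a square transposition does, here is a hedged Python sketch. This section does not spell out the page's exact procedure, so the details are assumptions: the largest square block that fits is written row by row and read column by column, and any leftover characters are appended unchanged.

```python
import math


def square_transpose(data: str) -> str:
    """Transpose the largest square block of `data` (assumed layout).

    The first side*side characters are laid out row by row in a square and
    read back column by column; the remaining tail is left as it is.
    """
    side = math.isqrt(len(data))
    block, tail = data[: side * side], data[side * side:]
    transposed = "".join(
        block[row * side + col] for col in range(side) for row in range(side)
    )
    return transposed + tail


print(square_transpose("abcdefghij"))  # adgbehcfij
```

Characters that were neighbors in the input end up `side` positions apart in the output, which is how a step of this kind breaks up short-range correlations.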
If we use the result to encrypt another file, we can save computation by processing only the number of bits we need rather than the whole file. Since the Von Neumann extractor may throw away more than three quarters of the bits fed into it, we play it safe and take an extra 50% from the original whenever the amount processed depends on the size of the second file:
Choices: Process only required bits | Process whole file
Here we can optionally input a second file, which also loads as a link. If the key file is too small to encrypt this file, a warning will appear above. If the file is encrypted and the key file and all settings are the same as for encryption, it will be decrypted below.
The next step is to generate the "keystream file" by applying the operations selected above to the key file. We can encrypt a plaintext file (or decrypt an encrypted file) as a bonus, as soon as the button below is clicked. But there is still an extra setting to be aware of. We can keep reusing the key file so long as we "cut" it at a different point each time (especially if the whole file is processed every time), just as a deck of cards is cut and the bottom part is then placed on top. The place for the cut is entered as a percentage of the total, from 0% (no cut) to 100% (again, no cut), decimals allowed. The cut operation is applied right after removing repeats.
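The cut can be sketched as a simple rotation of the string. In this Python illustration the function name and the rounding rule (truncating to a whole character position) are assumptions.

```python
def cut(data: str, percent: float) -> str:
    """Cut `data` like a deck of cards at `percent` of its length.

    The part after the cut point moves to the front, so 0 and 100 both
    leave the data unchanged.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    point = int(len(data) * percent / 100)
    return data[point:] + data[:point]


print(cut("ABCDEFGHIJ", 30))  # DEFGHIJABC
```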
And now, the all-important button (if you check the box, an extra *long* test will be added):
And here is the resulting keystream file, followed by some analysis of its randomness. You can save it by right-clicking on it:
And here is the second file after its bits have been XORed with an equal number of bits from the keystream file, starting from the beginning. Like the keystream file, it can be saved by right-clicking on it.
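The XOR step itself is tiny. The Python sketch below assumes both files are available as raw bytes and raises an error where this page would show a warning; the names and the toy keystream are illustrative only.

```python
def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """XOR `data` with the first len(data) bytes of `keystream`."""
    if len(keystream) < len(data):
        raise ValueError("keystream is too short for this file")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Toy values for illustration only; a real keystream must come out of the
# process described on this page and must not be reused at the same position.
message = b"attack at dawn"
keystream = bytes(range(40, 60))

ciphertext = xor_with_keystream(message, keystream)
print(xor_with_keystream(ciphertext, keystream))  # b'attack at dawn'
```

Because XOR is its own inverse, applying the same keystream a second time recovers the original, which is why the same button both encrypts and decrypts.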