This page discusses a simple way to extract entropy from any file so that it can be used in a Vernam-style cipher. This could be very useful in practice. Consider this: a 5 TB drive (about \$100 in early 2022) holds enough bits to encrypt a high-definition video feed (about 1000 kbit/s) continuously for more than a year. The trick is that those bits must be truly random, or at least appear random enough that no cryptanalysis is possible, per Shannon's criterion. This page is all about taking any file and turning its non-random bytes (which nevertheless contain a lot of entropy) into bytes that pass stringent randomness tests, so that we can use them for encryption. A good source is any file that has already been shared with others, such as a photo or a document in the cloud.
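The drive-capacity figure can be checked with a few lines of arithmetic. The snippet below assumes decimal terabytes (10^12 bytes) and a constant bit rate, neither of which is stated explicitly above.

```python
# Back-of-the-envelope check of the figures quoted above.
DRIVE_BYTES = 5 * 10**12        # 5 TB, taken as decimal terabytes (assumption)
FEED_BITS_PER_S = 1000 * 10**3  # 1000 kbit/s, taken as a constant rate (assumption)

seconds = DRIVE_BYTES * 8 / FEED_BITS_PER_S
days = seconds / 86_400         # seconds in a day

print(f"{days:.0f} days of continuous video")  # prints "463 days of continuous video"
```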

The algorithms used are:

1. Columnar transposition: write down the byte stream in rows of a given length, then read the result by columns. This separates bytes that are correlated because they are near each other in the file. Separation is maximized if the row length is the square root of the block length.
2. Lagged Fibonacci generator (LFG): write the last byte under the first byte, then fill in the rest of the second row from left to right, each new byte being the sum (mod 256) of the top and bottom bytes immediately to its left. The second row is the output. The process can be repeated as many times as desired, always producing a byte array of the same length as the original. As a result, every original byte affects all the bytes that follow it if the LFG is applied once, or all bytes in the output if it is applied two or more times.
3. Xor of the two halves of the byte stream. This makes it almost impossible to reverse the above processes, which are intrinsically reversible, provided the byte stream prior to this step has good randomness statistics. It also halves the length.
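The three operations above can be sketched in Python as follows. This is only one reading of the descriptions, not the page's actual code: the function names are made up for illustration, the LFG recurrence is inferred from the wording of item 2, and dropping the odd byte in `xor_halves` is an assumption.

```python
def transpose(data: bytes, row_len: int) -> bytes:
    """Write data in rows of row_len bytes, then read it out column by column.
    A short last row is allowed (irregular columnar transposition)."""
    return b"".join(data[c::row_len] for c in range(row_len))


def lfg(data: bytes) -> bytes:
    """One LFG pass as read from the description (an assumption):
    out[0] is the last input byte, out[i] = (data[i-1] + out[i-1]) mod 256."""
    if not data:
        return b""
    out = bytearray(len(data))
    out[0] = data[-1]
    for i in range(1, len(data)):
        out[i] = (data[i - 1] + out[i - 1]) % 256
    return bytes(out)


def xor_halves(data: bytes) -> bytes:
    """Xor the first half with the second; an odd trailing byte is dropped (assumption)."""
    half = len(data) // 2
    return bytes(a ^ b for a, b in zip(data[:half], data[half:2 * half]))


print(transpose(b"abcdefghi", 3))        # b'adgbehcfi'
print(list(lfg(bytes([1, 2, 3, 4]))))    # [4, 5, 7, 10]
print(list(xor_halves(bytes([1, 2, 3, 4]))))  # [2, 6]
```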

LFG operations tend to introduce spurious correlations that need to be ironed out by transposition. Also, bytes in a single LFG pass (except the last) only affect those that follow them. Therefore the basic algorithm is two LFGs with a transposition between them. The result is still reversible so, if you don't want that, apply the xor step as well (there's a checkbox for that below).
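To illustrate why the LFG, transposition, LFG combination is still reversible, here is a minimal round-trip sketch. It assumes the LFG recurrence `out[0] = in[last]`, `out[i] = in[i-1] + out[i-1] (mod 256)` and a row length equal to the integer square root of the block length; all function names are hypothetical.

```python
import math


def lfg(data: bytes) -> bytes:
    """Forward LFG pass (assumed recurrence, see lead-in)."""
    if not data:
        return b""
    out = bytearray(len(data))
    out[0] = data[-1]
    for i in range(1, len(data)):
        out[i] = (data[i - 1] + out[i - 1]) % 256
    return bytes(out)


def unlfg(data: bytes) -> bytes:
    """Inverse pass: in[i-1] = out[i] - out[i-1] (mod 256), in[last] = out[0]."""
    if not data:
        return b""
    src = bytearray(len(data))
    src[-1] = data[0]
    for i in range(1, len(data)):
        src[i - 1] = (data[i] - data[i - 1]) % 256
    return bytes(src)


def transpose(data: bytes, row_len: int) -> bytes:
    """Write in rows of row_len, read by columns."""
    return b"".join(data[c::row_len] for c in range(row_len))


def untranspose(data: bytes, row_len: int) -> bytes:
    """Put each column back where it came from."""
    n, pos = len(data), 0
    out = bytearray(n)
    for c in range(row_len):
        k = len(range(c, n, row_len))   # number of bytes in column c
        out[c::row_len] = data[pos:pos + k]
        pos += k
    return bytes(out)


def forward(block: bytes) -> bytes:
    row_len = max(1, math.isqrt(len(block)))  # assumed row length
    return lfg(transpose(lfg(block), row_len))


def reverse(block: bytes) -> bytes:
    row_len = max(1, math.isqrt(len(block)))
    return unlfg(untranspose(unlfg(block), row_len))


block = bytes(range(23))                 # 23 is the page's default block size
assert reverse(forward(block)) == block  # the basic algorithm can be undone
```

Anyone who knows the block size and row length can undo the basic algorithm this way, which is why the xor step matters when reversibility is unwanted.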

## Step 1. Key file input

We might begin by taking a file from the computer and loading it in the box below. Although invisible here, the file will be loaded as a byte array. Uncheck the "Show as image" box to save memory for large files.

We can concentrate its entropy by applying raw Deflate compression before using it. The file will often become smaller as a result, but the process may be slow with large files. Select this option before loading the file.
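For readers who want to reproduce the compression step outside the page, raw Deflate is available in Python's standard `zlib` module by passing a negative window size. This is a sketch of an equivalent operation; the compression level is an assumption, and the page's own compressor may produce different bytes for the same input.

```python
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Compress to a raw Deflate stream (no zlib header or checksum)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # wbits=-15 means raw Deflate
    return comp.compress(data) + comp.flush()


def raw_inflate(data: bytes) -> bytes:
    """Decompress a raw Deflate stream."""
    return zlib.decompress(data, -15)


sample = b"the same phrase, over and over. " * 100
packed = raw_deflate(sample)
assert raw_inflate(packed) == sample
print(len(sample), "->", len(packed), "bytes")
```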

No compression     Compress

### Key File

Show as image. If there is any problem with the file, a warning will appear here. As an image:

If we use the result to encrypt another file, we can save computation by processing only the number of bytes we need rather than the whole file:

Process only required bytes     Process whole file

We can process the whole thing in one block or split it into smaller blocks and then process each one. The box below sets the block size in bytes (0 means no processing; a negative value means processing everything at once).

### Block size

Allowed values are integers from 0 to the file length (default 23), plus negative integers.
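The block-size convention can be sketched as a small dispatcher. The name `process_in_blocks` and the `fn` callback are hypothetical, and letting the last block be shorter than the others is an assumption, since the page does not say how a leftover is handled.

```python
from typing import Callable


def process_in_blocks(data: bytes, block_size: int,
                      fn: Callable[[bytes], bytes]) -> bytes:
    """Apply fn per block: 0 = no processing, negative = one single block."""
    if block_size == 0:
        return data
    if block_size < 0 or block_size >= len(data):
        return fn(data)
    return b"".join(fn(data[i:i + block_size])
                    for i in range(0, len(data), block_size))


# Stand-in transformation, just to make the block boundaries visible.
flip = lambda block: block[::-1]

print(process_in_blocks(b"abcdefg", 3, flip))   # b'cbafedg'
print(process_in_blocks(b"abcdefg", -1, flip))  # b'gfedcba'
print(process_in_blocks(b"abcdefg", 0, flip))   # b'abcdefg'
```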

Adding a repeating sequence at the start of the process for the first block helps with files of poor randomness. If you check the Whitening box, the sequence will also be xored with each block after it is processed, which adds extra security if the sequence is kept secret.

### Initial sequence

Whitening        Preferred values are integers 0 to 255, separated by commas (default 1), though it will take any list of integers.
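The whitening step alone might look like the sketch below. It assumes that out-of-range integers are reduced mod 256 and that the sequence is simply repeated to the block length; the page does not spell out either detail, and the function names are illustrative.

```python
from itertools import cycle


def parse_sequence(text: str) -> bytes:
    """Turn '1, 2, 300' into bytes, reducing each integer mod 256 (assumption)."""
    return bytes(int(tok) % 256 for tok in text.split(",") if tok.strip())


def whiten(block: bytes, seq: bytes) -> bytes:
    """Xor a processed block with the sequence, repeated as needed."""
    if not seq:
        return block
    return bytes(b ^ s for b, s in zip(block, cycle(seq)))


seq = parse_sequence("1, 2, 300")
print(list(seq))                      # [1, 2, 44]
print(list(whiten(bytes(5), seq)))    # [1, 2, 44, 1, 2]
```

Because xor is its own inverse, applying `whiten` twice with the same sequence restores the block.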

We can keep reusing the key file so long as we "cut" it at a different point each time, just as a deck of cards is cut in two and the bottom part is then placed on top. The place for the cut is entered as a percentage of the total length, from 0% (no cut) to 100% (again, no cut), decimals included. The cut operation is applied right at the start.

### Cut location (percent)

%     Allowed values are 0 to 100. Decimals are OK.
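The cut is a rotation of the byte array. A minimal sketch, assuming the cut position is the percentage of the length rounded down to a whole byte (the page does not state the rounding rule):

```python
def cut(data: bytes, percent: float) -> bytes:
    """Move the first percent% of the bytes to the end, like cutting a deck."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    k = int(len(data) * percent / 100)  # rounding down is an assumption
    return data[k:] + data[:k]


print(cut(b"abcdefghij", 30))   # b'defghijabc'
print(cut(b"abcdefghij", 0))    # b'abcdefghij'
print(cut(b"abcdefghij", 100))  # b'abcdefghij'
```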

## Step 2. Plain file / encrypted file

Here we can optionally input a second file, which also loads as a byte array. If the key file is too small to encrypt this file, a warning will appear above. If the file is encrypted and the key file and all settings are the same as those used for encryption, it will be decrypted below. A cut operation using the percentage below will be applied to the plain file content at the start, and the reverse cut to the resulting encrypted material at the end of the process. To overcome the malleability of the encryption process, a message authentication code (MAC) is computed. The algorithm for this is explained below.

### Plain cut location (percent)

%     Allowed values are 0 to 100. Decimals are OK.
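The cut, xor, reverse-cut sequence described for the plain file can be sketched as below. The helper names are hypothetical and the keystream here is a stand-in, not real key material. Under these assumptions, running the same routine on the ciphertext with the same settings recovers the plaintext.

```python
def cut(data: bytes, percent: float) -> bytes:
    k = int(len(data) * percent / 100)   # rounding down is an assumption
    return data[k:] + data[:k]


def uncut(data: bytes, percent: float) -> bytes:
    """Reverse cut: move the last k bytes back to the front."""
    k = int(len(data) * percent / 100)
    n = len(data)
    return data[n - k:] + data[:n - k]


def xor_stream(data: bytes, keystream: bytes) -> bytes:
    if len(keystream) < len(data):
        raise ValueError("keystream is too short for this file")
    return bytes(d ^ k for d, k in zip(data, keystream))


def crypt(data: bytes, keystream: bytes, percent: float) -> bytes:
    """Cut, xor with the keystream, then reverse the cut."""
    return uncut(xor_stream(cut(data, percent), keystream), percent)


keystream = bytes(range(50, 60))   # stand-in keystream for the demo
plain = b"helloworld"
cipher = crypt(plain, keystream, 30)
assert crypt(cipher, keystream, 30) == plain   # same settings decrypt
```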

As with the Key file, the Plain file can be compressed in order to make the encrypted file smaller. If compression was used for encryption, we must use decompression for decryption.

No compression     Compress     Decompress

## Step 3. Keystream file

The final step is to generate the "keystream file" by applying the operations selected above to the key file. We can encrypt a plaintext file (or decrypt an encrypted file) as a bonus, as soon as the button below is clicked. We can use the regular forward process or its reverse. The forward process is much better than the reverse at creating randomness.

Forward     Reverse     Reverse & decompress

And now, the all-important button. To make the process non-reversible, keep the first box checked, which splits the keystream into two halves, applies a transposition to the first, and xors it with the second.
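The non-reversible step could be sketched as follows. The row length used for the transposition of the first half is not given above, so the integer square root of the half length is assumed here, and an odd trailing byte is dropped.

```python
import math


def transpose(data: bytes, row_len: int) -> bytes:
    """Write in rows of row_len, read by columns."""
    return b"".join(data[c::row_len] for c in range(row_len))


def make_non_reversible(keystream: bytes) -> bytes:
    """Split in two halves, transpose the first, xor it with the second."""
    half = len(keystream) // 2
    first, second = keystream[:half], keystream[half:2 * half]
    row_len = max(1, math.isqrt(half))   # assumed row length
    return bytes(a ^ b for a, b in zip(transpose(first, row_len), second))


# With an all-zero second half, the output is just the transposed first half.
print(list(make_non_reversible(bytes([1, 2, 3, 4, 0, 0, 0, 0]))))  # [1, 3, 2, 4]
```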

Non-reversible      Randomness tests

And here is the resulting keystream file, followed by some analysis of its randomness. You can save it with the button:

### Keystream file

Information about keystream quality will appear here. As an image:

## Step 4. Encrypted / decrypted file

And here is the second file after its bytes have been xored with an equal number of bytes from the keystream file, starting from the beginning. There is a button to save this one too. For added security, record the MAC of the input file when encrypting and send it along with the encrypted file. The MAC displayed for the decrypted file should be the same. The algorithm that makes the MAC is described below.

### Encrypted/decrypted file

This is how the MAC is made:

1. Take the initial byte sequence, expand it to 32 bytes, and xor it with the first 32 bytes of the input file (no cut).
2. Apply the LFG and then a transposition with row length 6.
3. Xor the result with 32 bytes of the key bytes (after the cut) and apply another LFG.
4. Repeat the process for all the remaining file bytes, using the previous 32-byte result instead of the initial sequence, and the following 32 bytes of the plain and key files, padded to 32 bytes with zeros if necessary.
5. At the end, split the result in the middle and xor the two halves to get 16 bytes, which are then converted to hexadecimal for display.
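A rough Python sketch of the MAC procedure, under several assumptions: the initial sequence is expanded by repetition, the LFG recurrence is `out[0] = in[last]`, `out[i] = in[i-1] + out[i-1] (mod 256)`, and the key bytes passed in have already been cut. It is not guaranteed to reproduce the values shown on the page, and all names are illustrative.

```python
BLOCK = 32


def lfg(data: bytes) -> bytes:
    """Assumed LFG pass: out[0] = last byte, out[i] = data[i-1] + out[i-1] mod 256."""
    out = bytearray(len(data))
    out[0] = data[-1]
    for i in range(1, len(data)):
        out[i] = (data[i - 1] + out[i - 1]) % 256
    return bytes(out)


def transpose(data: bytes, row_len: int) -> bytes:
    """Write in rows of row_len, read by columns (short last row allowed)."""
    return b"".join(data[c::row_len] for c in range(row_len))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(chunk: bytes) -> bytes:
    """Stretch a short chunk to 32 bytes with zeros."""
    return chunk.ljust(BLOCK, b"\x00")


def mac(plain: bytes, key: bytes, seq: bytes = b"\x01") -> str:
    """plain is the uncut input file; key is assumed to be already cut."""
    state = bytes(seq[i % len(seq)] for i in range(BLOCK))  # expand by repetition (assumption)
    for off in range(0, len(plain), BLOCK):
        state = xor(state, pad(plain[off:off + BLOCK]))
        state = transpose(lfg(state), 6)
        state = xor(state, pad(key[off:off + BLOCK]))
        state = lfg(state)
    return xor(state[:16], state[16:]).hex()


tag = mac(b"message to authenticate", bytes(range(64)))
print(tag)   # 32 hexadecimal characters
```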