About a year ago, I added to PassLok and its derivatives a very secure algorithm for image steganography. It was presented at the ForenSecure 2017 conference on cybersecurity and forensics, but it just dawned on me that I didn’t post anything about it on this blog for those who may not have attended that conference. I believe that, one year later, this method is still the reigning world champion for image steganography. This article explains how it works, hopefully in a form that is easy to understand, and includes a sample program and some sample results.
“Steganography” is the art of hiding things in plain sight. This Wikipedia article tells you a lot about the science, which goes way back, even though computers have made it much more powerful in recent times. One of the most common forms of steganography uses digital images as containers for secret data, because images are fairly easy to post and transmit, and typically no one suspects them. It is used more than people think. For instance, it is said that Al-Qaeda communicated with its operatives for years through images posted on pornographic websites. Who knows how much secret material is being sent back and forth, right now, through Facebook or Instagram…
A key quality of a good steganographic method is undetectability, that is, that no one can tell whether a particular picture (or audio file, or whatever) contains a secret payload, even if they are scanning it for this very purpose and know what method was used to hide the payload. This last condition is also known as Kerckhoffs’s Principle. It follows that a good method must include a secret key or password, shared by the maker of the picture and those wanting to retrieve the hidden payload, for otherwise everyone knowing the method would be able to extract that payload.
My own PassLok, URSA, and SeeOnce encryption apps can conceal encrypted messages inside text that does not look encrypted (encrypted text usually looks like a jumble of gibberish letters), but this concealment does not satisfy Kerckhoffs’s Principle because no key is involved, so anyone can retrieve the encrypted data simply by using any of those apps. It would also be easy to write single-purpose scanning apps that just detect the presence of hidden content. The text-concealment feature is included in my apps only to hide messages from casual inspection by human eyes, which is why my apps disable it except when messages are already encrypted by the main algorithm.
But PassLok and URSA can also conceal data in images, and in this case an additional key (optional, but omit it at your own risk) is involved in the process. The algorithm was also implemented as a standalone app, whose output is not compatible with PassLok and URSA because it does not contain the highly secure encryption algorithms specific to those apps. Nevertheless, it is quite secure in its own right, and this is why I am writing this post, in case you want a lightweight app that does only image steganography but is very good at it. I will next explain how it works, pointing out differences from PassLok and URSA as they come up.
This explanation will be easier to follow if you try the app yourself as you read. You can load it by clicking this link. Once you have it, you can save it to your computer and it will run fine, because all processing is done locally, without connecting to any machine over the Internet.
First, a little theory on how image steganography works. Digital images are made of little pixels, each of which has a red, a green, and a blue component if the image is in color. Some formats, such as GIF and PNG, add a fourth component for transparency. Other formats, like JPEG, split the image into small blocks (8×8 pixels) and store each block as a set of frequency values rather than as raw pixels, which typically results in a much smaller file. To hide something in the pixels or block data (called “coefficients” in JPEG), the most common strategy is to alter the values ever so slightly so that, for instance, an even value means a binary zero and an odd value means a binary one. The resulting image will be different from the original, but human eyes are not keen enough to detect the difference. Human ears, on the other hand, are quite sensitive to this kind of manipulation being done on sound data, and this is why sound files are rarely used to conceal hidden data.
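To make the even/odd idea concrete, here is a minimal Python sketch that treats a list of 8-bit channel values as the cover. It is not PassLok's code: the function names are hypothetical, the values are used in plain order instead of a key-driven order, and a needed change always lowers the value, which is the simplest possible choice.

```python
from typing import List


def extract_bits(values: List[int]) -> List[int]:
    """Read one hidden bit per value: even means 0, odd means 1."""
    return [v % 2 for v in values]


def embed_bits_naive(values: List[int], bits: List[int]) -> List[int]:
    """Give each value the parity of its bit, always lowering it when needed.

    A value of 0 cannot go lower, so it is raised to 1 instead; everything
    else moves down by one, which darkens the picture slightly on average.
    """
    if len(bits) > len(values):
        raise ValueError("more bits than cover values")
    out = list(values)
    for i, bit in enumerate(bits):
        if out[i] % 2 != bit:
            out[i] = out[i] - 1 if out[i] > 0 else 1
    return out


pixels = [200, 201, 57, 0, 128]
stego = embed_bits_naive(pixels, [1, 1, 0, 1, 0])
print(stego)                # [199, 201, 56, 1, 128]
print(extract_bits(stego))  # [1, 1, 0, 1, 0]
```

Only three of the five values move, each by a single unit, which is far below what the eye can notice.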
A program trying to detect hidden data in a picture will fare a lot better than human eyes, so long as artifacts are left or statistics are changed. This is why pictures containing large areas of uniform color are not good covers for hidden data. The same can be said of the transparency (“alpha”) channel of GIF and PNG images, which in the original tends to be pegged at fully opaque or fully transparent, so that a change in the least significant bit would stick out to a program, though not to human eyes. If the hiding method consists of lowering the pixel value in order to make it odd or even, the result will be a generalized darkening of the picture that will be detectable if the original picture is known. This is why a good method will lower the value half the time and raise it half the time, so that on average there is no change in luminosity. For a JPEG image, the process is applied to block “coefficients” that can be positive, negative, or zero. The problem here is that the distribution of coefficient values is not uniform, but rather is shaped like a “bell curve” (seen at left), where zeros dominate by far. A method that reduces the number of zeros will be easily detectable, but so will a method that creates new zeros (which has the additional difficulty of determining whether those zeros are odd or even).
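The half-up, half-down rule can be sketched as follows (this is commonly called ±1 embedding or “LSB matching”). It is a hypothetical illustration, not the app's implementation; `rng` is any seeded `random.Random`, standing in for the key-driven generator a real tool would use.

```python
import random
from typing import List


def embed_bits_pm1(values: List[int], bits: List[int], rng: random.Random) -> List[int]:
    """Give each value the parity of its bit, moving it up or down by one at random.

    Raising and lowering are equally likely, so average brightness is unchanged
    in expectation; only the ends of the 0..255 range are forced one way.
    """
    if len(bits) > len(values):
        raise ValueError("more bits than cover values")
    out = list(values)
    for i, bit in enumerate(bits):
        if out[i] % 2 != bit:
            if out[i] == 0:
                out[i] = 1
            elif out[i] == 255:
                out[i] = 254
            else:
                out[i] += rng.choice((-1, 1))
    return out
```

Roughly half of the values already have the right parity, and the rest move up or down with equal probability, so the average shift stays near zero. Always lowering instead would shift the average by about -0.5 units per value.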
The previous reigning champion for JPEG steganography is F5, developed by Andreas Westfeld and published in 2001. The figure on the right shows the general strategy. The idea is to disperse the changes throughout the image while minimizing the total number of changes. Nevertheless, F5 tends to create extra zeros, so that the histogram of coefficients of the modified image looks different from that of the original, and this makes it detectable, as shown in this article, though the hidden data still cannot be extracted without knowing the key. PassLok improves on F5 by adding two tricks:
- The definition of “odd” and “even” switches when the number is negative (thus, 1 is “odd,” but -1 is “even”), so that zero coefficients are not modified, and no extra zeros are created.
- Coefficients are sometimes changed by making their absolute value larger and sometimes by making it smaller, the choice being made at random with a probability “y” that depends on how the histogram of coefficients of the original image decays with absolute value. The figure below explains how this is calculated.
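The two tricks above can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions, not PassLok's source: coefficients are visited in plain order (the app disperses them with a keyed generator), the probability “y” is supplied by the caller as a function of |c| because the post's formula is in a figure not reproduced here, and when a coefficient with |c| = 1 must be “made smaller,” the sketch flips its sign instead (1 to -1 or back), which the sign-aware parity rule turns into a valid bit change without creating a zero.

```python
import random
from typing import Callable, List


def coef_bit(c: int) -> int:
    """Bit carried by a nonzero coefficient, with parity switched for negatives.

    1 counts as "odd" (bit 1) but -1 counts as "even" (bit 0), so along the
    number line with zero removed (..., -2, -1, 1, 2, ...) the bits alternate
    1, 0, 1, 0 and any one-step move flips the bit.
    """
    if c > 0:
        return c % 2
    return 1 - (-c) % 2


def embed_coefficients(coefs: List[int], bits: List[int],
                       y: Callable[[int], float], rng: random.Random) -> List[int]:
    """Hide bits in nonzero coefficients; zeros are skipped and never created."""
    out = list(coefs)
    k = 0  # index of the next bit to hide
    for i, c in enumerate(out):
        if k == len(bits):
            break
        if c == 0:
            continue  # zero coefficients carry no data and are left alone
        if coef_bit(c) != bits[k]:
            step = 1 if c > 0 else -1
            if rng.random() < y(abs(c)):
                out[i] = c + step   # make |c| larger
            elif abs(c) > 1:
                out[i] = c - step   # make |c| smaller
            else:
                out[i] = -c         # |c| == 1: sign flip (assumption) instead of a zero
        k += 1
    if k < len(bits):
        raise ValueError("not enough nonzero coefficients for this payload")
    return out


def extract_coefficients(coefs: List[int], n_bits: int) -> List[int]:
    """Read the bits back from the nonzero coefficients, in the same order."""
    return [coef_bit(c) for c in coefs if c != 0][:n_bits]
```

Since no coefficient ever becomes zero or leaves zero, the positions of the nonzero coefficients are the same before and after embedding, which is what lets the extractor find the bits again.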
The result is that the histograms of coefficients in the original and the modified pictures are nearly identical. Even better, if we edit the modified picture in any way (which will erase all the hidden data), the result has the same histogram as if the same editing had been done on the original. Below is a series of pictures showing the histogram of coefficients at increasing magnification. You need to blow it up 1000 times before you begin to see any difference between the histograms of the original and the modified images. Any editing of the pictures has more effect on the histogram than hiding data does. If you want to see the slides for my ForenSecure 2017 talk, you can get them here.
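For contrast, the sketch below applies a simplified decrement-only rule in the spirit of the F5 behavior described earlier: a mismatched coefficient always has its magnitude lowered, and when that produces a zero, the bit has to be re-embedded in the next coefficient. It is a hypothetical illustration (no matrix encoding, no key-driven order, and it borrows the sign-aware parity from the list above), paired with a small `histogram_report` helper that gives the relative change in each histogram bin, a numeric stand-in for the magnified histograms in the pictures.

```python
import random
from collections import Counter


def coef_bit(c):
    """Sign-aware parity: 1 -> 1, -1 -> 0, 2 -> 0, -2 -> 1 (zeros carry nothing)."""
    return c % 2 if c > 0 else 1 - (-c) % 2


def embed_decrement(coefs, bits):
    """Hypothetical decrement-only embedding, simplified from the F5 idea.

    A mismatched coefficient always moves one step toward zero. If it lands
    on zero ("shrinkage"), the bit is lost and is retried on the next
    nonzero coefficient, so the number of zeros grows.
    """
    out = list(coefs)
    k = 0
    for i, c in enumerate(out):
        if k == len(bits):
            break
        if c == 0:
            continue
        if coef_bit(c) != bits[k]:
            out[i] = c - 1 if c > 0 else c + 1
            if out[i] == 0:
                continue  # shrinkage: keep the same bit for the next coefficient
        k += 1
    if k < len(bits):
        raise ValueError("not enough nonzero coefficients for this payload")
    return out


def histogram_report(original, modified, max_abs=3):
    """Relative change in the count of each value in [-max_abs, max_abs]."""
    h0, h1 = Counter(original), Counter(modified)
    return {v: (h1[v] - h0[v]) / h0[v]
            for v in range(-max_abs, max_abs + 1) if h0[v]}
```

On synthetic bell-shaped data, the zero bin of the report comes out positive, which is the kind of histogram distortion that the sign-aware, two-direction method in the list above is designed to avoid.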
You may ask, what picture and what data are we talking about? The picture is a larger version of the one on the right, which is from the 1967 musical “Oliver!” You can get the original file from this link, and the modified file from this other link (about 2.2 MB each). If you click them, they should appear on a new tab, from where you can save them to disk by right-clicking. If you now load the modified picture into the app, and type the password “oliver twist” (without the quotes) in the small box for that purpose, then uncheck “Smart Pwd.” and click “Decrypt,” what do you get after a few seconds of processing?
That’s right, you get the entire gutenberg.org version of Charles Dickens’s “Oliver Twist.” This bears repeating: when you look at the modified picture, you are looking at the entire novel, more than 500 pages in paperback, right there before your eyes. Can you see it? Well, a computer looking for it can’t see it either. Now, 2.2 MB is a bit large for an image on a website, and this is why I’ve put it on a download link rather than directly on the post, but it is a normal size for a picture taken with a cell phone or sent in an email. And this is a JPEG image. A PNG image, which is the other format that the app can hide data into, can contain an order of magnitude more hidden material. The algorithm for hiding into PNG is similar, except that, since the least significant bits of pixel values are expected to be more evenly distributed than JPEG coefficients, the odd-even trick mentioned above is not necessary. In both formats, the process begins by encrypting the data, XORing it with the output of the same pseudo-random number generator that is used to disperse the data among the pixels or coefficients, so that the data to be hidden has quasi-random statistics. This step is not necessary when hiding already encrypted material, and this is where this program differs from PassLok and URSA.
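The whitening and dispersal steps can be sketched as below. Everything here is an assumption for illustration: the post does not say which generator the app uses, so the sketch seeds Python's `random.Random` from a SHA-256 hash of the password. That generator is not cryptographically secure, and a real tool would also need to store the message length, which this sketch passes in explicitly. The point it shows is that one keyed generator both scrambles the payload (XOR) and chooses which slots carry it, so the same password undoes both.

```python
import hashlib
import random
from typing import List, Tuple


def keyed_rng(password: str) -> random.Random:
    """Hypothetical keyed generator: Python's PRNG seeded from SHA-256(password).

    random.Random is NOT cryptographically secure; it only stands in here.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest, "big"))


def prepare(data: bytes, password: str, n_slots: int) -> Tuple[List[int], List[int]]:
    """Return (whitened bits, slot positions) for hiding data among n_slots values."""
    rng = keyed_rng(password)
    # XOR each byte with generator output so the hidden bits look random.
    whitened = bytes(b ^ rng.getrandbits(8) for b in data)
    bits = [(byte >> (7 - j)) & 1 for byte in whitened for j in range(8)]
    if len(bits) > n_slots:
        raise ValueError("cover too small for this payload")
    # The same generator then scatters the bits over the cover.
    order = list(range(n_slots))
    rng.shuffle(order)
    return bits, order[:len(bits)]


def recover(slot_bits: List[int], password: str, n_bytes: int) -> bytes:
    """Undo prepare(): replay the generator, gather the bits, and XOR again."""
    rng = keyed_rng(password)
    pads = [rng.getrandbits(8) for _ in range(n_bytes)]
    order = list(range(len(slot_bits)))
    rng.shuffle(order)
    bits = [slot_bits[s] for s in order[:8 * n_bytes]]
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for bit in bits[8 * i:8 * i + 8]:
            byte = (byte << 1) | bit
        out.append(byte ^ pads[i])
    return bytes(out)
```

In a complete tool, `slot_bits` would be the bits read back from the cover values (or coefficients) at each slot, using rules like the ones sketched earlier.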
The entire novel is also encoded into this PNG image, which derives from this original (thumbnail at right), with the same password as before. Observe that, even though the image is black and white and has large areas of a single color, it is still pretty much impossible to tell that anything is in there.
As it turns out, the hiding operation does not use all the capacity of the image (especially in PNG), so I’ve added a second pass that encrypts additional material into whatever space is left. To reveal the doubly hidden data, you need to add a vertical bar character “|” after the first password, followed by the password for the additional material. It can be different, but in the samples I’m showing here both passwords are the same. The JPEG image contains a few lines from chapter 1 of “Oliver Twist,” while the PNG image contains the whole of that chapter, encoded the same way. To retrieve it, you must have both passwords right.
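Here is one hypothetical way to organize the second pass, for illustration only: the combined entry is split at the first “|”, and the second payload is scattered, with its own keyed shuffle, over the slots the first pass left unused. In this sketch the second pass needs the first pass's slot list, which is one way the requirement to have both passwords right could arise; the post does not describe the app's actual layout, and the helper names and SHA-256 seeding are assumptions.

```python
import hashlib
import random
from typing import List, Optional, Tuple


def split_passwords(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'first|second' at the first bar; no bar means no second pass."""
    first, bar, second = entry.partition("|")
    return first, (second if bar else None)


def second_pass_slots(n_slots: int, first_pass_slots: List[int],
                      second_password: str) -> List[int]:
    """Keyed order of the slots left free by the first pass (hypothetical)."""
    used = set(first_pass_slots)
    free = [s for s in range(n_slots) if s not in used]
    digest = hashlib.sha256(second_password.encode("utf-8")).digest()
    random.Random(int.from_bytes(digest, "big")).shuffle(free)
    return free
```

For the samples in this post, the entry would be `oliver twist|oliver twist`, which splits into two identical passwords.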
Let me finish this post by telling you how to enhance the carrying capacity of a given image. The first requirement is lots of pixels, as you likely guessed. You can increase the pixel count by resampling the image in Photoshop or a similar program, though an image taken directly with a camera is not likely to need much enlargement. Next, it is best if the image contains many different colors rather than solid areas of color. Noisy pictures are the best, and you can add noise in Photoshop quite easily. Finally, since hiding a lot of data tends to decrease the perceived contrast somewhat, it is best to increase the contrast a bit before hiding the data. Any picture manipulation must be done before adding the hidden data, since editing the picture changes pixel and coefficient values and thus erases the hidden data.
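The effect of pixel count and noise on capacity can be estimated with simple arithmetic. The sketch below gives raw upper bounds under simplifying assumptions that are not taken from the app: one hidden bit per RGB value for PNG (alpha excluded, since it is a poor carrier), and one bit per nonzero coefficient for JPEG (zero coefficients are never modified under the odd/even rule described earlier). The real capacity is lower, since the hiding operation does not use all of it.

```python
def png_capacity_bytes(width: int, height: int, channels: int = 3) -> int:
    """Upper bound, assuming one hidden bit per color value (alpha excluded)."""
    return width * height * channels // 8


def jpeg_capacity_bytes(coefficients) -> int:
    """Upper bound, assuming one hidden bit per nonzero coefficient."""
    return sum(1 for c in coefficients if c != 0) // 8


# A 4000 x 3000 photo: 36 million color values -> at most 4.5 million bytes.
print(png_capacity_bytes(4000, 3000))  # 4500000
# Doubling width and height quadruples the bound.
print(png_capacity_bytes(8000, 6000))  # 18000000
```

Noise helps JPEG capacity in the same way: a noisy picture has more nonzero coefficients, and under this rule each one is a potential carrier.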