You’ve heard it before: “This is as likely as a monkey sitting on a typewriter writing Shakespeare.” It sounds very unlikely but . . . how unlikely, exactly? In this article, I go through the math and use the result to estimate how likely it is for life to have arisen spontaneously out of a primordial soup of chemicals.
I won’t bore you with how this whole trope of monkeys and typewriters came about. It has been used for quite a long time (that’s why typewriters, not laptops, are mentioned) and Wikipedia has a couple of excellent articles on it. I’d start with this one. Succinctly, the most popular version says that an infinite number of monkeys typing randomly for an infinite time can produce any work of literature, because the probability of getting it right, even if small, is not zero, and they have unlimited time. The original version, attributed to Émile Borel, referred to the probability of a substance departing from its most probable thermodynamic macrostate, even for a short time. Borel described it in terms of thousands of monkeys typing complete libraries in his 1913 essay “Mécanique Statistique et Irréversibilité”.
But again, can we get a number, please? Here’s my shot at calculating the probability of our simian friend typing the Gutenberg.org version of “Hamlet.”
After removal of its header and the legal gobbledygook at the end, the Gutenberg.org text version of “Hamlet” contains 144048 non-space characters, or 172957 including spaces, in 4690 lines grouped into 3266 paragraphs. Characters include all 26 letters, lowercase and uppercase, plus these punctuation signs: !()-;:'”,.? totaling 11 additional characters, plus spaces and newlines (not counted among those 11). We understand that spaces will be necessary, as well as carriage returns at the end of each line, for a total of 172957 + 4690 = 177647 characters. We have to discount 384 underscores that are unnecessary. Brackets are assumed to be identical to parentheses. There is one Latin diphthong, to be split as two letters. No numerals. Thus, the total character count is: 177647 – 384 = 177263. The number of different characters, including spaces and linefeeds, is 26×2 + 11 + 2 = 65, but this is not going to matter for our calculation.
My American English keyboard contains 46 main keys, plus 46 more after shifting, plus space and carriage return, for a total of 94, including all the above-mentioned characters plus some that don’t appear in “Hamlet.” Every time the monkey presses a key, it has to be the right one. Assuming every key is equally likely to be hit, the probability of “Hamlet” being the result is 1/94 raised to the power 177263, that is, (1/94)^177263.
My calculator overflows trying to compute this number, so here’s a trick. I can multiply and divide by 100, so the number becomes (100/94)^177263 * 100^(-177263). Unfortunately, the first factor is still indigestible to the calculator, so we need to split it, this way:
((100/94)^1000)^177 * (100/94)^263 * 10^(-177263 * 2)
= (7.44983 * 10^26)^177 * 11678162.2 * 10^(-354526)
= 7.44983^177 * 10^4602 * 1.16781622 * 10^7 * 10^(-354526)
= 2.3437224 * 10^154 * 1.16781622 * 10^(4602 + 7 - 354526)
= 2.73708 * 10^(-349763)
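For readers who would rather not split the power by hand, here is a minimal cross-check in Python. It is only a sketch, not part of the original calculation: it assumes the standard-library decimal module, whose default exponent range is wide enough to hold a number this small, and the names KEYS, LENGTH and probability are mine.

```python
from decimal import Decimal

KEYS = 94        # keys on the keyboard, as counted in the text
LENGTH = 177263  # characters in "Hamlet", as counted in the text

# Decimal keeps the exponent as a separate integer, so the result
# does not underflow to zero the way an ordinary float would.
probability = Decimal(1) / Decimal(KEYS) ** LENGTH

# Print in scientific notation with three decimals in the mantissa.
print(f"{probability:.3E}")
```

The printed mantissa and exponent can be compared directly against the last line of the hand calculation.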
Another way to get the number is through the use of logarithms. The decimal logarithm of the number we’re looking for would be – 177263 * log10(94) = – 177263 * 1.973128 = – 349762.562713. Raising 10 to this exponent we obtain 10^(1 – 0.562713 – 349763) = 10^0.437287 * 10^(-349763) = 2.73708 * 10^(-349763), same as before.
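The logarithm route is easy to turn into a small reusable helper. The sketch below is my own illustration (the function name tiny_power is hypothetical): it takes the number of keys and the length of the text, and returns the mantissa and the power of ten separately, so nothing ever overflows.

```python
import math

def tiny_power(keys: int, length: int) -> tuple[float, int]:
    """Return (mantissa, exponent) with (1/keys)**length ~ mantissa * 10**exponent."""
    log10_p = -length * math.log10(keys)   # decimal logarithm of the probability
    exponent = math.floor(log10_p)         # integer power of ten
    mantissa = 10 ** (log10_p - exponent)  # leftover factor, between 1 and 10
    return mantissa, exponent

mantissa, exponent = tiny_power(94, 177263)
print(f"{mantissa:.3f} * 10^({exponent})")  # prints 2.737 * 10^(-349763)
```

Taking the floor of the logarithm is the same move as the "1 – 0.562713" step above: it pushes the fractional part into the range where the mantissa lands between 1 and 10.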
In linear form, this would be zero, point, then 349762 zeros, and then 273708. Quite a bit to write. If we assume that one page of printed numbers will have roughly 1250 characters (250 words * 5 characters per word), it would take 280 pages to write it, all but the last filled with zeros, making a book longer than “Hamlet” itself.
But maybe we can get a clean copy faster with a whole bunch of monkeys teaming up to produce it. Assuming we have a monkey sitting on every atom in the universe (10^80 of them according to recent estimates), cranking out a full copy one million times per second (these monkeys are pretty fast typists, thanks to an endless supply of Coca-Cola), it would take, on average, (1/2.737) * 10^(349763 – 80 – 6) = 3.654 * 10^349676 seconds, or roughly 1.16 * 10^349669 years, which is about 8.4 * 10^349658 times the age of the universe (currently estimated at 13.77 billion years). Sorry, teamwork. I don’t think anyone, not even the Coca-Cola company, would invest in this venture.
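The waiting-time estimate can be reproduced entirely in logarithms. This is a sketch under assumptions of my own: a year of 365.25 days, an expected wait of 1/p attempts, and hypothetical names such as sci and LOG10_RATE.

```python
import math

LOG10_P = -177263 * math.log10(94)     # log10 of one attempt's success probability
LOG10_RATE = 80 + 6                    # 10^80 monkeys, each typing 10^6 copies per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year
AGE_OF_UNIVERSE_YEARS = 13.77e9        # estimate quoted in the text

def sci(log10_value: float) -> tuple[float, int]:
    """Split a decimal logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent

# Expected wait = (1 / p) attempts, divided by attempts per second.
log10_seconds = -LOG10_P - LOG10_RATE
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
log10_ages = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

for label, value in [("seconds", log10_seconds),
                     ("years", log10_years),
                     ("ages of the universe", log10_ages)]:
    m, e = sci(value)
    print(f"{m:.3f} * 10^{e} {label}")
```

Working with logarithms throughout means every division becomes a subtraction, so the numbers stay well inside what an ordinary float can hold.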
So, yes, this is a very small probability. But it is still rather large compared to the probability of a tepid cup of coffee getting hotter where you drink it (at the expense of getting cooler away from that spot) for only a microsecond, which is what Borel wanted to illustrate. This is not impossible, but it would take a whole army of monkeys on typewriters cranking out all the major works of Western literature without flaw or error.
Now that we have a number and a process to arrive at it, maybe we can apply this to some other situation of interest. One of them is the origin of life on earth (or wherever, if life came here riding an asteroid, as some have suggested). Granted that Darwinian evolution can cause living beings to differentiate and become more sophisticated over time, there is still the problem that the first living organism, which cannot possibly have evolved from a previous non-living entity because evolution requires life, seems to have been itself quite complex (note: there’s speculation about “evolution” of molecules in the non-living prebiotic world, but this is not Darwinian evolution). Ongoing research on the subject has posited the existence of a single-cell progenitor of all life currently here, named Last Universal Common Ancestor, LUCA for short. Now, this may or may not be the first cell. It seems, however, that the lineage of all other contemporary competitors of LUCA has become extinct. The estimate is that LUCA’s genome contained at least 355 genes, because this is how many different proteins seem to have been present in its tiny body, all of them having survived, more or less mutated, to our day. It seems that LUCA had DNA (could have been RNA, but this makes little difference) hundreds of thousands of base pairs long.
Perhaps a better estimate of the first living cell’s complexity can be obtained by removing genetic material from a modern cell and seeing what we can get away with before the cell, or its lineage, is no longer alive. Of course, it is not likely that we would chance into anything resembling the historical first living cell by following this method (among other things because the earth’s environment back then was very different from what today is an environment friendly to life), but maybe we’ll get a good estimate of its complexity, since the first living cell had to perform pretty much the same functions in order to stay alive and pass on the trick to its descendants. There’s an excellent Wikipedia article on Minimal Genome that discusses the history of this effort, which has been going on for several decades. The current minimum genome champion, a synthetic cell named JCVI-syn3A, has a genome consisting of 543 kbp in 493 genes. The logical unit here is the “base pair” (bp for short), which is equivalent to 2 binary bits because the base pair language is base-4. Here our monkey would have a typewriter with only four keys: A, G, C, T, corresponding to the four bases. The probability of getting the syn3A genome right in a single typing session is therefore (1/4)^543000. The decimal logarithm of this probability is -log10(4) * 543000 = -0.60206 * 543000 = -326918.575291. Raising 10 to this power in order to reconstruct the probability we get 10^(-326918.575291) = 10^(1 – 0.575291 – 326919) = 10^0.424709 * 10^(-326919) = 2.659 * 10^(-326919).
This is actually a lot easier to type than a single copy of Hamlet. How much easier? (2.659 * 10^(-326919)) / (2.73708 * 10^(-349763)) = 9.71 * 10^22843 times, that is, gazillions and gazillions of times easier. Which isn’t saying much, actually. Because if we put our team of highly caffeinated primates on this task, they still take quite a while. To be exact, (1/2.659) * 10^(326919 – 80 – 6) = 3.761 * 10^326832 seconds, or roughly 8.655 * 10^326814 times the age of the universe.
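The genome figures can be recomputed the same way. The sketch below is my own illustration with hypothetical helper names; it assumes a year of 365.25 days and reuses the 10^80 monkeys at 10^6 attempts per second from the “Hamlet” scenario.

```python
import math

def log10_probability(keys: int, length: int) -> float:
    """Decimal logarithm of (1/keys)**length."""
    return -length * math.log10(keys)

def sci(log10_value: float) -> tuple[float, int]:
    """Split a decimal logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent

log10_hamlet = log10_probability(94, 177263)  # 94 keys, 177263 characters
log10_genome = log10_probability(4, 543000)   # 4 bases, 543 kbp

# How many times more likely the genome is than "Hamlet".
ratio = sci(log10_genome - log10_hamlet)

# Expected wait with 10^80 monkeys making 10^6 attempts per second each.
log10_seconds = -log10_genome - (80 + 6)
seconds = sci(log10_seconds)
ages = sci(log10_seconds
           - math.log10(365.25 * 24 * 3600)   # seconds per year (assumed)
           - math.log10(13.77e9))             # age of the universe in years

print("genome vs Hamlet: %.2f * 10^%d times more likely" % ratio)
print("expected wait: %.3f * 10^%d seconds" % seconds)
print("expected wait: %.3f * 10^%d ages of the universe" % ages)
```

Note which probability goes in the numerator: "how much easier" is the genome's probability divided by Hamlet's, which in logarithms is a single subtraction.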
These chances are not unlike those of getting a car to reassemble from its loose parts by throwing them in a bin and shaking. Except that car parts might snap together one by one when they align correctly, whereas there’s no good reason why segments of a partly correct genome might remain in their happenstance positions in the absence of natural selection rewarding them.