My post on Monkeys, Typewriters, and the Origin of Life has a very active comment stream, but unfortunately WordPress has stopped admitting more long comments, hence this new post where I provide increasingly refined estimates on the numbers of monkeys on typewriters needed to come up with life without anyone guiding their fingers.

In the previous post, I came up with a probability for the spontaneous emergence of life based on the current champion for viable minimal genome, an artificial bacterium named JCVI-syn3A, which counts 493 genes distributed over roughly 543000 base pairs. The probability of a not particularly smart monkey on a 4-key typewriter (one key for each possible base pair) coming up with the genome of this bacterium is 1/4ˆ543000 = 2.659 * 10ˆ(-326919), which needs some special tricks to calculate because pocket calculators tend to overflow.
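The special trick is simply to work with logarithms instead of the number itself. Here is a minimal Python sketch of that idea; the helper name `sci_from_log10` is made up for this illustration, and the genome size is the rounded figure quoted above.

```python
import math

def sci_from_log10(log10_value):
    """Split a base-10 logarithm into (mantissa, exponent) so that
    value = mantissa * 10**exponent with 1 <= mantissa < 10."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent

GENOME_LENGTH = 543_000  # base pairs in JCVI-syn3A (rounded, as quoted above)
KEYS = 4                 # keys on the typewriter

# log10(1 / 4**543000) = -543000 * log10(4); this never overflows
log10_raw = -GENOME_LENGTH * math.log10(KEYS)
mantissa, exponent = sci_from_log10(log10_raw)
print(f"raw probability ~ {mantissa:.3f} * 10^({exponent})")
```

The same mantissa-and-exponent split works for every other huge or tiny number in this post, as long as the calculation is carried out on the logarithms.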

In his comments to my post, Pete D. complains that maybe a life form that is genetically not exactly the same as syn3A could be just as viable, so there should be a (perhaps quite large) numerator in my formula for the probability, rather than just 1. In this article, I calculate three kinds of non-exclusive numerator, each of which could conceivably multiply the probability. These are:

- The genes don’t have to be necessarily in a unique order within the genome, so permutations of one gene sequence could be just as viable.
- Each gene could have variants (technically named “alleles”), that are equally viable.
- Finally, and this is Pete’s suggestion, the working genes could be immersed in a much longer string of DNA (or RNA, this doesn’t matter for the calculation), where most of the base pairs don’t code for a gene. This would lead to many more variations.

In the following, I will attempt to calculate these factors, using as a starting point the numbers for syn3A plus some assumptions from other places. For the putative larger DNA, I will use E. coli per Pete’s suggestion, which has roughly 4.6 million base pairs. Let’s go!

Factor 1: gene permutations

Let’s say the 493 genes of syn3A can be found in any order (ignore the fact that prokaryotic DNA forms a ring rather than a linear segment). Then there would be 493! ways to do this. Again, my calculator overflows trying to compute this large factorial, so I will use Stirling’s approximation for a factorial, which says: n! ~ sqrt(2*pi*n)*(n/e)ˆn , where e is the base of natural logarithms. To avoid overflows, I take the decimal logarithm of this formula: log10(n!) ~ log10(2*pi)/2 + log10(n)/2 + n*log10(n/e). Applying this to n = 493, I obtain: log10(493!) ~ 0.399089 + 1.346423 + 1113.466351 = 1115.211865 . Now I raise 10 to this power, yielding the first factor = 493! ~ 1.628789*10ˆ1115
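For anyone who wants to check the Stirling estimate, here is a short Python sketch. It evaluates the logarithmic formula above and compares it with the log-gamma function from the standard library (n! equals gamma(n + 1)), which is a cross-check added here and not part of the original calculation. The function names are illustrative.

```python
import math

def log10_factorial_stirling(n):
    """Stirling: log10(n!) ~ log10(2*pi)/2 + log10(n)/2 + n*log10(n/e)."""
    return (math.log10(2 * math.pi) / 2
            + math.log10(n) / 2
            + n * math.log10(n / math.e))

def log10_factorial_lgamma(n):
    """log10(n!) via log-gamma, since n! = gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)

n_genes = 493
approx = log10_factorial_stirling(n_genes)
reference = log10_factorial_lgamma(n_genes)

print(f"Stirling : log10(493!) ~ {approx:.6f}")
print(f"lgamma   : log10(493!) = {reference:.6f}")
print(f"difference in log10    = {reference - approx:.6f}")
```

The two values differ only far down in the decimals, which is negligible next to an exponent above a thousand.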

Factor 2: gene variants

Each gene of the 493 could admit variations that are just as viable. How many? It will depend on the particular gene since it is possible that not all of them are equally critical to life. Perhaps an order of magnitude will suffice for this factor, so here are two ways to estimate it.

The first comes from estimating how many variations a typical gene admits in today’s biosphere. Take eye color, for instance. It can be brown, blue, gray, black, green, and perhaps some more exotic colors. You get a number of the order of 10, not 100. But let’s say 100 variations are possible, and this for every gene. The factor we get is 100ˆ493 = 10ˆ986

Another estimate would come from assuming that the genome of the first living cell could be as varied as the genome of today’s cells. If we assume that every individual of every species today on earth has a different genome, and that archaea and eukaryotes are as abundant as bacteria (they are not, by far), then we get a number. It is estimated that some 5*10ˆ30 bacteria live on earth today. Multiply this by three in order to get the maximum number of DNA variations currently present on earth: 1.5*10ˆ31 . Since this estimate is a lot smaller than the previous, let’s take the first one as factor 2 = 10ˆ986

Factor 3: active genes within a larger DNA (or RNA) strand

Pete D. suggests that the first living cell may have had a much longer RNA (or DNA), giving the example of the 4.6 Mbp E. coli DNA, with the essential genome, which we are estimating as 543 kbp, distributed within the longer strand and pieces of non-coding DNA sitting between the genes. The number of ways this can happen is a bit harder to compute, but not super-hard. It is the same as the number of ways 4600 kbp – 543 kbp = 4057 kbp of non-coding DNA can be split to occupy the gaps between the 493 genes. Mathematically, this is the same as computing how many ways you can have r non-negative integers (zero or positive) making a sum m. I was lucky enough to find a YouTube video solving this problem, and the solution is Combinations(m+r-1,r-1) = (m+r-1)! / ((r-1)! * m!). In our particular case, m = 4057000 dummy base pairs to be distributed over r = 493 + 1 = 494 possible locations (all gaps between genes, plus beginning and end), with contiguous genes being a possibility. The large factorials are again calculated using Stirling’s formula, yielding this for the decimal logarithm of the final factor:

log10(factor3) = log10(2*pi)/2 + log10(4057493)/2 + 4057493*log10(4057493/e) – log10(2*pi)/2 – log10(493)/2 – 493*log10(493/e) – log10(2*pi)/2 – log10(4057000)/2 – 4057000*log10(4057000/e)

= -0.399089 + 3.304129 + 25050812.86125 – 1.34642 – 1113.46635 – 3.3041025 – 25047555.003175 = 2142.6462

(The three log10(2*pi)/2 terms have been combined into the single leading -0.399089.)

Therefore, the last factor is 10 raised to this power, yielding 4.428316*10ˆ2142
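Both the counting rule and the size of the result can be checked with a few lines of Python. This is only a sketch: the function names are made up for the illustration, and `math.lgamma` stands in for Stirling’s formula. The brute-force helper confirms the rule on a toy case (5 filler units over 3 gaps) before it is applied to the real numbers.

```python
import math
from itertools import product

def brute_force_count(total, parts):
    """Count tuples of `parts` non-negative integers that sum to `total`."""
    return sum(1 for combo in product(range(total + 1), repeat=parts)
               if sum(combo) == total)

def log10_binomial(n, k):
    """log10 of Combinations(n, k) = n! / (k! * (n - k)!), via log-gamma."""
    ln_value = (math.lgamma(n + 1)
                - math.lgamma(k + 1)
                - math.lgamma(n - k + 1))
    return ln_value / math.log(10)

# Toy check of the rule Combinations(m + r - 1, r - 1)
toy_m, toy_r = 5, 3
toy_formula = math.comb(toy_m + toy_r - 1, toy_r - 1)
toy_brute = brute_force_count(toy_m, toy_r)
print(f"toy case: formula = {toy_formula}, brute force = {toy_brute}")

# Real case: m filler base pairs spread over r = genes + 1 gaps
m = 4_600_000 - 543_000   # 4,057,000 non-coding base pairs
r = 493 + 1               # gaps between genes, plus both ends
log10_factor3 = log10_binomial(m + r - 1, r - 1)

exponent = math.floor(log10_factor3)
mantissa = 10 ** (log10_factor3 - exponent)
print(f"log10(factor 3) = {log10_factor3:.4f}")
print(f"factor 3 ~ {mantissa:.3f} * 10^{exponent}")
```

The mantissa depends on small differences between very large logarithms, so only its first few digits should be trusted in either method.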

As a table:

| raw probability | factor 1 | factor 2 | factor 3 |
| --- | --- | --- | --- |
| 2.659 * 10ˆ(-326919) | 1.628789*10ˆ1115 | 10ˆ986 | 4.428316*10ˆ2142 |

Multiplying all these (the exponents add up to -326919 + 1115 + 986 + 2142 = -322676, and the mantissas multiply to 19.1788), we get a new estimate for the probability of a monkey on a 4-key typewriter coming up with anything that might remotely code for the first living cell: 1.91788 * 10ˆ(-322675) , which is still absolutely tiny.

How tiny? You can get an idea of the time it would take our team of highly motivated monkeys (remember, we have one sitting on each atom of the universe, and each monkey cranks out a million full copies each second) by inverting the probability and multiplying by 10ˆ(-86) seconds/copy. You may also consider that the age of the universe, in seconds, is 13.77*10ˆ9 years * 365.25 days/year * 24 hours/day * 3600 seconds/hour = 4.34548*10ˆ17 seconds. Go ahead and do the math.

Or you can compare this to the probability of putting a golf ball into the cup of a large green, in complete darkness. Suppose your green is as big as the universe (radius = 300000 km/s * 13.77 billion years = 1.30364*10ˆ26 m), and cup and ball are the size of a quark (radius around 10ˆ(-19) m). Then the chance of blindly sinking a putt, which is the ratio of the areas (that is, the square of the ratio of the radii), is (10ˆ(-19)/(1.30364*10ˆ26))ˆ2 = 5.8841*10ˆ(-91). This is immensely more likely than the monkeys typing the genome of the simplest living cell by chance. In fact, if a whole universe like our own were to reside inside each quark, and another universe inside each quark of that secondary universe, and so on multiple times before we reach the cup, we would have to go through log10(1.91788*10ˆ(-322675))/log10(5.8841*10ˆ(-91)) = (0.2828 – 322675)/(0.76968 – 91) = 3576.1 levels of shrinking the whole universe into a quark before the chance of making our putt from the edge of the universe compares to that of a monkey typing the simplest genome.

Another comparison. I live in Illinois, where there is a daily lottery where winning the biggest prize, the Jackpot, has a chance of 1 in 20,358,520. If I am here by sheer luck as a descendant of that first cell, this is the equivalent of winning the Illinois Jackpot log10(1.91788*10ˆ(-322675))/(-log10(20358520)) = (0.282821 – 322675)/(-7.3087462) = 44149.1 times in a row, that is, every day for 121 years. Exciting maybe the first few times, but I guess I’d eventually get tired of waiting for a loss.
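If you would rather let a computer do the math, here is a Python sketch that rebuilds the combined probability from the four factors as defined above (with 100 variants per gene for factor 2) and then works through the three comparisons. Everything is done on base-10 logarithms. The variable names are illustrative, and `math.lgamma` replaces Stirling’s formula, so the last decimals may differ slightly from the hand calculation.

```python
import math

LN10 = math.log(10)

def log10_factorial(n):
    """log10(n!) via log-gamma, so huge factorials never overflow."""
    return math.lgamma(n + 1) / LN10

# log10 of the four table entries
log10_raw = -543_000 * math.log10(4)        # 1 / 4^543000
log10_f1 = log10_factorial(493)             # gene permutations: 493!
log10_f2 = 493 * math.log10(100)            # 100 variants for each gene
log10_f3 = (log10_factorial(4_057_493)      # Combinations(4057493, 493)
            - log10_factorial(493)
            - log10_factorial(4_057_000))
log10_p = log10_raw + log10_f1 + log10_f2 + log10_f3

# Monkeys: expected waiting time at 10^(-86) seconds per copy
age_universe_s = 13.77e9 * 365.25 * 24 * 3600
log10_wait_s = -log10_p - 86
log10_wait_ages = log10_wait_s - math.log10(age_universe_s)

# Golf: quark-sized cup on a universe-sized green
radius_universe_m = 3.0e8 * age_universe_s  # 300000 km/s = 3e8 m/s
radius_quark_m = 1e-19
log10_putt = 2 * math.log10(radius_quark_m / radius_universe_m)
nesting_levels = log10_p / log10_putt

# Lottery: consecutive daily Jackpot wins
jackpot_odds = 20_358_520
wins_in_a_row = log10_p / -math.log10(jackpot_odds)
years_of_daily_wins = wins_in_a_row / 365.25

print(f"log10(probability)       = {log10_p:.3f}")
print(f"log10(wait in seconds)   = {log10_wait_s:.3f}")
print(f"log10(wait in universes) = {log10_wait_ages:.3f}")
print(f"log10(putt probability)  = {log10_putt:.4f}")
print(f"universe-in-quark levels = {nesting_levels:.1f}")
print(f"Jackpot wins in a row    = {wins_in_a_row:.1f}")
print(f"years of daily wins      = {years_of_daily_wins:.1f}")
```

Trying a different assumption, say 10 variants per gene instead of 100, only requires changing the `log10_f2` line.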

One more thought as a conclusion, and this is that the kinds of odds we are handling here resemble somewhat those of cracking an encrypted code by brute force. You may have seen films where the good guys (sometimes the bad guys) have a computer that comes up with an encryption key digit by digit, though it usually takes a few minutes during which the fate of the world hangs in the balance. Of course, this can be done only with the poorest of encryption codes. Normal codes are designed so there’s no way to tell that you are just one bit away from finding the correct key, let alone get the process started from a wild guess. But the movies have caused us to grow accustomed to the false idea that a code might be guessed bit by bit, and so we do not find it preposterous to believe that a bag of nonliving chemicals (notice I’m giving you a cell membrane for free here) might manage to become a piece of RNA (or DNA), base pair by base pair, that actually has a chance of being stable, replicating, and evolving from then on, and this without any recognizable fitness function to be maximized, or a mechanism that might make this happen. But if you are just one base pair away from a viable genetic code, that code normally leads to a nonfunctional cell, and the code does not stand a chance of surviving before it is replaced by another code that is actually further from a good one.