On the Occurrence of Improbable Events

Many events are extremely improbable in any single instance, yet are almost guaranteed to occur somewhere, eventually. For example: someone winning the lottery, someone being struck by lightning, or, as a real-world example of what happened today, snow in Austin.

I wanted to go into a metaphysical or mathematical rant on this, but I’m not quite sure how to proceed with either. Weather can, right now, be forecast only as a probability, because it is a chaotic system. The famous Butterfly Effect is the idea that a butterfly flapping its wings can cause a tempest elsewhere in the world.

From Caltech on this phenomenon [Michael Cross, 8/18/2009]:

The “Butterfly Effect” is often ascribed to Lorenz. In a paper in 1963 given to the New York Academy of Sciences he remarks:

One meteorologist remarked that if the theory were correct, one flap of a seagull’s wings would be enough to alter the course of the weather forever.

By the time of his talk at the December 1972 meeting of the American Association for the Advancement of Science in Washington, D.C. the sea gull had evolved into the more poetic butterfly – the title of his talk was:

Predictability: Does the Flap of a Butterfly’s Wings in Brazil set off a Tornado in Texas?

So, a tiny change in initial conditions can alter the overall environment in the future. Now what does the Butterfly Effect have to do with snow in Austin? Simple. It means we have no idea what could have caused the snow. It could have been a butterfly in Brazil. No—it was the one next to it. Or was it a butterfly in Mexico? Was it a butterfly at all?

Okay, I’ll admit that’s extending the facts a little bit. In fact, I’ve been tricking you. The principle that allowed snowfall in Austin is not the Butterfly Effect, but rather, the Law of Large Numbers. This law more or less states that the average of the observed outcomes will approach the expected value as the number of trials grows large. For example, say the probability of appreciable snow on any given day in Austin is 0.1%, and that each day is independent of the others. This means that we expect, on average, one day in every thousand to have snow. But this does not mean that in any given 1000 days, there must be at least one day with snow.
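To get a feel for this before doing any math, here is a minimal simulation sketch. It assumes the same made-up 0.1% daily chance and treats every day as independent, which real weather certainly is not. The names `P_SNOW` and `snow_frequency` are my own, purely for illustration.

```python
import random

P_SNOW = 0.001  # the illustrative 0.1% daily chance of snow (an assumption, not real climate data)


def snow_frequency(days, p=P_SNOW, seed=0):
    """Simulate `days` independent days and return the fraction that had snow."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    snowy = sum(1 for _ in range(days) if rng.random() < p)
    return snowy / days


if __name__ == "__main__":
    # Longer and longer runs: the observed frequency should settle near P_SNOW.
    for days in (1_000, 10_000, 100_000, 1_000_000):
        freq = snow_frequency(days)
        print(f"{days:>9} days: observed frequency {freq:.5f} (expected {P_SNOW})")
```

Short runs can land far from 0.001 in relative terms, including on zero snow days at all, while the longest run should sit much closer to it.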

In fact, we can perform a simple calculation to find the chance that there are no days with snow in a 1000-day interval. The probability that there is not snow on a given day is 99.9%. For two days in a row, we multiply this number by itself, and we end up with a number near 99.8%. For 1000 days, we simply raise 0.999, the probability of no snow on one day, to the 1000th power; this gives 36.8%. This is the probability that in 1000 days, there is no day with snow, even though we expected one day to have snow. To find the chance that there is at least one day with snow, we subtract this probability from one; this gives 63.2%. With further calculation (using the binomial distribution, or its close approximation for rare events, the Poisson distribution, for those concerned with the math), we find that the probability of X days of snow in 1000 is:

Days with snow    Probability
0                 36.8%
1                 36.8%
2                 18.4%
3                 6.13%
4                 1.53%
5                 0.305%

These numbers added together give over 99.9%, meaning the chance that there are six or more days of snow is extraordinarily small. Let us now go to a cumulative probability, which will be more useful here. This means we’re going to sum all the probabilities up to that number.

Days with snow    Cumulative probability
0                 36.8%
1                 73.6%
2                 92.0%
3                 98.1%
4                 99.6%
5                 99.9%

What does this mean? Basically, it says there is a 36.8% chance that there are zero days of snow, a 73.6% chance that there is at most one day of snow, a 92.0% chance that there are at most two days of snow, etc.
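For anyone who wants to check these figures, here is a small sketch using the binomial distribution. It assumes independent days and the illustrative 0.1% daily chance; `snow_day_pmf` is a helper name I made up.

```python
from math import comb


def snow_day_pmf(k, days=1000, p=0.001):
    """Binomial probability of exactly k snowy days out of `days` independent days."""
    return comb(days, k) * p**k * (1 - p) ** (days - k)


if __name__ == "__main__":
    cumulative = 0.0
    print("days  probability  cumulative")
    for k in range(6):
        prob = snow_day_pmf(k)
        cumulative += prob  # running total: chance of at most k snowy days
        print(f"{k:>4}  {prob:>10.3%}  {cumulative:>9.1%}")
```

Passing `days=10000` and a wider range of `k` gives the same kind of table for a longer interval.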

Let us take the next step: say we measure over 10,000 days. From pure probability, we would expect 0.1% of those 10,000 days to have snow, or 10 days. We again take a cumulative probability:

Days with snow    Cumulative probability
0                 0.00452%
1                 0.0497%
2                 0.276%
3                 1.03%
4                 2.92%
5                 6.70%
6                 13.0%
7                 22.0%
8                 33.3%
9                 45.8%
10                58.3%
11                69.7%
12                79.2%
13                86.5%
14                91.7%
15                95.1%
16                97.3%
17                98.6%
18                99.3%
19                99.7%
20                99.8%

Now here is the important part. We want there to be on average 1 day of snow for every 1000. For the 1000-day case to be worth considering at all, we must allow a margin of plus or minus 100%: anywhere from 0 days of snow to 2 days of snow for every 1000. The probability of landing in this range is 92.0% in the 1000-day case (0 to 2 days), but 99.8% in the 10,000-day case (0 to 20 days). So in the smaller experiment, there was an 8% chance of deviating by more than 100%, but in the larger one, only 0.2%. So the outcome is more likely to be close to the expected value as the number of trials increases.

There is another way to analyze this. We shall cut the allowed deviation in the previous analysis from 100% to 50%. Basically, we want the chance that the count is within 50% of one day out of every 1000. For the 1000-day case, only exactly one day of snow qualifies, and this chance is 36.8%, from the very first table. For the 10,000-day case, we look at the numbers from the last table for 5 through 15 days of snow. Because that table is cumulative, we subtract the 2.92% chance of at most 4 days from the 95.1% chance of at most 15 days, and obtain a 92.2% chance that there are between 5 and 15 days of snow in 10,000 days. And 92.2% is much higher than 36.8%. It is then much more likely that the outcome approaches the expected value when the number of trials increases. We may try cases with 100,000 days or 1,000,000 days, and the trend will continue.
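The longer cases are tedious by hand, so here is a rough sketch of how one might check them. It again assumes independent days at 0.1% each, and it works with logarithms (via `lgamma`) only because the raw binomial terms underflow for very large counts. The function names and the way I round the 50% band to whole days are my own choices.

```python
from math import exp, lgamma, log, log1p


def log_binomial_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes in n trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log1p(-p)


def band_probability(n, p, lo, hi):
    """Probability that the number of snowy days falls in lo..hi, inclusive."""
    return sum(exp(log_binomial_pmf(k, n, p)) for k in range(lo, hi + 1))


if __name__ == "__main__":
    p = 0.001
    for n in (1_000, 10_000, 100_000, 1_000_000):
        expected = n // 1000   # expected snowy days at a 0.1% daily chance
        half = expected // 2   # 50% of the expected count, rounded down to whole days
        lo, hi = expected - half, expected + half
        chance = band_probability(n, p, lo, hi)
        print(f"{n:>9} days: P({lo} <= snow days <= {hi}) = {chance:.4%}")
```

Note that both ends of the band are included in the sum, so for 10,000 days it adds up the chances of 5, 6, and so on through 15 snowy days.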

So, over a 10,000-day period, there will probably be near 10 days of snow, but in any given run of 1000 days, there is no guarantee of even a single day of snow.

In the case of Austin, we expect there to be several days of snow every decade, but we don’t know in which years they will fall. On the other hand, if I go to a college in the North…