A Single Cause For Everything

Our society loves to pin each problem on one cause. The most recent example is Elliot Rodger. Some say he was a misogynist (Huff Post) and others that he was mentally ill (TIME). Others blame the system instead, claiming that he was operating under a broader system of male privilege (Salon) or that therapists and law enforcement are inadequate at detecting signs of mental illness (Slate). And here is yet another pair of conflicting reports in the misogyny (Washington Post) vs mental illness (National Review) debate. Despite the variety of voices in the debate, they all seem to agree on one thing: their reason is the only reason.

The title of the TIME article states bluntly, “Misogyny Didn’t Turn Elliot Rodger Into a Killer,” and the first sentence reads, “Yes, Elliot Rodger was a misogynist — but blaming a cultural hatred for women for his actions loses sight of the real reason why isolated, mentally ill young men turn to mass murder.”

Aside from this acknowledgment, each article presents evidence furthering its own theory while ignoring evidence that might support the others. It’s very difficult to dig up an article that discusses with nuance, for instance, how much of the event was caused by misogyny and how much by mental illness, or how the two factors behaved in tandem. (Or whether there is a third factor: this article (Salon) talks about the role of race in Rodger’s motives.)

In case you’ve already made up your mind on which side of the misogyny vs mental illness debate you fall, here is a simpler, less politically charged example. Suppose we want a theory to predict where there is snow and where there isn’t. The first theory I’ll propose is the latitude theory: higher latitudes are colder and should thus have more snow (assuming we’re in the Northern Hemisphere). If this theory were completely true, the snow distribution might look something like this.


Everywhere north of the latitude line, there is snow, and everywhere south, there is no snow. Clearly this isn’t true.

Here is another theory: water proximity theory. Snow needs water to freeze, so snow will form near bodies of water. If this theory were completely true, then we should only find snow near water. Clearly this isn’t true either.

Here is an actual picture of snow cover from NASA:


And here is an animated gif of world snow cover:


As one can see, neither theory is true as an absolute statement. The correct way to think of these theories is as probabilistic theories. That is, the farther north you go, the higher the chance you will encounter snow. The same goes, to a lesser extent, for being near bodies of water. Even then, snow cover cannot be explained by these two factors alone: mountainous regions have more snowfall as well.

The debates in our current-day media are akin to one side saying that latitude determines everything and the other side that proximity to water determines everything. Neither side is willing to look rationally at the cold facts around them.

History is another subject where it is clearer that everything has multiple causes. In just under two months, it will have been 100 years since the beginning of World War I. One might argue that the cause of WWI was the assassination of an archduke, but this simplistic explanation misses all the political tensions and alliances of the time. Similarly, one could argue that it was purely due to the political landscape and that war would have broken out regardless of the assassination. Both causes were necessary to an extent. If Franz Ferdinand had been assassinated in a less tense time, war might have been averted; and if no assassination had occurred, the great powers might not have had a proper excuse to actually go to war.

So why can’t we use scientific or historical reasoning on sociological issues?

Religion is a great example of this single-cause mentality. The honor killing of Pakistani woman Farzana Parveen last week was unanimously condemned in the US, much like the Elliot Rodger shooting. However, whenever someone posited a cause that could have contributed to the honor killing, the other side would knock it down, saying it couldn’t be the right cause, and offer counterexamples. For instance, go to the comments section of any major news story about the event and you’ll invariably find someone criticizing Islam for condoning honor killings and promoting misogyny, and someone else responding that honor killings sometimes happen in other cultures (e.g. Hindu) as well.

Both sides make decent points, but such conversations go nowhere: each says something true while ignoring what the other is saying. Just as “more north = more snow” is not always true, neither is it always false. So sure, Islam might not be the only reason that honor killings occur so often in Pakistan, but it is a pretty strong factor. Just because a cause is not the only cause does not mean it is not a cause at all.

With religion in general, people very often make absurdly simplistic statements themselves and assume other people’s views of religion are absurdly simplistic (perhaps by projection). This might also be reflected in the general media and American culture as a whole. We love simple answers to complex problems. I’m not advocating that we personally conduct full academic research for every problem we face, but we are clearly too far on the simplistic side. The problem is that we’re thinking too little, not too much.

The Elliot Rodger shooting, like any other event, has a variety of causes. Both misogyny and the mishandling of mental illness are to blame. Snow cover depends on several conditions. World War I had a complex background, as do honor killings and suicide bombings.

Solutions to oversimplification of causes?

  • Prefer depth of news, not breadth. Instead of gaining a superficial understanding of many stories, try to understand one story really well. Read 10 different articles on Elliot Rodger and look at the issue from all sides.
  • Look at the statistics yourself. Numbers don’t oversimplify themselves.
  • Acquire more information. Have an opinion on Russia’s involvement with Ukraine? See if your opinion changes if you read up on past involvements.
  • Read the comments section of the article. While 90% of it may be trash, someone might point out something worthwhile.

Observer Selection

Today was my graduation from Cornell, but since I’m not a fan of ceremony, the topic for today is completely different: a subset of selection bias known as observer selection.

Selection bias in general is selecting particular data points out of a larger set to distort the data. For example, using the government’s own NOAA website (National Oceanic and Atmospheric Administration), I could point out that the average temperature in 1934 was 54.10 degrees Fahrenheit, while in 2008 it was 52.29. Clearly from these data points, the US must be cooling over time. The problem with the argument is, of course, that the two years 1934 and 2008 were chosen very carefully: 1934 was the hottest year in the earlier time period, and 2008 was the coolest year in recent times. Comparing these two points is quite meaningless, as the overall trend is up.
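As a sketch of how endpoint cherry-picking works, here is a toy example in Python. The temperature series is entirely made up (it is not NOAA data); the point is only that a hot early year and a cool late year can coexist with a clearly positive overall trend.

```python
# Made-up temperature series with an upward trend plus year-to-year noise.
years = list(range(2000, 2010))
temps = [50.0, 50.8, 50.3, 51.1, 50.6, 51.4, 50.9, 51.7, 51.2, 52.0]

# Cherry-picked comparison: warmest early year vs. coolest late year.
early_peak = max(temps[:5])   # a hot year early in the record
late_trough = min(temps[5:])  # a cool year late in the record
looks_like_cooling = early_peak > late_trough

# Honest check: ordinary least-squares slope over all the data.
n = len(years)
mx = sum(years) / n
my = sum(temps) / n
slope = sum((x - mx) * (y - my) for x, y in zip(years, temps)) \
        / sum((x - mx) ** 2 for x in years)

print(looks_like_cooling, slope > 0)  # True True: the two views disagree
```

The cherry-picked pair suggests cooling even though the fitted trend over the full series is unambiguously upward.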

Observer selection is when the selection bias comes from the fact that someone must exist in a particular setting to do the observation. For instance, we only know of one universe, and there is life in our universe—us. Could it have been possible that our universe had no life?

The issue with trying to answer this question is that if our universe indeed had no life, then we wouldn’t exist to witness that.

The anthropic principle: given that we are observing the universe, the universe must have properties that support intelligent life. It addresses the question “Why is our universe suitable for life?” by noting that if our universe were not suitable for life, then we wouldn’t be here making that observation. That is, the alternative question, “Why is our universe not suitable for life?” cannot physically be asked. We must observe a universe compatible with intelligent life.

the multiverse

The point is, there may be millions, billions, or even an infinite number of universes. But even if only one in a trillion were suitable for life, we would have to exist in one of those. So our universe is not “fine-tuned” for life; rather, our existence means we must be in a universe that supports us.

A list of observer effects:

  • The anthropic principle, as above. Our universe must be suitable for life.
  • A planet-oriented version of the anthropic principle: Earth has abundant natural resources, is in the habitable zone, has a strong magnetic field, etc.
  • A species-oriented version of the anthropic principle: Our species is very well adapted to survive. If we weren’t, then we wouldn’t be thinking about this.
  • There are no recent catastrophic asteroid impacts (the last one being 65 million years ago). If there were, then we again wouldn’t be observing that.
  • The same goes for all natural disasters. No catastrophic volcano eruptions, no nearby supernovae or black holes, etc.
  • The same goes for apocalyptic man-made disasters. Had the Cold War led to a nuclear exchange that wiped out humanity, we would not be able to observe a headline that said, “Nuclear Weapons Make Humans Extinct.” Thus, we must observe non-catastrophic events in the past.
  • Individual life follows this as well. Say you had a life-threatening illness or accident in the past, but you’re alive now (of course, given that you’re reading this). Given that you’re alive now, you must have survived it, so to the question, “Are you alive?” you can only answer yes.

All of these are strong observer effects, in that they are absolute statements and not probabilistic ones, i.e. “Our universe must have life,” and not “Our universe probably has life.”

There are numerous other observer effects that are probabilistic but can still be very significant. For example, given that you are reading this, you are more likely to be in a highly literate country than in a less literate one. That probability is higher than it would be if I knew nothing about you.

In this post, I mentioned the example of democracy in political science. In summary, political science has a lot more to say about democracy than about any other form of government. Is this because we are personally biased toward democracy? Not necessarily. In a less open system, fields like political science might be forbidden from doing research (or academia might be deemed less important), so there are no (or few) pro-totalitarian political scientists. As a result, we end up seeming to favor democracy.

We also know that history is written by the victors. A related historical example is the rise of strong states combined with the rise of liberalism and progressive thought in the modern era. Namely, the states in which liberalism arose (England, France) tended to be strong states; a weak state adopting progressive measures would have been wiped out by a stronger one. Hence, history is also analyzed by the victors.

So what can we do about observer selection? All we can do is try to be aware of it and introduce corrections, studying the full set of possibilities rather than the subset we occupy as particular observers. For instance, if we just used historical data on natural disasters, we would underestimate the actual probability of a catastrophic disaster, since the very fact that we are here implies that none has occurred for a while.

Statistics in the Social Sciences

I’ve always wondered whether the rigorous application of statistics is underutilized in the social sciences. This is less of a problem in economics, where the subject is, by nature, highly quantitative. But in fields like psychology, sociology, and political science, where a background in mathematics is not common (unlike in biology, chemistry, and physics), researchers can intentionally or, very often, unintentionally (this is a really good Economist article) produce wrong results through abuse or misunderstanding of statistical inference.


As an onlooker whose training is in mathematics, I cannot help but feel frustrated by the lack of numeracy in our “scientists.” The Economist article does a good job of showing how failure to understand statistical concepts leads to false results being published, even past peer review.

What triggered me to write this post was an assigned reading for a comparative politics class. In it, Adam Przeworski discusses the inherent selection bias in matching countries for experimentation. Noting that democracies have higher economic growth rates than authoritarian regimes, Przeworski brings in the relevant data that democracies have a significant chance of dying off when faced with economic failure, whereas authoritarian regimes are not as affected. Hence, observing that democracies have higher growth rates does not show that democracy leads to economic growth, but rather that economically failing democracies are not observed because they tend to disappear.

“What we are observing here is what the statistical literature calls ‘selection bias.’ Indeed, I am persuaded that all the comparative work we have been doing may suffer potentially from selection bias.”  (p. 19, stable JSTOR link)

In the context of a comparative politics theory symposium, this makes a lot of sense to state. But the phrasing is striking to a math person: selection bias is a given, one of the basic tools we use to analyze anything. My instinctive reaction to the reading was, “Duh, obviously there is selection bias.” While I am sure the field of comparative politics is more aware of selection bias than Przeworski makes it appear, the fact that he framed it as he did (“what the statistical literature calls ‘selection bias’”), as if to imply that the formal tools of statistical inference are generally beyond the scope of comparative politics theory, is a bit unnerving.

Przeworski, Adam, in “The Role of Theory in Comparative Politics: A Symposium,” World Politics, Vol. 48, No. 1 (Oct. 1995), pp. 1–49.

The Richest of the Rich

Two days ago, Forbes released its ranked list of the world’s billionaires—all 1011 of them. Bill Gates, who has topped the list 14 times in the past 15 years, is now second, behind telecom giant Carlos Slim by a mere half billion dollars. Mere, at least, by the standards of the super rich: their net worths are $53 billion and $53.5 billion, respectively.

Notable is the jump from last year, when the list included only 793 names: a 27.5% increase, largely due to economic recovery. Still, the economy has not fully recovered; the 2008 list counted 1125 billionaires.

Right now the Earth’s population is approximately 6.7 billion, meaning approximately one in 6.7 million persons is a billionaire. On the other hand, there were 9.5 million millionaires in 2006 according to the 2007 World Wealth Report—1 in 705 persons is a millionaire. So, even among millionaires, the proportion of billionaires is quite small*: 1 in 9397 millionaires is a billionaire. A person is thus 13.3 times more likely to be a millionaire among the general population than a billionaire among millionaires.

*The figures following the asterisk above use data from two different years: 2006 and 2010, and so are not exact. If we compare data from just 2006, we have 793 billionaires, and so the generalizations would be even stronger—only 1 in 11980 millionaires is a billionaire, and the 13.3 factor becomes 17.0.
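The ratios above are easy to re-derive from the figures the post cites. A quick sanity check in Python, using the same approximations (world population, the Forbes count, and the World Wealth Report count):

```python
# Figures cited in the post (approximate, mixed years as noted above).
world_pop = 6.7e9      # approximate world population
billionaires = 1011    # Forbes 2010 list
millionaires = 9.5e6   # 2007 World Wealth Report (2006 data)

people_per_millionaire = world_pop / millionaires           # ~705
millionaires_per_billionaire = millionaires / billionaires  # ~9397

# How much rarer is a billionaire among millionaires than a
# millionaire among people in general?
factor = millionaires_per_billionaire / people_per_millionaire

print(round(people_per_millionaire), round(millionaires_per_billionaire),
      round(factor, 1))  # 705 9397 13.3
```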

On the Occurrence of Improbable Events

Many events are extremely improbable, but are almost guaranteed to occur. For example: winning the lottery, being struck by lightning, or, as a real-world example of what happened today, seeing snow in Austin.

I wanted to go into a metaphysical or mathematical rant on this, but I’m not quite sure how to proceed with either. Weather is, right now, at best a probability—it’s a chaotic system. The famous Butterfly Effect illustrates that a butterfly flapping its wings can cause a tempest elsewhere in the world.

From Caltech on this phenomenon [Michael Cross, 8/18/2009]:

The “Butterfly Effect” is often ascribed to Lorenz. In a paper in 1963 given to the New York Academy of Sciences he remarks:

One meteorologist remarked that if the theory were correct, one flap of a seagull’s wings would be enough to alter the course of the weather forever.

By the time of his talk at the December 1972 meeting of the American Association for the Advancement of Science in Washington, D.C. the sea gull had evolved into the more poetic butterfly – the title of his talk was:

Predictability: Does the Flap of a Butterfly’s Wings in Brazil set off a Tornado in Texas?

So, a tiny change in initial conditions can alter the overall environment in the future. Now what does the Butterfly Effect have to do with snow in Austin? Simple. It means we have no idea what could have caused the snow. It could have been a butterfly in Brazil. No—it was the one next to it. Or was it a butterfly in Mexico? Was it a butterfly at all?

Okay, I’ll admit that’s stretching the facts a little. In fact, I’ve been tricking you. The principle that allowed for snowfall in Austin is not the Butterfly Effect but the Law of Large Numbers. This law more or less states that the average value approaches the expected value after a large number of trials. For example, say the probability of appreciable snow on any given day in Austin is 0.1%. Then we expect one day in every thousand to have snow. But this does not mean that in any given 1000 days, there must be at least one day with snow.

In fact, we can perform a simple calculation to find the chance that there are no days with snow in a 1000-day interval. The probability that there is no snow on a given day is 99.9%. For two days in a row, we multiply this number by itself, and we end up with a number near 99.8%. For 1000 days, we simply raise 0.999 to the 1000th power; this gives 36.8%. This is the probability that in 1000 days there is no day with snow, even though we expected one such day. To find the chance that there is at least one day with snow, we subtract this probability from one, which gives 63.2%. With further calculation (using the binomial distribution, or its Poisson approximation, for those concerned with the math), we find that the probability of X days of snow in 1000 is:

Days with snow Probability
0 36.8%
1 36.8%
2 18.4%
3 6.12%
4 1.53%
5 0.304%

These numbers added together give over 99.9%, meaning the chance that there are six or more days of snow is extraordinarily small. Let us now go to a cumulative probability, which will be more useful here. This means we’re going to sum all the probabilities up to that number.

Days with snow Cumulative probability
0 36.8%
1 73.6%
2 92.0%
3 98.1%
4 99.6%
5 99.9%

What does this mean? Basically, it says there is a 36.8% chance there are zero days of snow, 73.6% chance there is at most one day of snow, 92.0% that there are at most two days of snow, etc.
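Both tables above can be reproduced with the exact binomial formula. This is a minimal sketch under the post’s assumption of a 0.1% chance of snow per day:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k snow days among n independent days."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 1000, 0.001  # 1000 days, 0.1% chance of snow per day
pmf = [binom_pmf(k, n, p) for k in range(6)]
cdf = [sum(pmf[:k + 1]) for k in range(6)]

for k in range(6):
    print(f"{k} days: {pmf[k]:.1%} (cumulative {cdf[k]:.1%})")
```

The printed values match the tables: 36.8% for zero days, a cumulative 92.0% for at most two days, and so on.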

Let us take the next step: say we measure over 10,000 days. From pure probability, we would expect 0.1% of those 10,000 days to have snow, or 10 days. We again take a cumulative probability:

Days with snow Cumulative probability
0 0.00452%
1 0.0497%
2 0.276%
3 1.03%
4 2.92%
5 6.70%
6 13.0%
7 22.0%
8 33.3%
9 45.8%
10 58.3%
11 69.7%
12 79.2%
13 86.5%
14 91.7%
15 95.1%
16 97.3%
17 98.6%
18 99.3%
19 99.7%
20 99.8%

Now here is the important part. We want there to be, on average, 1 day of snow for every 1000 days. To give the 1000-day case a fair shot, we will first allow a margin of 100% either way: anywhere from 0 to 2 days of snow per 1000, or 0 to 20 per 10,000. The probability of landing in this window is 92.0% in the 1000-day case but 99.8% in the 10,000-day case. So the smaller experiment had an 8% chance of missing by more than 100%, while the larger one had only a 0.2% chance. The outcome is more likely to be close to the expected value as the number of trials increases.

There is another way to analyze this. We shall cut the allowed deviation from 100% down to about 50%. For the 1000-day case, the chance of exactly one day of snow is 36.8%, from the very first table. For the 10,000-day case, we look at the numbers in the last table: subtracting the cumulative probability at 5 days (6.70%) from that at 15 days (95.1%) gives an 88.4% chance of between 6 and 15 days of snow in 10,000 days (including 5 days as well, 95.1% − 2.92%, gives about 92.2%). Either figure is much higher than 36.8%. It is thus much more likely that the outcome approaches the expected value as the number of trials increases. We could try 100,000 or 1,000,000 days, and the trend would continue.
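The two comparisons can be checked directly with the binomial distribution. In this sketch, the 88.4% figure corresponds to 6 through 15 snow days, i.e. the table’s 95.1% minus 6.70%:

```python
from math import comb

def prob_range(lo, hi, n, p):
    """Probability that the snow-day count falls in [lo, hi] out of n days."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(lo, hi + 1))

p = 0.001  # 0.1% chance of snow per day

# Within +/-100% of the mean:
short_loose = prob_range(0, 2, 1000, p)    # 0-2 days in 1,000
long_loose = prob_range(0, 20, 10000, p)   # 0-20 days in 10,000

# Tighter, roughly +/-50% windows:
short_tight = prob_range(1, 1, 1000, p)    # exactly 1 day in 1,000
long_tight = prob_range(6, 15, 10000, p)   # 6-15 days in 10,000

print(f"{short_loose:.1%} vs {long_loose:.1%}")  # 92.0% vs 99.8%
print(f"{short_tight:.1%} vs {long_tight:.1%}")  # 36.8% vs 88.4%
```

In both windows the longer run concentrates far more probability near the expected value, which is the Law of Large Numbers at work.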

So, over a 10,000-day period, there will probably be near 10 days of snow, but in any given run of 1000 days, there is no guarantee of even a single day of snow.

In the case of Austin, we expect there to be several days of snow every decade, but we don’t know in which years they will fall. On the other hand, if I go to a college in the North…