Introducing Simpson’s Paradox

‘Simpson’s Paradox’ is a paradox in mathematics and statistics that shows the importance of presenting whole numbers, not just percentages, when compiling analysis. It also shows why, as a data analyst, you should avoid jumping to casual conclusions.

The paradox is that a trend that appears in each sub-group does not always match the trend that appears when those sub-groups are combined into one larger group. The combined trend can even point in the opposite direction.

Let’s look at the example in the image below.

fpsyg-04-00513-g001.jpg

As you can see from the data, each of the two sub-groups (male and female) appears to show a negative trend, in that the smaller the dosage, the greater the probability of recovery.

However, as you can see from the black dotted trend line fitted to both groups combined, the overall trend is not the same. In fact it suggests the opposite, in that a greater dosage leads to a greater probability of recovery.

This example shows the importance of considering the sub-groups that lie within your data. By considering gender we reach a completely different conclusion than we would have had we never added the gender variable.
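To make the reversal concrete, here is a minimal Python sketch. The data points are invented purely for illustration (they are not read from the figure), and the `slope` helper is just an ordinary least-squares slope written out by hand. Each group trends downwards on its own, but because one group sits at both higher dosages and higher recovery rates, the line through the pooled points trends upwards.

```python
# Hypothetical (dosage, recovery probability) points, invented for illustration.
# They are NOT taken from the figure in this post.
male = [(1, 0.40), (2, 0.35), (3, 0.30), (4, 0.25)]
female = [(6, 0.80), (7, 0.75), (8, 0.70), (9, 0.65)]


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


male_slope = slope(male)              # trend within the first sub-group
female_slope = slope(female)          # trend within the second sub-group
pooled_slope = slope(male + female)   # trend when gender is ignored

print(f"male slope:   {male_slope:+.3f}")
print(f"female slope: {female_slope:+.3f}")
print(f"pooled slope: {pooled_slope:+.3f}")
```

With this made-up data the two within-group slopes are negative and the pooled slope is positive, which is the same pattern the figure shows.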

Perhaps the most famous example of Simpson’s Paradox comes from the final House vote on the Civil Rights Act of 1964 in the United States.

Results from Southern States.

2016-10-23_17-38-32

Results from Northern States.

2016-10-23_17-38-15

Looking at the tables above and our two sub-groups, a casual conclusion from this data would be something along the lines of ‘Democrats were more in favour of passing the Civil Rights Act than their Republican counterparts’. This conclusion is driven by the fact that a greater percentage of Democrats voted in favour in both the Northern and the Southern states.

Now comes the importance of showing the pure numbers. The image below shows a unit chart of every individual who participated in the vote.

2016-10-23_17-42-50

We can clearly see that whilst no Republicans voted in favour in the Southern States, only a small share of Republican representatives came from this area. Southern Democrats, most of whom voted against, made up a much larger share of their party, which pulls the overall Democrat figure down. By adding the true values, we can now reach the correct conclusion that 80% of Republicans voted ‘yea’ vs. 67% of Democrats, clearly showing that Republicans as a whole were more in favour of passing the Civil Rights Act of 1964.
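The same arithmetic can be sketched in a few lines of Python. The tallies below are an assumption: they are the commonly cited counts for the House vote by region and party, not numbers read from the tables above, which are images. With these assumed tallies the Democrat share works out nearer 61% than the 67% quoted above, so the exact figure depends on which tally is used, but the reversal is the same.

```python
# (yea votes, total votes) by region and party.
# ASSUMPTION: commonly cited tallies, not read from the tables in this post.
votes = {
    ("North", "Democrat"): (145, 154),
    ("North", "Republican"): (138, 162),
    ("South", "Democrat"): (7, 94),
    ("South", "Republican"): (0, 10),
}

# Share voting 'yea' inside each region/party sub-group.
by_region = {key: yea / total for key, (yea, total) in votes.items()}

# Share voting 'yea' per party once the regions are combined.
# The raw counts are summed first; the regional percentages are NOT averaged.
overall = {}
for party in ("Democrat", "Republican"):
    yea = sum(y for (_, p), (y, _) in votes.items() if p == party)
    total = sum(t for (_, p), (_, t) in votes.items() if p == party)
    overall[party] = yea / total

for (region, party), share in by_region.items():
    print(f"{region:5} {party:10} {share:.0%}")
for party, share in overall.items():
    print(f"Total {party:10} {share:.0%}")
```

Summing the counts before dividing is the important step. Each party’s overall figure is a weighted average of its regional figures, and the weights (how many representatives each party had in each region) are very different for the two parties.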


To me this is hugely important as a data analyst. We need to be aware of our audience, and of how they use our work for insights that influence and aid the decision-making process. They use our work to make quick decisions, which means that they rely on us to provide them with all the necessary information. They will use only the information that exists on the dashboard to make their decision.

This process of quick decision making, when placed hand in hand with ‘Simpson’s Paradox’, shows to me just how important it is to include pure numbers when analysing sub-groups in your data. Doing so reduces the likelihood of your audience making casual conclusions and poor decisions.

Ben
