‘Simpson’s Paradox’ is a paradox in mathematics and statistics that shows why it is so important to present the raw numbers behind your analysis, not just percentages. It is also a reminder not to jump to hasty conclusions as a data analyst.
The paradox is that a trend which appears in every sub-group can disappear, or even reverse, when those sub-groups are combined into their larger group.
Let’s look at the example in the image below.
As you can see from the data, each of the two sub-groups (male and female) appears to show a negative trend, in that the smaller the dosage, the greater the probability of recovery.
However, as you can see from the black dotted trend line fitted to both groups together, the combined trend is not the same. In fact it suggests the opposite, in that a greater dosage leads to a greater probability of recovery.
This example shows the importance of considering the sub-groups that lie within your data. By considering gender we reach a completely different conclusion than we would have had we never added the gender variable.
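To make the reversal concrete, here is a minimal sketch in Python. The dosage and recovery values are hypothetical, made up only to reproduce the pattern described above (they are not the data behind the image), and `slope` is just a small helper written for this illustration. It fits a straight line to each group and then to both groups pooled together.

```python
def slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: one group receives low dosages and recovers less often,
# the other receives high dosages and recovers more often. Within each group,
# recovery falls slightly as the dosage rises.
male_dosage = [1, 2, 3, 4, 5]
male_recovery = [0.40, 0.38, 0.36, 0.34, 0.32]
female_dosage = [6, 7, 8, 9, 10]
female_recovery = [0.80, 0.78, 0.76, 0.74, 0.72]

male_slope = slope(male_dosage, male_recovery)
female_slope = slope(female_dosage, female_recovery)

# Pool both groups, which is what ignoring the gender variable amounts to.
pooled_slope = slope(male_dosage + female_dosage,
                     male_recovery + female_recovery)

print(f"male slope:   {male_slope:+.3f}")
print(f"female slope: {female_slope:+.3f}")
print(f"pooled slope: {pooled_slope:+.3f}")
```

Both group slopes come out negative while the pooled slope comes out positive. The pooled line is mostly picking up the gap between the two groups, not the effect of dosage within either one.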
Perhaps the most famous example of Simpson’s Paradox comes from the final House vote on the Civil Rights Act of 1964 in the United States.
Results from Southern States.
Results from Northern States.
Looking at the tables above, and our two sub-groups, a hasty conclusion from this data would be something along the lines of ‘Democrats were more in favour of passing the Civil Rights Act than their Republican counterparts’. That conclusion is driven by the fact that a greater % of Democrats voted in favour in both the northern and southern states.
Now comes the importance of showing the raw numbers. The below image shows a unit chart of each individual who participated in the vote.
We clearly see that whilst no Republicans voted in favour in the Southern States, only a small share of Republican representatives came from this area. Once we add the raw counts, we can make the correct conclusion that 80% of Republicans voted ‘yea’ vs. 67% of Democrats, clearly showing that Republicans as a whole were more in favour of passing the Civil Rights Act of 1964.
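The arithmetic behind this can be sketched in a few lines of Python. The tallies below are an assumption on my part: they are the counts commonly cited for the House vote on the original bill, not values read from the tables in this post, so the Democratic percentage it prints need not match the figure quoted above. The function names are just illustrative helpers. The point is only to show how each party's overall rate is a weighted average of its regional rates, with the weights set by how many members came from each region.

```python
# (yea, nay) counts per (party, region).
# ASSUMPTION: commonly cited tallies for the House vote on the original bill,
# not taken from the tables in this post.
votes = {
    ("Democrat", "North"): (145, 9),
    ("Democrat", "South"): (7, 87),
    ("Republican", "North"): (138, 24),
    ("Republican", "South"): (0, 10),
}


def yea_rate(yea, nay):
    """Share of votes cast in favour."""
    return yea / (yea + nay)


def regional_rates(votes):
    """Yea rate for every (party, region) sub-group."""
    return {key: yea_rate(yea, nay) for key, (yea, nay) in votes.items()}


def overall_rates(votes):
    """Yea rate per party after summing the raw counts across regions."""
    totals = {}
    for (party, _region), (yea, nay) in votes.items():
        running_yea, running_nay = totals.get(party, (0, 0))
        totals[party] = (running_yea + yea, running_nay + nay)
    return {party: yea_rate(yea, nay) for party, (yea, nay) in totals.items()}


regional = regional_rates(votes)
overall = overall_rates(votes)

for (party, region), rate in regional.items():
    print(f"{party:<10} {region:<5} {rate:.0%}")
for party, rate in overall.items():
    print(f"{party:<10} all   {rate:.0%}")
```

Within each region the Democratic rate is higher, yet the Republican rate is higher overall. Most Republicans sat for Northern states, where support was high, whereas a large share of Democrats sat for Southern states, where support was very low.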
To me this is hugely important as a data analyst. We need to be aware of our audience, and of how they use our work for insights that influence and aid the decision-making process. They use our work to make quick decisions, which means they rely on us to provide them with all the necessary information. They will use only the information that exists on the dashboard to make their decision.
This process of quick decision making, when placed hand in hand with ‘Simpson’s Paradox’, shows to me just how important it is to include the raw numbers when analysing sub-groups in your data. Doing so reduces the likelihood of your audience drawing hasty conclusions and making poor decisions.