Dangers of Publication Bias in Data Viz

Publication bias is usually something we associate with research. The term describes the tendency of reviewers to favour publishing papers with statistically significant or clinically favourable results over those with non-significant or unfavourable results.

In other words, how rare is it to read a research paper with no significant results?

And how rare is it to see a data visualisation with no trends or patterns?

Very is the answer.

The whole subject of publication bias is something that annoys me far beyond what it should.

It bothers me because research and data visualisations are there to help key stakeholders make more informed decisions. If published work misrepresents the population it is supposed to describe, then the informed decisions we are supposed to be making are actually very ill-informed decisions.
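As a rough illustration of how a body of published work can misrepresent the population, here is a small simulation sketch. Everything in it is an assumption made for the example: the true effect of 0.1, the sample size of 30, the 1.96 cut-off, and the function name `simulate_studies` are all made up, and a simple z-test with a known standard deviation stands in for whatever test a real study would use.

```python
import math
import random
import statistics


def simulate_studies(true_effect=0.1, sigma=1.0, n_per_study=30,
                     n_studies=2000, seed=1):
    """Run many identical small studies and 'publish' only the significant ones.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    # Standard error of a study mean when sigma is known.
    standard_error = sigma / math.sqrt(n_per_study)

    all_estimates = []
    published = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n_per_study)]
        estimate = statistics.fmean(sample)
        all_estimates.append(estimate)
        # Two-sided z-test at roughly the 5% level decides what gets 'published'.
        if abs(estimate / standard_error) > 1.96:
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_studies()
print("studies run:       ", len(all_estimates))
print("studies published: ", len(published))
print("mean of all:       ", round(statistics.fmean(all_estimates), 3))
print("mean of published: ", round(statistics.fmean(published), 3))
```

With these made-up settings the average of the 'published' studies should come out well above the true effect of 0.1, even though every individual study was run honestly. The distortion comes purely from which results are allowed through.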

Within the science community there is a highly acclaimed example of why publishing unfavourable results is important.

Albert Michelson and Edward Morley, two 19th-century physicists, published the results of an experiment that looked to detect the relative motion of matter through the ‘luminiferous aether’. Their results massively contradicted the predictions of a popular theory of the time, the ‘stationary aether’.

Once this paper had been published it opened up a new line of research in the subject area which ended with the development of Einstein’s special theory of relativity.

The publishing of this ‘negative result’ and its subsequent importance to the development of Einstein’s theory played a contributing role in Michelson being awarded the 1907 Nobel Prize in Physics.


Why does publication bias exist?

Institutions and members of the research community are placed in a cycle which revolves around funding and developing a research profile.

Publication bias is so ingrained in the research community that even those new to the area fully understand the implications of writing papers that show no trends or patterns.

First of all, they know the paper is very unlikely to be published at all. From this they know that without publishing material they have a lack of visibility to the community, and thus their prospects are reduced.

Research institutions and universities are, again, judged on the number of papers that are published. Their funding is adjusted as a result.

Ranking profiles and university guides also place a large weight on research, and again this is in most cases a ‘number of articles published’ kind of view.

For both institutions and individuals, your publication record is your public profile. As you develop a research paper you outline your hypotheses, and these hypotheses look to find trends, not the absence of them. So who wants to publish the fact that their hypotheses were wrong? It is not an image you want to project to your peers. You want people to see the positive work you are doing, and by positive I mean results that support your outlined hypotheses. Results that show a trend.

There are also pressures from the audience. Yourselves. Humans are obsessed with trends and patterns, and will look for them even when they aren’t there.

In a research paper I looked to find empirical evidence that momentum exists in tennis. My results, for the most part, suggest quite the opposite. One of my concluding passages is as follows…

‘This suggests that perception of momentum in sport could be a result of memory bias (Gilovich et al., 1985). Long sequences of successfully performed events in sport are more memorable than sequences that appear to be random. Subsequently observers are likely to demonstrate an overestimation of momentum effects in sport. It is suggested that humans find it difficult to accept randomness in events (Vergin, 2000) with one study demonstrating how humans themselves are poor randomizers (Wagenaar, 1972). Participants in Wagenaar’s (1972) study were asked to produce a random series, with the subsequent results finding that the sequences contained too many short runs; if human perception is a belief that random sequences are so short, when they compare this to even the most mundane streak in sports they believe that this must be evidence of momentum (Vergin, 2000).’
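The point about runs in random sequences can be checked with a quick simulation. This is only a sketch and not the analysis from the paper quoted above: the helper names `longest_run` and `average_longest_run`, the 100 flips per sequence and the 2,000 trials are all assumptions chosen for illustration.

```python
import random


def longest_run(flips):
    """Return the length of the longest streak of identical outcomes."""
    if not flips:
        return 0
    longest = current = 1
    for previous, following in zip(flips, flips[1:]):
        # Extend the streak if the outcome repeats, otherwise start a new one.
        current = current + 1 if following == previous else 1
        longest = max(longest, current)
    return longest


def average_longest_run(n_flips=100, n_trials=2000, seed=1):
    """Average the longest streak over many sequences of fair coin flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        total += longest_run(flips)
    return total / n_trials


print("average longest streak in 100 fair flips:", average_longest_run())
```

For 100 fair flips the average longest streak should land somewhere near seven, which is longer than most people would write down if asked to invent a ‘random’ sequence by hand. A streak of that length is exactly the kind of thing an observer is tempted to call momentum.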

I cannot help but think that some of this theory is transferable to what is being discussed here. We tend not to care for things that don’t show trends or patterns, and from this, of course we don’t promote and share things that we do not care for.

In other words, there is pressure from an audience to produce work with significance because we find it more interesting.

Could we say similar pressures exist in the data viz space? 100%.


A closing thought.

As always when writing articles about data visualisations, I have Tableau in the back of my mind. I thought of how the above may apply to Tableau, although the same goes for Power BI and Qlik Sense (amongst others in the self-service analytics space).

These platforms have been developed to allow their users to flick between chart types and data at an incredible speed. They have made it so quick in fact, that if we create a view that doesn’t show a trend, we can scrap it and replace it with one that does within seconds.

Is this a problem? Probably not, but it is just a thought.

We also have Tableau’s ‘Viz of the Day’. Again, it is rare to see content on there which does not have a trend or pattern. Should the editors start to promote more content which doesn’t show trends and patterns? That’s not for me to say.

A closing line for this blog post is simple.

No trend is still a trend.

Ben

#VizLikeAnArtist
