#ReViz – Battling Infectious Diseases in the 20th Century

You have probably seen me write this before, but this visualisation is one of my favourites.

The way in which it puts the message across, with such simplicity, is astounding.

2016-06-12_09-22-01.png

This visualisation was one of a series produced for an article by the Wall Street Journal graphics department. The article is well worth a read and discusses the positive impact of vaccinations.

I will also give a second shout-out to the Project Tycho team at the University of Pittsburgh, an initiative to advance the availability of public health data in the US. If you are interested in public health data then it is well worth signing up; they have put together an amazing database.

Now, you may perceive this as something that would be easy to build in Tableau. My response would be yes and no; there were definitely some challenges along the way.

I guess the main reason you have visited this post is to see the how-to guide on creating the dynamic key. By Friday this week I intend to write a separate blog post describing this process and link it here, but for now the process is outlined towards the bottom of this article.


Data Structure.

After downloading the data set from the Project Tycho site, here is the format in which it came. We have the number of incidents by state and by week/year.

First things first, we need to pivot the data to bring all our incident numbers into a single column. So we pivot each of the state columns around our [YEAR] and [WEEK] fields.

2016-06-12_09-38-18 Pivot

The weekly level of detail was not necessary, so the next task was to remove it. I used the summarise tool to return the number of incidents simply by year and state.
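For anyone without the tools I used, the pivot and summarise steps can be sketched in plain Python. This is only an illustration: the sample rows, the state columns and the output field names [State] and [Cases] are made up for the example and are not the real Project Tycho export.

```python
from collections import defaultdict

# Hypothetical wide-format rows: one column per state, one row per year/week.
wide_rows = [
    {"YEAR": 1930, "WEEK": 1, "ALABAMA": 10, "ALASKA": 0},
    {"YEAR": 1930, "WEEK": 2, "ALABAMA": 5, "ALASKA": 2},
    {"YEAR": 1931, "WEEK": 1, "ALABAMA": 7, "ALASKA": 1},
]


def pivot_long(rows, key_fields=("YEAR", "WEEK")):
    """Turn every state column into its own (year, week, state, cases) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in key_fields:
                continue  # YEAR and WEEK are the fields we pivot around
            long_rows.append(
                {"YEAR": row["YEAR"], "WEEK": row["WEEK"],
                 "State": column, "Cases": value}
            )
    return long_rows


def summarise(long_rows):
    """Drop the weekly detail by summing cases per (year, state)."""
    totals = defaultdict(int)
    for row in long_rows:
        totals[(row["YEAR"], row["State"])] += row["Cases"]
    return dict(totals)


long_rows = pivot_long(wide_rows)
yearly = summarise(long_rows)
print(yearly)
```

The result has one number per year and state, which is the grain the heat map needs.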

Okay, I’ll admit I didn’t see this until after building version 1 of the viz, but the incident rates in the WSJ viz are given per 100,000 members of the population. This meant some data blending was required. I found population-by-state data from the US Census and blended this with my data set, before creating the incident rate with the calculation ([Sum_Cases]/[Population])*100000.
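As a rough sketch of that calculation outside Tableau, here is the same formula in Python. The case and population figures are invented purely to show the arithmetic, and returning nothing when the population is missing is my own assumption about how to handle gaps.

```python
def incident_rate_per_100k(sum_cases, population):
    """Mirror of ([Sum_Cases]/[Population])*100000."""
    if not population:
        return None  # assumption: no rate when population is missing or zero
    return sum_cases / population * 100_000


# Made-up figures keyed by (year, state), standing in for the blended data.
cases = {(1930, "ALABAMA"): 250, (1930, "ALASKA"): 12}
population = {(1930, "ALABAMA"): 2_500_000}  # no population row for Alaska

rates = {
    key: incident_rate_per_100k(cases[key], population.get(key))
    for key in cases
}
print(rates)
```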

The data set I used can be found here.


Building the chart

This wasn’t a simple heat map, which would use two discrete dimensions and a measure. In order to get the constant lines on our visualisation, what would traditionally have been our 2nd dimension, year, needed to be continuous.

2016-06-13_18-59-46.png

Heat maps usually use the square mark type; however, when using just one discrete dimension it is not possible to do this without getting overlapping marks.

2016-06-13_19-02-53.png

To the right side of this image you can see that the squares layer on top of each other.

However, you can also see that now we have changed our year to continuous, we now have the ability to select a constant line.

One way in which we can prevent this overlapping is to change the mark type to ‘Gantt Bar’.

We can then create a calculated field which will be the average of 1, AVG(1). In reality it can be any number, provided it assigns the same value to each of our state/year combinations, which may not be the case if you use the SUM function.

By then dropping this onto the size shelf you see we get our heat map, with the added ability to add features from the analytics pane.
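To see why AVG(1) is safer than SUM(1) here, consider a small Python sketch. The rows are invented, and the duplicate Alabama row is there on purpose to show what happens when a state/year combination has more than one underlying record.

```python
from collections import defaultdict

# Made-up records; Alabama 1930 deliberately appears twice.
rows = [
    ("Alabama", 1930),
    ("Alabama", 1930),
    ("Alaska", 1930),
]


def sum_of_one(records):
    """Equivalent in spirit to SUM(1): counts the records behind each mark."""
    totals = defaultdict(int)
    for key in records:
        totals[key] += 1
    return dict(totals)


def avg_of_one(records):
    """Equivalent in spirit to AVG(1): always 1, however many records there are."""
    totals = sum_of_one(records)
    return {key: count / count for key, count in totals.items()}


print(sum_of_one(rows))  # bar sizes would differ between marks
print(avg_of_one(rows))  # every mark gets the same size
```

With SUM the Alabama mark would be sized differently from the Alaska mark, whereas AVG gives every state/year combination the same size.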

The State(Copy) pill you see on the rows shelf was created to convert the names from capitals to title case.

2016-06-13_19-10-00

It was then a simple case of adding our two constant lines, which represent the points in time when the vaccine was first introduced and then reintroduced with an improved formula.


Creating a custom colour palette

You may have noticed that the colour scale used in this visualisation is not a native Tableau colour palette. However, Tableau gives you the ability to create your own. I knew this was a feature of Tableau, but it was something that I had yet to have a go at.

I followed their great knowledge base article on the subject.

There are three types of colour palette used in Tableau.

  1. a categorical colour palette – ‘A categorical colour palette contains several distinct colours that can be assigned to discrete dimension members.’
  2. a sequential colour palette – ‘Another type of palette is the sequential colour palette. Typically, this type of palette shows a single colour, varying in intensity. This type of colour palette is used for continuous fields, typically for measures.’
  3. a custom diverging colour palette – shows two ranges of values using color intensity to show the magnitude of the number and the actual color to show which range the number is from. Diverging palettes are most commonly used to show the difference between positive and negative numbers.

Here I am just going to outline the reasoning behind the code I created (see below), rather than take you through a step-by-step guide, since the Tableau article is so great.

<color-palette name="Measles Viz Colour Scale" type="ordered-diverging" >
<color>#ddebf8</color>
<color>#0c9dc5</color>
<color>#44ae57</color>
<color>#fcd43d</color>
<color>#e59e26</color>
<color>#e48925</color>
<color>#e95034</color>
<color>#da4b31</color>
<color>#cf472e</color>
</color-palette>

Note, the XML tag for color is spelt the American (wrong) way, so make sure you are using American English when writing your XML code.

I simply wrote this code in Notepad.
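If you would rather generate the file than type it, the sketch below builds the same palette with Python's standard library. It assumes the palette sits inside workbook and preferences elements in the Preferences.tps file, as described in the Tableau knowledge base article; please check that article for where the file lives and for the exact structure your version of Tableau expects.

```python
import xml.etree.ElementTree as ET

PALETTE_NAME = "Measles Viz Colour Scale"
PALETTE_TYPE = "ordered-diverging"
COLOURS = [
    "#ddebf8", "#0c9dc5", "#44ae57", "#fcd43d", "#e59e26",
    "#e48925", "#e95034", "#da4b31", "#cf472e",
]


def build_preferences(name, palette_type, colours):
    """Return the palette as XML text, wrapped in the assumed .tps structure."""
    workbook = ET.Element("workbook")  # assumed outer wrapper
    preferences = ET.SubElement(workbook, "preferences")
    palette = ET.SubElement(
        preferences, "color-palette", {"name": name, "type": palette_type}
    )
    for colour in colours:
        # Note the American spelling of the tag name.
        ET.SubElement(palette, "color").text = colour
    return "<?xml version='1.0'?>\n" + ET.tostring(workbook, encoding="unicode")


preferences_xml = build_preferences(PALETTE_NAME, PALETTE_TYPE, COLOURS)
print(preferences_xml)
```

Generating the file this way also avoids curly quotes sneaking into the attributes, which is easy to do when copying XML from a web page.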

A diverging colour palette was the type I chose because it is about blending colours (usually two or three, although in my case nine) and using these colours to show the intensity of events, in my case the number of measles-related incidents.

You can also see that I weighted my colour palette, as in the original visualisation. In order to do this I simply added more variants of the colour I wanted to weight towards, which is why the palette ends with several oranges and reds. Of course they are not identical colours, because we don’t want a lower number of incidents to have the same colour as a higher number of incidents, so I therefore made these a couple of shades lighter.
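To illustrate the weighting idea, here is a small Python sketch that blends between the nine colours. It assumes the stops are evenly spaced and blended linearly, which is my simplification for the example and not necessarily how Tableau interpolates internally.

```python
PALETTE = [
    "#ddebf8", "#0c9dc5", "#44ae57", "#fcd43d", "#e59e26",
    "#e48925", "#e95034", "#da4b31", "#cf472e",
]


def hex_to_rgb(hex_colour):
    """'#e59e26' -> (229, 158, 38)."""
    hex_colour = hex_colour.lstrip("#")
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(229, 158, 38) -> '#e59e26'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def colour_at(fraction, palette=PALETTE):
    """Blend linearly between evenly spaced stops; fraction runs from 0 to 1."""
    fraction = min(max(fraction, 0.0), 1.0)
    position = fraction * (len(palette) - 1)
    lower = int(position)
    upper = min(lower + 1, len(palette) - 1)
    weight = position - lower
    low_rgb = hex_to_rgb(palette[lower])
    high_rgb = hex_to_rgb(palette[upper])
    mixed = tuple(
        round(a + (b - a) * weight) for a, b in zip(low_rgb, high_rgb)
    )
    return rgb_to_hex(mixed)


# Sample the scale at a few points along its length.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(fraction, colour_at(fraction))
```

Because five of the nine stops are oranges and reds, everything from the midpoint upwards lands in that family under this assumption, which is the weighting effect described above.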

2016-06-13_20-38-16


Building the dynamic key.

Okay, I’ll be honest, it took me a fair few attempts to perfect the method for building the dynamic legend. I knew this would be challenging when I looked at doing this as a #ReViz project, and this was something that pushed me towards doing it.

‘Stop mumbling, blah blah blah, tell us the how’

Part 1 – Creating the colour legend

  1. Duplicate your original data set and then union the duplicate to it. I used Alteryx for this step, but you could try using Tableau’s new union feature.
  2. Create an additional column, in my case [RowCount].
  3. Assign 1 as the [RowCount] value for the original data set
  4. Assign 2 as the [RowCount] value for the duplicated data set

    2016-06-12_10-25-29

  5. Create a calculated field in Tableau that converts your [RowCount] values into the min/max values of your legend, in my case 0 and 3000:
    if [RowCount] = 1 then 0 else 3000 END
    Then change the type to discrete.
  6. Create bins based on this field. The bin size will be 1.

    bins

  7. Create a calculated field, simply index()
  8. Drag your index() calculated field to columns
  9. Drag your bins onto colour
  10. Drag your index() calculated field to colour, and change the compute using to the bins.
  11. Make the colour scale match that of your chart
  12. Change the mark type to Bar (if it isn’t already)

Part 2 – Creating the arrow

  13. Drag the pill used to colour your scale on your chart onto the columns shelf.
  14. Remove everything from the marks card for this 2nd axis.
  15. Change the mark type to shape
  16. Select the “▼” symbol
  17. Dual axis your chart.

Part 3 – Adding your interactivity

  18. Create your dashboard
  19. Apply any necessary action filters to create the interactivity between the chart and the dynamic key.

    2016-06-12_11-01-38

  20. One final change I made was that I did not want to show the arrow unless the filter had been applied. In order to do this I created a calculated field, which I placed on my 2nd axis instead of the field used in the chart:
    if COUNTD([State]) = 1 then avg([Per 1000]) END
    This calculated field means that the value will only be returned if there is only one state selected, i.e. if someone is hovering over a value on the chart. If they are not hovering over the chart, then all states are in the view, so COUNTD([State]) will equal 50, and thus the value will be NULL and the arrow will not show.
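
The logic behind the steps above can be sketched in Python, which may help if the Tableau calculations feel abstract. This is only an illustration with made-up rows: the helper names are hypothetical, and the list passed to arrow_value stands in for whatever rows are left in the view after the action filter has been applied.

```python
def union_with_row_count(rows):
    """Steps 1-4: stack the data on a copy of itself, tagged 1 and 2."""
    original = [dict(row, RowCount=1) for row in rows]
    duplicate = [dict(row, RowCount=2) for row in rows]
    return original + duplicate


def legend_value(row_count, legend_min=0, legend_max=3000):
    """Step 5: if [RowCount] = 1 then 0 else 3000 END."""
    return legend_min if row_count == 1 else legend_max


def arrow_value(rows_in_view):
    """Step 20: if COUNTD([State]) = 1 then avg([Per 1000]) END."""
    states = {row["State"] for row in rows_in_view}
    if len(states) != 1:
        return None  # more than one state in view, so no arrow
    values = [row["Per 1000"] for row in rows_in_view]
    return sum(values) / len(values)


# Made-up rows, reusing the field names from the calculated fields above.
data = [
    {"State": "Alabama", "Per 1000": 400.0},
    {"State": "Alaska", "Per 1000": 150.0},
]

unioned = union_with_row_count(data)
legend_ends = sorted({legend_value(row["RowCount"]) for row in unioned})
print(legend_ends)            # the two ends of the legend axis
print(arrow_value(data))      # nothing hovered: both states in view
print(arrow_value(data[:1]))  # hovering over a single state
```

The duplicated data gives the legend its two end points, and the arrow only receives a value once the view has been filtered down to a single state.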

After this it was a case of formatting the dashboard into the style which you see in the Tableau Public visualisation; I won’t bore you with that.

Ben

#VizLikeAnArtist

 
