Yesterday evening I recorded the 2nd episode of #FastFriday. With Andy Kriebel being away on holiday this week, Alteryx Ace and Tableau Zen Chris Love stepped in to provide Ravi, Simona and me with a dataset, and boy did he produce a good one.
Here is how I spent my 6 minutes during episode 2.
To start, Chris provided the dataset as a TDE, so I could simply double-click the file in Explorer and it would spring up in Tableau.
After a slight moment of hesitation, trying to remember my reflections from last week, instead of discovering the data in the data source view, I dived straight into creating a new sheet, allowing me to explore the data through the data pane.
Okay, within seconds I knew what this data was. I had seen similar data sets numerous times when trying to make an informed decision on what university to go to.
It was university league table data, and immediately I had (what I thought was) a great idea. I would create a scatter plot to allow my audience to see which items correlated most strongly with a university’s ranking. At the time I decided that a good way to present this would be to use a parameter that allowed the users to switch between the different characteristics and compare them against rankings.
AT THE TIME. I will explain later why in hindsight I think this was not a great idea.
Building the visualisation.
If you haven’t seen parameters used in this way, then this is a great way of allowing your users to explore the data as they wish. In my example I only used the parameter to determine the value of one of my axes, but the technique also works with two axes.
This blog from The Data School head coach Andy Kriebel outlines the how-to very well. He also says it can be done in under 5 minutes (sorry Andy, it took me 6).
I am quite comfortable building and controlling parameters and have implemented this trick on enough projects to know what I needed to do.
After around 20 seconds of forgetting how to create a parameter, including opening a new sheet as if that would make a difference, I realised where I was going wrong and went to the top of the data pane, used the drop down arrow and selected ‘Create Parameter…’.
At this point I made the decision about which characteristics I wanted to include. I decided to use them all bar ‘Score – Overall’ making the assumption that this determined the final ranks. In reality I should have tested this theory and it probably would have taken me a total of 5 seconds.
Making assumptions under time pressure? This seems to be a bit dangerous but I think that is a blog post for another time.
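For what it is worth, the five-second check mentioned above could look something like the following outside Tableau. This is only a rough Python sketch: the rows are made-up toy values, and the keys rank and score_overall are placeholder names, not the real column names in Chris's dataset. It simply asks whether sorting by overall score (highest first) puts the ranks in order.

```python
def rank_follows_score(rows):
    """Return True if sorting by overall score, highest first,
    also puts the ranks in ascending order."""
    by_score = sorted(rows, key=lambda r: -r["score_overall"])
    ranks = [r["rank"] for r in by_score]
    return ranks == sorted(ranks)


# Toy rows, invented purely for illustration.
rows = [
    {"institution": "A", "rank": 1, "score_overall": 95.2},
    {"institution": "B", "rank": 2, "score_overall": 93.0},
    {"institution": "C", "rank": 3, "score_overall": 90.1},
]

print(rank_follows_score(rows))
```

If this returned False on the real data, the assumption that the overall score drives the rank would need a second look.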
Once I had created my parameter, I then needed to create a calculated field for the parameter to control. I used an IF statement that returns the values of whichever column is selected in the parameter.
IF [Select Y Axis] = "Citations" THEN [Score Citations]
ELSEIF [Select Y Axis] = "Industry Income" THEN [Score Industry Income]
ELSEIF [Select Y Axis] = "Internation Outlook" THEN [Score Internation Outlook]
ELSEIF [Select Y Axis] = "Research" THEN [Score Research]
ELSE [Score Teaching]
END
This is not the most efficient way to build a statement like this, however. I should have used a CASE statement with WHEN clauses, in a similar fashion to Andy's blog post. I think this has something to do with my background: having worked heavily with IF statements in Excel, I am very used to their structure and how they work. In contrast, I have very little experience with CASE statements, and when placed under time pressure I, like most people, reach for what I am most comfortable with. Perhaps in future I should challenge myself to use the functions that I am not so familiar with.
The CASE statement in this example would look as follows.
CASE [Select Y Axis]
WHEN "Citations" THEN [Score Citations]
WHEN "Industry Income" THEN [Score Industry Income]
WHEN "Internation Outlook" THEN [Score Internation Outlook]
WHEN "Research" THEN [Score Research]
WHEN "Teaching" THEN [Score Teaching]
END
So we state the aspect that we are evaluating against just once, before creating a set of criteria and outcomes.
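The same "state it once, then list the outcomes" idea shows up in other languages as a lookup table. Here is a rough Python analogy, not Tableau code: the dictionary plays the role of the WHEN lines, and the function name select_y_axis and the sample row are my own invention for illustration. An unmatched choice returns None, much as a CASE with no ELSE gives a null.

```python
# Maps each parameter value to the column it should return,
# mirroring the WHEN ... THEN ... lines of the CASE statement.
FIELD_FOR_CHOICE = {
    "Citations": "Score Citations",
    "Industry Income": "Score Industry Income",
    "Internation Outlook": "Score Internation Outlook",
    "Research": "Score Research",
    "Teaching": "Score Teaching",
}


def select_y_axis(row, choice):
    """Return the value of the column picked by `choice`, or None if no match."""
    field = FIELD_FOR_CHOICE.get(choice)
    return row[field] if field is not None else None


# A single made-up row, purely for illustration.
row = {
    "Score Citations": 99.0,
    "Score Industry Income": 45.5,
    "Score Internation Outlook": 70.2,
    "Score Research": 88.1,
    "Score Teaching": 81.3,
}

print(select_y_axis(row, "Research"))
```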
It is then a case of showing the parameter control in the view, and then building the scatter plot, by dragging rank to rows, [Select Y Axis] to columns and putting the institution on detail.
You may also see that I brought in the institution's URL. This was done with the intention of creating a URL action, to give the users easy access to additional information beyond what is in the data source.
Unfortunately in my 6 minute visualisation I did not have the time to bring this in.
Before building my dashboard there was one more trick that I brought in, which was to create a dynamic axis title.
To do this you simply drag the parameter onto the rows/columns shelf alongside your parameter-controlled pill, before then hiding the original axis title for that pill. This blog by Interworks outlines the process in more detail.
I then polished off my visualisation by creating a dashboard which followed a similar structure to the previous episode's viz.
One thing I always ensure I do is size my filters and parameter controls to fit the labels inside them. One of my biggest annoyances is when I see filter/parameter bars extend well beyond what is necessary.
Differences to week 1.
During this week's 6 minute viz I saw that a number of my habits had changed from those of the 1st episode, and I feel like they were positive changes in terms of creating content in a short time period.
First of all, as noted at the beginning, I went straight into exploring the data by opening a sheet. This allowed me to bring content into the view if needed, and also meant that I could see the entire list of columns in one view. This process allowed me to make much quicker and clearer decisions on what the content was, and how I could then create an idea from this.
The second was something that I picked up from Simona, namely her organisational skills and how she renamed her calculated fields. This is something that I brought into my process this time, naming my parameters and calculated fields as I went along.
I also left the building of the dashboard until last. I realised that ensuring the content is done well should take priority over the look and feel of the dashboard, although of course the latter remains hugely important in terms of presenting the information to an audience.
So I am now going to outline why I feel I made a bad viz choice in how I allowed the user to identify which characteristics correlate most strongly with an institution's rank.
The main issue for me is that the users cannot see the correlation plots against each other. By the time they have used the parameter to change the measure in the view, they are likely to have forgotten the pattern of the previous one. A much better way would be to use small multiples, which would allow them to see each of the measures side by side.
This is something that I have implemented in my 2nd version of the viz.
You will also see that I added the p-value and R-squared value to help my users interpret the scatter plots. This is not something I have done before, and it is something that should not be used all the time. I feel including such detail is dangerous in that it may alienate your audience: not everyone knows how to interpret this information, and many would much rather let the chart tell the story.
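For anyone unsure what the R-squared value is actually measuring, here is a minimal Python sketch. It relies on the fact that, for a straight-line fit, R-squared is the square of Pearson's correlation coefficient. The numbers are toy values I made up, not figures from the league table, and the p-value is left out because it needs a statistical distribution that goes beyond a short example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """R-squared of a straight-line fit of ys on xs."""
    return pearson_r(xs, ys) ** 2


# Toy data: rank on one axis, a made-up score on the other.
ranks = [1, 2, 3, 4]
scores = [90, 80, 70, 60]

print(pearson_r(ranks, scores), r_squared(ranks, scores))
```

A value close to 1 means the points sit close to a straight line, while a value close to 0 means the line explains very little of the spread.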
I also made an error in that I did not notice the year field, because it was in the measures pane rather than the dimensions pane. This is because Tableau saw a number and automatically treated it as something that should be aggregated, rather than as a dimension and a way to slice your data. Mind, while there is nothing wrong with including all years, it can reduce the likelihood of identifying stories if the data has been inconsistent over time. In the second viz I only used data from 2015.
One final design choice I made was to move the parameter-controlled axis to the columns shelf. Remembering that this is the dynamic header, I thought it best labelled horizontally so my users can easily interpret what each chart represents. This means ranking, which is a static measure on each chart, is placed rotated on the y-axis.
In other words, I gave the dynamic variables the axis that the audience can more easily read.
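To show the difference between treating year as a number to aggregate and as a way to slice the data, here is a small Python sketch. The rows and the key names (year, institution, rank) are hypothetical placeholders, not the real dataset. Rather than summing the years, it groups the rows by year so that a single year can be picked out.

```python
from collections import defaultdict


def split_by_year(rows):
    """Group rows by their year value, treating year as a category."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)
    return dict(by_year)


# Toy rows, invented purely for illustration.
rows = [
    {"year": 2014, "institution": "A", "rank": 2},
    {"year": 2015, "institution": "A", "rank": 1},
    {"year": 2015, "institution": "B", "rank": 2},
]

by_year = split_by_year(rows)
only_2015 = by_year[2015]
print(len(only_2015))
```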
To see the final visualisation I created click here.