R and Tableau. Creating convex hulls.

In this post I will be highlighting how we can use the R integration available in Tableau to build ‘convex hulls’ overlaying scatter plots, as shown in the sample image below.

2017-01-12_21-24-17.png

I want to start by outlining the additional insight that can be gained be using convex hulls, and what the term convex hull refers too.

By definition a convex hull represents the smallest convex (distance) which contains all given points.

220px-ConvexHull.svg.png

Transferring this into insight is quite simple. They allow us to better identify the range/extent of a given set of points.

The image below shows an example from this research paper which used convex hull to track the ‘spatial extent’ of an outbreak in animal epidemics over time.

Picture1.jpg


The R script that builds the above visualisation is as follows… (and of course requires an R serve connection).

2017-01-13_13-56-10

But what does this all actually mean?

Lets start with the chull() function. The chull() function (of course standing for convex hull), essentially returns the list of points with which our convex hull lies (which points represent the edges of our convex hull). These points are returned in clockwise order, essentially giving us a path with which our edge points are connected. For this function you pass in two arrays. Your list of x coordinates and your list of y coordinates, as given by .arg1 and .arg2.

This function is the driving force behind the visualisation and provides us with the knowledge of how to connect the dots.

When using Tableau alongside this function, there lies a problem, we only return values for some, not all of our points. As part of Tableau’s integration with R it is important that the number of values returned in the list match the number of marks on the Tableau visualisation.

Lets take office supplies for instance. Under this category there are 9 sub-categories. However not all of these 9 sub-categories will have a path assigned to draw the convex hull. As a result we will receive an error should we attempt to run the script as follows…

2017-01-12_21-49-05

2017-01-12_21-49-49

Because less values are being returned than their are marks on the viz (in this case 6 of our 9 points have returned a path id (or result) from our script).

Whereas if we run the same script, with the category filtered to furniture, we do in fact receive no error because all the sub-categories have a path ID returned from the script.

2017-01-12_21-51-14

2017-01-12_21-51-53.png

For this reason the script needed to be developed beyond this simple argument. We somehow needed to add null values for our R script to pass back into Tableau.

So lets look at how we can do that.


What I thought is that by joining this list of values, alongside a list of the values that went into the table, then we can create a new table which encompasses these null values into the list.

Something like.. (for clarity // represents notes and bold text represents actual script)

// initial dataset

2017-01-12_22-19-45

values

// run our initial chull function returning a list of which sub-categories create the edge points for the convex hull. ‘.arg1’ represents our X value (as specified at the end of the script), and ‘.arg2’ represents our Y value (again this should be assigned at the end of the script).

// {3, 1, 9, 4, 6, 5}

valuesdf <-data.frame(SubCategoryID = unlist(values))

// takes the result of our chull() function and places them into a one column table

2017-01-12_22-22-10

valuesdf$PathID

// generate a row ID column for this table

2017-01-13_07-13-16

list

// create a list containing a list of the marks that were passed into the R script. ‘.arg3’ represents our level of detail in the view (in my case Sub-Category), which should be specified at the end of our script.

listdf

// convert this list into a table

2017-01-13_07-17-50

listdf$SubCategoryID 

// generate a row ID column for this table

2017-01-13_07-19-26.png

finaltable by.y = ‘SubCategoryID’, all.y=TRUE)

// perform a right outer join on these tables

2017-01-13_07-19-262017-01-13_07-13-16

2017-01-13_07-21-05

finaltable$PathID

// return our new list padded with null values where appropriate

// {2, NULL, 1, 4, 6, 5, NULL, NULL, 3}


Now we can take our table calculation and build our view in Tableau.

Lets firstly build out our scatter plot by dragging SUM([Sales]) to rows, Sum([Profit]) to columns and [Sub-Category] on detail. Filtering to only those within the ‘Office Supplies’ Category.

2017-01-13_07-46-57

Now we can build out our Convex Hull, using our ‘Path ID’ field to build out a polygon.

But before we go further you must ensure your table calculation is being computed along the dimension passed into the R script, in this case [Sub-Category].

Drag SUM([Profit]) onto columns for a second time. Change the mark type of our 2nd Axis to ‘Polygon’. Drag our R intergrated calculation, titled ‘Convex Hull’ in my case, onto the Path shelf.

You will see we quite don’t get what we want. This is because the NULL values are having an affect on how our Path is drawing (essentially it is joining paths 1-6 (or 1-n) and then joining the NULL values too.

So we need to exclude these values, but without using the filter shelf (because this will of course remove these sub-categories from our scatter plot also). In order to do this we can use the ‘hide’ function, which can be accessed by dragging our [ConvexHull] calculation onto the colour shelf, before navigating to the colour legend, right clicking on the Null value and clicking ‘hide’.

2017-01-13_07-57-13

Now that we have hidden our values we can remove our ConvexHull calculation from the colour shelf (the values will stay hidden).

It is then simply a case of making the chart a dual axis chart and implementing some minor formatting (for example making the Convex Hull slightly transparent, as to see the points that lay within it.

2017-01-13_08-02-23.png


We can develop this further by removing our category filter from colour and placing this on the colour shelf on both our first and secondary axis (though be sure the dimensions are placed above the level of detail at which the table calculation is working on (in my case above Sub-Category)).

2017-01-13_13-33-30

The workbook used in this demo can be found here (you can not host a Tableau Workbook that contains external scripts upon Tableau Public), please download it and play away (again you will need a valid R serve connection for this to work)!

Ben

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s