Colour encoding in preparation.

In Tableau we may have scenarios where a list of dimension members is not exhaustive (complete): we know other dimension members can appear in the data, but they don't exist yet.

An example of this would be on a critical risk dashboard. You have data set up that returns a 'FALSE' value if there is a fatal error within your system. It is very unlikely that this fatal error will exist during the dashboard design phase, and therefore you can only ever colour encode the value 'TRUE' (meaning no fatal error).

Here I will outline two methods of 'encoding in preparation' for such dimension members.

In these examples the scenario is a statement that, at present, only ever returns the value FALSE; I want to encode TRUE in case such values appear in the future.


Method 1. Directly in Tableau.

  1. Create a calculated field. This is where your final calculation (or a copy of your dimension) will eventually live.
  2. Within this calculation return, one by one, the values that you predict will exist in the future.

    In my case I have used the formula 1<2 to return the value TRUE

  3. Drag this calculation to the colour shelf
  4. Encode the colour as appropriate


  5. If you have more dimension members that you believe will exist in the future, repeat steps 2-4 for each of them.
  6. Create your actual calculation.
  7. Check out your encoded values once they are ‘active’.

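Pulling Method 1 together, the two calculations involved look something like this sketch ([Fatal Error Count] is a hypothetical field standing in for your real logic):

    // Step 2 – temporary placeholder, returns the not-yet-seen value TRUE
    1 < 2

    // Step 6 – the real calculation, swapped in once the encoding is set
    [Fatal Error Count] = 0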


Method 2. Hacking the XML.

  1. Place your dimension or calculated field upon the colour shelf.
  2. Encode the values with a colour palette (not Automatic).


  3. Save your workbook as a .twb file, then close it and open it in Notepad.
  4. Identify the location in the XML where the current dimension members have been assigned.


  5. Build out the encoding XML for the missing dimension members and insert it underneath the current lines, but before the closing </encoding> tag.

    <map to='#HEXCODEHERE'>
    <bucket>DIMENSIONMEMBERHERE</bucket>
    </map>


  6. Save the workbook
  7. Reopen your workbook and wait for the dimension member to appear.



And that’s how you can ‘colour encode in preparation’.

Ben


Text Mining in Alteryx. Parsing hashtags from a list of tweets.

As part of this week's #MakeoverMonday series, Tableau Zen Master Andy Kriebel and his partner in crime Eva Murray posted a data set listing a series of tweets that President Donald Trump has either tweeted himself or retweeted over the last 6 years, during his run-up to taking the White House hot seat.

Immediately I saw 3 ways of analysing this data.

  1. Time series analysis
  2. Text analysis
  3. Time series and text analysis

The notion of text analytics is something I, and most of us, find fascinating, but it is notoriously difficult to get real insight from.

In the visualisation I created below I went for something quite simple: analysing the hashtags used within those tweets.

[Image: the finished hashtag visualisation]

In this post I will show you how, with two Alteryx tools and one regex formula, I created a list of all hashtags used within the data.



  1. Input your data (obviously); the data used in this project can be found here.
  2. Use a RegEx tool to tokenize and parse the text column, looking for a hashtag and, from that point, taking everything between the hashtag and the next space:

    #[[:alnum:]]+

    # = look for a hashtag
    [[:alnum:]] = look for alphanumeric characters
    + = look for any number of consecutive alphanumeric characters


    You will also note that I used the split to rows option. This is because a tweet can contain any number of hashtags. If we wished to have a row per tweet, we can always bring our data back into that structure using the Cross Tab tool.

    We can now run the workflow and see the parsed hashtags appear, one row per hashtag.

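    If you want to sanity-check the pattern outside Alteryx, the same POSIX character class works in R (the sample tweet below is made up):

    # a made-up tweet
    tweet <- "Big announcement for #MakeoverMonday, thanks #Tableau!"

    # extract every '#' followed by consecutive alphanumeric characters
    regmatches(tweet, gregexpr("#[[:alnum:]]+", tweet))[[1]]
    # [1] "#MakeoverMonday" "#Tableau"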


There was some further data preparation needed to get the data into the correct shape to build the visualisation above; the finished workflow can be found here.


Safe.

Ben

R and Tableau. Writing your entire data source to an R data-frame.

In isolation this may seem like a strange thing to do, but when placed hand in hand with more advanced analytics it can be hugely powerful; I shall post one example next week, and discuss it briefly towards the end of this article.

Essentially it gives us the power to manipulate our data in any way we wish before passing the data back into Tableau (as a second data source).


The structure of the script and set-up is fairly simple, and is outlined below (note this trick relies on having a RowID type field within your data set). We can use this to return any number of columns in our data source at a row level.

First things first, as always, initiate your Rserve connection.

Now comes writing the script…

SCRIPT_REAL("

# create a data frame that consists of our list of RowIDs
RowID <- data.frame(RowID = unlist(.arg1))

# create a data frame of the first column's values, with a row ID to join on
ColumnOne <- data.frame(ColumnOne = unlist(.arg2))
ColumnOne$RowID <- 1:nrow(ColumnOne)

# and the same for the second column
ColumnTwo <- data.frame(ColumnTwo = unlist(.arg3))
ColumnTwo$RowID <- 1:nrow(ColumnTwo)

# repeat for each remaining column
ColumnN <- data.frame(ColumnN = unlist(.argN+1))
ColumnN$RowID <- 1:nrow(ColumnN)

# merge the individual data frames together on the common RowID field
data <- Reduce(function(x, y) merge(x, y, all = TRUE), list(RowID, ColumnOne, ColumnTwo, ColumnN, …))

# as a test, write our data frame to a .csv file so we can inspect it
write.csv(data, 'TableauDataSource.csv')

# return the value 1 for every mark, as the calculation must return a value
1

", ATTR([RowID]), ATTR([ColumnOne]), ATTR([ColumnTwo]), ATTR([ColumnN]) …)

The final line closes the script and declares the columns which relate to the different arguments passed into it.


Before we close the calculation we should be sure to change our script so it is computed along 'Row ID', meaning we will receive every single row within our data set; if we were to compute along a different variable, we would receive an aggregated version of our data set.

Now, in order for our script to execute, we need to build a view containing the script. You can do this by simply placing your RowID column on Detail, before bringing the script calculation onto Detail also. The workbook can be downloaded from this link.


The result: our entire data source, written out to TableauDataSource.csv for inspection.


So what can this offer us? Well, alongside a couple of colleagues, I have been working on creating a 'loop' between Tableau, R, SQL and back into Tableau.

By being able to write our entire data source to an R data frame, we have the power to manipulate our data in any way, shape or form. We can then write this new data structure back into Tableau (via SQL) as a second data source, which can be queried from within Tableau using filters and parameters. I like to label it 'filterable data prep'.


An example is highlighted in the image below. Here I have started with Superstore as my base data structure. I have then written this structure into an R data frame, before manipulating it in a way that performs and writes a Market Basket analysis of our initial data set; this can be filtered using any of the columns and values that exist within our initial data set (in the example below I have excluded 'Technology' products, for instance).

[Image: the Tableau - R - SQL - Tableau loop]

I will write up and give an example of this loop in my next blog, which I hope will add further clarity to how powerful this can become.

Ben

Building a ‘Line Dot Plot’ in Tableau.

Yesterday I published this tweet…

[Image: the tweet]

I wanted to gain thoughts on using a 'line dot plot' as a comparative charting method (yes, it has been done before; I am not claiming to be the inventor).

In my example I used it to compare the current month's sales against values from previous months.

The line acts as a scale between the minimum and maximum values. For example, if the minimum monthly sales were 10, the maximum 110, and the current month 60, the dot would sit halfway along the line.

A second addition was the colour encoding, which is based on percentiles: if the current month value is within the lowest 25% of all monthly values it is coloured red; if it is above the lowest 25% but not within the top 25% it is encoded amber; and if it is within the top 25% it is coloured green.

These two methods combined allow the user to gain two different angles of understanding from one simple plot.

I will use this blog to outline how I built the chart type in Tableau.

The workbook for this visualisation can be found here.


First of all we need to write a calculation that determines the value we want to represent as the dot.

In my case I wanted to use the current month's sales.
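One way of sketching this, assuming Superstore's [Order Date] and [Sales] fields (your own formula may differ):

    // Current Month Sales – returns sales only for the latest month in the data
    IF DATETRUNC('month', [Order Date]) = DATETRUNC('month', {MAX([Order Date])})
    THEN [Sales]
    END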

Now we need to acknowledge the values this month should be compared against, so we need to calculate our minimum and maximum monthly values.
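A sketch of these two, using level-of-detail expressions to get a sales total per sub-category per month (again, the exact formulas in the workbook may differ):

    // Min Sales – the smallest monthly sales total
    MIN({ FIXED [Sub-Category], DATETRUNC('month', [Order Date]) : SUM([Sales]) })

    // Max Sales – the largest monthly sales total
    MAX({ FIXED [Sub-Category], DATETRUNC('month', [Order Date]) : SUM([Sales]) })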

Once we have these values we need to calculate a '% of total': where our current month value sits between the min and max.
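Something along these lines, reusing the fields sketched above:

    // Current Month % of Scale – 0 sits at the min month, 1 at the max month
    (SUM([Current Month Sales]) - [Min Sales]) / ([Max Sales] - [Min Sales])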

Now that we have our current month translated to a % of max, we can start building out our view.


We start with our initial data table at the level of sub-category.

Once in this state, drag the % calculation to Columns. This will (likely) give us a bar chart; change the mark type to Circle, and let's colour the marks green for now.


Now we need to bring on our scale. Because we have translated our values to a % of max, we can simply create a calculation on the Columns shelf equal to the average of 1, given as AVG(1).

We should change the mark type of our second axis to Bar, change it to a lighter colour so it blends into the background (allowing our key information to stand out), and make the bars thinner so they look more like a line than a bar.


Now it's a case of changing it to a dual axis chart (by right-clicking on our second green pill and selecting Dual Axis), synchronising the axes (by right-clicking on the AVG(1) axis and clicking 'Synchronize Axis') and reformatting accordingly. In order to bring our 'dots' to the front, we should move our % pill to the right of our AVG(1) pill.


Now we have our ‘Line Dot Plot’.


Finally, it is just a case of calculating the colour with which to encode our points. To do this we need some relatively simple percentile calculations, comparing our current month values against them.
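A sketch of the colour calculation, reusing the monthly-sales LOD from earlier (thresholds as described above):

    // Colour – red / amber / green by percentile of monthly sales
    IF SUM([Current Month Sales]) <= PERCENTILE({ FIXED [Sub-Category], DATETRUNC('month', [Order Date]) : SUM([Sales]) }, 0.25) THEN 'Red'
    ELSEIF SUM([Current Month Sales]) >= PERCENTILE({ FIXED [Sub-Category], DATETRUNC('month', [Order Date]) : SUM([Sales]) }, 0.75) THEN 'Green'
    ELSE 'Amber'
    END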

Any questions or issues please reach out and ask!

Ben

Writing to the same Excel file in Alteryx.

One common output requirement from an Alteryx workflow is to write differently structured outputs to different worksheets of the same Excel file.

This can be problematic to say the least.

The reason for this is that once Alteryx opens the Excel file to write the first sheet, Excel locks the file, making it un-editable by other processes.

There are two ways we can get around this. One is the 'Parallel Block Until Done' tool, available within the Crew Macro pack.

I've found problems with this approach, which is why I also highlight a second way: the R tool and the 'openxlsx' library.


Using the Parallel Block Until Done tool.

The Crew Macro pack has been designed by Alteryx employee Adam Riley. There are a tonne of great tools within the pack; best of all, they are well supported, well documented and regularly patched.

The Parallel Block Until Done tool is a development of the standard Block Until Done tool that comes with Alteryx. It is designed for writing different streams to different outputs in a set order: it writes all records from input 1 to output 1 before writing the records from input 2 to output 2.

A great description of the Parallel Block Until Done tool can be found on Adam Riley's chaosreignswithin site.

The workflow below (which is also attached here) highlights how to configure your workflow to write to the same file across different sheets.

[Image: the Parallel Block Until Done workflow]


The second option, as mentioned previously, makes use of the R tool in Alteryx and is a little more complex to set up (however, I have never experienced an error when writing to Excel this way).

First things first, we need to ensure Rtools is installed on our local machine; if not, you can download it from here.

This is because the second package we need, 'openxlsx', requires the zip executable that exists within the Rtools package.

We must also download the 'openxlsx' library and copy it into Alteryx's default library location, which is something like: C:\Program Files\Alteryx\R-3.2.3\library.

That or we can install it directly within the R code.

Next we need to create a stream for each individual data table we wish to write into the R tool.


Now we need to create our R script.

First things first, we need to declare the location of the zip executable within the Rtools package.

Sys.setenv(R_ZIPCMD = "C:/Rtools/bin/zip")

Now we need to load the 'openxlsx' library.

library(openxlsx)

Next I will declare a short name for each of my datasets.

Orders <- read.Alteryx("#1", mode = "data.frame")
People <- read.Alteryx("#2", mode = "data.frame")
Returns <- read.Alteryx("#3", mode = "data.frame")

I will now create a list of the datasets I wish to pass into my Excel file, along with their sheet names.

List_of_datasets <- list("Orders" = Orders, "People" = People, "Returns" = Returns)

Note the value in quotation marks represents the sheet name, whilst the value after the = represents the data frame you will be passing to that sheet.

Finally, we will use the write.xlsx() function to write our data to a specific Excel file.

write.xlsx(List_of_datasets, file = "C:\\Users\\Moss Ben\\Documents\\My Tableau Repository\\Datasources\\10.0\\en_US-EU\\Sample – Superstore New.xlsx")

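For reference, the complete script inside the R tool:

    # point openxlsx at the zip executable shipped with Rtools
    Sys.setenv(R_ZIPCMD = "C:/Rtools/bin/zip")

    # load openxlsx (or install.packages("openxlsx") first, as mentioned above)
    library(openxlsx)

    # one data frame per incoming Alteryx stream
    Orders <- read.Alteryx("#1", mode = "data.frame")
    People <- read.Alteryx("#2", mode = "data.frame")
    Returns <- read.Alteryx("#3", mode = "data.frame")

    # sheet name = list name; sheet contents = the matching data frame
    List_of_datasets <- list("Orders" = Orders, "People" = People, "Returns" = Returns)

    # a single call writes every sheet into the same .xlsx file
    write.xlsx(List_of_datasets, file = "C:\\Users\\Moss Ben\\Documents\\My Tableau Repository\\Datasources\\10.0\\en_US-EU\\Sample – Superstore New.xlsx")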

Now, once you run the workflow successfully, you will see that a new Excel document has been created at the given file path.


One thing to note with this second method is that special characters in field names will automatically be replaced with full stops.

The workflow for this second method can be found here.


And there you have it. Two ways to output data to multiple sheets through Alteryx.

Ben

R and Tableau. Creating convex hulls.

In this post I will be highlighting how we can use the R integration available in Tableau to build ‘convex hulls’ overlaying scatter plots, as shown in the sample image below.

[Image: convex hulls overlaying a scatter plot]

I want to start by outlining the additional insight that can be gained by using convex hulls, and what the term convex hull refers to.

By definition, a convex hull is the smallest convex shape that contains all given points.


Transferring this into insight is quite simple. They allow us to better identify the range/extent of a given set of points.

This research paper, for example, used convex hulls to track the 'spatial extent' of an outbreak in animal epidemics over time.



The R script that builds the above visualisation is broken down step by step below (and of course it requires an Rserve connection).


But what does this all actually mean?

Let's start with the chull() function (of course standing for convex hull). It returns the indices of the points that form the edges of our convex hull, in clockwise order; essentially a path describing how the edge points are connected. You pass in two vectors: your list of x coordinates and your list of y coordinates, as given by .arg1 and .arg2.
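A quick standalone illustration of chull() (the coordinates are made up):

    # five made-up points
    x <- c(0, 2, 1, 3, 0.5)
    y <- c(0, 0, 2, 1, 1.5)

    # indices of the points that form the hull, in clockwise order
    chull(x, y)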

This function is the driving force behind the visualisation and provides us with the knowledge of how to connect the dots.

When using Tableau alongside this function there lies a problem: we only return values for some, not all, of our points. As part of Tableau's integration with R, the number of values returned in the list must match the number of marks on the Tableau visualisation.

Let's take Office Supplies, for instance. Under this category there are 9 sub-categories. However, not all of these 9 sub-categories will have a path assigned to draw the convex hull. As a result, we will receive an error should we attempt to run the script as it stands.


Fewer values are being returned than there are marks on the viz (in this case only 6 of our 9 points return a path ID from the script).

Whereas if we run the same script with the category filtered to Furniture, we receive no error, because every sub-category has a path ID returned from the script.


For this reason the script needed to be developed beyond this simple argument. We somehow needed to add null values for our R script to pass back into Tableau.

So let's look at how we can do that.


My thinking: by joining the list returned by chull() against the full list of values that went into it, we can create a new table which incorporates null values for the missing members.

Something like the following (for clarity, // represents notes and the remaining lines are actual script; the lines originally shown only as screenshots are reconstructed here, so treat the exact variable names as illustrative)…

// initial dataset: the nine Office Supplies sub-category points

values <- chull(.arg1, .arg2)

// run our chull() function, returning the list of sub-categories that create the edge points for the convex hull. '.arg1' represents our X value (as specified at the end of the script), and '.arg2' represents our Y value (again, assigned at the end of the script).

// {3, 1, 9, 4, 6, 5}

valuesdf <- data.frame(SubCategoryID = unlist(values))

// take the result of our chull() function and place it into a one-column table

valuesdf$PathID <- 1:nrow(valuesdf)

// generate a row ID column for this table

list <- list(.arg3)

// create a list containing the marks that were passed into the R script. '.arg3' represents our level of detail in the view (in my case Sub-Category), which should be specified at the end of our script.

listdf <- data.frame(SubCategory = unlist(list))

// convert this list into a table

listdf$SubCategoryID <- 1:nrow(listdf)

// generate a row ID column for this table

finaltable <- merge(valuesdf, listdf, by.x = 'SubCategoryID', by.y = 'SubCategoryID', all.y = TRUE)

// perform a right outer join on these tables

finaltable$PathID

// return our new list, padded with null values where appropriate

// {2, NULL, 1, 4, 6, 5, NULL, NULL, 3}
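Assembled into a single Tableau calculation, the whole thing looks something like this sketch (SCRIPT_INT, because the returned path IDs are whole numbers; the field names follow the view built below):

    SCRIPT_INT("
    values <- chull(.arg1, .arg2)
    valuesdf <- data.frame(SubCategoryID = unlist(values))
    valuesdf$PathID <- 1:nrow(valuesdf)
    listdf <- data.frame(SubCategory = unlist(.arg3))
    listdf$SubCategoryID <- 1:nrow(listdf)
    finaltable <- merge(valuesdf, listdf, by.x = 'SubCategoryID', by.y = 'SubCategoryID', all.y = TRUE)
    finaltable$PathID
    ", SUM([Profit]), SUM([Sales]), ATTR([Sub-Category]))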


Now we can take our table calculation and build our view in Tableau.

Let's first build out our scatter plot by dragging SUM([Sales]) to Rows, SUM([Profit]) to Columns and [Sub-Category] to Detail, filtering to only those within the 'Office Supplies' category.


Now we can build out our Convex Hull, using our ‘Path ID’ field to build out a polygon.

But before we go further you must ensure your table calculation is being computed along the dimension passed into the R script, in this case [Sub-Category].

Drag SUM([Profit]) onto Columns for a second time. Change the mark type of our second axis to Polygon. Drag our R integrated calculation, titled 'Convex Hull' in my case, onto the Path shelf.

You will see we don't quite get what we want. This is because the NULL values affect how our path is drawn (essentially it joins paths 1-6 (or 1-n) and then joins the NULL values too).

So we need to exclude these values, but without using the Filter shelf (because this would of course remove those sub-categories from our scatter plot too). In order to do this we can use the 'hide' option: drag the [Convex Hull] calculation onto the Colour shelf, then navigate to the colour legend, right-click on the Null value and click 'Hide'.


Now that we have hidden our values, we can remove the [Convex Hull] calculation from the Colour shelf (the values will stay hidden).

It is then simply a case of making the chart a dual axis chart and implementing some minor formatting (for example, making the convex hull slightly transparent, so as to see the points that lie within it).



We can develop this further by removing our category filter and placing [Category] on the Colour shelf on both our first and secondary axis (though be sure the dimension is placed above the level of detail at which the table calculation works; in my case, above Sub-Category).


The workbook used in this demo can be found here (you cannot host a Tableau workbook that contains external scripts on Tableau Public), so please download it and play away (again, you will need a valid Rserve connection for this to work)!

Ben

Two ways to implement ‘Show All’ and ‘Top N’

In this post I'm going to quickly show two methods which can be used to create a 'Show All' option alongside a 'Top N' option.

Before we start, the visualisations produced in this blog can be found here, so feel free to download them and break them down yourself.


Option 1. Two parameters, one set and one calculated field.

  1. Create your first parameter. This will be our 'Top N' parameter; it should be of float or integer type and contain all the 'n' values you wish your user to select from.
  2. Create your set. This will be based on the dimension you wish to view the 'Top N' of.

    Navigate to the 'Top' tab of the Create Set window, select 'By field:', pick the parameter you have just created from the drop-down of the numeric value, and of course select your method for determining the 'Top N'.


  3. Create your 2nd parameter. This will determine whether we wish to show the 'Top N' or show all. It will be of string type and contain those two text values.


  4. Create a calculated field which determines whether we should use all the values, or only those that meet the criteria of the set, based on the selection in our 2nd parameter (sketched below the steps).

  5. Drag this calculated field onto the Filter shelf and select to include only records where the result = True.
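The step 4 calculation can be sketched like so ([Show All or Top N] being the 2nd parameter and [Top N Set] the set from step 2; your names may differ):

    // TRUE for every row when 'Show All' is selected, otherwise TRUE only for set members
    [Show All or Top N] = 'Show All' OR [Top N Set]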

Done.



Option 2. One parameter and three calculated fields.

I developed this second idea knowing that the first isn't the best user experience; ideally you would want your user to be able to select the 'All' value from within the Top N drop-down.

But this causes a problem: our parameter type must be a string. And that causes another problem: our set cannot work off a string field.

So here's how (sketches of the three calculated fields follow the steps)…

1. Create your parameter. This will be our 'Top N or Show All' parameter. It will be of string type, and you will have to list the N values you wish to include, alongside the 'Show All' value.


2. Create a calculated field which simply equals the parameter value.


3. Convert the field type of this column to numeric (this will simply create a NULL value when 'Show All' is selected within the parameter).

4. Create a second calculated field. This will be used to determine whether the value is within the 'Top N'.


5. Create your third calculated field. This will be placed on the Filter shelf and will be used to determine whether to return all values or only those that meet the conditions of the second calculated field.


6. Drag this calculated field onto the Filter shelf and select to include only records where the result = True.
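Sketches of the three calculated fields (names illustrative; the rank-based step 4 is one possible approach rather than the only one):

    // Step 2 – 'Top N Value': simply echoes the parameter
    [Top N or Show All]
    // Step 3 – change this field's data type to Number, so 'Show All' becomes NULL

    // Step 4 – 'In Top N?': a rank table calculation computed along the dimension
    RANK(SUM([Sales])) <= MIN([Top N Value])

    // Step 5 – the filter: keep everything when 'Show All' (NULL), otherwise only the Top N
    ISNULL(MIN([Top N Value])) OR [In Top N?]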

Done.


Ben