Using string calculations to create messages for NULL values

Guiding and ensuring your users interpret the data correctly is the key to a successful data visualisation.

So, question. What happens when your users come across NULL values in your visual, as in the instance below?

2016-08-16_23-51-55.png

Best case, they simply understand that there are no recorded sales for that country during that specific month.

Worst case, they come up with some mind boggling conclusion that the Belgium store was forced to close due to declining sales in September 2015.

But how about we ensure the data is interpreted correctly through creating a combined lookup/conditional/string statement.

2016-08-17_00-12-20.png

Tableau introduced a neat workaround in this Tableau knowledge base article, but there a re limitations. We may need to ensure that we distinguish between these NULL values and true zeros. True zeros may exist because we are rounding, or if you are presenting your data in 1,000s, 1,000,000s etc; they may also exist because you are working with a measure value that can be both positive and negative, such as profit, and in turn summing these values may equate to a true zero value.

The method outlined in this blog is a very simple tweek to that outlined in the tableau article, but instead of returning zero, we return text.

2016-08-17_18-20-14

The lookup(sum([Sales]),0) is effectively evaluating what the value is of that specific cell (because we are offsetting by 0). Where are cells are empty, this with return the value NULL.

So by wrapping this formula with a isnull() we are quite simply returning a TRUE FALSE dependant on whether the vale is infact NULL or not.

By then using an IF statement we can identify what should happens if the condition is TRUE (i.e. it is NULL), where we return the string “No sales”, and where it is FALSE we return the sum of sales.

A note here is that the data type outputs from an if statement must return the same data type, and as a result we must change the datatype of sum([Sales]) calculation and to do this we simply wrap this in an str() – string- function.

Because our output is now a string and not a number, we as a result loose the ability to format the number, as a number (if that makes sense). This is why I include the round() function, but you may also wish to do some further manipulation such as adding a currency indicator, or thousands separators.

2016-08-17_23-32-50.png

So can we use this to label NULL values on bar charts? The answer is yes, but it is not as simple as placing our new string field on the label shelf and the sum([Sales]) on the rows/columns shelf.

2016-08-17_23-48-18

This is because the bars where there are ‘No sales’ don’t actually exist, so there is nothing to label, so we now need to perform a very similar calculation to return a value so there is something to label, I return the value 0, in order to build a mark which we can then label.

2016-08-17_23-47-07

2016-08-17_23-47-54

And now we have a labelled bar chart🙂

Ben

 

‘Chunking’ your Excel output in Alteryx – Part 2 – Introducing the ‘chunk’ macro

Starting where we left off.

A client had asked me to develop a method of automatically segmenting (chunking) our data by implementing a row limit.

It was now up to me to find a solution, and after gaining an understanding of the ways of which you can chunk data within an .xlsx output I was now getting down to trying to implement this knowledge.

And with that may I introduce the ‘Chunk Excel Output‘ macro.

2016-08-22_13-44-05.png


As with all applications and macro’s that I build, I will always build out a traditional workflow first. I do this, a) to ensure that the idea is actually possible, and b) to avoid the complications that come in when introducing interface tools into the process. By doing the latter first, we may never get to a stage where we achieve the goal of point a.

My initial idea was to use the ‘Change Entire File Path’, which I knew would give the users flexibility to either output the chunks to different workbooks, or sheets, as desired.

The challenge here was, how do I acknowledge when a defined row limit was hit?

Firstly, something I knew I needed to be able to do was to acknowledge the row of each line, so the starting point was a Record ID tool. I then thought that I could use a multi-row formula tool to essentially create a condition which asks the question, does the record ID of the current line, meet the criteria of the current file (the current file being an incremental value which starts at 1).

This develops into a formula as (where the row limit is defined as 1000):

if [RecordID] = 1 then 1 elseif [RecordID]<=([Row-1:File]*1000) then [Row-1:File] else [Row-1:File]+1 endif

af.png

//Here we are defining whether the current row is the 1st row in the file, if it is then we will give it the [File] value of 1 (i.e. it is within our first chunk)

IF ISNULL([Row-1:File]) THEN 1

//If the record is not the first row in the file then determine whether the Record ID of the current row is less than the row limit, multiplied by the chunk number

ELSEIF [RecordID]<=([Row-1:File]*1000)

//If the previous statement returns TRUE, then we increment the chunk value, if not we keep the chunk value the same as the previous row

THEN [Row-1:File]

ELSE [Row-1:File]+1 ENDIF

sdsfs

We can then use a formula tool to append this value to a file path string…

“C:\Users\Ben Moss\Documents\Filename.xslx|SheetName”+tostring([File])

And now we have a different file path for each ‘chunk’.

dsd.png

And then as we use the ‘Change Entire File Path’ from our ‘File Output’ field, and run the workflow our records will now output accordingly.

afsefv.png

And now that we have a method too create chunking by a row limit we can now bring our user input. In the case of this app, we simply allow the user to select their file path and chunking type, and manipulate the URL string accordingly, whilst we also allow them to overwrite the default value within the multi-row formula tool that gives the row limit.

I don’t want to go too much in detail in regards to the building of the app, but there are some great blogs out their in regards to the interface designer tools, including ‘One Step Closer to Alteryx Nirvana – Building Macros and Apps‘ by Michael Mixon, whilst Naledi Hollbruegge has posted a number of blogs that focus on specific interface tool on The Data School blog.

If you wish to see the detail of how the final macro was built then it is available to download.

Happy Alteryx-ing

Ben

 

‘Chunking’ your Excel output in Alteryx – Part 1

The main purpose of this blog is two fold. Firstly to discuss the configuration options available when looking to output your data as an xlsx file from Alteryx. Secondly (mainly a result of Excels 1,048,576 row limit), to introduce you to a method that segments your data into separate files/worksheets dependant on a user defined limit.

rsgb.png

Lets start with the ‘Output Options’, there are four;

  1. ‘Append to Existing sheet’ – Alteryx will open an xlsx file and effectively union your data to the existing data under the file and sheet specified. Alteryx can error here for one of two reasons. Firstly your document must already exist, it will not automatically generate a new file if one does not exist originally.Secondly, the data that you are appending must have the same scheme as the data that already exists within that file.
  2. ‘Overwrite Sheet’ – Alteryx will open an xslx file and overwrite any data in the file on the particular sheet specified. Unlike with the previous option, in cases where the file did not exist previously, a new workbook will be created.
  3. ‘Overwrite File’ – Alteryx completely removes any file that exists under the same file path and generates a new file. Again if the file did not exist previously then a documented will be created.
  4. ‘Create new sheet’ – Alteryx will look to open an xslx file and add your new sheet as specified after the pipe within the file name2016-08-18_21-07-27

    In cases where the sheet existed previously you will receive an error ‘Sheet already exists’. If a file did not exist previously Alteryx will create a new document.

Okay so now we have covered the ‘Output Options’, we will now cover all things ‘chunking’.

First of all, what do I mean by the term ‘chunking’, essentially I mean segmenting our full data-set into smaller data-sets as outlined by a set of user defined instructions.

We can see the chunking options available to us with the .xlsx output by selecting the ‘Take File/Table Name from Field’. Here we again have four options.

  1. ‘Append Suffix to File/Table Name’ – Here we can append from a field a value with which we want to segment our data. With an xlsx output, selecting this type will output each value as a new sheet.Essentially we can create separate sheets for the different dimension members of a field. Lets take sample superstore for instance.Here I have set my file to write to the following path:’C:\Users\Ben Moss\Documents\Sample – Superstore.xls|Region_’

    At the bottom I have selected the ‘Append Suffix to File/Table Name’ chunk type, and have selected the Region column as the names to suffix onto my file path.

    2016-08-18_22-10-06.png

    As a result, when I open the outputted file, i see four tabs, one for each of the different dimension members that existed within the region field, ‘North’, ‘East’, ‘South’ and ‘West’, with these values appended to the sheet name given in the original file path ‘Region_’.

    2016-08-18_22-13-08

  2. ‘Append Prefix to File/Table Name’ – Acts in an identical manor as the above but this time adds the dimension members before the sheet name rather than after.2016-08-18_22-32-19
  3. ‘Change File/Table Name’ – Here irrespective of what value you give as the sheet name, Alteryx will simply output the sheet names as just the dimension members within the field specified.2016-08-18_22-35-13
  4. ‘Change Entire File Path’ – This one is pretty much what is says in the title, you must specify a field that contains an entire file path that determines where the file will output to. You can use this to write to either different sheets, as shown above, or to different files, unlike the previous three options. A practical example of the ‘Change entire file path’ could exist in cases when you want to input multiple files, manipulate them in the same way, before outputting them as separate files.In this usecase I have inputted 44 files using a wildcard input from a crime database. As part of the input configuration I checked the ‘Output File Name as Field’ to ‘Full Path’. After completing the manipulation I wanted (to remove crimes which did not have an ID), and then used a string formula to change the path so it was creating a xlsx file instead of a .csv.Replace([FileName],”.csv”,”.xlsx|Street”)

    Essentially I am removing the “.csv” from the original path, and replacing it with the “.xlsx” type, and also adding the sheet name as required with .xlsx file types.

    The workflow highlighted in this example can be found here.

    2016-08-18_23-11-13.png

    2016-08-18_23-06-24.png


So, lets get to the reason I wanted to learn all of this. Well, whilst working on a client project this week, one of the requirements they had was the output must be in an .xslx output. This immediately puts a limitation on the number of records we can output per file, as 1,048,576. The client then further stipulated that he would in fact like the row limit to be significantly lower than that (we settled at 500,000), for performance reasons.

Before I had time to learn all of this detail, we had to put a temporary measure in place, which ended up looking something like this…

2016-08-18_23-49-58.png

Not ideal, firstly because I knew there was a method to chunk the data as outlined above (I just didn’t know how), and two, just because we are filtering to the different months within the data set this doesn’t necessarily mean the row limit won’t be exceeded. And in fact their was an instance where this was a case.

Damn, our temporary solution was now broken. It was time to learn and learn quickly.

In the second part of this blog I outline how I solved the problem outlined above by creating a workflow that restricts the number of rows that can be outputted to each file, and how I then developed this into a macro where users can define this row limit amongst a series of other options available to them.


 

Switching between discrete and continuous colouring

Ever wanted your users to have the ability to select whether they would like to colour their chart by either a continuous field or through discrete colouring but thought it wasn’t possible?

Well it is!

A use case came up last week and after a brain storm I managed to create this.

colour.png

Note, this tip relies on a dual axis, so if your chart is already a dual axis then this is not possible (sorry :(), although an alternative would be to switch between your two worksheets dynamically, as outlined here.

  1. Create a parameter (or another element, such as a count) that defines whether the chart should be coloured in a continuous or discrete way.2016-08-15_19-10-49
  2. Create a calculated field which will return either the value [Sales] in my case, or NULL when each variable is selected.2016-08-15_19-13-25.png2016-08-15_19-15-43
  3. Drag these two pills onto the relevant shelf, in my case I wanted to create a horizontal bar chart, so placed the pills on the column shelf.What you will notice now is that as you switch between the values on your parameter, the charts will take it in turns to appear as your values switch between NULL and the appropriate measure.2016-08-15_19-14-28.png2016-08-15_19-14-57.png
  4. Now that we have two axes, we also have two marks cards, one for each axis that we can treat seperately. We can now format them as so. On the axis that appears when I select ‘Sales’ colouring (i.e. by my measure), i simply drag [Sales] onto the colour shelf.2016-08-15_19-19-33.pngWhilst on my secondary axis, I will drag my [Category] field onto the colours shelf.

    2016-08-15_19-19-48.png

    2016-08-15_19-17-54.png
    2016-08-15_19-18-33.png

  5. Lastly is some clean up work, hiding the NULL values indicator, creating your dual axis, and then synchronising your axes.Of course you cannot have your axes showing in this case because one of them will always show 0. In that case it is probably wise to label your chart as an alternative or make the values clear in your tooltips.

Indeed it is also possible to switch between chart types and measures using this method!

Ben

 

Tableau Tip – Using concatenation to sort values in tables and charts

I have a chart (not quite a dream), this chart is broken down by two dimensions ([Category] and [Sub-Category]), and a measure ([Sales]).

blog1

But when I sort sales I get a type of what I call, ‘nested sorting’, it is sorting sales, but it is sorting sales within each category (if you have a different number of dimensions, it will sort within the 2nd last dimension on your shelf.

blog2

But there is a way to prevent this, I simply need to concatenate the dimension members into one calculation, and place this as the first column, and the hide it.

To achieve this I create a calculated field, but you can achieve an identical result by combining your fields (I don’t know why, but I prefer the former).

blog3

blog4

blog5

Something that I often offer my users when such a requirement exists is a parameter that switches out the table with another sheet when they want to sort, using this great tip by Hashu Shenkar of The Information Lab.

I do this because I prefer the look and feel of the unsorted table, and how it doesn’t include unnecessary dimension members values as they are repeated down the table.

3.png

Ben

#VizLikeAnArtist

4 ways to prevent date conversion problems when building date strings

Recently I was working on a project which had month and year columns but no actual date field in Tableau.

Simple enough, I created a calculated field which added a day number (01) to the front of the string.

picture 1 blog.png

I then changed the data type to a date and we were away, we now have the ability to use Tableaus date/time manipulation functions.

Little did I know that when I sent my workbook over to the states, because of the way in which I have built my date field, and because of the way Americans read their dates (mm/dd/yyyy) my pretty YTD trend graph was now broken. It now showed just a single point because all of my month values were, as defined by my calculated field “01”.

(damn those pesky Americans and their unconventional metrics).

Picture 2 blog.png

Okay, you could of course just get your users to change their default local settings to the UK region, but this could have further impact on workbooks that rely on the US date settings, and perhaps more importantly what if your audience doesn’t know about the change and as a result miss-interpret the dates.

Well in this blog I will outline four ways in which we can prevent this problem.

  1. Create your dates in the backend (probably not the most helpful tip, but as a result Tableau will automatically now how to read them dependant on the users settings).
  2. Build your calculated field as [Year]+”/”+[Month]+”/01”, this format (yyyy/mm/dd) is independent of regional settings.
  3. Use the above structure (yyyy,mm,dd) with the makedate() function, so in our example Makedate([Year],[Month],1)
  4. Build an epic calculation (or steal the one here), which generates the unique number for the underlying date which can then be converted to a date by either wrapping date() around the formula, or changing the field type to date.Every date since 01/01/1900 is given a unique identifier based on the number of days from this start date (this is called the 1900 system). 01/01/1900 is assigned the value 1 (given that it is the 1st day), 02/01/1900 being 2, etc.As a result it is possible to create a formula that converts month/year values into a full date.

    //Calculate the number of non-leap years between your date and 1900, then multiply this by 365

    ((([Year]-1900)-INT((([Year]-1900)/4)+IIF((([Year]-1900)/4)<>INT((([Year]-1900)/4)),1,0)))*365)

    //Calculate the number of leap years between your date and 1900, then multiply this by 366

    +(((INT(([Year]-1900)/4))+IIF((([Year]-1900)/4)<>INT((([Year]-1900)/4)),1,0))*366)

    //Calculate whether the year of the date is a leap year

    +IIF((([Year]-1900)/4)<>INT((([Year]-1900)/4)),

    //If it is not, then calculate the running total of days up to the month value

    IIF([Month]=1,0,IIF([Month]=2,31,IIF([Month]=3,59,IIF([Month]=4,90,IIF([Month]=5,120,IIF([Month]=6,151,IIF([Month]=7,181,IIF([Month]=8,212,IIF([Month]=9,243,IIF([Month]=10,273,IIF([Month]=11,304,334))))))))))),

    //If it is, then calculate the running total of days up to the month value (this is the same as above but with an added day from February)

    IIF([Month]=1,0,IIF([Month]=2,31,IIF([Month]=3,60,IIF([Month]=4,91,IIF([Month]=5,121,IIF([Month]=6,152,IIF([Month]=7,182,IIF([Month]=8,213,IIF([Month]=9,244,IIF([Month]=10,274,IIF([Month]=11,305,335))))))))))))

    //In excel add 1 to your date to make it the 1st of your month/year value. Due to differences in structure between Microsoft serial system and Tableau serial system (First of all, Tableau actually knows that the 29/02/1900 is not an actual date (LOL Microsoft), secondly the serial in Tableau for the first date in the system (01/01/1900) is 0, whereas as previously mentioned with the 1900 system, it is 1.)

    -1


    This calculation was really fun to problem solve and their were a number of complexities (primarily the result of leap years) that I had to work around in order to get it to work correctly. Hopefully the text included in the calculation documents it well enough to provide you with an understanding of how it works.

    I also learnt that what I had been tought at school in regards to leap years was wrong, it isn’t quite as simple as every 4 years.

    Leap_Centuries.jpg

    I was comforted (and I also found it hilarious) that Microsoft appear not to have known this fact when building the 1900 system where 29/02/1900 is acknowledged as a date, although Microsoft blame IBM for this and their users need to move between Excel and IBMs Lotus 1-2-3 spreadsheet programme.

    Peace out.

    Ben

Politics Month and #IronViz

Well that was interesting. Genuinely, it was.

I have never really had an enthusiasm for UK politics, never mind European of Global. But now, now I do.

Tableau’s dedicated Politics month, purposefully conflicting with the UK’s EU referendum and the US presidential race, has been the underlying reason for this, exposing me to a whole world of data that I did not know existed.

Now instead of trying to form a decision by listening to the callus lies of politicians, so detached from reality that I am surprised when I don’t see that Cameron and co’s birthplaces aren’t Narnia; I can now make my own informed, objective decisions based on facts.

Hallelujah.

And it was great to see others doing the same.

Narnia.png

Of course Tableau shouldn’t be the only one’s taking credit for this, so should all companies that are giving people the opportunity to think in a more objective way, i’m talking data viz companies, data viz preachers and the statistics departments of our governments for giving us access to more data than ever.

For us UK based data people, the Office of National Statistics (ONS) is, in my opinion, a data dreamland, providing so much data on what is often such misunderstood subjects.

So now it is up to us few, to spread the word, and make sure political decisions are, more than ever, based on objective facts rather than lies and scare mongering, and make these misunderstood subjects, understood subjects.

Click here to navigate to my politics visualisation portfolio.


My Politics #IronViz is not designed to help people make more informed decisions about an upcoming political decision (as discussed above), such as an election or referendum, it is instead designed to help people understand a recent election, help them understand voting patterns and trends, and help them understand the deeper detail.

I chose to focus on the London Mayoral election from 2016.

Why? Well having only recently moved to London I thought this was a chance for myself to better understand the politics in this richly diverse city. It is also a very recent election so I thought the outcome was still relevant and could still be used by the people of London to help understand the results.

Below I have outlined the four key questions that I wanted my audience to be able to answer, and how I represented these in my visualisation.

  1. Overall, how did the different boroughs of London vote.2016-06-26_14-12-07

    This map is central to  my visualisation. It helps answer the question noted above, ‘overall how did the different boroughs of London vote’, but also acts as a navigation tool to the rest of the visualisation, where selecting the different boroughs filters the other information accordingly.

    So the first thing you are probably wondering is why the ‘lego map’ (as I have dubbed it), well when asking for feedback from my Information Lab colleagues, fellow data schooler Rob Suddaby asked that very question.

    He was right to challenge my design choices, and you should always be able to justify them otherwise they should not be there.

    2016-06-26_14-57-58

    He was right to challenge my design choices, and you should always be able to justify them , which in this case I felt I could.

    2016-06-26_14-57-43.png

  2. How many people took part in the election2016-06-26_14-13-11
    In this summary I included detail on the number of residents eligible to vote, ‘logged electorate’, the number of good ballot papers counted ‘number of good votes’ (some votes are excluded for various reasons), and the % turnout, which is the division of the latter against the former.

    These are seen as valuable metrics in terms of adding context to voting statistics, especially the turnout %.

    I represented these values as text to quickly allow my readers to acknowledge these numbers before moving on to understanding the deeper detail, and if needs  be they could very quickly glance over and refresh their mind on this information.

  3. How did each borough vote for each of the main political partiesOf course when looking to show geographical trends, representing this information on a map is one of the best way of showing this type of information as it allows for easy recognition from your audience, providing they have knowledge of the area being shown. Again I chose ‘lego maps’ to represent this information for the reasons noted in the answer to the 1st question, and also for consistency.

    I decided against including the figures on the tooltips for this section as the information is available in the way in which I have answered point 4.
    2016-06-26_14-13-59

  4. How did voting patterns change from the 2012 mayoral election.People often have issues with sankeys, they see them as something that whilst great to look at, really doesn’t tell you much. I agree, in cases when they are used incorrectly. Otherwise they can be hugely informative and efficient ways of presenting flow data (and of course beautiful).

    My viz is not a traditional sankey, one that is used to show relationships, interactions and flows between different elements, but instead has been adapted to show the change between one point in time to another, of the same object.

    2016-06-26_14-14-48


Credits.

I’d like to give credits to certain members of the data viz community who helped shape this piece of work.

Chris Love, for his blog on sankeys, which helped me to understand the underlying maths behind creating a sankey diagram which I transferred into an alteryx flow.

Lisa Ding, for her great blog on dynamically formatting numbers, which means my values can be summarised  to different levels. I have now used this trick on numerous projects that I have worked on.

A random blog, for the link of which I have lost, that discussed the different types of maps including a ‘lego map’.

ef81862a-52da-4038-99d9-d90bc811fefa.png

And many others of course!

Ben

#VizLikeAnArtist