The main purpose of this blog is two fold. Firstly to discuss the configuration options available when looking to output your data as an xlsx file from Alteryx. Secondly (mainly a result of Excels 1,048,576 row limit), to introduce you to a method that segments your data into separate files/worksheets dependant on a user defined limit.
Lets start with the ‘Output Options’, there are four;
- ‘Append to Existing sheet’ – Alteryx will open an xlsx file and effectively union your data to the existing data under the file and sheet specified. Alteryx can error here for one of two reasons. Firstly your document must already exist, it will not automatically generate a new file if one does not exist originally.Secondly, the data that you are appending must have the same scheme as the data that already exists within that file.
- ‘Overwrite Sheet’ – Alteryx will open an xslx file and overwrite any data in the file on the particular sheet specified. Unlike with the previous option, in cases where the file did not exist previously, a new workbook will be created.
- ‘Overwrite File’ – Alteryx completely removes any file that exists under the same file path and generates a new file. Again if the file did not exist previously then a documented will be created.
- ‘Create new sheet’ – Alteryx will look to open an xslx file and add your new sheet as specified after the pipe within the file name
In cases where the sheet existed previously you will receive an error ‘Sheet already exists’. If a file did not exist previously Alteryx will create a new document.
Okay so now we have covered the ‘Output Options’, we will now cover all things ‘chunking’.
First of all, what do I mean by the term ‘chunking’, essentially I mean segmenting our full data-set into smaller data-sets as outlined by a set of user defined instructions.
We can see the chunking options available to us with the .xlsx output by selecting the ‘Take File/Table Name from Field’. Here we again have four options.
- ‘Append Suffix to File/Table Name’ – Here we can append from a field a value with which we want to segment our data. With an xlsx output, selecting this type will output each value as a new sheet.Essentially we can create separate sheets for the different dimension members of a field. Lets take sample superstore for instance.Here I have set my file to write to the following path:’C:\Users\Ben Moss\Documents\Sample – Superstore.xls|Region_’
At the bottom I have selected the ‘Append Suffix to File/Table Name’ chunk type, and have selected the Region column as the names to suffix onto my file path.
As a result, when I open the outputted file, i see four tabs, one for each of the different dimension members that existed within the region field, ‘North’, ‘East’, ‘South’ and ‘West’, with these values appended to the sheet name given in the original file path ‘Region_’.
- ‘Append Prefix to File/Table Name’ – Acts in an identical manor as the above but this time adds the dimension members before the sheet name rather than after.
- ‘Change File/Table Name’ – Here irrespective of what value you give as the sheet name, Alteryx will simply output the sheet names as just the dimension members within the field specified.
- ‘Change Entire File Path’ – This one is pretty much what is says in the title, you must specify a field that contains an entire file path that determines where the file will output to. You can use this to write to either different sheets, as shown above, or to different files, unlike the previous three options. A practical example of the ‘Change entire file path’ could exist in cases when you want to input multiple files, manipulate them in the same way, before outputting them as separate files.In this usecase I have inputted 44 files using a wildcard input from a crime database. As part of the input configuration I checked the ‘Output File Name as Field’ to ‘Full Path’. After completing the manipulation I wanted (to remove crimes which did not have an ID), and then used a string formula to change the path so it was creating a xlsx file instead of a .csv.Replace([FileName],”.csv”,”.xlsx|Street”)
Essentially I am removing the “.csv” from the original path, and replacing it with the “.xlsx” type, and also adding the sheet name as required with .xlsx file types.
The workflow highlighted in this example can be found here.
So, lets get to the reason I wanted to learn all of this. Well, whilst working on a client project this week, one of the requirements they had was the output must be in an .xslx output. This immediately puts a limitation on the number of records we can output per file, as 1,048,576. The client then further stipulated that he would in fact like the row limit to be significantly lower than that (we settled at 500,000), for performance reasons.
Before I had time to learn all of this detail, we had to put a temporary measure in place, which ended up looking something like this…
Not ideal, firstly because I knew there was a method to chunk the data as outlined above (I just didn’t know how), and two, just because we are filtering to the different months within the data set this doesn’t necessarily mean the row limit won’t be exceeded. And in fact their was an instance where this was a case.
Damn, our temporary solution was now broken. It was time to learn and learn quickly.
In the second part of this blog I outline how I solved the problem outlined above by creating a workflow that restricts the number of rows that can be outputted to each file, and how I then developed this into a macro where users can define this row limit amongst a series of other options available to them.