Having in-depth demographic information can be seen as the holy grail in the eyes of some data analysts, and with good reason: the context it can add to your analytics is superb.
The problem is, up-to-date demographic information is extremely difficult to get hold of and even more difficult to map into your data.
I may not be able to solve the former issue, but I will give you an example of how you can ‘loosely’ (there are some assumptions) map population data into your data to solve the latter issue.
For the purpose of this example I will use information generated by my Data School colleague, Pablo Sáenz de Tejada, in this blog about geocoding addresses with Alteryx.
The dataset produced gives the latitude and longitude for each Tesco store within London. I have then created Voronoi-style trade areas for each of these stores, following the steps outlined in this blog by Will Griffiths.
Finding demographic information.
To maximise the accuracy of our demographic mapping, it is important that we obtain the demographic information at the lowest level of detail available (and, of course, the most recent).
Within England and Wales, the Office for National Statistics releases mid-year estimates of population at the level of Lower Super Output Area (LSOA). These estimates are broken down by gender, which gives an additional demographic variable.
LSOAs have been developed to provide areas that contain an average of 1500 residents and 650 households, whilst other characteristics such as social homogeneity also play a role in defining their boundaries.
Additional information can also be found at this level, such as indices of deprivation.
It is unlikely that you will be able to find a free data source at a lower level than this for England and Wales.
Now that we have our demographic detail we can look at mapping it into our dataset.
In this example I will show a scenario where we want to assess the potential footfall for each of our Tesco stores alongside some demographic attributes, specifically age and gender.
My methodology is something like this…
- Take the store trade areas and spatially match them against the LSOA areas
- Create, for each matched LSOA, an intersection object between itself and the trade area
- Identify the size of the original LSOA shape
- Identify the size of the intersection object shape
- From these two values calculate the % of the total LSOA shape that is within the trade area
- Multiply this value by the demographic variables we may have
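The arithmetic behind these steps can be sketched outside Alteryx in a few lines of Python. This is only an illustration: the function names and the figures are hypothetical, and the approach assumes residents are spread evenly across each LSOA, which is the main assumption behind the 'loose' mapping.

```python
def overlap_fraction(lsoa_area: float, intersection_area: float) -> float:
    """Share of the LSOA's area that falls inside the trade area (0 to 1)."""
    if lsoa_area <= 0:
        raise ValueError("LSOA area must be positive")
    # Clamp to guard against tiny floating-point overshoot from the geometry step.
    return min(max(intersection_area / lsoa_area, 0.0), 1.0)


def apportion(counts: dict, fraction: float) -> dict:
    """Scale every demographic count by the overlap fraction."""
    return {name: value * fraction for name, value in counts.items()}


# Hypothetical LSOA: 0.40 sq km in total, 0.10 sq km of it inside the trade area.
fraction = overlap_fraction(lsoa_area=0.40, intersection_area=0.10)

# Hypothetical population counts for that LSOA.
estimate = apportion({"male": 760, "female": 800}, fraction)

print(fraction)   # share of the LSOA inside the trade area
print(estimate)   # estimated residents of this LSOA inside the trade area
```

With a quarter of the LSOA inside the trade area, a quarter of each count is attributed to the store.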
So how do we do that in Alteryx?
First things first, we need to input our trade areas.
Then we need to input our LSOA shape files, which can be found here. While I had this object, I also used the Spatial Info tool to return its area.
Now we need to spatially match these two objects. In my case I configured the spatial match to return objects that ‘touched or intersected’ the second object.
For each of your trade areas you will now have a list of the LSOA areas that are, at least in part, within your trade areas.
Now we know which LSOA objects are linked to which trade areas, alongside the actual spatial objects for each, we can create our intersect object using the Spatial Process tool.
Now we have our intersect object we can use the Spatial Info tool to identify the size of this area before calculating our % overlap.
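In Alteryx the Spatial Process and Spatial Info tools do this work for you, but it can help to see what is going on underneath. The Python sketch below is an illustration only, not what Alteryx runs internally. It assumes planar (projected) coordinates rather than raw latitude and longitude, and a convex trade area with its points listed counter-clockwise, so that Sutherland-Hodgman clipping applies. A Voronoi cell is convex, although one trimmed to a city boundary may not be. All names and coordinates are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` to a convex, counter-clockwise `clip` polygon."""

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p, q, a, b):
        # Point where segment p -> q meets the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break  # nothing left to clip: the shapes do not overlap
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(crossing(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(crossing(p, q, a, b))
    return output


# Hypothetical shapes in arbitrary planar units.
lsoa = [(0, 0), (4, 0), (4, 2), (0, 2)]
trade_area = [(2, -1), (6, -1), (6, 3), (2, 3)]  # convex, counter-clockwise

intersection = clip_to_convex(lsoa, trade_area)
overlap = polygon_area(intersection) / polygon_area(lsoa)
print(overlap)  # share of the LSOA that lies inside the trade area
```

Here the trade area covers the right-hand half of the LSOA rectangle, so half of its area is returned as the overlap.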
Once we have these values, it is just a case of identifying the demographic attributes we wish to bring in, joining these to this file and multiplying them by the percentage overlap.
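As a rough picture of that join and multiply step, here is a minimal Python sketch. The store names, LSOA codes and counts are made up for illustration, and summing the weighted counts per store is an assumption about how you would roll the results up to one row per trade area.

```python
from collections import defaultdict

# Hypothetical output of the overlap step: (store, LSOA code, overlap fraction).
overlaps = [
    ("Store A", "LSOA_1", 1.0),
    ("Store A", "LSOA_2", 0.25),
    ("Store B", "LSOA_2", 0.75),
]

# Hypothetical demographic table keyed by LSOA code.
demographics = {
    "LSOA_1": {"male": 700, "female": 720},
    "LSOA_2": {"male": 800, "female": 840},
}

store_totals = defaultdict(lambda: defaultdict(float))
for store, lsoa_code, fraction in overlaps:
    counts = demographics.get(lsoa_code)
    if counts is None:
        continue  # no demographic row for this LSOA, so nothing to add
    for variable, value in counts.items():
        # Weight each count by the share of the LSOA inside the trade area.
        store_totals[store][variable] += value * fraction

for store, totals in store_totals.items():
    print(store, dict(totals))
```

Because LSOA_2 is split 25/75 between the two stores, its population is shared between them in the same proportion and nobody is counted twice.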
You can find my complete workflow here. I have also produced an Alteryx application which returns demographic information (age and gender) for each individual shape within a shape file; that can be found here (note that this will only enrich shape files within England and Wales).