The Toronto Financial Institution Market: Bridging the gap between Cartography and Analytics using Tableau

Nav Salooja

“Geovis Project Assignment @RyersonGeo, SA8905, Fall 2019”

(Embedded Tableau dashboard: The Toronto FI Market Dashboard; see the link at the end of this post to view it online.)

Introduction & Background

Banking in the 21st century has evolved significantly, especially in the hyper-competitive Canadian market. With Canada's small population, the big banks have a limited share of population and wealth to capture nationally, and they have been active in innovating their retail footprints. In this case study, TD Bank is the point of interest given its large branch network in the Toronto CMA. The City of Toronto, where the bank operates 144 branches, is used as the study area for the dashboard. The dashboard analyzes market potential, branch network distribution, banking product recommendations and client insights, deriving analytics through a centralized and interactive data visualization tool.

Technology

The technology selected for the geovisualization component is Tableau, given its user-friendly interface, mapping capabilities, data manipulation and overall excellent visualization experience. However, Alteryx was used extensively to build out the datasets that drive Tableau. Because the data was extracted from many different sources, all of the spatial processing and dataset merging was done in Alteryx. The Expenditure, Income and Dwelling Composition data was merged and indexed in Alteryx, the TD branches were web scraped live from the branch locator, and the trade areas (1.5 km buffers) were also created in Alteryx. The software also handled the statistical functions; the indexed data points in the workbook were all created in Alteryx. The geovisualization component itself is built within the Tableau workbook, where multiple sheets are leveraged to create an interactive dashboard for the end user.

Figure 1 shows the Alteryx workflow used to build the Market, Branch and Trade Area datasets.
Figure 2 shows the build-out of the final datasets, manipulated to be Tableau-ready.
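As a side note for readers without Alteryx, the buffer step in the workflow above is straightforward to approximate in Python. The sketch below is a stand-in, not the project workflow, and uses geopandas with hypothetical file and column names.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical input: scraped branch points with Latitude/Longitude columns.
branches = pd.read_csv("branches.csv")
branch_points = gpd.GeoDataFrame(
    branches,
    geometry=gpd.points_from_xy(branches["Longitude"], branches["Latitude"]),
    crs="EPSG:4326",
)

# Project to a metric CRS (UTM 17N covers Toronto) so the buffer distance is in metres.
branch_points = branch_points.to_crs(epsg=32617)

# 1.5 km trade areas around each branch, then back to lat/long for Tableau.
trade_areas = branch_points.copy()
trade_areas["geometry"] = trade_areas.geometry.buffer(1500)
trade_areas.to_crs(epsg=4326).to_file("trade_areas.shp")
```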

Data Overview

There are several datasets used to build the multiple sheets in the Tableau workbook, ranging from Environics expenditure data and Census data to web-scraped TD branch locations. In addition to these, a client dataset and a trade area geography file were also created. The client dataset was generated using a random name and Toronto address generator, and those clients were then profiled to their corresponding market. The data comes from a wide variety of sources and geographic extents to provide a fully functional view of the banking industry. The analysis begins by extracting the TD branches and their respective trade areas; the trade areas are built as limited buffers representing the immediate market opportunity for each branch. Average income and dwelling composition variables are then used at the Dissemination Area (DA) geography from the 2016 Census. Although income is represented as an actual dollar value, all market demographics are analyzed and indexed against Toronto CMA averages. These datasets, combined with the market, client and TD-level data, provide the full conceptual framework for the dashboard.
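As a rough illustration of how the indexing works (not the actual Alteryx logic, and with made-up column names), each DA value is divided by the CMA average and scaled to 100, so an index above 100 means the DA over-indexes against the Toronto CMA:

```python
import pandas as pd

# Placeholder columns; the real Environics and Census field names differ.
das = pd.read_csv("da_demographics.csv")                  # one row per DA
cma_avg = das[["avg_income", "avg_expenditure"]].mean()   # stand-in for the true CMA averages

# Index each DA against the CMA average: 100 = on par with the CMA, >100 = over-indexing.
das["income_index"] = das["avg_income"] / cma_avg["avg_income"] * 100
das["expenditure_index"] = das["avg_expenditure"] / cma_avg["avg_expenditure"] * 100
```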

Tables & Visualization Overview

Given the structure of the datasets, six tables in total are used to combine the data and drive the appropriate visualizations. The first two tables are the branch-level datasets, which begin with the geographic locations of the branches in the City of Toronto. This is a point file taken from the TD store locator with fundamental information about each branch's name and location attributes. A second table analyzes the performance of these branches with respect to their client acquisition over a pre-determined timeframe.

Figure 3 is a visualization of the first table used and the distribution of the branch network within the market

The third table consists of client-level information for 'frequent' clients (clients transacting at branches 20+ times in a year). Their information builds on the respective geography and identifies who the clients are and where they reside, along with critical information the bank can use for statistical analysis. The client table shows the exact locations of these frequent clients, their names, unique identifiers, preferred branch, current location, average incomes, property/dwelling value and the mortgage payments the bank collects. This table is then combined with the others to understand the client demographics and the wealth opportunity these frequent clients represent at their respective branches.

Figure 4 is the visualization of the client-level data and its respective dashboard component

The fourth and fifth tables are the most comprehensive, as they visualize the geography of the market (the City of Toronto at the DA level). They provide a full trade area and market-level breakdown of the demographics: DAs are attributed to their closest branch, which allows users to see where the bank has market coverage and where the gaps reside. Beyond the branch allocation, the geography carries a robust set of demographics such as growth (population, income), dwelling composition and structure, average expenditure, and the product recommendations the bank can target, driven by the average expenditure datasets. Although the file contains a significant amount of data and can feel overwhelming, only selected data is visualized. It also holds the full breakdown of how many frequent clients reside in each market and what kind of products are recommended on the basis of the market demographics analyzed through dwelling composition, growth metrics and expenditure.

Figure 5 is the visualization of the market-level data and its respective dashboard component
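The allocation of DAs to their closest branch described above can be sketched as a nearest-neighbour spatial join. The snippet below uses geopandas' sjoin_nearest as a stand-in for the Alteryx allocation, with placeholder file and column names.

```python
import geopandas as gpd

# Placeholder inputs: DA polygons and branch points, both projected to metres (UTM 17N).
das = gpd.read_file("toronto_das.shp").to_crs(epsg=32617)
branches = gpd.read_file("td_branches.shp").to_crs(epsg=32617)

# Attribute each DA to its nearest branch (requires geopandas >= 0.10 for sjoin_nearest).
allocated = gpd.sjoin_nearest(
    das, branches[["branch_name", "geometry"]],
    how="left", distance_col="dist_to_branch_m",
)
```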

The final table provides the visualization and breakdown of the five primary product lines of business the bank offers, combined with the market-level data and cross-validated against the average expenditure dataset. This identifies which products can be recommended throughout the market based on current and anticipated expenditure and growth metrics. For example, markets with high population, income and dwelling growth but limited spend would be targeted with mortgage products, since the anticipated growth and limited spend point to a demographic saving to buy a home in a growth market. These assumptions are applied across the market based on the actual indexed values, so every market (DA) is given a product recommendation.

Figure 6 is the visualization of the product recommendation and analysis data and its respective dashboard component
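As a purely hypothetical illustration of that rule logic (the thresholds, column names and product mapping below are made up, not the dashboard's actual model):

```python
import pandas as pd

def recommend_product(row):
    """Illustrative rule only: high growth with a low spend index suggests a
    demographic saving to buy a home, so recommend mortgages. The thresholds
    are placeholders, not the values used in the dashboard."""
    if row["growth_index"] > 110 and row["expenditure_index"] < 90:
        return "Mortgages"
    if row["expenditure_index"] > 120:
        return "Stocks"
    return "Personal Banking"

markets = pd.DataFrame({
    "da_id": ["35200001", "35200002"],
    "growth_index": [125, 95],
    "expenditure_index": [80, 130],
})
markets["recommended_product"] = markets.apply(recommend_product, axis=1)
print(markets)
```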

Dashboard

Based on the full breakdown of the data extracted, the build-out and the tables described above, the dashboard is fully interactive and driven by one primary parameter that controls all elements of the dashboard. Additional visualizations, such as the products visualization, the client distribution treemap and the branch trends bar graph, are combined here. The products visualization provides a full breakdown of the products that can be recommended based on their value and categorization to the bank; the value is driven by the revenue the product can bring, as investment products drive higher returns than liabilities. This is broken down into three graphs showing how many times each product is recommended and the market coverage each recommendation provides across Stocks, Mortgages, Broker Fees, Insurance and Personal Banking products. The client distribution treemap provides an overview, by branch, of how many frequent clients reside in the branch's trade area. This offers a holistic approach to anticipating branch traffic trends and capacity constraints, as branches with a high number of frequent clients would require larger square footage and staffing models to adequately service the dependent markets. The final component represents client trends on a five-year run rate to identify the growth the bank experienced in the market, and at the branch level, through new client acquisition. It provides a full rundown of the number of new clients acquired and how performance varies year over year, identifying areas of high and low growth.

Combined with the three primary mapping visualizations, this creates a robust and fully interactive dashboard for the user. Parameters are heavily used and are built on a select-by-branch basis to dynamically change all six live elements to match the user's input. This is one of Tableau's most significant capabilities: the flexibility to use a parameter to analyze the entire market, one branch at a time, or markets without a branch is extremely powerful for deriving insights and analytics. The dashboard zooms in and out as required when a specific branch is selected, highlighting its location, its frequent clients, the trade area breakdown, the products to recommend, the branch's client acquisition trends and the actual number of frequent clients in the market. This can also be expanded to analyze multiple branches or larger markets overall if the functionality is required. Overall, the dashboard consists of the following elements:

1. Market DA Level Map
2. Branch Level Map
3. Client Level Map
4. Client Distribution (Tree-Map)
5. Branch Trending Graph
6. Product Recommendation Coverage, Value and Effectiveness

Combined with the ability to manipulate and store a live feed of data, the parameters used for this level of analysis bring a new capability to visualizing large datasets, providing a robust, interactive playground for deriving insights and analytics.

The full Tableau workbook is hosted here (please note that an online account is required): https://prod-useast-a.online.tableau.com/t/torontofimarketgeovisprojectsa8905fall2019/views/TheTorontoFIMarketDashboard/TorontoFIMarket?:showAppBanner=false&:display_count=n&:showVizHome=n&:origin=viz_share_link

Putting BlogTO on the map (literally) – Tutorial

Kyle Larsen
SA8905 – Cartography and Geovisualization
Fall 2019

Instagram is a wealth of information, for better or worse, if you’ve posted to Instagram before and your profile is public, maybe even if it’s not, then your information is out there just waiting for someone, someone maybe like me, to scrape your information and put it onto a map. You have officially been warned.

But I'm not here to preach privacy or procure your preciously posted personal pics. I'm here to scrape pictures from Instagram, take their coordinates, and put them onto a map in a grid layout over Toronto. My target for this example is a quite public entity that thrives off exposure, the notorious BlogTO. Maybe only notorious if you live in Toronto: BlogTO is a Toronto-based blog about the goings-on in the 6ix as well as Toronto life and culture, and they also have an Instagram that is almost too perfect for this project – but more on that later. Before anything gets underway, a huge thank-you-very-much to John Naujoks and his Instagram scraping project that created some of the framework for this project (go read his project here, and you can find all of my code here).

When scraping social media, sometimes you can use an API to directly access the back end of a website; Twitter, for example, has an easily accessible API. Instagram's API sits securely behind the brick wall that is Facebook, aka it's hard to get access to. While it would be easier to scrape Twitter, we aren't here because this is easy. Maybe it seems a little rebellious, but Instagram doesn't want us scraping their data… so we're going to scrape their data.

This will have to be done entirely through the front end, aka the same way that a normal person would access Instagram, but we’re going to do it with python and some fancy HTML stuff. To start you should have python downloaded (3.8 was used for this but any iteration of python 3 should give you access to the appropriate libraries) as well as some form of GIS software for some of the mapping and geo-processing. Alteryx would be a bonus but is not necessary.

We’re going to use a few python libraries for this:

  • urllib – for accessing and working with URLs and HTML
  • selenium – for scraping the web (make sure you have a browser driver installed, such as chromedriver)
  • pandas – for writing to some files

If you've never done scraping before, it is essentially writing code that opens a browser, does some stuff, takes some notes, and returns whatever notes you've asked it to take. But unlike a person, you can't just tell python to go recognize specific text or features, which is where the python libraries and the HTML stuff come in. The code below (thanks John) takes a specific Instagram user, returns as many post URLs as you want and adds them to a list, for your scraping pleasure. If you enable the browser head you can actually watch as python scrolls through the Instagram page, silently kicking ass and taking URLs. It's important to use the time.sleep(x) function because otherwise Instagram might know what's up and block your IP.
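Since the original script isn't reproduced in this post, here is a sketch of that step under a few assumptions: a selenium Chrome driver (chromedriver installed), post links identifiable by the '/p/' in their URLs, and a profile page that loads more posts as you scroll. Instagram's markup changes often, so treat it as a starting point rather than the exact code.

```python
# Open a profile, scroll, and collect post URLs. Not the original project code;
# the "/p/" link pattern is an assumption about Instagram's current markup.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def get_post_urls(username, n_scrolls=10, pause=3):
    driver = webdriver.Chrome()            # leave the head on and watch it scroll
    driver.get(f"https://www.instagram.com/{username}/")
    time.sleep(pause)                      # be polite, and don't get your IP blocked

    post_urls = set()
    for _ in range(n_scrolls):
        # Grab every link on the page and keep the ones that look like posts ("/p/").
        for a in driver.find_elements(By.TAG_NAME, "a"):
            href = a.get_attribute("href")
            if href and "/p/" in href:
                post_urls.add(href)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)

    driver.quit()
    return list(post_urls)

urls = get_post_urls("blogto", n_scrolls=5)
```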

But what do I do with a list of URLs? Well, this is where you get into the scrappy parts of this project, the closest to criminal you can get without actually downloading a car. The essentials for this project are the image and the location, but this is where we need to get really crafty. Instagram is actually trying to hide the location information from you, at least if you're scraping it. Nowhere in a post are coordinates saved. Look at the image below: you may know where the Distillery District is, but python can't just give you X and Y because it's "south of Front and at that street where I once lost my wallet."

If you click on the location name you might get a little more information but… alas, Instagram keeps the coordinates locked in as a .png, yielding us no information.

BUT! If you can scrape one website, why not another? If you can use Google Maps to get directions to “that sushi restaurant that isn’t the sketchy one near Bill’s place” then you might as well use it to get coordinates, and Google actually makes it pretty easy – those suckers.
(https://www.google.com/maps/place/Distillery+District,+Toronto,+ON/@43.6503055,-79.35958,16.75z/data=!4m5!3m4!1s0x89d4cb3dc701c609:0xc3e729dcdb566a16!8m2!3d43.6503055!4d-79.35958 )
I spy with my little eye some X and Y coordinates. The first set, after the '@', would usually be the lat/long of your IP address, which I've obviously hidden because privacy is important; that's the takeaway from this project, right? The second lat/long that you can glean at the end of the URL is the location of the place you just googled. Now all that's left is to put all of this information together and create the script below. Earlier I said that it's difficult to tell python what to look for; what you need is the xpath, which you can copy from the HTML (right-click an element to inspect it, then right-click that HTML and you can copy the xpath for that specific element). For this project we're going to need the xpath for both the image and the location. The steps are essentially as follows:

  • go to Instagram post
  • download the image
  • copy the location name
  • google the location
  • scrape the URL for the coordinates

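Pulling those steps together, a hedged sketch of the per-post script (with assumed xpaths and an assumed Google Maps URL pattern, not the exact project code) might look like this:

```python
# A sketch of the per-post steps above. The xpaths and the "!3d...!4d..." URL
# pattern are assumptions and will need adjusting whenever Instagram or Google
# change their markup.
import re
import time
import urllib.parse
import urllib.request

from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_post(driver, post_url):
    driver.get(post_url)
    time.sleep(3)

    # Steps 1-2: grab the image source and download it (videos won't have this element).
    img = driver.find_element(By.XPATH, "//article//img")
    urllib.request.urlretrieve(img.get_attribute("src"),
                               post_url.rstrip("/").split("/")[-1] + ".jpg")

    # Step 3: copy the geotag text, e.g. "Distillery District".
    location = driver.find_element(
        By.XPATH, "//a[contains(@href, '/explore/locations/')]").text

    # Steps 4-5: google the location (adding "Toronto" to keep results local is an
    # assumption) and pull the place lat/long out of the resulting URL.
    driver.get("https://www.google.com/maps/search/"
               + urllib.parse.quote(location + " Toronto"))
    time.sleep(3)
    match = re.search(r"!3d(-?\d+\.\d+)!4d(-?\d+\.\d+)", driver.current_url)
    lat, lon = (float(match.group(1)), float(match.group(2))) if match else (None, None)
    return location, lat, lon

# Usage: driver = webdriver.Chrome(); scrape_post(driver, urls[0])
```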
There are some setbacks to this: not all posts are going to have a location, and not all pictures are pictures – some are videos. In order for a picture to qualify for full scraping it has to have a location and not be a video, and the bonus criterion: it must be in Toronto. Way back I said that BlogTO is great for this project; that's because they love to geotag their posts (even if it is mostly "Toronto, Ontario") and they love to post about Toronto, go figure. With these scripts you've built up a library of commands for scraping whatever Instagram account your heart desires (as long as it isn't private – but if you want to write a script to log in to your own account then I guess you could scrape a private account that has accepted your follow request, you monster, how dare you).

With the pics downloaded and the latitudes longed, it is now time to construct the map. Unfortunately this is the most manual process, but there's always the arcpy library if you want to try to automate it. I'll outline my steps for creating the map, but feel free to go about it your own way.

  1. Create a grid of 2km squares over Toronto (I used the grid tool in Alteryx)
  2. Intersect all your pic-points with the grid and take the most recently posted pic as the most dominant for that grid square
  3. Mark each square with the image that is dominant in that square (I named my downloaded images as their URLs)
  4. Clip all dominant images to 1×1 size (I used google photos)
  5. Take a deep breath, maybe a sip of water
  6. Manually drag each dominant image into its square and pray that your processor can handle it, save your work frequently.

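If you do want to chase that automation before the manual dragging, here is a rough Python sketch of steps 1 and 2, using geopandas and shapely in place of the Alteryx grid tool and assuming a hypothetical posts.csv with lat, lon, post_date and image columns:

```python
# Sketch of steps 1-2: build a 2 km grid over Toronto and keep the most recent
# post per cell. File and column names are assumptions.
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

posts = pd.read_csv("posts.csv", parse_dates=["post_date"])
pts = gpd.GeoDataFrame(posts,
                       geometry=gpd.points_from_xy(posts.lon, posts.lat),
                       crs="EPSG:4326").to_crs(epsg=32617)

# 2 km fishnet over the extent of the points (stand-in for the Alteryx grid tool).
xmin, ymin, xmax, ymax = pts.total_bounds
cells = [box(x, y, x + 2000, y + 2000)
         for x in np.arange(xmin, xmax, 2000)
         for y in np.arange(ymin, ymax, 2000)]
grid = gpd.GeoDataFrame({"cell_id": range(len(cells))}, geometry=cells, crs=pts.crs)

# Point-in-polygon join, then keep the newest post in each cell.
joined = gpd.sjoin(pts, grid, predicate="within")
dominant = joined.sort_values("post_date").groupby("cell_id").last()
```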
This last part was definitely the most in need of a more automated process, but after your hard work you may end up with a result that looks like the map below, enjoy!

Telling a Story through a Time-series animation using Open Data

By: Brian Truong.

GeoVis Project @RyersonGEO SA8905, Fall 2018

Context 

As a student and photographer, I have frequently walked around the streets of Toronto, and I would often see homeless individuals in certain neighbourhoods. While I was aware of some shelters across Toronto at the time, I never fully understood the Toronto shelter system, as I assumed the organizations providing shelter to those at risk were one and the same. I also noticed that the City of Toronto updates its shelter occupancy data on a more or less daily basis, which led me to choose this topic for my GeoViz project. My lack of knowledge of the shelter system, together with the readily available data, motivated me to make a time series map and to incorporate ESRI's Story Maps into the project, so that whoever viewed my project could also be told the story of Toronto's shelters.

Process

Toronto Open Data provides shelter occupancy data in multiple formats; the JSON format was chosen because of previous experience working with JSON data in Alteryx. The JSON data was provided through a link from Toronto Open Data. Using Alteryx, a script was created to download the live(ish) data, parse it, put it into a proper format, filter it, and create the appropriate columns to work with the data.

Above is an example of the JSON data that was used; the data itself is semi-structured, as it is organized in a specific format. The data contained multiple entries per shelter location, so those were filtered so that only one organizational program was present for each shelter location; this usually came down to the program that housed the largest number of people. For a proper time series to be created, a date/time column must be present, so columns such as date/time and occupancy rate were created with the formula tool.

Above is the final Alteryx script that was used to get the data from a JSON format to a .xlsx format. However, there was one problem with the data: it wasn't geocoded, so I had to manually geocode each shelter location by running each shelter's address through Google Maps and copying and pasting the x and y values of the shelter locations. These coordinates were then put into the same file as the output of the Alteryx script, except in a different sheet. Using VLOOKUP, shelters were assigned their coordinates by matching shelter names.
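For readers without Alteryx, a rough pandas equivalent of this pipeline (download and parse the JSON, build the date/time and occupancy rate columns, then join on the manually geocoded coordinates) might look like the sketch below; the URL, field names and file names are all assumptions about the Open Data feed, not the actual workflow.

```python
# A pandas sketch of the pipeline described above, not the Alteryx workflow.
# The URL is a placeholder, and the field names (OCCUPANCY_DATE, SHELTER_NAME,
# OCCUPANCY, CAPACITY) are assumptions about the shelter occupancy feed.
import pandas as pd
import requests

records = requests.get("<shelter occupancy JSON URL>").json()
df = pd.DataFrame(records)

# Numeric conversion plus the date/time and occupancy-rate columns the time series needs.
df[["OCCUPANCY", "CAPACITY"]] = df[["OCCUPANCY", "CAPACITY"]].apply(pd.to_numeric)
df["date_time"] = pd.to_datetime(df["OCCUPANCY_DATE"])
df["occupancy_rate"] = df["OCCUPANCY"] / df["CAPACITY"]

# Keep a single program per shelter per day: the one housing the most people.
largest = (df.sort_values("OCCUPANCY")
             .groupby(["SHELTER_NAME", "date_time"], as_index=False)
             .last())

# The VLOOKUP step as a join: attach the manually geocoded x/y values by shelter name.
coords = pd.read_excel("shelter_coordinates.xlsx")   # assumed columns: SHELTER_NAME, x, y
geocoded = largest.merge(coords, on="SHELTER_NAME", how="left")
geocoded.to_excel("shelter_occupancy_geocoded.xlsx", index=False)
```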

Time Series Map

The time series portion of this geoviz project was created using ArcGIS Pro. The Excel file was brought into ArcGIS Pro, points were created using the x and y coordinates, and a shapefile was created from them. In order to create a time series map, the time field had to be enabled; below are the steps needed to enable time on a shapefile.

In order to actually enable time, a time column must already exist in the format dd/mm/yyyy hh:mm. From that point, the change in shelter occupancy could be viewed through a time slider at any interval the user requires; for this project it ran on a daily basis over a 3-minute loop. In order to capture it as a video and export it, the animation function was required; within the Animation tab, the Append tool was used.

What the Append feature does is follow the time series map from the first frame, on the first day of the time series (Jan 1, 2018 00:00), to the last day (Nov 11, 2018 00:00). The animation is then created according to the specified settings. The video itself was exported as a 480p video at 15 frames per second, then uploaded to YouTube and embedded in the Story Map.

ESRI Story Maps

The decision to use ESRI's Story Maps was in part due to what motivated me to work on this project: I wanted to tell the story of shelters, who they serve and who is affected by them, especially after two major events in the past year that led to Toronto's shelters showing up on the news. Both the cold snap in early 2018 and the large influx of migrants have had a huge effect on Toronto's shelters.