Tracking the COVID-19 Pandemic in Toronto with R and Leaflet

By: Tavis Buckland

Geovisualization Project Assignment, SA8905, Fall 2020

Github Repository: https://github.com/Bucklandta/TorontoCovid19Cases.git

INTRO

Over the course of the pandemic, the City of Toronto has implemented a COVID-19 webpage focused on providing summary statistics on the current extent of COVID-19 cases in the city. Since the beginning of the pandemic, this webpage has greatly improved, yet it still lacks the functionality to analyze spatio-temporal trends in case counts. Despite not providing this functionality directly, the City has released the raw data for each reported case of COVID-19 since the beginning of the pandemic . Using RStudio with the leaflet and shiny libraries, a tool was designed to allow for the automated collection, cleaning and mapping of this raw case data.

Sample of COVID-19 case data obtained from the Toronto Data Portal

DATA

The raw case data was downloaded from the Toronto Open Data Portal in R, and added to a data frame using read.csv. As shown in the image below, this data contained the neighbourhood name and episode date for each individual reported case. As of Nov. 30th, 2020, this contained over 38,000 reported cases. Geometries and 2016 population counts for the City of Toronto neighbourhoods were also gathered from the Toronto Open Data Portal.

PREPARING THE DATA

After gathering the necessary inputs, an extensive amount of cleaning was required to allow the case data to be aggregated to Toronto’s 140 neighbourhoods and this process had to be repeatable for each new instance of the COVID-19 case data that was downloaded. Hyphens, spaces and other minor inconsistencies between the case and neighbourhood data were solved. Approximately 2.5% of all covid cases in this dataset were also missing a neighbourhood name to join on. Instead of discarding these cases, a ‘Missing cases’ neighbourhood was developed to hold them. The number of cases for each neighbourhood by day was then counted and transposed into a new data table. From there, using ‘rowSum’, the cumulative number of cases in each neighbourhood was obtained.

Example of some of the code used to clean the dataset and calculate cumulative cases

Unfortunately, in its current state, the R code will only gather the most recent case data and calculate cumulative cases by neighbourhood. Based on how the data was restructured, calculating cumulative cases for each day since the beginning of the pandemic was not achieved.

CREATING A SHINY APP USING LEAFLET

Using leaflet all this data was brought together into an interactive map. Raw case counts were rated per 100,000 and classified into quintiles. The two screenshots below show the output and popup functionality added to the leaflet map.

In its current state, the map is only produced on a local instance and requires RStudio to run. A number of challenges were faced when attempting to deploy this map application, and unfortunately, the map was not able to be hosted through the shiny apps cloud-server. As an alternative, the map code has been made available through a GitHub repository at the top of this blog post. This repository also includes a stand-alone HTML file with an interactive map.

Screenshot of HTML map produced by R Shiny App and Leaflet. Popups display neighbourhood names, population, raw count, and rate per 100,000 for the most recent case data.

LIMITATIONS

There are a couple notable limitations to mention considering the data and methods used in this project. For one, the case data only supports aggregation to Toronto neighbourhoods or forward sortation areas (FSA). At this spatial scale, trends in case counts are summarized over very large areas and are not likely to accurately represent This includes the modifiable areal unit problem (MAUP), which describes the statistical biases that can emerge from aggregating real-world phenomena into arbitrary boundaries. The reported cases derived from Toronto Public Health (TPH) are likely subject to sampling bias and do not provide a complete record of the pandemic’s spread through Toronto. Among these limitations, I must also mention my limited experience building maps in R and deploying them onto the Shinyapps.io format.

FUTURE GOALS

With the power of R and its many libraries, there are a great many improvements to be made to this tool but I will note a few of the significant updates I would like to implement over the coming months. Foremost, is to use the ‘leaftime’ R package to add a timeline function, allowing map-users to analyze changes over time in reported neighbourhood cases. Adding the function to quickly extract the map’s data into a CSV file, directly from the map’s interface, is another immediate goal for this tool. This CSV could contain a snapshot of the data based on a particular time frame identified by a user. The last functionality planned for this map is the ability to modify the classification method used. Currently, the neighbourhoods are classified into quintiles based on cumulative case counts per 100,000. Using an extended library of leaflet, called ‘leafletproxy’, would allow map users greater control over map elements. It should be possible to allow users to define the number of classes and which method (i.e. natural breaks, standard deviation, etc.) directly from the map application.