Tracking the COVID-19 Pandemic in Toronto with R and Leaflet

By: Tavis Buckland

Geovisualization Project Assignment, SA8905, Fall 2020

GitHub Repository: https://github.com/Bucklandta/TorontoCovid19Cases.git

INTRO

Over the course of the pandemic, the City of Toronto has implemented a COVID-19 webpage that provides summary statistics on the current extent of COVID-19 cases in the city. Although this webpage has improved greatly since the beginning of the pandemic, it still lacks the functionality to analyze spatio-temporal trends in case counts. While the City does not provide this functionality directly, it has released the raw data for each reported case of COVID-19 since the beginning of the pandemic. Using RStudio with the leaflet and shiny libraries, a tool was designed to automate the collection, cleaning and mapping of this raw case data.

Sample of COVID-19 case data obtained from the Toronto Data Portal

DATA

The raw case data was downloaded from the Toronto Open Data Portal in R, and added to a data frame using read.csv. As shown in the image below, this data contained the neighbourhood name and episode date for each individual reported case. As of Nov. 30th, 2020, this contained over 38,000 reported cases. Geometries and 2016 population counts for the City of Toronto neighbourhoods were also gathered from the Toronto Open Data Portal.

PREPARING THE DATA

After gathering the necessary inputs, extensive cleaning was required so that the case data could be aggregated to Toronto’s 140 neighbourhoods, and this process had to be repeatable for each new download of the COVID-19 case data. Hyphens, spaces and other minor inconsistencies between the neighbourhood names in the case data and in the neighbourhood data were resolved. Approximately 2.5% of all COVID-19 cases in this dataset were also missing a neighbourhood name to join on. Instead of discarding these cases, a ‘Missing cases’ neighbourhood was created to hold them. The number of cases in each neighbourhood by day was then counted and transposed into a new data table, with one row per neighbourhood and one column per day. From there, ‘rowSums’ was used to total each row, giving the cumulative number of cases in each neighbourhood.
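The cleaning and counting steps above were written in R; the following is a minimal Python sketch of the same logic, not the project's actual code. The name-normalization rules (hyphens treated as spaces, extra whitespace collapsed, case ignored), the helper names and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

MISSING = "missing cases"  # bucket for cases with no neighbourhood name

def normalize_name(name):
    """Hypothetical harmonization rules: hyphens become spaces, runs of
    whitespace collapse to one space, and case is ignored."""
    if name is None or not name.strip():
        return MISSING
    return " ".join(name.replace("-", " ").split()).lower()

def count_cases(records):
    """Count cases per neighbourhood per day.

    records: iterable of (neighbourhood_name, episode_date) pairs.
    Returns {neighbourhood: Counter({date: cases})}: one row per
    neighbourhood with one entry per day."""
    table = defaultdict(Counter)
    for name, episode_date in records:
        table[normalize_name(name)][episode_date] += 1
    return table

def cumulative_totals(table):
    """Sum each neighbourhood's row across all days (like R's rowSums)."""
    return {hood: sum(days.values()) for hood, days in table.items()}

cases = [
    ("Church-Yonge Corridor", "2020-11-28"),
    ("Church Yonge  Corridor", "2020-11-29"),
    (None, "2020-11-29"),
    ("Rexdale-Kipling", "2020-11-28"),
]
totals = cumulative_totals(count_cases(cases))
print(totals)
# {'church yonge corridor': 2, 'missing cases': 1, 'rexdale kipling': 1}
```

The two spellings of Church-Yonge Corridor collapse into one key, and the unnamed case lands in the missing bucket instead of being dropped.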

Example of some of the code used to clean the dataset and calculate cumulative cases

Unfortunately, in its current state, the R code only gathers the most recent case data and calculates cumulative cases by neighbourhood. Because of the way the data was restructured, cumulative cases could not be calculated for each day since the beginning of the pandemic.
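Since a per-day table holds new cases for every neighbourhood and day, one possible route to day-by-day cumulative counts is a running sum along the date axis instead of a single row total. The Python sketch below shows that idea for one neighbourhood; the `daily_cumulative` helper is hypothetical, it assumes ISO-formatted date strings, and it has not been checked against the R pipeline.

```python
from itertools import accumulate

def daily_cumulative(day_counts, dates):
    """Running total of cases for one neighbourhood.

    day_counts: {date: new cases that day}; days without cases may be absent.
    dates: every date in the study period, in chronological order, so that
    days with zero new cases still appear in the output."""
    new_cases = [day_counts.get(d, 0) for d in dates]
    return list(zip(dates, accumulate(new_cases)))

dates = ["2020-11-27", "2020-11-28", "2020-11-29"]
print(daily_cumulative({"2020-11-28": 3, "2020-11-29": 2}, dates))
# [('2020-11-27', 0), ('2020-11-28', 3), ('2020-11-29', 5)]
```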

CREATING A SHINY APP USING LEAFLET

Using leaflet, all of this data was brought together into an interactive map. Raw case counts were converted to rates per 100,000 residents using the 2016 neighbourhood populations and then classified into quintiles. The two screenshots below show the output and the popup functionality added to the leaflet map.
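The rate and quintile steps can be sketched in Python as follows. This is an illustration rather than the map's R code: it uses the standard library's `statistics.quantiles` with the inclusive method to find the four cut points, and other quantile conventions may place the breaks slightly differently. Function names and the sample neighbourhoods are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def rate_per_100k(cases, population):
    """Cumulative cases per 100,000 residents."""
    return cases * 100_000 / population

def quintile_classes(rates):
    """Class 1 (lowest fifth) to 5 (highest fifth) for each rate."""
    cuts = quantiles(rates, n=5, method="inclusive")  # four interior cut points
    # bisect_right counts how many cut points lie at or below the value
    return [bisect_right(cuts, r) + 1 for r in rates]

# Made-up neighbourhoods: (cumulative cases, 2016 population)
hoods = {"A": (50, 25_000), "B": (120, 30_000), "C": (80, 10_000),
         "D": (30, 20_000), "E": (200, 40_000)}
rates = [rate_per_100k(c, p) for c, p in hoods.values()]
print(dict(zip(hoods, quintile_classes(rates))))
# {'A': 2, 'B': 3, 'C': 5, 'D': 1, 'E': 4}
```

Note that neighbourhood C has fewer cases than B but a higher rate, which is why mapping rates rather than raw counts changes the picture.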

In its current state, the map only runs on a local instance and requires RStudio. A number of challenges were faced when attempting to deploy this map application, and unfortunately, it could not be hosted on the shinyapps.io cloud server. As an alternative, the map code has been made available through the GitHub repository linked at the top of this blog post. This repository also includes a stand-alone HTML file with an interactive map.

Screenshot of HTML map produced by R Shiny App and Leaflet. Popups display neighbourhood names, population, raw count, and rate per 100,000 for the most recent case data.

LIMITATIONS

There are a couple of notable limitations to mention regarding the data and methods used in this project. For one, the case data only supports aggregation to Toronto neighbourhoods or forward sortation areas (FSAs). At this spatial scale, trends in case counts are summarized over very large areas and are not likely to accurately represent variation within those areas. This relates to the modifiable areal unit problem (MAUP), which describes the statistical biases that can emerge from aggregating real-world phenomena into arbitrary boundaries. The reported cases derived from Toronto Public Health (TPH) are also likely subject to sampling bias and do not provide a complete record of the pandemic’s spread through Toronto. Among these limitations, I must also mention my limited experience building maps in R and deploying them to shinyapps.io.

FUTURE GOALS

With the power of R and its many libraries, there are many improvements to be made to this tool, but I will note a few of the significant updates I would like to implement over the coming months. Foremost is using the ‘leaftime’ R package to add a timeline function, allowing map users to analyze changes in reported neighbourhood cases over time. Adding a function to quickly export the map’s data to a CSV file, directly from the map’s interface, is another immediate goal for this tool. This CSV could contain a snapshot of the data for a particular time frame identified by the user. The last functionality planned for this map is the ability to modify the classification method. Currently, the neighbourhoods are classified into quintiles based on cumulative cases per 100,000. Using leaflet’s ‘leafletProxy’ function within the Shiny app would give map users greater control over map elements. It should be possible to let users define the number of classes and the classification method (e.g. natural breaks, standard deviation) directly from the map application.
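A user-configurable classification could boil down to a function that takes the number of classes and a method name and returns break values. The Python sketch below is hypothetical: it covers only equal interval and quantile breaks, leaving natural breaks and standard deviation as further branches. In the Shiny app, the chosen breaks would instead feed the leaflet colour palette, with the map redrawn through `leafletProxy`.

```python
from bisect import bisect_right
from statistics import quantiles

def class_breaks(values, k, method):
    """Return the k - 1 interior cut points for a user-chosen method.

    Only two methods are sketched; natural breaks (Jenks) and standard
    deviation classes would be additional branches."""
    if k < 2:
        raise ValueError("need at least two classes")
    if method == "equal_interval":
        lo, hi = min(values), max(values)
        width = (hi - lo) / k
        return [lo + width * i for i in range(1, k)]
    if method == "quantile":
        return quantiles(values, n=k, method="inclusive")
    raise ValueError(f"unsupported method: {method!r}")

def classify(values, k, method):
    """Class number (1..k) for each value under the chosen breaks."""
    cuts = class_breaks(values, k, method)
    return [bisect_right(cuts, v) + 1 for v in values]

rates = [0, 10, 20, 100]
print(classify(rates, 2, "equal_interval"))  # [1, 1, 1, 2]
print(classify(rates, 2, "quantile"))        # [1, 1, 2, 2]
```

The same skewed data lands in very different classes depending on the method, which is the argument for letting users choose.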

COVID-19 in Toronto: A Tale of Two Age Groups

By Meira Greenbaum

Geovis Project Assignment @RyersonGeo, SA8905, Fall 2020

Story Map Link

Introduction

The COVID-19 pandemic has affected every age group in Toronto, but not equally. As of November 2020, the 20-29 age group accounts for nearly 20% of cases, the highest proportion of any age group. The 70+ age group accounts for 15.4% of all cases. During the first wave, seniors were affected the most, as there were outbreaks in long-term care homes across the city. By the end of summer and early fall, a second wave appeared certain, and it was clear that an increasing number of cases were attributed to younger people, specifically those 20-29 years old. Data from after October 6th was not available when this project began, but since then Toronto has seen another outbreak in long-term care homes and an increasing number of cases each week. This story map investigates the spatial distribution and patterns of COVID-19 cases in the city’s neighbourhoods using ArcGIS Pro and Tableau. Based on the findings, specific neighbourhoods with high rates can be analyzed further.

Why these age groups?

Although other age groups have seen spikes during the pandemic, their case trends have been more even. Both the 20-29 and 70+ groups have seen significant increases and decreases between February and November. Seniors are more likely to develop severe symptoms from COVID-19, which is why it is important to identify neighbourhoods with higher case rates among seniors. The 20-29 age group is also important to track because increases within that group are more specific to the second wave, and there is a clear cluster of neighbourhoods with high rates.

Data and Methods

The COVID-19 data for Toronto was provided by the Geo-Health Research Group. Each sheet within the Excel file contained a different age group and the number of cases in each neighbourhood per week from January to early October. The data had to be arranged differently for Tableau and for ArcGIS Pro. In Pro, I table-joined the columns I needed from the original Excel sheet (rates during the weeks of April 14th and October 6th for the specific age groups) to a Toronto neighbourhood shapefile and mapped the rates. The maps were then exported as individual web layers to ArcGIS Online, where the pop-ups were formatted, and then added to the Story Map. This was a simple process because I was still working within the ArcGIS suite, so the maps could be moved from Pro to Online seamlessly.

For animations with a time and date component, Tableau requires the data to be in a vertical (long) layout, so the sheet had to be transposed. This is an example of what the transformation looks like (not the actual values):

A time placeholder (T00:00:00Z) was appended to each date, and the Excel file was imported into Tableau. The numeric TotalRated variable was placed in the “Columns” section. Neighbourhoods, a string column, was dragged to the “Colour” and “Label” boxes so the name of each neighbourhood would show while the animation plays. The “Rows” section was more complicated because it required the following calculated field:

TotalRatedRanking is the new calculation name. This produced a new numeric variable which was placed in the “Rows” box. 
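Outside Tableau, the transpose and time placeholder steps amount to a wide-to-long reshape. The Python sketch below assumes a sheet with a `Neighbourhoods` column and one column per week named by ISO date (YYYY-MM-DD); the real sheet's column headers are not shown in the post, the helper name is hypothetical, and the values are placeholders.

```python
def wide_to_long(rows, week_columns):
    """Turn one row per neighbourhood (one column per week) into one row per
    (neighbourhood, week), the vertical layout used for the animation.

    rows: list of dicts such as {"Neighbourhoods": "Annex", "2020-04-14": 12.5}
    week_columns: the weekly column names, as ISO dates (YYYY-MM-DD)."""
    long_rows = []
    for row in rows:
        for week in week_columns:
            long_rows.append({
                "Neighbourhoods": row["Neighbourhoods"],
                # Append the time placeholder so each value is a full ISO 8601 timestamp
                "Date": f"{week}T00:00:00Z",
                "TotalRated": row[week],
            })
    return long_rows

# Placeholder values, not real rates
wide = [
    {"Neighbourhoods": "Annex", "2020-04-14": 12.5, "2020-10-06": 30.0},
    {"Neighbourhoods": "Rouge", "2020-04-14": 8.0, "2020-10-06": 41.0},
]
for r in wide_to_long(wide, ["2020-04-14", "2020-10-06"]):
    print(r)
```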

Right-clicking TotalRatedRanking opens a menu of options. To format the animation correctly, the “Discrete” option had to be chosen, as well as “Compute Using > Neighbourhoods.” The data looked like the screenshot below, with an option to play the animation in the bottom right corner. This process was repeated for the other two animations.
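The calculated field itself is not reproduced here, so the following is only a guess at its intent: ranking neighbourhoods by `TotalRated` separately for each date, which is what computing the ranking across Neighbourhoods achieves in the animation. This hypothetical Python equivalent assumes rows shaped like the long table (`Neighbourhoods`, `Date`, `TotalRated`); ties receive consecutive ranks in input order, which may differ from Tableau's own ranking function.

```python
from collections import defaultdict

def rank_within_dates(long_rows):
    """Add TotalRatedRanking: 1 = highest TotalRated among neighbourhoods
    on the same date. Ties get consecutive ranks in input order."""
    by_date = defaultdict(list)
    for row in long_rows:
        by_date[row["Date"]].append(row)
    ranked = []
    for group in by_date.values():
        ordered = sorted(group, key=lambda r: r["TotalRated"], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            ranked.append({**row, "TotalRatedRanking": rank})
    return ranked

rows = [
    {"Neighbourhoods": "Annex", "Date": "2020-04-14T00:00:00Z", "TotalRated": 12.5},
    {"Neighbourhoods": "Rouge", "Date": "2020-04-14T00:00:00Z", "TotalRated": 8.0},
    {"Neighbourhoods": "Annex", "Date": "2020-10-06T00:00:00Z", "TotalRated": 30.0},
    {"Neighbourhoods": "Rouge", "Date": "2020-10-06T00:00:00Z", "TotalRated": 41.0},
]
for r in rank_within_dates(rows):
    print(r["Date"][:10], r["Neighbourhoods"], r["TotalRatedRanking"])
```

Because the rank is recomputed for every date, a neighbourhood's bar can move up or down as the animation plays.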

Unfortunately, this workbook could not be imported directly into Tableau Public (which provides a link to embed in the Story Map) because I was using the full version of Tableau. To work around this issue, I re-created the visualization in Tableau Public (which does not support animation) and then added the animation separately once the workbook was uploaded to my Tableau Public account. These animations then had to be embedded in the Story Map, which has an “Embed” option for external links. To get a link, I clicked the “Share” button on Tableau Public. However, when this link was embedded in the Story Map, the animation did not appear because the link was not formatted correctly. To fix this, the link had to be altered manually (a quick Google search helped me solve it):

Limitations and Future Work

Creating an animation showing the rate of cases over time in each neighbourhood (for any age group or other category in the Excel spreadsheet) would have been beneficial. An animation in ArcGIS Pro would have been cool (there was just not enough time to learn how ArcGIS animation works), and this is an avenue that could be explored further. The compromise was to focus on certain age groups, although patterns between the start (April) and end (October) points are less obvious. It would also be interesting to explore other variables in the spreadsheet, such as community spread and hospitalizations per neighbourhood. I tried using kepler.gl, a powerful data visualization tool developed by Uber, to create an animation of all cases from January to October, and this mostly worked (video at the end of the Story Map). However, the neighbourhoods were represented as dots rather than polygons, which is not very intuitive for the viewer because the shape of each neighbourhood cannot be seen. Polygons can be imported into kepler.gl, but only as GeoJSON, a file format I am unfamiliar with.
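A GeoJSON file is plain JSON following RFC 7946: a FeatureCollection whose features each carry a geometry (coordinates in longitude, latitude order, with polygon rings closed by repeating the first point) and a set of properties. The Python sketch below builds one neighbourhood polygon; the helper names, the coordinates and the `rate` property are made up for illustration.

```python
import json

def polygon_feature(name, ring, properties=None):
    """Build one GeoJSON Feature for a neighbourhood polygon.

    ring: list of (longitude, latitude) pairs for the outer boundary.
    GeoJSON (RFC 7946) uses longitude-latitude order and requires the
    ring to end where it starts, so the ring is closed if needed."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": {"name": name, **(properties or {})},
    }

def feature_collection(features):
    """Wrap features in the top-level object a .geojson file contains."""
    return {"type": "FeatureCollection", "features": features}

# Made-up square, counterclockwise, roughly in downtown Toronto
square = [(-79.40, 43.66), (-79.39, 43.66), (-79.39, 43.67), (-79.40, 43.67)]
fc = feature_collection([polygon_feature("Example", square, {"rate": 250.0})])
print(json.dumps(fc, indent=2))
```

Saving the printed text with a `.geojson` extension gives a file in the format described above.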

A Pandemic in Review: a Trajectory of the Novel Coronavirus

Author: Swetha Salian

Geovisualization Project Assignment @SA8905, Fall 2020

Introduction to Covid-19

Covid-19 is a topic at the top of many of our minds right now, and has been the subject of discussion all around the world. There are various sources of information out there, and as with most current issues, while legitimate sources of information exist, a great deal of misinformation may also be disseminated. This has led me to investigate the topic further and explore the patterns of the disease, in an effort to understand what has transpired in the past year and where we may be headed as we enter the second year of this pandemic.

Let’s begin with where it started, what the trajectory has looked like over the past year, and where it stands as the year comes to a close. Covid-19 is a disease caused by the novel coronavirus SARS-CoV-2. The first report was of ‘viral pneumonia’ in Wuhan, China on December 31, 2019, and the disease spread to all the continents except Antarctica, causing widespread infections and deaths. Investigations are ongoing, but as with other coronaviruses, it is believed to spread through large respiratory droplets containing the virus during person-to-person contact. In January 2020, the total number of cases across the globe was 37,907, and within five months, by June 2020, the number had risen to 10,182,385. We currently sit at over 60 million cases across 202 countries and territories, as of November 2020. The numbers still appear to be rising, even with a number of countries taking various measures in an effort to curb the spread of the disease. The data, however, shows that the death rate has been declining in the past few weeks, with a total of 1,439,784 deaths globally as of today. This is a ratio of approximately 2% of cumulative deaths to the total number of cases.

Using Tableau Desktop 2019.2, I created a time-lapse map of weekly reported COVID-19 cases from January 1 to November 15. Additionally, a graph displays weekly reported deaths for the same date range.

Link to my Tableau Public map: https://public.tableau.com/profile/swetha8500#!/vizhome/Salian_Swetha_Geoviz/Dashboard1

Data

I chose to acquire data from the WHO (World Health Organization) because of its reputable research and global reach. The global literature cited in the WHO COVID-19 database is updated daily from searches of bibliographic databases, hand searching, and the addition of other expert-referred scientific articles.

The data for this project is a .csv file listing new and cumulative cases and new and cumulative deaths, sorted by country and reported date from January 1 through November 15. It covers 236 countries, territories and areas, with a total of 72,966 data entries for the year. For the time-lapse map of cases for the year, I used the Cumulative_cases column. For the graphs of weekly death counts and of the top 10 countries by death count, I used the New_deaths column.
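The weekly death graph and the top 10 list come from summing New_deaths by week and by country. Tableau did this aggregation in the project; the Python sketch below shows the same idea on a tiny made-up sample. Only New_deaths is named above, so the `Date_reported` and `Country` headers are assumptions, and the weeks here are ISO weeks (starting Monday), which may not match how Tableau groups weeks.

```python
import csv
import io
from collections import Counter
from datetime import date

def weekly_and_top_deaths(csv_text, top_n=10):
    """Sum New_deaths per ISO week and per country.

    Returns ({(iso_year, iso_week): deaths}, [(country, deaths), ...]),
    the second list holding the top_n countries by total deaths."""
    weekly = Counter()      # (ISO year, ISO week) -> deaths reported that week
    by_country = Counter()  # country -> total deaths
    for row in csv.DictReader(io.StringIO(csv_text)):
        year, week, _ = date.fromisoformat(row["Date_reported"]).isocalendar()
        deaths = int(row["New_deaths"])
        weekly[(year, week)] += deaths
        by_country[row["Country"]] += deaths
    return dict(sorted(weekly.items())), by_country.most_common(top_n)

sample = """Date_reported,Country,New_deaths
2020-11-09,A,5
2020-11-10,B,4
2020-11-16,A,2
"""
weekly, top = weekly_and_top_deaths(sample)
print(weekly)  # {(2020, 46): 9, (2020, 47): 2}
print(top)     # [('A', 7), ('B', 4)]
```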

Creating a Dashboard in Tableau Desktop

Tableau is data visualization software that is fairly easy to use with minimal coding skills. It is also a great tool for importing large datasets and supports a variety of data sources, as shown in the image below.

The imported .csv file opens on the Data Source tab. From there, you can open a New Worksheet; this is where each visualization is created separately, and the last step is to bring them all together in a Dashboard tab.

The sidebar on the left displays Dimensions and Measures. Tableau can automatically generate latitude and longitude from country names, so Rows and Columns are filled in with coordinates when Country is added. Drag Date reported into the Pages section; it can be set to display the data at different intervals, and I chose weekly. In the Marks section, drag and drop Category from Dimensions onto Color and Cumulative Cases onto Size, and change its aggregation to Sum.

Adding Date reported to Pages generates a time slider, which lets you play the animation automatically, choose a particular date, and set the playback speed to slow, medium or fast. The Category value generated ranges for the number of cases reported weekly, which appear as the changing colors on the map. The Highlight country option lets you search for a particular country whose data you want to view.

Create a new Dashboard, import the sheets you have worked on, and create a visual story. You have the option to add text, borders, background color, etc. to enhance the presentation of the data.

As shown below, this is the static representation of the dashboard, which displays the weekly reported cases on the map and weekly reported deaths on the graph.

To publish to an online public portal, follow the steps shown below.

Limitations

As I was collecting data from the World Health Organization, I realized I couldn’t find comprehensive data on age groups and gender for cases or deaths. However, with the data I had, I was able to find a narrative for my story.

I ran into a hiccup while trying to publish to Tableau Public from Tableau Desktop. After I created an account online, the desktop application displayed the error shown below.

The solution is to go to the Data menu, scroll down to your data source (the .csv file's name, in my case), and select Use Extract. Extracts are saved subsets of data that you can use to improve performance or to take advantage of Tableau functionality not available or supported in your original data. When you create an extract of your data, you can reduce the total amount of data by using filters and configuring other limits.