Modelling Ontario Butterfly Populations using Citizen Science

Author Name: Emily Alvarez

Data Source: Toronto Entomologists Association (TEA), Statistics Canada

Project Link:

https://public.tableau.com/profile/emily6079#!/vizhome/ModellingOntarioButterflyPopulationsusingCitizenScience/Butterfly_Dashboard?publish=yes

Background:

Over the summer, I spotted several butterflies and caterpillars in my garden and became curious about which species are present in my area and how that might change over time. Originally, I wanted to look at pollinators in general and their populations in Canada, but data was not available for this. I reached out to the Toronto Entomologists Association (TEA) and, fortunately, a large amount of butterfly population data had been gathered for the Ontario Butterfly Atlas. The atlas data comes from eButterfly, iNaturalist and BAMONA records, as well as records submitted by the public directly to TEA, so anyone who wants to submit observations can contribute. The organization already had an interactive web map (Figure 1), but the data had the potential to be presented in a way that engages both butterfly enthusiasts and the general public.

Figure 1: Ontario Butterfly Atlas Interactive Web Map

Technology:

I chose Tableau as the platform to model this data because it works efficiently with complex databases and large datasets. It makes it easy to sort and filter the data and to perform aggregate operations (SUM, COUNT), which some components of the dashboard needed. I had used Tableau before for simple data visualization but never for spatial data, so this project was a chance to learn something new while improving my skills with software I already knew.

Data & Methods:

I consulted with a contact at TEA who provided me with context on the data, such as how it is gathered, where the gaps are, and the annual seasonal summary of the data. Based on this information and after reviewing the dataset, I felt that there were three main components I could model for butterfly species in Ontario: their location, their number of yearly observations, and the flight periods of adult populations. Because there was so much data, I focused on 2019 for the locational data and flight periods. There were some inconsistencies in how the data was recorded, mostly in the number of adults observed, which was not always entered as a numeric value. Any rows without a numeric value in this field were omitted from the dataset.
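The dashboard itself was built in Tableau, so the following is only a minimal Python sketch of the cleaning rule described above: keep rows whose adult count is a plain whole number and drop everything else. The field names (`species`, `adults`) and the sample rows are hypothetical and are not taken from the TEA dataset.

```python
def parse_adult_count(raw):
    """Return the count as an int, or None when the entry is not a plain whole number."""
    text = str(raw).strip()
    # Entries such as "", "a few", "many" or ">100" fail this check and are dropped.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


# Hypothetical records; the real atlas export has its own column names.
records = [
    {"species": "Monarch", "adults": "12"},
    {"species": "Monarch", "adults": "a few"},
    {"species": "Red Admiral", "adults": ">100"},
    {"species": "Red Admiral", "adults": " 3 "},
]

cleaned = []
for row in records:
    count = parse_adult_count(row["adults"])
    if count is not None:
        # Keep the row, with the count stored as an integer
        cleaned.append({**row, "adults": count})
```

A stricter or looser rule (for example, treating ">100" as 100) would change the totals, so this sketch simply mirrors the omit-if-not-numeric choice described in the text.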

I chose to model the location of the species by census division because these divisions are not too small in area and are general enough that users who reside in Ontario can easily find their own location. This required a spatial join between the observation coordinates and the provincial census division geometry, which allowed me to calculate the total number of adults observed per census division, with the option to filter by species (Figure 2).
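To illustrate what the spatial join and the per-division total involve, here is a small Python sketch under simplifying assumptions. It is not the Tableau workflow used for the dashboard: the polygons are two made-up squares rather than Statistics Canada boundaries, the coordinates are arbitrary, and the point-in-polygon test is a basic ray-casting check that ignores holes and multi-part geometry.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical census divisions as simple squares
divisions = {
    "Division A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Division B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Hypothetical observations with already-cleaned adult counts
observations = [
    {"species": "Monarch", "x": 1.0, "y": 1.0, "adults": 5},
    {"species": "Monarch", "x": 3.0, "y": 1.0, "adults": 2},
    {"species": "Red Admiral", "x": 0.5, "y": 1.5, "adults": 4},
    {"species": "Monarch", "x": 9.0, "y": 9.0, "adults": 7},  # outside every division
]


def adults_per_division(observations, divisions, species=None):
    """Sum adults per division, optionally restricted to one species."""
    totals = defaultdict(int)
    for obs in observations:
        if species is not None and obs["species"] != species:
            continue
        for name, ring in divisions.items():
            if point_in_polygon(obs["x"], obs["y"], ring):
                totals[name] += obs["adults"]
                break  # each observation belongs to at most one division
    return dict(totals)


all_species = adults_per_division(observations, divisions)
monarch_only = adults_per_division(observations, divisions, species="Monarch")
```

Every observation is tested against every polygon, which is part of why a join like this becomes expensive as the number of records grows.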

Figure 2: Census Division Map of Adult Butterfly Species

I modelled flight periods by month of observation of adult butterflies because this seemed like an efficient way for the user to find when each species is in its flight period (Figure 3). Some enthusiasts may prefer this data to be modelled by month-thirds instead, but because I wanted the dashboard to serve both butterfly enthusiasts and the general public, I decided that months would be easier for most users to interpret. I also show this by census division because the circle size indicates where observations are most numerous and how each census division compares to the others. The user can also filter by census division to see the flight period for that division only.
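The monthly flight-period view amounts to grouping adult counts by month of observation for a chosen species. A minimal Python sketch of that grouping, assuming hypothetical field names and made-up 2019 records (the dashboard does this inside Tableau):

```python
from collections import Counter
from datetime import date

# Hypothetical 2019 observations
observations = [
    {"species": "Monarch", "date": date(2019, 6, 14), "adults": 3},
    {"species": "Monarch", "date": date(2019, 7, 2), "adults": 8},
    {"species": "Monarch", "date": date(2019, 7, 25), "adults": 5},
    {"species": "Red Admiral", "date": date(2019, 5, 9), "adults": 2},
]


def flight_period_by_month(observations, species):
    """Total adults observed per calendar month (1-12) for one species."""
    totals = Counter()
    for obs in observations:
        if obs["species"] == species:
            totals[obs["date"].month] += obs["adults"]
    # Sort by month so the result reads in calendar order
    return dict(sorted(totals.items()))


monarch_months = flight_period_by_month(observations, "Monarch")
```

Adding a census division field to each record and to the grouping key would give the per-division breakdown shown as circle sizes in the dashboard.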

Figure 3: Flight Period

I modelled yearly observations starting from 2010, although data exists from the 1800s, because submitted observations began to increase around this time as online submission services became more accessible (Figure 4). This component can only be filtered by species and not by census division because the dataset with all of the observations is too big for the spatial join and caused issues with the data extraction that Tableau requires before a workbook can be published online.
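Because this component only needs the year and species of each record, it can be computed without any spatial join. A rough Python sketch of the idea, counting submitted records per year from 2010 onward; the `year` and `species` fields and the sample rows are assumptions for illustration only:

```python
from collections import Counter

START_YEAR = 2010  # earlier records exist but are excluded from this view

# Hypothetical records, one per submitted observation
records = [
    {"species": "Monarch", "year": 1998},
    {"species": "Monarch", "year": 2010},
    {"species": "Monarch", "year": 2019},
    {"species": "Monarch", "year": 2019},
    {"species": "Red Admiral", "year": 2019},
]


def yearly_observations(records, species=None, start_year=START_YEAR):
    """Count observation records per year, optionally for a single species."""
    counts = Counter(
        rec["year"]
        for rec in records
        if rec["year"] >= start_year
        and (species is None or rec["species"] == species)
    )
    return dict(sorted(counts.items()))


all_years = yearly_observations(records)
monarch_years = yearly_observations(records, species="Monarch")
```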

Figure 4: Yearly Observations for all Census Divisions

Limitations and Future Work:

  • One of the biggest limitations of this dataset is the lack of observations in the northern regions compared to the southern regions. Because the north has a lower population and many areas are less accessible, few observations are submitted there, so the dataset does not capture the whole picture of Ontario.
  • Another limitation is that, because this is citizen science-based data, data entry is sometimes inconsistent. For example, adult counts were not always recorded numerically but sometimes as text or unclear values such as “a few”, “many” or “>100”. These observations were not modelled because they could not be properly quantified.
  • Another limitation is that the yearly observations cannot be filtered by census division. Because the full dataset is so large, conducting the spatial join with the census division polygons caused issues with data extraction and with publishing the workbook. Therefore, this component can only be filtered by species.
  • The last major limitation of the dashboard is the way flight periods are modelled. Butterfly enthusiasts may prefer to look at flight periods at a finer scale than months, such as month-thirds. A future addition to this dashboard could be a toggle that allows the user to switch between viewing flight periods by month or by month-thirds.
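As a starting point for the month-thirds toggle mentioned in the last item, the sketch below shows one way the grouping key could switch between months and month-thirds in Python. It assumes a month-third means days 1-10, 11-20 and 21 to the end of the month; that split, along with the function names, is an assumption for illustration and not a definition taken from TEA.

```python
from datetime import date


def month_third(d):
    """Return 1, 2 or 3 for days 1-10, 11-20 and 21 to month end (assumed split)."""
    # min() keeps day 31 in the final third instead of creating a fourth bucket
    return min((d.day - 1) // 10, 2) + 1


def period_key(d, by_thirds=False):
    """Grouping key for a flight-period chart: (month,) or (month, third)."""
    if by_thirds:
        return (d.month, month_third(d))
    return (d.month,)


sample = date(2019, 7, 31)
monthly_key = period_key(sample)
thirds_key = period_key(sample, by_thirds=True)
```

Grouping adult counts by `period_key(...)` with the flag on or off would then produce either view from the same records.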