Visualizing Earthquakes with Pydeck: A Geospatial Exploration

Mapping data in an interactive and visually compelling way is a powerful approach to uncovering spatial patterns and trends. Pydeck, a Python library for large-scale geospatial visualization, is an exceptional tool that makes this possible. Leveraging the robust capabilities of Uber’s Deck.gl, Pydeck enables users to create layered, interactive maps with ease. In this tutorial, we delve into Pydeck’s potential by visualizing earthquake data, exploring how it allows us to reveal patterns and relationships in raw datasets.

This project focuses on mapping earthquakes, analyzing their spatial distribution, and gaining insights into seismic activity. By layering visual elements like scatterplots and heatmaps, Pydeck provides an intuitive, user-friendly platform for understanding complex datasets. Throughout this tutorial, we explore how Pydeck brings earthquake data to life, offering a clear picture of patterns that emerge when we consider time, location, magnitude, and depth.


Why Pydeck?

Pydeck stands out as a tool designed to simplify geospatial data visualization. Unlike traditional map-plotting libraries, Pydeck goes beyond static visualizations, enabling interactive maps with 3D features. Users can pan, zoom, and rotate the maps while interacting with individual data points. Whether you’re working in Jupyter Notebooks, Python scripts, or web applications, Pydeck makes integration seamless and accessible.

One of Pydeck’s strengths lies in its support for multiple visualization layers. Each layer represents a distinct aspect of the dataset, which can be customized with parameters like color, size, and height to highlight key attributes. For instance, in our earthquake visualization project, scatterplot layers are used to display individual earthquake locations, while heatmaps emphasize regions of frequent seismic activity. The ability to combine such layers allows for a nuanced exploration of spatial phenomena.

What makes Pydeck ideal for projects like this one is its balance of simplicity and power. With just a few lines of code, users can create maps that would otherwise require advanced software or extensive programming expertise. Its ability to handle large datasets ensures that even global-scale visualizations, like mapping thousands of earthquakes, remain efficient and responsive.

Furthermore, Pydeck’s layered architecture allows users to experiment with different ways of presenting data. By combining scatterplots, heatmaps, and other visual layers, users can craft a visualization that is both aesthetically pleasing and scientifically robust. This flexibility makes Pydeck a go-to tool for not only earthquake mapping but any project requiring geospatial analysis.


Creating Interactive Earthquake Maps: A Pydeck Tutorial

Before diving into the visualization process, the notebook begins by setting up the necessary environment. It imports essential libraries such as pandas for data handling, pydeck for geospatial visualization, and other utilities for data manipulation and visualization control. To make sure these libraries are available, they must first be installed with pip.

!pip install pydeck pandas ipywidgets h3
import pydeck as pdk
import pandas as pd
import h3
import ipywidgets as widgets
from IPython.display import display, clear_output

Step 1: Data Preparation and Loading

Earthquake datasets typically include information such as the location (latitude and longitude), magnitude, and depth of each event. The notebook begins by loading the earthquake data from a CSV file using the Pandas library.

The data is then cleaned and filtered, ensuring that only relevant columns—such as latitude, longitude, magnitude, and depth—are retained. This preparation step is critical as it allows the user to focus on the most important attributes needed for visualization.

Once the dataset is ready, a preview of the data is displayed to confirm its structure. This typically involves displaying a few rows of the dataset to check the format and ensure that values such as the coordinates, magnitude, and depth are correctly loaded.

# Read in dataset
earthquakes = pd.read_csv("Earthquakes-1990-2023.csv")

# Drop rows with missing data
earthquakes = earthquakes.dropna(subset=["latitude", "longitude", "magnitude", "depth"])

# Convert time column to datetime
earthquakes["time"] = pd.to_datetime(earthquakes["time"], unit="ms")

Step 2: Initializing the Pydeck Visualization

With the dataset cleaned and ready, the next step is to initialize the Pydeck visualization. Pydeck provides a high-level interface to create interactive maps by defining various layers that represent different aspects of the data.

The notebook sets up the base map using Pydeck’s Deck class. This involves defining an initial view state that centers the map on the geographical region of interest. The center of the map is determined by calculating the average latitude and longitude of the earthquakes in the dataset, and the zoom level is adjusted to provide an appropriate level of detail.
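
The notebook's exact values aren't reproduced here, but a minimal sketch of that view state, using the mean coordinates as described, might look like this:

# Center the map on the average location of all earthquakes
view_state = pdk.ViewState(
    latitude=earthquakes["latitude"].mean(),
    longitude=earthquakes["longitude"].mean(),
    zoom=1,   # illustrative: a low zoom gives a global view
    pitch=0,
)

With the view state defined, the Deck object below renders the map; note that it references the heatmap layer defined in Step 3.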

# Render map
pdk.Deck(
    layers=[heatmap_layer],
    initial_view_state=view_state,
    tooltip={"text": "Magnitude: {magnitude}\nDepth: {depth} km"},
).show()

Step 3: Creating the Heatmap Layer

The primary visualization in the notebook is a heatmap layer that displays the density of earthquake events. This layer aggregates the data into a continuous color gradient, with warmer colors indicating areas with higher concentrations of seismic activity.

The heatmap layer helps to identify regions where earthquakes are clustered, providing a broader view of global or regional seismic activity. For instance, high-density areas—such as the Pacific Ring of Fire—become more prominent, making it easier to identify active seismic zones.

# Define HeatmapLayer
heatmap_layer = pdk.Layer(
    "HeatmapLayer",
    data=filtered_earthquakes,  # presumably a filtered subset of `earthquakes` prepared earlier in the notebook
    get_position=["longitude", "latitude"],
    get_weight="magnitude",  # Higher magnitude contributes more to heatmap
    radius_pixels=50,  # Radius of influence for each point
    opacity=0.7,
)

Step 4: Adding the 3D Layer

To enhance the visualization, the notebook adds a columnar layer, which maps individual earthquake events and their depths as extruded columns on the map. Each earthquake is represented by a column, where:

  • Height: The height of each column corresponds to the depth of the earthquake. Tall columns represent deeper earthquakes, making it easy to identify significant seismic events at a glance.
  • Color: The color of the column also emphasizes the depth of the earthquake, with a yellow-to-red gradient used to represent varying depths. Typically, deeper earthquakes are shown in redder colors, while shallower earthquakes are displayed in yellow.

This 3D column layer provides an effective way to visualize the distribution of earthquakes across geographic space while also conveying important information about their depth.

# Define a ColumnLayer to visualize earthquake depth
column_layer = pdk.Layer(
    "ColumnLayer",
    data=sampled_earthquakes,  # presumably a sample of `earthquakes`, to keep the 3D layer responsive
    get_position=["longitude", "latitude"],
    get_elevation="depth",  # Column height represents depth
    elevation_scale=100,
    get_fill_color="[255, 255 - depth * 2, 0]",  # yellow (shallow) to red (deep)
    radius=15000,
    pickable=True,
    auto_highlight=True,
)

Step 5: Refining the Visualization

Once the base map and layers are in place, the notebook provides additional customization options to refine the visualization. Pydeck’s interactive capabilities allow the user to:

  • Zoom in and out: Users can zoom in to explore smaller regions in greater detail or zoom out to get a global view of seismic activity.
  • Hover for details: When hovering over an earthquake event on the map, a tooltip appears, providing additional information such as the exact magnitude, depth, and location. This interaction enhances the user experience, making it easier to explore the data in a hands-on way.

The notebook also ensures that the map’s appearance and behavior are tailored to the dataset, adjusting parameters like zoom level and pitch to create a visually compelling and informative display.
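
As a sketch of what that refinement might look like, the deck below combines both layers, tilts the camera so the 3D columns are visible, and reuses the tooltip from earlier (the zoom and pitch values are illustrative assumptions):

# Combine the heatmap and column layers in a single tilted view
view_state = pdk.ViewState(
    latitude=earthquakes["latitude"].mean(),
    longitude=earthquakes["longitude"].mean(),
    zoom=3,    # regional level of detail
    pitch=45,  # tilt so column heights are easier to read
)

pdk.Deck(
    layers=[heatmap_layer, column_layer],
    initial_view_state=view_state,
    tooltip={"text": "Magnitude: {magnitude}\nDepth: {depth} km"},
).show()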

Step 6: Analyzing the Results

After rendering the map with all layers and interactive features, the notebook transitions into an analysis phase. With the interactive map in front of them, users can explore the patterns revealed by the visualization:

  • Clusters of seismic activity: By zooming into regions with high earthquake density, users can visually identify clusters of activity along tectonic plate boundaries, such as the Pacific Ring of Fire. These clusters highlight regions prone to more frequent and intense earthquakes.
  • Magnitude distribution: The varying sizes of the circles (representing different earthquake magnitudes) reveal patterns of high-magnitude events. Users can quickly spot large earthquakes in specific regions, offering insight into areas that may need heightened attention for preparedness or mitigation efforts.
  • Depth-related trends: The color gradient used to represent depth provides insights into the relationship between earthquake depth and location. Deeper earthquakes often correspond to subduction zones, where one tectonic plate is forced beneath another. This spatial relationship is critical for understanding the dynamics of earthquake behavior and associated risks.

By interacting with the map, users gain a deeper understanding of the data and can draw meaningful conclusions about seismic trends.


Limitations of Pydeck

While Pydeck is a powerful tool for geospatial visualization, it does have some limitations that users should be aware of. One notable constraint is its dependency on web-based technologies: it relies heavily on Deck.gl and the underlying JavaScript frameworks for rendering visualizations. This means that while Pydeck excels at creating interactive, browser-based visualizations, it may not be the best choice for large-scale offline applications or those requiring complex, non-map-based visualizations. Additionally, Pydeck’s documentation and community support, although growing, are not as extensive as those of more established libraries like Matplotlib or Folium, which can make troubleshooting more challenging for beginners.

Another limitation is performance with extremely large datasets; while Pydeck is designed to handle large-scale data, rendering very large numbers of points or complex layers may lead to slower performance depending on the user’s hardware and the complexity of the visualization. Finally, while Pydeck offers significant customization options, certain advanced features or highly specialized geospatial visualizations (such as full-featured GIS analysis) may require supplementary tools or libraries beyond what Pydeck offers. Despite these limitations, Pydeck remains a valuable tool for interactive and engaging geospatial visualization, especially for tasks like real-time data visualization and web-based interactive maps.


Conclusion

Pydeck transforms geospatial data into an interactive experience, empowering users to explore and analyze spatial phenomena with ease. Through this earthquake mapping project, we’ve seen how Pydeck highlights patterns in seismic activity, offering valuable insights into the magnitude, depth, and distribution of earthquakes. Its intuitive interface and powerful visualization capabilities make it a vital tool for geospatial analysis in academia, research, and beyond. Whether you’re studying earthquakes, urban development, or environmental changes, Pydeck provides a platform to bring your data to life. By leveraging its features, you can turn complex datasets into accessible stories, enabling better decision-making and a deeper understanding of the world around us. While it is a powerful tool for creating visually compelling maps, it is important to consider its limitations, such as performance issues with very large datasets and the need for web-based technology for rendering. For users seeking similar features in a less code-based environment, Kepler.gl—an open-source geospatial analysis tool—offers much of the same functionality through a point-and-click interface. To explore the notebook and try out the visualization yourself, you can access it here. Pydeck opens up new possibilities for anyone looking to dive into geospatial analysis and create interactive maps that bring data to life.

Putting BlogTO on the map (literally) – Tutorial

Kyle Larsen
SA8905 – Cartography and Geovisualization
Fall 2019

Instagram is a wealth of information, for better or worse. If you’ve posted to Instagram before and your profile is public – maybe even if it’s not – then your information is out there just waiting for someone, someone maybe like me, to scrape it and put it onto a map. You have officially been warned.

But I’m not here to preach privacy or procure your preciously posted personal pics. I’m here to scrape pictures from Instagram, take their coordinates, and put them onto a map in a grid layout over Toronto. My target for this example is a quite public entity that thrives off exposure: the notorious BlogTO. Maybe only notorious if you live in Toronto, BlogTO is a Toronto-based blog about the goings-on in the 6ix as well as Toronto life and culture. They also have an Instagram that is almost too perfect for this project – but more on that later. Before anything is underway, a huge thank-you-very-much to John Naujoks and his Instagram scraping project that created some of the framework for this project (go read his project here, and you can find all of my code here).

When scraping social media you can sometimes use an API to directly access the back end of a website; Twitter, for example, has an API that is easy to access and use. Instagram’s API sits securely behind the brick wall that is Facebook, aka it’s hard to get access to. While it would be easier to scrape Twitter, we aren’t here because this is easy. Maybe it seems a little rebellious, but Instagram doesn’t want us scraping their data… so we’re going to scrape their data.

This will have to be done entirely through the front end, aka the same way that a normal person would access Instagram, but we’re going to do it with Python and some fancy HTML stuff. To start you should have Python downloaded (3.8 was used for this, but any version of Python 3 should give you access to the appropriate libraries) as well as some form of GIS software for some of the mapping and geoprocessing. Alteryx would be a bonus but is not necessary.

We’re going to use a few python libraries for this:

  • urllib – for accessing and working with URLs and HTML
  • selenium – for scraping the web (make sure you have a browser driver installed, such as chromedriver)
  • pandas – for writing to some files

If you’ve never done scraping before, it is essentially writing code that opens a browser, does some stuff, takes some notes, and returns whatever notes you’ve asked it to take. But unlike a person, you can’t just tell Python to go recognize specific text or features, which is where the Python libraries and the HTML stuff come in. The below code (thanks John) takes a specific Instagram user, returns as many post URLs as you want, and adds them to a list for your scraping pleasure. If you enable the browser head you can actually watch as Python scrolls through the Instagram page, silently kicking ass and taking URLs. It’s important to use the time.sleep(x) function because otherwise Instagram might know what’s up and block your IP.
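
The original post shows this code as a screenshot; the sketch below captures the same idea using selenium and time.sleep() (the Selenium 4 syntax, scroll count, and the "/p/" URL pattern are assumptions, and Instagram’s markup changes often):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def get_post_urls(username, n_scrolls=10):
    """Scroll an Instagram profile and collect as many post URLs as we can."""
    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get("https://www.instagram.com/" + username + "/")
    post_urls = set()
    for _ in range(n_scrolls):
        # Collect every link on the page that looks like a post
        for link in driver.find_elements(By.TAG_NAME, "a"):
            href = link.get_attribute("href")
            if href and "/p/" in href:
                post_urls.add(href)
        # Scroll down and wait, so Instagram doesn't flag the traffic
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)
    driver.quit()
    return list(post_urls)

post_urls = get_post_urls("blogto", n_scrolls=5)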

But what do I do with a list of URLs? Well, this is where you get into the scrappy parts of this project, the closest to criminal you can get without actually downloading a car. The essentials for this project are the image and the location, but this is where we need to get really crafty. Instagram is actually trying to hide the location information from you, at least if you’re scraping it. Nowhere in a post are coordinates saved. Look at the below image: you may know where the Distillery District is, but Python can’t just give you X and Y because it’s “south of Front and at that street where I once lost my wallet.”

If you click on the location name you might get a little more information but… alas, Instagram keeps the coordinates locked in as a .png, yielding us no information.

BUT! If you can scrape one website, why not another? If you can use Google Maps to get directions to “that sushi restaurant that isn’t the sketchy one near Bill’s place” then you might as well use it to get coordinates, and Google actually makes it pretty easy – those suckers.
(https://www.google.com/maps/place/Distillery+District,+Toronto,+ON/@43.6503055,-79.35958,16.75z/data=!4m5!3m4!1s0x89d4cb3dc701c609:0xc3e729dcdb566a16!8m2!3d43.6503055!4d-79.35958 )
I spy with my little eye some X and Y coordinates. The first set, after the ‘@’, would usually be the lat/long of your IP address, which I’ve obviously hidden because privacy is important – that’s the takeaway from this project, right? The second lat/long, which you can glean from the end of the URL, is the location of the place that you just googled. Now all that’s left is to put all of this information together and create the script below. Earlier I said that it’s difficult to tell Python what to look for; what you need is the XPath, which you can copy from the HTML (right-click an element, inspect it, then right-click that HTML and copy the XPath for that specific element). For this project we’re going to need the XPath for both the image and the location. The steps are essentially as follows:

  • go to Instagram post
  • download the image
  • copy the location name
  • google the location
  • scrape the URL for the coordinates
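
A rough sketch of those steps is below; the XPaths, the Google Maps search URL, and the regex for pulling coordinates out of it are assumptions that will likely need tweaking:

import re
import time
import urllib.request
from selenium.webdriver.common.by import By

def scrape_post(driver, post_url):
    """Download a post's image and look up its geotag through Google Maps."""
    driver.get(post_url)
    time.sleep(3)

    # Grab the image (XPath assumed; copy the real one from the post's HTML)
    img = driver.find_element(By.XPATH, "//article//img")
    urllib.request.urlretrieve(img.get_attribute("src"), "post_image.jpg")

    # Grab the geotag text, e.g. "Distillery District"
    location = driver.find_element(
        By.XPATH, "//a[contains(@href, '/explore/locations/')]"
    ).text

    # Google the location and pull the second lat/long pair out of the URL
    driver.get("https://www.google.com/maps/search/" + location.replace(" ", "+") + "+Toronto")
    time.sleep(3)
    match = re.search(r"!3d(-?\d+\.\d+)!4d(-?\d+\.\d+)", driver.current_url)
    coords = (float(match.group(1)), float(match.group(2))) if match else None
    return location, coords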

There are some setbacks to this: not all posts are going to have a location, and not all pictures are pictures – some are videos. In order for a picture to qualify for full scraping it has to have a location and not be a video, and the bonus criterion – it must be in Toronto. Way back I said that BlogTO is great for this project; that’s because they love to geotag their posts (even if it is mostly “Toronto, Ontario”) and they love to post about Toronto, go figure. With these scripts you’ve built up a library of commands for scraping whatever Instagram account your heart desires (as long as it isn’t private – but if you want to write a script to log in to your own account then I guess you could scrape a private account that has accepted your follow request, you monster, how dare you).

With the pics downloaded and the latitudes longed it is now time to construct the map. Unfortunately this is the most manual process, but there’s always the arcpy library if you want to try and automate this process. I’ll outline my steps for creating the map, but feel free to go about it your own way.

  1. Create a grid of 2km squares over Toronto (I used the grid tool in Alteryx)
  2. Intersect all your pic-points with the grid and take the most recently posted pic as the most dominant for that grid square
  3. Mark each square with the image that is dominant in that square (I named my downloaded images as their URLs)
  4. Crop all dominant images to a square (1×1) aspect ratio (I used Google Photos)
  5. Take a deep breath, maybe a sip of water
  6. Manually drag each dominant image into its square and pray that your processor can handle it, save your work frequently.

This last part was definitely the most in need of a more automated process, but after your hard work you may end up with a result that looks like the map below, enjoy!

Lexical Distance and Linguistic Diversity in the Balkans: A Network Map

By Zeljko Bavcevic

Geovis Class Project @RyersonGeo, SA8905, 2018

  1. Introduction

The purpose of this series of posts is to serve as a record of my work on the SA8905 Geovisualization project. The broad aim of this project is to explore the complex relationship between language and geography, and how each serves as a mediating factor on the other. I have long been fascinated by how geography has an impact on language, and on populations, borders and culture by proxy. Specifically, I was hoping to understand how changes in the traversability of landscapes would impact migrations of people and thus impact language.

Over the course of the research phase of this project, I realized that conducting the project on all of the languages of the world would require an amount of data collection and cleaning considerably beyond the temporal scope of a single class. As such, I narrowed my goal to examining only the geography of language in the Balkan Peninsula, and how the languages there relate to one another in terms of linguistic diversity, lexical distance and speaker population.

  2. Research

The first stage in this process required finding data to operationalize my target variables. This proved more difficult than I had first expected for a number of reasons. Firstly, there is no single, global set of agreed-upon variables for understanding language. Instead, there are a number of competing variable sets, each maintained by a different organization (with very different incentives).

Eventually I narrowed my focus to three primary linguistic variables for visualization, these are:

  1. Speaker Population: The number of individuals in the world estimated to speak a certain language. The value is largely based on a projection.

  2. Lexical Distance: A linguistic variable measuring the conceptual distance between languages by comparing each along a number of criteria such as common words, verb formations and other comparative measures.

  3. Linguistic Diversity: An index score that measures the different types of languages, dialects or variations spoken within the regions of the primary language.

  3. Data

The data for these variables are generated and maintained by two primary organizations, SIL International and UNESCO. SIL International compiles data on a number of relevant linguistic variables for sale to organizations. UNESCO, on the other hand, is an international non-profit organization. The two data sets are very different in their methodologies and as such cannot be combined or used in conjunction. For this reason, I elected to use only one of the data sets. Although the UNESCO data was free, the format it was kept in would have required a laborious process of cleaning and transformation before it could easily be used in my model. As such, I reached out to SIL International for a quote and acquired the data I needed. It must be noted, for the purposes of transparency, that SIL is a Christian organization and there have been several concerns about its methodology and incentive structure. To assess the impact of these concerns on my outcome, I did a brief comparison between a sub-sample of the UNESCO and SIL data sets and was satisfied that the differences were within acceptable parameters.

During my research, I had found a number of illustrations of lexical distance. Most often these would take the form of node or network charts depicting the different languages (Figure 1).

Figure 1: Lexical Distance Network

However, these were all static, non-spatial and often did not take into account other relevant linguistic variables such as linguistic diversity within a language class. As such, I wanted my own visualization to be dynamic, interactive, spatial and to incorporate other relevant linguistic variables. To this end I needed to find a technology or platform that would allow me sufficient customizability and interactivity. Inspired by the network or node maps I had consistently seen throughout my research phase, I knew that I wanted to build on this concept, adding a spatial and interactive component.

  4. Technology

I considered a number of options. The first and most obvious was to code a network map in Python and visualize it with the Gephi platform. While pure coding would offer the most freedom and customizability, hosting the various tools I needed would prove very tedious and costly. As such, I set out in search of a hosted node or network analysis platform. After considerable research into a number of possible candidates, I opted for kumu.io.

Kumu was selected because it allowed me the freedom of coding most of my map to my specifications (on top of having a very user-friendly UI), while also hosting all of my data and tools natively. This reduced the technical “surface area” of the project, which reduces opportunities for code-breaking bugs and cross-platform communication errors. After paying the modest membership fee, I began adding my data to Kumu.

  5. Execution

The first stage of development was loading my SIL data into Kumu. This was made easy using Kumu’s data cleaning tool, which lets the user make sure all the input data meets Kumu’s formatting requirements and even allows the user to dynamically change spreadsheet documents before upload.


Kumu’s Upload Wizard

After this was complete I created a bi-directional connection between each language (or element, in network analysis parlance). This resulted in an ugly and incomprehensible visual bundle of connections. The next stage of the process was coding the various variable symbologies and interface options (adding a search, zoom and selection toolbar).


Kumu’s Advanced Code Based Editor

This was done using Kumu’s advanced code editor, and I encountered no issues during this stage. However, when I attempted to add the polygons of the various countries of the Balkan Peninsula, the map visualization would simply vanish, and I was not able to troubleshoot this with any success. As such, I ultimately had to abandon the spatial component of the project due to the constraints of time. I was still very satisfied with the resulting output.


    The Final Output

  6. Challenges

A number of challenges were encountered during the course of this project. The primary issue was that the geographic overlay failed to load. My every attempt to fix this was unsuccessful, and ultimately this radically undermined completing the project as I had conceived it at the design stage. Nonetheless, I still believe that the other elements of the project satisfied the requirement of producing a novel and interesting geovisualization.

NHL Travel Web App

by Luke Johnson
Geovis Project Assignment @RyersonGeo, SA8905, Fall 2017

Context

I’ve been a Toronto Maple Leafs fan and an enthusiastic hockey fan my whole life, and I’ve never been able to intersect my passion for the sport with my love of geography. As a geographer, I’ve been looking for ways to blend the two together for a few years now, and this geovis project finally provided me the opportunity! I’ve always been interested in the debate about how teams located on the west coast travel more than teams located centrally or on the east coast, and have a way tougher schedule because of the increased travel time. For this project, I decided to put that argument to rest, and allow anybody to compare two teams for the 2016/2017 NHL season, visualize all the flights that they take throughout the year, view the accumulated number of kilometres traveled at any point during the season, and display the final tally. I thought this would be a neat way to show hockey fans the grueling schedule the players endure throughout the year, without the fan having to look at a boring table!

It all started with the mockup above. I had brainstormed and created a few different interfaces, but this is what I came up with to best illustrate travel routes and cumulative kilometres traveled throughout the year. The next step was deciding on what data to use and which technology would work best to put it all together!

Data

First of all, all NHL teams were compiled along with the name of their arena and the arena location (lat/long). Next, a pre-compiled csv of the NHL schedule was downloaded from Left Wing Lock, which saved me a lot of time not having to scrape the NHL website and compile the schedule myself. Believe it or not, that’s all the data I needed to figure out the travel route and kilometres traveled for each team!

Methods

All of the data mentioned above was put into a SQLite database with 3 tables – a Team table, an Arena table, and a Schedule table. The Arena table can be joined with the Team table to get information on which team plays at which arena and where that arena is located. The Team table can also be joined with the Schedule table to get information regarding which teams play on what day and the location of the arena they are playing in.
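
As a sketch of how that might be set up (the table and column names below are assumptions based on the description, not the actual schema):

import sqlite3

conn = sqlite3.connect("nhl_travel.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS Arena (
    arena_id INTEGER PRIMARY KEY,
    name     TEXT,
    lat      REAL,
    lon      REAL
);
CREATE TABLE IF NOT EXISTS Team (
    team_id  INTEGER PRIMARY KEY,
    name     TEXT,
    arena_id INTEGER REFERENCES Arena(arena_id)
);
CREATE TABLE IF NOT EXISTS Schedule (
    game_id      INTEGER PRIMARY KEY,
    game_date    TEXT,
    home_team_id INTEGER REFERENCES Team(team_id),
    away_team_id INTEGER REFERENCES Team(team_id)
);
""")

# Example join: every home game for a team, with the arena it is played at
games = conn.execute("""
    SELECT s.game_date, a.name, a.lat, a.lon
    FROM Schedule s
    JOIN Team t  ON t.team_id  = s.home_team_id
    JOIN Arena a ON a.arena_id = t.arena_id
    ORDER BY s.game_date
""").fetchall()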

Because I wanted to allow the user to select any unique combination of teams, it would have been very difficult to pre-process all of the unique combinations (435 unique combinations to be exact). For this reason, I decided to build a very lightweight Application Programming Interface (API) that would act as a mediator between the database and the web application. APIs are a great resource for controlling how the data from the database is delivered, and they simplify the combination process. This API was programmed in Python using the Flask framework. The following screenshot shows a small excerpt from the Flask Python code, where a resource is set up to allow the web application to query all of the arenas and get back GeoJSON which can be displayed on the map.
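
The screenshot itself isn't reproduced here, but a Flask resource along those lines might look like the following sketch (the route name, database file, and column names are assumptions):

from flask import Flask, jsonify
import sqlite3

app = Flask(__name__)

@app.route("/arenas")
def arenas():
    """Return every arena as a GeoJSON FeatureCollection for the web map."""
    conn = sqlite3.connect("nhl_travel.db")
    rows = conn.execute("SELECT name, lat, lon FROM Arena").fetchall()
    conn.close()
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, lat, lon in rows
    ]
    return jsonify({"type": "FeatureCollection", "features": features})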

After the Flask Python API was configured, it was time to build the front-end code of the application! Mapbox was chosen as the web mapping tool for the front end, mainly because of its ease of use and the vast amount of sample code available online. For a smaller number of users, it’s completely free! To create the chart, I decided to use an open-source charting library called Chart.js. It is 100% free and, again, has lots of examples online. Both the Mapbox map and the Chart.js chart were created using JavaScript and wrapped within HTML and CSS to create one main webpage.

To create the animation, the web application sends a request to the API to query the database for each team chosen to compare. The web application then loops through the schedule for each team, refreshing the page rapidly to make a seamless animation of the two airplanes moving. At the same time, the distance between the two NHL arenas is calculated, and a running total is appended to the chart and refreshed after each game in the schedule. The following snippet of code shows how the team 1 drop-down menu is created.
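
The app itself does this in JavaScript, but the distance calculation is just a great-circle formula between the two arena coordinates; here is the same idea sketched in Python (the coordinates in the example are approximate):

import math

def km_between(lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance between two arenas, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates: Toronto's arena to Edmonton's arena
print(round(km_between(43.6435, -79.3791, 53.5469, -113.4973)), "km")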

Results

After everything was compiled, it was time to demo the web app! The video below shows a demo of the capability of the web application, comparing the Toronto Maple Leafs to the Edmonton Oilers, and visualizing their flights throughout the year, as well as their total kilometres traveled.

To get a more in depth understanding of how the web application was put together, please visit my Github page where you can download the code and build the application yourself!