R – Master of Spatial Analysis

GeoVis: Mapdeck Package in R

Gregory Huang
Geovisualization Project, @RyersonGeo, Fall 2019

Introduction

This project is a demonstration of the abilities of the mapdeck package in R, including its shiny interactive app compatibility.

Mapdeck is an R package created by David Cooley. Essentially, it brings some of Mapbox’s functionality into the R environment. Mapbox is a popular, community-driven web mapping service with some great geovisualization features. Strava’s global heat map is one example.

I am interested in looking at flight routes across global hubs and seeing whether their destinations overlap. Because mapdeck’s arc layer renders flight routes so effectively, I chose mapdeck to visualize flight route data from around the world.

Example of a map generated by mapdeck: arcs, text, lines, and scatterplots are all available. To change the perspective, hold down Ctrl and click and drag. Base maps are customizable, with a large selection of both Mapbox and user-generated styles. This map is one of the results from longest_flights.R, which uses the “decimal” basemap.

The map has some built-in interactivity. Here is an example of a “tooltip”: when a user hovers over an arc, the arc is highlighted and shows information about that particular route. Note that mapdeck does not draw flight routes across the Pacific, so keep this in mind if accuracy is key.

Software Requirements

To replicate this project, you’ll need your own Mapbox access token, which is free to obtain with a valid email address. Since the code is written in R, you’ll also need R and RStudio installed on your machine.

Tl;dr…

Here’s the Shiny App

The code I wrote and the data I used can also be found in my GitHub repository, Geovis. To run them on your own machine, download the folder and follow the instructions in the README at the bottom of the repository page.

Screenshot of the Shiny app: the slider tells the map which flights to show, based on the longitudes of the destinations. All flights depart from YYZ/KEF/AMS/FRA/DXB.

Details: Code to Generate a Map

The code I’ve written has two major parts, both using flight route data. The first part, longest_flights.R, demonstrates the capabilities of the mapdeck package using data I curated on the longest flights in the world. The second part, yyz_fra.R and shinyApp.R, demonstrates the Shiny app compatibility and shows how the package handles larger datasets (hint: very well). The Shiny app uses flight route data from five airports: Toronto, Iceland-Keflavik, Amsterdam, Frankfurt, and Dubai, pulled from openflights.org.

The flight route data for the five airports in particular needed cleaning to make the data frame usable by mapdeck. This involved removing empty rows, selecting only the relevant columns, and merging the tables.

Code snippet for cleaning the data. After the for loop completes, the flight route data downloaded from openflights.org is ready to be used with mapdeck.
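The three cleaning steps can be sketched in base R roughly as follows. This is an illustration only, not the code from the repository: the toy tables and all column names (origin, dest, iata, start_lon, and so on) are assumptions, and it uses merge() rather than the for loop mentioned above.

```r
# Toy stand-ins for the openflights.org tables (column names are assumptions).
routes <- data.frame(
  origin  = c("YYZ", "YYZ", NA, "FRA"),
  dest    = c("FRA", "KEF", NA, "DXB"),
  airline = c("AC", "FI", NA, "LH"),
  stringsAsFactors = FALSE
)
airports <- data.frame(
  iata = c("YYZ", "FRA", "KEF", "DXB"),
  lat  = c(43.68, 50.03, 63.99, 25.25),
  lon  = c(-79.63, 8.57, -22.61, 55.36),
  stringsAsFactors = FALSE
)

# 1. Remove empty rows (rows in which every field is NA).
routes <- routes[rowSums(!is.na(routes)) > 0, ]

# 2. Select only the relevant columns.
routes <- routes[, c("origin", "dest")]

# 3. Merge in the coordinates, once for the origin and once for the destination.
origin_xy <- setNames(airports, c("origin", "start_lat", "start_lon"))
dest_xy   <- setNames(airports, c("dest", "end_lat", "end_lon"))
flights <- merge(routes, origin_xy, by = "origin")
flights <- merge(flights, dest_xy, by = "dest")
```

The result has one row per route with start_lon/start_lat and end_lon/end_lat columns, which is the shape an arc layer expects.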

Once the data were cleaned, I began using the mapdeck functions to map the routes. A basic mapdeck() call declares your key, gives the map a style, and sets a pitch if needed. There are many more parameters you can customize, but I changed only the style and pitch. Once the mapdeck map is created, use the “pipe” operator (%>%) to add layers to it, for example add_arc() for the arcs seen in this post. Again, there are many parameters you can set, but the most important are the first three: where your data come from, and which columns hold the origin and destination x-y coordinates.

An example of creating an arc layer on a map. In addition to the previously mentioned parameters, tooltip generates the small pop-up boxes that appear when you hover over a layer entry, and layer_id is important when there are multiple layers on the same map.

Additional details on creating all the different types of layers, including heatmaps, can be found on the mapdeck documentation page.
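A minimal sketch of such a call is shown here, assuming a data frame with one row per route. The data, the column names, the "dark" style, and the pitch value are placeholders of mine rather than the settings used for the maps in this post, and "YOUR_MAPBOX_TOKEN" must be replaced with a real token before the map will render.

```r
library(mapdeck)

# Hypothetical two-route data frame; the column names are assumptions.
flights <- data.frame(
  route     = c("YYZ-FRA", "YYZ-KEF"),
  start_lon = c(-79.63, -79.63),
  start_lat = c(43.68, 43.68),
  end_lon   = c(8.57, -22.61),
  end_lat   = c(50.03, 63.99)
)

key <- "YOUR_MAPBOX_TOKEN"  # placeholder, not a working token

# Declare the key, pick a style, set a pitch, then pipe in an arc layer.
flight_map <- mapdeck(token = key, style = mapdeck_style("dark"), pitch = 45) %>%
  add_arc(
    data           = flights,
    origin         = c("start_lon", "start_lat"),  # x (longitude) first, then y
    destination    = c("end_lon", "end_lat"),
    tooltip        = "route",                      # column shown on hover
    auto_highlight = TRUE,                         # highlight the hovered arc
    layer_id       = "arc_layer"
  )

# In an interactive session, typing `flight_map` displays the map.
```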

Details: Code to make a “Shiny” app

On top of mapdeck’s regular interactive features, incorporating a mapdeck map into Shiny can add more layers of interactivity. In this particular instance, I added a slider in Shiny with which the user can indicate the longitudes of the destinations they want to see. For example, I can filter to see just the flights going to East Asia by using that slider. Shiny also offers other controls, such as drop-down menus to select specific map layers, and checkboxes.

The Shiny code can roughly be broken down into three parts: ui, server, and shinyApp(ui, server). The ui defines the user interface and displays the outputs produced by the server, while the server decides what map to produce based on the input the user gives in the ui. shinyApp(ui, server) combines the two to generate a Shiny app.

Mapdeck integrates into the Shiny environment through mapdeckOutput() in ui, which specifies where the map is displayed, and through renderMapdeck() and mapdeck_update() in server, which generate the map (renderMapdeck) and the appropriate layers to display (mapdeck_update).

Below is the code used to run the Shiny app demonstrated in this blog post. Note the ui and server portions of the code body. To run the Shiny app after that, simply run shinyApp(ui, server) to generate the app.

Snippet of the Server creation section. Note that the code listens to what the UI says with reactive() and observeEvent().
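For readers without the screenshots, here is a stripped-down sketch of how these pieces can fit together. It is not the app from the repository: the three-route data frame, the input and output ids ("lon", "map"), and the helper filter_by_longitude() are all hypothetical, and the token is a placeholder.

```r
library(shiny)
library(mapdeck)

key <- "YOUR_MAPBOX_TOKEN"  # placeholder, not a working token

# Hypothetical route table; the column names are assumptions.
flights <- data.frame(
  route     = c("YYZ-FRA", "YYZ-KEF", "FRA-NRT"),
  start_lon = c(-79.63, -79.63, 8.57),
  start_lat = c(43.68, 43.68, 50.03),
  end_lon   = c(8.57, -22.61, 140.39),
  end_lat   = c(50.03, 63.99, 35.77)
)

# Hypothetical helper: keep routes whose destination longitude is in range.
filter_by_longitude <- function(routes, lon_range) {
  routes[routes$end_lon >= lon_range[1] & routes$end_lon <= lon_range[2], ]
}

ui <- fluidPage(
  sliderInput("lon", "Destination longitude",
              min = -180, max = 180, value = c(-180, 180)),
  mapdeckOutput("map", height = "600px")
)

server <- function(input, output, session) {
  # Draw the base map once.
  output$map <- renderMapdeck({
    mapdeck(token = key, style = mapdeck_style("dark"), pitch = 45)
  })

  # Re-filter the routes whenever the slider moves.
  selected <- reactive(filter_by_longitude(flights, input$lon))

  # Update only the arc layer instead of redrawing the whole map.
  observeEvent(selected(), {
    map <- mapdeck_update(map_id = "map")
    if (nrow(selected()) == 0) {
      clear_arc(map, layer_id = "arc_layer")
    } else {
      add_arc(map, data = selected(),
              origin = c("start_lon", "start_lat"),
              destination = c("end_lon", "end_lat"),
              tooltip = "route", layer_id = "arc_layer")
    }
  })
}

app <- shinyApp(ui, server)  # in an interactive session, runApp(app) launches it
```

Keeping the filtering in a plain function means it can be checked without launching the app; for instance, filter_by_longitude(flights, c(100, 180)) keeps only the route to Tokyo.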

This concludes my geovis blog post. If you have any questions, please feel free to email me at gregory.huang@ryerson.ca.

Here is the link to my GitHub repository again: https://github.com/greghuang8/Geovis

Transportation Flows Mapping Using R

The geographic visualization of data using programming languages, and R in particular, has seen a substantial rise in adoption among members of the GIS and data analytics community in recent years. While the learning curve for scripting might be steeper than for more traditional, out-of-the-box GIS applications, scripting offers other benefits, such as building customizable processes and handling complex spatial analysis operations. The latter is essential for projects with large amounts of data, as is often the case with transportation and commuting flows, which ordinarily comprise a considerable number of records on trips’ origins and destinations, modes of transport, and travel times. An added perk is that R produces creative and visually appealing graphics, which was one of the motivators behind the choice of technique for this project. The primary motivator, however, was R’s capacity for transportation data modelling and mapping, as the aim of the project was to map commuting flows.

Story of R

R is an open-source software environment and language for statistical computing and graphics. It is highly extensible, which makes it particularly useful to researchers from varied academic and professional fields (ranging from social science, biology, and engineering to finance, energy, and many fields in between). It is also one of the most rapidly growing software programs in the world, most likely due to the expansion of data science. In the context of Geographic Information Systems (GIS), it can be described as a powerful command-line system made up of a range of tailored packages, each offering different components for handling and analyzing spatial data. The ones used in this project were ggplot2 and maptools and, to a lesser extent, plyr. The first two are among the most common in the R geospatial community; others encountered during research and worth exploring further were leaflet and mapview for interactive maps, shiny for web applications, and ggmap, sp, and sf for general GIS capabilities. Because R is open source, its community is very helpful in organizing and locating necessary information. One neat resource is the readily available cheat sheets for many of the packages (e.g. the ggplot2 cheat sheet), which make finding information genuinely fast.

There are some stunning examples of data visualization in R. One that made a significant media splash a few years ago was done by Paul Butler, a mathematics student at University of Toronto at the time, who plotted social media friendship connections (according to one author, it drew admiration as well as disbelief from many that this was done with fewer than 150 lines of code in an “old dusty” statistical software such as R). It also inspired further data visualization explorations using R. One of my favorite recent works of this kind is the compelling book London: The Information Capital by geographer James Cheshire and designer Oliver Uberti. The majority of the examples in the book were produced not only in R but specifically with its ggplot2 package, in combination with graphic design applications, and serve as innovative illustrations of data visualization approaches and of what the software can provide. Both of these projects inspired mine.

Transportation Mapping and Modelling

I would like to give some background on the type of analysis that was conducted. One of the common types of analysis in transportation geography, transportation planning, and transportation engineering is the geographic analysis of origin-destination data, which shows how many people travel (or could potentially travel) between places. The basic unit of analysis in most transport models is the trip (a single-purpose journey from an origin “A” to a destination “B”, not to be mistaken with Timothy Leary’s definition). Trips are often grouped by transport mode or number of people travelling, and are represented as desire lines connecting zone centroids (desire lines are the straight, shortest possible lines between origin and destination points, and can be converted to routes). They need not represent only the movement of people; they can show commodity flows and retail trade as well. TransCAD is often treated as the industry standard for this type of modelling. It is, however, quite costly and is used mainly by transportation planning firms and agencies. R, on the other hand, is starting to see dedicated transportation planning packages, and its GIS packages are increasingly used in the transportation field. And most importantly: it’s free.

Data

The dataset used for the project was the American Community Survey 2009-2013 5-Year Commuting Flows, obtained via the Inter-University Consortium for Political and Social Research. It is a survey of the entire United States focusing on the journeys to work of people of working age (16 and over). Data in the original survey were tabulated by several categories: means of transportation to work, private vehicle occupancy, time leaving home to go to work, travel and aggregated travel time to work, etc. For the purposes of the project, all workers in commuting flows were selected (grouped together across all transportation modes). The trips were intra- and inter-county commutes.

Two main components are needed when mapping transportation flows in general: the coordinates of the place of origin and the coordinates of the place of destination. Common practice in the transportation planning field is to use population-weighted centroids for origins and destinations, regardless of the geographic unit of analysis, which in this case was the U.S. county. A population-weighted centroid shapefile for U.S. counties was therefore needed so that it could be merged with the original survey data. It was found on the U.S. Census Bureau website and is based on 2010 U.S. Census population numbers and their distribution within each county. The study area for the project was the United States. Canada and Mexico were excluded (even though both countries were included in the workplace-based geographies) because the survey did not name specific regions within either country, which would have made population-weighted centroids for them unrealistic. Additionally, these records were not numerous enough to change the model significantly.
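To make the idea of a population-weighted centroid concrete, here is a small sketch with invented numbers. The project itself used the ready-made Census Bureau centroids; the two "tracts" below are hypothetical, and averaging longitude and latitude directly is only a planar approximation that works for small areas.

```r
# Two hypothetical tracts within one county (all values invented).
tracts <- data.frame(
  lon = c(-100, -90),
  lat = c(40, 42),
  pop = c(1000, 3000)
)

# Population-weighted centroid: each coordinate is averaged with the
# tract populations as weights, pulling the point toward where people live.
centroid <- c(
  lon = weighted.mean(tracts$lon, w = tracts$pop),
  lat = weighted.mean(tracts$lat, w = tracts$pop)
)

# For comparison, the unweighted mean ignores population entirely.
unweighted <- c(lon = mean(tracts$lon), lat = mean(tracts$lat))
```

Here the weighted centroid lands at (-92.5, 41.5), closer to the more populous tract than the unweighted midpoint at (-95, 41).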

Process

In the first step, data were loaded and reformatted in R. R can be downloaded from https://www.r-project.org/ and, although the analysis can be conducted in R directly, it is much easier to use RStudio, which provides a user-friendly graphical interface (RStudio can be downloaded from https://www.rstudio.com/ ). The RStudio interface and a snippet of code are displayed in Figure 1 below.

Figure 1: RStudio interface and a snippet of the project code

Next, the two datasets, the original commuting survey and the population-weighted centroids, were joined on county name and code. The unified file was then subset to exclude Canada and Mexico, and some column fields were renamed to make the origin and destination coordinates easier to read. In the next step, ggplot2 was used to set continuous position scales for the x and y axes and then to plot line segments, with transparency controlled by the alpha setting. The number of trips to plot was varied experimentally, showing either all trips or only flows with more than 5, 10, 15, 20, 25, or 50 trips. Showing all trips resulted in too dense a plot, as the entire United States was the study area; for a smaller study area mapped at a larger scale, showing all trips would be acceptable. The best results came from filtering to flows of more than 10 intra- and inter-county journey-to-work trips, which produced the plot displayed in Figure 2.
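The join, filter, and plot steps can be sketched roughly as below. This is a toy reconstruction under my own assumptions, not the project script: the four flows, the three centroids, the column names, the alpha value, and the dark theme are all invented for illustration.

```r
library(ggplot2)

# Hypothetical commuting flows between three counties (values invented).
flows <- data.frame(
  origin = c("A", "A", "B", "C"),
  dest   = c("B", "C", "C", "A"),
  trips  = c(120, 8, 45, 3)
)

# Hypothetical population-weighted centroids for those counties.
centroids <- data.frame(
  code = c("A", "B", "C"),
  lon  = c(-74.0, -87.6, -118.2),
  lat  = c(40.7, 41.9, 34.1)
)

# Join the centroid coordinates to each flow, first for the origin and
# then for the destination, renaming the columns so they are easy to read.
od <- merge(flows, setNames(centroids, c("origin", "o_lon", "o_lat")), by = "origin")
od <- merge(od, setNames(centroids, c("dest", "d_lon", "d_lat")), by = "dest")

# Keep only flows with more than 10 trips to avoid an overly dense plot.
od_filtered <- od[od$trips > 10, ]

# Draw one semi-transparent segment per origin-destination pair.
p <- ggplot(od_filtered) +
  geom_segment(aes(x = o_lon, y = o_lat, xend = d_lon, yend = d_lat),
               alpha = 0.3, colour = "white") +
  scale_x_continuous(name = "Longitude") +
  scale_y_continuous(name = "Latitude") +
  theme(panel.background = element_rect(fill = "black"),
        panel.grid = element_blank())

# In an interactive session, print(p) draws the plot.
```

With many thousands of flows, a low alpha lets overlapping segments build up brightness where commuting is concentrated, which is what produces the night-lights effect described later.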

Figure 2: U.S. origin-destination plot in RStudio

The final map was then graphically improved in Adobe Creative Suite, resulting in the image in Figure 3.

Figure 3: Final mapping project after graphical improvements

Map

The final design, showing thousands of commuting trips, resembled a NASA image of the United States from space at night. It indicated some predictable commuting patterns, such as a higher concentration of journey-to-work lines in large urban centres and in areas with high population densities, such as the northeastern part of the country. However, some patterns were not so obvious and required further digging, first into data accuracy (which passed the test) and then into the way the original survey was designed. For instance, there are lines from Honolulu, Anchorage, and Puerto Rico to the mainland even though the survey was designed to represent daily commuting flows by car, truck, or van, public transport, and other means of commuting. The survey asked all workers about commuting to their primary and secondary jobs during the reference week in which it was conducted. These unusual results were attributable to people who worked during the reference week at a location different from their home (or usual place of work), such as people away from home on business. The place-of-work data therefore showed some interesting geographic patterns, with workers appearing to make daily work trips to different parts of the country (e.g., workers who lived in New York and worked in California).

The final mapping product was printed and framed on a 24” x 36” canvas, as shown in Figure 4. The size was chosen for its 2-to-3 aspect ratio, which seemed best suited to the horizontal width and vertical extent of the United States. Other options would be to print on acrylic or aluminum, which are less cost-effective and more time-consuming (most shops require around 10 days to complete such a print). However, canvas was my preferred choice for this project because of the aesthetic I was aiming for: accentuated high-commuting areas and dimmed low-commuting areas. Another pleasant surprise was that the finished print looked more like a painting than a transportation data visualization project.

Figure 4: Printed map on canvas
