United States – Master of Spatial Analysis

Interactive Map and Border Travels

Given the chance to build a geovisualisation, I set out to bring in data at a scope that needs adjustment and interaction to understand the geography in ever greater detail, while still letting the journey begin with an overview and a general understanding of the topic at hand.

This blog post doesn’t unveil a hidden gem on the theme of border crossing, but demonstrates how an interactive map can share the insights a user might seek without being limited to the publisher’s extents or to printed information. Border crossing was selected as the topic in order to observe the navigation choices made at borders. The map gives the user a point of view similar to that of the people crossing at these points, by letting them look at the crossing options and consider their preferences.

Giving the user this perspective meant first locating and providing the crossing points. The crossings selected were those along the US borders with Canada and with Mexico, a scope that lets the viewer engage with the map and see detail, instead of limiting this surface transportation data to a single scale and extent determined by the creator rather than the user.

Border crossings are largely determined by geography and are best understood on a map rather than in any other data representation. Attributes such as sales data, by contrast, may still be suitable in an aspatial form, for example projected sales levels shown as a line graph.

To get specific, the data came from the U.S. Bureau of Transportation Statistics and was cleaned to cover the beginning of January 2010 until the end of September 2020. The data was geocoded with multiple providers, and results were selected based on their consistency. Some crossings were listed in the data, but their locations could not be identified.
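As a rough illustration of the cleaning step, the date window can be expressed as a simple filter. This is a minimal Python sketch, not the workflow actually used for the map; the record layout and the crossing counts are made up for the example.

```python
from datetime import date

# Hypothetical monthly records: (port name, state, first day of month, crossings).
# The counts are invented for illustration only.
records = [
    ("Sweetgrass", "Montana", date(2009, 12, 1), 100),
    ("Sweetgrass", "Montana", date(2010, 1, 1), 120),
    ("Calexico East", "California", date(2020, 9, 1), 300),
    ("Calexico East", "California", date(2020, 10, 1), 310),
]

START = date(2010, 1, 1)   # beginning of January 2010
END = date(2020, 9, 30)    # end of September 2020


def in_study_period(record):
    """Return True when the record's month falls inside the study window."""
    return START <= record[2] <= END


kept = [r for r in records if in_study_period(r)]
```

Here the December 2009 and October 2020 rows are dropped, and the two rows inside the window remain.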

Seal of the U.S. Bureau of Transportation Statistics

To start allowing any insights for you, the viewer, the first data set added to the map is the border crossing locations. These are points, and they begin to show the distribution of crossing opportunities between the North American countries. If a point could not be placed at the location of the particular office that processed the border entries, the record was assigned to the city in which the office was located. A base layer was imported from Mapbox to display the background map information.

The range of border crossings is represented by shifts in colour gradient and symbol size. With all the points and their proportions plotted, patterns begin to emerge from the attached border attributes. These illustrate increases and decreases in entries, such as the points in California being larger than those in Montana.

But is there a measure of how visited the state itself is, rather than each entry point? Yes, there is. In addition to the crossing points themselves, the states they belong to have also been measured. Each state with a crossing is shaded on the map with a gradient for the average crossings the state experienced. We knew that California had entry points with more crossings than the points shown in Montana, but now we can compare the states themselves and see that California altogether still experienced more crossings at the border than Montana, despite having fewer border entry points.
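The step from entry points to states is a grouping operation. The Python sketch below shows one way to summarise ports by state; the crossing values are invented, and reporting both a total and a mean per state is an assumption, since the post does not say exactly how the state value was calculated.

```python
from collections import defaultdict

# Hypothetical per-port totals: (state, port, crossings). Values are made up.
port_totals = [
    ("California", "Calexico East", 900),
    ("California", "San Ysidro", 1500),
    ("Montana", "Sweetgrass", 400),
    ("Montana", "Roosville", 100),
    ("Montana", "Piegan", 100),
]


def summarise_by_state(rows):
    """Group port totals by state and report port count, total and mean."""
    grouped = defaultdict(list)
    for state, _port, crossings in rows:
        grouped[state].append(crossings)
    return {
        state: {
            "ports": len(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for state, values in grouped.items()
    }


summary = summarise_by_state(port_totals)
```

With these toy numbers, California has fewer ports than Montana but a larger total, which mirrors the comparison described above.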

Could there be a way to milk just a bit more of this basic information? Yes. This is where the map begins to benefit from being interactive.

Each point and each state can be hovered over to show its calculated values, clarifying how much more or less one case had compared to another. Two states may have a similar gradient, and two entry points may appear the same size, but hovering over them shows which place each belongs to, as well as its specific crossing value. Montana is one of the states with the most crossing points, and these entries experience similar crossing frequencies. By hovering over the points we can discover that Sweetgrass is the most popular point along the Montana border.

In fact, this is how we discover another dimension of the data. Hovering over these cases shows a list of transport modes that make up the total crossings: trucks, trains, automobiles, buses, and pedestrians.

Discovering that more data is available should simply mean there is more to learn, and stating the transport numbers without visuals would not share an engaging spatial understanding. With these five extra aspects of the border crossings available, the map can be made to display the distribution of each particular mode.
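Switching the map from total crossings to a single mode amounts to picking a different column for the symbol size. A small Python sketch of that idea follows; the counts and the field names are assumptions for illustration, not values from the Bureau of Transportation Statistics data.

```python
# Hypothetical counts per port and mode; field names and values are invented.
MODES = ("trucks", "trains", "personal_vehicles", "buses", "pedestrians")

ports = {
    "Skagway": {"trucks": 5, "trains": 80, "personal_vehicles": 20,
                "buses": 10, "pedestrians": 5},
    "Calexico": {"trucks": 0, "trains": 0, "personal_vehicles": 300,
                 "buses": 5, "pedestrians": 250},
    "Calexico East": {"trucks": 200, "trains": 10, "personal_vehicles": 150,
                      "buses": 2, "pedestrians": 1},
}


def total_crossings(counts):
    """Sum the five modes into the single total shown on the overview map."""
    return sum(counts[mode] for mode in MODES)


def busiest_port(all_ports, mode):
    """Return the port name with the highest count for one transport mode."""
    return max(all_ports, key=lambda name: all_ports[name][mode])
```

For example, `busiest_port(ports, "trains")` picks a different port than `busiest_port(ports, "pedestrians")`, which is the kind of shift the mode selector makes visible.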

Although the points in Alaska are typically among the least entered of all the border crossings, selecting entries by train draws attention to Skagway, Alaska, as one of the most used points for crossing into the US, even though it is not connected to the contiguous states. The mapped display gives a strong visual impression that the large number of entries at Skagway is related to the crossings at Blaine, Washington, which is likely the train connection between Alaska and the continental USA.

When mapping truck crossing levels (above), crossings are made to the east, past the small city of Calexico. Calexico East has a road connection between the two boundaries that runs in a single direction, suggesting that little interaction is intended along the way.

When mapping pedestrian crossings (above), these are much more popular in Calexico, an area that is likely big and dense enough to support the operation of the airport shown in its region, and that displays an interweaving network of roads associated with everyday use.

Overall, this is where interactive mapping applies. The borders and their entry points have relationships largely influenced by geography. The total pedestrian or personal vehicle crossings do well to describe how attractive the region may be on one side compared with the other. Where these locations become attractive, and even the underlying causes for choosing a crossing, can be discovered in a map that is interactive for the user, looking at the areas the user chooses.

While this theme data layered on top highlights the topic, the base map can help explain the reasons behind it, and both are better understood when interactive. It isn’t necessary to answer one particular thought here as a static map may do, but instead to help address a number of speculative thoughts, enabling your exploration.

United States Presidential Election Results: 1976-2016 in Tableau

By: Vincent Cuevas

Geovisualization Project Assignment, SA8905, Fall 2020

Project link can be found here.

Introduction

The United States presidential elections occur every four years and much attention is placed on the polarization of US politics based on voting for either of the major political parties, the Democratic Party and the Republican Party. This project aims to use visualization to show the results across many different elections over time to view how the American public is voting for these two parties.

Methodology and Data

Tableau was used for the data visualization due to its ability to integrate multiple data sheets and recognize spatial data to instantly create maps. It is also able to quickly generate different types of visualizations in cartographic maps, bar charts, line graphs, etc.

Data was collected from the Presidency Project website hosted by the University of California, Santa Barbara (UCSB). The repository contains election data going all the way back to 1789. This visualization goes back to 1976 and shows results up to 2016. Other data sources were considered, namely MIT’s Election Lab dataset for 1976-2016. However, that dataset contained results for up to 66 different parties that votes were cast for from 1976 to 2016. Incorporating this level of detail would have produced inconsistent data fields across the different election years. Other political parties are omitted from this project due to the inconsistency of party entrants by year and the fact that Democrats and Republicans take up the vast majority of the national vote. The Presidency Project data was used because it provided simpler views of Democrat-Republican results.

Data Retrieval

The downside to using UCSB’s Presidency Project data is that it is not available as a clean data file!

*Screenshot taken from UCSB’s Presidency Project webpage for 1976 US Election Results.*

The data was collected from each individual data page into an Excel sheet. One small piece of data that was collected elsewhere was the national voter turnout data, which was taken from the United States Election Project website.

Voting Margin Choropleth Map

Once the data was formatted, only two sheets needed to be imported into Tableau. The first was the state-level results, and the second was the national-level results. The relationship between the two is held together by a join on the state fields.

Tableau has a nice feature in that it instantly converts recognizable data fields into spatial data. In this case, the state field generates latitude and longitude points for each state. Drag the auto-generated Latitude and Longitude fields into Columns and Rows, and then drag state under Marks to get this.

*Screenshot of adding Tableau generated longitude and latitude data to a Tableau worksheet.*

One of the main sheets is a choropleth map showing voting margin differences between the Democratic Party and the Republican Party. Polygon shapes are needed, which can be obtained by going to the drop-down menu in Marks and selecting Map. Next, the sheet needs to distinguish states that were won by Democrats from those won by Republicans. A variable ‘PartyWin’ was created for this and dragged under Marks, and colours were changed to represent each party.

The final step requires creating ranges based on the data. Ranges cannot be created manually; they require some programming logic, the use of bins, or both. Bins were created by right-clicking the variable ‘VictoryMargin (%)’. The size of each bin is a pre-determined interval (20 was chosen). VictoryMargin (%) was dragged under Marks in order to get a red/blue separation on top of the colours from Party Win. The colours were edited under VictoryMargin to get appropriate lighter/darker hues for each colour. The specific bins were also labelled based on 20-point intervals.
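For readers who want to see the logic behind the ‘PartyWin’ and ‘VictoryMargin (%)’ fields outside Tableau, here is a minimal Python sketch. The function names are hypothetical, the bins are assumed to be half-open intervals starting at multiples of the bin size, and ties are not handled specially.

```python
def party_win(dem_pct, rep_pct):
    """Label the winning party from the two vote shares (ties fall to Republican here)."""
    return "Democrat" if dem_pct > rep_pct else "Republican"


def victory_margin(dem_pct, rep_pct):
    """Absolute difference between the two vote shares, in percentage points."""
    return round(abs(dem_pct - rep_pct), 1)


def margin_bin(margin, size=20):
    """Return the label of the bin [lower, lower + size) that contains the margin."""
    lower = int(margin // size) * size
    return f"{lower}-{lower + size}"
```

With a bin size of 20, a margin of 1.2 lands in the "0-20" bin and a margin of 86.8 lands in "80-100", so the two ends of the colour ramp correspond to close and lopsided states.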

*Screenshot of Party Win, Victory Margin choropleth map in Tableau Worksheet view*

The screenshot shows that you can hover over the states and retrieve information on Party Win, the percentage of Democrat and Republican votes that year, as well as the Victory Margin. The top-left corner also has Year in the Pages area, which allows for a time-series view for each page that contains Year.

Vote Size Dot Symbol Map

While the margin of victory in each state illustrates the degree to which the state voted Democrat or Republican, the total numbers of Democrat and Republican votes are not equal when comparing voting populations across different states. Florida, for example, had 9,420,039 total votes cast and a 1.2% victory margin for the Republicans in 2016. Contrast that with the District of Columbia in the same year, which had 311,268 total votes but an 86.8% victory margin for Democrats. For the next map, dot symbols are used to show the vote size (based on the variable Total State Votes) for each state.

The same longitude- and latitude-generated map from the choropleth map is used, only this time the dots and the surrounding OpenStreetMap basemap are kept intact. As with the choropleth map, Party Win is used to differentiate between Republican and Democrat states. The Total State Votes variable is dragged into the Size area under Marks to create different dot sizes based on the numbers here. Bins were created once again, this time with an interval break of 2.5 million votes per state. Ideally, there would be customized breaks, as many states fall into the lower end of total votes, such as the District of Columbia. Once the labelled bins were edited, additional information for State, Total Democrat Votes and Total Republican Votes was entered to view in the Tooltip.
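The effect of fixed 2.5 million breaks can be checked with a few lines of Python. This sketch uses only the two 2016 totals quoted earlier in the post; the helper name and the half-open bin convention are assumptions.

```python
from collections import Counter

BIN_SIZE = 2_500_000  # interval break described in the text


def bin_lower_bound(total_votes, size=BIN_SIZE):
    """Return the lower edge of the half-open bin [lower, lower + size)."""
    return (total_votes // size) * size


# The two 2016 totals quoted in the post.
total_state_votes = {"Florida": 9_420_039, "District of Columbia": 311_268}

bins = {state: bin_lower_bound(votes) for state, votes in total_state_votes.items()}
states_per_bin = Counter(bins.values())
```

Running the same count over all states would show how many share the lowest bin, which is the reason customized breaks would be preferable.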

*Screenshot of Dot Symbol map based on Number of State Votes in Tableau Worksheet view*

Electoral College Seats Bar

American politics has the phrase “270 to Win”, based on needing 270 electoral seats (as of 2020) to win the presidency. As recently as 2016, the Democratic candidate Hillary Clinton won the popular vote over the Republican candidate Donald Trump. However, Trump won the majority of electoral seats, and with them the presidency, by winning the vote in states with a greater total number of seats.

A bar showing the number of electoral seats won can highlight the difference from the popular vote, and that a greater margin of victory in a state matters less than a greater number of seats won. To create this bar, the same setup is used, with Party Win and State under Marks. This time, a SUM of the number of seats is dragged to Columns. The Marks drop-down is then changed to Bar.
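The gap between popular vote and electoral seats can be reproduced with a toy tally. The Python sketch below uses three invented states and assumes every state awards all of its seats to the statewide winner, which simplifies the real rules in a few states.

```python
SEATS_TO_WIN = 270

# Hypothetical states: (name, Democrat votes, Republican votes, electoral seats).
states = [
    ("State A", 900, 100, 10),
    ("State B", 450, 550, 12),
    ("State C", 480, 520, 9),
]


def tally(rows):
    """Sum the popular vote per party and award each state's seats to its winner."""
    votes = {"Democrat": 0, "Republican": 0}
    seats = {"Democrat": 0, "Republican": 0}
    for _name, dem, rep, n_seats in rows:
        votes["Democrat"] += dem
        votes["Republican"] += rep
        winner = "Democrat" if dem > rep else "Republican"
        seats[winner] += n_seats
    return votes, seats


def has_won(seat_count):
    """True when a party holds enough seats for the presidency."""
    return seat_count >= SEATS_TO_WIN


votes, seats = tally(states)
```

In this toy case one party leads the popular vote on the strength of a single lopsided state, while the other collects more seats from two narrow wins.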

*Screenshot of Electoral College Seats bar graph in Tableau worksheet.*

Dashboard and Nationwide Data Points

Since this data will go into a dashboard, there is a need to think about how these visualizations complement each other. The maps provide data at the level of individual states. The dynamic bar shows the results of each state, though it is better at informing the viewer of the number of seats won by each party and how many more seats the winner took. The dynamic bar needs some context though, specifically the total number of seats won nationwide. This logically led to placing the maps at the middle/bottom and moving the electoral college bar to the top, while also providing some key indicators for the overall election results.

The key data points included were the party names, party candidates, percentage of popular vote, total number of party votes, total number of electoral seats, as well as an indicator of whether the Democratic or Republican Party won. Secondary stats were included for the Other Party Vote (%), Total Number of Votes Casted, and Voter Turnout (%). Individual worksheets were created for each stat and imported into the dashboard. Space was also used to include Alaska and Hawaii. While the main maps are dynamic in Tableau and allow for panning, having an initial view of these states limits the need for the user to find them. All of the imported data had ‘Year’ dragged into the Pages area of the worksheet, allowing for a time-series view of all of the data points.

You can see what the time series from 1976 to 2016 looks like in a gif animation via this Google Drive link.

Insights

When looking at the results starting from 1976, an interesting point is that many Southern states that are Republican in 2016 were Democratic (in large part because the Democratic candidate, Jimmy Carter, had been governor of Georgia). 1980 to 1984 was the Ronald Reagan era, when the former Californian governor was immensely popular throughout the country. Bill Clinton’s wins in 1992 and 1996 followed in Carter’s footsteps, with the Arkansas governor able to win seats in typically Republican states. Starting with George W. Bush’s win in 2000, voting trends have stayed very similar, with Republican states in the Midwest and Southern regions, while Democrats take the votes in the Northeast and on the Pacific Coast. Many states around the Great Lakes, such as Wisconsin, Michigan and Pennsylvania, have traditionally been known as “swing states”, and Donald Trump won many of those states in 2016. When it comes to number of votes by state, two states with larger populations (California, New York) have typically been Democratic in recent years, leading to a large number of total votes for Democrats. However, the importance of total votes is minimized compared to the number of electoral seats gained.

Future Considerations and Limitations

With the Democrats taking back many of those swing states in the most recent election, inputting the 2020 election data would highlight where Democrats were successful in 2020 vs. in 2016. Another consideration would be to add the results since 1854, when the Republican Party was first formed as the major opposition to the Democratic Party.

Two data limitations within Tableau are the use of percentages, and the lack of projections. Tableau can show data in percentages, but only as a default if it is part of a Row % or Column % total. The data file was structured in a way where this was not possible, meaning that whole numbers were used with (%) labelled wherever necessary. Tableau also is not able to project in a geographic coordinate system without necessary conversions. For the purposes of this map, the default Web Mercator layout was used. One previous iteration of this map was also done as a cartogram hex map. However, a hex map may be better in a static map as the sizing and zooming is much more forgiving when using the default basemap.

A Shot in the Dark: Analyzing Mass Shootings in the United States, 2014-2019

By: Miranda Ramnarayan

Geovis Project Assignment @RyersonGeo, SA8905, Fall 2019

The data gathered for this project was downloaded from the Gun Violence Archive (https://www.gunviolencearchive.org/), which is a not-for-profit corporation. The other dataset is the political affiliation per state, gathered by scraping this information from https://www.usa.gov/election-results. Since both of these datasets contain a “State Name” column, an inner join is used to allow the two datasets to “talk” to each other.

The first step is importing your Excel files and setting up that inner join.
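An inner join keeps only the rows whose key appears in both tables. The Python sketch below shows the idea on two tiny made-up tables that share a “State Name” column; the state names, the “Party” column and the values are placeholders, and Tableau performs the equivalent step through its data source pane rather than through code.

```python
# Hypothetical rows; only the shared "State Name" column matters for the join.
shootings = [
    {"State Name": "State A", "# Killed": 3, "# Injured": 4},
    {"State Name": "State B", "# Killed": 0, "# Injured": 5},
    {"State Name": "State Z", "# Killed": 1, "# Injured": 3},
]
affiliation = [
    {"State Name": "State A", "Party": "Democrat"},
    {"State Name": "State B", "Party": "Republican"},
]


def inner_join(left, right, key):
    """Combine rows from both tables that share the same value for key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(shootings, affiliation, "State Name")
```

The row for "State Z" has no match in the affiliation table, so it does not appear in the joined result.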

There are four main components this dashboard is made of: States with Mass Shootings, States with Highest Death Count, Total Individuals Injured from Mass Shootings and a scattergram displaying the amount of individuals injured and killed. All of these components were created in Tableau Worksheets and then combined on a Dashboard upon completion. The following are steps on how to re-create each Worksheet.

1. States with Mass Shootings

In order to create a map in Tableau, very basic geographic information is needed. In this case, drag and drop the “State” attribute under the “Dimensions” column into the empty frame. This will be the result:

In order to change the symbology from dots to polygons, select “Map” under the Marks section.

To assign the states with their correct political affiliation, simply drag and drop the associated year you want into the “Colour” box under Marks.

This map displays the states that have had mass shootings within them, from 2014 to 2019. In order to automate this, simply drag and drop the “Incident Date” attribute under Pages. The custom date page has been set to “Month / Year” since the data set is so large.

This map is now complete and when you press the play button displayed in the right side of this window, the map will change as it only displays states that have mass shootings within them for that month and year.

2. States with Highest Death Count

This is an automated chart that shows the Democratic state and the Republican state with the highest number of individuals killed in mass shootings, as the map with mass shootings above it runs through its time series. Dragging and dropping “State” into the Text box under Marks will display all the states within the data set. Dragging and dropping the desired year into Colour under Marks will assign each state its political party.

In order for this worksheet to display the state with the highest kill count, the following calculations have to be made once you drag and drop the “# Killed” from Measures into Marks.

To link this count to each state, filter “State” to only display the one that has the maximum count for those killed.

This will automatically place “State” under Filters.

Drag and drop “Incident Date” into Pages and set the filter to Month / Year, matching the format from section 1.
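The filter described in this section boils down to "for each month and party, keep the state with the largest sum of # Killed". Below is a Python sketch of that logic with invented incidents; it is an illustration of the calculation, not the Tableau formula itself.

```python
from collections import defaultdict
from datetime import date

# Hypothetical incidents: (incident date, state, party, number killed).
incidents = [
    (date(2019, 8, 3), "State A", "Republican", 9),
    (date(2019, 8, 4), "State B", "Republican", 5),
    (date(2019, 8, 20), "State B", "Republican", 6),
    (date(2019, 8, 10), "State C", "Democrat", 4),
    (date(2019, 9, 1), "State A", "Republican", 2),
]


def deadliest_state_by_month(rows):
    """Return {(year, month, party): (state, killed)} for the top state per page."""
    totals = defaultdict(int)
    for day, state, party, killed in rows:
        totals[(day.year, day.month, party, state)] += killed
    best = {}
    for (year, month, party, state), killed in totals.items():
        page = (year, month, party)
        if page not in best or killed > best[page][1]:
            best[page] = (state, killed)
    return best


best = deadliest_state_by_month(incidents)
```

Each key corresponds to one Month / Year page and one party, matching what the worksheet shows as the Pages control advances.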

Format your title and font size. The result will look like:

3. Total Individuals Injured from Mass Shootings

In terms of behind the scenes editing, this graph is the easiest to replicate.

Making sure that “State Name” is above “2016” in this frame is very important, since this is telling Tableau to display each state individually in the bar graph, per year.

4. Scattergram

This graph displays the number of individuals killed and injured per month / year. This graph is linked to section 1 and section 2, since the “Incident Date” under Pages is set to the same format. Dragging and dropping “SUM (#Killed)” into Rows and “SUM (#Injured)” into Columns will set the structure for the graph.

In order for the dot to display the sum of individuals killed and injured, drag and drop “# Killed” into Filter and the following prompt will appear. Select “Sum” and repeat this process for “# Injured”.
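Each dot in the scattergram is a pair of monthly sums. A short Python sketch of that aggregation, using invented incidents, is given here for readers who want to verify the numbers outside Tableau.

```python
from collections import defaultdict
from datetime import date

# Hypothetical incidents: (incident date, number killed, number injured).
incidents = [
    (date(2019, 8, 3), 9, 20),
    (date(2019, 8, 20), 6, 4),
    (date(2019, 9, 1), 2, 5),
]


def monthly_sums(rows):
    """Return {(year, month): (total killed, total injured)}."""
    sums = defaultdict(lambda: [0, 0])
    for day, killed, injured in rows:
        bucket = sums[(day.year, day.month)]
        bucket[0] += killed
        bucket[1] += injured
    return {month: tuple(pair) for month, pair in sums.items()}


dots = monthly_sums(incidents)
```

Each value is one dot: the first number is plotted on the Rows axis (killed) and the second on the Columns axis (injured).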

Drag and drop “Incident Date” and format the date to match Section 1 and 2. This will be your output.

Dashboard Assembly

This is where Tableau allows you to customize as much as you want. Launching a new Dashboard frame will allow you to drag and drop your worksheets into the frame. Borders, images and text boxes can be added at this point. From here, you can re-arrange, resize and adjust your inserted worksheets until the formatting is to your liking.

Right clicking on the map on the dashboard and selecting “Highlight” will enable an interactive feature on the dashboard. In this case, users will be able to select a state of interest, and it will highlight that state across all worksheets on your dashboard. This will also highlight the selected state on the map, “muting” other states and only displaying that state when it fits the requirements based on the calculations set up prior.

Since the Pages were all set to “Month/Year”, once you press “play” on the States with Mass Shootings map, the rest of the dashboard will adjust to display the filtered information.

It should be noted that Tableau does not allow the user to change the projection of any maps produced, resulting in a lack of projection customization. The final dashboard looks like this:

Transportation Flow Mapping Using R

The geographic visualization of data using programming languages, and specifically R, has seen a substantial upsurge in adoption and popularity among members of the GIS and data analytics community in recent years. While the learning curve for scripting techniques might be steeper than for more traditional, out-of-the-box GIS applications, scripting undoubtedly provides other benefits, such as building customizable processes and handling complex spatial analysis operations. The latter point is imperative for projects containing extensive amounts of data, as is often the case with transportation and commuting flows, which ordinarily contain a considerable number of records comprising trip origins and destinations, mode of transport and travel time information. An added perk is that R offers very creative and visually appealing graphical output, which was one of the motivators behind the choice of technique for this project. The primary motivator, however, was the program’s capacity for transportation data modelling and mapping, as the aim of the project was to map commuting flows.

Story of R

R is an open source software environment and language for statistical computing and graphics. It is highly extensible, which makes it particularly useful to researchers from varied academic and professional fields (ranging from social science, biology and engineering to the finance and energy sectors, and many other fields in between). It is also one of the most rapidly growing software programs in the world, most likely due to the expansion of data science. In the context of Geographic Information Systems (GIS), it can be described as a powerful command-line system comprising a range of tailored packages, each offering different and additional components for handling and analyzing spatial data. The ones used in the project were ggplot2 and maptools, and to a lesser extent plyr. The first two are among the most common in the R geospatial community, while others encountered in research and worth exploring further were leaflet and mapview for interactive maps; shiny for web applications; and ggmap, sp and sf for general GIS capabilities. Because R is open source software, its community is very helpful in organizing and locating necessary information. One neat option is the readily available cheat sheets for many of the packages (e.g. the ggplot cheat sheet), which make finding information genuinely fast.

There are some stunning examples of data visualization in R. One that made a significant media splash a few years ago was done by Paul Butler, a mathematics student at the University of Toronto at the time, who plotted social media friendship connections (according to one author, it drew admiration as well as disbelief from many that this was done with fewer than 150 lines of code in an “old dusty” statistical software such as R). It also inspired further data visualization explorations using R. One of my favorite recent works of this kind is the compelling book London – The Information Capital by geographer James Cheshire and designer Oliver Uberti. The majority of the examples in the book were written not only in R but specifically in its ggplot package, in combination with graphic design applications, and they serve as innovative illustrations of data visualization approaches and of what the software can provide. Both of the aforementioned projects inspired mine.

Transportation Mapping and Modelling

I would like to give some background on the type of analysis that was conducted. A common type of analysis in transportation geography, transportation planning and transportation engineering is the geographic analysis of transport systems using origin-destination data, which shows how many people travel (or could potentially travel) between places. This also reflects the basic unit of analysis in most transport models, the trip (a single-purpose journey from an origin “A” to a destination “B”, not to be mistaken with Timothy Leary’s definition). Trips are often grouped by transport mode or number of people travelling, and are represented as desire lines connecting zone centroids (desire lines are the straight, shortest possible lines between origin and destination points, and can be converted to routes). They do not necessarily need to represent just the movement of people and can show commodity flows and retail trade as well. TransCAD software is often used as the industry standard for this type of modelling. It is, however, quite costly and used solely by transportation planning firms and agencies. On the other hand, R is starting to see dedicated transportation planning packages, and its GIS packages are increasingly used in the transportation field. And most importantly: it’s free.

Data

The dataset used for the project was the 2009-2013 5-Year American Community Survey Commuting Flows, located via the Inter-University Consortium for Political and Social Research. It is a survey of the entire United States focusing on the journeys to work of people aged 16 and over. Data in the original survey was tabulated by several categories: means of transportation to work, private vehicle occupancy, time leaving home to go to work, travel and aggregated travel time to work, etc. For the purposes of the project, all workers in commuting flows were selected (grouped together across all transportation modes). The trips were based on intra- and inter-county commutes.

There are two main components needed when mapping transportation flows in general: the coordinates of the place of origin and the coordinates of the place of destination. Common practice in the transportation planning field is to use population-weighted centroids for origins and destinations, regardless of the geographic unit of analysis, which in this case was U.S. counties. Therefore a population-weighted centroid shapefile for U.S. counties was needed so that it could be merged with the original survey data. It was located on the U.S. Census Bureau website and is based on 2010 U.S. Census population numbers and distributions per county. The study area for the project was the United States, and it excluded Canada and Mexico (even though both countries were included for workplace-based geographies), because specific regions of those countries were not identified, which would make calculating population-weighted centroids unrealistic. Additionally, these records were not numerous enough to significantly change the model.
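To make the idea of a population-weighted centroid concrete, here is a minimal Python sketch. It takes a plain weighted mean of longitude and latitude over hypothetical sub-areas of a county; that is an approximation chosen for illustration and not necessarily the method behind the Census Bureau file used in the project.

```python
def population_weighted_centroid(points):
    """Weighted mean of (lon, lat) pairs, using population as the weight.

    points is a list of (lon, lat, population) tuples for the sub-areas
    (for example census blocks) that make up one county.
    """
    total = sum(pop for _lon, _lat, pop in points)
    if total == 0:
        raise ValueError("total population must be greater than zero")
    lon = sum(x * pop for x, _lat, pop in points) / total
    lat = sum(y * pop for _lon, y, pop in points) / total
    return lon, lat


# Hypothetical county with two sub-areas; three quarters of the people live in the east.
county = [(0.0, 0.0, 100), (10.0, 0.0, 300)]
centroid = population_weighted_centroid(county)
```

The result sits closer to the more populated sub-area than the geometric midpoint would, which is why such centroids give more realistic origins and destinations for commuting lines.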

Process

In the first step, data was loaded and reformatted in R (R can be downloaded from https://www.r-project.org/; although analysis can be conducted in R directly, it is much easier to use RStudio, which provides a user-friendly graphical interface). The RStudio interface and a snippet of code are displayed in Figure 1 below (RStudio can be downloaded from https://www.rstudio.com/).

Figure 1: RStudio interface and snippet of code in the project

Following this, the two datasets, the original commuting survey and the population-weighted centroids, were joined based on county name and code. The unified file was then subset to exclude Canada and Mexico, and some column fields were renamed to make the origin and destination coordinates easier to read. In the next step, ggplot2 was used to set continuous position scales for the x and y axes, followed by plotting line segments with an alpha (transparency) setting. The number of trips to be plotted was experimented with, showing either all trips or only flows with more than 5, 10, 15, 20, 25 or 50 trips. Showing all trips resulted in too dense a plot, as all of the United States was used as the study area. If the study area were of a larger scale (a smaller region), showing all trips would be acceptable. The optimal results seemed to be when trips were filtered to show over 10 intra- and inter-county journey-to-work trips, which resulted in the plot displayed in Figure 2.
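The project code itself was written in R and is only shown as a screenshot, so the following is a language-neutral illustration in Python of the same data steps: look up centroid coordinates for each origin and destination, keep flows above a trip threshold, and give each line segment a transparency value. The county codes, coordinates, trip counts and the choice to scale alpha by the busiest flow are all assumptions for this sketch.

```python
# Hypothetical county centroids: code -> (lon, lat).
centroids = {
    "001": (-100.0, 40.0),
    "002": (-99.0, 41.0),
    "003": (-98.0, 39.5),
}

# Hypothetical flows: (origin code, destination code, number of trips).
flows = [
    ("001", "002", 250),
    ("002", "001", 8),
    ("001", "003", 11),
    ("003", "999", 40),  # destination without a centroid, e.g. outside the study area
]


def build_segments(flow_rows, centroid_lookup, min_trips=10):
    """Return drawable segments for flows with more than min_trips trips."""
    kept = [
        (origin, dest, trips)
        for origin, dest, trips in flow_rows
        if trips > min_trips and origin in centroid_lookup and dest in centroid_lookup
    ]
    if not kept:
        return []
    busiest = max(trips for _o, _d, trips in kept)
    segments = []
    for origin, dest, trips in kept:
        x, y = centroid_lookup[origin]
        xend, yend = centroid_lookup[dest]
        segments.append({
            "x": x, "y": y, "xend": xend, "yend": yend,
            "trips": trips,
            "alpha": trips / busiest,  # busier flows are drawn more opaque
        })
    return segments


segments = build_segments(flows, centroids)
```

Raising `min_trips` thins the plot in the same way the 5, 10, 15, 20, 25 and 50 trip filters did in the experiments described above.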

Figure 2: U.S. origin-destination plot in RStudio (credit: Nebojsa Stulic)

The final map was then graphically improved in Adobe Creative Suite, resulting in the image in Figure 3.

Figure 3: Final mapping project after graphical improvements

Map

The final design showing thousands of commuting trips resembled a NASA image of the United States from space at night. It indicated some predictable commuting patterns, such as a higher concentration of journey-to-work lines in large urban centres and in areas with large population densities, such as the Northeast part of the country. However, some patterns were not so obvious and required further digging into data accuracy (which passed the test) and then into the way the original survey was designed. For instance, there are lines from Honolulu, Anchorage and Puerto Rico to the mainland, even though the survey was designed to represent daily commuting flows by car, truck or van, public transport, and other means of commuting. The survey was designed to ask all workers about their primary and secondary jobs and their way of commuting during the reference week in which it was conducted and answered. These uncommon results were attributable to people who worked during the reference week at a location that was different from their home (or usual place of work), such as people away from home on business. Therefore place-of-work data showed some interesting geographic patterns of workers who made work trips to different parts of the country (e.g., workers who lived in New York and worked in California).

The final mapping product was printed on a 24” x 36” canvas and framed, as shown in Figure 4. The size was chosen based on its aspect ratio of 2 to 3, which seemed best suited to represent the horizontal width and vertical length of the United States. Other options would be to print on acrylic or aluminum, which are less cost effective and more time consuming (most shops require around 10 days to complete them). However, the printed map on canvas was my preferred choice for this project based on the aesthetic I was aiming for, which was to have the appearance of accentuated high-commuting areas and dimmed low-commuting areas. Another pleasant surprise was that, once printing was finalized, it came across more as a painting than as a data visualization transportation project.

Figure 4: Printed map on canvas