Visualizing New York City yellow cabs and their origin-destination over time

Fana Gidey
SA8905 – Cartography and Geovisualization
Fall 2019
@Ryersongeo

Background

Taxi networks can reveal how people move between neighbourhoods and can point to distinct communities, housing costs and other socio-economic features. New York City, famous for its yellow cabs and diverse neighbourhoods, provides a good study area. This project looks at Yellow Cab trip records in order to visualize New York residents' travel patterns over time.

On New Year’s Eve, New York taxi riders are expected to make their way to see the ball drop and to watch the fireworks from the east side of the Hudson River, Battery Park, and Coney Island. Movement from the outer boroughs into Manhattan and Brooklyn for entertainment is also expected.

Marketers, policy makers, urban planners and the real estate industry can leverage this spatial data to predict activity and features of human society.

Technology

The technology used for the visualization is Kepler.gl, an open-source geospatial data analysis tool. I picked it because it can visualize paths over time with time-series animations that communicate a very powerful data narrative; previous examples include flight and refugee movement data. Kepler.gl supports everything from drag-and-drop workflows to highly skilled scripting.

Step 1: Gathering the Data

The data was obtained from the NYC Open Data Portal – Transportation – City of New York, which provides Yellow Cab, Green Cab and For-Hire Vehicle trip records. Initially I wanted to compare trip records between two years (i.e. 2009 and 2016), however the data set is very large (131,165,043 records), so I decided to narrow the scope to yellow cabs on a single date likely to have lots of taxi activity (January 1st 2016). The columns in the data set include: VendorID, pickup and drop-off date timestamps, pickup and drop-off latitudes and longitudes, trip distance, payment type, payment amount, tax, toll amount, and total amount.

Step 2: Cleaning the Data

It is imperative to know how the data needs to be structured when drawing paths over time from origin-destination data. To create a path-over-time map, the data source should include the following (a small sample layout follows the list):

  • The latitude and longitude coordinates for each trip data point in a path
  • A column that defines the order in which to connect the points (I used the date timestamp; a manually applied surrogate key (i.e. 1, 2, 3, 4, 5) is also acceptable)
  • A sufficient number of data points to build lines from the points
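For illustration, a handful of rows in the required shape might look like the sample below. The column names follow the public TLC yellow-cab schema and the values are made up, so adjust both to match your own export.

tpep_pickup_datetime, pickup_latitude, pickup_longitude, tpep_dropoff_datetime, dropoff_latitude, dropoff_longitude
2016-01-01 00:12:22, 40.7508, -73.9939, 2016-01-01 00:29:14, 40.7127, -74.0059
2016-01-01 00:15:03, 40.7580, -73.9855, 2016-01-01 00:41:37, 40.6782, -73.9442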

Before: the raw trip records

The data was then cleaned and prepped for use in Excel. The fields were formatted to currency (two decimal places, $) and date (m/d/yyyy h:mm:ss), and null values were removed. A trip duration field was calculated and obsolete data was removed. The resulting csv has 345,038 records.

After: the cleaned trip records
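The cleaning above was done in Excel; for anyone who prefers to script it, a rough pandas equivalent might look like this (the file and column names are assumed from the public TLC yellow-cab schema and may differ from your download):

import pandas as pd

# Load the raw yellow-cab trip records and parse the timestamp columns
df = pd.read_csv("yellow_tripdata_2016-01.csv",
                 parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"])

# Keep only trips that started on January 1st, 2016
df = df[df["tpep_pickup_datetime"].dt.date == pd.Timestamp("2016-01-01").date()]

# Drop records with missing or zero coordinates
coord_cols = ["pickup_latitude", "pickup_longitude",
              "dropoff_latitude", "dropoff_longitude"]
df = df.dropna(subset=coord_cols)
df = df[(df[coord_cols] != 0).all(axis=1)]

# Round the fare field to two decimals and compute trip duration in minutes
df["total_amount"] = df["total_amount"].round(2)
df["trip_duration_min"] = (
    (df["tpep_dropoff_datetime"] - df["tpep_pickup_datetime"]).dt.total_seconds() / 60
)

df.to_csv("yellow_tripdata_2016-01-01_clean.csv", index=False)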

Step 3: Create Visualization

Now that the data is cleaned and prepped, it can be loaded into the interactive visualization software. Navigate to kepler.gl and select ‘Get Started’; you will be prompted to add your data (i.e. csv, json, or geojson).

Once your data is loaded, you can start with the “Layer” options. The software automatically picked up the pick-up and drop-off latitudes and longitudes. The pick-up and drop-off locations are first represented as point features; you can use the drop-down menu to select lines, arcs, etc.

The origin-destination points are now represented by arcs. In order to animate the feature, a field must be selected to sort by.

In the filters tab, you can choose a field to sort by (i.e. “pick_up datetime”).

You can edit the map style by selecting the “Base Map” tab.

Other customization features are highlighted below.

Final Results

https://fanagidey.github.io/

Create a Quick Web Map with Kepler.gl and Jupyter Notebook

Author: Jeremy Singh

SA8903

GeoVisualization Project Fall 2019

Background: This tutorial uses any csv file with latitude and longitude columns in order to plot points on a web map. Make sure your csv file is saved in the same folder as this notebook (it makes things easier).

I recommend downloading the Anaconda Distribution, which comes with Jupyter Notebook.

There are three main Python libraries used in this tutorial:

  1. Pandas: a Python library used for data analysis and manipulation.
  2. Kepler.gl: a FREE, open-source, web-based application capable of handling large-scale geospatial data to create beautiful visualizations.
  3. GeoPandas: essentially an extension of Pandas that is fully capable of handling and processing geospatial data.

The first step is to navigate, from the main directory shown when Jupyter Notebook is launched, to the folder where you want this notebook to be saved. Then click ‘New’ -> Python 3; a tab will open up with your notebook (see image below).

Next, it is important to install these libraries from the terminal to ensure that this tutorial works and everything runs smoothly.

For more information on jupyter notebook see: https://jupyter.org/

Navigate back to the directory and open a terminal prompt via the ‘New’ tab.

A new tab will open up; it functions very similarly to the command prompt on Windows. Next, type “pip install pandas keplergl geopandas” (do not include the quotes). This installs the three libraries.
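The notebook code itself appears in the post’s screenshots; below is a minimal sketch of what those cells typically contain, assuming a file called data.csv with columns named latitude and longitude (the file name and column names are placeholders):

import pandas as pd
import geopandas as gpd
from keplergl import KeplerGl

# Read the csv (saved in the same folder as the notebook)
df = pd.read_csv("data.csv")

# Optional: convert to a GeoDataFrame so GeoPandas can handle the geometry
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# Create an empty Kepler.gl map and add the data as a layer
web_map = KeplerGl(height=600)
web_map.add_data(data=df, name="points")

# Display the interactive map inline in the notebook
web_map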

Below you can see what my data looks like on the map before styling:

With some styling options applied:

Kepler.gl also allows for 3D visualizations. Here is my final map:

Lastly, if you wish to save your web map as an HTML file to host somewhere like GitHub or AWS, this command will do that for you:
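The command itself is shown in a screenshot; in the keplergl library it is the save_to_html method, which writes the current map (data plus configuration) to a standalone HTML file, for example:

# Export the map (data + current styling) to a standalone HTML file
web_map.save_to_html(file_name="index.html")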

Link to my live web map here:

https://jeremysingh21.github.io/

The code and data I used for this tutorial are on my GitHub page, located here:

https://github.com/jeremysingh21/GeoVizJeremySingh

Time-series Animation of Power Centre Growth in the Greater Toronto Area for the Last 25 Years

By: Jennifer Nhieu
Geovisualization Class Project @RyersonGeo, SA8905, Fall 2018

Introduction:

In 1996, there were 29 power centres with 239 retail tenants accounting for just under five million square feet of retail space (Webber and Hernandez, 2018). 22 years later, in 2018, there are 125 power centres with 2,847 retail tenants accounting for 30 million more square feet of retail space (Webber and Hernandez, 2018). In addition, power centres expand in an incremental manner, either through the purchase and integration of adjoining parcels or the conversion of existing parking space into new stores (Webber and Hernandez, 2018). This development process often leads to retail centres becoming “major clusters of commercial activity that significantly exceed the original approved square footage total” (Webber and Hernandez, 2018, pg. 3).

Data and Technology:

To visualize this widespread growth of power centres from 1996 to 2017, a time-series animation map was created in Kepler.gl (beta version) using power centre growth data provided by the Centre for the Study of Commercial Activity (CSCA) at Ryerson University, which undertakes an annual field-survey-based inventory of retail activity in the Greater Toronto Area. Kepler.gl was created by Uber’s visualization team and released to the public in the summer of 2018. It is an “open source, high-performance web-based application for visual exploration of large-scale geolocation data sets. Kepler.gl can render millions of points representing thousands of trips and perform spatial aggregations on the fly”, and it is partnered with Mapbox, a “location data platform for mobile and web applications that provides open-source location features” (Uber Technologies Inc., 2018; Mapbox, 2018).

Methodology:

The data provided by the CSCA includes the shopping centre’s name, a unique identification code for each power centre, the longitude and latitude coordinates of each power centre, and its square footage from the year it was built to 2017. The data had to be restructured in Microsoft Excel to include a pseudo date and time column, covering the years 1992 to 2017 at 1-hour time intervals, to allow Kepler.gl to create an animation based on date and time. The table below is an example of the data structure required to create a time animation in Kepler.gl.

Table 1: Data structure example*
*The data in this table has been modified due to confidentiality reasons.
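The restructuring described above was done in Excel; a rough pandas equivalent is sketched below, assuming the CSCA export has one square-footage column per year. The file name and the SQFT_ column naming are placeholders (only COORDX, COORDY and SQ FT match the field names used in the steps that follow):

import pandas as pd

# Wide table: one row per power centre, one square-footage column per year
wide = pd.read_csv("power_centres.csv")  # e.g. ID, NAME, COORDX, COORDY, SQFT_1996, ..., SQFT_2017

# Reshape to long format: one row per power centre per year
long = wide.melt(
    id_vars=["ID", "NAME", "COORDX", "COORDY"],
    var_name="YEAR", value_name="SQ_FT",
)
long["YEAR"] = long["YEAR"].str.replace("SQFT_", "").astype(int)

# Pseudo date/time column for Kepler.gl's time filter; the original post also
# spaced records at 1-hour intervals, which could be added as an offset here
long["DATETIME"] = pd.to_datetime(long["YEAR"].astype(str) + "-01-01")

long.to_csv("power_centres_long.csv", index=False)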

Below is the time-series animation set-up used to visualize power centre growth in Kepler.gl. This process can be replicated to produce other time-series animations:

  1. Visit https://kepler.gl/#/
  2. Click Get Started.
  3. Drag and drop .csv file onto application.
  4. Hold and drag to navigate to the Toronto GTA.
  5. On the contents bar, click + Add Layer on the Layers tab.
  6. Under Basic Layer Type, click Select A Type, then select Point.
  7. Under Columns, Lat* = COORDY and Lng* = COORDX.
  8. Under Color, click Color Based On, then select SQ FT.
  9. Under Color, click the color scheme bar, select a preferred light to dark colour scheme.
  10. Under Color, Color Scale, select quantize.
  11. Under Color, Opacity, set to 4.
  12. Under Radius, Radius Based on, click Select a field, then select the square footage field (SQ FT).
  13. Under Radius, Radius Range, set the range from 1 to 60.
  14. On the contents bar, click + Add Filter on the Filters tab.
  15. Click Select a field, then select the pseudo date and time field.
  16. On the time slider, drag the rightmost square notch toward the left square notch so that only a narrow window (about two bars) is highlighted.
  17. Press the play button to start the animation.

Notes:

  • The speed of the animation can be adjusted.
  • The legend can be shown by clicking the bottom circular button located in the top right corner of the screen.
  • Hover your mouse over a point to see the metadata of the selected power centre.

Figure 1: Power Centre Growth in the Toronto GTA (1992 – 2017)

Limitations:

During the implementation process, it became apparent that Kepler.gl focuses more on graphics and visuals than on cartographic standards. The program does not allow the user to manually adjust class ranges on the legend, nor does it accurately display continuous data. The proportional symbols used to represent power centre growth flash or blink rather than growing gradually. There was an attempt to correct this problem by duplicating the values in the date and time column and adding additional pseudo date and time values between each year. However, when tested, the animation exhibited the same flashing and blinking behaviour, so the problem appears to lie in the programming of Kepler.gl rather than in the data itself. Furthermore, duplicating these values would push the file over the maximum file size on Chrome (250 MB) and limit performance on Safari, the two web browsers Kepler.gl runs on.

Conclusion:

Regardless of these limitations, Kepler.gl is still in its early beta version, and it has a lot of potential to incorporate user feedback from industry professionals and undergo additional testing before the final release.

References:

Webber, S. and Hernandez, T. (2018). Retail Development and Planning Policy. Centre for the Study of Commercial Activity, Ryerson University. Toronto, CA.
Uber Technologies Inc. (2018). Kepler.gl. Retrieved from http://kepler.gl/#/
Mapbox. (2018). About Mapbox. Retrieved from https://www.mapbox.com/about/