Visualizing New York City yellow cabs and their origin-destination over time

Fana Gidey
SA8905 – Cartography and Geovisualization
Fall 2019
@Ryersongeo

Background

Taxi networks can uncover how people move within neighbourhoods and detect distinct communities, cost of housing and other socio-economic features.  New York City is famous for its yellow cabs and diverse neighbourhoods providing a good study area. This project will look at trip records for Yellow Cab taxi’s in order to visualize New York residents travel patterns over-time.

On New Year’s Eve, New York taxi riders are expected to make their way to see the ball drop, watch the fireworks from the east of Hudson River, Battery Park, and Coney Island. Lastly movement from the outer boroughs into Manhattan and Brooklyn for entertainment is expected. 

Marketers, policy makers, urban planners and real estate industry can leverage this spatial data to predict activity and features of human society.

Technology

The technology used for visualization is Kepler.gl. Kepler.gl is an open-source geospatial data analysis tool. I picked the tool because it can visualize paths over time with time-series and animations that can communicate a very powerful data narrative. Previous examples were flight and refugee movement data. Kepler has drag and drop options to highly skilled scripting.

Step 1: Gathering the Data

The data was obtained NYC Open Data Portal – Transportation – City of New York. Here you can obtain Yellow, Green Cab and For-Hire Vehicle trip records. Initially I wanted to compare trip records between two years (i.e. 2009 and 2016) however this data set is so robust (131,165,043 records). I decided to narrow down and focused on yellow cabs and only a single date that may have lots of taxi activity (January 1st 2016). The columns in the data set include: VendorID, Pickup and Drop-off Date Timestamp, pickup and drop off latitude and longitudes, trip distance, payment type, payment amount, tax, toll amount, and total amount.

Step 2: Cleaning the Data

It is imperative to know how data needs to be structured when drawing paths over time using origin-destination data. In order to create a path over time map, the data source should include the following types of information:

  • The Latitude and Longitude coordinates for each trip data point in a path
  • A column that defines the order to connect the points (in my case I used the date timestamp information, or you can manually applied surrogate key is also acceptable (i.e. 1, 2, 3, 4, 5)
  • The source data has a sufficient amount of data points to create lines from points

Before

The data was then cleaned and prepped for use in excel. The fields were formatted to currency (2 decimal spaces $) date (m/d/yyyy h:mm:ss) and null values were removed. A trip duration field was calculated and obsolete data is removed. The csv now has 345,038 records.

After

Step 3: Create Visualization

Now that that the data is cleaned and prepped for use it can be implemented to an interactive visualization software. As soon as you navigate to kepler.gl and select ‘Get Started’. You will be prompted to add your data (i.e. csv, json, and geojson).

Once your data is loaded, you can start with the “Layer” options. The software was able to pick up the pick-up and drop-off latitude and longitude. The pick-up and drop-off are represented as point features, you can use the drop-down menu to select lines, arcs, etc.

The origin-destination points are now represented by arcs. In order to animate the feature, a field must be selected to sort by.

In the filters tab, you can choose a field to sort by (i.e. “pick_up datetime”).

You can edit the map style by select the “Base Map” tab.

Other customization features are highlighted below.

Final Results

https://fanagidey.github.io/