Putting BlogTO on the map (literally) – Tutorial

Kyle Larsen
SA8905 – Cartography and Geovisualization
Fall 2019

Instagram is a wealth of information, for better or worse. If you’ve posted to Instagram before and your profile is public, maybe even if it’s not, then your information is out there just waiting for someone, someone maybe like me, to scrape it and put it onto a map. You have officially been warned.

But I’m not here to preach privacy or procure your preciously posted personal pics. I’m here to scrape pictures from Instagram, take their coordinates, and put them onto a map in a grid layout over Toronto. My target for this example is a very public entity that thrives off exposure: the notorious BlogTO (maybe only notorious if you live in Toronto). BlogTO is a Toronto-based blog about the goings-on in the 6ix as well as Toronto life and culture, and they have an Instagram that is almost too perfect for this project – but more on that later. Before anything gets underway, a huge thank-you-very-much to John Naujoks and his Instagram scraping project, which created some of the framework for this one (go read his project here, and you can find all of my code here).

When scraping social media you can sometimes use an API to directly access the back end of a website; Twitter, for example, has an easily accessible API. Instagram’s API sits securely behind the brick wall that is Facebook, aka it’s hard to get access to. While it would be easier to scrape Twitter, we aren’t here because this is easy. Maybe it seems a little rebellious, but Instagram doesn’t want us scraping their data… so we’re going to scrape their data.

This will have to be done entirely through the front end, aka the same way that a normal person would access Instagram, but we’re going to do it with python and some fancy HTML stuff. To start you should have python installed (3.8 was used for this, but any version of python 3 should give you access to the appropriate libraries) as well as some form of GIS software for some of the mapping and geoprocessing. Alteryx would be a bonus but is not necessary.

We’re going to use a few python libraries for this:

  • urllib – for accessing and working with URLs and HTML
  • selenium – for scraping the web (make sure you have a browser driver installed, such as chromedriver)
  • pandas – for writing to some files

If you’ve never done scraping before, it is essentially writing code that opens a browser, does some stuff, takes some notes, and returns whatever notes you’ve asked it to take. But unlike a person, you can’t just tell python to go recognize specific text or features, which is where the python libraries and HTML stuff come in. The below code (thanks John) takes a specific Instagram user, returns as many post URLs as you want, and adds them to a list for your scraping pleasure. If you enable the browser head you can actually watch as python scrolls through the Instagram page, silently kicking ass and taking URLs. It’s important to use the time.sleep(x) function because otherwise Instagram might know what’s up and block your IP.
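
To give a rough idea of what that scrolling step looks like, here is a minimal sketch using selenium (assuming chromedriver is available on your PATH); the profile name, scroll count and sleep times are placeholders you would tune for your own scrape.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get("https://www.instagram.com/blogto/")
    time.sleep(5)  # let the page load

    post_urls = set()
    for _ in range(20):  # more scrolls = more posts; 20 is arbitrary
        # every post thumbnail is an <a> tag whose href contains "/p/"
        for link in driver.find_elements(By.TAG_NAME, "a"):
            href = link.get_attribute("href")
            if href and "/p/" in href:
                post_urls.add(href)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)  # be polite, or Instagram may block your IP

    driver.quit()
    print(f"Collected {len(post_urls)} post URLs")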

But what do I do with a list of URLs? Well, this is where you get into the scrappy parts of this project, the closest to criminal you can get without actually downloading a car. The essentials for this project are the image and the location, but this is where we need to get really crafty. Instagram is actually trying to hide the location information from you, at least if you’re scraping it: nowhere in a post are coordinates saved. Look at the below image. You may know where the Distillery District is, but python can’t just give you X and Y because it’s “south of Front and at that street where I once lost my wallet.”

If you click on the location name you might get a little more information but… alas, Instagram keeps the coordinates locked in as a .png, yielding us no information.

BUT! If you can scrape one website, why not another? If you can use Google Maps to get directions to “that sushi restaurant that isn’t the sketchy one near Bill’s place” then you might as well use it to get coordinates, and Google actually makes it pretty easy – those suckers.
(https://www.google.com/maps/place/Distillery+District,+Toronto,+ON/@43.6503055,-79.35958,16.75z/data=!4m5!3m4!1s0x89d4cb3dc701c609:0xc3e729dcdb566a16!8m2!3d43.6503055!4d-79.35958 )
I spy with my little eye some X and Y coordinates. The first set after the ‘@’ would usually be the lat/long of your IP address, which I’ve obviously hidden because privacy is important (that’s the takeaway from this project, right?). The second lat/long, which you can glean from the end of the URL, is the location of the place that you just googled. Now all that’s left is to put all of this information together and create the script below. Earlier I said that it’s difficult to tell python what to look for; what you need is the xpath, which you can copy from the HTML (right-click an element, inspect it, then right-click that HTML and you can copy the xpath for that specific element). For this project we’re going to need the xpath for both the image and the location. The steps are essentially as follows:

  • go to Instagram post
  • download the image
  • copy the location name
  • google the location
  • scrape the URL for the coordinates (see the sketch after this list)
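
As a rough sketch of that last step: google the location name, give Maps a moment to settle on a place URL, then pull the coordinates out with a regular expression. The URL pattern and the wait time are assumptions, so expect to tweak them.

    import re
    import time
    import urllib.parse
    from selenium import webdriver

    def google_coords(location_name, driver):
        """Google a place name and read lat/long out of the Maps URL."""
        query = urllib.parse.quote(location_name + " Toronto")
        driver.get("https://www.google.com/maps/search/" + query)
        time.sleep(5)  # give Maps time to redirect to the place URL
        # a place URL usually ends with ...!3d<lat>!4d<long>, the geocoded point
        match = re.search(r"!3d(-?\d+\.\d+)!4d(-?\d+\.\d+)", driver.current_url)
        if match:
            return float(match.group(1)), float(match.group(2))
        return None  # Maps didn't resolve to a single place

    driver = webdriver.Chrome()
    print(google_coords("Distillery District", driver))  # roughly (43.6503, -79.3596)
    driver.quit()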

There are some setbacks to this: not all posts are going to have a location, and not all pictures are pictures – some are videos. In order for a picture to qualify for full scraping it has to have a location and not be a video, and the bonus criterion: it must be in Toronto. Way back I said that BlogTO is great for this project; that’s because they love to geotag their posts (even if it is mostly “Toronto, Ontario”) and they love to post about Toronto, go figure. With these scripts you’ve built up a library of commands for scraping whatever Instagram account your heart desires (as long as it isn’t private – but if you want to write a script to log in to your own account then I guess you could scrape a private account that has accepted your follow request, you monster, how dare you).
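
To give an idea of what that per-post check might look like, here is a minimal sketch (again with selenium); the XPaths are placeholders, since Instagram changes its markup often, and the skip-if-no-geotag logic mirrors the criteria above.

    import time
    import urllib.request
    from selenium.webdriver.common.by import By

    def scrape_post(driver, post_url):
        """Return (location name, downloaded image filename), or None if no geotag."""
        driver.get(post_url)
        time.sleep(4)
        try:
            # geotagged posts link to an /explore/locations/ page in the header
            location = driver.find_element(
                By.XPATH, "//header//a[contains(@href, '/explore/locations/')]").text
        except Exception:
            return None  # no location tag, skip this post
        img_src = driver.find_element(By.XPATH, "//article//img").get_attribute("src")
        filename = post_url.rstrip("/").split("/")[-1] + ".jpg"
        urllib.request.urlretrieve(img_src, filename)  # download the picture
        return location, filename

    # usage: scrape_post(driver, "https://www.instagram.com/p/SHORTCODE/")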

With the pics downloaded and the latitudes longed, it is now time to construct the map. Unfortunately this is the most manual process, but there’s always the arcpy library if you want to try to automate it. I’ll outline my steps for creating the map, but feel free to go about it your own way.

  1. Create a grid of 2km squares over Toronto (I used the grid tool in Alteryx)
  2. Intersect all your pic-points with the grid and take the most recently posted pic as the dominant one for each grid square (see the sketch after this list)
  3. Mark each square with the image that is dominant in that square (I named my downloaded images as their URLs)
  4. Clip all dominant images to 1×1 size (I used Google Photos)
  5. Take a deep breath, maybe a sip of water
  6. Manually drag each dominant image into its square and pray that your processor can handle it; save your work frequently.
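
If you would rather script step 2 than do it by hand, here is a rough sketch using geopandas; the file and column names (“grid_id”, “post_date”, “image_url”) are assumptions about how the scraped points were saved, not the exact files used here.

    import geopandas as gpd

    grid = gpd.read_file("toronto_grid_2km.shp")      # grid polygons with a "grid_id"
    pics = gpd.read_file("blogto_points.geojson")     # one point per scraped post

    pics = pics.to_crs(grid.crs)                      # match coordinate systems
    # older geopandas versions use op="within" instead of predicate="within"
    joined = gpd.sjoin(pics, grid, how="inner", predicate="within")

    # sort newest-first, then keep the first (most recent) post in each grid square
    dominant = (joined.sort_values("post_date", ascending=False)
                      .groupby("grid_id")
                      .first()
                      .reset_index())

    dominant[["grid_id", "image_url"]].to_csv("dominant_images.csv", index=False)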

This last part was definitely the most in need of a more automated process, but after your hard work you may end up with a result that looks like the map below. Enjoy!

Lexical Distance and Linguistic Diversity in the Balkans: A Network Map

By Zeljko Bavcevic

Geovis Class Project @RyersonGeo, SA8905, 2018

  1. Introduction

The purpose of this series of posts is to serve as a record of my work on the SA8905 Geovisualization project. The broad aim of this project is to explore the complex relationship between language and geography, and how each serves as a mediating factor on the other. I have long been fascinated by how geography has an impact on language and, by proxy, on populations, borders and culture. Specifically, I was hoping to understand how changes in the traversability of landscapes would impact migrations of people and thus impact language.

Over the course of the research phase of this project, I realized that conducting the project on all of the languages of the world would require an amount of data collection and cleaning considerably beyond the temporal scope of a single class. As such, I narrowed my goal to examining only the languages of the Balkan Peninsula, and how they relate to one another in terms of linguistic diversity, lexical distance and speaker population.

  2. Research

The first stage in this process required finding data to operationalize my target variables. This proved more difficult than I had first expected for a number of reasons. Firstly, there is no single, globally agreed-upon set of variables for understanding language. Instead there are a number of competing variable sets, each maintained by different organizations (with very different incentives).

Eventually I narrowed my focus to three primary linguistic variables for visualization:

  1. Speaker Population: The number of individuals in the world estimated to speak a certain language. The value is largely based on a projection.

  2. Lexical Distance: A linguistic variable measuring the conceptual distance between languages by comparing them along a number of criteria such as common words, verb formations and other comparative measures.

  3. Linguistic Diversity: An index score that measures the different languages, dialects or variations spoken within the regions of the primary language.

  3. Data

The data for these variables are generated and maintained by two primary organizations, SIL International and UNESCO. SIL International compiles data on a number of relevant linguistic variables for sale to organizations. UNESCO, on the other hand, is an international non-profit organization. The two data sets are very different in their methodologies and as such cannot be combined or used in conjunction; for this reason, I elected to use only one of them. Although the UNESCO data was free, the format it was kept in would have required a laborious process of cleaning and transformation before it could easily be used in my model. As such, I reached out to SIL International for a quote and acquired the data I needed. It must be noted, for the purposes of transparency, that SIL is a Christian organization and there have been several concerns about its methodology and incentive structure. To assess the impact of these on my outcome, I did a brief comparison between a sub-sample of the UNESCO and SIL data sets and was satisfied that the differences were within acceptable parameters.

During my research, I had found a number of illustrations of lexical distance. Most often these would take the form of node or network charts depicting the different languages (Figure 1).

Figure 1: Lexical Distance Network

However, these were all static, non-spatial and often did not take into account other relevant linguistic variables such as linguistic diversity within a language class. As such, I wanted my own visualization to be dynamic, interactive, spatial and to contain other relevant linguistic variables. To this end I needed to find a technology or platform that would allow me sufficient customizability and interactivity. Inspired by the network or node maps I had consistently seen throughout my research phase, I knew that I wanted to build on this concept, adding a spatial and interactive component.

  4. Technology

I considered a number of options; the first and most obvious was coding a network map in Python and visualizing it with the Gephi platform. While pure coding would offer the most freedom and customizability, hosting the various tools I needed would prove very tedious and costly. As such, I set out in search of a hosted node or network analysis platform. After considerable research into a number of possible candidates, I opted for kumu.io.

Kumu was selected because it allowed me the freedom of coding most of my map to my specifications (on top of having a very user-friendly UI), while also hosting all of my data and tools natively. This reduced the technical “surface area” of the project, which reduces opportunities for code-breaking bugs and cross-platform communication errors. After paying the modest membership fee, I began adding my data to Kumu.

  5. Execution

The first stage of development was loading my SIL data into Kumu. This was made easy by Kumu’s data cleaning tool, which makes sure all the input data meets Kumu’s formatting requirements and even allows the user to dynamically change spreadsheet documents before upload.


Kumu’s Upload Wizard

After this was complete I created a bi-directional connection between each pair of languages (or elements, in network analysis parlance). This resulted in an ugly and incomprehensible visual bundle of connections. The next stage of the process was coding the various variable symbologies and interface options (adding a search, zoom and selection toolbar).
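
For anyone generating that pairwise connection sheet outside of Kumu’s interface, a rough pandas sketch is below; the language list is shortened and the “From”/“To”/“Type” column names are illustrative, so check Kumu’s import documentation for the exact format it expects.

    import itertools
    import pandas as pd

    languages = ["Serbian", "Croatian", "Bosnian", "Bulgarian", "Macedonian",
                 "Slovenian", "Albanian", "Greek", "Romanian", "Turkish"]

    # one row per unordered pair; "mutual" marks the connection as bi-directional
    connections = pd.DataFrame(
        [{"From": a, "To": b, "Type": "mutual"}
         for a, b in itertools.combinations(languages, 2)]
    )

    connections.to_csv("balkan_connections.csv", index=False)
    print(len(connections), "connections")  # 10 languages -> 45 pairs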


Kumu’s Advanced Code Based Editor

This was done using Kumu’s advanced coding editor, and I encountered no issues during this stage. However, when I attempted to add the polygons of the various countries of the Balkan Peninsula, the map visualization would simply vanish, and I was not able to troubleshoot this with any success. As such, I ultimately had to abandon the spatial component of the project due to time constraints. I was still very satisfied with the resulting output.


    The Final Output

  6. Challenges

A number of challenges were encountered during the course of this project. The primary issue was that the geographic overlay failed to load. My every attempt to fix this was unsuccessful, and ultimately this radically undermined completing the project as I had conceived it at the design stage. Nonetheless, I believe that the other elements of the project still satisfied the requirement of producing a novel and interesting geovisualization.

NHL Travel Web App

by Luke Johnson
Geovis Project Assignment @RyersonGeo, SA8905, Fall 2017

Context

I’ve been a Toronto Maple Leafs and enthusiastic hockey fan my whole life, and I’ve never been able to intersect my passion for the sport with my love of geography. As a geographer, I’ve been looking for ways to blend the two together for a few years now, and this geovis project finally provided me the opportunity! I’ve always been interested in the debate about how teams located on the west coast travel more than teams located centrally or on the east coast, and how they have a way tougher schedule because of the increased travel time. For this project, I decided to put that argument to rest and allow anybody to compare two teams for the 2016/2017 NHL season, visualize all the flights that they take throughout the year, view the accumulated number of kilometres traveled at any point during the season, and see the final tally. I thought this would be a neat way to show hockey fans the grueling schedule the players endure throughout the year, without the fan having to look at a boring table!

It all started with the mockup above. I had brainstormed and created a few different interfaces, but this is what I came up with to best illustrate travel routes and cumulative kilometres traveled throughout the year. The next step was deciding on what data to use and which technology would work best to put it all together!

Data

First of all, all NHL teams were compiled along with the name of their arena and the arena location (lat/long). Next, a pre-compiled csv of the NHL schedule was downloaded from Left Wing Lock, which saved me a lot of time not having to scrape the NHL website and compile the schedule myself. Believe it or not, that’s all the data I needed to figure out the travel route and kilometres traveled for each team!

Methods

All of the data mentioned above was put into a SQLite database with 3 tables: a Team table, an Arena table, and a Schedule table. The Arena table could be joined with the Team table to get information on which team plays at which arena and where that arena is located. The Team table can also be joined with the Schedule table to get information on which teams play on what day and the location of the arena where they are playing.
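
As a rough sketch of that layout using Python’s built-in sqlite3 (the column names here are my own guesses, not the app’s actual schema), the three tables and the joins look something like this:

    import sqlite3

    conn = sqlite3.connect("nhl_travel.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS team     (team_id INTEGER PRIMARY KEY, name TEXT, arena_id INTEGER);
    CREATE TABLE IF NOT EXISTS arena    (arena_id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
    CREATE TABLE IF NOT EXISTS schedule (game_id INTEGER PRIMARY KEY, game_date TEXT,
                                         home_team_id INTEGER, away_team_id INTEGER);
    """)

    # join Schedule -> home Team -> Arena to find where (and when) each game is played
    rows = conn.execute("""
        SELECT s.game_date, home.name, away.name, a.name, a.lat, a.lon
        FROM schedule s
        JOIN team home ON home.team_id = s.home_team_id
        JOIN team away ON away.team_id = s.away_team_id
        JOIN arena a   ON a.arena_id   = home.arena_id
        ORDER BY s.game_date;
    """).fetchall()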

Because I wanted to allow the user to select any unique combination of teams, it would have been very difficult to pre-process all of the unique combinations (435 unique combinations, to be exact). For this reason, I decided to build a very lightweight Application Programming Interface (API) that would act as a mediator between the database and the web application. APIs are a great resource for controlling how the data from the database is delivered, and they simplify the combination process. This API was programmed in Python using the Flask framework. The following screenshot shows a small excerpt from the Flask python code, where a resource is set up to allow the web application to query all of the arenas and get back a geojson which can be displayed on the map.
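
As a rough idea of what such a resource looks like, here is a minimal Flask sketch that returns every arena as a GeoJSON FeatureCollection; the route name, database file and columns are placeholders carried over from the hypothetical schema above, not the app’s actual code.

    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/arenas")
    def arenas():
        conn = sqlite3.connect("nhl_travel.db")
        rows = conn.execute("SELECT name, lat, lon FROM arena").fetchall()
        conn.close()
        features = [{
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON is [x, y]
            "properties": {"name": name},
        } for name, lat, lon in rows]
        return jsonify({"type": "FeatureCollection", "features": features})

    if __name__ == "__main__":
        app.run(debug=True)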

After the Flask python API was configured, it was time to build the front end code of the application! Mapbox was chosen as the web mapping tool for the front end, mainly because of its ease of use and the vast amount of sample code available online. For a smaller number of users, it’s completely free! To create the chart, I decided to use an open source charting library called Chart.js. It is 100% free and, again, has lots of examples online. Both the Mapbox map and the Chart.js chart were created using javascript and wrapped within HTML and CSS to create one main webpage.

To create the animation, the web application sends a request to the API to query the database for each team chosen to compare. The web application then loops through the schedule for each team, refreshing the page rapidly to make a seamless animation of the two airplanes moving. At the same time the distance between the two NHL arenas is calculated, and a running total is appended to the chart and refreshed after each game in the schedule. The following snippet of code shows how the team 1 drop-down menu is created.
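
As for the distance calculation mentioned above, the app does that math in JavaScript on the front end; the same great-circle (haversine) formula is sketched in Python here for illustration, with approximate arena coordinates.

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two lat/long points, in kilometres."""
        r = 6371.0  # mean Earth radius in km
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Toronto's arena to Edmonton's arena: roughly 2,700 km one way (coordinates approximate)
    print(round(haversine_km(43.6435, -79.3791, 53.5469, -113.4973)))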

Results

After everything was compiled, it was time to demo the web app! The video below shows a demo of the capability of the web application, comparing the Toronto Maple Leafs to the Edmonton Oilers, and visualizing their flights throughout the year, as well as their total kilometres traveled.

To get a more in depth understanding of how the web application was put together, please visit my Github page where you can download the code and build the application yourself!