Data Visualization

Turning Citi Bike trips into a network graph, with a zoom function.

Final Version

I am fascinated by the MTA, so for our data visualization assignment, I started with the MTA’s developer resources. In addition to real-time train data, they publish a weekly accounting of turnstile traffic at all stations, sampled in 4-hour intervals, with each file covering one week. I downloaded the most recent week to check it out, and got to work writing a program to clean up the data. It was a mess.

My first step was writing a callback function to clean the data. I figured it’d be better to do this in Python, but this is a JavaScript class so I figured I’d go for it… The original data set was sampled down to the individual turnstile, and I wanted to simplify it so that it just had data on each station overall. Here was my original code:

let lastWeek;
let table;

function preload(){
lastWeek = loadTable("turnstile small test.csv", "csv", "header", makenewcsv, errorrr);
}

function makenewcsv(){
table = new p5.Table();

table.addColumn('entries'); // keep as cumulative?

for (var r = 1; r < lastWeek.getRowCount(); r++){
  let thisRow = lastWeek.getRow(r);
  let thisStation = thisRow.get("STATION");
  let thisDate = thisRow.get("DATE");
  let thisTime = thisRow.get("TIME");
  let thisEntries = thisRow.get("ENTRIES");
  let thisExits = thisRow.get("EXITS");

  // if the combo of station, date, and time exists in the new spreadsheet
  // add this rows info to the corresponding row
  // if that combo of all three does not exist,
  // add a new row
  if(table.findRow(thisStation)){ //at the very least this station should exist
    for (let n = 1; n < table.getRowCount(); n++){
      let myRow = table.getRow(r);
      // THIS WAS A BUG I LATER FIXED, all these variables should be 'myRow'. ...
      let myStation = thisRow.get("STATION");
      let myDate = thisRow.get("DATE");
      let myTime = thisRow.get("TIME");
      if( myStation == thisStation && thisDate == myDate && thisTime == myTime){
        let newEntries = myRow.getNum(n,"entries") + thisEntries;
        let newExits = myRow.getNum(n, "exits") + thisExits;
        myRow.setNum(n, 'entries',newEntries );
        myRow.setNum(n, 'exits', newExits);
        break;
      }
    }
  }
  else table.addRow(thisRow); //add a new row
}
saveTable(table, "cleanedUpTurnstile.csv")
}
This only partially accounted for my data (and, as I now know, had a lot of bugs), but whatever, I couldn't even get the initial table to load. The original file had about 65k rows, and I immediately ran into issues with p5 parsing the file into a table. I tried deleting rows, and still no luck. I did all kinds of silly things to try to get the initial table to just load. External files, local files, test files, preload, setup, callbacks: it didn't seem to matter.

I managed to get it working, but it was finicky. So I decided to avoid p5 in favor of a library called Papa Parse. The demo worked well, but it took some time to get my code sorted. I had to use jQuery to select my file (as shown in this example), and it worked! I had some trouble in my code getting the data cleaned up... and I was still hung up on the p5 table not working. I tried this example locally and it worked... I had previously tried dropping the callback and doing everything in setup, but that didn't work. But then I tried swapping my file into Allison's example and it worked!!! dakslfjadkslfjlads :table-flip:

So now it was time to figure out my code for cleaning the data. I once again ran into hurdles with p5 tables, so I switched back to Papa Parse, and YAY NOW I GOT MOST SH*T WORKING! Basically the library (with its header option turned on) turns your CSV into an array of objects, one per row, with the column headers as keys.

let table = [];

let csv;
function cleanData(results, file){
let stuff = results.data; // Papa Parse hands the parsed rows to the callback in results.data

for (var r = 0; r < stuff.length; r++) //iterate through the rows of our incoming data
{
  let thisRow = stuff[r];
  let thisStation = stuff[r].STATION;
  let thisDate = stuff[r].DATE;
  let thisTime = stuff[r].TIME;
  let thisEntries = parseInt(stuff[r].ENTRIES);
  //let thisExits = parseInt(stuff[r].EXITS);

  // if the combo of station, date, and time exists in the new spreadsheet
  // add this rows info to the corresponding row
  // if that combo of all three does not exist,
  // add a new row
  let addARow = true; //flag for whether or not we push the row.
  for (let n = 0; n < table.length; n++) //iterate through every row of our new spreadsheet (length-1 would skip the last one)
  {
    let myStation = table[n].STATION;
    let myDate = table[n].DATE;
    let myTime = table[n].TIME;
    let myEntries = parseInt(table[n].ENTRIES);
    //let myExits = parseInt(table[n].EXITS);
    if( myStation === thisStation && thisDate === myDate && thisTime === myTime) //if this row in the new spreadsheet is the same as the old one
    {
      //todo sum the entries and exits
      table[n].ENTRIES = myEntries + thisEntries;
      //table[n].EXITS = myExits + thisExits;
      addARow = false;
      break;
    }
  }
  if(addARow) //if we didn't update any existing rows
  {
    console.log("adding row");
    table.push(thisRow); //add a new row
  }
}
console.log(table);
csv = Papa.unparse(table);
console.log(csv);
}
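The nested loop above rescans the whole output table for every incoming row, which gets slow at tens of thousands of rows. A rough alternative, assuming the rows are plain objects with STATION, DATE, TIME and ENTRIES keys as described, is to keep the running totals in a Map keyed by the station/date/time combo. The name sumByStation and the sample rows are made up for illustration; this is a sketch, not the code from the original sketch.

```javascript
// Sketch: roll turnstile rows up to one row per station/date/time.
// Assumes each row is a plain object with STATION, DATE, TIME and ENTRIES
// keys (ENTRIES may still be a string, as it comes out of a CSV parser).
function sumByStation(rows) {
  const totals = new Map(); // "station|date|time" -> aggregated row

  for (const row of rows) {
    const key = [row.STATION, row.DATE, row.TIME].join("|");
    const entries = parseInt(row.ENTRIES, 10) || 0; // blanks count as 0

    const existing = totals.get(key);
    if (existing) {
      existing.ENTRIES += entries; // same combo seen before: add to it
    } else {
      totals.set(key, {
        STATION: row.STATION,
        DATE: row.DATE,
        TIME: row.TIME,
        ENTRIES: entries,
      });
    }
  }
  return Array.from(totals.values());
}

// Made-up rows: two turnstiles at station "A", one at station "B".
const sampleRows = [
  { STATION: "A", DATE: "day1", TIME: "04:00", ENTRIES: "100" },
  { STATION: "A", DATE: "day1", TIME: "04:00", ENTRIES: "250" },
  { STATION: "B", DATE: "day1", TIME: "04:00", ENTRIES: "70" },
];
console.log(sumByStation(sampleRows));
```

Each row is looked up once, so the work grows with the number of rows instead of with rows times output rows. The result is still an array of objects, so it can go straight into Papa.unparse.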
I had problems with the exits column, so I left out updating the exits count in cleanData... From here, I just copied the output from the console and had a somewhat cleaned-up CSV. I loaded it into a spreadsheet and started combing through the data to look for irregularities and do additional transformations. I wanted to change my entries from a cumulative count to the number of people entering in a given 4-hour window. The MTA documentation wasn't the best, and I still had some peculiarities in my data... I decided to move on.
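The cumulative-to-window conversion mentioned above comes down to subtracting each reading from the one after it. Here is a minimal sketch, assuming the readings belong to a single counter and are already sorted by time. The function name and the numbers are hypothetical, and the idea that a counter can drop back to a lower value is an assumption, not something taken from the MTA docs.

```javascript
// Sketch: turn cumulative counter readings into per-window counts.
// Assumes `readings` holds one counter's cumulative values in time order.
// A drop in the counter (assumed to be a reset or glitch) is reported as
// null instead of being guessed at.
function toWindowCounts(readings) {
  const counts = [];
  for (let i = 1; i < readings.length; i++) {
    const diff = readings[i] - readings[i - 1];
    counts.push(diff >= 0 ? diff : null);
  }
  return counts;
}

// Five made-up readings give four windows; the drop to 20 becomes null.
console.log(toWindowCounts([1000, 1040, 1100, 20, 75]));
// -> [ 40, 60, null, 55 ]
```

Running this on each turnstile before summing up to the station would probably make odd jumps easier to spot than differencing the station totals.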

I grabbed the Citi Bike trip data and started messing around with it. I wrote a function to remove rows with lat/longs of 0, and then began playing around with the visualization. For my first test, I drew a line from the start lat/lng of each trip to its end lat/lng. Then I messed around with using the length of the trip as a sort of "decay" to remove lines from the screen. Then I ended up using the decay for the stroke as well. Here's where my first test ended up.
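The two ideas in that paragraph, dropping rows with zeroed coordinates and fading lines with a "decay", can be sketched roughly like this. The field names (startLat, startLng, endLat, endLng) and both function names are hypothetical, and the linear fade is just one possible reading of the decay, not necessarily what the sketch actually did.

```javascript
// Sketch: keep only trips whose four coordinates are real, non-zero numbers.
// Field names are hypothetical; the real CSV headers may differ.
function hasValidCoords(trip) {
  return [trip.startLat, trip.startLng, trip.endLat, trip.endLng]
    .every((v) => Number.isFinite(v) && v !== 0);
}

// Sketch: linear fade. Full opacity (255) at age 0, 0 once the line's age
// reaches its lifetime. The lifetime could be derived from trip length.
function decayAlpha(ageFrames, lifetimeFrames) {
  if (lifetimeFrames <= 0) return 0;
  const remaining = 1 - ageFrames / lifetimeFrames;
  return Math.round(255 * Math.min(1, Math.max(0, remaining)));
}

// Made-up trips: the second one has a zeroed end point and gets dropped.
const trips = [
  { startLat: 40.72, startLng: -74.0, endLat: 40.75, endLng: -73.98 },
  { startLat: 40.72, startLng: -74.0, endLat: 0, endLng: 0 },
];
console.log(trips.filter(hasValidCoords).length);
```

In p5, the same alpha value could feed both the decision to stop drawing a line (alpha of 0) and the stroke color, which is roughly the "decay for the stroke" idea.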

Then I decided to create a fresh sketch and try to "zoom" in on the networks. I wrestled with this for a bit, but landed on a method of defining min/max lats and longs, and then regenerating the drawing. This worked fine, though slowly. Then I went to make a reset button and ran into so many problems. For some reason, I couldn't get my mouseReleased functions to stop triggering when I hit reset, even when I tried adding conditions to avoid it. Anyway, here's where I landed.
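The min/max approach can be sketched as a pair of mapping functions plus a way to turn a dragged rectangle into new bounds. Everything here is an assumption for illustration: the function names, the bounds object shape, and the idea that the zoom region comes from a mouse drag. It uses a plain linear mapping, which seems fine at city scale.

```javascript
// Sketch: map a lat/lng into pixel coordinates for the current bounds.
// `bounds` is a hypothetical { minLat, maxLat, minLng, maxLng } object.
function project(lat, lng, bounds, width, height) {
  const x = ((lng - bounds.minLng) / (bounds.maxLng - bounds.minLng)) * width;
  // Screen y grows downward while latitude grows upward, so flip it.
  const y = ((bounds.maxLat - lat) / (bounds.maxLat - bounds.minLat)) * height;
  return { x, y };
}

// Inverse of project: a pixel position back to lat/lng.
function unproject(x, y, bounds, width, height) {
  return {
    lat: bounds.maxLat - (y / height) * (bounds.maxLat - bounds.minLat),
    lng: bounds.minLng + (x / width) * (bounds.maxLng - bounds.minLng),
  };
}

// New bounds from a dragged rectangle (press point to release point).
// Very small drags return the old bounds to avoid a zero-size region.
function zoomBounds(x0, y0, x1, y1, bounds, width, height) {
  if (Math.abs(x1 - x0) < 2 || Math.abs(y1 - y0) < 2) return bounds;
  const a = unproject(x0, y0, bounds, width, height);
  const b = unproject(x1, y1, bounds, width, height);
  return {
    minLat: Math.min(a.lat, b.lat),
    maxLat: Math.max(a.lat, b.lat),
    minLng: Math.min(a.lng, b.lng),
    maxLng: Math.max(a.lng, b.lng),
  };
}

// Made-up bounds and a drag over the top-left quarter of a 100x100 canvas.
const startBounds = { minLat: 40, maxLat: 41, minLng: -74, maxLng: -73 };
console.log(zoomBounds(0, 0, 50, 50, startBounds, 100, 100));
```

After zooming, every trip gets redrawn by calling project with the new bounds, and a reset is just a matter of going back to the starting bounds object.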

Posted in ICM
