Sunday, June 22, 2014

Interesting taxi rides dataset

I got the following from my collaborator Zach Nation: a NY taxi ride dataset that was not properly anonymized and has since been reverse engineered to find interesting insights in the data.

For sport, I used GraphLab Create to load and analyze this dataset. I started with an image of some NY taxis:

Using GraphLab Create, I was able to reverse engineer the anonymization and query the data by medallion number (for example, 8J77 for the lower left taxi in the image).
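
To give a feel for why weak anonymization is easy to undo, here is a minimal sketch in plain Python (not the GraphLab Create notebook). It assumes the medallion column was hashed with unsalted MD5, as was widely reported for this dataset, and it only covers medallions shaped like 8J77 (digit, letter, digit, digit). The helper names `md5_hex` and `build_medallion_lookup` are made up for illustration.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(text):
    """Return the lowercase MD5 hex digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def build_medallion_lookup():
    """Map MD5 digest -> plaintext medallion for every digit-letter-digit-digit pattern.

    Assumption: medallions were hashed with unsalted MD5. With so few
    possible values, hashing all of them up front is cheap.
    """
    lookup = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        medallion = d1 + letter + d2 + d3
        lookup[md5_hex(medallion)] = medallion
    return lookup


lookup = build_medallion_lookup()

# Stand-in for a value read from the hashed medallion column of the dataset.
hashed_value = md5_hex("8J77")

# Normalize case before the lookup, since the file may store hex in uppercase.
print(len(lookup), lookup[hashed_value.lower()])
```

The whole table has only 10 x 26 x 10 x 10 entries, so every hashed medallion of this shape can be mapped back to its plaintext with a dictionary lookup.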

I was further able to dig into personal details based on the medallion number:
And finally, I could ask questions like: how much money did the taxis in the image make in a certain week?
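
A question like this boils down to a filter plus a group-by sum. The sketch below uses plain Python instead of GraphLab Create. The rows are made-up sample data, not values from the real dataset, and the column names (`medallion`, `pickup_datetime`, `total_amount`) as well as the medallion 5A11 are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows; the real dataset's columns and values may differ.
rides = [
    {"medallion": "8J77", "pickup_datetime": "2013-03-04 08:15:00", "total_amount": 12.50},
    {"medallion": "8J77", "pickup_datetime": "2013-03-06 19:40:00", "total_amount": 30.25},
    {"medallion": "8J77", "pickup_datetime": "2013-03-12 10:05:00", "total_amount": 9.00},
    {"medallion": "5A11", "pickup_datetime": "2013-03-05 14:30:00", "total_amount": 20.00},
]


def weekly_revenue(rides, medallions, week_start, week_end):
    """Sum total_amount per medallion for pickups in [week_start, week_end)."""
    start = datetime.strptime(week_start, "%Y-%m-%d")
    end = datetime.strptime(week_end, "%Y-%m-%d")
    totals = defaultdict(float)
    for ride in rides:
        pickup = datetime.strptime(ride["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        # Keep only the taxis of interest and only rides inside the week.
        if ride["medallion"] in medallions and start <= pickup < end:
            totals[ride["medallion"]] += ride["total_amount"]
    return dict(totals)


totals = weekly_revenue(rides, {"8J77"}, "2013-03-04", "2013-03-11")
print(totals)
```

Passing several medallions in the set gives one total per taxi, which is the shape of answer needed for "the taxis in the image".
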
Anyone who wants to try it out is welcome to email me, and I can send you the IPython notebook to play with.
