Ever since Bloomberg created the open data policy requiring all city agencies to release their data sets by 12/31/2018, NYC has become one of the world’s most transparent cities when it comes to data. There are still two years before the deadline, but there are already more than 1,500 data sets available on the Open Data site. Not all data sets are created equal, though. Some are incomplete, some are unstructured, some are statistically insignificant, some are free, some expensive, inaccurate, awesome, mind-blowing, unnecessarily cool... you get the point. I’ve always wanted to write my own click-baity countdown article, so here are my 10 favorite public data sets and visualizations:
#10. WNYC’s Median Income Map
For real estate developers and brokers, median income can be one of a neighborhood’s most telling indicators of value. This map by WNYC visualizes 2012 median income data from the US Census Bureau, and you can hover over each block to get the figures. The map just reinforces John Oliver’s assertion that Port Authority is the worst place on earth. Nice job, Port Authority.
#9. Robert Manduca’s Where are the Jobs?
Now that we know how much everyone is making, we can actually see where they’re working by sector in Robert Manduca’s dot mapping of Employment in America in 2014. It also includes a second layer with a dot mapping of income. One dot = one job. Like the Median Income map, this also uses US Census Bureau data.
#8. Joey Cherdarchuk’s Breathing City
What’s even more awesome than seeing where each job is? Seeing people going to those jobs over time! Joey Cherdarchuk of Darkhorse Analytics animates Manhattanites commuting from work (red) to home (blue) over a typical 24-hour period in a seamless loop. Amazingly, he did most of it in Excel. Watch the animation and read how he did it here.
#7. Dustin Cable’s Racial Dot Map
Inspired by Eric Fischer, who has done numerous artistic map visualizations of data, Dustin Cable put together this incredible racial dot map at UVA’s Weldon Cooper Center. Using 2010 US Census data, the map has 308,745,538 colored dots, one for each person in the US, indicating racial distribution. And you can zoom in REALLY close.
#6. Todd Schneider’s Citi Bike Usage Animation
Citi Bike publishes all its ride data, including trip duration, origin, destination, gender, and age, and makes it available for download here. It comes in CSV format, and location data is in lat-lon format, so you can actually plot where people are going over time. Luckily for us, Todd W. Schneider already created an animated map of typical ridership throughout a 24-hour period. His code is up on GitHub here. It’s cool enough that the data is available. It’s even cooler that someone visualized it.
Bonus: If you liked this one, you’ll also enjoy this time lapse video of traffic using Waze data.
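If you’d rather poke at the trip CSV yourself, here’s a minimal sketch of the first step: reading the rows and counting how many rides start in each hour of the day. The column names (`starttime`, `start station latitude`, and so on) and the timestamp format are assumptions about the file header, and the three rows are made up for illustration, so check them against the file you actually download.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up rows; the column names are an assumption about the trip-file header.
SAMPLE_TRIPS = """tripduration,starttime,start station latitude,start station longitude,end station latitude,end station longitude
634,2016-01-01 08:05:12,40.7505,-73.9934,40.7411,-73.9897
1201,2016-01-01 08:40:03,40.7411,-73.9897,40.7295,-73.9965
455,2016-01-01 17:15:47,40.7295,-73.9965,40.7505,-73.9934
"""


def load_trips(csv_file):
    """Return one dict per trip with a parsed start time and origin/destination points."""
    trips = []
    for row in csv.DictReader(csv_file):
        trips.append({
            # Timestamp format is assumed; adjust the pattern to match the real file.
            "start": datetime.strptime(row["starttime"], "%Y-%m-%d %H:%M:%S"),
            "origin": (float(row["start station latitude"]),
                       float(row["start station longitude"])),
            "destination": (float(row["end station latitude"]),
                            float(row["end station longitude"])),
        })
    return trips


def trips_by_hour(trips):
    """Count trips by the hour of day in which they started."""
    return Counter(trip["start"].hour for trip in trips)


trips = load_trips(io.StringIO(SAMPLE_TRIPS))
hourly = trips_by_hour(trips)
for hour in sorted(hourly):
    print(f"{hour:02d}:00  {hourly[hour]} trip(s)")
```

To run it on a real download, swap `io.StringIO(SAMPLE_TRIPS)` for an opened file. The origin and destination pairs are what you would feed into a mapping tool to draw the rides.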
#5. Trulia Maps
While #10 through #6 are awesome at visualizing single data sets, it’s hard to ignore the sheer amount of data that Trulia provides, which is why it comes in at #5 on the list. In addition to sales and rental listings across the country, it has a comprehensive maps section. Trulia Maps provides visualizations across eight categories and forty metrics, and you can get detail down to the block in NYC.
#4. Zillow API
Coming in at #4 is Zillow (which happens to own Trulia). Zillow also serves as a hub for sales and rental listings, mortgage lenders, and real estate agents. But Zillow beats out Trulia because it publishes a lot of research data on sales and rental prices. Furthermore, it has several APIs available that spit out a variety of property specs in XML format. If you’re prospecting in a neighborhood you’re not familiar with, you can get a great start with Zillow’s neighborhood research and APIs.
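Since the property specs come back as XML, here’s a rough sketch of turning that kind of response into something you can work with in Python. The payload below is entirely hypothetical: the element names (`property`, `address`, `bedrooms`, `finishedSqFt`) are placeholders I made up for illustration, not Zillow’s actual schema, so map them to whatever the real response contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: these element names are placeholders, not Zillow's schema.
SAMPLE_XML = """
<response>
  <property>
    <address>123 Example St</address>
    <bedrooms>2</bedrooms>
    <finishedSqFt>950</finishedSqFt>
  </property>
  <property>
    <address>456 Sample Ave</address>
    <bedrooms>3</bedrooms>
    <finishedSqFt>1400</finishedSqFt>
  </property>
</response>
"""


def parse_properties(xml_text):
    """Flatten each <property> element into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    properties = []
    for node in root.iter("property"):
        properties.append({
            "address": node.findtext("address"),
            "bedrooms": int(node.findtext("bedrooms")),
            "sqft": int(node.findtext("finishedSqFt")),
        })
    return properties


properties = parse_properties(SAMPLE_XML)
for prop in properties:
    print(prop["address"], prop["bedrooms"], prop["sqft"])
```

Once the specs are in a list of dicts, writing them to CSV or loading them into a spreadsheet is straightforward.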
#3. Property Shark
Property Shark has a special place in my heart. A couple of years ago, I cobbled together a method to find potential development sites by using Photoshop to overlay screenshots of layers from Property Shark’s Map feature. Some of the more useful layers include Building Class, Year Built, Building Stories, Available Air Rights by Parcel, Lis Pendens, Inclusionary Housing Areas, and Recent Sales. The paid version gives you access to property-level reports with way more detail.
#2. StreetEasy
If you work in NYC real estate, I don’t need to tell you what’s on StreetEasy (also owned by Zillow). StreetEasy has a ridiculous amount of historical listing, sales, and rental data, and they publish quarterly reports on trends as well. If you’re looking to build a list of residential comps, this is the place to go. They even have a tool to generate comp reports. They experimented with an API for a while, but unfortunately took it down.
#1. NYC’s PLUTO
PLUTO (Primary Land Use Tax Lot Output), part of NYC’s Bytes of the Big Apple software suite, is the mother lode of NYC parcel data. No maps, no visualizations; just raw data. It’s available in multiple formats (my favorite of which is CSV), and for each parcel there are over 80 different data points. Slice it, dice it, plot it on ArcGIS or CartoDB... there are limitless possibilities with this data set, and it’s updated every few months. Want to find out how much garage space exists in C6-2 zoned buildings in Manhattan built before 1940? (170,922 sf) Ever wonder how much buildable sf is left in Manhattan? (750 million sf) My only qualm is that it’s not time-phased data (which would make it much larger).
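To give a feel for the slicing and dicing, here’s a minimal sketch of the garage-space question as a filter over the CSV. It assumes the columns are named `Borough`, `ZoneDist1`, `YearBuilt`, and `GarageArea` (check the data dictionary for your release, since naming can differ between versions) and that a `YearBuilt` of 0 means the year is unknown. The rows are made up, so the total has nothing to do with the real figure quoted above.

```python
import csv
import io

# Made-up parcels; column names are assumed to match the PLUTO CSV header.
SAMPLE_PLUTO = """Borough,ZoneDist1,YearBuilt,GarageArea
MN,C6-2,1925,12000
MN,C6-2,1931,0
MN,C6-2,1965,8000
MN,R8,1910,5000
BK,C6-2,1920,7000
MN,C6-2,0,3000
"""


def garage_area(csv_file, borough, zone, built_before):
    """Sum GarageArea (sf) for parcels in a borough and zoning district built before a year."""
    total = 0
    for row in csv.DictReader(csv_file):
        year = int(row["YearBuilt"])
        # Assumption: YearBuilt of 0 marks an unknown year, so those parcels are skipped.
        if (row["Borough"] == borough
                and row["ZoneDist1"] == zone
                and 0 < year < built_before):
            total += int(row["GarageArea"])
    return total


total = garage_area(io.StringIO(SAMPLE_PLUTO), "MN", "C6-2", 1940)
print(f"{total:,} sf")
```

Only the first two sample rows pass all three filters, so the sketch prints 12,000 sf. Pointing the same function at the full Manhattan file is just a matter of passing in an opened file instead of the sample string.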
That’s all, folks! Practice safe data science, and remember, correlation does not imply causation! Have a favorite data source or visualization? Share in the comments!