Wrangling & mapping Philly open data in R

Replication of data science workflows is as critical in government as it is in any domain.

Many of Philadelphia’s vast open data resources are updated nightly or weekly, providing intelligence at high spatial and temporal resolution.

Typically, my students develop algorithms that are less replicable, beginning their workflow by downloading a snapshot of data from an open data site in CSV or shapefile form.

This semester, I have tried to give them tools to plug their code directly into open data APIs, which provide a timely feed of city administrative data. To do so, I’ve introduced R packages including ckanr, esri2sf, jsonlite and others.

I wanted to demonstrate a particular introductory workflow that I recently adopted for a series of data-driven neighborhood reports.

The goal was to create a very simple workflow in R that lets a user specify a PostgreSQL query; feed that query into one of several Philadelphia open data APIs, which are supported by Carto; wrangle the returned data into Simple Features; and then create a series of maps and plots.

I demonstrate the workflow briefly below – one that can be reused for many of the Philadelphia Open Data APIs. The talented folks at the Office of Open Data and Digital Transformation have created helpful vignettes for each API. For example, here is the page for the crime data.

Begin by loading the requisite libraries, and specifying a ‘map theme’ that we can use to customize our data visualizations.
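The original code for this step is not reproduced here. A minimal sketch of what it might look like follows, assuming ggplot2 is used for plotting. The name `mapTheme` and every styling choice inside it are illustrative assumptions, not the theme used in the neighborhood reports.

```r
# Packages assumed for the rest of the workflow: dplyr, sf, jsonlite, esri2sf.
# Only ggplot2 is needed to define the theme itself.
library(ggplot2)

# Hypothetical map theme: strips axes and gridlines so the geography stands out.
mapTheme <- function(base_size = 12) {
  theme(
    text = element_text(size = base_size, colour = "black"),
    plot.title = element_text(size = base_size + 2, colour = "black"),
    plot.subtitle = element_text(face = "italic"),
    axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid = element_blank(),
    panel.border = element_rect(colour = "black", fill = NA)
  )
}
```

Adding `+ mapTheme()` to any ggplot call then applies the same look to every map.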

The query below is written in PostgreSQL, not R. If copied and pasted into a browser, it returns raw crime data in JSON format, like so.

Note that we use the Police District (‘dc_dist’) here to query the data spatially. Grab the resulting URL and paste it into the ‘fromJSON’ command as below. Note that the browser has recoded the PostgreSQL into a URL that can be read into R.
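Rather than copying the encoded URL out of a browser, the same request can be assembled in R. This is a hedged sketch, not the original code: the Carto endpoint, the table name `incidents_part1_part2`, the field `dispatch_date_time`, and the helper names `build_query_url` and `fetch_rows` are all assumptions; only ‘dc_dist’ and ‘fromJSON’ come from the text above.

```r
# Base URL of the Carto SQL API for Philadelphia (assumed endpoint).
base_url <- "https://phl.carto.com/api/v2/sql?q="

# Build a percent-encoded request URL for one police district.
# Table and field names are assumptions about the crime incidents dataset.
build_query_url <- function(district, days_back = 30) {
  sql <- paste0(
    "SELECT * FROM incidents_part1_part2 ",
    "WHERE dc_dist = '", district, "' ",
    "AND dispatch_date_time >= current_date - ", days_back
  )
  paste0(base_url, utils::URLencode(sql, reserved = TRUE))
}

# Download the records; the response is assumed to store them under `rows`.
fetch_rows <- function(url) {
  jsonlite::fromJSON(url)$rows
}

crime_url <- build_query_url("18")
# crimes_raw <- fetch_rows(crime_url)  # uncomment to call the live API
```

Keeping the district and the date window as function arguments means a different neighborhood report only needs a different call, not a hand-edited URL.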

Next, the JSON can be converted into a Simple Features data frame. We create new X and Y coordinate fields; a ‘yearQuarter’ time period field; and recode the crime type field into a ‘Crime_Type’ field, which reads more easily on a legend.
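The wrangling described above might be sketched as follows. The three rows are made up purely to mimic the assumed shape of the API response, and the source field names (`dispatch_date_time`, `text_general_code`, `point_x`, `point_y`) and the recoding rules are assumptions; only ‘yearQuarter’ and ‘Crime_Type’ are names from the text.

```r
library(sf)

# Made-up rows standing in for the downloaded JSON records.
crimes_raw <- data.frame(
  dispatch_date_time = c("2018-01-15T13:45:00Z", "2018-02-20T22:10:00Z",
                         "2017-11-03T08:05:00Z"),
  text_general_code = c("Thefts", "Burglary Residential", "Theft from Vehicle"),
  point_x = c(-75.2105, -75.2142, NA),
  point_y = c(39.9512, 39.9498, NA),
  stringsAsFactors = FALSE
)

# Drop records without coordinates; they cannot be mapped.
crimes <- crimes_raw[!is.na(crimes_raw$point_x) & !is.na(crimes_raw$point_y), ]

# New coordinate fields and a year-quarter label such as "2018 Q1".
crimes$X <- crimes$point_x
crimes$Y <- crimes$point_y
dates <- as.Date(substr(crimes$dispatch_date_time, 1, 10))
crimes$yearQuarter <- paste(format(dates, "%Y"), quarters(dates))

# Collapse verbose codes into short legend labels (illustrative rules only).
crimes$Crime_Type <- ifelse(
  grepl("Theft", crimes$text_general_code), "Theft",
  ifelse(grepl("Burglary", crimes$text_general_code), "Burglary", "Other")
)

# Convert to Simple Features, keeping X and Y as ordinary columns.
crimes_sf <- st_as_sf(crimes, coords = c("X", "Y"), crs = 4326, remove = FALSE)
```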

Recall that we downloaded the data for the entire 18th Police District, but we actually need data at a smaller neighborhood geography. As such, the code below uses the esri2sf package to download the neighborhood boundary shapefile from OpenDataPhilly, selects the Spruce Hill neighborhood and converts it to Simple Features. It does the same for the street centerlines file and uses the former to clip the latter.
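A sketch of the select-and-clip logic is given here under stated assumptions. The service URLs are not known from this post, so `download_layer` is a hypothetical wrapper that takes the endpoint as an argument, and the layers are replaced with toy geometries in a projected coordinate system. The `name` and `street` fields and the "Elsewhere" polygon are invented for illustration.

```r
library(sf)

# Hypothetical download helper: `url` must be the REST endpoint of a feature
# layer, taken from the dataset's page. No endpoint is hard-coded here.
download_layer <- function(url) {
  esri2sf::esri2sf(url)
}

# Toy stand-ins for the downloaded layers, in a projected CRS (units: feet).
square <- function(x0, y0, side = 1000) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
    c(x0, y0 + side), c(x0, y0)
  )))
}
neighborhoods <- st_sf(
  name = c("Spruce Hill", "Elsewhere"),
  geometry = st_sfc(square(0, 0), square(5000, 0), crs = 2272)
)
streets <- st_sf(
  street = c("Crossing St", "Faraway St"),
  geometry = st_sfc(
    st_linestring(rbind(c(-500, 500), c(1500, 500))),
    st_linestring(rbind(c(8000, 500), c(9000, 500))),
    crs = 2272
  )
)

# Select one neighborhood, then clip the streets to its boundary.
thisRCO <- neighborhoods[neighborhoods$name == "Spruce Hill", ]
streets_clip <- st_intersection(streets, st_geometry(thisRCO))
```

With real downloads, both layers would first need to share a coordinate reference system (for example via `st_transform`) before the clip.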

Using the resulting data frames, we create the basemap below.
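One way such a basemap could be drawn with ggplot2 is sketched here. The geometries are toy stand-ins for the neighborhood boundary and clipped streets, `theme_void()` substitutes for whatever custom map theme is in use, and the colors are arbitrary choices.

```r
library(sf)
library(ggplot2)

# Toy boundary and clipped street standing in for the real layers.
thisRCO <- st_sf(
  name = "Spruce Hill",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000),
                          c(0, 1000), c(0, 0)))),
    crs = 2272
  )
)
streets_clip <- st_sf(
  street = "Crossing St",
  geometry = st_sfc(st_linestring(rbind(c(0, 500), c(1000, 500))), crs = 2272)
)

# Boundary first, streets on top; the title is read from the data frame.
basemap <- ggplot() +
  geom_sf(data = thisRCO, fill = "grey95", colour = "black") +
  geom_sf(data = streets_clip, colour = "grey60") +
  labs(title = thisRCO$name) +
  theme_void()
```

Printing `basemap` renders the plot; storing it as an object lets later layers be added with `+`.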


Now all that’s left is to layer the crime data. First we perform a spatial selection, keeping all crimes that fall inside the ‘thisRCO’ boundary. We then map ‘thisRCO.crimesRecent’. Note that the labels pull directly from the data frame, ensuring that this routine is replicable no matter which neighborhood is selected.
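The spatial selection and data-driven labels could look roughly like this. The points and polygon are toy data, and the `name` field and label wording are assumptions; ‘thisRCO’ and ‘thisRCO.crimesRecent’ follow the names used in the text.

```r
library(sf)
library(ggplot2)

# Toy neighborhood boundary and crime points in a shared projected CRS.
thisRCO <- st_sf(
  name = "Spruce Hill",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000),
                          c(0, 1000), c(0, 0)))),
    crs = 2272
  )
)
crimes_sf <- st_as_sf(
  data.frame(
    Crime_Type = c("Theft", "Burglary", "Theft"),
    X = c(200, 700, 2500),
    Y = c(300, 800, 400)
  ),
  coords = c("X", "Y"), crs = 2272
)

# Spatial selection: bracket subsetting keeps features that intersect thisRCO.
thisRCO.crimesRecent <- crimes_sf[thisRCO, ]

# Labels come from the data, so they follow whichever neighborhood is selected.
map_title <- paste("Recent crimes in", thisRCO$name)
map_subtitle <- paste(nrow(thisRCO.crimesRecent), "reported incidents")

crime_map <- ggplot() +
  geom_sf(data = thisRCO, fill = "grey95", colour = "black") +
  geom_sf(data = thisRCO.crimesRecent, aes(colour = Crime_Type)) +
  labs(title = map_title, subtitle = map_subtitle, colour = "Crime type") +
  theme_void()
```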


If you make one simple change to the date condition in your PostgreSQL query (‘current_date >= n’), you can download historical data as well. Here is the crime trend for the last 8 years.
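Once a longer date window has been downloaded, a trend chart only needs a count per time period. The sketch below assumes a ‘yearQuarter’ field like the one described earlier; the six rows are made up for illustration and do not represent real counts.

```r
library(ggplot2)

# Made-up records; each row is one incident tagged with its quarter.
crimes <- data.frame(
  yearQuarter = c("2017 Q3", "2017 Q4", "2017 Q4",
                  "2018 Q1", "2018 Q1", "2018 Q1"),
  stringsAsFactors = FALSE
)

# Count incidents per quarter; "YYYY Qn" labels sort chronologically.
trend <- as.data.frame(
  table(yearQuarter = crimes$yearQuarter),
  responseName = "count",
  stringsAsFactors = FALSE
)

# group = 1 joins the discrete quarters into a single line.
trend_plot <- ggplot(trend, aes(x = yearQuarter, y = count, group = 1)) +
  geom_line() +
  geom_point() +
  labs(x = "Quarter", y = "Reported incidents")
```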


That’s it. As I mentioned, this simple workflow can be adapted for a host of datasets including zoning, Licenses & Inspections, real estate transactions, 311, parking data and more. To check out these and more, head over to OpenDataPhilly and look for the “API Documentation” associated with a given dataset.

Ken Steif, PhD is the founder of Urban Spatial. He is also the director of the Master of Urban Spatial Analytics program at the University of Pennsylvania. You can follow him on Twitter @KenSteif.


March 3, 2018