During last weekend's 37Billion Datathon, I learned how to use some basic GIS tools.
We were given a shapefile with a series of 250m x 250m grids that covered Massachusetts.
One of the things I immediately wanted to do with the grid shapefile data was compute the distance from each grid cell to the nearest public transit station.
The dataset we were given included a walksheds file that represented the areas that were walkable around public transit hubs, but I wanted to be able to do a more fine-grained analysis.
I knew I could search for a local public transit station given a latitude & longitude, but first, I needed to get the latitude and longitude from the grid cells.
After some trial and error, I figured out how to convert latitude & longitude to a projection's coordinates. Here are the steps:
It took me several hours to figure out how to successfully install and link GDAL. This is what worked for me:
Download & install the GDAL Framework from here
It'll install to
/Library/frameworks/GDAL.framework/Programs to your PATH environment variable.
This is important, because other packages will use the gdal-config program to determine where to find the GDAL libraries.
Import fiona, pyproj, and shapely.
load the grid shapefile
import os, fiona, pyproj, shapely import fiona.crs # put the relative path to the directory that contains the data here data_dir = "../data/massvehicledata/" shapefile = os.path.join(data_dir, 'grid_250m_shell.shp')
As I understand it, a PROJ4 string is a standard way of clearly defining a projection from a coordinate system. You can read more about Proj4 here.
Here, fiona does all the work and pulls it out of the shapefile:
grid_proj_string = None with fiona.open(shapefile) as source: crs = source.crs grid_proj_string = fiona.crs.to_string(crs)
'+datum=NAD83 +lat_0=41 +lat_1=41.7166666667 +lat_2=42.6833333333 +lon_0=-71.5 +no_defs +proj=lcc +units=m +x_0=200000 +y_0=750000'
grid_proj_string, and project a latitude and longitude.
Pyproj allows us to do coordinate projections in the exact way that I explained in the beginning of the article - all we have to do is give it the PROJ4 string.
(42.3428482,-71.0938683) (Northeastern University) in the coordinates used by the grid shapefile is:
grid_proj = pyproj.Proj(grid_proj_string)
To make sure this works, let's get the bounding box for the very first grid cell in the shapefile.
The bounding box of this grid in lat/lon is:
import shapely.geometry with fiona.open(shapefile) as source: # get the first shape shapefile_record = source.next() shape = shapely.geometry.asShape(shapefile_record['geometry']) minx, miny, maxx, maxy = shape.bounds maxx_ll, maxy_ll = grid_proj(maxx, maxy, inverse=True) minx_ll, miny_ll = grid_proj(minx, miny, inverse=True) first_bounding_box_ll = shapely.geometry.box(minx_ll, miny_ll, maxx_ll, maxy_ll)
Bounding box of the first grid cell, in lat/lon:
(-70.79455544033586, 42.75570216940319, -70.79147634111301, 42.75793388033074)
Here's where the minimum lat/lon point of this grid is according to google maps:
Seems sort of reasonable - at the very least, the point is on an edge boundary of Massachusetts. I'm not sure if there's a good, reliable way of testing if that's correct, but I'll take it for face value for now.
I'm interested in building technological platforms that leverage what we know about social dynamics to help people live their lives better.
I'm currently working at the Human Dynamics Group at the MIT Media Lab, creating systems that attempt to measure and impact human social and health behaviors.
I've also worked with the Lazer Lab, inferring partisan dynamics from congressional public statements.
You can e-mail me at email@example.com
Send me encrypted messages using my PGP key. (via keybase)