In February 2014, four friends and I went to HackBeanpot and made a product called Adventur, which won the prestigious 'Most Innovative' prize:
Given a set of dense geographical history data, Adventur does two things:
The idea came out of a brainstorming session Eric and I had at Tatte Bakery in Kendall Square.
Tatte Bakery is a coffee shop that's very close to my apartment. I live in Beacon Hill, and Tatte is a 10 minute walk across the Charles River.
I'd been living in Beacon Hill for about five months, but only discovered Tatte a few weeks prior. I wondered what other great places I had been missing that were near routes I took daily.
Eric was wondering the same, and we decided to find out during HackBeanpot.
At its core, making Adventur meant solving the problem of finding anomalies in a set of geo data.
We reduced this to the problem of determining the probability that the user visits place n after visiting the previous n-1 places.
I also thought it was reasonable to assume that the nth place that the user visited was only ever dependent on the previous n-1 places.
These assumptions scream "markov model".
So now, all we needed to do was represent the geo-data in a way that an n-gram model could work with.
But that didn't mean we couldn't implement it ourselves.
So we did.
We successfully implemented the algorithm (with some tweaks).
If you gave us a week of your geographic history, we could:
We thought that was very, very cool.
It took Eric and I the first 24 hours of the hackathon to do.
There were several problems with this algorithm, though:
The algorithm needs dense geo-data to work correctly, but the denser the data, the slower the algorithm ran.
We ended up taking a uniform sampling of the data, and I tried to strike a balance between density and performance.
Even when sampling 1000 data points (not that many, when two weeks of data nets you 18,000 points), the algorithm took over five minutes to complete on my machine.
The sampling also reduced the accuracy of the model significantly.
Our goal was to be able to have users upload their geo-data, and then quickly see their results. This wasn't working.
The first part of our algorithm requires you to do the following:
Taught to think functionally during my first year as an undergraduate, I knew that the best approach to this problem was a recursive one, so I wrote one.
It was pretty, had a tail-call, and worked.
But it was written in Python, and Python has a recursion limit, and no tail-call optimization.
So, when we started to try to use it on samples of lengths over 1000 data points, python would just break the computation.
We found this out at midnight on Saturday, about twelve hours before demos.
So, at midnight, after sleeping a whopping total of two hours the night before, we decided to pivot, and create our own algorithm.
Stay tuned for part two - I'll describe the final algorithm we came up with and what tools we used to get there.
I'm interested in building technological platforms that leverage what we know about social dynamics to help people live their lives better.
I'm currently working at the Human Dynamics Group at the MIT Media Lab, creating systems that attempt to measure and impact human social and health behaviors.
I've also worked with the Lazer Lab, inferring partisan dynamics from congressional public statements.
You can e-mail me at firstname.lastname@example.org
Send me encrypted messages using my PGP key. (via keybase)