Fuzzy Matching at Scale

October 18, 2014

In the last few months I’ve given two different talks about scalable fuzzy matching. The first was at Strata in San Jose, titled Similarity at Scale. In that talk I focused mostly on techniques for doing fuzzy matching (or joins) between large data sets, primarily via Cascading workflows. More recently I presented at Cassandra Summit 2014 on Fuzzy Entity Matching. This was a different take on the same issue, where…

The Durkheim Project goes live!

July 3, 2013

As of today, the Durkheim Project is live. This is a research project involving Patterns and Predictions, the Geisel School of Medicine at Dartmouth, the U.S. Department of Veterans Affairs (VA) and Facebook. See the Durkheim Project launch announcement for full details. The worthy goal of the Durkheim Project is to improve the medical community’s ability to predict suicides. The driving force was originally the military’s concern about increasing…