Hadoop, MapReduce and processing large Twitter datasets for fun and profit

This fall I re-enrolled to complete my PhD at the School of Journalism and Mass Communications (SJMC) at the University of Wisconsin-Madison. As part of my activities, I’ve been attending the sessions of the Social Media and Democracy (SMAD) research group at SJMC, a great collaborative effort to further research on social media and how it’s used in political communication.

As part of a series of upcoming research projects on a HUGE Twitter dataset collected by SMAD during the 2012 US presidential election, we’ve been brushing up on Python, Hadoop and MapReduce. I’m very excited about this opportunity, as big data analysis seems to be coming of age and gaining traction in several areas of communication research.

Getting started with Hadoop and MapReduce

As part of our training, Alex Hanna, a sociology PhD student at UW-Madison, put together an excellent series of workshops on Twitter (or, as he’s aptly named them, “Tworkshops”) to get the whole SMAD team started in the art of big data analysis. The series is an excellent reference for beginners, so if you are interested in analyzing Twitter data, it is definitely for you.

Thanks to Alex for all his time and effort. This is all incredibly cool, and I’m looking forward to exploring our dataset further and learning more about Hadoop and MapReduce!