Hadoop, MapReduce and processing large Twitter datasets for fun and profit

I’m back in school for my PhD, and getting ready to conduct research on a HUGE Twitter dataset on the US 2012 presidential election collected by the Social Media and Democracy research team at UW-Madison. We’ve been brushing up on Python, Hadoop and MapReduce. As part of our training, Alex Hanna, a sociology PhD student at UW-Madison, put together an excellent series of workshops on Twitter (or, as he’s aptly named them, “Tworkshops”) to get us started. Check them out!

This fall I re-enrolled to complete my PhD at the School of Journalism and Mass Communications (SJMC) at the University of Wisconsin-Madison. As part of my activities, I’ve been attending the sessions of the Social Media and Democracy (SMAD) research group at SJMC, a great collaborative effort to further research on social media and how it’s used in political communication.

As part of a series of upcoming research projects on a HUGE Twitter dataset collected by SMAD during the US 2012 presidential election, we’ve been brushing up on Python, Hadoop and MapReduce. I’m very excited about this opportunity, as big data analysis seems to be coming of age and gaining traction in several areas of communication research.

Getting started with Hadoop and MapReduce

As part of our training, Alex Hanna, a sociology PhD student at UW-Madison, put together an excellent series of workshops on Twitter (or, as he’s aptly named them, “Tworkshops”) to get the whole SMAD team started in the art of big data analysis. It’s an excellent reference for beginners, so if you are interested in analyzing Twitter data, this is definitely for you.

Thanks to Alex for all his time and effort. This is all incredibly cool, and I’m looking forward to exploring our dataset further and learning more about Hadoop and MapReduce!
