Big Data and Social Science: Introduction

Big datasets are increasingly common in social science today, and they are understandably generating a lot of excitement.

However, for anyone just beginning to work with such data, the task of merely managing large datasets – let alone analyzing them – can be daunting.

Since I’ve been working a lot lately with relatively big datasets, I thought I’d offer a series of posts here on how to go about dealing with them. Although I’m not a computer scientist, I hope anyone just starting out will find the overview useful.

The series runs as follows:

Part I: The I/O Problem, or Why Big Data Takes Forever to Process
Part II: Big Data on the Desktop
Part III: Big Data in the Cloud
Tutorial: Parsing GDELT with Spark/Shark on EC2

If you have any corrections or suggestions for the series, please get in touch.