Tuesday, September 6, 2016

Big Data, Big Problems

I've said before that we desperately need more people to work on exploring extremely large datasets, but it can be hard to recruit people to work on these problems. It's a good sign, I think, that students (from grade-school kids to PhD students) want to engage with what they can touch, examine, and observe. It's challenging to convince them that we really need them to work with giant databases about the things they want to study, more than with the things themselves. I certainly didn't get into landscape ecology and fire research by looking at spreadsheets or computer programmes. Not at first, anyway. Maps and satellite images were my gateway into big data.

These days I don't spend very much time looking at maps or imagery (but they're still important!), in part because it's difficult to represent more than three dimensions at a time and there's just too much data to "see". Instead, I spend a lot of time working with stacks of satellite imagery and other geospatial data, mostly trying to figure out what burned (and what didn't) and when, and what happened before it burned, while it was burning, and after it burned. So my research is focused on specific questions or hypotheses that I answer or test using large datasets. The advantage of using a really big dataset is that there's a better chance that my findings will be generalizable (i.e., they won't just apply to a small area or a single instance). It's also possible to use large datasets to train machine learning algorithms, which can be quite powerful in determining an unknown quantity or quality based on patterns observed for known instances (they can even find cats!).

There's also the possibility of exploratory research just to see what's there in large datasets. Mining the data for useful information is important because if we only analyse data based on what we want to know, we could miss something. (This is sometimes referred to as the "streetlight effect" because of a joke about a drunk looking for his keys under a streetlight, not because he lost them there, but because the lighting is best there.) Whatever approach one takes to tangling with big data, it's important to realize that the numbers are linked to the "real", tangible world, and the results of analysing these data can have a major impact. I wish more students were interested in the possibilities of engaging with large datasets and in learning to use the tools necessary to work with big data.
