These days I don't spend very much time looking at maps or imagery (but they're still important!), in part because it's difficult to represent more than three dimensions at a time and there's just too much data to "see". Instead, I spend a lot of time working with stacks of satellite imagery and other geospatial data, mostly trying to figure out what burned (and what didn't) and when, and what happened before, during, and after the fire. So my research is focused on specific questions or hypotheses that I answer or test using large datasets. The advantage of using a really big dataset is that there's a better chance that my findings will be generalizable (i.e., they won't just apply to a small area or a single instance). It's also possible to use large datasets to train machine learning algorithms, which can be quite powerful in estimating an unknown quantity or quality based on patterns observed in known instances (they can even find cats!).
There's also the possibility of exploratory research, just to see what's there in large datasets. Mining the data for useful information is important because if we only analyse data based on what we want to know, we could miss something. (This is sometimes referred to as the "streetlight effect" because of a joke about a drunk looking for his keys under a streetlight since the lighting is best there.) Whatever approach one takes to tangling with big data, it's important to realize that the numbers are linked to the "real", tangible world, and the results of analysing these data can have a major impact. I wish more students were interested in the possibilities of engaging with large datasets and in learning to use the tools necessary to work with big data.