Scientists created a cheap, accurate way to identify insects (and wrote a great big data explainer in the process)

Researchers from the University of California, Riverside, have developed a method for classifying insects that they say blows away all previous methods in terms of accuracy, speed and practicality. The keys to their success: off-the-shelf laser pointers and a thorough understanding of how to use big data.
The paper, which is available for download here, reads like a how-to guide for applied big data, even for people unfamiliar with the mathematical and statistical concepts involved. The authors explain how they gathered their data, why more data matters and how it helped improve the accuracy of their model. They clearly describe the type of model they used (a Bayesian classifier), as well as the effects of adding or removing features to aid in classification, and how the model compares with other approaches in both performance and flexibility.
Of course, the research — which focused largely on mosquitoes and flies — is potentially very useful, too. Here’s the short version.
Decades of previous research into insect classification, the authors explained, have relied on microphones to capture the sounds insects make when they fly by. Unfortunately, microphones capture so much ambient noise that unless an insect flies within the ideal distance of the microphone under ideal conditions, it can be difficult to capture useful data. Small datasets, combined with sometimes very unnatural conditions in order to maximize data collection, can result in predictive models that prove less accurate once they’re applied to new data that wasn’t part of the study (a result generally referred to as overfitting).
The advent of big data has helped mitigate overfitting because the more data, and the more types of data, there are on which to train and test a model, the easier it should be to detect the features that actually matter. Think about trying to predict somebody’s relationship status, but studying only the Facebook profiles of students at a particular university in order to build your understanding of what married, single or dating people “look” like. Particularities in geography, education level, age and other factors might result in a model that’s great at predicting whether college students are married or single, but not so great at predicting the same across the general population.

The setup used to measure the insects' data.

Which brings us to the laser pointers. Paired with a phototransistor and a digital recorder, the laser pointers provide a novel way of capturing the sound an insect makes while flying by without succumbing to the weaknesses of the microphone approach. The interruptions in the laser beam caused by the insects’ wings are captured and turned into an audio file. Using this method, the researchers claimed they captured tens of millions of data points of insect sounds, each accurately labeled as one of the six species of insects they studied during the experiment.
A website dedicated to the research includes examples of the audio files generated by the insects.
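To make the idea of turning beam interruptions into sound-like data concrete, here is a minimal sketch in Python. It is not the researchers' code: the sample rate, the synthetic trace and the function names are all assumptions invented for illustration. It simulates a light-level trace that dips as wings cross the beam, then scans candidate frequencies to find the dominant wingbeat rate.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed recorder setting, not from the paper)

def synth_trace(wingbeat_hz, duration_s=0.1):
    """Hypothetical phototransistor trace: the light level dips as wings cross the beam."""
    n = int(SAMPLE_RATE * duration_s)
    return [1.0 - 0.15 * (1.0 + math.sin(2 * math.pi * wingbeat_hz * i / SAMPLE_RATE))
            for i in range(n)]

def dominant_frequency(trace, lo_hz=100, hi_hz=1000, step_hz=5):
    """Return the candidate frequency (in Hz) with the most spectral power in the trace."""
    mean = sum(trace) / len(trace)
    centered = [x - mean for x in trace]  # remove the constant light level
    best_hz, best_power = lo_hz, -1.0
    for hz in range(lo_hz, hi_hz + 1, step_hz):
        w = 2 * math.pi * hz / SAMPLE_RATE
        re = sum(x * math.cos(w * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(w * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz

if __name__ == "__main__":
    print(dominant_frequency(synth_trace(400)))
```

A real recording would be noisier and would contain harmonics, so a production pipeline would more likely use a fast Fourier transform over the full spectrum rather than this brute-force scan.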
The team also went a step further and analyzed the circadian rhythms of the insects they studied (i.e., the times of day they’re active), making the model more accurate by accounting for time of day rather than just wingbeat patterns, which can be very similar among different species. Their model also accounts for the common geographic ranges of insects, so it could safely assume, for example, that a mosquito in sub-Saharan Africa probably isn’t a species primarily found in the United States, even if they share similar circadian rhythms and sounds.

Wingbeat patterns can be unique, but often are very similar.
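The way a Bayesian classifier can combine wingbeat frequency, time of day and location might be sketched as follows. This is a simplified illustration, not the model from the paper: the two classes, every number and the assumption that the features are independent given the species are all hypothetical.

```python
import math

# Hypothetical per-class parameters, invented purely for illustration.
CLASSES = {
    "species_a": {
        "freq_mean": 400.0, "freq_std": 30.0,          # wingbeat frequency in Hz
        "activity": {"night": 0.7, "day": 0.3},        # P(time of day | species)
        "prior": {"region_x": 0.5, "region_y": 0.01},  # P(species | region)
    },
    "species_b": {
        "freq_mean": 420.0, "freq_std": 30.0,
        "activity": {"night": 0.2, "day": 0.8},
        "prior": {"region_x": 0.5, "region_y": 0.99},
    },
}

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as the wingbeat likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def posterior(freq_hz, period, region):
    """Posterior over classes: geographic prior x wingbeat likelihood x activity likelihood."""
    scores = {}
    for name, p in CLASSES.items():
        scores[name] = (p["prior"][region]
                        * gaussian_pdf(freq_hz, p["freq_mean"], p["freq_std"])
                        * p["activity"][period])
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

if __name__ == "__main__":
    # A 410 Hz wingbeat is equally likely for both classes; time and place break the tie.
    print(posterior(410.0, "night", "region_x"))
    print(posterior(410.0, "day", "region_y"))
```

In this toy setup a 410 Hz wingbeat is ambiguous on its own, so the time of day and the regional prior decide the outcome, which mirrors the role the article describes for circadian rhythm and geography.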

In the end, their model could accurately classify insects 79.44 percent of the time when dealing with 10 different classes of insects (this included males and females from 4 of the 6 species studied). Dealing with just two classes, it was accurate 98.99 percent of the time. The model accurately distinguished between males and females of the same species more than 99 percent of the time.
If the predictive model and the sensor setup work outside the lab, as the researchers predict they should (in fact, they suggest the setup could be enhanced to include additional features such as an insect’s size or odor), the applications of this approach could be meaningful. It could help trap and kill insects that are harmful to people or crops, for example, while releasing those that are not harmful and might even be beneficial. In areas where precaution is the word, the model could be tuned to include more false positives, inflicting more collateral damage on non-harmful bugs but helping ensure no harmful ones escape.
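The trade-off between false positives and missed harmful insects can be illustrated with a small sketch. The scores and labels below are made up, and the idea of tuning a single probability threshold is an assumption about how such tuning might work, not a description of the researchers' system.

```python
# Hypothetical (probability_harmful, actually_harmful) pairs, invented for illustration.
SCORED = [
    (0.95, True), (0.80, True), (0.40, True),
    (0.60, False), (0.30, False), (0.10, False),
]

def error_counts(scored, threshold):
    """Count errors when every insect scoring at or above the threshold is treated as harmful."""
    false_positives = sum(1 for p, harmful in scored if p >= threshold and not harmful)
    false_negatives = sum(1 for p, harmful in scored if p < threshold and harmful)
    return false_positives, false_negatives

if __name__ == "__main__":
    for threshold in (0.5, 0.25):
        fp, fn = error_counts(SCORED, threshold)
        print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold from 0.5 to 0.25 in this toy data flags one more harmless insect but lets no harmful one through, which is the precautionary setting the article describes.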
Combined with this type of accuracy in determining what’s harmful and what’s not, that mosquito-zapping laser system doesn’t seem so ridiculous after all.