You heard it: I’m against it. In most cases, that is. A data scientist’s purview is more limited than we often like to admit.
Call them “predictive modeling specialists”, “analytics experts”, “expert statisticians”, or “data mining monsters”: they are around. They go in and out of your business, your newspaper, your favorite blog, all the time.
The legend is, they work in mysterious ways to collect and crunch data. They use science, algorithms, things like neural networks, support vector machines and, behold, deep learning.
They’re usually overrated. Here are a couple of reasons why:
1. No algorithm beats the human brain, still. As a firm believer in Malcolm Gladwell’s “Blink”, I do believe that the human brain works in mysteriously powerful ways. This means a “domain expert” with vast hands-on experience can drive better decisions by intuition, without investing loads of cash in collecting and interpreting data.
2. Domain knowledge is underrated. You can’t crunch bank data without knowing what a bank is, and you can’t crunch it well without being a banker. In many business cases, the data scientist’s turf starts with eliciting the problem, then formulating it, and only then analyzing it. We usually can’t wrap our minds around high-level concepts before we see them at work. This is why a “data science generalist” usually starts a step behind, catching up to the idea before generating value for the client organization.
3. Data is biased, in most cases. When you conduct a survey, you get the data of people willing to submit their data (a biased population). When you collect your website’s clickstream, you only get the people who already visit your website, skewed towards those who visit it more often. By the time you interpret and take action on the data, the population you captured may have (and probably will have) changed.
4. Data is prone to a variety of fallacies. If you really want to draw a conclusion from data, chances are you will find a subset or subview of the data to support it (the Texas sharpshooter fallacy). If you find a correlation, you will probably jump to causation (the false cause fallacy). Combine it with business arguments and it’s not hard to stride into slippery slopes, moving goalposts or straw men. Branding an argument “data-driven” does not mean it’s not fallacious.
5. Data science talent is not abundant, and sub-standard expertise in the field can land you in a worse position than the one you started in. I run into stupid mistakes, mismanagement of bias and variance, over-sensitive assumptions and predictive modeling overkill every single day. Even if there is a miraculous and arcane art of making data work, we’re just not that good at it yet.
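To make the Texas sharpshooter point from reason 4 concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption on my part: the function name `best_looking_segment`, the 40 segments, the 50 customers per group and the spend distribution are all made up. It simulates a campaign that has no effect at all, then goes hunting across customer segments for the one that seems to prove it worked.

```python
import random
import statistics


def best_looking_segment(n_segments=40, n_per_group=50, seed=0):
    """Simulate a campaign with NO real effect across many segments.

    Returns (largest apparent lift in any single segment,
             average lift across all segments).
    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_segments):
        # Control and "treated" customers are drawn from the SAME
        # distribution, so any difference between them is pure noise.
        control = [rng.gauss(100.0, 15.0) for _ in range(n_per_group)]
        treated = [rng.gauss(100.0, 15.0) for _ in range(n_per_group)]
        lifts.append(statistics.mean(treated) - statistics.mean(control))
    # Painting the target around the best shot: report only the winner.
    return max(lifts), statistics.mean(lifts)


best, average = best_looking_segment()
print(f"Average lift across all segments: {average:.2f}")
print(f"Lift in the cherry-picked segment: {best:.2f}")
```

The average lift hovers around zero, as it should, while the hand-picked segment shows a gain that would look respectable on a slide. Slice the data finely enough and some slice will always agree with you.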
Whether you are making a decision or managing your business, if there is a trusted expert around, just ask for their opinion. If you need to forecast the return on a campaign, just run a pilot instead of torturing your data. It’s not magic: it’s the same machine you use to read e-mails every day, and the same guy who would otherwise have worked for the census bureau. They can’t work wonders.