A survey of data scientists from database vendor Paradigm4 suggests that Hadoop and Spark are not as useful for sophisticated Big Data analytics as a non-expert might think.
The problem with Big Data today is not its volume but the diverse forms it takes. That's according to a survey released this week by database vendor Paradigm4, which also suggests that Hadoop may not be as useful as all the hype implies.
The survey, "Leaving Data on the Table: Data Scientists Reveal Obstacles to Big Data Analytics," asked data analytics experts about the challenges they face in meeting today's Big Data demands. The 111 respondents pointed to several notable trends:
- 71 percent of the data scientists surveyed agreed that "my analytics are becoming more difficult because of the variety and types of data sources (not just the volume)."
- Only 48 percent of respondents reported using Hadoop or Spark, suggesting that a majority of data experts don't see those platforms as necessary tools for handling data today.
- 49 percent of respondents reported difficulty storing their data in relational databases, the type of database that has underpinned most large-scale data storage for decades.
Commenting on the survey results, Marilyn Matz, CEO of Paradigm4, said, "The increasing variety of data sources is forcing data scientists into shortcuts that leave data and money on the table. The focus on the volume of data hides the real challenge of data analytics today. Only by addressing the challenge of utilizing diverse types of data will we be able to unlock the enormous potential of analytics."
It's important to keep in mind that Paradigm4 surveyed data scientists, who are more likely than most users to value the kind of highly sophisticated analytics tools Paradigm4 offers, such as its SciDB database and its integration with R. Hadoop surely still fills a key niche as a more basic Big Data environment for organizations that can live without precision analytics. Still, enterprises willing to make the investment may well benefit from a more robust Big Data strategy, one that addresses the diversity of data as well as its volume.
Paradigm4 released the full survey on July 1.