An AtScale benchmark study shows the strengths and weaknesses of Impala, Spark and Hive, the leading SQL-on-Hadoop big data analytics platforms.
What's the best SQL-on-Hadoop solution? For companies choosing between Impala, Spark and Hive, which platform delivers the best speed and performance? Those are questions that a recent benchmark report from AtScale helps answer.
The report, which was published this week, analyzed how the top three SQL-on-Hadoop platforms perform on business intelligence workloads. Impala, Spark and Hive are increasingly important in the data analytics market for companies that want to work with big data at Hadoop scale while keeping SQL compatibility.
AtScale's main findings included:
- Hive, despite its widespread use in Hadoop environments, did not come out on top in any of the benchmark tests.
- Impala's results varied significantly depending on query type, data size and other factors. This suggests that Impala can be the winning data analytics solution in some situations, but not in all of them.
- Spark performed markedly better on small data sets with Spark 1.6 than with Spark 1.5. Enterprises that want the most from their Spark systems should upgrade.
The complete results show that there is no one-size-fits-all solution for Hadoop-based business intelligence. Getting the most from a data analytics platform requires evaluating the needs of each particular workload.
That no SQL-on-Hadoop platform outperforms all the others is not surprising, of course. In any context, it is rare for one software solution to beat all of its competitors across the board.
The bigger takeaway from the AtScale study is that the leading data analytics platforms appear to be developing distinct strengths. Going forward, those distinctions could prove important in determining how each platform solidifies its position in the market.