I recently read a Typesafe survey report in which 2,136 respondents were asked about Spark vs Hadoop. You can read the report for yourself here (registration required): https://info.typesafe.com/COLL-20XX-Spark-Survey-Report_LP.html?lst=RW&lsd=COLL-20XX-Spark-Survey-Trends-Adoption-Report
For me, the most interesting finding was that 78% of respondents were using Spark for fast processing of BATCH data sets! Think about that. Spark can use HDFS as its persistent data store, and it is really good at processing streaming and transactional data, yet most respondents are simply using it to make batch jobs run faster.
This matches our experience too. When we talk to customers, they want to consider Spark because they know they have to plan for future use cases that will almost certainly involve streaming data, transactional data sets and, most importantly, real-time analytics and machine learning. But for now, even ten years after Doug Cutting and Mike Cafarella created Hadoop, we still see the vast majority of use cases focused on batch processing. It really is back to the 80s!
So, in my view, Spark is not replacing Hadoop; it is complementing what is already out there. What do you think?
This Aptuz blog post also neatly summarizes the Spark vs Hadoop discussion: http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/