Spark and Hadoop – replace or complement?
I recently read a survey report from Typesafe in which 2,136 respondents were asked about Spark vs Hadoop. You can see the report for yourself here (registration required): https://info.typesafe.com/COLL-20XX-Spark-Survey-Report_LP.html?lst=RW&lsd=COLL-20XX-Spark-Survey-Trends-Adoption-Report
The most interesting part of the report for me was that 78% of respondents were using Spark for fast processing of BATCH data sets! Think about that. Spark can use HDFS as its persistent data store, and it is really good at processing streaming, transactional data, yet most respondents are just using it to make batch go faster.
This is our experience too. When we talk to customers, they want to consider Spark, and they know they have to think about future use cases, which will almost certainly involve streaming data, transactional data sets and, most importantly, real-time analytics and machine learning. But for now, even ten years after Doug Cutting and Mike Cafarella invented Hadoop, we are still seeing the vast majority of use cases focused on batch processing. It really is back to the 80’s!
So, in my view, Spark is not replacing Hadoop but simply complementing what is already out there. What do you think?
This Aptuz blog post also neatly summarizes the Spark vs Hadoop discussion: http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
If you have additional questions, get in touch with us!