Spark and Hadoop - replace or complement?

I recently read a survey report from Typesafe of 2,136 respondents who were asked about Spark vs Hadoop. You can see the report for yourself here (registration required).

The most interesting part of the report for me was that 78% of respondents were using Spark for fast processing of BATCH data sets! Think about that: Spark can use HDFS as its persistent data store and is really good at processing streaming, transactional data, yet most respondents are simply using it to make batch jobs go faster.

This is our experience too. When we talk to customers, they want to consider Spark because they know future use cases will almost certainly involve streaming data, transactional data sets and, most importantly, real-time analytics and machine learning. But for now, even ten years after Doug Cutting and Mike Cafarella invented Hadoop, we are still seeing the vast majority of use cases focused on batch processing. It really is back to the '80s!

So, in my view, Spark is not replacing Hadoop; it is simply complementing what is already out there. What do you think?

This Aptuz blog post also neatly summarizes the Spark vs Hadoop discussion.



