Last week Cloudera announced the One Platform initiative (http://vision.cloudera.com/one-platform/). This is an interesting development: we know that Spark replaced Hadoop as the most popular Apache project and that Cloudera first backed Spark publicly over a year ago. (Hadoop still has more committers, though, according to https://projects.apache.org/projects.html?number.)
In making the announcement, Cloudera also confirmed the gaps that exist in Spark today, somewhat similar to the lists I have shared here.
We know that Cloudera intends to replace MapReduce with Spark. The only way this can happen is for Spark to scale. Our experience with Spark is that it does work and has a lot of advantages in use cases such as machine learning, querying data far faster than Hive, alerting and aggregating in streaming analysis, and joining multiple data sources in memory. But the one thing Spark does not do well is scale. Compared to the scale advantages that MapReduce provides, Spark is a long way behind.
But most interesting is the fact that enterprises want to have a single data lake/warehouse/repository/whatever you want to call it. Until Spark can scale, all those pesky structured data systems that customers have built using Oracle and all its competitors will continue to be used and deployed. Therefore, if Cloudera succeeds, and I think they will, the hollowing out of the traditional database vendors is going to really take off.
What do you think? Is the One Platform initiative just marketing hype? Will Cloudera succeed in replacing Oracle and their like? Let us know and check out our new website at www.exceleratesystems.com.