Machine Learning and Spark – get ready for the next big disruptor

There are lots of articles, blogs, reports and noise at the moment about Spark and machine learning, driven primarily by the rapid adoption of MLlib (Spark’s general machine learning library), which is leading developers to use R and Python in particular for Advanced Analytics. For a great overview, see the InfoWorld article “Why you should use Spark for Machine learning”.

It’s generally recognized that Spark has a long way to go before it is fully Enterprise ready. Almost every client I talk to follows a very familiar pattern: they want to try it for speed and scale, they try it, they are disappointed (in particular by its scalability), and then they decide to wait.

However, when Machine Learning comes into the discussion, Spark adoption is rapid, visible and highly successful. Customers are now recognizing the growing power of Spark MLlib, particularly the growing number of algorithms it supports. ML has been around since 1979, and more recently the ‘not very good’ Mahout implementation has led to a lot of disappointed projects.

We don’t have space here to go into the details of ML, but I notice four key trends that will help customers see strong and rapid time to value in their machine learning projects:

  1. Customer 360 views are one of the most common Big Data use cases. Using ML, and Spark MLlib in particular, customers can leverage massive data volumes to make product recommendations to customers in real time using ads or other recommendation platforms. ML can take Recommendation and Monetization engines to a whole new level of predictability and relevance in real time.  
  2. Similarly in Mobile Networks, ML can be used to predict and manage Network Optimization – a critical cost element in Mobile Network profitability. Think about it like a river. Use ML to maximize the flow of water through the narrowest channels while maintaining speed and volume. Maximum benefit flows from predicting in near real time how the flows (Wireless traffic) should be managed.  
  3. With Geolocation services, massive data volumes and ML, Retailers can tailor specific offers to individuals. Imagine a scenario where you go into a Nordstrom’s type store and the Store ML system picks up (from the Store’s Mobile App, already installed on your phone) that you have entered the store. As you wander round the various departments, the ML system is rapidly choosing products you will be interested in and presenting them on your mobile device. When you press the ‘Get Help’ button on your phone, the Sales assistant glides over, already armed with all your previous purchase history and a set of suggestions on what to buy. They open the conversation with ‘Good Morning Mr. Bennett, let’s take a look at that Emile Staub Cocotte that you looked at last time you were here’...  
  4. Data Wrangling is still a big issue, and Machine learning based companies like Trifacta are starting to get a lot of traction inside the Enterprise. Once large companies understand how ML apps can change their entire Big Data ecosystem, ML will become a mainstream technology during 2016.  
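
To make the first trend (product recommendations) a little more concrete, here is a minimal sketch in plain Python. It is not Spark MLlib code: it uses simple co-occurrence counting as a stand-in for a real model such as the ALS collaborative filtering that MLlib provides, and it runs on a handful of purchase baskets rather than massive data volumes. The function names and the sample baskets are invented purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        # De-duplicate and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, owned, top_n=3):
    """Rank products the customer does not own by co-occurrence with owned ones."""
    scores = defaultdict(int)
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties are broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase histories (assumed data, not from any real retailer).
baskets = [
    ["cocotte", "oven mitts", "cookbook"],
    ["cocotte", "oven mitts"],
    ["cocotte", "cookbook", "apron"],
    ["apron", "oven mitts"],
]
counts = build_cooccurrence(baskets)
print(recommend(counts, owned={"cocotte"}))
```

In a real deployment the counting step would be replaced by a model trained on the full purchase history, which is where a distributed library such as Spark MLlib earns its keep; the ranking step at the end stays conceptually the same.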

Want to know more about Machine learning? Take a look at this InfoWorld SlideShare.
What do you think? Is Machine Learning the next big disruptor?

