Big Data and Security – still a problem?

When we talk to clients about Big Data, it’s often assumed that a strong security infrastructure is already in place to protect all that data, whether at rest or in motion. While that assumption is starting to be true, it’s important to understand where the Hadoop ecosystem is strong on security and where there is additional work to do.

  1. Go back 2 years and the answer to the ‘Security?’ question was very simple: Kerberos. While that sounds a little silly now, it really was that basic. Any requirements for Role Based Access control, encryption, audit trails / governance / compliance, intrusion detection and all the other requirements of Enterprise security were handled in a single, somewhat evasive response: ‘Those aspects are covered by the operating systems and / or the networking systems.’ What this actually meant was that Big Data implementations had gaping holes in security.
  2. We worked with one client whose Internal Audit team identified this lack of security and kept a large Big Data project on hold for a full 12 months before they would allow the Hadoop cluster to be joined in any way to the Enterprise Infrastructure.
  3. If we look at where Security in Hadoop stands today, customers can implement:
    1. Authentication using LDAP or Active Directory (this is the Kerberos piece).
    2. Encryption at rest or in motion inside the cluster.
    3. Role Based Access control (for example, using Apache Sentry).

    5. Data redaction (thus avoiding the problem of admins having access to all data). This is critical when using Hadoop clusters for PCI use cases, so that PII (Personally Identifiable Information) can be redacted.
    6. Data Governance that provides auditing, data lineage and data life-cycle management.
    7. Key management to manage encryption keys, certificates etc.
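
To make the data redaction item concrete, here is a minimal Python sketch of masking sensitive values before a record is shown to an administrator or written to a log. It is only an illustration under stated assumptions: the `redact` and `mask_card` functions and both regular expressions are hypothetical names and patterns invented for this example, not part of Hadoop, Sentry or any other Apache project, and real PCI / PII detection needs far more than two regexes.

```python
import re

# Hypothetical patterns for illustration only (assumptions, not a standard):
# 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
# A deliberately simple email pattern.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def mask_card(match):
    """Replace all but the last four digits of a card-like number with '*'."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(text):
    """Return a copy of text with card-like numbers masked and emails removed."""
    text = CARD_RE.sub(mask_card, text)
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


if __name__ == "__main__":
    print(redact("card 4111 1111 1111 1111 for bob@example.com"))
```

The point of the sketch is the design choice rather than the patterns: redaction is applied before the data reaches the person reading it, so even a user with full cluster access only ever sees the masked form.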

These are all significant advances that have been implemented in various Apache projects such as Sentry. However, it’s also clear from our discussions with clients that most CIOs and CISOs still don’t feel 100% comfortable with Big Data Security. This is particularly true in Europe, as a recent survey by Forrester showed.
So, when we talk to clients about their Big Data strategy and how they should design and architect what, for many of them, is totally new, we now ask a set of simple questions:

  1. Is the CISO part of the Big Data strategy team? If not, why not?
  2. Does the client have the rights to use all the data they plan to use? We need to confirm this as part of the Discovery process.
  3. Will the client implement data redaction?
  4. Is the client willing to encrypt everything?
  5. Who owns the cluster security profile?

This is not meant to be a complete list; it simply makes the client consider Security as an issue at the design phase, not as an afterthought.
What do you think? Is Security in Big Data a big problem? Do you think the progress in the last 2 years has allowed Hadoop implementations to catch up with more traditional designs?

If you have additional questions, get in touch with us!


