Achieving End-to-end Security for Apache Spark with Databricks
June 8, 2016 in Company Blog
Today we are excited to announce the completion of the first phase of the Databricks Enterprise Security (DBES) framework. We are proud to say that this makes Databricks the first and only company to provide comprehensive enterprise security on top of Apache Spark.
Hundreds of organizations have deployed Databricks to improve the productivity of their data teams, power their production Spark applications, and democratize data access. As Databricks continues to gain adoption across security-minded industries such as financial services and healthcare, we are also focused on enabling them to maximize the value from their data while satisfying strict security and compliance requirements in their respective industries (such as Sarbanes-Oxley or HIPAA).
Holistic Security for the Big Data Lifecycle
Traditionally, enterprise organizations only had security solutions that addressed parts of their big data infrastructure. Today, enterprises demand holistic security that covers the full spectrum of their big data lifecycle: from file processing, big data clusters, code management, job workflows, application deployments, dashboards, to reports.
The Databricks just-in-time data platform takes a holistic approach to solving the enterprise security challenge by building all the facets of security — encryption, identity management, role-based access control, data governance, and compliance standards — natively into the data platform with DBES.
- Encryption: Provides strong encryption at rest and in flight with best-in-class standards such as SSL and keys stored in AWS Key Management Service (KMS).
- Integrated Identity Management: Facilitates seamless integration with enterprise identity providers via SAML 2.0 and Active Directory.
- Role-Based Access Control: Enables fine-grained management of access to every component of the enterprise data infrastructure, including files, clusters, code, application deployments, dashboards, and reports.
- Data Governance: Guarantees the ability to monitor and audit all actions taken in every aspect of the enterprise data infrastructure.
- Compliance Standards: Achieves a level of security compliance that exceeds the high standards of FedRAMP as part of Databricks’ ongoing DBES strategy.
In short, DBES will provide holistic security in every aspect of the entire big data lifecycle.
Major Achievements in DBES Phase One
DBES builds upon the extensive access management and encryption functionality that already exists in Databricks. With the completion of DBES Phase One today, enterprises gain the ability to control access to Apache Spark clusters on an individual basis, manage user identity with a SAML 2.0 compatible identity provider service, and audit activity on the platform end to end.
Cluster Access Control Lists
Cluster Access Control Lists, or cluster ACLs, give Databricks administrators the ability to fine-tune the autonomy of Databricks users based on the enterprise security policy. For example, one can strictly limit the ability to launch new clusters to control costs while giving teams complete freedom to run code on existing clusters in a self-service manner.
Specifically, an administrator will be able to define, on an individual basis, whether each user is allowed to perform the following actions:
- Launch a new cluster
- Terminate an existing cluster
- Run code on (attach to) an existing cluster
- Change the configuration of an existing cluster
- Restart an existing cluster
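To make the permission model above concrete, here is a minimal sketch in Python of a per-user allow list covering the five actions. It is only an illustration of the idea, not the Databricks implementation or API: the names `ClusterAction` and `ClusterAcl`, the deny-by-default behavior, and the example email addresses are all assumptions made for this example.

```python
from enum import Enum, auto


class ClusterAction(Enum):
    """The five cluster actions listed above (names are hypothetical)."""
    LAUNCH = auto()
    TERMINATE = auto()
    ATTACH = auto()      # run code on an existing cluster
    CONFIGURE = auto()   # change the configuration of an existing cluster
    RESTART = auto()


class ClusterAcl:
    """Hypothetical per-user allow list; anything not granted is denied."""

    def __init__(self):
        self._grants = {}  # maps user -> set of ClusterAction

    def grant(self, user, *actions):
        self._grants.setdefault(user, set()).update(actions)

    def is_allowed(self, user, action):
        return action in self._grants.get(user, set())


acl = ClusterAcl()
# An analyst may run code on existing clusters but not launch new ones.
acl.grant("analyst@example.com", ClusterAction.ATTACH)
# An administrator is granted every action.
acl.grant("admin@example.com", *ClusterAction)

print(acl.is_allowed("analyst@example.com", ClusterAction.ATTACH))
print(acl.is_allowed("analyst@example.com", ClusterAction.LAUNCH))
```

In this sketch the analyst can attach to a cluster but cannot launch one, which mirrors the cost-control scenario described above.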
SAML 2.0 Support
Enterprises will now be able to use a SAML 2.0 compatible identity provider to authenticate and authorize access to the Databricks platform. Since many enterprises already utilize an identity provider service, and virtually all major identity providers (e.g., Okta, Ping Identity) support SAML 2.0, this will vastly simplify the setup and management of accounts on the Databricks platform. Databricks users will also enjoy a more streamlined login process, as they can now log in to the platform with a single click instead of having to remember (and possibly recover) passwords.
End-to-End Audit Logs
The audit logs will provide enterprises in security-conscious industries such as healthcare or financial services with the tools to satisfy strict compliance requirements, such as HIPAA or Sarbanes-Oxley. The Databricks audit logs are a comprehensive record of activity on the platform, allowing enterprises to monitor the detailed usage patterns of Databricks as the business requires. This allows a central authority to easily reconstruct critical events with:
- The time and details of an action.
- The user who triggered the action (including administrators).
- And other crucial information.
These logs are stored in a human-readable format so they can be explored easily. The administrator can also analyze the information in the audit logs using the Databricks platform itself.
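As a rough illustration of how such records could be used to reconstruct events, the sketch below parses a few audit entries and lists one user's actions in time order. The actual Databricks audit log format is not specified here, so everything in this example is an assumption: the JSON-lines layout, the field names (`timestamp`, `user`, `action`, `details`), the sample records, and the helper functions `parse_audit_log` and `actions_by_user` are all hypothetical.

```python
import json

# Hypothetical sample records, one JSON object per line.
# This is NOT the real Databricks audit log schema.
SAMPLE_LOG = """\
{"timestamp": "2016-06-08T09:30:00Z", "user": "admin@example.com", "action": "terminate_cluster", "details": {"cluster": "etl"}}
{"timestamp": "2016-06-08T09:20:00Z", "user": "analyst@example.com", "action": "attach_to_cluster", "details": {"cluster": "etl"}}
{"timestamp": "2016-06-08T09:15:00Z", "user": "admin@example.com", "action": "launch_cluster", "details": {"cluster": "etl"}}
"""


def parse_audit_log(text):
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def actions_by_user(records, user):
    """Return the given user's records, oldest first.

    Sorting the timestamp strings works because they all share the same
    ISO 8601 UTC layout in this sample.
    """
    matching = [r for r in records if r["user"] == user]
    return sorted(matching, key=lambda r: r["timestamp"])


records = parse_audit_log(SAMPLE_LOG)
for record in actions_by_user(records, "admin@example.com"):
    print(record["timestamp"], record["action"], record["details"])
```

The same kind of filtering and ordering could be expressed as a query on the Databricks platform itself once the real log schema is known.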
Making Big Data Simple (and Secure)
Databricks’ vision is to empower anyone to easily build and deploy advanced analytics solutions. With the Databricks Enterprise Security Framework, Databricks can satisfy the diverse (and sometimes competing) needs to secure big data in the modern enterprise, end-to-end. Phase One is only the beginning; stay tuned for more advances in the near future.
Interested in securing your Apache Spark workloads with Databricks? Test drive the platform with a free trial or contact us for a personalized demo.