Security and Trust Center

Your data security is our priority


We know that data is one of your most valuable assets and always has to be protected — that’s why security is built into every layer of the Databricks Lakehouse Platform. Our transparency enables you to meet your regulatory needs while taking advantage of our platform.

Perform your own self-service security review of Databricks using our due diligence package, which includes documentation and compliance materials.

Wehkamp
“With simplified administration and governance, the Databricks platform has allowed us to bring data-based decision-making to teams across our organization. The ease of adding users, native security integrations with cloud providers and APIs-for-everything has enabled us to bring the data and tools we need to every employee in Wehkamp.”

— Tom Mulder, Lead Data Scientist at Wehkamp

Arden Street Labs
“The nearly dozen solutions we have developed are all built on Azure Databricks as a core foundation. This has allowed us to leverage a rapid Lab to Operations deployment pattern, whilst maintaining data security and computational scalability.”

— Jeff Feldman, CTO of Arden Street Labs

Credit Suisse
“Despite the increasing embrace of big data and AI, most financial services companies still experience significant challenges around data types, privacy and scale. Credit Suisse is overcoming these obstacles by standardizing on open, cloud-based platforms, including Azure Databricks, to increase the speed and scale of operations and ML across the organization.”

— Credit Suisse case study


Trust

Our trusted platform is built by embedding security throughout the software development and delivery lifecycle. We follow rigorous operational security practices such as penetration testing, vulnerability assessments and strong internal access controls. We believe transparency is the key to winning trust — we publicly share how we operate and work closely with our customers and partners to address their security needs.

Contractual commitment

Beyond the documentation and best practices you will find on our Security and Trust Center, we also provide a contractual commitment to security to all our customers. This commitment is captured in the Security Addendum, which is part of our customer agreement. The Security Addendum describes in clear language a list of security measures and practices we follow to keep your data safe.

Vulnerability management

Detecting and quickly fixing vulnerable software is among the most important responsibilities for any software or service provider, whether the vulnerability exists in your code or the software that you rely on. We take this responsibility very seriously, and provide information about our remediation timelines in our Security Addendum.

Internally we use several well-known security scanning tools to identify vulnerabilities within the platform. Databricks also employs third-party services to analyze our public-facing internet sites and identify potential risks. Severity-0 vulnerabilities, such as zero days that are known to be actively exploited, are treated with the highest urgency, and their fix is prioritized above all other rollouts.

Penetration testing and bug bounty

We perform penetration testing through a combination of an in-house offensive security team, qualified third-party penetration testers and a year-round public bug bounty program. We typically perform 8-10 external third-party penetration tests and 15-20 internal penetration tests per year. We publicly share a platform-wide third-party test report as part of our due diligence package.

We are committed to helping customers gain confidence in the workloads they run on Databricks. If your team would like to run a pen test against Databricks, we encourage you to:

  • Run vulnerability scans within the data plane systems located in your cloud service provider account.
  • Run tests against your own code, provided that those tests are entirely contained within the data plane (or other systems) located in your cloud service provider account and are evaluating your own controls.
  • Participate in the bug bounty program.

Join the Databricks Bug Bounty program facilitated via HackerOne and get access to a deployment of Databricks that isn’t used by live customers.

Internal access

We apply strict policies and controls to internal employee access to our production systems, customer environments and customer data.

We require multifactor authentication to access core infrastructure consoles such as the cloud service provider consoles (AWS, GCP and Azure). Databricks has policies and procedures to avoid the use of explicit credentials, such as passwords or API Keys, wherever possible. For example, only appointed security members can process exception requests for new AWS IAM principals or policies.

Databricks employees can access a production system under very specific circumstances. Any access requires authentication via a Databricks-built system that validates access and performs policy checks. Access requires that employees be on our VPN, and our single sign-on solution requires multifactor authentication.
Learn more →

Our internal security standards implement separation of duties wherever possible. For example, we centralize our cloud identity provider’s authentication and authorization process to separate authorizing access (Mary should access a system) from granting access (Mary now can access a system).

We prioritize least privileged access, both in internal systems and for our access to production systems. Least privilege is explicitly built into our internal policies and reflected in our procedures. For example, most customers can control Databricks employee access to their workspace, and we automatically apply numerous checks before access can be granted and automatically revoke access after a limited time.
Learn more →

Secure software development lifecycle

Databricks has a software development lifecycle (SDLC) that builds security into all steps, from feature requests to production monitoring, supported by tooling designed to trace a feature through the lifecycle. We have automatic security scanning of systems, libraries and code, and automated vulnerability tracking.

Databricks uses an Ideas Portal that tracks feature requests and allows both customers and employees to vote on them. Our feature design process includes privacy and security by design. After an initial assessment, high-impact features are subject to a Security Design Review by a security expert in engineering, along with threat modeling and other security-specific checks.

We use an agile development methodology and break up new features into multiple sprints. Databricks does not outsource the development of the Databricks platform, and all developers are required to go through secure software development training, including the OWASP Top 10 at hire and annually thereafter. Production data and environments are separated from the development, QA and staging environments. All code is checked into a source control system that requires single sign-on with multifactor authentication, with granular permissions. Code merge requires approval from the functional engineering owners of each area impacted, and all code is peer reviewed.

We run quality checks (such as unit tests and end-to-end tests) at multiple stages of the SDLC process, including at code merge, after code merge, at release and in production. Our testing includes positive tests, regression tests and negative tests. Once deployed, we have extensive monitoring to identify faults, and users can get alerts about system availability via the Status Page. In the event of any P0 or P1 issue, Databricks automation triggers a “5 whys” root cause analysis methodology that selects a member of the postmortem team to oversee the review, and follow-ups are tracked.

We use best-of-breed tools to identify vulnerable packages or code. Automation in a preproduction environment runs authenticated host and container vulnerability scans of the operating system and installed packages, along with dynamic and static code analysis scans. Engineering tickets are created automatically for any vulnerabilities and assigned to relevant teams. The product security team also triages critical vulnerabilities to assess their severity in the Databricks architecture.

Databricks has a formal release management process that includes a formal go/no-go decision before releasing code. Changes go through testing designed to avoid regressions and validate that new functionality has been tested on realistic workloads. Additionally, there is a staged rollout with monitoring to identify issues at early stages. To implement separation of duties, only our deployment management system can release changes to production, and multi-person approval is required for all deployments.

We follow the immutable infrastructure model, where systems are replaced rather than patched, to improve reliability and security by avoiding the risk of configuration drift. When new system images or application code is launched, we transfer workloads to new instances with the new code. This is true both for the control plane and the data plane (see Security Features section for more on the Databricks architecture). Once code is in production, a verification process confirms that artifacts are not added, removed or changed.

The last phase of the SDLC process is creating customer-facing documentation. Databricks docs are managed similarly to code, where the documentation is stored within the same source control system. Significant changes require technical review as well as review from the docs team before they can be merged and published.
Visit documentation →

Network access Cloud

Option to deploy into a VPC/VNet that you manage and secure. By default there are no inbound network connections to the data plane.

AWS, Azure

Private access (or private link) from users or clients to the Databricks control plane UI and APIs

AWS, Azure

Private access (or private link) from the classic data plane to the Databricks control plane

AWS, Azure

Private access (or private link) from the classic data plane to data on the cloud platform

AWS, Azure

IP access lists to control access to the Databricks control plane UI and APIs over the internet (see the sketch below this table)

AWS, Azure, GCP

Automatic host-based firewalls that restrict communication

AWS, Azure, GCP
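
As a rough illustration of the IP access list feature noted above, the sketch below calls the IP Access Lists REST endpoint with the Python requests library. The workspace URL, token and CIDR range are placeholders, and the sketch assumes IP access lists are already enabled for the workspace; validate the payload against the current API documentation before relying on it.

```python
# Minimal sketch: creating an allow list for the workspace UI/APIs via the
# IP Access Lists REST API. All values below are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<admin-personal-access-token>"                          # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/ip-access-lists",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "label": "corp-egress",               # friendly name for the list
        "list_type": "ALLOW",                 # ALLOW or BLOCK
        "ip_addresses": ["203.0.113.0/24"],   # example CIDR (TEST-NET-3)
    },
)
resp.raise_for_status()
print(resp.json())
```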

User and group administration Cloud

Use the cloud service provider's identity management for seamless integration with cloud resources

AWS, Azure, GCP

Support for Azure Active Directory Conditional Access Policies

Azure (AWS / GCP not applicable)

SCIM provisioning to manage user identities and groups (sketch below this table)

AWS, Azure, GCP

Single Sign-On with identity provider integration (you can enable MFA via the identity provider)

AWS (Azure / GCP not applicable*)

Service principals or service accounts to manage application identities for automation

AWS, Azure, GCP

User account locking to temporarily disable a user’s access to Databricks

AWS (Azure / GCP not applicable*)

Disable local passwords with password permission

AWS (Azure / GCP not applicable*)
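
For the SCIM provisioning capability above, the sketch below creates a single user through the SCIM endpoint. The workspace URL, token and user name are placeholders; in practice, provisioning is usually driven by your identity provider rather than ad hoc scripts.

```python
# Minimal sketch: creating a user via the Databricks SCIM API. Placeholders only.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<admin-personal-access-token>"                          # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Users",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/scim+json",
    },
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "jane.doe@example.com",  # placeholder identity
    },
)
resp.raise_for_status()
print(resp.json().get("id"))  # SCIM id of the newly provisioned user
```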

Access management Cloud

Fine-grained permission-based access control to all Databricks objects, including workspaces, jobs, notebooks and SQL

AWS, Azure, GCP

Secure API access with personal access tokens, with permission management (sketch below this table)

AWS, Azure, GCP

OAuth token support

Azure, GCP

Segment users, workloads and data with different security profiles in multiple workspaces

AWS, Azure, GCP
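
To illustrate token-based API access from the table above, the sketch below authenticates REST calls with a personal access token and then mints a short-lived token through the Token API. The URL and token values are placeholders.

```python
# Minimal sketch: using a personal access token (PAT) as a bearer credential.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Any REST call authenticates by presenting the PAT as a bearer token.
clusters = requests.get(f"{WORKSPACE_URL}/api/2.0/clusters/list", headers=HEADERS)
clusters.raise_for_status()

# Tokens can be created with a bounded lifetime (here, one hour) so that
# automation credentials expire automatically.
new_token = requests.post(
    f"{WORKSPACE_URL}/api/2.0/token/create",
    headers=HEADERS,
    json={"lifetime_seconds": 3600, "comment": "short-lived automation token"},
)
new_token.raise_for_status()
```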

Data security Cloud

Encryption of control plane data at rest

AWS, Azure, GCP

Customer-managed keys encryption available

AWS, Azure

Encryption in transit of all communications between the control plane and data plane

AWS, Azure, GCP

Intra-cluster Spark encryption in transit or platform-optimized encryption in transit

AWS, Azure

Fine-grained data security and masking with dynamic views (see the sketch below this table)

AWS, Azure, GCP

Admin controls to limit risk of data exfiltration

AWS, Azure, GCP
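
For the dynamic views capability above, here is a minimal masking sketch. It assumes a notebook context where `spark` is predefined, and the table, view and group names are placeholders; `is_member()` checks group membership for the user running the query.

```python
# Minimal sketch: a dynamic view that reveals a column only to members of a
# designated group and masks it for everyone else. Names are placeholders.
spark.sql("""
    CREATE OR REPLACE VIEW customers_masked AS
    SELECT
      id,
      CASE WHEN is_member('pii_readers') THEN email
           ELSE '***REDACTED***' END AS email
    FROM customers_raw
""")
```

Because membership is evaluated at query time, the same view can safely serve both privileged and non-privileged readers.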

Data governance Cloud

Fine-grained data governance with Unity Catalog

AWS, Azure

Centralized metadata and user management with Unity Catalog

AWS, Azure

Centralized data access controls with Unity Catalog

AWS, Azure

Data lineage with Unity Catalog

Preview on AWS and Azure

Data access auditing with Unity Catalog

AWS, Azure

Secure data sharing with Delta Sharing

AWS, Azure

Workload security Cloud

Manage code versions effectively with repos

AWS, Azure, GCP

Built-in secret management to avoid hardcoding credentials in code (sketch below this table)

AWS, Azure, GCP

Managed data plane machine image regularly updated with patches, security scans and basic hardening

AWS, Azure (GCP not applicable)

Contain costs, enforce security and validation needs with cluster policies

AWS, Azure, GCP

Immutable short-lived infrastructure to avoid configuration drift

AWS, Azure, GCP
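
As a sketch of the secret management capability above, the snippet below reads a database password from a secret scope instead of embedding it in code. The scope, key and JDBC details are placeholders, and `dbutils` and `spark` are assumed to be the notebook-provided objects.

```python
# Minimal sketch: pulling a credential from a Databricks secret scope at run time.
# Secret values are redacted when displayed in notebook output.
jdbc_password = dbutils.secrets.get(scope="prod-warehouse", key="jdbc-password")

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")  # placeholder
    .option("user", "etl_user")                                    # placeholder
    .option("password", jdbc_password)                             # never hardcoded
    .option("dbtable", "public.orders")
    .load()
)
```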

Auditing and logging Cloud

Comprehensive and configurable audit logging of Databricks user activities (see the sketch below this table)

AWS, Azure, GCP

Databricks SQL command history logging

AWS, Azure

Databricks cluster logging

AWS, Azure
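
To show how delivered audit logs might be consumed, the sketch below loads them with Spark and filters for login events. The storage path and field names follow the commonly documented audit log schema, but treat both as assumptions and adjust them to your own log delivery configuration; `spark` is assumed to be the notebook-provided session.

```python
# Minimal sketch: querying delivered audit logs for recent login events.
# Path and schema fields are assumptions; verify against your delivery setup.
logs = spark.read.json("s3://my-audit-bucket/audit-logs/")  # placeholder path

(
    logs.where("serviceName = 'accounts' AND actionName = 'login'")
    .select("timestamp", "userIdentity.email", "sourceIPAddress")
    .orderBy("timestamp", ascending=False)
    .show(20, truncate=False)
)
```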

Security validations (Compliance) Cloud

ISO 27001, 27017, 27018 compliance

AWS, Azure, GCP

SOC 2 Type 2 report available

AWS, Azure, GCP

GDPR and CCPA compliance

AWS, Azure, GCP

PCI DSS-compliant deployments

AWS (Single Tenant only)

FedRAMP Moderate compliance

AWS (coming soon), Azure

FedRAMP High compliance

Azure

HIPAA-compliant deployments

AWS, Azure

HITRUST

Azure

* Azure Databricks is integrated with Azure Active Directory, and Databricks on GCP is integrated with Google Identity. You can’t configure these in Databricks itself, but you can configure Azure Active Directory or Google Identity as needed.

Security Best Practices

Databricks has worked with thousands of customers to securely deploy the Databricks platform, with the security features that meet their architecture requirements. This document provides a checklist of security practices, considerations and patterns that you can apply to your deployment, learned from our enterprise engagements.

View document for AWS and GCP

Databricks Security and Trust Overview Whitepaper

The Security Overview Whitepaper is designed to provide a summary of all aspects of Databricks for security teams to quickly review.

View document

Databricks Security Documentation

Databricks includes documentation on how to operate our security features and best practices to help our customers deploy quickly and securely. The documentation is targeted primarily at teams that deploy or use Databricks.

Access documentation for AWS, GCP or Azure

Platform Architecture

The Databricks Lakehouse architecture is split into two separate planes to simplify your permissions, avoid data duplication and reduce risk. The control plane is the management plane where Databricks runs the workspace application and manages notebooks, configuration and clusters. Unless you choose to use serverless compute, the data plane runs inside your cloud service provider account, processing your data without taking it out of your account. You can embed Databricks in your data exfiltration protection architecture using features like customer-managed VPCs/VNets and admin console options that disable export.

While certain data, such as your notebooks, configurations, logs and user information, is present within the control plane, that information is encrypted at rest within the control plane, and communication to and from the control plane is encrypted in transit. You also have choices for where certain data lives: You can host your own store of metadata about your data tables (Hive metastore), store query results in your cloud service provider account, and decide whether to use the Databricks Secrets API.

Suppose you have a data engineer who signs in to Databricks and writes a notebook that transforms raw data in Kafka into a normalized data set sent to storage such as Amazon S3 or Azure Data Lake Storage. Six steps make that happen:

  1. The data engineer seamlessly authenticates, via your single sign-on if desired, to the Databricks web UI in the control plane, hosted in the Databricks account.
  2. As the data engineer writes code, their web browser sends it to the control plane. JDBC/ODBC requests also follow the same path, authenticating with a token.
  3. When ready, the control plane uses Cloud Service Provider APIs to create a Databricks cluster, made of new instances in the data plane, in your CSP account. Administrators can apply cluster policies to enforce security profiles.
  4. Once the instances launch, the cluster manager sends the data engineer’s code to the cluster.
  5. The cluster pulls from Kafka in your account, transforms the data in your account and writes it to storage in your account.
  6. The cluster reports status and any outputs back to the cluster manager.

The data engineer doesn’t need to worry about many of the details — they simply write the code and Databricks runs it.
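
As a hedged illustration of steps 2 and 3, the sketch below sends a token-authenticated request asking the control plane to launch a cluster in your data plane, optionally pinned to a cluster policy. The workspace URL, token, runtime version, node type and policy ID are placeholders.

```python
# Minimal sketch: a token-authenticated request to the control plane that creates
# a cluster in the customer's data plane. All values are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_name": "etl-kafka-to-storage",
        "spark_version": "11.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder instance type
        "num_workers": 2,
        "policy_id": "<cluster-policy-id>",   # optional: enforce a security profile
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```

The optional policy_id ties the request to an administrator-defined cluster policy, which is how the security profiles mentioned in step 3 are enforced.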

Compliance

Customers all over the world trust us with their most sensitive data. Databricks has put in place controls to meet the unique compliance needs of highly regulated industries.

Due diligence package

For self-service security reviews, you can download our due diligence package. It includes common compliance documents such as our ISO certifications and our annual pen test confirmation letter. You can also reach out to your Databricks account team for copies of our Enterprise Security Guide and SOC 2 Type II report.

Download

Certifications and standards


Overview

Databricks takes privacy seriously. We understand that the data you analyze using Databricks is important both to your organization and your customers, and may be subject to a variety of privacy laws and regulations.

To help you understand how Databricks fits into regulatory frameworks that may apply to you, we’ve prepared Privacy FAQs and documents that transparently set forth how Databricks approaches privacy.


Help investigate a security incident in your Databricks workspace

If you suspect your workspace data may have been compromised or you have noticed inconsistencies or inaccuracies in your data, please report it to Databricks ASAP.

Report SPAM or suspicious communications originating from Databricks

If you have received SPAM or any communications that you believe are fraudulent, or that have inappropriate, improper content or malware, please contact Databricks ASAP.

Understand an internal vulnerability scanner report against a Databricks product

For help analyzing a vulnerability scan report, please raise a support request through your Databricks support channel, submitting the product version, any specific configuration, the specific report output and how the scan was conducted.

Understand how a CVE impacts a Databricks workspace or runtime

If you need information on the impact of a third-party CVE or a Databricks CVE, please raise a support request through your Databricks support channel and provide the CVE description, severity and references found on the National Vulnerability Database (NVD).

Report a bug in Databricks products or services

If you have found a reproducible vulnerability in any of our products, we want to know so that we can resolve it. Please join our public bug bounty program facilitated by HackerOne.


HIPAA

HIPAA is a US regulation that includes a variety of protections for protected health information. Databricks offers HIPAA-compliant deployment options.

Supported Clouds and Regions

Azure Multi-Tenant — All regions

AWS Single Tenant — All regions

AWS Multi-Tenant — us-east-1, us-east-2, ca-central-1, us-west-2