Security and Trust Center
Your data security is our priority
We know that data is one of your most valuable assets and always has to be protected — that’s why security is built into every layer of the Databricks Lakehouse Platform. Our transparency enables you to meet your regulatory needs while taking advantage of our platform.
Trust comes through transparency. See how we protect the Databricks Lakehouse Platform through industry-leading practices, including penetration testing, vulnerability management and secure software development. Learn more
We provide comprehensive security capabilities to protect your data and workloads, such as encryption, network controls, auditing, identity integration, access controls and data governance. Learn more
Customers all over the world and across industries rely on the Databricks Lakehouse Platform. We have the certifications and attestations to meet the unique compliance needs of highly regulated industries. Learn more
Our trusted platform is built by embedding security throughout the software development and delivery lifecycle. We follow rigorous operational security practices such as penetration testing, vulnerability assessments and strong internal access controls. We believe transparency is the key to winning trust — we publicly share how we operate, and work closely with our customers and partners to address their security needs. We have offerings for PCI-DSS, HIPAA and FedRAMP compliance, and we are ISO 27001, ISO 27017, ISO 27018 and SOC 2 Type II compliant.
Beyond the documentation and best practices that you will find in our Security and Trust Center, we also provide a contractual commitment to security written in plain language to all our customers. This commitment is captured in the Security Addendum of our customer agreement, which describes the security measures and practices that we follow to keep your data safe.
Detecting and quickly fixing vulnerable software that you rely on is among the most important responsibilities of any software or service provider. We take this responsibility seriously and share our remediation timeline commitments in our Security Addendum.
Internally, we have automated vulnerability management to effectively track, prioritize, coordinate and remediate vulnerabilities in our environment. We perform daily authenticated vulnerability scans of Databricks and third-party/open-source packages used by Databricks, along with static and dynamic code analysis (SAST and DAST) using trusted security scanning tools, before we promote new code or images to production. Databricks also employs third-party experts to analyze our public-facing sites and report potential risks.
Databricks has funded a Vulnerability Response Program for monitoring emerging vulnerabilities before they’re reported to us by our scanning vendors. We accomplish this using internal tools, social media, mailing lists and threat intelligence sources (e.g., US-CERT and other government, industry and open-source feeds). Databricks monitors open vulnerability platforms, such as CVE Trends and Open CVDB. We have an established process for responding to these so we can quickly identify the impact on our company, product or customers. This program allows us to quickly reproduce reported vulnerabilities and resolve zero-day vulnerabilities.
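To illustrate the kind of feed monitoring this involves (a minimal sketch, not a description of our internal tooling), the example below polls the public NVD CVE API for recently published entries matching a keyword; the keyword and look-back window are hypothetical.

```python
import datetime
import requests

# Public NVD CVE API (v2.0); the keyword and look-back window below are illustrative only.
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def recent_cves(keyword: str, days: int = 7) -> list:
    """Return CVE records published in the last `days` days that match `keyword`."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=days)
    params = {
        "keywordSearch": keyword,
        "pubStartDate": start.isoformat(timespec="milliseconds"),
        "pubEndDate": end.isoformat(timespec="milliseconds"),
    }
    resp = requests.get(NVD_API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("vulnerabilities", [])

for item in recent_cves("spark"):
    cve = item["cve"]
    print(cve["id"], cve["descriptions"][0]["value"][:120])
```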
Our Vulnerability Management Program is committed to treating Severity-0 vulnerabilities, such as zero days, with the highest urgency, prioritizing their fix above other rollouts.
Penetration testing and bug bounty
We perform penetration testing through a combination of our in-house offensive security team, qualified third-party penetration testers and a year-round public bug bounty program. We use a mixture of fuzzing, secure code review and dynamic application testing to evaluate the integrity of our platform and the security of our application. We conduct penetration tests on major releases, new services and security-sensitive features. The offensive security team works with our incident response team and security champions within engineering to resolve findings and infuse learnings throughout the company.
We typically perform 8-10 external third-party penetration tests and 15-20 internal penetration tests per year, and all material findings must be addressed before a test can be marked as passed. As part of our commitment to transparency, we publicly share our platform-wide third-party test report in our due diligence package.
Our public bug bounty program, facilitated by HackerOne, allows a global collective of cybersecurity researchers and penetration testers to test Databricks for security vulnerabilities. Some of the key decisions we’ve made to make the program successful include:
- Encouraging an engaged community of hackers to be active on our program by providing transparency to our HackerOne program statistics such as response rate and payouts
- Promptly responding to bug bounty submissions, with an average time-to-bounty under a week
- Performing variant analysis on every valid submission to identify alternative ways that an exploit may be used, and verifying 100% of fixes
- Adding bonuses that drive attention to the most important areas of the product
We work hard to make our program successful and to learn from each submission. Our open and collaborative approach to our bug bounty program has resulted in over 100 security researchers being thanked for over 200 reports. Thank you all for helping us keep Databricks secure!
We want our customers to have confidence in the workloads they run on Databricks. If your team would like to run a vulnerability scan or penetration test against Databricks, we encourage you to:
- Run vulnerability scans on data plane systems located inside of your cloud service provider account.
- Run tests against your code, provided that those tests are entirely contained within the data plane (or other systems) located in your cloud service provider account and are evaluating your controls.
- Join the Databricks Bug Bounty program to access a dedicated deployment of Databricks to perform penetration tests. Any penetration test against our multi-tenant control plane requires participation in the program.
Security investigations and incident response
We use Databricks as our SIEM and XDR platform, processing over 9 terabytes of data per day for detection and security investigations. We ingest and process logs and security signals from cloud infrastructure, devices, identity management systems and SaaS applications. We use structured streaming pipelines and Delta Live Tables to identify the most relevant security events, applying a data-driven approach and statistical ML models to generate novel alerts and to correlate, de-duplicate and prioritize existing alerts from known security products. We model our runbooks on adversary tactics, techniques and procedures (TTPs) tracked using the MITRE ATT&CK framework. Our security investigations team uses collaborative Databricks notebooks to create repeatable investigation processes, continually evolve incident investigation playbooks and perform threat hunting across more than 2 petabytes of historic event logs, running complex searches over unstructured and semi-structured data.
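As a hedged sketch of what this kind of streaming detection logic can look like (not our actual pipelines), the example below uses Spark Structured Streaming on Databricks to flag bursts of failed logins per account and source IP; the table names, column names and threshold are hypothetical.

```python
from pyspark.sql import functions as F

# `spark` is predefined in Databricks notebooks; the source/target tables and
# the threshold below are hypothetical.
auth_events = spark.readStream.table("security.raw.auth_events")

failed_login_bursts = (
    auth_events
    .where(F.col("outcome") == "FAILURE")
    .withWatermark("event_time", "30 minutes")
    .groupBy(F.window("event_time", "10 minutes"), "user_name", "source_ip")
    .count()
    .where(F.col("count") >= 20)  # many failures in a short window: possible brute force
)

(failed_login_bursts.writeStream
    .outputMode("append")  # emit each window once it is finalized by the watermark
    .option("checkpointLocation", "/tmp/checkpoints/failed_login_bursts")
    .toTable("security.alerts.failed_login_bursts"))
```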
Our incident response team stays up to date and helps Databricks prepare for incident management scenarios by:
- Participating in industry-reputed courses from vendors like SANS and attending security conferences like fwd:cloudsec, Black Hat, BSides, RSA
- Performing regular tabletop exercises with executive leadership and internal teams to practice security response scenarios relevant to Databricks products and corporate infrastructure
- Collaborating with engineering teams to prioritize platform observability to allow effective security detection and response
- Regularly updating hiring and training strategies based on an evolving incident response skills and capabilities matrix
We apply strict policies and controls to internal employee access to our production systems, customer environments and customer data.
We require multifactor authentication to access core infrastructure consoles such as the cloud service provider consoles (AWS, GCP and Azure). Databricks has policies and procedures to avoid the use of explicit credentials, such as passwords or API keys, wherever possible. For example, only appointed security team members can process exception requests for new AWS IAM principals or policies.
Databricks employees can access production systems under very specific circumstances (such as emergency break-fix). Access is governed by a Databricks-built system that validates access and performs policy checks. Access requires that employees be connected to our VPN and authenticate using our single sign-on solution with multifactor authentication.
Learn more →
Our internal security standards call for the separation of duties wherever possible. For example, we centralize our cloud identity provider’s authentication and authorization process to separate authorizing access (Mary should access a system) from granting access (Mary can now access a system).
We prioritize least privilege access, both in internal systems and for our access to production systems. Least privilege is explicitly built into our internal policies and reflected in our procedures. For example, most customers can control whether Databricks employees have access to their workspace, and we programmatically apply numerous checks before access can be granted and automatically revoke access after a limited time.
Learn more →
Secure software development lifecycle
Databricks has a software development lifecycle (SDLC) that builds security into all design, development and production steps — from feature requests to production monitoring — supported by tooling designed to trace a feature through the lifecycle. We have automatic security scanning and automated vulnerability tracking of systems, libraries and code.
Databricks leverages an Ideas Portal that tracks feature requests and allows both customers and employees to vote on them. Our feature design process includes privacy and security by design. After an initial assessment, high-impact features undergo a security design review by the product security team in partnership with security champions from engineering, along with threat modeling and other security-specific checks.
We use an agile development methodology that breaks up new features into multiple sprints. Databricks does not outsource the development of the Databricks platform, and all developers are required to go through secure software development training — including the OWASP Top 10 — when hired and annually thereafter. Production data and environments are separated from development, QA and staging environments. All code is checked into a source control system that requires single sign-on with multifactor authentication and granular permissions. Code merges require approval from the functional engineering owners of each area impacted, and all code is peer reviewed. The product security team manually reviews security-sensitive code to eliminate business logic errors.
We use best-of-breed tools to identify vulnerable packages or code. Automation in a preproduction environment runs authenticated host and container vulnerability scans of the operating system and installed packages, along with dynamic and static code analysis scans. Engineering tickets are created automatically for any vulnerabilities and assigned to relevant teams. The product security team also triages critical vulnerabilities to assess their severity in the Databricks architecture.
We run quality checks (such as unit tests and end-to-end tests) at multiple stages of the SDLC process, including at code merge, after code merge, at release and in production. Our testing includes positive tests, regression tests and negative tests. Once deployed, we have extensive monitoring to identify faults, and users can get alerts about system availability via the Status Page. In the event of any P0 or P1 issue, Databricks automation triggers a “5 whys” root cause analysis methodology that selects a member of the postmortem team to oversee the review. Findings are communicated to executive leadership, and follow-up items are tracked.
Databricks has a formal release management process that includes a formal go/no-go decision before releasing code. Changes go through testing designed to avoid regressions and validate that new functionality has been tested on realistic workloads. Additionally, there is a staged rollout with monitoring to identify issues early. To implement separation of duties, only our deployment management system can release changes to production, and multiperson approval is required for all deployments.
We follow an immutable infrastructure model, where systems are replaced rather than patched to improve reliability and security and to avoid the risk of configuration drift. When new system images or application code is launched, we transfer workloads to new instances that launch with the new code. This is true both for the control plane and the data plane (see the Security Features section for more on the Databricks architecture). Once code is in production, a verification process confirms that artifacts are not added, removed or changed without authorization.
The final phase of the SDLC process is creating customer-facing documentation. Databricks docs are managed much like our source code, and documentation is stored within the same source control system. Significant changes require both technical and docs team review before they can be merged and published.
Visit documentation →
Security Policy and Communication Details
Databricks follows RFC 9116, ISO/IEC 30111:2019(E), and ISO/IEC 29147:2018(E) standards for security vulnerability handling and communications. For details on our secure communications and PGP signature, please refer to our security.txt file.
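For reference, RFC 9116 places that file at a well-known path, so it can be fetched and read directly; the sketch below assumes the standard /.well-known/security.txt location on www.databricks.com.

```python
import requests

# RFC 9116 publishes the security policy file at a well-known path on the web server.
url = "https://www.databricks.com/.well-known/security.txt"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

# Print the standard fields (Contact, Expires, Encryption, Policy, ...), skipping comments.
for line in resp.text.splitlines():
    if line.strip() and not line.startswith("#"):
        print(line)
```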
| Security feature | Scope |
| --- | --- |
| User and group administration | Cloud |
| Auditing and logging | Cloud |
| Security validations (Compliance) | Cloud |
* Azure Databricks is integrated with Azure Active Directory, and Databricks on GCP is integrated with Google Identity. You can’t configure these in Databricks itself, but you can configure Azure Active Directory or Google Identity as needed.
Databricks has worked with thousands of customers to securely deploy the Databricks platform, with the security features that meet their architecture requirements. This document provides a checklist of security practices, considerations and patterns that you can apply to your deployment, learned from our enterprise engagements.
The Security Analysis Tool (SAT) monitors your workspace hardening by reviewing deployments against our security best practices. It programmatically verifies workspaces using standard API calls and reports deviations by severity, with links that explain how to improve your security.
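As a minimal sketch of the kind of programmatic check such a tool performs (not SAT's actual implementation), the example below calls the Databricks Clusters API with a personal access token and flags clusters without auto-termination; the workspace URL and token are placeholders you would supply.

```python
import os
import requests

# Placeholders: point these at your own workspace before running.
HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<your-workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # a personal access token

resp = requests.get(
    f"{HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Treat clusters without auto-termination as a simple hardening deviation.
for cluster in resp.json().get("clusters", []):
    if cluster.get("autotermination_minutes", 0) == 0:
        print(f"{cluster['cluster_name']}: auto-termination is disabled")
```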
The Security Overview Whitepaper is designed to provide a summary of all aspects of Databricks for security teams to quickly review.
The Databricks shared responsibility model outlines the security and compliance obligations of both Databricks and the customer with respect to the data and services on the Databricks platform.
The Databricks Lakehouse architecture is split into two separate planes to simplify your permissions, avoid data duplication and reduce risk. The control plane is the management plane where Databricks runs the workspace application and manages notebooks, configuration and clusters. Unless you choose to use serverless compute, the data plane runs inside your cloud service provider account, processing your data without taking it out of your account. You can embed Databricks in your data exfiltration protection architecture using features like customer-managed VPCs/VNets and admin console options that disable export.
While certain data, such as your notebooks, configurations, logs and user information, is present within the control plane, that information is encrypted at rest within the control plane, and communication to and from the control plane is encrypted in transit. You also have choices for where certain data lives: You can host your own store of metadata about your data tables (Hive metastore), store query results in your cloud service provider account, and decide whether to use the Databricks Secrets API.
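For example, rather than embedding credentials in notebook code, a notebook can read them at runtime through the Secrets API; the scope and key names below are hypothetical.

```python
# Inside a Databricks notebook, where `dbutils` is predefined.
# The scope "prod-kafka" and key "bootstrap-password" are hypothetical names.
kafka_password = dbutils.secrets.get(scope="prod-kafka", key="bootstrap-password")

# The value is redacted in notebook output and can be passed to connection
# options without the credential ever appearing in the code itself.
```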
Suppose a data engineer signs in to Databricks and writes a notebook that transforms raw data in Kafka into a normalized data set written to storage such as Amazon S3 or Azure Data Lake Storage. Six steps make that happen:
- The data engineer seamlessly authenticates, via your single sign-on if desired, to the Databricks web UI in the control plane, hosted in the Databricks account.
- As the data engineer writes code, their web browser sends it to the control plane. JDBC/ODBC requests also follow the same path, authenticating with a token.
- When ready, the control plane uses Cloud Service Provider APIs to create a Databricks cluster, made of new instances in the data plane, in your CSP account. Administrators can apply cluster policies to enforce security profiles (a sample policy is sketched after this list).
- Once the instances launch, the cluster manager sends the data engineer’s code to the cluster.
- The cluster pulls from Kafka in your account, transforms the data in your account and writes it to storage in your account.
- The cluster reports status and any outputs back to the cluster manager.
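As a hedged illustration of the cluster policy mentioned in step 3, a policy is a JSON document that constrains what users can set when creating clusters; the sketch below defines a hypothetical policy that pins auto-termination and caps cluster size, and submits it through the Cluster Policies REST API (names, values and placeholders are illustrative).

```python
import json
import os
import requests

# Placeholders: point these at your own workspace before running.
HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Hypothetical policy: force a 60-minute auto-termination and cap cluster size.
policy_definition = {
    "autotermination_minutes": {"type": "fixed", "value": 60},
    "num_workers": {"type": "range", "maxValue": 10},
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "secure-default", "definition": json.dumps(policy_definition)},
    timeout=30,
)
resp.raise_for_status()
print("Created policy:", resp.json().get("policy_id"))
```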
The data engineer doesn’t need to worry about many of the details — they simply write the code and Databricks runs it.
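To make the walkthrough concrete, here is a hedged sketch of what such a notebook might contain: it reads a stream from Kafka, normalizes a couple of fields and writes the result to object storage as a Delta table. The broker, topic, schema, column names and paths are all hypothetical.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Hypothetical Kafka broker, topic, schema and storage paths; `spark` is
# predefined in Databricks notebooks.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker-1.internal:9092")
       .option("subscribe", "clickstream")
       .load())

normalized = (raw
    .select(F.from_json(F.col("value").cast("string"), schema).alias("event"))
    .select("event.*")
    .withColumn("action", F.lower(F.col("action"))))

(normalized.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/clickstream")
    .start("s3://my-bucket/normalized/clickstream"))
```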
Customers all over the world trust us with their most sensitive data. Databricks has put in place controls to meet the unique compliance needs of highly regulated industries.
Due diligence package
For self-service security reviews, you can download our due diligence package. It includes common compliance documents such as our ISO certifications and our annual pen test confirmation letter. You can also reach out to your Databricks account team for copies of our Enterprise Security Guide and SOC 2 Type II report. Download
Certifications and standards
Databricks takes privacy seriously. We understand that the data you analyze using Databricks is important both to your organization and your customers, and may be subject to a variety of privacy laws and regulations.
To help you understand how Databricks fits into regulatory frameworks that may apply to you, we’ve prepared Privacy FAQs and documents that transparently set forth how Databricks approaches privacy.
Help investigate a security incident in your Databricks workspace
If you suspect your workspace data may have been compromised or you have noticed inconsistencies or inaccuracies in your data, please report it to Databricks ASAP.
Report SPAM or suspicious communications originating from Databricks
If you have received SPAM or any communications that you believe are fraudulent, or that have inappropriate, improper content or malware, please contact Databricks ASAP.
Understand an internal vulnerability scanner report against a Databricks product
For help analyzing a vulnerability scan report, please raise a support request through your Databricks support channel, submitting the product version, any specific configuration, the specific report output and how the scan was conducted.
Understand how a CVE impacts a Databricks workspace or runtime
If you need information on the impact of a third-party CVE, or a Databricks CVE, please raise a support request through your Databricks support channel, and provide the CVE description, severity and references found in the National Vulnerability Database.
Report a bug in Databricks products or services
If you have found a reproducible vulnerability in any of our products, we want to know so that we can resolve it. Please join our public bug bounty program facilitated by HackerOne.
HIPAA is a US regulation that includes a variety of protections for protected health information. Databricks has HIPAA-compliant deployment options.