Atlassian Customer Story

Powering faster, scalable threat detection, hunting and incident response

- <1 minute to query 21+ billion security events
- 12x longer instant access to key security logs for response and hunting
- 80% reduction in operational ingest costs managing the data lake


As a global leader in team collaboration software, Atlassian supports over 300,000 customers who rely on its cloud platforms to scale their work every day. Security, data and AI are foundational to how the company operates at global scale. As Atlassian accelerates its leadership in AI and cloud services, it is optimizing its security infrastructure to maintain peak operational efficiency. This proactive modernization enhances the company’s visibility into complex data patterns while ensuring a cost-effective, high-performance foundation for its next generation of products. Project Banyan marked a turning point. By building a modern Security Lakehouse on the Databricks Platform, Atlassian moved from reactive, tool-bound security operations to an open, scalable foundation designed for long-horizon investigations, high-volume analytics and AI-driven detection. What was once operationally impractical—querying billions of events interactively, retaining months of security telemetry and experimenting with machine learning-driven detection—is now part of Atlassian’s evolving security workflow. The result is a more flexible and sustainable foundation that strengthens today’s security posture while preparing the organization for increasingly complex threats.

The limits of a legacy SIEM for modern security needs

Atlassian is a global leader in team collaboration software, powering products like Jira and Confluence that millions of teams rely on to plan, build and scale their work. As Atlassian continues to advance its core strategic priority of cloud and AI innovation, data has become central not only to product development, but also to securing the platform itself. To support this evolution, Atlassian launched Project Banyan, a multi–business unit initiative focused on modernizing how security data is ingested, stored and analyzed. The goal was to build a next-generation Security Lakehouse capable of supporting the future of AI-driven detection, long-term retention and large-scale threat hunting.

At the center of this initiative was a fundamental shift away from traditional SIEM architectures toward an open, scalable data foundation that supports Atlassian’s growing cloud footprint and AI ambitions. As Niels Heijmans, Chief Security Architect at Atlassian, explained, “Project Banyan was a long time coming. We were on a legacy SIEM solution that stored the data in proprietary storage mechanisms and only had 30-day retention. So we could only look back to all the log records of a source for 30 days live and otherwise had to rehydrate them from cold storage which cost time and money.”

While effective for basic log aggregation and alerting, the legacy SIEM increasingly constrained Atlassian’s ability to operate at scale and move toward more proactive, intelligence-driven security. Data retention and scalability were immediate challenges. The legacy platform limited historical visibility, making long-horizon investigations and advanced threat hunting difficult. As data volumes grew, those constraints became more pronounced. “We were also under pressure from the licensing model, which charged per gigabyte ingested. Our data volumes were driving up license consumption,” added Niels.

Beyond cost, the platform struggled to handle the sheer volume of data generated by Atlassian’s cloud environment. That limitation came to a head during a major security incident. According to Niels, “The ultimate catalyst was a security incident where we had to process close to a petabyte of data. As we were rehydrating our logs, the legacy SIEM simply couldn’t do it. It would just fall over.” That moment clarified that the team needed an architecture built for investigations at scale, not just ongoing monitoring. In the middle of that incident, Atlassian was forced to change course. “During that incident, we had to switch gears and go to Databricks. Without Databricks, we wouldn’t have been able to rehydrate and analyze the data.”

In addition to scalability challenges, Atlassian faced structural limitations tied to proprietary tooling. The legacy SIEM relied on a vendor-specific query language, which constrained both flexibility and future readiness. “We wanted to prevent being locked in again,” said Niels. “We want to be on an open data model. The data has to be ours.” That lock-in extended beyond querying. It limited collaboration and made skills less transferable, creating friction for teams trying to build more advanced analytics.

Most critically, the platform could not support Atlassian’s move toward risk-based alerting and clustering. According to Niels, “We wanted to prevent alert fatigue, but that was not possible on our legacy SIEM. We did not have the capability to do any machine learning-based threat or anomaly detection to make more meaningful risk scored threat alerts.”

Taken together, these challenges made clear that incremental optimization of the existing SIEM would not be enough. Atlassian needed a fundamentally different foundation that could support massive data volumes, longer retention, open analytics and future AI use cases without locking the company into rigid tooling or escalating costs. Project Banyan became the vehicle for that shift, setting the stage for a new Security Lakehouse architecture designed to scale with Atlassian’s cloud-first, AI-driven future.

Building a modern Security Lakehouse on the Databricks Platform

To achieve this, Atlassian deepened its years-long strategic partnership with Databricks, which already powered multiple mission-critical data platforms across the organization, including Atlassian’s internal data lakehouse. Just recently, the company launched Atlassian Analytics for customer-facing insights on the Databricks Platform.

Project Banyan established a comprehensive architecture centered around the principles of clean data, lower operational cost and minimal vendor lock-in. Using these principles, they built a modern Security Lakehouse on Atlassian’s next-generation internal data platform, Socrates vNext, supported by Databricks. The objective was to create an open, governed and future-ready architecture. Security data would be owned by Atlassian, queried flexibly and extended over time as new analytics and AI capabilities emerged.

“With Project Banyan, we moved beyond the constraints of legacy SIEM to a modern Security Lakehouse built to scale with our cloud growth,” explained David Cross, CISO at Atlassian.

A unified, governed security data foundation

Atlassian chose to consolidate its security data onto Socrates vNext to avoid building a separate, siloed security platform. Instead, security teams leverage the same Databricks-based analytics foundation used across the company, while enforcing strict governance and role-based access controls appropriate for highly sensitive data. “With Unity Catalog, governance is enforced through centralized controls that restrict access to security data to only authorized teams. The data is sensitive because we look (amongst other things) for insider threats,” explained Niels.

This approach allowed Atlassian to scale security analytics without fragmenting its data strategy or duplicating infrastructure.

Standardizing security data with OCSF

A key architectural decision within the Security Lakehouse was Atlassian’s adoption of the Open Cybersecurity Schema Framework (OCSF). OCSF is an open, vendor-agnostic framework that standardizes how cybersecurity event data is structured across tools and platforms. By providing a common schema, it reduces custom data integration work, improves interoperability and accelerates threat detection and investigation. By standardizing security data upstream before ingestion, Atlassian eliminated inconsistencies that previously slowed analysis and increased cognitive overhead for analysts. “Now it’s cleaned, organized and efficiently stored,” said Niels. “Analysts don’t first have to understand each unique data source and clean the data before they can work with it.”

Standardization also improved storage efficiency and ensured that all downstream analytics, detections and AI use cases operate on a consistent, well-structured foundation. Storing normalized events as Parquet rather than raw JSON strings reduced file sizes by roughly 20%, which translated directly into lower data storage costs.
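The normalization step can be illustrated with a simplified sketch. The field names below follow OCSF conventions for the Authentication event class, but the flattened layout, source fields and helper function are hypothetical, not Atlassian’s actual pipeline:

```python
# Hypothetical mapping from a vendor-specific login event to an OCSF-style
# Authentication record (class_uid 3002). The flat field names are simplified
# for illustration; real OCSF uses nested objects like actor.user and src_endpoint.

def to_ocsf_auth(raw: dict) -> dict:
    """Normalize a vendor login event into a flat OCSF-like schema."""
    return {
        "class_uid": 3002,                                    # OCSF Authentication class
        "time": raw["ts"],                                    # epoch millis from the source
        "activity_id": 1 if raw["action"] == "login" else 2,  # 1 = Logon, 2 = Logoff
        "actor_user_name": raw["user"],
        "src_endpoint_ip": raw["client_ip"],
        "status_id": 1 if raw["result"] == "success" else 2,  # 1 = Success, 2 = Failure
    }

vendor_event = {"ts": 1718000000000, "action": "login",
                "user": "alice", "client_ip": "203.0.113.7", "result": "failure"}
print(to_ocsf_auth(vendor_event))
```

Once every source emits the same schema, a single detection or investigation query can run across tools without per-source cleanup, which is the interoperability gain OCSF is designed for.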

Modern detection engineering with PySpark

With data standardized and governed, Atlassian began migrating existing detections from the legacy SIEM into Databricks as PySpark queries. This shift allowed detection logic to be expressed using open, widely adopted technologies rather than proprietary query languages. Niels outlined the benefits this provided the business, saying, “You’re not becoming a tool expert. You’re becoming a technology expert in something that’s widely used across data engineering and analysis.”

Running detections as code also unlocked collaboration patterns, including shared libraries, reusable functions and consistent query styles across teams. According to Niels, “We now have a default Python library created with all kinds of functions that people are able to reuse. That wasn’t possible before and people had to copy-paste query standards from pages.” Databricks Notebooks and collaboration spaces were essential for cross-team collaboration, making it easy for non-technical business users to explore data, provided they have the appropriate permissions under the Unity Catalog governance implementation.
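A detection expressed as code might look like the following plain-Python sketch. In production this logic would run as a PySpark query over the lakehouse; the threshold, event shape and function name here are illustrative assumptions, not Atlassian’s actual detection:

```python
# A detection written as code rather than a proprietary query language:
# flag accounts with more failed logins than a threshold. A PySpark version
# would do the same groupBy/count/filter over lakehouse tables.
from collections import Counter

FAILED_LOGIN_THRESHOLD = 3  # illustrative threshold, not a real tuned value

def detect_brute_force(events: list[dict], threshold: int = FAILED_LOGIN_THRESHOLD) -> list[str]:
    """Return user names whose failed-login count exceeds the threshold."""
    failures = Counter(e["user"] for e in events if e["status"] == "failure")
    return sorted(u for u, n in failures.items() if n > threshold)

events = (
    [{"user": "alice", "status": "failure"}] * 5
    + [{"user": "bob", "status": "failure"}] * 2
    + [{"user": "bob", "status": "success"}]
)
print(detect_brute_force(events))  # ['alice']
```

Because the detection is an ordinary function, it can live in the shared library Niels describes, be unit-tested, and be reused across teams instead of copy-pasted.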

Long-term retention and scalable analysis

By leveraging Delta Lake as the storage layer for its Security Lakehouse, Atlassian expanded its ability to retain security log data well beyond the limitations of the legacy SIEM, from 30 days to 12 months. This longer retention window gives analysts the historical depth required for complex investigations and long-horizon threat hunting. While retaining more data increases overall data volume, Atlassian views this as a necessary tradeoff to improve analyst productivity and investigative effectiveness. The team applies mindful filters and aggregations and prioritizes data sources carefully, using a log prioritization rubric that scores each source against a variety of inputs to determine whether it is eligible for the data lake. This keeps the data lake focused on signals, not noise.
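The idea of a log prioritization rubric can be sketched as a simple weighted scoring function. The criteria, weights and cutoff below are placeholders, since the actual inputs to Atlassian’s rubric are not described in detail:

```python
# Illustrative log-prioritization rubric: score each source on hypothetical
# criteria and admit it to the lake only above a cutoff. Positive weights
# reward detection value; the negative weight penalizes high-volume noise.

RUBRIC_WEIGHTS = {"detection_value": 3, "incident_history": 2, "volume_cost": -2}

def score_source(source: dict, weights: dict = RUBRIC_WEIGHTS) -> int:
    """Weighted sum of a source's criterion ratings (0 if a criterion is unrated)."""
    return sum(w * source.get(k, 0) for k, w in weights.items())

def eligible_for_lake(source: dict, cutoff: int = 5) -> bool:
    return score_source(source) >= cutoff

auth_logs = {"detection_value": 3, "incident_history": 2, "volume_cost": 1}
debug_logs = {"detection_value": 1, "incident_history": 0, "volume_cost": 3}
print(eligible_for_lake(auth_logs), eligible_for_lake(debug_logs))  # True False
```

A rubric like this makes the signal-versus-noise tradeoff explicit and repeatable: every candidate source is scored the same way before it is allowed to consume retention budget.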

Conversational incident response

To further reduce friction for analysts, Atlassian introduced conversational incident response via AI/BI Genie. Known internally as the OCSF Genie, this chatbot allows analysts to ask natural language questions of the security data.

The result is a more inclusive and efficient incident response workflow that accelerates investigations without sacrificing rigor, while providing starting-point queries that analysts can build on later in their investigations. According to Niels, “What it allows junior analysts or new folks to do is quickly explore and ask questions like, ‘I’m looking for this IP address within this date range,’ and the Genie converts that into queries.” This approach removes barriers for analysts who may not yet be fluent in SQL or PySpark, while still allowing them to learn by inspecting the generated queries.

A foundation for advanced AI-driven security

Beyond current detections and investigations, Atlassian designed the Security Lakehouse to serve as a foundation for future AI-driven security use cases. By integrating security data with Atlassian’s broader machine learning platform on Databricks, the team can experiment with more advanced approaches to threat detection.

These models focus on identifying unusual patterns, such as impossible travel or compromised credentials, enabling Atlassian to move toward a more proactive security posture. “The goal was to have an unsupervised machine learning model that generates anomaly alerts for staff account access behavior,” said Niels. “We can find anomalies to protect our business and our customers from bad actors. Databricks helped us upskill in that area.”
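As a toy stand-in for anomaly detection on access behavior, the sketch below flags a day whose access count deviates sharply from a user’s baseline. The production model Niels describes is unsupervised and far more sophisticated; the z-score rule, cutoff and numbers here are invented for illustration:

```python
# Minimal anomaly flagging for access behavior: mark today's count as
# anomalous if it sits more than z_cutoff standard deviations above the
# user's historical baseline. A real system would model far richer features.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, z_cutoff: float = 3.0) -> bool:
    """True if `today` exceeds the baseline by more than z_cutoff std devs."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return (today - mu) / sigma > z_cutoff

baseline = [10, 12, 9, 11, 10, 13, 10]  # hypothetical daily access counts
print(is_anomalous(baseline, 11))   # a normal day
print(is_anomalous(baseline, 120))  # a sudden spike
```

Even this crude baseline-and-deviation pattern shows why anomaly scoring helps with alert fatigue: instead of alerting on every event, the system surfaces only behavior that departs meaningfully from what is normal for that account.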

By processing bigger data sets with a more efficient compute cost model, Niels’ team can evaluate more threat signals and unlock ML capabilities for predictive threat detection. These combined capabilities form a cohesive Security Lakehouse architecture that supports Atlassian’s immediate security needs while positioning the organization to adopt more advanced AI and automation over time.

Unlocking scalable security, AI readiness and analyst productivity

The migration to the Databricks Security Lakehouse through Project Banyan represents a strategic evolution in Atlassian’s security operations. By modernizing how security data is ingested and analyzed, Atlassian has significantly enhanced the agility with which its security teams hunt threats and operationalize detections at cloud scale.

A primary driver for Project Banyan was the move toward sub-minute interactivity for large-scale security telemetry. While legacy systems were designed for traditional workloads, the new lakehouse architecture allows security teams to interrogate massive datasets with unprecedented speed. As Niels noted, “The lakehouse architecture in Databricks allows us to query 21 billion rows in less than a minute. We’ve moved from the standard latencies of legacy SIEMs to a highly responsive, interactive environment.”

This performance leap enables a more proactive incident response posture. By retrieving complex answers in seconds, analysts can accelerate decision-making cycles, ensuring that potential risks are identified and addressed with greater precision.

Atlassian also achieved significant gains through architectural optimization. Complex detection queries that previously relied on traditional batch processing now benefit from high-performance data layouts. “Through optimizations like Z-ordering and proper partitioning, we’ve transitioned complex analytical tasks from multi-hour batch cycles to near real-time results,” added Niels.
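Partition pruning, one of the layout optimizations behind these gains, can be shown in miniature: when files are laid out by a partition key such as event date, a date-bounded query only has to touch the matching files. This plain-Python sketch mimics the effect with hypothetical file names; Delta Lake applies the same idea (plus Z-ordering within files) at vastly larger scale:

```python
# Partition pruning in miniature: data files are grouped under partition
# directories keyed by event date, so a query filtered on one date can
# skip every non-matching partition entirely.

files = {  # hypothetical partition -> file listing
    "event_date=2024-06-01": ["part-000.parquet"],
    "event_date=2024-06-02": ["part-001.parquet", "part-002.parquet"],
    "event_date=2024-06-03": ["part-003.parquet"],
}

def files_to_scan(files: dict, wanted_date: str) -> list[str]:
    """Return only the files in the partition matching the query predicate."""
    return files.get(f"event_date={wanted_date}", [])

print(files_to_scan(files, "2024-06-02"))  # only 2 of 4 files are read
```

Skipping files before reading them is what turns multi-hour batch scans into interactive queries: the engine does proportionally less I/O as the predicate narrows.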

Further acceleration via Databricks Photon has streamlined high-frequency detection workloads even further. According to Niels, “Photon reduced query times from 17 seconds to just five seconds.” This efficiency directly translates to lower latency for alerting, ensuring security analysts have the high-fidelity data they need exactly when they need it.

Project Banyan also established a more scalable and predictable economic model for security data. By collaborating with Databricks on ingestion architecture, Atlassian optimized the processing of continuous, high-volume logs. “We achieved an 80% reduction in ingestion overhead while maintaining peak performance,” said Niels.

The transition to the lakehouse also improved the predictability of security analysis costs. By moving detections and investigations to open-standard Python and PySpark workloads, Atlassian shifted away from the restrictive licensing models of proprietary tools. This allows the team to focus on high-value security signals and deep investigations without the financial volatility associated with legacy data consumption.

Just as importantly, adopting open, industry-standard technologies has streamlined operations and further upskilled the security team, reinforcing Atlassian’s commitment to an intelligence-driven security posture.

The new architecture ensures that security visibility scales alongside Atlassian’s growth. Rather than managing infrastructure constraints, the team can now ingest more diverse data sources and extend retention periods, supporting advanced analytics and a "Cloud-Trust" strategy that is built for the future.

The future: AI agents

With the ingestion and OCSF standardization phases complete, Atlassian is now executing the high-value final stages of Project Banyan: the deployment of next-generation Detection and Incident Response capabilities.

Atlassian is pioneering the use of the Databricks Platform to develop Autonomous Agents that utilize the Security Lakehouse as a high-fidelity data source. These agents are designed to accelerate detection and response efforts by applying unsupervised learning techniques to proactively identify emerging threats and anomalies. Furthermore, Atlassian has integrated its AI Assistant, Rovo, with Databricks Genie spaces. This integration empowers security staff to query complex security telemetry using natural language, enabling autonomous threat hunting and real-time alerting at a level of sophistication previously reserved for manual expert analysis.

Project Banyan optimizes Atlassian’s security architecture for future-proof scale and economic efficiency, positioning the company to harness the full potential of AI for modern enterprise security. The foundation laid by the Databricks Platform ensures that Atlassian continues to lead in security innovation, unlocking advanced insights across a growing ecosystem of over 9,000 monthly active users.

Looking ahead, Atlassian is actively evaluating Lakewatch as the next evolution of its security data strategy. As a private preview partner, Atlassian is providing design feedback to further streamline high-volume security operations. Lakewatch introduces a direct-to-lakehouse ingestion model, allowing security events to be written straight into tables. This architecture significantly enhances the efficiency of continuous data streams and leverages built-in OCSF transformations to standardize data at the point of ingestion, further accelerating downstream analytics.

Together, these capabilities represent a strategic expansion of Atlassian’s security platform. They build upon the robust foundation established through Project Banyan to support faster investigations, greater automation and future AI-driven security workflows.