
Read the Goldman Sachs perspective from the Goldman Developer Blog here

Read the FINOS perspective here

Amid increasing market volatility and rising geopolitical tensions, trading volumes have skyrocketed and created new data challenges for even the largest global investment banks. The most common challenges we've seen in financial services are (1) how to move faster in research in order to "stay ahead of the curve and the markets," and (2) how to ensure the robustness and reproducibility of models and the algorithms built on them. Companies that are slow to uncouple themselves from legacy technologies, on-premises infrastructures, and proprietary formats are often held back by the inflexibility and limitations of their tech stacks.

As a result, various data providers and consumers within the financial services industry have combined efforts to establish open data standards, with the hope of simplifying data management, reducing operational costs, and automating high data governance standards that guarantee both reliability and timeliness in the transmission, acquisition, and calculation of financial data.

To foster innovation and collaboration between engineers and non-engineers, as well as address data efficiency and governance challenges in the Financial Services industry, Databricks is announcing the open source integration of Lakehouse for Financial Services with the FINOS Legend data modeling platform, originally contributed and maintained by Goldman Sachs. FINOS is the nonprofit organization and financial sector arm of the Linux Foundation, enabling mass innovation through open source technology, with members from the world's leading FSIs including Goldman Sachs, Morgan Stanley, UBS and JP Morgan. Over the past two years, a total of 197 open source contributors have pushed over 6,400 commits to the Legend codebase and submitted 2,400 Pull Requests, adding 292,000 lines of code.

Integration with the Legend data management and data governance platform enhances the impact of Databricks Lakehouse for Financial Services, an open, modern data platform that supports real-time analytics, business intelligence (BI), and powerful AI capabilities across all data types while mitigating regulatory risk in a multi-cloud environment. Databricks offers three solutions to map business processes to data pipelines and analytics:

  1. Code developed by Databricks for Legend software: Using the newly open sourced legend-delta project, Databricks demonstrates how the Legend logical modeling language can be programmatically interpreted as Delta tables, helping business analysts and domain experts design, provision and operate a financial services Lakehouse with minimal development and operations overhead. Delta tables can be created from existing Legend data models, financial calculations and aggregations can be pushed down and executed through Databricks at enterprise scale, and data quality rules can be enforced in real time as new financial data becomes available. Additionally, with the Databricks relational connector, Legend can now integrate with Databricks databases from within the Legend Studio interface, reducing the gap between business users and technology practitioners.
  2. Interpret common data models into data pipelines: Common data models built using Legend ensure continuous quality control and relevance of regulatory reporting and compliance. We will demonstrate how the ISDA Common Domain Model (ISDA CDM™) integrates seamlessly with the Databricks Lakehouse environment in an upcoming technical blog post. The ISDA CDM, soon to be hosted as a FINOS open source project, is a machine-readable and machine-executable data model for derivative products, processes and calculations and serves as a blueprint for how derivatives are traded and managed across the trade lifecycle. Having a single, common digital representation of derivatives trade events and actions enhances consistency and facilitates interoperability across firms and platforms, providing a bedrock upon which new technologies can be applied.
  3. Interoperability for an open, collaborative financial services ecosystem: Ultimately, these common data models can be combined with open data protocols, enabling interoperability between and within organizations across the financial ecosystem. Over time, the simple, open and collaborative platform of Lakehouse can be embedded into the data mesh infrastructure, upholding the four key principles of domain-driven ownership of data; data as a product; self-serve data platform; and federated computational governance, with Legend acting as the facilitator of data exchange within an organization, and enabling collaboration between business units.
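
To make the first item above more concrete, here is a minimal sketch of what "interpreting a logical model as a Delta table" can look like in principle. It is not the legend-delta API: the `Property` and `LogicalClass` types, the `to_delta_ddl` helper, the type mapping, and the example `Trade` model are all hypothetical, introduced only for illustration. The sketch turns a typed model with named constraints into a `CREATE TABLE ... USING DELTA` statement plus `ADD CONSTRAINT ... CHECK` statements, which is how Delta Lake expresses enforced data quality rules in SQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for a logical model; NOT the Legend or legend-delta API.
@dataclass
class Property:
    name: str
    type: str              # logical type name, e.g. "String", "Float"
    required: bool = True  # required properties become NOT NULL columns


@dataclass
class LogicalClass:
    name: str
    properties: List[Property]
    # constraint name -> SQL boolean expression over the columns
    constraints: Dict[str, str] = field(default_factory=dict)


# Assumed mapping from logical type names to Spark SQL types.
TYPE_MAP = {
    "String": "STRING",
    "Integer": "BIGINT",
    "Float": "DOUBLE",
    "Boolean": "BOOLEAN",
    "StrictDate": "DATE",
    "DateTime": "TIMESTAMP",
}


def to_delta_ddl(model: LogicalClass, schema: str = "default") -> List[str]:
    """Return SQL statements that create a Delta table for the model."""
    table = f"{schema}.{model.name.lower()}"
    columns = []
    for prop in model.properties:
        nullability = " NOT NULL" if prop.required else ""
        columns.append(f"  {prop.name} {TYPE_MAP[prop.type]}{nullability}")
    statements = [
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        + ",\n".join(columns)
        + "\n) USING DELTA"
    ]
    # Each model constraint becomes an enforced CHECK constraint on the table.
    for constraint_name, expression in model.constraints.items():
        statements.append(
            f"ALTER TABLE {table} ADD CONSTRAINT {constraint_name} CHECK ({expression})"
        )
    return statements


# Illustrative model only; the fields are made up for this example.
trade = LogicalClass(
    name="Trade",
    properties=[
        Property("trade_id", "String"),
        Property("notional", "Float"),
        Property("settlement_date", "StrictDate", required=False),
    ],
    constraints={"positive_notional": "notional > 0"},
)

ddl = to_delta_ddl(trade, schema="fsi")
for statement in ddl:
    print(statement + ";")
```

On a cluster, each generated statement could be passed to `spark.sql(...)`. In the real integration the model comes from Legend rather than hand-written Python objects, and the translation covers far more than this sketch does.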

The benefits for financial institutions, particularly the banking and capital markets sector, include the ability to:

  • Automatically translate enterprise data models and calculations into efficient data pipelines using the Databricks connector, removing the need for engineers to hand-code calculations and models
  • Compile a Legend model into an execution plan and provide data access to financial analysts and data scientists through their own environments, in the format, quality and aggregation designed by domain experts
  • Provide constant data monitoring and continuous improvement of data quality through CI/CD processes

"Immediately after its open source contribution by Goldman Sachs in 2020, Legend became a cornerstone FINOS project and, through its hosted version, has powered an unprecedented amount of open data modeling with industry-wide collaboration. We are extremely excited to see members like Databricks providing open source integrations for the platform, as financial services firms have much to gain from its adoption as the potential for its use to reduce financial burdens and needless complexity is nearly unlimited," said Gabriele Columbro, executive director of FINOS.

"By integrating Legend with Databricks' Lakehouse for Financial Services, we are bringing greater transparency and interoperability to financial institutions across the industry who can now leverage common data models and open source protocols to fuel collaboration and drive business value with data," said Junta Nakai, Global Head of Financial Services and Sustainability at Databricks. "Databricks is proud to contribute to the development of the industry's leading open source data platform and we look forward to continued partnership with the teams at Goldman Sachs and FINOS."

"The code contribution from the Databricks team is a great example of the spirit of FINOS – collaboration and innovation in the financial services industry through open source software. This is in addition to meeting the ever-increasing data modeling requirements from data sourcing needs and a great example of the continued evolution and addition of partners to our open source programming," says Ephrim Stanley, VP, Data Engineering, Goldman Sachs, "Thanks to the contribution from Databricks, Legend can now integrate with Databricks databases."

As the pandemic spurred market volatility, data transparency and oversight have become top-of-mind for many financial institutions looking to make the most of their data while also staying compliant with changing regulations. Investing in technologies built on AI/ML must be an integral part of a financial institution's long-term growth strategy – one that is not only innovative to meet today's standards, but also forward-thinking and adaptable enough to meet future needs.

What's next?

Databricks continues to participate in FINOS go-to-market activities, including fine-tuning regulatory technology for open data standards and open-source technologies, and creating advisory services to support the democratization of data access and ongoing training on data and AI. For more information on Databricks Lakehouse, watch my Legend demo virtual session from our Data+AI Summit.

To learn more about FINOS, visit To read more about the Legend data modeling platform, start with these resources:
