This year’s Data + AI Summit was groundbreaking overall, from the quality of the keynote speakers to the game-changing product news. One of the most exciting additions was our new hybrid industry tracks, with sessions and forums for attendees across six of the largest industries at Databricks, including Public Sector!
In case you missed the live event, I’m excited to share important product announcements and highlights of the industry program. Our sessions, which are now on-demand, feature Databricks employees, customers, and partners sharing their views of the Lakehouse for Public Sector and why it has been a key component for government agencies looking to modernize their data strategy to deliver more insights and support the mission of government.
Public Sector Forum
For our government attendees, the most exciting part of Data + AI Summit 2022 was the Public Sector Forum – a two-hour event that brought together leaders from across all segments of government to hear from peers about their data journey.
In his keynote, Databricks VP of Federal, Howard Levenson, shared an overview of the lakehouse and how it delivers on the promise of both the Federal Data Strategy and the DoD Data Decrees.
In a fireside chat with CDC Chief Data Officer Alan Sim and CDC Chief Architect Rishi Tarar, attendees learned about the agency’s COVID-19 vaccine rollout and the challenges the CDC addressed by providing near real-time insight to the public, hospitals, and state and local agencies. The CDC was also announced as the winner of the 2022 Data Democratization Award for its work supporting the vaccine rollout and partnering with state and local agencies and medical partners to monitor the spread and treatment of COVID-19.
The forum included an executive panel featuring Fredy Diaz, Analytics Director at the USPS Office of the Inspector General, and Dr. John Scott, Acting Director of Data Management and Analytics at the Veterans Health Administration, who discussed their agencies’ adoption of the lakehouse and the impact it has had on their missions.
Concluding the session, Cody Ferguson, Data Operations Director at DoD Advana, and Brad Corwin, Chief Data Scientist at Booz Allen Hamilton, shared an in-depth overview of the DoD Advanced Analytics Platform, Advana, and the capabilities it has delivered to the Department of Defense.
All sessions are now available on our virtual platform. Here are a few you don’t want to miss:
LA County, Department of Human Resources – How the Largest US County is Transforming Hiring with A Modern Data Lakehouse
US Air Force – Safeguarding Personnel Data at Enterprise Scale
Veterans Affairs – Cloud and Data Science Modernization with Azure Databricks
Deloitte – Implementing a Framework for Data Security at a Large Public Sector Agency
State of CA, CalHEERS – Data Lake for State Health Exchange Analytics Using Databricks
Databricks Announcements That Will Transform the Public Sector
While much has been written about the innovations shared by Databricks at this year’s Data + AI Summit, I thought I would provide a quick recap of the news that is particularly exciting for our government customers:
Data Management and Engineering
Delta Lake 2.0 – now fully open source.
This announcement is extremely relevant to our Public Sector customers. Both the DoD Data Decrees and the Federal Data Strategy stress the importance of choosing open source solutions for the Public Sector; by taking this step, Databricks further demonstrates its commitment to developing a lakehouse foundation that is secure, open, and interoperable. Government customers can be sure that:
- Your data is in an open storage format in YOUR object store
- Your code is managed via CI/CD and lives in YOUR GitHub repo
- Your applications leverage open source APIs
- There is no code or data lock-in. We lock you in with value:
  - The infrastructure savings of running your application faster and turning off your cloud compute sooner
  - The productivity gains of leveraging our platform to do your development and production work
  - The mission outcomes that you can unlock, with a very quick time to value
Delta Live Tables introduces enhanced Auto Scaling. This is going to be a game changer for our Public Sector customers, many of whom have asked for a way to optimize cluster utilization and reduce infrastructure costs automatically, without manual intervention. Enhanced Auto Scaling pairs the two improvements that matter most here: it speeds up how quickly public sector customers can build pipelines to ingest and curate their data, and it does so in the most cost-effective way, with no manual cluster tuning.
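As a rough sketch of what this looks like in practice, a Delta Live Tables pipeline opts into enhanced autoscaling in its pipeline settings; the worker bounds below are illustrative placeholders, not sizing recommendations:

```json
{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
        "mode": "ENHANCED"
      }
    }
  ]
}
```

With `mode` set to `ENHANCED`, the pipeline scales workers up and down between the configured bounds based on load, rather than running a fixed-size cluster around the clock.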
The information on Project Lightspeed shared at the conference is incredibly relevant to our public sector customers, who have seen a significant increase in the need to gain insight from streaming data in real time. With use cases spanning every segment of government, from visa processing and supply chain management to electronic health records and postal delivery, the combined power of Delta Live Tables (DLT) and Structured Streaming holds great potential for the public sector. In addition, the focus on leveraging streaming data insight at petabyte-scale volumes enables government agencies to mitigate cyber threats and meet the requirements laid out in OMB M-21-31. All in all, the ease of use and flexibility of this solution are unmatched, and we’re excited to offer it to our Public Sector customers.
Governance and Data Sharing
Delta Sharing is now GA. Delta Sharing is a phenomenal technical solution to enable some amazing outcomes for the government. Intergovernmental data sharing has become more critical than ever, as highlighted by the COVID-19 pandemic most recently. In order to address complex challenges that require the collaboration of multiple Federal agencies, state and local governments, and commercial partners, it is critical that government agencies have a way to securely share data to achieve outcomes that will benefit all constituents.
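Because Delta Sharing is built on an open protocol, a recipient agency does not need to run Databricks to consume shared data; it only needs a credentials profile issued by the data provider. A profile is a small JSON file along these lines (the endpoint and token below are placeholders):

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.gov/delta-sharing/",
  "bearerToken": "<token-issued-by-provider>"
}
```

With that profile in hand, open source connectors such as the `delta-sharing` Python client or the Spark connector can list the shares the recipient has been granted and load shared tables directly into pandas or Spark DataFrames.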
The announcement of Cleanrooms provides an opportunity for the government as agencies begin to share data more openly. The win is the ability to share data across agencies without sacrificing data ownership and data governance, ultimately leading to better mission outcomes.
Also shared were updates around Unity Catalog, which address the number one goal of many Federal CDOs today – the need for a well-cataloged and governed data platform. In addition, many of our catalog partners will be able to take advantage of Unity’s existing API standards to leverage governance on top of the lakehouse. Because Public Sector customers care particularly about data lineage, they will celebrate having a greater understanding of the data sources that make up reports and tables.
Data Science and Machine Learning
Lastly, we announced MLflow 2.0, which includes MLflow Pipelines, a significant advantage for public sector data teams when they need to operationalize a model. MLflow Pipelines provides a structured framework that enables teams to automate the handoff from exploration to production so that ML engineers no longer have to juggle manual code rewrites and refactoring. MLflow Pipeline templates scaffold pre-defined graphs with user-customizable steps and natively integrate with the rest of MLflow’s model lifecycle management tools. Pipelines also provide helper functions, or “step cards,” to standardize model evaluation and data profiling across projects. The net of this is that a Public Sector organization can put a model into production significantly faster.
Beyond these featured announcements, there was other exciting news about Databricks Marketplace and Serverless Model Endpoints. I encourage you to check out the Day 1 and Day 2 Keynotes to learn more about our product announcements!