Why an academic health system fixed data at the source before betting on AI
by Aly McGue
Healthcare may be one of the greatest beneficiaries of AI. Few industries generate as much data, and few have as much to gain from extracting insight from it. But the gap between generating data and actually using it to improve care, accelerate research, and run operations more efficiently remains enormous in most health systems. The ones closing that gap are starting with data, not models.
NYU Langone Health, a leading academic health system, serves the greater New York area through patient care, medical research, and medical education. NYU Langone uses Databricks as its unified data and AI platform; it recently retired its on-premises data lake and is now migrating its enterprise data warehouse. The institution has built a broad community of clinicians, analysts, scientists, and corporate staff using the platform across care delivery, operations, and research.
Nader Mherabi, the Chief Digital and Information Officer at NYU Langone Health, has led the institution's data strategy since well before the current wave of AI, building the foundations for a data-driven health system. In 2017, he recognized that the quality of NYU Langone's data collection created an opportunity to push further with emerging AI capabilities.
The metaphor Nader returned to: If you want clean water, fix the pipes. Don't try to filter it at the end.
Aly McGue: NYU Langone is a metrics-driven organization with a mature data stack. When you already have a functional warehouse and data lake, what is the ‘missing piece’ that makes a move to a modern data platform necessary?
Nader Mherabi: Our path was a little different from some institutions. We've always been a highly data-driven, metrics-driven organization. We already had unified data in a data lake and an enterprise data warehouse, even in the traditional stack. So, the lift to a modern platform was easier for us than it might be for others.
But the imperative was clear. Back in 2017, we recognized that the potential of AI, even at that very early stage, meant we needed to modernize our data stack. It's one thing to build models. It's another thing to run them 24/7 in a safe, reliable way. We needed a platform that could help us realize our ambitions around patient quality, safety, efficiency, and medical research, and that could grow with us as the technology evolves.
One guiding principle we established over a decade ago is that if you really want high-quality data in your intelligence layer, you have to fix it at the transactional systems first. It's like water coming through the pipes. If you have clean water at the source, you don't have to keep filtering it at the end. Filtering dirty water is expensive. So, the goal should always be clean water first. Some things you'll still have to filter along the way, but the principle should be to get it right upstream.
Aly: How has the discipline of fixing data at the transactional level transformed the actual utility of your data layer?
Nader: Years ago, we had many systems with patient data scattered across multiple locations without unified identifiers. That's a huge challenge for data quality, and it limits what you can do with it. Part of our approach was to invest in common transactional platforms: One electronic health record and one ERP system. As we brought in new practices or hospitals, we invested in bringing everyone onto common platforms and then created guiding principles for data.
For example, we would never map data in the data warehouse layer. We always try to fix it at the source. We mastered the systems and the data so we know that this is the authoritative source for patient data, this is the source for financial data, this is the source for operational data. Once you do that, your data platform becomes much more meaningful. People can crosswalk data, which is critical in healthcare. Take a patient at the center: You need to connect their care data to what clinical trials are available, all the way through to the financial side, to specimens collected during surgery and where they physically sit. If you don't have that mapping, you're missing an enormous capability. The guiding principle that makes it possible is always the same: Fix it upstream.
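The crosswalk Nader describes can be sketched in miniature. The example below is a hypothetical illustration, not NYU Langone's actual schema: because each source system carries the same mastered patient identifier assigned upstream, a patient-centered view across clinical, trial, and specimen data reduces to simple lookups, with no downstream mapping layer.

```python
# Hypothetical illustration: once every source system carries the same
# mastered patient identifier, cross-domain joins become simple lookups.
# All record contents here are invented for the sketch.

clinical = [
    {"patient_id": "P001", "diagnosis": "type 2 diabetes"},
    {"patient_id": "P002", "diagnosis": "hypertension"},
]
trials = [
    {"patient_id": "P001", "trial": "NCT-EXAMPLE-42", "status": "eligible"},
]
specimens = [
    {"patient_id": "P001", "specimen": "tissue-7781", "freezer": "B-12"},
]

def crosswalk(patient_id):
    """Assemble one patient-centered view from independently mastered sources."""
    return {
        "patient_id": patient_id,
        "diagnoses": [r["diagnosis"] for r in clinical
                      if r["patient_id"] == patient_id],
        "trials": [r["trial"] for r in trials
                   if r["patient_id"] == patient_id],
        "specimens": [(r["specimen"], r["freezer"]) for r in specimens
                      if r["patient_id"] == patient_id],
    }

view = crosswalk("P001")
```

The point of the sketch is what is absent: no fuzzy matching, no mapping tables in the warehouse layer. When identity is mastered upstream, the intelligence layer only joins.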
Aly: In healthcare, the stakes for data accuracy are high. How does a unified data foundation prevent the ‘conflicting metrics’ debate between different departments, and why is that trust so critical when moving toward agentic AI systems?
Nader: It's huge. Even before AI, the gains from unified data were enormous. When your data is unified, you can create better metrics, and different sides of the business aren't coming in saying, "That number doesn't make sense." If your data isn't unified, your metrics will never line up.
With AI, of course, the stakes go up. If you don't have great data, you're not going to have great AI. Performance depends on data quality. And then there's the real-time dimension. Getting people's insight at the right time and the right place is what matters.
Aly: Once you have unified data, the next challenge is making it discoverable and trustworthy at scale. How does data governance fit into that?
Nader: It's fundamental. You need a catalog to operate on data and AI models. We use Unity Catalog, and we're continuing to push it further.
But the investment is not just in the tool, it's the strategy around it. You need to define your master data sources, decide who owns each part of the catalog, and then carefully consider how you expose it to the broader community so people can find what they need without duplicating work. It's one thing to have an enormous data program. It's another for people to actually find the right data within it. If you're adopting a platform like this, I would always suggest getting the catalog right from the start. It underpins everything else.
Aly: A unified platform only delivers value if people across the institution actually use it. How have you approached building that community beyond the data engineering team?
Nader: When you invest in a platform like this, you have to optimize the investment. For us, that means evangelizing what it can do across the institution. The goal is to become a learning health system, one that learns from every patient interaction and feeds that insight back into practice. That only works if the community using the platform extends well beyond IT. We've built a broad user base of clinicians, analysts, and scientists, all working within proper access controls, and we've invested in literacy programs and training to make sure people across care delivery, operations, and research can take advantage of it. Getting IT on the platform is a given. The real measure of success is whether the rest of the institution can use it, too.
Aly: In a high-acuity environment like an emergency room, ‘insight the day after’ is effectively useless. What are the architectural requirements for a platform to move from retrospective reporting to real-time clinical decision support that can actually prevent a misdiagnosis?
Nader: In care delivery, the impact is direct. We have models running in the emergency room that look for certain critical conditions and provide decision support in front of clinicians. The goal is to make sure that if a patient is being discharged, the system can flag: did you identify this diagnosis? Did you look at this? Because what we don't want is a patient leaving the emergency room with a condition that could have severe consequences if it's missed.
We all hear about cases at other institutions where a misdiagnosis leads to a bad outcome. We want real-time models that continuously run and provide the best advice to clinicians. Not replacing their judgment, but saying, "Hey, you may have overlooked this. Please take another look." For that to work, the models need real-time data. And that requires the data platform to support real-time feeds so the models can operate on current information and provide just-in-time insight.
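The pattern Nader describes, models running continuously on live feeds and surfacing a flag before discharge rather than the day after, can be sketched as a small event loop. Everything below is a hypothetical illustration: the model, the threshold, and the `lactate` feature are invented stand-ins, not NYU Langone's actual clinical logic.

```python
# Hypothetical sketch of real-time decision support: a model scores each
# event as it arrives, and a discharge event triggers a check against any
# unresolved high-risk flag. Threshold and feature names are invented.

RISK_THRESHOLD = 0.8

def score(event):
    """Stand-in for a deployed model; returns a risk score in [0, 1]."""
    return min(1.0, event.get("lactate", 0.0) / 4.0)

def process_stream(events):
    open_flags = {}  # patient_id -> highest unacknowledged risk score
    alerts = []
    for event in events:
        pid = event["patient_id"]
        if event["type"] == "observation":
            risk = score(event)
            if risk >= RISK_THRESHOLD:
                open_flags[pid] = max(risk, open_flags.get(pid, 0.0))
        elif event["type"] == "discharge" and pid in open_flags:
            # Surface the flag before the patient leaves, not the day after.
            alerts.append((pid, round(open_flags.pop(pid), 2)))
    return alerts

alerts = process_stream([
    {"patient_id": "P001", "type": "observation", "lactate": 3.6},
    {"patient_id": "P002", "type": "observation", "lactate": 1.0},
    {"patient_id": "P001", "type": "discharge"},
    {"patient_id": "P002", "type": "discharge"},
])
```

The architectural requirement the sketch makes concrete: the alert can only fire at the discharge event if the observations were ingested and scored as they happened, which is why the platform must support real-time feeds rather than overnight batch loads.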
Aly: How has AI transformed how your organization approaches analytics and BI strategy?
Nader: I believe analytics has three layers. First, you do have to provide some basic visualization. You can't just say, "What do you want to look at?" People need some structured starting points. Second, you add the conversational layer, tools like Genie, where people can get curious and ask deeper questions. And third, you need to be able to deliver the answer in different forms depending on the user: Sometimes it's a direct fact, sometimes it's a visualization, and sometimes it's a few numbers on a screen.
What's powerful about where we are now is that for the first time in human-machine history, we can actually talk to machines in human terms, the way you'd ask a colleague. That clearly has a place. But I'd advise everyone to think about where it makes sense and to what degree. Don't replace your visualization entirely. Add the conversational layer so people can get curious, ask more questions, and help themselves in a simple way.
Aly: The pace of AI development can be paralyzing for many leaders. How do you balance the need for a stable long-term strategy with the reality that the technology might look completely different six months from now?
Nader: First, accept the unpredictability of AI. You're going to wake up tomorrow, and something new will have arrived. The tools and technology will continue to change. Don't get hung up on that. Find good partners who can grow their platform as part of the change, and focus on value creation.
Whether you're delivering safe, high-quality care, improving operational efficiency, or making the patient experience better, that's the value. Go after it with the capabilities that exist today, and then continue to evolve. And the other piece is to educate yourself. Part of what makes people hesitant is that they don't feel like they understand what's happening. You have to stay in the know as best you can, because that helps you make better decisions as the market evolves, especially at the pace it's moving now.
The key takeaway from this conversation is how early and intentionally NYU Langone moved. The clean-water metaphor captures something important: Organizations that invest in filtering dirty data downstream are always playing catch-up. The ones that fix it at the transactional layer, even though it takes longer and costs more upfront, build a foundation that every subsequent investment, from analytics to AI to real-time clinical decision support, can rely on. In a setting where the stakes are patient safety, that discipline isn't optional.
To hear from industry leaders and define your path to operationalizing AI, download the Economist Enterprise report, “Making AI Deliver.”