The Data Mesh construct has been evolving alongside other new technology patterns and presents opportunities to improve how data is modeled and governed to speed up performance and transform data incrementally. This session will explore the Data Mesh fundamentals and how Avanade is helping clients move away from traditional failure modes, embracing a loosely coupled, distributed and domain-led approach.
David Baxter: Hi everyone, I’m Dave Baxter from Avanade. I’m Avanade’s global data platform modernization lead. And I’m here today with Dael Williamson. We are going to talk about an evolving trend we’ve been seeing over the last few years around two notions. One is a concept called the Data Mesh. The other is the Databricks Lakehouse, and how these patterns are converging and providing clients additional value within overall governance and also in how they manage and view their data from a product-based concept. Dael, maybe you can tell us a few things about what is the industry shift to the Data Mesh and what are some of the trends that we’re seeing over in Europe?
Dael Williamson: Thanks Dave. So the movement started about two years ago and it’s sort of predicated on a number of different things. One is this need for a little bit more control and self-service with your data platform. So the ability for data owners to be able to control their own data and to be able to organize and model and work with their data. Without there being these traditional failure modes that were moving people into a centralized model.
The second notion that sort of picked up, which is more related to the Data Mesh movement, is this notion of data products. So the idea of publishing and owning data and organizing your data into these product sets that align to things that are meaningful in your organization and not necessarily to traditional data sets. There’s different data product types that we’ve seen form. Some are business product types. Some are like process product types. You’ve got insights product types, which are derived through things like machine learning or AI, and aggregate product types, KPIs, things like that.
The third element that we’re seeing is this push to have a kind of data marketplace. So the idea is, “I want to be able to consume data that has meaning, that has context. That has these relationships.” What we’re finding is these two patterns of Data Mesh and Lakehouse have helped us to start to realize this aspiration of the data marketplace.
The fourth notion is something that we’re doing a lot of. And we’ve learned this from the microservice movement around domain-driven design. Where we’re actually measuring data usage in a similar way to like how e-commerce platforms measure the performance and behavior of products.
We’re measuring data usage to help us to define the domains. The domains we get from industry models and things like that are really useful as starter points but actually, how data’s used over time starts to give us a really interesting perspective on the real domain that data belongs to. That’s really helpful in reference data, master data because then you actually see the static nature of the data.
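A minimal sketch of what using data usage to define domains could look like, assuming a simple access log of (dataset, consuming team) pairs. The dataset names, team names, the `infer_domains` helper and the 60% threshold are all hypothetical, chosen only for illustration; treating widely shared datasets as reference-data candidates is an assumption of this sketch, not a rule stated in the talk.

```python
from collections import Counter, defaultdict

# Hypothetical access log: (dataset, consuming team). Illustrative values only.
ACCESS_LOG = [
    ("fund_prices", "wealth_management"),
    ("fund_prices", "wealth_management"),
    ("fund_prices", "investment_bank"),
    ("country_codes", "wealth_management"),
    ("country_codes", "investment_bank"),
    ("trade_blotter", "investment_bank"),
]


def infer_domains(access_log, shared_threshold=0.6):
    """Assign each dataset to the team that reads it most.

    If no single team accounts for at least `shared_threshold` of the reads,
    the dataset is flagged as shared (a candidate for reference data).
    """
    usage = defaultdict(Counter)
    for dataset, team in access_log:
        usage[dataset][team] += 1

    domains = {}
    for dataset, counts in usage.items():
        team, hits = counts.most_common(1)[0]
        share = hits / sum(counts.values())
        domains[dataset] = team if share >= shared_threshold else "shared_reference"
    return domains


domains = infer_domains(ACCESS_LOG)
for dataset, domain in sorted(domains.items()):
    print(f"{dataset}: {domain}")
```

An industry model would supply the starting domains; a usage measure like this one is what would move a dataset between them over time.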
And all of this, these platform trends are sort of moving along with other data trends that are more broad in the market. Things like cross-industry models. Things like the patterns across the industry, this lean into industry-specific data platforms. Trust of data is becoming an incredibly important thing. We’re seeing this across all spheres of life.
And then of course, there’s something really big in Europe specifically, but it’s starting to pick up steam elsewhere in the world, and that’s this idea of security, privacy, sovereignty. That whole idea that actually, data needs to be looked after by whomever is responsible for storing that. And we’re seeing that across both cloud platforms and Edge. So, that’s creating a huge amount of complexity in what we’re trying to achieve.
David Baxter: Yeah. And I think the other thing I would add Dael, the notion of self-service data. It’s been around for a while and the concept of a Data Mesh with data products is the next evolution to that, to get to a data marketplace. And ultimately, that’s what our clients are looking and seeking. Is the notion that I can go get a well-known product of data that could be a single domain, or it could be a hybrid of multiple domains with value add within the data itself.
Let’s switch subjects now. Let’s talk a little bit about the movement. The Data Mesh movement and how that’s opening up a lot of opportunities here for our clients. And also, in conjunction with thinking about, “I’m moving to the cloud and my data’s going to be in a different more modern platform.” Maybe you can talk a little bit about what is the Data Mesh?
Dael Williamson: So in the original article that the Martin Fowler group and Zhamak, who’s the original author, wrote up. The idea was looking specifically at failure modes in a number of the different data paradigms of the past. So this idea of centralized, monolithic… This idea of siloed pipelines and also, this idea of specialized resource.
Now, that had a big problem, which we’ve seen in application development for many years. Through monoliths and the movement that surfaced around that about 15 years ago was microservices. In much the same way, there’s a number of triggers in the market that are forcing this pull apart of how data needs to be organized. So that it’s closer to the action. It’s more sort of relevant.
This idea of distribution. I mean, we’re seeing it in the pandemic. Through people being globally distributed, not centrally concentrated into an office, but we’re also seeing it through other walks of life. Things like mobile IoT Edge. Needing to have things on demand closer to where they’re happening.
The self-service mindset is very centered around business being able to do more with their data, themselves. So not being reliant on IT for that. Proprietary data that’s hard to unlock, applications where data is locked in. It’s a huge problem. And we’re seeing open standards sort of push to unlock that and the cloud providers are starting to really lean into that.
Enterprise data governance is a massive problem that I think is one that we really are conscious of at Avanade and it’s one that we’re going after in a material way. We’ll talk about that during this conversation but with AI and scaling and all of these things, and the idea of being able to do things closer to the action, as I said, really important.
The push factor is actually things like privacy and security and trust and truth. You would naturally think you would want to put that data together but actually, everything we’re doing in the world today is pulling things apart. So you need something that’s more flexible and plays to logically distributing the data in terms of an organizational structure.
David Baxter: So if we think about the Data Mesh, it’s rethinking a little bit about where data comes from and who owns data, who’s managing the data but also, how we distribute the data out. Is that a fair way to characterize it, Dael?
Dael Williamson: It is a fair way to categorize it. And something quite important is actually, it doesn’t always have to be a physical thing. It can also be a logical organizational construct. Where you’re organizing your data into logical products that have meaning to your business and logical domains that also have meaning to your business. Whether those are physically centralized or whether they’re physically distributed onto different infrastructure, that’s immaterial. But if you have that flexibility to have some of the data that’s able to sort of flex as it goes, but it’s also about ownership. And it’s also about having the ability for the domain teams to own and control their own data. That’s really where this came about and triggered a big movement.
David Baxter: Yeah, that makes sense. Let’s move on here and talk a little bit about the overall evolution of data management. Starting back from probably 15, 20 years ago. When data warehouses were making their entry point, to where we’re going now with a lot of cloud modernization and modern tools that clients are using, and how that is part of this overall equation.
Dael Williamson: So, I’ll go into a little bit. We’ve solved some of the key failure modes. So what I’ve layered in here is failure modes and also, the market forces which we just discussed. On the right, in the curve is a bit more of an exponential curve. Now, this follows a very similar narrative to what the Databricks team have been talking about for some time on the Lakehouse movement. What we’ve done is layer in a couple of other constructs that are happening in parallel. And this has been sort of trying to figure out where these different constructs play and how they can be complementary in nature.
So the warehouse, we know, is roughly 30 to 40 years old. Highly centralized, predicated on things like high fixed cost, low marginal cost. “I have scarcity of compute and storage. So therefore, I need things centralized because I don’t want to buy lots of servers.”
And that kicked off a behavior, which is very much around this monolithic, coupled-pipeline, hyper-specialized ownership type of construct because everything was centralized. And when things are centralized, you offset the… the domain teams are able to offset responsibility. You see this a lot through how the flow of data into a warehouse sort of goes into action.
About 10 years ago, data lake came about and this was basically because there was a lot of new data. New data types, a lot of big data coming along, video, images. Nowadays, we’ve got sound, even smell data. So how does that get stored in a relational database, like the warehouse? So it can’t be. So the lake kind of was born and-
David Baxter: And the volume of the data, it just exploded, right?
Dael Williamson: Absolutely. So massive volumes of data. About five years ago, we started to see this breakaway into the hub model. I think a lot of that was driven by things like GDPR coming about but also, latency is a huge factor. If you have a global company, you don’t want all your data to be coming across from one part of the world where it’s centrally stored. So hubs started to come about.
Two years ago, we saw the rise of the Data Mesh and we saw the rise of the Lakehouse. These two movements were happening pretty much in parallel. And what we found at Avanade is they’re incredibly complementary. There is a lot that can be done together and both have helped us to kind of surge ahead. What we’ve seen in the pandemic is the rise of the semantic knowledge graphs as a clear signal.
So, this is a very sharp sign that there’s an exponential curve happening. And if you follow the year plots, 40 years, 10 years, 5 years, 2 years, 1 year, this is massive because it tells us that things are moving faster in this space. There’s a lot of disruption coming. We don’t see each one of these being a kind of binary replacement. We are actually seeing the formation of a stack and we’re working with it in that way. They’re complementary, there’s a lot of learnings. We’re not throwing the lake out in favor of a mesh. We’re actually figuring out how these different patterns of learnings actually join together and give us that stack because ultimately, what we’re trying to do is democratize data and have it in a really neat organizational structure that’s easy to find.
David Baxter: Yeah and lend towards data products ultimately here, right? Dael real quick, there’s a few examples that we’ve worked through with clients that kind of show the value chain. Could you highlight a few of the examples we have here?
Dael Williamson: So very early in the movement of all of this, when constructs like Delta Lake came out and Delta, I don’t even think people have realized just how massive that innovation truly is. It’s an incredible open source product out there in the market today. I think that with that coming about, and this concept of the Data Mesh and the idea of data products, we were looking at this very early, like good, probably a month or two after they came about in the financial services industry.
So big capital markets firm, global in nature, and really interested in doing something quite different. Lots of challenges, simple things like they have a wealth management division where they quite literally have to wait to run credit risk exposures. So if they have the Hong Kong market and the market closes, they can’t run a credit risk exposure until the Pacific sunsets because of the coupled nature of all the data in their risk platforms.
What we were able to do by decomposing things into more product-like function was, we were able to create these almost distributed independencies. What we did on top of that was actually take a lot of learning from the other converging patterns happening in the stack. So there’s a huge movement to change data catalog tooling, for example. There’s a huge movement in the industrial space around industrial IoT and digital twin. There’s a lot going on around Edge and cloud. There’s microservices, which gives us an incredible amount of learning around domains and organizing logic into smaller, composable, bounded contexts. Things like extended reality and AI have given us a huge amount of signals as to actually where the world could go.
So we’re trying not to work in a vacuum where it’s just one thing versus another, all of these things have helped us to kind of create this graph of a business, which represents a value chain. And in all value chains, we’re trying to figure out where is the action? Where’s the value, which are the points, and where’s the waste? How do we reduce the number of steps in the chain in order to maximize value, decrease time and all of those things. So we’re starting to find that we’re able to do experiments around modeling the simple… getting things simpler.
In the same space, we found other clients where they’ve got like 1400 applications in their investment banking division. That’s a lot of hops for data to go through in order to form a trader product. Now, if you can bubble the data up into more of a value chain, you can actually start to see, well, what are the necessary hops, if you’re going to go on a transformation journey and modernize. You don’t have to take all this legacy, cauliflower architecture that’s organically grown over time with you. You can re-imagine how data needs to flow in a more streamlined and resilient way.
David Baxter: Yeah. And I think that leads into data governance as a topic. And if you have all this data and data coming from different data owners or different sources, how do you govern that? Maybe you can give us an overview of a client that we’re working with where there has been some governance thought about and introducing more of a distributed or federated paradigm of governance.
Dael Williamson: So one of the interesting things that we learned quite early in reading the original mesh article, but also looking at some of these other converging patterns, learning from a lot of the domain-driven design approaches, but borrowing from some very old ideas. So things like object-oriented programming, things like polymorphism. What we started to do was look at… And we actually went up a notch. We said, “All right, we want data products.” And in our example that I’ll talk about really quickly, we want data products in financial services. So what we did was organize bounded contexts first, and we thought, let’s go with broad domains. Wealth management, investment bank, alternative datasets, alternative datasets being data from outside the organization, and reference data. Let’s face it, we stuck with reference data because master data has got too many connotations. So we stuck with this is reference data. This is data that’s not going to change too often.
What we then did was model the business products that we had across the business. And we thought, “Okay, well in wealth management, we sell funds, right? We sell mortgages. So let’s model a data product group based on funds and mortgages.” And using that as a basis, and then through object-oriented kind of thinking and data product thinking, we were then able to go, “Okay, well any fund would inherit from that fund product group.” And what that naturally gave us was a couple of things. One, the business started to actually see a reflection of what they sold in the data. We were no longer talking about all the tools that make things happen behind the scenes because a lot of that was abstracted away in the self-service data platform type of idea.
So we put a user interface in front of that to hide all the different great technologies that we were using underneath. And Lakehouse was one of those because it gave us a lot of strength in schema enforcement. Bubbling that up, we were modeling data products against a simple markup language, which was data product groups, and organizing them naturally into these domains. But the reflection that they had of the business products was what made it really powerful because it did something quite unexpected. It created a really interesting bridge between IT and business. Suddenly they were speaking the same language. Suddenly they were understanding each other. There was no longer this conversation about whether it’s SAP or whether it’s some other application out there. It became very much about what the data did as a reflection of the business. And that was a really powerful thing that we discovered.
Now, we’re doing a lot of experimenting in this type of modeling and what that’s going to look like. And what we’re finding is, in a more consumer-centric business, it’s a little bit more predictable. But in more of a manufacturing type of business, we’re leaning more into, well, okay, what does the process value chain look like and how do you model that as a product? So each of these is bringing us some new ideas in what we’re effectively naming federated data governance. Where you’re effectively creating a bit more of an ownership structure.
Now, this is not radical. This is exactly how the internet works. So the internet, basically, has a simple markup language, HTML. It’s got a simple interface, a browser. And through those two signals and a couple of basic rules that sit behind the scenes, they’re able to create actually, what is effectively data governance at scale. So, we were taking a lot of signals from that thinking.
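The inheritance idea described above can be sketched with plain Python classes. This is only an illustration under stated assumptions: the class names (`DataProduct`, `FundProductGroup`, `EquityFund`), the fields and the `required_fields` helper are hypothetical and are not Avanade's actual modeling language.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Base of every data product: who owns it and which domain it sits in."""
    name: str
    domain: str
    owner: str

    def required_fields(self):
        return {"name", "domain", "owner"}


@dataclass
class FundProductGroup(DataProduct):
    """Hypothetical product group for everything the business sells as a fund."""
    currency: str = "USD"

    def required_fields(self):
        return super().required_fields() | {"currency"}


@dataclass
class EquityFund(FundProductGroup):
    """A concrete fund inherits the group's contract and adds its own fields."""
    benchmark: str = ""

    def required_fields(self):
        return super().required_fields() | {"benchmark"}


fund = EquityFund(
    name="global_equity",
    domain="wealth_management",
    owner="funds_team",
    benchmark="hypothetical_index",
)
print(sorted(fund.required_fields()))
```

Because `EquityFund` is still a `FundProductGroup`, anything written against the group (ownership checks, catalog listings) applies to every fund without change, which is the polymorphism the speakers refer to.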
David Baxter: That’s great, Dael. Now, the other concept here. I think because of putting this back into end-user or data product terminology is it unlocks the data and brings it together at a basis that’s meaningful to the business. As opposed to understanding, well, this source system, this source system, or this source system, which I think leads into, okay, we’ve got all this data, we’ve been leveraging the Lakehouse, can you give us an example of the data and the metadata behind it and how that can be managed?
Dael Williamson: So, interestingly enough, this is where the rubber hits the road. What we’ve done in order to show the complementary nature of these two patterns is we’ve got the Lakehouse in what we call the data layer. So that notion of self-service infrastructure as a platform. The Lakehouse architecture fits really neatly into giving us a highly performant version of that. And the medallion approach that we’ve carried across is what we’re doing there. Now, being Avanade, we obviously have a lean into Azure. So we use Azure Databricks, and Azure Data Lake Store sits underneath that. And we’re using Delta formats to model the data across the bronze, silver, and gold medallions, right? Now, what that looks like in metadata land is where we’re applying Data Mesh thinking and a lot of this sort of more object-oriented data modeling, but we’re doing that in metadata.
And there’s a couple of reasons for that. Data’s getting a lot heavier. So video is not quick to copy across. We’re seeing the amplification of sound data and things like that, but it’s also easier to model when it’s in metadata. You don’t have to change the entire data structure. Yes, there are some quality checks and balances, but things like Delta give you a lot of very useful tools to help you to do that a lot simpler and in a more performant way.
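As a rough illustration of the bronze, silver, and gold flow mentioned above, here is a plain-Python stand-in. It does not use Delta Lake or Databricks APIs; the schema, the sample rows and the `to_silver` and `to_gold` helpers are hypothetical, and the casting step only mimics the kind of schema enforcement a Lakehouse table would apply.

```python
# Assumed silver-layer schema: column name -> Python type (illustrative only).
SILVER_SCHEMA = {"patient_id": str, "weight_kg": float}

# Bronze: raw records exactly as they landed, including a bad row.
bronze = [
    {"patient_id": "p1", "weight_kg": "70.5"},
    {"patient_id": "p2", "weight_kg": "not-a-number"},
    {"patient_id": "p1", "weight_kg": "71.5"},
]


def to_silver(rows, schema):
    """Cast each raw row to the schema; rows that fail are set aside."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({col: typ(row[col]) for col, typ in schema.items()})
        except (KeyError, ValueError, TypeError):
            rejected.append(row)
    return clean, rejected


def to_gold(rows):
    """Aggregate silver rows into an average weight per patient."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["patient_id"], (0.0, 0))
        totals[row["patient_id"]] = (total + row["weight_kg"], count + 1)
    return {pid: total / count for pid, (total, count) in totals.items()}


silver, rejected = to_silver(bronze, SILVER_SCHEMA)
gold = to_gold(silver)
print(gold, len(rejected))
```

In a real Lakehouse each stage would be a Delta table and the enforcement would happen on write; the sketch only shows the shape of the three stages.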
Switching back to the modeling approach, we see the silver zone as a really good space for creating an almost denormalized setup for your data products. And as I said before, we’ve got these different data product types, and we’ve got different data product groups. So data product types being your business data products and those are the ones that you’re bringing in from your data sources across the organization. Insights product types are things that are derived. And then of course, you have process data product types, which are almost more around business processes and how those are modeled.
We’re still working on that last aspect and that’s become a really interesting sort of area of exploration for us. What we then do is we create different reflections. And the example that I’m showing you here is one that we’ve pulled from a bunch of major health care providers that we are doing this work with. Now, in this example, what we’re starting to see is in healthcare, you have different metrics globally. So you have the metric system, the imperial system, but then there’s also different ways of modeling data. You have minimalist data synthesis versus statistical data synthesis. So these are two absolutely different ones that are adopted by different countries.
So we went, “Well, we’ll just create both. Why not?” We also have created other reflections. So using ICD-11, which is a health reporting standard. We also used an open source ontology seed camp, which unfortunately has been deprecated since, but we’re working on a new version of that ontology. And this is giving us that early start and signal of what a data marketplace could look like. All of this happening in sort of the metadata land, all using a lot of the thinking behind the Data Mesh around products and domains but adding our own sort of evolution to that in terms of how we model things, which we call our unified analytics modeling approach.
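One way to picture a reflection held in metadata is as a mapping applied at read time, so the stored record is never copied or restructured. The sketch below assumes a record stored in metric units and two hypothetical reflections, `metric` and `imperial`; the field names and the `reflect` helper are illustrative and are not taken from the talk.

```python
LB_PER_KG = 2.20462  # approximate pounds per kilogram

# Each reflection is metadata only: source field -> (exposed name, converter).
REFLECTIONS = {
    "metric": {"weight_kg": ("weight_kg", lambda kg: kg)},
    "imperial": {"weight_kg": ("weight_lb", lambda kg: round(kg * LB_PER_KG, 1))},
}


def reflect(record, reflection):
    """Return a view of `record` under the named reflection.

    Fields without a mapping pass through unchanged; the stored record
    itself is not modified.
    """
    mapping = REFLECTIONS[reflection]
    view = {}
    for field, value in record.items():
        target, convert = mapping.get(field, (field, lambda v: v))
        view[target] = convert(value)
    return view


record = {"patient_id": "p1", "weight_kg": 70.0}
metric_view = reflect(record, "metric")
imperial_view = reflect(record, "imperial")
print(metric_view)
print(imperial_view)
```

Adding another reflection, for example one keyed to a reporting standard, would mean adding one more entry to the mapping rather than producing another copy of the data.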
David Baxter: Great, Dael. And if we talk about how do you bring this to life from a use case perspective? The key building blocks of that and the abstraction into more of functions that we’re performing, maybe you can just talk a little bit about that.
Dael Williamson: So we want to make this a lot more open to our clients. And we’ve kind of gone with this idea of, well, the data infrastructure as a platform. We want to provide that as a service to our clients so that they can actually do a lot more of the data, the metadata modeling of data and they can do a lot of their own more democratized data management and ownership, which fits into that federated data governance model.
So it’s almost providing the tools to businesses behind a very simple interface that allows them to create their data products from datasets that they register and manage. We also want to democratize the governance, so create that whole ownership structure. And then, what we’re leaning into at the moment is how do we create more data marketplace type of signals? And all of this is all packaged into something that we’re calling the use case factory, because we want to do these in use cases to get that sort of evolutionary and sort of incremental programmatic data platform evolution that the cloud provides us.
David Baxter: Thank you. And I would say this is a journey, right? You’re not going to lay down a Data Mesh and map your entire organization in one fell swoop. So this leads us towards concluding here. Avanade is here to help. We provide data architecture, assessments, and guidance. We can help with your users and usage profiling, which lead towards how can a Data Mesh help? We can help with the data value chain, which we talked about, which is end to end mapping of data sources and the value back to the business. And then also, if there’s innovation that’s required to show how a Data Mesh can fit in with your organization, we’ve got services around that. And with that, I would thank everyone and enjoy the Data + AI Summit 2021 and Dael, any final words here?
Dael Williamson: No, it’s all fun. And we’re loving every minute of the journey and we’d love it if you popped us a note and just asked a few questions. It’d be great to network and compare notes.
David Baxter: Thanks everyone.
Dael is Avanade’s Data & AI European CTO and leads our Global Data Platform COE Capability. His focus areas are working on growing key accounts in Europe and he leads a team working on data-driven u...
Dave is Avanade’s Global Data & AI Data Platform Modernization (DPM) Offering Lead. His focus is driving Avanade’s go-to market strategy and execution for our DPM Offering and sub-offerings. Dave...