This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
•Data and analytics – Where are we?
•Why is the journey only half-way done?
•2021 and beyond – The new era of AI usage and not just build
•The requirement – event-driven, on-demand and automated analytics
•Operationalising what you build – DataOps, MLOps and RPA
•Mobilising the masses to integrate AI into processes – what needs to be done?
•Business strategy alignment – the guiding light to AI utilisation for high reward
•Agility step change – the shift to no-code integration of AI by citizen developers
•Recording decisions, and analysing business impact
•Reinforcement-learning – transitioning to continuous reward
Mike Ferguson: Hi everyone. My name is Mike Ferguson. My topic at the conference is building an artificially intelligent enterprise, and how you can maximize value from AI. That’s just a little bit about me: nearly 40 years in the industry. Now I’m an independent industry analyst, have been for about 28 years now. Prior to that, I was chief architect at Teradata and co-founder of Codd & Date with Dr. Ted Codd, the inventor of the relational model, and Chris Date, author of books on relational databases. That’s my firm; if you want to find out more, there’s a website address at the bottom of the slide.
Let’s get going. I want to talk about, first of all, where we are in data and analytics. And then look at that and say, “What’s the real aspiration here towards being a fully data-driven enterprise?” And then, how do you get from where we are now to that vision, in order to maximize the value from AI and truly become data-driven in the markets that you may operate in? Let’s, first of all, look at where we are now, and it’s not all perfect by any stretch. If I look at where a lot of my clients are today, it’s fair to say they have multiple data warehouses in different parts of their value chain, maybe different warehouses built at different times over the last 10, 15, 20 years by different teams, in many cases to satisfy different parts of reporting and analysis in different parts of the business. So, for example, in finance, or in the front office, in marketing and sales, or some part of your business operation, like inventory management and distribution.
So with that, that’s all fine for producing reports and analysis. But when it comes to trying to ask a management question, like sales versus inventory, or what’s in the sales pipeline versus booked revenue, you’re probably going to have to go across multiple data marts or data warehouses to answer that, which makes it more challenging to be able to do those kinds of reports. And if those were built by different teams over the years, which is normally the case, then chances are also that you’re going to get different identifiers and maybe different data names for the same data in each of those different data warehouses. It makes it a little bit more challenging to be able to stitch together that data if you’re a business user. But I think it’s fair to say today that a lot of my clients are in a situation where business users are using self-service BI tools, but because they have to increasingly connect to more and more data sources, they need some sort of data prep in order to stitch together that data to produce the reports and analysis they need.
And that is all sold under the label of agility. But for a lot of my clients, I don’t think it’s working out that way. For a lot of them the assumption is that business users just know what to do, and a lot of them don’t. It’s pushing complexity onto those business users as the number of data sources grows. And not only that, but the chances of them sharing what they build are pretty low, especially if different departments have gone and bought their own tools, which is often the case with some of my clients, because they just can’t share metadata specifications across tools; there’s no standard for that. And so it’s not necessarily as clear-cut and as advantageous as you might think. And if you add to that the hundreds, if not thousands, of data sources that are now emerging, it makes it even more challenging.
Especially now that we’re beyond transaction and master data, we’ve got machine-generated data like clickstream and IoT data. We have human-generated data coming in from inbound email or web chat, or even voice in a contact center, as well as opinions out on social networks or public review sites. And then you’ve got external data, including free open-government data you can download from a broad range of governments around the world, and in addition all of the external data that you may buy in; in financial services, for example, data from Bloomberg or Standard & Poor’s. So all of that is coming into the enterprise now as people want to analyze it. And so what we’ve got is way beyond data warehousing. I guess I would call it an analytical ecosystem: multiple platforms running different kinds of analytical workloads, all optimized for those.
And all of that includes graph analytics. It includes streaming analytics, it includes data science sandboxes to train machine learning models and whatnot, as well as traditional data warehousing. And a lot of that today is on the cloud, which is where I think the vast majority of it is going. But I think it’s fair to say that most companies are in a hybrid situation, with some of this in the cloud and some of it on premises. But the other problem with these different platforms for analytical workloads is that they’re all operating as silos. And therefore the chances are that the data integration tools you’re using for a data warehouse are not the same as you’re using for a graph database, which are not the same as you’re using for streaming analytics or for data science sandboxes. And in fact, that’s the case in a lot of my clients; the number of tools that they’ve accumulated across all of these silos can get pretty large, maybe upwards of even 20-plus tools.
That’s if you include programming languages like Python, Scala and R thrown into the mix. And also, because there’s no standard to share metadata across these tools, it probably means that whoever’s using one tool has got no idea what’s being created in another. And so there’s no awareness of data that’s already been cleaned and integrated; it’s just whatever you can see within your tool set. And not only that, but this is point-to-point working within a silo. And so it’s often the case that the same data from the same data sources is needed in different silos, and so there’s a significant amount of reinvention going on. The number of times that people are taking the same data and cleaning it and integrating it again and again for different analytical use cases is pretty high, and continues to be so. But not only that, our landscape now is distributed.
We’re in a world where most companies are operating at the edge, on multiple clouds and in the data center, and therefore they’ve got data being ingested into all of these environments: coming into cloud storage, coming into Hadoop in the data center, for example, or indeed on the cloud, into staging areas and data warehouses, into fast-write NoSQL databases like Cassandra. It’s all a bit random. And therefore the result is that what we’ve got is data across a distributed landscape: data stores at the edge, in the data center and on multiple clouds, some of them relational, some of them NoSQL, cloud storage, Hadoop systems, edge databases, all of it kind of out there. And so it’s not surprising that people are getting overwhelmed by this increasingly complex data landscape as it spreads all the way to the edge. And finding, managing and integrating the right data is getting increasingly difficult, let alone governing it.
And not only that though, but as more and more demand to analyze more and more data comes in, potentially from even thousands of data sources, the traditional role of IT doing all of this data cleansing and integration is just disappearing, as businesses now look upon centralized IT as a bottleneck and want in on the act, saying, “Well, just give us our own self-service data prep tools, and we’ll do it for ourselves.” And, as budgets have become more independent in different departments, that’s exactly what has happened. We’ve seen an explosion of self-service data prep tools around the organization. And so different departments have got all of these tools, not necessarily the same ones by the way, trying to access this distributed landscape. And naturally, when you get into this situation, a fairly ungoverned environment, you end up with a bit of a Wild West going on.
And so I see this a lot in my clients, and it’s becoming increasingly difficult as the number of laws and regulations that we have to comply with around the world continues to grow. And as one of my clients said, when we did a review for them of what’s going on in their organization, “Everyone’s blindly integrating data with no attempt to share what they create.” A pretty chaotic situation. Now, looking back at the silos before, you say, “How many silos do we have now, if everyone’s using their own self-service data prep tools? And how many times are we taking the same data from the same sources and preparing it again and again?” And so from a consistency perspective it’s not surprising that people are now saying, “Hang on a second. Are we getting garbage in, garbage out here as more and more people prepare the same data for each of their own silos, with their own self-service data preparation tools?” And we’ve got multiple self-service data preparation tools around the enterprise.
So I guess for a lot of my clients, the current situation is that they’re looking at the picture on the left rather than the picture that they want, which is on the right. They’ve got a whole bunch of data that’s being prepared and integrated by lots of different people around the organization, and there’s potentially valuable data in there somewhere, but no one knows where it is because no one’s got access to all the same tools. There’s no place to go to find all of this data that may already be prepared. And so there’s a real chance that they continue to reinvent and create more and more instances of potentially inconsistent data again and again as they move forward.
And the same is true on the analytical side. We’ve had a frenzy over the last decade, with all kinds of technologies being purchased by different parts of the business to be able to build models and analyze data. And so it’s not surprising that we’ve got everything from data science workbenches, Jupyter notebooks and Zeppelin notebooks to different analytical libraries, like H2O in one place or TensorFlow somewhere else. And so it’s a very fractured situation with a lot of smart people, but their skills are spread across a lot of different tools. And not only that, but the whole thing’s not very well integrated at different levels of the enterprise. The vast majority of all of this is going on for the benefit of middle management, with some of it happening in parts of operations, but not so much at the strategic level.
So I still have clients that get Excel-based management information packs sent to the executive, rather than giving them access to something more interactive. So we don’t have this integration or alignment across common goals in different parts of the enterprise. And in a lot of cases, we’re not getting the usage to take advantage of AI to the max at the operational and strategic levels as well as at the middle-management level, combining them all to contribute to common goals like improving customer experience or reducing fraud.
The bottom line is that, where we are today, we’ve ended up in a very fractured situation, and what a lot of organizations want to be able to do is industrialize that. As one exec said to me recently, “The last decade was the era of build. We want this decade to be the era of usage,” so that we put AI to work in order to benefit the organization. So then the question is, “What do you need to do to achieve that, given where we are now?” And I think the first step is to sort out the data foundation and really speed up our ability to produce pipelines using DataOps and MLOps.
If you’re going to become data-driven, you’ve got to do it on a base of trusted data. And in order to achieve that, we’d better know what data exists out there across the enterprise. So a data catalog has to be able to discover what data exists across the distributed data landscape. And not only that, we then need some way to connect to all of those different sources of data across that distributed landscape, using Data Fabric software, to be able to build pipelines. Or, hopefully, in the next 12 to 18 months, to even generate them from the metadata and the mappings that are in the catalog. But nevertheless, we would like to be able to connect to that distributed landscape and then build pipelines that can produce trusted data assets that other people can then shop for, so that we incrementally build up a set of trusted data assets and speed up people’s ability to find ready-made assets that can jump-start their project and go deliver value.
The idea behind this is to build it once and reuse it everywhere, ready-made, rather than what we saw before, which is just giving everybody their own tools and saying, “You figure it out,” which of course is taking far longer and creating multiple instances of potentially inconsistent data. Secondly, we’ve got to industrialize this and really get it to become like a well-oiled machine, rather than individual people building one monolithic pipeline. Could we break this up into multiple components and create a component-based development environment to really shorten the time to build these data and analytical assets? And so we get into this whole idea of different components that can be orchestrated together in order to connect to data, ingest it from a distributed landscape, clean it, integrate it, analyze it, and produce insights, or even just produce trusted data that can be consumed in multiple analytical environments.
That means creating different kinds of components to ingest data, to transform it, to clean addresses, to do specific kinds of analytics, or cognitive kinds of services like converting voice to text or doing sentiment scoring, rather than reinventing that again and again. And being able to coordinate collaborative development of these components with common version control, using Git, for example, to create branches in order to make changes or new components, and merges to bring it all together into an end-to-end pipeline. And obviously what we want to do then is use CI/CD to automatically test that, automatically containerize it, and configure the runtimes in order to deploy it once we’ve got through the testing and containerization stage. All, again, to shorten the time to value in producing these pipelines.
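To make the component idea concrete, here is a minimal Python sketch of it: each step is a small, independently testable function (which is what makes CI/CD-style automated testing practical), and an orchestrating function composes them into an end-to-end pipeline. All of the function names, fields and sample data are invented for illustration.

```python
# A minimal sketch of a component-based pipeline: small reusable steps
# that can be versioned and unit-tested on their own, then orchestrated
# into one trusted-data pipeline. All names and data are illustrative.

def ingest(rows):
    """Ingest component: pull raw records from a source (here, a list)."""
    return list(rows)

def clean_addresses(rows):
    """Cleansing component: normalise one field the same way everywhere."""
    return [{**r, "address": r["address"].strip().title()} for r in rows]

def dedupe(rows, key):
    """Integration component: drop duplicate records by a business key."""
    seen, out = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out

def pipeline(source):
    """Orchestrate the components into an end-to-end pipeline."""
    return dedupe(clean_addresses(ingest(source)), key="customer_id")

raw = [
    {"customer_id": 1, "address": "  12 high street "},
    {"customer_id": 1, "address": "12 High Street"},
    {"customer_id": 2, "address": "4 park LANE"},
]
trusted = pipeline(raw)  # two clean, deduplicated customer records
```

Because each component is a pure function, a CI job can assert its behaviour in isolation before the merged pipeline is containerized and deployed.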
It’s going to look like this: ingesting any kind of data on the left-hand side, using a kind of data lake or data mesh environment, and using Data Fabric to produce trusted data in a trusted zone, which can then be consumed and used in different analytical environments, including the data warehouse, data science sandboxes and graph analysis, with a kind of catalog or marketplace to tell people what’s available to them so that they can jump-start their projects. So the marketplace becomes pretty key. What is that? It’s a catalog that gives us the ready-made trusted data and analytical assets. They’re documented with common data names in the glossary, with data lineage, so that people know what the data means and where it came from, and then they can trust it.
It’s organized to make it easy to find things, maybe with the ability to search on it and so on. So it’s going to look a bit like Amazon. This is just plain Amazon: if I want to go buy something from Amazon, I have a search box along the top. On the left-hand side, I’ve got different products, in this case books, and each is rated and I can stick it in my cart. I want to do exactly the same for data products. I want to be able to search and find ready-made data that’s trusted, I want to be able to see if it’s rated, and I want to be able to put it in my shopping cart. And I can do that today with several products: Informatica Axon Data Marketplace, Collibra, Zaloni, and many others.
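The Amazon analogy can be sketched in a few lines of Python. This is an illustration of the shopping experience, not any vendor’s actual API: each data product carries a name, tags, a rating and its lineage, a search finds matches, and “adding to cart” stands in for requesting access. All names and records are invented.

```python
# An illustrative sketch (not any vendor's real API) of "shopping" a data
# marketplace: searchable, rated, documented, ready-made data products.

products = [
    {"name": "customer_master", "tags": ["customer", "mdm"], "rating": 4.8,
     "lineage": "CRM + billing, deduplicated nightly"},
    {"name": "fraud_features", "tags": ["fraud", "ml"], "rating": 4.5,
     "lineage": "card transactions, engineered features"},
]

def search(catalog, term):
    """Find data products whose name or tags match the search term."""
    term = term.lower()
    return [p for p in catalog
            if term in p["name"] or any(term in t for t in p["tags"])]

cart = []
for hit in search(products, "fraud"):
    cart.append(hit["name"])  # "add to cart" = request access / subscribe
```

The point of the lineage and rating fields is exactly the trust question raised above: a consumer can see where a product came from and how others rate it before building on it.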
The idea is to jump-start projects so people can find ready-made trusted data assets that get them going quickly, so that they can then build on them and drive more value: new queries, new reports, new dashboards, new predictive models, new virtual views combining trusted data. And once they’ve got that, they can publish it back into the marketplace, and so incrementally build up more and more of these assets that will jump-start people even further down the line; for new projects coming along, maybe 70% or 80% of what they need is already made. Once we have all of that in our marketplace, the next thing, of course, is to put it to work around the enterprise. We need to take the analytics and integrate them into the business, into business processes, in order to maximize their value and contribute to overall business performance. That means integrating them into processes and planning and whatnot.
The foundation that we’ve created, of trusted data and analytics published in a marketplace, has to be able to support the decisions being made at the operational, tactical and strategic levels of the enterprise. There are thousands of decisions at the operational level and very few at the strategic level, but nevertheless, we want to be able to take what we’ve been building and put it to work in these different parts and levels of the business. That means the trusted data and analytics that we’ve created and published in the marketplace become central to the entire business, so that we can share things in a trusted way and then be able to say, “OK, how do we align this with business goals?” For example, if I’m trying to reduce fraud, then what data and analytical assets in our marketplace are available to us to reduce fraud, and where do they get deployed in the enterprise? At the operational level? At the tactical or strategic level?
Is it in marketing? Is it in finance? Where do we deploy these things to achieve that business goal? So we need to align all of the data and analytical assets in our marketplace, tagging them by business goal, so that we can work out at what levels they need to be deployed to guide or to automate, get mass contribution to the common business goals and the business strategy, and work out where in business processes to utilize those machine learning models, BI, and data assets to maximize business value. But in order to get it into those business processes, we’ve got to break the logjam here. Rather than only very smart IT professionals being able to integrate those into applications, could we break that logjam and mobilize the masses of citizen developers to get low-code, no-code integration of these analytics into different parts of the business to really drive value?
I think that’s the next challenge: to empower the citizen developer to be able to integrate this. And for real time, we need to be able to monitor live business conditions as they happen, with our finger on the pulse of everyday operations, in order to make sure that automatic decisions are made that also help contribute towards business goals, or to alert people, or to make completely automatic decisions if need be. But again, we don’t just want one-time detection of these things. We want continuous observability. We want to have all of these real-time agents deployed around the business on the lookout for certain conditions and, when they happen, taking decisions, issuing alerts or making recommendations in order to guide the business to do the right thing, but also learning from it, so that we record those decisions and get reinforcement-learning-based real-time agents that grow the reward as we monitor all these conditions.
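The loop described here, detect a condition, decide, and record the decision so its business outcome can feed learning later, can be sketched very simply. This is a toy illustration under invented assumptions (the event shape, the threshold and the reward values are all made up), not a production agent or any RL library.

```python
# A hedged sketch of the real-time agent idea: watch live events for a
# condition, act (alert or automate), and log every decision so its
# observed business outcome (reward) can be attached and learned from.
# Event fields, threshold and rewards are invented for illustration.

decision_log = []

def agent(event, threshold=1000):
    """Decide on one event; log the decision so we can learn from it later."""
    action = "flag_for_review" if event["amount"] > threshold else "approve"
    decision_log.append({"event": event, "action": action, "reward": None})
    return action

def record_reward(index, reward):
    """Attach the observed business outcome to a past decision."""
    decision_log[index]["reward"] = reward

agent({"id": 1, "amount": 250})    # small transaction: approve
agent({"id": 2, "amount": 5400})   # large transaction: flag for review
record_reward(1, +1)               # review confirmed fraud: positive reward
```

With decisions and rewards recorded like this, a reinforcement-learning policy could later replace the fixed threshold, adjusting its behaviour to grow the cumulative reward rather than applying a one-time rule.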
And then, in addition, to get beyond quarterly or annual planning, we want dynamic planning, dynamic resource allocation and dynamic process optimization, so that we can integrate these analytics into the whole planning process at multiple levels and, based upon what’s going on right now, cause dynamic resource allocation on the back of certain conditions that occur, cause dynamic process optimization, and do it on a continuous basis, and on a reinforcement-learning basis, in order to grow the reward and improve overall business performance. I guess what we need then is some kind of architecture like this: connected from edge to data center, with a data catalog and the ability to produce trusted data assets, then create trusted analytical assets that can all be published in a marketplace, which we can then integrate into various parts of our business, front and back office or mainstream operations, as well as being able to continually monitor and dynamically re-optimize as we go forward.
To do that, I need a whole bunch of key software components. It’s not just about Jupyter notebooks and being able to write Python and use a TensorFlow library. What we’re talking about here is: we need a data catalog. We need Data Fabric to connect to this distributed landscape. We need to integrate planning and AI. We need to integrate corporate performance management with business processes. We need to be able to monitor, on an event-driven basis, to bring the real-time world to life. And we need to be able to automate actions if necessary in order to cut costs, make things more efficient and optimize business processes, and do it all in a collaborative environment, tied into common business goals.
I hope that will give you an idea of what I think needs to happen to really become an artificially intelligent business. And with that, I’ll just have to say thank you, and if you’ve got any feedback, please feel free to give it. It helps us to do better next time. Thanks.
Mike Ferguson is Managing Director of Intelligent Business Strategies. An independent IT industry analyst and consultant, he specialises in BI/analytics, data management and enterprise architecture. W...