Data, tools, and technology are nothing without people. And people make culture. At Slalom, we have developed a global offering on a modern culture of data – one of experimentation and innovation, where people have the power to accelerate business outcomes with rapid insights. Only by adapting your business culture — the mindsets and behaviors of your people — do we believe that you can realize the full potential of your investments in data and analytics. Slalom has partnered with Databricks to build a delivery framework using the best that both organizations have to offer. By combining our collective expertise across industries, Slalom’s practices, data engineering, data science and operationalization, we’ve put together an acceleration path capable of sprinting you towards your bold vision of transformation powered by data.
In this talk, we will walk through the five pillars of a Modern Culture of Data: bold vision, access and transparency, data literacy, guardianship, and ways of working.
We’ll cover why they are important and how Databricks not only supports, but enables and empowers each of these pillars for our customers.
Krishanu Nandy: Hi everyone. My name is Krishanu Nandy and I am a senior consultant with Slalom. Today I’m going to talk to you about a modern culture of data powered by Slalom and Databricks. But first, a little bit about Slalom. We are a modern consulting firm founded in 2001. And really what we do is focus on localized consulting, so a local-market focus on relationship building, but with delivery at the global scale. While we’re HQ’ed in Seattle, we have supported clients in all 50 states within the US, and we’ve grown organically to Europe, Asia, and Australia, with, I believe, our most recent market being Tokyo in Japan. And the way we work is we are advisors. We are strategists, we are engineers and we focus on people above all else. And the way we set up our engagements is with the autonomy to move fast. We make sure we do what’s right for the client.
And we do it in an agile manner, which makes us more personal and more nimble than other consulting firms. And we really deeply care about helping our clients tackle the biggest challenges and turn their visions into reality. For us, it’s never really about the project at hand, it’s about focusing on our customers, that’s you, and our customers’ customers, so your customers. Anyways, one of the nice things about being a consultant is that we get to see a lot. We get to see a lot of good. We get to see a lot of not so good. We get to see it across geographies, across industries, across verticals within particular industries, and what that means is we’ve been able to gather insights from hundreds and thousands of engagements over our 20-year history. And we’ve been able to slowly crystallize those learnings together and bring them to you as global offerings.
And one of those global offerings is about a modern culture of data. And so the most obvious question might be what is a modern culture of data? It’s essentially an environment, a way of working and thinking where people are allowed to fail fast, but fail smart. It’s about experimentation. It’s about empowerment. It’s about critical thinking and collaboration. And I want to talk to you about why a modern culture of data is important, how Databricks enables it, and right at the end, we will have an actual customer story with a name almost all of you have certainly heard of, Comcast, where we’ll talk about how, in partnership with Databricks and the digital experience team at Comcast, we were able to deliver a solution that was, quite frankly, transformational for Comcast. The context is that data platforms are hard. The ecosystem, the architecture, they’re almost always a mess.
And part of the reason is there is overwhelming choice. In almost all organizations, there are multiple data systems, which leads to a lot of data movement and inconsistent copies of the data floating around, and getting everything to work in a coherent and synergistic manner is genuinely, genuinely a hard problem. And it isn’t just tool silos, but also thought silos. Even if we think about the most common titles associated with data, data engineers, data scientists, data analysts, you ask each one of those groups what tools they would like to use in an ideal world and they’ll select something different. And so it’s no wonder that data is so, so hard to operationalize and secure. And the truth is few are doing it well. Gartner says that only one in five analytics projects will succeed over the next couple of years, and it’s this complexity, this scale of a challenge that affects almost all organizations across almost all industries, that we are partnering with Databricks to address.
A modern culture of data unlocks the data potential of your organization. Almost all organizations collect data, generate data and attempt to derive insights from it. But this is a big problem. And so what we’ve tried to do is focus on five pillars. Towards the right is our diagram of how we think about a modern culture of data, and right in the middle is a bold vision. It’s about defining where you’re going, where you are and where you want to be. Around that are three pillars. Access and transparency is about removing barriers to accessing data, and not just certain curated data sets but all the data. Data literacy, on the bottom, is not just about tools, but about what data exists and what that data means. Guardianship is about trust in data and interpretation and, of course, at a global scale, about data security, making sure that you are in compliance with ever-increasing data-related regulations within your local markets or globally, if that’s where your organization is operating.
And all this leads to our ways of working, our culture. It’s about breaking down silos. It’s about having people be able to self-service and draw the insights that they need from the data while also ensuring that those insights are valid. And so if there’s one takeaway from my talk, I would say it’s this slide. A successful cultural shift unlocks the ability of an organization to realize a bold vision of data analytics and AI all on one platform, and how these tenets or pillars of the modern culture of data map to that is shown on the bottom. We want to create access and transparency through a common platform, making sure the data engineers, analysts, and scientists are all working on the same platform, accessing the data in the same way. We want a platform that enables each of those individuals to focus time on their core competencies. You don’t want a data scientist trying to set up your infrastructure or attempting to figure out how to scale the algorithms that they’ve written to train a particular model.
We also want to raise the level of data literacy by making sure that the data that we have within our systems is accessible to the broader public, either via simple SQL interfaces, or by moving it out to BI tools. And finally, you want to build guardianship, not just stewardship of particular data sets, but transparency around business logic, how your KPIs are calculated, what the history of your ML models is, all while leveraging the operationalization features and reliability features that the Databricks platform has. And I think it’s also critical to know that this isn’t just at the platform level. If you look at individual tools that support the open ecosystem that Databricks is founded on, you’ll find that they map almost seamlessly to a modern culture of data. And while this is a highly simplified view, with Redash now having been fully integrated into SQL Analytics, I’ll definitely dive deeper in a couple of future slides.
Let’s start at the middle with the bold vision. It’s about an aligned strategy with shared goals and objectives across all functions. Is a dependency of a data science team on, for example, the ideal infrastructure accounted for within our bold vision? And that’s where we believe that there are a number of big problems. Databricks removes those barriers because it allows people to break down the silos that make it difficult to work together. Data is messy and slow. AI, frankly, is hard, and more often than not, BI is limited to just a fraction of the data. An analyst is just given a single curated data set and that’s it. And of course, all of that makes a lack of enterprise governance and security readiness a massive risk for your organization. And we believe these are problems that Databricks can help solve.
So from a modern culture of data perspective, charting a clear path asks a number of questions. Do you have a business case? Is there a unified vision across the organization and a shared strategy for data, including producers of data, maintainers of data and consumers? From a functional perspective, is there leadership sponsorship and appropriate investment at the level needed to enact change? And even simple questions like, “Are the KPIs you’re using the right ones to monitor how it is that you’re progressing within your organization?” And we believe Databricks can help with all those use cases. By virtue of it being a single platform, you know that if you invest in Databricks, your entire data function, engineers, scientists, analysts, can work on that same platform and can grow and scale almost infinitely. Delta Lake allows the collection of data while storing it in the most inexpensive format in a way that is secure and robust.
So at any given point of time, you … without ever losing your historical data. And of course, operationalization, making all of your ideas reality. So often we see groups that have developed models that have never really made it past the POC stage. And that is where we believe having a bold vision, knowing where you want to be is so important. And having trust that the tool set that you have chosen will allow you to get there. The other piece that I’d mentioned is about access and transparency. And this is about removing all barriers to data access, making sure that the systems that you have allow for the democratization of data and making sure that data is available at the speed of business. And this is where Delta Lake comes in. It’s secure, it’s fast, and it’s open. Your data is always yours and your data is accessible … massively scalable.
And so with access and transparency, there are a similar number of questions. Do you have an architecture that is scalable for the future, or are you investing time in tools as well as people into something that has a limited lifespan and will need to be replaced at some point in time down the road? Are your end users spending a lot of time getting to the data and does it take detective work to get to what they really need? And are they able to see all the data, or is it just a curated data set that some group has served up for them?
With Databricks, the Spark runtimes are continuously upgraded. So you know for a fact that your environment and the way you are doing things will never be out of touch with the state of the art. The resources themselves, the clusters, are effectively infinitely scalable, along with the storage, meaning that there is no problem large enough that the Databricks platform cannot handle it. And with Delta Lake and MLflow, you have data and model auditability. You can track what happened to your data, what happened to your models over time and know for a fact that this is the path that they have taken.
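As a rough illustration of the auditability idea described above, the sketch below keeps every write as a new version so that earlier states and the history of changes remain readable. It is a toy model in plain Python, not the Delta Lake or MLflow API; the `VersionedTable` class, its methods, and the account data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VersionedTable:
    """Toy append-only table (hypothetical): each write adds a version, none is overwritten."""

    _versions: List[Dict[str, Any]] = field(default_factory=list)
    _log: List[str] = field(default_factory=list)

    def write(self, rows: Dict[str, Any], note: str) -> int:
        # Copy the latest snapshot, apply the change, and store the result as a new version.
        snapshot = dict(self._versions[-1]) if self._versions else {}
        snapshot.update(rows)
        self._versions.append(snapshot)
        self._log.append(note)
        return len(self._versions) - 1  # version number of this write

    def read(self, version: Optional[int] = None) -> Dict[str, Any]:
        # Default to the latest version; every earlier version stays readable.
        if version is None:
            version = len(self._versions) - 1
        return dict(self._versions[version])

    def history(self) -> List[Tuple[int, str]]:
        # The audit trail: which change produced which version.
        return list(enumerate(self._log))


table = VersionedTable()
v0 = table.write({"acct-1": 100}, "initial load")
v1 = table.write({"acct-1": 120, "acct-2": 50}, "billing correction")
```

Reading `table.read(v0)` after the correction still returns the original load, and `table.history()` lists the path the data has taken, which is the property the talk is pointing at.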
The second of the pillars around bold vision is data literacy. And this is really about developing a mindset that data can be used to drive most every decision within an organization. It’s about having a pervasive culture of data and ongoing measurement of engagement, adoption, and impact. And this is where we believe a technology like Spark can help. Going back to the three titles we talked about, data scientists, data engineers, and analysts, the fact that Spark supports … or several of the most common languages that are used in data, Python and R as well as SQL, means that everyone can use the skills that they have while working on the same platform. And it’s not just limited to Spark.
The fact that there are collaborative workspaces means that folks from different departments can work in the same notebooks. Even outside of a purely technical team, SQL Analytics allows non-technical users to access data through BI tool connections. And what’s critical is that everyone is using and looking at the same data. From a people perspective, data literacy is about how good people are at accessing the data. How confident are they in the interpretations? Are they using the most recent data available to them? And at any given point of time, do they have the insights that they need? But probably most telling from a cultural perspective is, are the people in your organization able to be proactive or predictive and not just reactive to changes within your industry or within their environment?
The third pillar outside of bold vision is about guardianship. And this isn’t just about trust in data and tools, but also making sure that your data is secure and compliant. And this requires clear data ownership, ensuring high data quality and fully adopting the standards in guardianship that regulations may or may not require in your particular geography. One of the ways that Databricks helps enable this is with MLflow. MLflow gives data scientists and AI teams control, auditability and reproducibility across the end-to-end machine learning lifecycle. But again, the places where guardianship can play in Databricks are not limited to MLflow. With guardianship, there are concerns about trust. Do you have common definitions, for example, of what revenue is within your organization, or are marketing and finance interpreting it differently? It’s true that, in certain cases, each person’s truth is valid, but having a common ground within which to do that and having a data system that reflects that truth is important.
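The point about common definitions can be sketched with a single shared view that both teams query. This is a minimal illustration using Python’s built-in `sqlite3` module rather than Databricks; the `orders` table, its columns, the sample rows, and the rule "revenue is gross minus refunds" are all assumptions made up for the example.

```python
import sqlite3

# In-memory database standing in for a shared data platform (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, gross REAL, refunds REAL);
    INSERT INTO orders VALUES
        ('east', 100.0, 10.0),
        ('west', 200.0, 0.0),
        ('east', 50.0, 5.0);
    -- The single agreed definition (hypothetical rule): revenue = gross - refunds.
    CREATE VIEW revenue AS
        SELECT region, SUM(gross - refunds) AS revenue
        FROM orders
        GROUP BY region;
""")

# Marketing asks for one headline number; finance asks for a breakdown.
# Both read the same view, so they cannot disagree on what "revenue" means.
marketing_total = conn.execute("SELECT SUM(revenue) FROM revenue").fetchone()[0]
finance_by_region = dict(conn.execute("SELECT region, revenue FROM revenue"))
```

Because the business logic lives in one place, changing the definition changes it for every consumer at once, which is the kind of transparency around KPIs that the guardianship pillar asks for.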
It’s also critical to figure out whether your employees trust the data and that it is ready for analysis, because your analysis is only as good as the data that comes into it. Do they spend most of the time doing validity checks and only 20% in actual analysis? And is it possible to flip that so that your employees are spending their time where they are most valuable? And of course, from a security perspective, there are open questions about how you’re auditing your compliance with global regulations, and of course, is the ethical use of data embedded in culture … Databricks platform supports a large number of security standards. And within enterprise security, there is a security model in place with the platform to provide the most advanced protection with a single interface of control.
And finally, we come to ways of working, where we are talking about embedding the notion that the many questions about your business that are asked by the leadership in your organization can be answered with data. And this comes down to an operating model that enables alignment on strategic objectives and collaboration across all functions.
Redash is an open source project that has now been integrated into SQL Analytics, and it provides a really easy, code-free way to explore, visualize, and share the same data that data scientists and data engineers use across the organization. And so ways of working asks a similar number of questions. Is there a partnership between the business and IT, or are these teams siloed? Are people within your organization continuously learning, and are your talent acquisition teams able to find the skills that you want in the marketplace? Have you defined behaviors that you do and don’t want to see, and are the leaders modeling that behavior? And of course, things like self-service are critical: are people able to ask and answer their own questions of the data that your organization has?
And of course shadow IT. I was actually on an engagement where somebody had set up a PC tower under their desk because they weren’t getting what they needed from IT. And all of these concerns are addressed by Databricks because it is a continuously evolving platform. Features are added on an almost weekly basis, and what that means is you will never be out of date with the most modern technology available. And all of this came to a culmination with a project that we did with Comcast’s data experience team. The problem was pretty standard. They were working in an on-prem environment with vast amounts of billing data, and the cost was growing … the cost doubling annually. Essentially, while costs were going up, the majority of users did not have effective access to the data, which was a double problem.
And so the decision was made to migrate to the cloud, and Slalom’s Databricks engineers were driving that. At the end of it, data scientists and other analysts had unprecedented access to data. There were scalable, unified batch and streaming data pipelines and a successful Delta Lake POC. What’s really interesting is how Slalom’s Databricks engineers represented only 10% of the team, but delivered 24% of the stories and migrated not just historical data, but also the data pipelines. And the way they did this was by constantly applying the key elements of a modern culture of data, which are listed on the right. In everything they were doing, they were thinking about things like guardianship, the governance process, the ways of working, expertise around agile processes and testing. And of course, data literacy in terms of delivering data migration pipelines and handing them off to our partners at Comcast at the end of the project. So with that, I would like to wrap up my talk. Thank you very much for your attention, and feel free to reach out to me with any questions.
Krishanu is a data engineer and data scientist at Slalom. In particular, he enjoys architecting comprehensive end-to-end data solutions that bring value to both technical as well as non-technical end ...