Modernizing to a Cloud Data Architecture

May 27, 2021 04:25 PM (PT)

Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.

In this session watch:
Guido Oswald, Solutions Architect, Databricks
Matt Graves, Vice President, Enterprise Data & Analytics, GCI Communication Corp.



Guido Oswald: Hello. Welcome to this breakout session on Modernizing to a Cloud Data Architecture. My name is Guido Oswald, I’m a solutions architect for Databricks based in Zurich, Switzerland. After a short presentation from me to set the baseline, I will be joined by Matt Graves, the VP of Enterprise Data and Analytics at GCI Communication Corp., who will share some of the learnings from the Hadoop to Databricks migration they went through. Should you have any questions during the presentation, feel free to ask them in the chat or Q&A section of this meeting platform. Let’s have a quick look at the reasons to modernize from your legacy big data platform into the cloud, and then talk about the technical and business benefits we have seen at other customers.
After looking at some of the ways we can help you control the risk and cost of data migration, Matt will tell us how all this worked out at GCI Alaska. The first question is probably the why. Why would someone look at such a migration? Morgan Stanley has done some research on how our world will look past COVID, and they came back with the top 12 technology trends that will keep us busy over the next years. Clearly, the way we work and the way businesses work have changed dramatically in the last year. This is not only in the retail sector but across all verticals. Every one of us can feel the change in our personal and professional lives. IoT has become mainstream, real-time streaming use cases are quickly increasing, and everything is going mobile these days.
All these changes we experience generate a massive data surge and therefore a lot of pressure on the legacy data and analytics backends, which many times were not built for such scale and dynamics. Because of that increase in data and the changing requirements in terms of time to market, streaming, and predictive or prescriptive rather than backward-looking reporting, we currently see the adoption of cloud-based platforms accelerating a lot. The $100 billion figure on the slide is actually only the increase compared to the pre-COVID estimate. So why don’t the legacy architectures keep up? The silos that are pictured on the slide, data warehousing, data engineering, streaming, and machine learning, have evolved over time. Because these silos need to exchange data, the complexity has often grown to a level that is very hard to manage.
Data governance and data quality have also become an issue in such architectures, and many of the customers I am talking to are in a similar situation. There is a large amount of work going into building and operating such legacy platforms, which keeps the data teams from focusing on the important projects that they should be working on. This is also very true for the Hadoop environments that many of us have built in the past. So don’t get me wrong here. Hadoop was a great technology and did great things in the past 10 years or so, but the legacy on-prem Hadoop installations cannot keep up anymore. The cost of running a Hadoop cluster in your own data center is high and getting higher, and this at a time when we actually need to reduce costs.
Many of the new use cases require real-time or near-real-time data and reporting. These use cases are often machine learning use cases, again, predicting something rather than reporting on past data. Because of this limitation and the complexity of legacy architectures, the data teams cannot get down to the high-impact business opportunities, and chances are missed. Ultimately, this will result in falling behind the competition. We have probably all experienced that it is extremely DevOps-intensive to operate an on-prem Hadoop deployment, due to the high complexity and the high failure rate. This results in low productivity because the engineers are busy keeping the system up and running, instead of implementing the high-value use cases I mentioned earlier.
Also, Hadoop clusters are pretty inelastic despite their theoretical ability to scale. This is because they are sized for peak load, and it’s usually a long process from the decision to expand the cluster to the hardware running and the software stack being deployed. I had a customer where this process took almost a year to complete. This might be an extreme, but as you can imagine, in the cloud we’re talking about minutes to scale. This usually happens at a much lower cost than with on-premises hardware. When it comes to data science and AI, the Hadoop clusters can run some of the workloads. But usually you would need to add additional tool stacks left and right to support the development and deployment of such use cases.
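To make the sizing point above concrete, here is a minimal sketch in plain Python comparing a cluster sized for peak load with one that scales hour by hour. The hourly demand figures and the function names are invented for illustration; they are assumptions, not numbers from the talk or from any customer.

```python
# Hypothetical number of worker nodes needed in each hour of one day.
# These figures are made up purely to illustrate the argument.
HOURLY_NODES_NEEDED = [
    4, 4, 4, 4, 4, 6, 10, 16, 20, 20, 18, 16,
    16, 18, 20, 20, 16, 12, 8, 6, 4, 4, 4, 4,
]


def fixed_cluster_node_hours(demand):
    """A cluster sized for peak load runs max(demand) nodes in every hour."""
    return max(demand) * len(demand)


def elastic_cluster_node_hours(demand):
    """An elastic cluster runs only the nodes each hour actually needs."""
    return sum(demand)


def utilization(demand):
    """Share of the peak-sized cluster's node-hours that do useful work."""
    return elastic_cluster_node_hours(demand) / fixed_cluster_node_hours(demand)


if __name__ == "__main__":
    print("fixed node-hours:  ", fixed_cluster_node_hours(HOURLY_NODES_NEEDED))
    print("elastic node-hours:", elastic_cluster_node_hours(HOURLY_NODES_NEEDED))
    print("utilization:        {:.0%}".format(utilization(HOURLY_NODES_NEEDED)))
```

With these made-up figures, the peak-sized cluster does useful work in only a little over half of the node-hours it runs, which is the gap elastic scaling is meant to close.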
This usually creates new silos, limits collaboration, and therefore slows down innovation. So if you put together a list of requirements for a modern big data architecture based on the learnings of the past, this new platform should be cost-effective, easily scalable, easy to manage, and work reliably with all kinds of data. It should be able to process data and make predictions in real time. A recent Forrester economic impact study found a stunning 417% ROI at companies migrating from legacy technologies to Databricks. They were benefiting from higher productivity of their data teams and cost savings from switching off the old infrastructure. Forrester also saw a 5% increase in revenue due to faster creation and optimization of ML models with Databricks.
Before we look at how we solve these limitations and achieve such an ROI, let’s quickly talk about Databricks as a company. Databricks provides a unified data analytics platform that allows data scientists, data engineers, business analysts, and machine learning engineers to work on a single platform, a single Lakehouse, and dramatically accelerate innovation. We have a global presence with 5000 customers and more than 450 partners, and you might know Databricks as the creator of popular open source projects like Apache Spark, Delta Lake, MLflow, Koalas, or Redash. We also coined the term Lakehouse, which is becoming quite popular recently, almost a new category combining data lakes and traditional data warehouses into a single platform.
The Databricks Lakehouse platform combines the best of both worlds: the scalability, flexibility, and low cost of an open data lake with all the benefits of a data warehouse, such as ACID transactions, upserts, time travel, indexing, and so on. The Databricks Lakehouse platform is built on standard cloud object stores like AWS S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage (GCS). Unlike on-prem Hadoop, storage and compute are separated and therefore scale independently. Storage will scale automatically; this is the magic that the cloud provides us. There is no more time spent on monitoring and replacing disks, optimizing the network topology, or expanding capacity by adding additional storage nodes that also add compute capacity, which might not always be needed.
Scaling compute is almost as easy with Databricks. Users can work on shared clusters or on isolated clusters of their own, depending on the requirements. We also support single-node clusters, which are very cost-efficient when working with small data sets but still have a full Spark deployment available, running in local mode. Databricks clusters support automatic scaling, and we provide a machine learning runtime with the most common libraries already included. Users can easily install libraries with pip or conda commands. The open source Delta Lake framework builds a layer on top of this storage and delivers all the features that we know from traditional data warehouses.
With this additional layer, we can run multiple workloads on top that otherwise would need to copy data or work with a subset, obviously delivering less optimal results. Delta Lake was also built from the beginning to support streaming data. This is a prerequisite for real-time applications and dashboards. With the collaboration features built into Databricks, all the teams are able to work on a single platform and on the same data at the same time. It doesn’t matter if you’re a data scientist exploring large datasets with Python, or a SQL analyst creating real-time dashboards for customer support, or a data engineer writing data pipelines in Scala, or a machine learning engineer pushing a new machine learning model into production to better predict customer churn. It doesn’t matter: the Databricks Lakehouse platform allows you to do all this at the lowest possible cost and with minimum lock-in.
From a technology perspective, there are some low-hanging fruits for a Hadoop migration to the cloud. Some others require a little more effort. As you would have guessed, Spark workloads transfer very easily to Databricks. If you are still running on Spark 2, I would advise you to lift this to Spark 3 and profit from all the performance improvements and other fixes we have made in the past.
All of your Hive and Impala workloads can also be transferred with little effort to the Databricks Lakehouse platform. You can actually expect some substantial cost and performance benefits here. With Spark Structured Streaming we have a perfect place for all your streaming workloads. Converting your legacy MapReduce code to Spark would obviously require some effort, but again the gains will be substantial. HBase workloads can sometimes move to Delta Lake, if HBase was only used because of the limitations that [inaudible] has, for example with updates. Sometimes these workloads are better housed in a cloud-native key-value store like DynamoDB or Cosmos DB. This depends on the use case, and we will look into this with you case by case.
To speed up Hadoop to Databricks migrations, we have also automated many of the straightforward elements of such a migration project. The metadata of the Hive and Impala databases, for example, can be moved to a Databricks-provided Hive metastore or a shared metastore from the cloud provider in an automated fashion. No need for manual intervention here. There’s also some help replicating the security concept, that is, all the Sentry and Ranger policies, to Databricks and the cloud. When you’re using Oozie for on-prem orchestration, there are multiple options for you when you go to the cloud. Usually I see very happy faces when I tell my customers there is no more Oozie in the cloud. One popular option is Airflow, but there are many other alternatives, and we will introduce multitask jobs very soon, which allow you to execute multiple tasks, in series or in parallel, in one Databricks job. So look out for the other talks on multitask jobs here at the Data + AI Summit.
Then I see many of my customers using the native cloud tooling for orchestration as well, like Azure Data Factory or AWS Glue. Since Databricks has a comprehensive and open API, orchestration and automation can happen from pretty much everywhere, from pretty much every tool that is able to call the REST API. By automating on-premises Hadoop to Databricks migrations, we have seen quite some speedups in the past. There are of course cost savings going along with this. This is a comparison of projects we did in the past. We have SI partners who are trained to help us with these automated migrations, and Databricks professional services can also help reduce the migration time dramatically. Our vast partner ecosystem has offerings that help speed up and de-risk Hadoop migrations. Some of them might already be in your toolset, or you can spot your favorite SI here. I should mention that Databricks is also available on Google Cloud now, so you can choose to run the migrated workloads on any of the three major clouds, and the Databricks platform is almost identical between the clouds. You can change your mind later on as well without starting the next migration project.
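As a rough illustration of the earlier remark that any tool able to call the REST API can drive orchestration, the sketch below builds (but does not send) an HTTP request that would trigger an existing Databricks job. It assumes the Jobs API `run-now` endpoint under `/api/2.1/`; the workspace URL, token, and job ID are placeholders, and the exact API version should be checked against your own workspace's documentation.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id):
    """Build, without sending, a POST request that asks Databricks to run a job.

    Assumption: the workspace exposes the Jobs API at /api/2.1/jobs/run-now
    and accepts a personal access token as a Bearer token.
    """
    payload = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.1/jobs/run-now",
        data=payload,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with a real workspace URL, token, and job ID.
request = build_run_now_request(
    "https://example-workspace.cloud.databricks.com/",
    "dapi-EXAMPLE-TOKEN",
    123,
)

# Sending is left out on purpose; against a real workspace it would be:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Any scheduler that can issue such a request, whether Airflow, Azure Data Factory, or a cron script, can start the same job, which is why the choice of orchestrator stays open.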
I know that multi-cloud is not a big topic for many of us at the moment, but I expect this to change in the future. Another important aspect of the cloud migration is data movement. There is also tooling to help with this for almost all possible requirements. Usually it is unrealistic to move all the data into the cloud in one go, in a big-bang migration. That is why it is helpful to have some synchronization between the on-prem and the cloud deployments during the phased migration.
So to recap, we have talked about why it totally makes sense to migrate your legacy on-prem environments to the cloud and got a glimpse of the impact this can have. In terms of timing, I’d say the time is now. If you want to avoid falling behind the competition, you need to act quickly and realize the potential of a modern cloud-based data and AI platform like the Databricks Lakehouse platform. We will stand by you. Databricks will help you migrate your workloads to the cloud as quickly and smoothly as possible. That is our promise.
If you want to learn more, you can always go to our website for migrations and find some detailed white papers and some customer stories around these topics. One of the customer stories you can hear right now, from someone who has successfully done such a Hadoop migration to Databricks and the cloud. I’m honored to be joined now by Matt Graves, VP for Data and Analytics at GCI Alaska to talk about his experience and learnings from this journey. Hey welcome Matt. Thank you for the time and for sharing your story today. Let’s start maybe by having you introduce yourself and chat a little bit about GCI Alaska.

Matt Graves: Sounds great. Thank you very much, Guido. So my name is Matt Graves. I’m Vice President of Enterprise Data and Analytics at GCI. I started a little over a year ago, in January of 2020, before the pandemic. It’s been quite a year of learning and frankly new experiences. A little bit about my background for the past 20 years: I spent about 8 years in Silicon Valley with internet startups, and after that stage of my life I got recruited to Microsoft, where I worked for 11 years running a big data team. Then I started working at GCI, running their enterprise data analytics team.
Just a little bit about GCI: it’s a quad-play telecom, which means that it provides mobile service to the state of Alaska, wireline or internet access to the state of Alaska, cable TV to the state, and local and long-distance service, so it’s got quite an extensive network. It seems like we have every imaginable technology, so we’ve got satellite, microwave, fiber, coax, 5G down to 2G. It’s a 40-year-old company in Alaska and has grown organically, with lots of technology evolution along the way, which means that in my job in enterprise data analytics there’s lots of data to look at. That’s the cool thing about the job: the type of problems that we’re addressing and the tremendous amount of data that we have available to us to help support data-driven decisions. So thank you, Guido, for the opportunity to talk today, and it’s nice to talk with you.

Guido Oswald: Yeah, nice to have you, Matt. So what business challenges was GCI Alaska trying to solve and what things did you wish you could do?

Matt Graves: So we were blessed with a lot of data. We had an on-prem Hadoop stack with about 60 terabytes of data, and we’re constantly ingesting new network data regarding performance and call records from all the different technologies. We’re being asked questions about network optimization. Where do we build our next cell tower? Where is our network having trouble? What-if simulations: what happens if we upgrade this 3G to 4G or LTE? And there’s the data needed to support that. So we had some challenges around just being an effective partner with our business, in just ingesting and making data available and then doing analysis around the data. That was problem one.
But we’re not just a network business. We have customers, and so the other side of the problem is: what’s our customer experience in the network? Yes, there’s churn. Yes, there’s new product adoption. How are our customers actually using the network, and how is network performance impacting user experience? We could do it in isolation for an individual customer. But if somebody said, well, how are you doing it for all your customers, evaluate that, we just couldn’t do it. So it was an embarrassment of riches that we had the data, but we were paupers when it came to actually using that data to quickly and agilely solve those business problems. That’s the high level of what things we wished we could do if only we had the capability, and the frustrating part is we had the data.

Guido Oswald: Yeah, I can feel you. I can feel you. Having worked for Cloudera for many years and being in this Hadoop world for a long time, it’s an awesome technology, but yeah, I had many customers facing similar problems. Can you just describe some of the limitations of the old architecture that kept GCI Alaska from either fully delivering its data strategy or expanding it?

Matt Graves: Sure. I think that in some ways it’s apples and oranges, and I started and led that drive to adopt Azure and the decision to adopt Databricks. The challenges that we had on our data engineering team were things like this: I was talking with a data engineer just this morning, frankly, and refreshing my memory of the old world. He was talking about how with NiFi he would get 75 emails in the morning with these warnings and alerts. He’d have to read through all of those emails, couldn’t miss one. Then he would have to go into the logs and evaluate the logs to figure out where an error had happened, and generally speaking it was a group effort. So it was very frustrating, and that doesn’t exist today. He doesn’t want to go back to the old world. Now things like alerting, identification of errors, and figuring out where to resume the job or what error to correct make his life much easier. There’s the elasticity and flexibility that you have in managing clusters and cluster sizes, spinning up clusters and right-sizing them for the right job.
So that whole operationalization of our data management and things like that are easier. Those were problems in the past. With so many different types of data sources, the schema of inbound data would change and that would blow up our pipeline, so the ease and ability to manage schema changes is a differentiator. Gosh. Then there are the data corrections with inserts and deletes, upserts, things like that, which were possible in Hadoop but manually intensive and tedious because of the constraints. Just to speak about the elasticity and flexibility, we started our Hadoop clusters at the right level. As we began opening up the data to business users who would access it with Alteryx and Tableau, we found that the compute function that had originally been calculated to be acceptable was being chewed up by lots of business users that were hungry for the data. So our ability to expand the compute function so we could do more with the data had to be managed, and that caused limitations on our ability to serve the business. We’re always asked to do more, so that was a little bit of frustration for us and for our business users.
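For readers less familiar with the term, an upsert updates the rows whose key already exists and inserts the rest. The sketch below shows only those semantics on in-memory Python dictionaries. It is not Delta Lake code (Delta Lake expresses the same idea with a MERGE operation), and the `id` and `status` fields are hypothetical.

```python
def upsert(table, updates, key="id"):
    """Return a new table where rows in `updates` replace or extend rows in `table`.

    Rows are plain dicts. A row whose key already exists is updated field by
    field; a row with a new key is inserted. The inputs are left unmodified.
    """
    merged = {row[key]: dict(row) for row in table}
    for row in updates:
        existing = merged.get(row[key], {})
        merged[row[key]] = {**existing, **row}
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical customer records, for illustration only.
customers = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "active"},
]
corrections = [
    {"id": 2, "status": "churned"},  # existing key: update
    {"id": 3, "status": "active"},   # new key: insert
]

result = upsert(customers, corrections)
```

On immutable files this kind of correction means rewriting data by hand, which is the tedium described above; a table format with transactional merges moves that work into the platform.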

Guido Oswald: Yeah, absolutely. I also remember this: a simple thing like updates was a tough thing to do. You had to put HBase in front of it to be able to do updates. It ended up in a very complex architecture just to do updates. I’m really happy that we have Delta Lake now, which eases things up a lot, so-

Matt Graves: Makes a big difference.

Guido Oswald: Absolutely yeah. What was a compelling moment which led GCI Alaska to the aha that things had to be changed?

Matt Graves: Well, I think that there was a realization, maybe even before I got there, that we needed to drive. We need to reset and do something different in the future. I know that the company had made some steps prior to my arrival but we had not adopted AWS or Azure before my arrival. Having spent 11 years at Microsoft running a big data team there and my arrival at GCI, I guess I brought that driving experience to get it done. So the company knew that we had to modernize our big data platform and harden it. There were many failures, week on week on week with pipeline failures and things like that, for one reason or another.
We just had to have the willpower, the drive, and the direction to get over that hurdle, to figure out what to do and how to do it, who to do it with, that sort of thing. So it was apparent, I think, when they hired me that they knew they needed to do something, and so I think it was business frustration, frankly. And there was the promise: we had a lot of data, and we knew what to do with it. We knew the business problems. We just weren’t able to address those business problems in a timely way. We were able to make a compelling case for what we should do: we need to adopt Azure and we need to adopt Databricks because of the tight integration and the capabilities that it brings. We convinced enough people that we knew what we were doing that we got approval to do it.

Guido Oswald: That sounds very familiar, to be honest. So can you shed some light on why Databricks, and what the migration process was like?

Matt Graves: So I’d read a lot about Databricks, and I hadn’t been a user before, but I understood the benefits, the efficiencies that Databricks would bring. I’m not an expert on Hadoop. That’s not my background or competency, but I knew what Databricks could bring to us from an efficiency standpoint, operational efficiencies. I did a lot of research on Delta Lake and understood the benefits of Delta Lake, so it was more a matter of conditioning the team on migration to the cloud and upskilling the team to understand what the capabilities were. Databricks, by the way, did an excellent job in presales in evaluating our environment and helping us understand sizing and management and security, and answering all those questions that we needed to have answered, not just for my team but for our security group and for our cloud platform team that manages our infrastructure. So it was getting involved with Microsoft in helping us with that Azure Cloud Adoption Framework that they have, which is very good, and then getting deeply involved with Databricks, not being afraid to ask dumb questions and learn, and realizing that it’s okay to say: I need help, I don’t understand, I don’t know. And learning. It’s learning. It’s just learning. It’s great.

Guido Oswald: I’m glad to hear that, that it worked like this, and we do our best to do this with many other customers as well. Can you provide a comparison between the old and the new architecture?

Matt Graves: I’d be happy to, yes. I actually have a couple of slides, as it turns out. I have a before, with the Hadoop infrastructure. Essentially reading left to right, we’ve got different data sources. So what data sources do we have? We’ve got all kinds of network services resources, telemetry from different systems, call detail records, detail around cable modems, information on utilization, lots of capacity-type things, alerting on many different aspects of our network. We have billing system data. We have lots of information on postpaid and prepaid and the types of products that customers have. We also had some streaming data from some of our data sources and some data ingested from other types of cloud services. We had essentially the typical NiFi, Sqoop, and Oozie to schedule the orchestration of the ingestion and things like that.
That’s where that part of the 75 emails a day that went to one of our data ingestion engineers came from. We used the typical Cloudera Hadoop tools for normalizing, ingesting, unifying, and processing the data. For consumption, though, we used Alteryx and Tableau primarily, for our business users or our data analytics team to pull data out for analysis and dashboarding and things like that. I’m happy to say that we fully decommissioned all of our Hadoop hardware this year on January 31, so none of that hardware, about $1,000,000 of hardware, exists anymore. We’ve been fully operational on the cloud since February 1, 2021, so we’re a full four months or so into it. Now we ingest using orchestration and processes housed in Azure Data Factory, and we use Event Hubs for ingestion of some of our streaming services.
We utilize Stream Analytics and, of course, Azure Databricks to manage Spark clusters, and I think I’ve talked about some of the benefits of that recently. We were big believers in Delta Lake for some of the reasons I mentioned previously on schema changes. Something I didn’t bring up before is Databricks’ built-in capability to handle small files. That was a big problem for us before with Cloudera, and that problem’s solved with Databricks. I assume everybody understands the small file problem. I don’t know if you care to hear this, but when people ask me what a distributed compute function is, here is what I tell them. This is not original, but it illustrates the benefits of Spark and the small file problem. So if I gave you a packet of M&M’s and I said, count all the green M&M’s in the package, you’d pour the package of M&M’s on the table and you’d say, there’s 15 green M&M’s. I’d say, that’s right. So that was easy, right? You’d say, yeah, that’s easy.
What if I gave you a dump truck full of M&M’s? If that was so easy, count the dump truck full of M&M’s, and you’d say, well, that’s hard. Well, what if you can recruit 10,000 of your best friends, keep track of all of the M&M’s and who’s doing what, and then reassemble the results of the green M&M’s? You’d say, oh, that sounds easier. But if I said, well, actually all the M&M’s in the truck are in small packets, and those small packets have to be reassembled and you have to deal with each of those small packets of M&M’s in the dump truck, you’d say, oh, that’s more of a problem. That’s harder. So the Spark clustering capability handles large files well and does a great job, but small files are a difficulty and don’t really allow that distribution of the compute function to be handled efficiently.
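The M&M analogy can be put into a rough back-of-the-envelope model: every file carries a fixed handling cost (opening the packet) on top of the time spent reading its contents. The sketch below is a simplification under stated assumptions; the overhead of half a second per file, the scan rate, and the one-gigabyte compaction target are invented numbers, not measurements from Spark or Databricks.

```python
import math


def estimated_job_seconds(total_gb, file_count, workers,
                          per_file_overhead_s=0.5, scan_s_per_gb=10):
    """Very rough wall-clock estimate for scanning a dataset.

    Assumptions (all hypothetical): each file costs a fixed overhead to list
    and open, scanning costs a fixed time per gigabyte, and the total work
    divides evenly across the workers.
    """
    overhead = file_count * per_file_overhead_s
    scan = total_gb * scan_s_per_gb
    return (overhead + scan) / workers


def files_after_compaction(total_gb, target_file_gb=1):
    """Number of files if small files are rewritten into larger ones."""
    return math.ceil(total_gb / target_file_gb)


# Same 1,000 GB of data on 100 workers, stored two different ways.
many_small_files = estimated_job_seconds(1000, 1_000_000, 100)
compacted = estimated_job_seconds(1000, files_after_compaction(1000), 100)
```

In this toy model the data volume is identical in both cases, yet the run with a million small files is dominated by per-file overhead rather than by reading data, which is the dump truck full of small packets.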
Databricks has solved that problem for us and we are very happy about that. That was an aside, but hopefully it helps some of our listeners explain this to non-technical users in a simple way that’s easy to get their heads wrapped around. We use MLflow and Delta Lake, as I mentioned. We store everything in ADLS, and we still consume with Alteryx and Tableau, but we are moving more and more to Power BI and the Power Platform in leveraging and utilizing that data. So that’s the current architecture. As for the reasons why, I think we’ve talked about the productivity gains that we’ve had in both our ingestion team and our analysts. I should also bring up, and I don’t know if this is the right time to bring it up, that my team has essentially grown since we announced that we were utilizing Azure and Databricks. The data science team used to be separate from my data engineering team.
Now, with this more unified environment and a unified set of tools, the data science team opted to join my team, so my team is growing, and I have another expansion with some data architecture folks that are all wildly competent. That aggregation, the closeness of these cross-disciplinary teams, and the collaboration they can achieve working more closely together matter, even though they’re separated due to the pandemic and everybody’s on Microsoft Teams these days. Having them in the same organization, and in fact working on similar or the same business problems with their different disciplines, has been a major benefit. So the machine learning capabilities that are integrated with Databricks, the ease of integration with the Azure services, and the elasticity have just made a big difference in our ability to execute. Last, the improvements for our data science team: the interface is better. Everything’s better. The power and scalability to do machine learning problems, which we now do almost continuously, and the integration with the Azure data platform have been a giant benefit for us, from a security standpoint, an operational standpoint, and a control standpoint. I hope that addresses the question that you had.

Guido Oswald: Absolutely. And Matt, you were talking about upskilling your team. Can you give us a little bit of advice on how to do this, or what this means to you?

Matt Graves: Well, there are practices that I’ve instituted, and I think they’re best practices, or I’d like to think so. When I first started, I started in January, and this is an Alaska-based company. I’m based in Seattle, but I flew up to meet the team, and I was up there every week for about 6 weeks just trying to get to know people, that sort of thing. At my first all-hands, this is before we made the migration to Azure, I said, by June 30 I want every single person in my organization to have achieved the Azure Fundamentals certification, and so they were like, what? What is this? It was a good goal, and we were able to get every single person on my team to be Azure Fundamentals certified. Azure Fundamentals is not an advanced certification. It’s that ground-level foundation. It was important to me for people to understand why we’re migrating to Azure.
If they talked about it at home or with a colleague at work, I wanted each one of my team members to have a really good grasp of what Azure was and why we were doing it, the benefits that we would be able to achieve. So that was the ground level. As the data science team joined in and other data architects joined my team, that’s the entry point. You’ve got to do this first. Then there’s lots of learning, the more advanced learning that Microsoft and Databricks provide from a professional development certification standpoint, and so what I asked each and every person on my team to do was to choose a professional development path and make progress every quarter, earning badges or taking courses or getting certifications. I constantly want my team to continue to upskill their knowledge. It’s been a good thing that the team knows what we’re doing. They know why we’re doing it. Even though we’re learning every day, their collaborative learning has been infectious, which is a good thing. I love to see that.
So it’s putting out the mandate and scaring everybody at first but then encouraging them to grow with their professional development certifications and then be able to actually put those into use in a platform that we own and operate and run. I think it’s a great incentive and it’s been embraced by my team and I’m grateful for that.

Guido Oswald: Absolutely. That’s great and let’s switch gears a little bit and then talk about the use cases and the benefits that are now unlocked.

Matt Graves: Sure. It’s interesting. We just went into full production on the first of February, and the use cases are flying at us, all of a sudden. I did a tech session with my team, with leaders from my team, on March 15 and got a lot of interest from across GCI: marketing people, customer experience people, network services folks, product folks. Just about 100 of the more senior people at the company participated. There’s really a groundswell of interest in feeding us business cases. Let me describe a couple of those, but before I do, I would like to make sure that we’re grounded on what we’re doing here. So I’ve got a slide that talks about automation of Databricks with Azure: ingestion of network services data, pulling in the data, storing it in Azure Data Lake, then forecasting at scale. I’ll explain what we’re talking about there, and then storing those results and presenting them in Power BI. So with network services optimization, here is the kind of question we get. Alaska is a difficult terrain. It’s a difficult area in which to deliver mobile phone and wireline services and things like that, where there are mountains and harsh environments.
So in some of our smaller communities, we have earlier-generation mobile phone capabilities, 2G and 2.5G, and as we think about what it takes to deliver services to those communities, it doesn’t just take a cell tower. It takes the backhaul to our core to connect those phones with other phones, and if we have a remote community that we offer 2G or 2.5G services to, we likely use satellite to uplink and downlink the data communications. Well, satellite is slower than fiber and slower than microwave. So when we want to upgrade a community from an earlier generation to a later generation like LTE, we have to ask: how are we going to backhaul, and if that backhaul runs across microwave towers, how does that affect the total capacity, the bandwidth, that microwave segment provides? We have to ask what we expect the call volume to do. I’m going to the next slide here. How do we project out the additional capacity as we go from 2G or 2.5G to LTE? What happens to the backhaul, how much additional bandwidth is needed, and how do we project that out with simulations of peak demand? Are there hospitals? Is there video? How is this really going to work?
So the data science team now has the data to be able to look at analogous data sources and data points and figure out how additional bandwidth utilization will be absorbed by the network and where we might need to do upgrades in order to deliver that newer-generation connectivity over microwave to different communities. How do we do this without the data and without the tools to be able to analyze it? It’s a crapshoot, and we don’t want to do that. We want to do it with data. We want to do it with understanding. Another use case is customer experience. When we think about network alerts, network performance, outages, slowness, dropped calls, and throughput, there’s a lot to network performance. We have to link it to a customer because we’re doing it for customers. We’re not doing it because we like to provide network services. We’re doing it to serve our customers and make sure that we provide high-quality services, and that matters.
It matters a lot. 911 emergency calls run through the network. Critical information to hospitals and healthcare clinics runs through the network, and we really need to understand, as we have events or as we want to drive improvements, who’s affected, who’s impacted. When we have an outage in a particular community, it’d be nice to know who has an outage. And why do we care? We care about the customer, but we also care about our ability to serve the customer. If it’s a cellular or wireless outage, they might go to a landline and call our call center, or they might chat with our call center through our wireline internet access. Well, if the call center doesn’t know that there’s an outage and who might be impacted, then obviously call volumes increase, but we’re not really tooling our customer service representatives so that they can deal effectively with customers. So that linkage of network performance and customer experience will help us with handling customer calls, with marketing programs, with deciding how we spend our marketing funds, and with where we go after new customers or go after competitors to get those customers to migrate over.
It’s important to know where we provide the best quality service and where we’re most likely to win new customers and that sort of thing. The other types of use cases have to do with next best action. If a customer service rep or a retail store rep is talking with the customer, what should they be talking about? It matters where the customer lives. It matters what services the customer has. It matters what the experience of the customer is. So how do you figure out what to talk to the customer about? What’s the most important? What’s the most impactful? What’s meaningful to the customer? If they’re on a one-gig internet service and they’ve got a 7-year-old cable modem, well, you probably should talk to them about upgrading their cable modem, because they’re probably not getting the full benefit of the product that they’re buying from us. These next best actions are initially rules based, but the way that we’re looking at this is using machine learning and getting feedback loops: I offered this to the customer because it was supposed to be the next best action, but the customer rejected it because it was too expensive, not appropriate, whatever those reasons are, or they did accept it.
So we create a feedback loop. What was the next best action recommended? Did the frontline team member offer that particular best action to the customer? What was the customer’s response? Did they want more information? Did they reject it? Did they accept it? That will feed into a machine learning model (we haven’t finished developing this yet) so we can fine-tune those next best actions and get the biggest bang for our buck when we’re actually in front of a customer or online talking with them about things that we think they should be doing with us.
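
A minimal sketch of the rules-based starting point and the feedback loop described above, in Python. It is not GCI’s implementation: the `Customer` fields, the `next_best_action` function, the `FeedbackLog` class, the action name, and the exact thresholds are hypothetical illustrations. Only the example rule (a one-gig plan paired with a 7-year-old modem) and the recorded outcomes (offered or not, accepted, rejected, wanted more information) come from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Customer:
    # Hypothetical fields chosen for this illustration.
    internet_plan_mbps: int
    modem_age_years: float


def next_best_action(customer: Customer) -> Optional[str]:
    """Rules-based recommendation (thresholds are assumptions)."""
    # Example from the talk: gigabit service on an old cable modem.
    if customer.internet_plan_mbps >= 1000 and customer.modem_age_years >= 7:
        return "offer_modem_upgrade"
    return None  # no rule matched


@dataclass
class FeedbackRecord:
    action: str
    offered: bool            # did the frontline team member offer it?
    response: Optional[str]  # "accepted", "rejected", "more_info", or None


class FeedbackLog:
    """Collects outcomes that could later become training data for a model."""

    def __init__(self) -> None:
        self.records: List[FeedbackRecord] = []

    def record(self, action: str, offered: bool, response: Optional[str] = None) -> None:
        self.records.append(FeedbackRecord(action, offered, response))

    def acceptance_rate(self, action: str) -> Optional[float]:
        # Only count cases where the action was actually offered.
        offered = [r for r in self.records if r.action == action and r.offered]
        if not offered:
            return None
        accepted = sum(1 for r in offered if r.response == "accepted")
        return accepted / len(offered)


if __name__ == "__main__":
    action = next_best_action(Customer(internet_plan_mbps=1000, modem_age_years=7))
    log = FeedbackLog()
    log.record(action, offered=True, response="accepted")
    log.record(action, offered=True, response="rejected")
    log.record(action, offered=False)
    print(action, log.acceptance_rate(action))
```

Keeping "recommended but not offered" separate from "offered and rejected" matters for the loop: a later model would otherwise treat an action that was never presented as a customer refusal.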

Guido Oswald: Very interesting. Thank you for these insights. Having worked with a couple of telcos here in Switzerland and in Germany, it looks like many of the use cases are the same. The data is the same. The general volumes are the same. It’s a lot of data we’re dealing with here. As a customer of these telcos, you actually see who’s doing well with that, and you see how beneficial it can be for the customer experience if you use this data efficiently. With all the examples you gave, if you do that efficiently, you feel much better as a customer. Having said this, I think we have come a long way in the past years using all this data, but there is still a way to go. I think you’re a little bit ahead of many others in your vertical that need to make this move from Hadoop into the cloud, so very interesting.

Matt Graves: Absolutely.

Guido Oswald: So looking forward, how are you rethinking the role such a lake architecture can play in the future of the data and AI strategy at GCI Alaska?

Matt Graves: It’s an exciting future. The power of what we’re doing is that we’re able to connect the data. I think in the past we looked at data in silos because that’s what we could do. That’s how we did things. Now we’re looking at the data and the integration, and one plus one equals 3. The synergy of tying data together is very powerful. Being able to take alerting from some of our systems, for example, and predict from the messaging and the alerting what component of our network is failing separates the signal from the noise. It will allow us, with machine learning, to create an informed predictive maintenance model for pending outages and capacity limitations. How do we address an outage in a more timely, more efficient manner? With data and compute power. This isn’t a negative about the company, but I see it as a target-rich environment. There’s lots of opportunity.
Take placing inventory on our trucks for repairs, whether it’s residential at a customer’s home, a commercial business, a cell tower, or whatever. We don’t necessarily know what equipment or what inventory is on what trucks when we assign truck rolls to a home or to a repair. With the data and the power of data we could. This isn’t necessarily a data science model, but it’s using data to say what’s the best truck to roll out to this job. There’s the competency of the maintenance engineer, of course, but who do we send to this particular home to fix this? It matters. Truck rolls are expensive, especially in Alaska, which is huge, the biggest state in the union.
The future is really about deeper integration into business problems and being consultative: appreciating the pain that a particular part of the business might be going through, or introducing them to different ways of looking at a particular problem that are more data informed. It could be computer or data science informed. It’s an education and collaboration process, because some of the people in the business haven’t known to come to us and have tried to solve problems with heuristics or whatever. That’s fine, because that’s how things have been done in the past, but the future really is advancing the role of data in how we do everything to serve the customer, expand to new communities, and go after new customers that our competitors have today. How can we act most effectively and most efficiently and inform our services? That’s what I see as our future. It’s a nice long trajectory of business value, so I’m looking forward to that.

Guido Oswald: I couldn’t agree more with this, Matt. I really think you guys are a little bit ahead of many others. You already switched off your on-prem, and you’re now completely in the cloud. You’re leveraging all the benefits and realizing the value of the cloud and Databricks. So what advice would you give others out there that are on a similar journey to modernization?

Matt Graves: So I think that it starts with establishing the culture of your team and the education of your team. It can include people adjacent to your team, but it’s important to get the buy-in that this is the thing to do and to recognize the opportunities for improvements and efficiencies. It’s collaboration, so if you and your team don’t know why you’re doing this, then you need to figure that out, because that’s the foundation, and then you can start evangelizing about the benefits. Both Microsoft and Databricks did a very good job of helping us to understand what to do first, second, and third. I always felt like I could reach out to Databricks and Microsoft and have a well-informed conversation with somebody who was very competent in what needed to be done. That’s a critical safety net. It’s also talking with the security organization early and often about what you’re doing and making sure that they understand the architecture that you’re moving to.
They might not be as informed, so bring them along on your learning journey with you. Partner with them. We heavily leveraged others in our business to establish our VNets and make sure that we understood exactly how to protect our environment and extend our data center capabilities into the cloud and how that would work. I was a little bit too aggressive about when I thought we would finish. We made the decision that we were going to adopt Azure in around February, so we started having meetings and focusing on what we needed to do first, second, and third. There’s really no other organization at GCI that’s in Azure. We were the first. We were the leaders, and in that cloud adoption framework that Microsoft has, there were a lot of learnings about how we were trying to build out our Azure environment.
We used a hub-and-spoke model. Are we trying to do this in a way that serves every possible interest in the company, or are we going to serve enterprise data analytics? I was successful in saying we’re going to do this for enterprise data analytics, and that made a lot of the decisions easier. I think focus is critically important to drive to an end goal that says this is what we want to do and this is how we want to do it, while recognizing that the hub-and-spoke model, unlike Hadoop, is something you can change over time. The philosophy at Amazon is to ask, when you make a decision, whether the decision can be undone or modified in the long run. How impactful is this decision? Frankly, the decisions that you make should be well informed, but remember that this is a service, and the service can be modified, configured, and changed.
If you have a capability that you need to include that you didn’t think about in the beginning, you can include it later on, so you’re not casting in concrete exactly how things have to work for the next 5 years. It’s more pliable. It’s changeable: how you manage, how you tag, how you assign resources, how you create functions in Azure, and things like that. There is a lot of learning, but experiment. Crawl, walk, run, and have confidence that you’re doing the right thing.

Guido Oswald: True that, I totally agree, and this is, I think, one of the great things in the cloud. You have the possibility to, you know, much more quickly test something, fail, and try something new. And if you stay on an open platform, open source, ideally open formats, you have the choice. If one tool doesn’t work as you expect, you can go somewhere else and try it with a different tool. So I think it is very important to stay on an open platform, ideally open source, and build quickly. Make your learnings and go further if you want.

Matt Graves: Yeah. Again, don’t be afraid to reach out to experts. Databricks and Microsoft are super responsive. I’ll just add one other thing. We accidentally deleted an important file, and my own team couldn’t recover it. We called Microsoft, and they were able to recover the file on the back end. It was a mistake. The message here is that even when something doesn’t seem possible from a recovery standpoint or an error correction standpoint, just make sure to reach out to your partners, because they really are there to help. They’ll go the extra mile if it’s possible to do it. There were many times when I knew that my team was thrashing on a problem and I had to say, call Microsoft, call Databricks. Let’s get them on the phone. Let’s just do it right now. And they were there. That doesn’t mean you’re outsourcing your thinking. It means you’re reaching out for help, and sometimes that’s hard to do psychologically, but I would encourage you to do that and leverage the resources, because it’s faster time to value.

Guido Oswald: Absolutely, and this is our job, so yeah, as you already said: please, everybody out there, call us if you need help. Actually, right before we did this recording, I was jumping off a customer call where we did exactly this kind of thing, helping a customer with a last-minute emergency. So yes, this is our job, so don’t hesitate to call us. And sometimes I find it magic what the cloud providers are doing in there. You can’t see what they do and how they do it, but I mean recovering these files or delivering the IO performance: when you add another instance, you get the same amount of IO capacity again, every time.
This is something that we all dreamed of back in the Hadoop days. This is magic. Yeah, I get it. Hey Matt, thank you so much for this talk and for your valuable insights. I think you helped a lot of others that are beginning that journey now.

Matt Graves: Let me say one other thing.

Guido Oswald: Of course.

Matt Graves: So I just want to put in a plug here. We’re recruiting for a manager, senior manager of cloud data engineering so if there is anybody listening to this, that is interested in joining the team and interviewing with us, please reach out to me directly. It’s [email protected] and I’d love to hear from you.

Guido Oswald: Absolutely. So hopefully there are many viewers at our Data and AI Summit. Again, Matt, thank you so much, and thank you to everybody out there for watching this. I wish you a nice rest of the summit, and enjoy the rest of the sessions. Thanks, Matt. Thanks, everybody out there.

Matt Graves: Thank you. Bye bye.

Guido Oswald: Bye.

Guido Oswald

Guido is a Solutions Architect for Databricks in Switzerland, where he helps customers define and implement modern data and advanced analytics architectures. He is TOGAF certified, a graduate computer...

Matt Graves

Matt is the VP of Enterprise Data & Analytics at GCI Communication Corp. An accomplished executive with consistently successful, diverse leadership in the telecom, high technology, payments, and finan...