Champions
of Data + AI

Data leaders powering data-driven innovation

EPISODE 22

Democratizing Data on the Front Lines

Thom Kenney, former CDO, U.S. Special Operations Command joins the Champions of Data + AI podcast to share his experiences leveraging data insights to transform the modern military. We’ll also explore how he and his team work in a mixed unclassified and classified environment, manage data that lives in multiple different classifications around the world, and empower the teams — the data scientists and data engineers — to solve tough challenges amid ambiguity, resourcing and time constraints. Last but not least, Thom shares his perspectives about data and AI architecture and system design to ensure the solutions are future-ready.

Thom Kenney
Former CDO, U.S. Special Operations Command

Experienced CEO and CTO with multiple successful exits, Army Reserve officer and combat veteran, board member and investor.


Chris D’Agostino:
So Thom, it’s great to have you here on Champions of Data + AI. Thanks for taking the time.

Thom Kenney:
Thanks for having me, Chris.

Chris D’Agostino:
So we met last July on my fateful flight back from Milan that almost crashed. You might remember.

Thom Kenney:
I do.

Chris D’Agostino:
And spent a great afternoon talking about the work that you were doing at SOCOM. You’ve had 25 years in the Defense Department, private sector, running your own business, working for the New York Times. You’ve been around data for a long, long time. Curious about, if you think back to the work that you’ve done at SOCOM, without obviously divulging any trade secrets, classified information, the whole nine yards, since you and I have both worked in that space, can you think of a couple of really exciting data and AI projects that you worked on that made a meaningful difference in the work that you were doing?

Thom Kenney:
I can, and that experience that I had at Special Operations Command was one of the best experiences of my career. It’s not only an honor to work with some of the most elite professionals in the world, but when the work that you do every single day has a tangible impact on what’s happening around the world, it’s just an incredibly rewarding experience, and it’s very humbling as well. As I think about that time that I spent at SOCOM, the couple of things that come right to mind are about how we really try to move the data and AI understanding forward faster inside of the organization.

Thom Kenney:
We put together a couple of really interesting training programs, one with MIT and one with Carnegie Mellon, specifically designed for the senior leaders inside of Special Operations Command to understand a little bit more about data and artificial intelligence: a bit of the vernacular, a bit of the way we talk about things from a technical perspective. The goal was to empower these senior leaders, more than 800 of whom we educated, to understand how this might apply to what they’re doing every day.

Thom Kenney:
The second, which was a really impactful opportunity for us, was that we often work in a mixed unclassified and classified environment, with data that lives in multiple different classifications around the world. We wanted to empower the teams, in particular the data scientists and data engineers, to use tools like R and Python to build in the lowest possible classification they could: spin up a Jupyter Notebook and run some Python code with dummy data in an unclassified environment to get the baseline model code up and running. But then, when you want to move that into a classified environment, there’s often a very difficult hurdle you have to go through called an authority to operate. And when you’re moving an entire software package, that can take time, because each of the packages has to be evaluated from a security perspective as you move between different classifications.

Thom Kenney:
What we were able to deliver was something that was really exciting because we got to a point where you can have all of the same R and Python packages to do data science, machine learning, neural networks in the unclassified environment and in the classified environment, and then all you had to do was move your snippet of Python code or move your Jupyter Notebook from one to the other. And you don’t have to worry about the additional security constraints of all of these open source packages you’re using because they’ve already been vetted in both of these domains. That opened up a huge opportunity inside of SOCOM to really push down the capability to the lowest echelon. Digital natives across SOCOM are now developing R and Python applications for their commands that are getting after real world problems every day. And, interestingly, it’s the lower enlisted folks or the junior officers that are embracing this the most.
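
To make that workflow concrete, here is a minimal sketch of the develop-low, run-high pattern Thom describes: the model code is written against synthetic dummy data on the unclassified side, and because the same vetted packages exist in both domains, only the code needs to move across. The specific packages (pandas, scikit-learn) and the classified data path are illustrative assumptions, not SOCOM's actual stack.

```python
# Minimal sketch of the develop-low, run-high pattern described above.
# Assumptions: pandas and scikit-learn are among the packages vetted in both
# domains; the classified data path is purely illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_data(unclassified: bool) -> pd.DataFrame:
    if unclassified:
        # Low side: synthetic dummy data with the same schema as the real feed.
        rng = np.random.default_rng(seed=0)
        return pd.DataFrame({
            "feature_a": rng.normal(size=1000),
            "feature_b": rng.normal(size=1000),
            "label": rng.integers(0, 2, size=1000),
        })
    # High side: the identical code reads operational data instead.
    return pd.read_parquet("/mnt/high-side/operational_records.parquet")  # illustrative path


def train_and_score(df: pd.DataFrame) -> float:
    X, y = df[["feature_a", "feature_b"]], df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# The only change when this notebook moves between domains is the flag below.
print(train_and_score(load_data(unclassified=True)))
```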

Thom Kenney:
They’re digital natives. They understand the importance of data. They’re excited about AI. They’ve been using technology since they came out of the womb. And these are folks that are just embracing the capabilities that we can deliver. This is also the first of its kind in the Department of Defense, where you can move code very, very seamlessly between one environment and another. And I think a third aspect that’s been really important for us, as we look to the long-term view of where SOCOM needs to be, is really starting to build this muscle memory that the data is the most important part of the work you’re going to do when you do AI work. The data that feeds a machine learning algorithm or a neural network is more important than the model, because you can have the best models in the world, but if you have really poor data, that model is going to help you make really poor strategic data-driven decisions.

Chris D’Agostino:
Yeah. So let’s drill in on that a little bit, a couple of thoughts. You were kind enough to invite me down last summer, and I spent a day with you and had the opportunity to talk directly with General Clark. To that point of executive education, I was really struck by just how engaged he was, how up to speed he was on the opportunity with AI, and how open-minded he was about what the command could be doing differently in order to move more quickly and really deploy some of these capabilities onto the battlefield. I have a background in the counterterrorism space post-9/11, taking in data from lots of different organizations, and to your point about data classification levels and needing to do more on what we would call back then the low side, the unclassified environment, the big challenge there, of course, was: if you’re combining sensitive data that’s classified, that was gathered through classified means, with open source or unclassified data, how do you fuse these things together?

Chris D’Agostino:
And to your point, moving algorithms that could be developed on the low side with the data and moving those to the high side and getting them to run equally well and be as trustworthy as possible was really hard 20 years ago. And it’s come a long way. It sounds like you all have made some pretty good headway of late as well.

Thom Kenney:
We’ve made some good headway, but the problem isn’t solved, and part of the reason why is that we have so much data. We have so many databases and so many systems, some of which we don’t even have access to anymore, that hold important data we should be using, for the historical context of what it was used for, to help inform where we may go in the future. A perfect example comes from inside my world, civil affairs. I deployed as a civil affairs officer to Afghanistan, and I’ve deployed to Africa, and one of the things that was really frustrating from my perspective, looking at the information I needed to do my job out in the field, was that there were two different systems in transition while I was on one of my deployments. We were moving from a system called Tiger to a system called Sydney.

Thom Kenney:
And those two systems didn’t talk to each other. There was no data shared between them, and there was no data dump from one system and data upload into another. And not only that, the user interfaces were so different that there was a lot of training you had to do to get after that problem. So as we look at what you’re talking about with counterterrorism, we’re much farther ahead than we were 20 years ago, but one of the challenges we’ve still got is that there’s so much data sitting in these different systems that don’t necessarily talk to each other. So inside of SOCOM we talk with our vendors and internally about some requirements that are really important for working with data to get to AI. One is we’ve got to have API-enabled capabilities, the application programming interfaces that allow two pieces of software to share data and move things from one system to another seamlessly.
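
As a rough illustration of what "API-enabled" means in practice, here is a minimal sketch in which one system exposes its records over HTTP so another system can pull them instead of relying on a manual data dump. The framework choice (Flask), endpoint path, and record fields are assumptions made for the example, not any real SOCOM interface.

```python
# Minimal sketch of an API-enabled data exchange: system A exposes its records
# over HTTP so system B can pull them, instead of relying on manual exports.
# The endpoint, port, and record fields are illustrative only.
from datetime import datetime, timezone

from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the owning system's database.
RECORDS = [
    {"id": 1, "report": "site survey complete", "updated": "2022-05-01T12:00:00Z"},
    {"id": 2, "report": "logistics request filed", "updated": "2022-05-02T08:30:00Z"},
]


@app.route("/api/v1/records")
def list_records():
    # A consuming system can poll this endpoint to keep its own copy refreshed.
    return jsonify({
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "records": RECORDS,
    })


if __name__ == "__main__":
    app.run(port=8080)
```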

Thom Kenney:
The other is we’ve got to be platform agnostic, able to take a containerized piece of code from one location to another, whether that’s from one cloud to another cloud or from garrison out to the tactical edge where we’re actually doing the fighting. The ability to move this capability is important, and that ties into a really interesting aspect of this, which is that for years we spent time figuring out, how do we bring all the data to where the compute is? But if you flip that on its head and move the compute to where the data is, that’s a really, really interesting problem set.

Chris D’Agostino:
Yeah. I mean, this is in keeping with the founders of Databricks, who created Apache Spark. It was really, how do you do distributed computing? How do you minimize the amount of data shuffling between nodes? How do you push the algorithms to the data and get a result set back? And what’s interesting to me, having spent 20-plus years in the intelligence community, some time in industry with a top 10 bank, and now at Databricks talking to lots of customers: in the intelligence community and the Department of Defense, systems were siloed on purpose, deliberately so. It was an access control mechanism. It was a data protection mechanism.
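
For readers less familiar with Spark, here is a minimal PySpark sketch of the push-the-algorithm-to-the-data idea: the filter and aggregation execute on the nodes that hold the data, and only the small aggregated result comes back to the driver. The storage path and column names are invented for illustration.

```python
# Sketch of "push the algorithm to the data" with Apache Spark: the filter and
# aggregation run on the cluster nodes holding the data, so only the small
# result set travels back to the driver. Path and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("compute-to-data").getOrCreate()

events = spark.read.parquet("s3://example-bucket/sensor-events/")  # illustrative location

# Narrow transformations (filtering, column pruning) happen where the data
# sits, which keeps shuffling between nodes to a minimum before aggregation.
summary = (
    events
    .filter(F.col("event_time") >= "2022-01-01")
    .select("region", "event_type")
    .groupBy("region", "event_type")
    .count()
)

summary.show()  # only the aggregated rows return to the driver
```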

Chris D’Agostino:
And now that you’re getting into the desire to apply more data sets from across these different systems, and, frankly, with the contract vehicles that were established to enable the development of these siloed environments, there are a whole lot of headwinds, I would say, that make it challenging. You’ve got commercial entities that we talk to quite a bit in industry that have built systems that are siloed not necessarily because they were trying to keep the data separated. They just designed things to be fit for purpose, and they used that API approach to enable data exchange. But what they found is they’ve got all these API integrations in this monster network topology of systems, and the data exchange is really inefficient. And so what they’ve started to ask is, how do we move more data from these source systems into a single environment where we can coalesce it and start doing more analysis there? But it is really, I think to your point, that push and pull of, okay, when do we need to bring data sets together and when do we need to push algorithms out?

Thom Kenney:
I think you’re spot on about one of the challenges: the multitude of APIs, data in different locations, and how we bring that together to do analysis. But one of the things that we face in the Department of Defense is that the ownership of that data doesn’t necessarily allow you to access that data, and part of that problem comes down to more of an architectural or technical challenge. Think about the concept of zero trust. President Biden a few months ago said, “I want to have the systems in the U.S. government adhere to zero trust, be cloud based, and require multifactor authentication across the board for access to important information.” Now, there are all kinds of implied tasks, an Army term for the things you have to figure out in order to solve a problem, that go into that.

Thom Kenney:
But as you look at where the data lives, one of the challenges that we’ve got is, if we talk about bringing data together in an easier way, using a little less API, we do have a couple of issues. One is, how do you keep that data live to the point that you can actually use it? The time to last value for data is really, really important. So if you’ve got a piece of data that you know is not going to be usable for you anymore within 24 hours, how do you create the ability to move that data into your centralized data repository to be able to do analytics? That’s where an API is really, really helpful in making sure that your data stays refreshed. But another side of this is taking a completely different approach to our identity and access management.

Thom Kenney:
I know the Department is working on some really interesting things when it comes to federated identity and access management: abandoning the idea that we’re only going to have one identity and access management system, federating it across the department instead, and letting those access management systems talk to each other. But the other big part of zero trust that has a huge implication for data is, how do we get to attribute-based access control? We’re used to saying, “We’ve got a server, and on that server is a database, and you can have access to that database.” And then inside that code is how it manages, all right, you are this persona or that persona, and you have access to this data or that data. As we think about building systems of the future, to get to a point where we can coalesce data for advanced analytics and also protect that data at the same time, while ensuring it’s the most up-to-date data, we’re going to have to get after attribute-based access control.
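
To sketch the difference between a database-level grant and an attribute-based decision, here is a toy attribute-based access control check in Python. The attributes (clearance, mission, releasable regions, deployed status) and the single policy rule are invented purely for illustration.

```python
# Toy attribute-based access control (ABAC) check: the decision is computed at
# request time from attributes of the requester, the resource, and the
# environment, rather than from a static grant on an entire database.
# All attribute names and the policy rule are illustrative.
from dataclasses import dataclass

CLEARANCE_ORDER = ["UNCLASSIFIED", "SECRET", "TOP SECRET"]


@dataclass
class Subject:
    clearance: str
    mission_id: str
    deployed: bool


@dataclass
class Resource:
    classification: str
    mission_id: str
    releasable_regions: list


@dataclass
class Context:
    region: str


def is_access_allowed(subject: Subject, resource: Resource, context: Context) -> bool:
    cleared = (CLEARANCE_ORDER.index(subject.clearance)
               >= CLEARANCE_ORDER.index(resource.classification))
    same_mission = subject.mission_id == resource.mission_id
    in_releasable_region = context.region in resource.releasable_regions
    # Example environmental condition: access only while the requester is deployed.
    return cleared and same_mission and subject.deployed and in_releasable_region


print(is_access_allowed(
    Subject(clearance="SECRET", mission_id="M-17", deployed=True),
    Resource(classification="SECRET", mission_id="M-17", releasable_regions=["AOR-EAST"]),
    Context(region="AOR-EAST"),
))  # True: every attribute condition is satisfied
```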

Thom Kenney:
Because, if we can’t get to that point, we’re always going to be burdened with all of these little independent systems around the world, which are not going to be well managed and they’re going to open up security risks for us. So we’ve got to take both of those approaches.

Chris D’Agostino:
And it’s interesting. I think back to my days in government: most people talk about the three Vs, the volume, the velocity, and the variety of data. We always added in veracity, a fourth V that organizations don’t often talk about. It ties back to what you said earlier, which is that an amazing model trained on poor-quality data is really not going to be worthwhile. The other thing is, how trustworthy is the data? And then, when you think about attribute-based access control, there’s this notion that it used to be you were granted access to almost the whole database or not at all. It was almost at a system level in the past.

Chris D’Agostino:
And now I’m sure the Command and DOD writ large is thinking about, how do I give the war fighter access to data in the moment for situational awareness that he or she may not otherwise have access to? And so the attribute may be okay, this person is deployed, they’re in the middle of an operation, they’re in this geolocation, it’s this time of day, we need to give them this piece of information that otherwise they may not have access to if they’re sitting at their desk back in a main office. And just to make the teams and the war effort, the war fighting effort, more efficient and more effective.

Thom Kenney:
Well, this is where robotic process automation has got to be a part of the equation, because you’re absolutely right: this war fighter in this location with this mission needs this information and needs it today. But if we have to go through a 14-step approval process at 19 levels of echelon, and it takes a week to get that approval, that is never going to work for the war fighter at the tactical edge who needs it today. Robotic process automation may not be the most exciting thing, but when I was at SOCOM, we did talk a lot about how we could improve and automate some of these things that really should go faster.
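
As a rough sketch of what automating part of such an approval chain might look like, the snippet below codifies the routine checks so that a request satisfying all of them is approved immediately and only the exceptions are routed to a human reviewer. The specific checks are invented for illustration and are not an actual DoD workflow.

```python
# Toy sketch of robotic process automation applied to an approval chain: the
# routine, judgment-free checks are encoded, requests that pass them all are
# auto-approved, and only exceptions go to a human. Checks are illustrative.
from dataclasses import dataclass


@dataclass
class AccessRequest:
    requester_clearance: str
    data_classification: str
    mission_need_documented: bool
    security_training_current: bool


ROUTINE_CHECKS = [
    ("clearance matches classification",
     lambda r: r.requester_clearance == r.data_classification),
    ("mission need documented", lambda r: r.mission_need_documented),
    ("security training current", lambda r: r.security_training_current),
]


def route_request(request: AccessRequest) -> str:
    failed = [name for name, check in ROUTINE_CHECKS if not check(request)]
    if not failed:
        return "AUTO-APPROVED"
    return "ESCALATE TO HUMAN REVIEW: " + ", ".join(failed)


print(route_request(AccessRequest("SECRET", "SECRET", True, True)))      # AUTO-APPROVED
print(route_request(AccessRequest("SECRET", "TOP SECRET", True, False)))  # escalated
```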

Chris D’Agostino:
I want to talk a little bit about something you said, the time to last value. Help the viewers understand what you mean by that specifically and why it’s so relevant in the space that you’re in. Or, if you can, draw an analogy to a non-DOD space. What comes to mind for me is self-driving vehicles and all the sensor and telemetry information they take in, where the arrival of that data dictates how the car responds.

Thom Kenney:
I can give two pretty simple examples that I think folks will easily relate to. The first one is from your counterterrorism background. When you have information about a particular individual that you may be targeting, if the last known position of that person was three days ago, that information is useless. That information was useful three days ago, when that person was in that location. But that last value may only be an hour or two when we have to do an operation where we know someone may be somewhere. On the civilian side, think of a very easy use case: as an airline pilot, I need to know if anyone is still on the runway. So for time to last value in an airport, where you may be leveraging intelligence inside of the airport operations, you’re now measuring it in seconds or even milliseconds for information like that.

Thom Kenney:
So those are a couple of examples where time to last value can be a little bit longer. Or take personnel, say: the time to last value for where your assignment is doesn’t have to change until the next time that you move. But in a kinetic operation, maybe you’ve got a smaller window, though not as small as airport operations, for example.
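
One way to read "time to last value" in code is as a freshness window attached to each kind of record, with anything older than its window excluded from decision making. The sketch below does exactly that; the record types and window lengths simply echo the examples from the conversation and are not operational values.

```python
# Sketch of enforcing "time to last value": each record type carries a
# freshness window, and records older than their window are treated as no
# longer usable for decisions. Windows and record types are illustrative.
from datetime import datetime, timedelta, timezone

TIME_TO_LAST_VALUE = {
    "runway_occupancy": timedelta(seconds=5),      # airport operations
    "target_location": timedelta(hours=2),         # kinetic operation
    "personnel_assignment": timedelta(days=365),   # changes only on reassignment
}


def still_valuable(record_type: str, observed_at: datetime) -> bool:
    age = datetime.now(timezone.utc) - observed_at
    return age <= TIME_TO_LAST_VALUE[record_type]


now = datetime.now(timezone.utc)
print(still_valuable("target_location", now - timedelta(days=3)))      # False: three days old
print(still_valuable("target_location", now - timedelta(minutes=30)))  # True: inside the window
```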

Chris D’Agostino:
Well, airport operations hits close to home. I have my pilot’s license, and when we took off from Milan last July, I wondered why they didn’t have the most up-to-date weather on the storm going through Milan and the hail. The hail that hit the aircraft was large enough to crack the windscreen, take out the nose cone, and damage the left engine, and we had to declare an emergency landing. We had to dump fuel. It was a mess, and I thought, “Why in the world would we take off, and certainly why would we vector through the heart of that storm, when it was otherwise a fairly clear day?” So two really good examples.

Chris D’Agostino:
I want to talk a little about architectures. Databricks has been really advocating for this concept of a lakehouse, which combines the things that are great about a data lake, with semi-structured, unstructured, and structured data all in one place on low-cost object stores from the major cloud providers, with the concept of the data warehouse, where you start adding governance, usability of the data, and the creation of data assets. Talk to me a little bit about how you feel the lakehouse paradigm is either good or not good, if you believe in it, for the AI journey that a lot of organizations are on.

Thom Kenney:
I personally love the lakehouse paradigm, for a couple of reasons. One is, you were talking earlier about all the different kinds of disparate data. With the amount of data that’s going to be flowing in the next five, 10, 20 years, that information is going to be coming from datasets, from IoT devices, from JSON files; it’s going to be all over the board. Being able to move the compute closer to where that data is, in a construct that’s better governed, I think is going to be absolutely huge. One of the most interesting aspects of the lakehouse, if you think about the data warehouse and its governance versus the data lake, is that one of the pejorative things we’ve seen with data lakes in industry and in the government is that data lakes become data swamps, because they don’t have any of that governance. They don’t have a lot of overlap.

Thom Kenney:
So all of this massive amount of data is just getting dumped into a lake without really an understanding of, what could we do with the data? How do we tag it? How do we use metadata to understand it? How does that data interface? The lakehouse concept elevates that to a point where you’ve got better management over the data flows going into your lakehouse, which then gives your data engineers and your data scientists an even easier time knowing, what’s the data that I have? How is it being leveraged? How is it being updated? And it moves the organization forward a lot faster.
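
To ground this, here is a minimal PySpark sketch of lakehouse-style governance using Delta Lake, the open table format behind the lakehouse pattern, assuming Delta is configured in the Spark session: writes are schema-enforced and the table carries documentation, so data landing in the lake stays described instead of turning into a swamp. The paths, table name, and columns are illustrative.

```python
# Minimal sketch of lakehouse-style governance: raw files land in cheap object
# storage, but they are written into a governed Delta table with an explicit
# schema and table-level documentation. Paths, names, columns are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

# Raw, semi-structured data from the landing zone (illustrative location).
raw = spark.read.json("s3://example-bucket/landing/sensor-feed/")

spark.sql("CREATE DATABASE IF NOT EXISTS ops")

# Writing to a Delta table gives ACID transactions and schema enforcement:
# rows that do not match the declared columns fail the write instead of
# silently polluting the lake.
(raw
 .selectExpr("sensor_id", "reading", "cast(event_time as timestamp) as event_time")
 .write
 .format("delta")
 .mode("append")
 .saveAsTable("ops.sensor_readings"))

# Documentation and ownership metadata live alongside the data itself.
spark.sql(
    "ALTER TABLE ops.sensor_readings "
    "SET TBLPROPERTIES ('comment' = 'Hourly sensor feed; owner: data engineering')"
)
```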

Chris D’Agostino:
Let’s shift gears a little bit. We’re running out of time, so I want to talk to you about a couple of other items. Let’s do some predictions here. You’ve been around big data for a while. I have as well. You’ve been around it in organizations where it’s mission-critical and life-saving, and in organizations where it’s driving revenue and decreasing costs. So you’ve seen both sides at petabyte scale. If you look five to 10 years out, what do you see as the key data challenges for organizations that want to make sense of this massive amount of data that’s being generated and collected?

Thom Kenney:
One of the things that we’ve talked about at SOCOM when we think about algorithms is the ability to affect an algorithm and the algorithm’s output without changing the algorithm itself. And the reason I say that in relation to where we’re going to be with data in five to 10 years is that we understand, holistically as a data community, that with the data you push into a system run by an algorithm that produces an output, maybe for strategic decision making, you may be pushing good data, but you may be pushing too much good data relative to the other data that you have. So when you think about machine learning and the law of averages, the no free lunch theorem in machine learning says, “No algorithm will perform better than any other algorithm over an infinite number of iterations.”

Thom Kenney:
So as you think about that, we’ve got to take into account that you can also change the outcome of all of those different algorithms based on the data that you’re pushing in. For example, if you think about data sets that are weighted in a particular way, and you flood a whole bunch of data into that algorithm knowing where it’s weighted, you’re going to impact that algorithm. If you think of how bots are being used today, and a lot of Elon Musk’s commentary about his concern with Twitter being the number of bots that are driving information to users, it’s the same type of paradigm. So as I think about five to 10 years out, I think one of the challenges is going to be that we will have so much data, and virtually all of it can be authoritative, but we still may not be able to make good decisions because the weighting of the data is not tuned properly to the data that we have.

Thom Kenney:
That’s an evolutionary step for us when we think about data and AI. Right now, today, we’re worried about, I just need really good data and I need to trust that data. The next step in this evolution in the next five to 10 years is going to be, all right, but I also need to understand how this data is weighted, where the importance of this data lies as it affects other pieces of data, and how that informs us for strategic decision making.
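
As a tiny, synthetic illustration of that weighting point: the snippet below fits the same scikit-learn logistic regression on the same records twice, once with balanced weights and once with one slice of the data over-weighted, and the prediction for a borderline case shifts. All numbers are made up.

```python
# Tiny illustration of the weighting point above: identical algorithm,
# identical "good" records, but changing how much weight one slice of the
# data carries shifts the model's output. All numbers are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=1)
X = np.concatenate([rng.normal(-1.0, 1.0, size=(500, 1)),
                    rng.normal(+1.0, 1.0, size=(500, 1))])
y = np.concatenate([np.zeros(500), np.ones(500)])

query = np.array([[0.2]])  # a borderline case near the decision boundary

# Balanced weighting.
balanced = LogisticRegression().fit(X, y)

# The same records, but class 0 is flooded / over-weighted 10:1.
weights = np.where(y == 0, 10.0, 1.0)
skewed = LogisticRegression().fit(X, y, sample_weight=weights)

print("balanced P(class 1):", balanced.predict_proba(query)[0, 1])
print("skewed   P(class 1):", skewed.predict_proba(query)[0, 1])
# The probability assigned to class 1 drops sharply once class 0 dominates.
```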

Chris D’Agostino:
Cool. Yeah, love it. So you’ve had a great career so far, a lot of senior positions, and I want to talk to you about what we can do as leaders in data and AI, and what type of advice you would give people aspiring to a career similar to yours. What are some things? We’ve heard people suggest always going to the problem project, being willing to go in and help fix a failing project and make a name for yourself inside of an organization, the work ethic component. We’ve heard people talk about what educational recommendations they’d have. What comes to mind for you when you think about recommending a career in the space that you’re in?

Thom Kenney:
I think one of the most undervalued talents for any technologist is their ability to communicate. When you think about data and you think about AI, you can go work on that hardest problem, you can go read all of these books, you can have your great work ethic, but at the end of the day, if you can’t communicate, it’s not going to help you advance in your career. And the reason I think this is so important is that a data scientist, a machine learning engineer, those folks are working on the technical aspects of the problem. If they have an ability to communicate with a product manager really well, communicate with an end user really well, that is only going to massively improve their ability to develop those solutions that people need to use every day.

Thom Kenney:
The isolation idea of the coder in the back room was great 20 years ago, and there’s lots of memes and movies about it, but today the most important data and AI folks that we’re going to see in the next five to 10 years, next 20 years, are going to be those folks that are also excellent personal communicators to be able to get at the root of the problem, identify where there are gaps, and be able to deliver something in a way that also allows them to take that constructive feedback to improve what they’re doing every single day.

Chris D’Agostino:
Yeah. I think it’s a great observation. I was going to bring up the coder in the back room, and you beat me to it. I remember working on programs where I was one of the developers and we would joke, just push food under the door, leave us alone. We didn’t want to talk to anyone except to huddle and have our scrums, and things like that. But that data scientist, that software engineer, that person working with data at a technology level is now up in the front office, having to explain the work and educate business stakeholders who oftentimes are less technical. And so they need to be able to communicate in such a way that they can translate some of the technical jargon into things that the business stakeholders understand and, likewise, be able to understand the motivation from the business or the mission side as to why they’re doing things.

Chris D’Agostino:
I mean, oftentimes we were building software and we were a bit divorced from what happened when it actually got deployed and was used; that wasn’t really our thing. We were on to the next set of features that we needed to build. So I really do think that communication is critical, understanding both sides of it. Well, we’re out of time. I want to thank you, Thom, for being on Champions of Data + AI. It’s great to see you again, and I hope to see you soon in person.

Thom Kenney:
Thanks, Chris. Really appreciated the time today.