Location Intelligence Unlocks Insights Enabling Data-driven Decisions with Databricks & Precisely

May 27, 2021 11:00 AM (PT)

Increasing availability of location-based data and the growing capabilities of AI/ML provide an optimal opportunity for companies using Databricks to capitalize on location-based data science for a competitive edge. According to a Willis Towers Watson survey, 60% of companies are targeting AI/ML capabilities in 2021 to address IT and organizational bottlenecks, such as data infrastructure, to better analyze data when evaluating risk models and reducing manual input. 


Yet many companies have work to do in unlocking value from their data. To make sense of the volumes of business data, location provides a consistent and common thread to connect data across an organization. Using location, companies organize and manage data in a way that moves them to contextualized knowledge, automation, and better decision-making at all levels.


 Learn how clients are leveraging advanced analytics and enrichment solutions to: 

  • Simplify the complexity of location data and transform it into valuable insights
  • Enrich data with thousands of attributes for better, more accurate analytical models, such as AI and ML technologies 
  • Enable real-time answers when integrating geospatial data in business processes while leveraging the power of Databricks
  • Enhance customer-facing and operational tasks to create more meaningful and timely customer interactions
In this session watch:
Tim McKenzie, Business Solution Architect – Cloud Solutions, Precisely



Tim McKenzie: Hello everyone. My name is Tim McKenzie, and I lead the Cloud Native Solutions Team at Precisely. And thank you for joining today. So you may be wondering a bit about who Precisely is. So I’ll take a couple of minutes and introduce Precisely. And Precisely is all about data, and if we think about the world today, it is driven by data. However, this study shows, and I’m sure you’ve heard it in your own organization and would confirm this yourself, that the leadership of most companies is really concerned about the quality of the data that they’re basing their decisions on. And so it’s a tricky spot, right? Because we’re making more decisions from a data-driven perspective, yet we still have continued questions about the quality of the data. So that brings us to the concept of data integrity, which is what Precisely is all about.
So data integrity is ensuring as best we can accuracy, consistency, and context of data so that the right folks are using the right data and they can trust that data when they make decisions about their business. And we view data integrity really from four different perspectives and the solutions that we deliver as a company land in these four different areas. One is data integration. So do you have the capability to connect to and pull the data that’s necessary to make any decision that you’re making within your business? Once you pull that data in, are you able to ensure that the data is of sufficient quality to begin to make decisions? Then we see a growing need and a growing interest from our clients to bring in a location perspective. And we’ll talk about why location is becoming an increasingly relevant part of data analytics. And lastly, we’re seeing that customers are wanting not only to take advantage of the data that they’ve been able to procure in-house, but they want to enrich that data with additional data attribution from other third-party sources.
Precisely, we’ve been doing this for quite a long time, and maybe not under the brand name of Precisely, which was a newly branded name that was released last May of 2020, right in the middle of the pandemic, perfect timing. But the components that make up the brand of Precisely have been around for a long time and they are well-recognized in their space. You can see that we hold leadership positions and a significant portion of the data integrity matching data integrity solutions suite with data integration, pushing the envelope in terms of new innovations and new solutions that deliver meaningful products into the marketplace. So we think about data within the context, obviously, of data integrity, but also within the context of your own business. So thinking about how do you use data today? How do you take advantage of it?
And you can probably go through a long list of different ways that you use data. But one thing that I think you’ll bring up as a common theme is the fact that data itself is exploding, you have access to more data sources. Companies are able to collect more data, companies because of the reduced cost of storage, they’re able to store more data. And then no doubt there’s additional elements within our society today that are producing data more than ever before. So with all of that data, we come really to a choke point, and that is, can we as a company recognize the availability of more data sources and bigger amounts of data, and can we take advantage of that and turn it into some sort of competitive benefits so that we can produce better products or produce better interactions with our customers or make better insights about our pricing models or go to market strategies.
Can we leverage this data to gain a competitive edge in the marketplace? On the other side, we have companies who are feeling the pressure. The pressure of companies who’ve been able to figure out how to take advantage of data and turn it into an advantage. And so really when you think about the platform, the Databricks platform, the Databricks platform is growing I think because it meets both of these types of users. The users who are looking for new, bigger, better, faster ways to take advantage of data and to capitalize on the learnings that they’ve already made about their data and the insights that they can gain. And we’re also seeing that Databricks users are coming along on the other side, which they’re wanting to leverage a platform that is big enough, consistent enough, easy enough to use that we can bring data to a central place and take action on a wide variety of platforms and applications within your business.
All this is to say that with the advent of the cloud, with the advent of platforms like Databricks, the demand for analytics is growing. So companies are expecting bigger insights, better insights, and faster insights. Along with that, we’re seeing a growing list of industries begin to recognize the value of location when they consider data analytics and consider the types of data that they might bring into a decision-making process. And we’re going to spend most of our time today talking about how Precisely enables location inside of Databricks to help companies gain a competitive edge. So on this slide, and I’m not going to read this whole slide, you see a significant portion of enterprises. And it depends on the industry for sure, but more and more industries are recognizing that location can play a significant role in the way that they interact with their customers, the way that they understand what their customers may need, and the way that they may deliver services to the marketplace.
And you can go across an industry from product managers to marketing, to R and D and labs, to operations, to even executive managers. So executive managers want to see a map, want to see where their customers are, where a particular risk is. Everyone else wants to see data in a dashboard or some sort of analytics response, or on a score associated to customers. But the value of location is that it can be a common thread that allows you to connect data across your organization and bring a new perspective to that data. We work with a lot of different industries and help a lot of our customers enable geospatial and location data as part of their business decisions. Just to take a couple here. So in the telecom space, we all are familiar with the race to deliver 5G, and to deliver faster network connections for every place across the world is a thing for sure.
And telecoms need to understand where their customers, where devices that connect to the network are located, and what experience they would expect from that connection. And they need to understand that so that they can then build a network that’s able to deliver the signal that their customers expect. We’ll talk about insurance in a minute. Looking at real estate, the whole property tech space is growing dramatically. And the types of insights that prop-tech companies are delivering to us as consumers is dramatic and it crosses all parts of it. I just picked out one, which is to me, one of the more interesting, and that is the growing number of apps and technology companies that are out there to help you better understand what you might want in a house, and then go help you decide which house is best for you.
Banking, all kinds of use cases in banking, a big one though these days is, let’s understand where this transaction is coming from. Let’s use that insight to either make sure that we as a bank are able to support that transaction from a location perspective, or by understanding location are we able to better agree to a particular transaction or flag it as fraudulent. And then retail. Retail use cases also are ever-expanding, from site selection of physical stores to delivery options and enabling delivery of product to pulling people in and understanding a population around one store versus another to make sure that you have the right product mix in place. And healthcare, the list could go on and on here too. But the overarching theme within healthcare is, do I have adequate network coverage to provide sufficient care to my constituents? And then lots of use cases fall off of that around the availability of care, the propensity of different types of disease by location, and the list could go on and on.
The challenge, and the challenge is not just with location data, obviously, but when we think about data science in general, we have challenges with prepping the data for use. I do think that location data might contribute to this challenge more so than some other data sets, and the reasons are many but location data in and of itself is just messy stuff sometimes. So if you think about, you start with just an address. Addresses here in the US they change all the time. Zipcodes change, a new city pops up, a new variation of a city pops up, we like to rename roads. They may be one thing one day, and we may have a second variation of that street address another day, when we skip around the world we see in every country virtually a different format, a different way of representing addresses.
And that’s just for the address. Then we start to think of all the location-based things that folks tend to want to have associated to that address, and shapes on a map, lines on a map, distance to a location, latitudes, and longitudes. All those are sometimes messy things that data scientists and folks in an analytics space may not find incredibly intuitive. Most data scientists, most analysts are used to rows and columns, not shapes on a map and this presents a challenge. Additionally, as our customers are looking for a third-party data to add or enhance their understanding of a given location, they’re finding it challenging to find robust and consistent data sources to be able to bring in so that data scientists can have the added attribution to apply to a model to get new insights. And then the reality is location changes all the time, properties change, buildings go up, buildings go down, businesses move in, businesses move out.
And then on top of that, just the reality of the fact that when we’re trying to perform spatial processes, it is typically computationally intense. So we have to apply a lot of compute power in order to get an answer. We at Precisely over the last seven or eight years have developed, I think, a new set of strategies that ultimately enable business to take advantage of location without necessarily having to deal with nearly as much of the messiness of location, and that strategy really has three pillars to it. The first being the strategy of organize. So having a solution in place that enables us to take in any location, whether it’s an address or a latitude and longitude, or the name of a business, and organize that piece of input data to what we call a trusted ID. And a trusted ID is an ID that is unique to that given location, meaning that there’s not a second ID that points back to the same location, meaning that that single ID doesn’t point to two locations and that it’s persistent.
So recognizing that address elements may change, that a lessee in a building may change, no matter what, we’ll ensure that the ID associated to that building and associated with that location is persistent. And so by having a trust around unique and persistent IDs for every location, that enables our clients to begin to stack other sets of data, other viewpoints about that location against this trusted ID, so that we can start to begin the process of converting messy spatial data into rows and columns of data that works nicely in analytics platforms like Databricks. So the second part of our strategy is around enrich. And this part of our strategy really has two pieces. One is as a company, we provide a little over 10,000 attributes for every known location in the US. And all of these attributes are not delivered as shapes on a map or lines on a map, but as rows and columns in a spreadsheet or in a CSV file.
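The idea of stacking datasets against a trusted ID can be pictured with a minimal sketch in plain Python. The IDs, attribute names, and values below are all hypothetical and are not taken from any Precisely dataset; the point is only that once every table is keyed by the same unique, persistent ID, enrichment becomes an ordinary key-based merge.

```python
from collections import defaultdict

# Hypothetical ID-keyed tables; IDs and attributes are illustrative only.
property_attrs = {
    "ID-0001": {"building_count": 1, "construction": "wood frame"},
    "ID-0002": {"building_count": 3, "construction": "masonry"},
}
hazard_attrs = {
    "ID-0001": {"in_flood_plain": True},
    "ID-0002": {"in_flood_plain": False},
}


def stack_by_id(*tables):
    """Merge several ID-keyed tables into one wide row per ID."""
    rows = defaultdict(dict)
    for table in tables:
        for location_id, attrs in table.items():
            rows[location_id].update(attrs)
    return dict(rows)


enriched = stack_by_id(property_attrs, hazard_attrs)
```

Each entry in `enriched` is one row of columns for one location, which is the rows-and-columns shape the talk describes.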
So we deliver all of our data sets organized against the universe of trusted IDs, and then as columns of data to gain a different viewpoint about each given location. And from here, once we build this harmonized area where we can organize data from a trusted ID perspective, enrich it with thousands of attributes to help our customers gain more insights into a location. We provide a series of analytic tools, which we’ll show in a minute that run inside of the Databricks platform. So what might this look like? We’re going to for the rest of this discussion, focus on the insurance industry, just because it makes for a clean, easy story that most of us understand. And so, we’ll use the next couple of slides and then ultimately a demonstration of our products running in Databricks to finish out the story around enabling location within an analytics platform like Databricks.
So looking at this hex grid circle graph thing, not sure what to call it. At the center point, you see the Precisely ID. So this is the term that we use to identify uniquely, every known location in the US and many other countries. That inner set of hexagon grids represents the categories of data that we deliver already pre-computed against Precisely IDs so that you can actually purchase these types of data from us as a company. So things like deep descriptions of a property. So in an insurance use case, understanding the number of buildings on a property, understanding the type of construction in those buildings, understanding the type of equipment associated to those buildings, to things like hazard and risk data. Is this address in a flood plain, is it in a wildfire zone? To demographics to better understand and describe the population that may live at a given place.
To things that didn’t make the list here, like commercial listings. So understanding the businesses within a building, so you can understand the lessees inside of a high-rise or a shopping center to recognize whether or not there is some sort of risk associated to the type of business that they do. But in conjunction with the data that we deliver as a company is the opportunity to begin to connect data from other sources. So if we think about connected devices or IoT, we’re now in a world where insurance companies are beginning to provide, for example, auto insurance, based upon how you drive and where you drive. So being able to ingest mobile trace data from an automobile into this environment where you can enrich by location with details about any given area that that vehicle may travel, to video or aerial imagery.
So we’ve all seen floods that totally cover houses. So, but being able to take in a satellite image after a flood and organize that satellite image even though we can’t see any of the houses be able to place Precisely IDs with parcel shapes and with building shapes on top of that aerial image, so if we’re an insurance company, we can actually see the buildings and policies that were impacted by that weather event. And again, that’s the next one, so taking in weather events. But then, being able to organize that data to data that an insurance company might have in house, which would be policies in force that are active, or it could be historical policies or claims or a rating or scores rated upon the concentration of risk. So that’s how many policies are within a small given geography so that we can understand if a big catastrophic event happened, how many policies would actually have risks.
We flip this over and I grabbed this slide from a webinar that Databricks gave not too long ago that was specifically tied to the insurance industry. And in this slide, we talk about some of those same types of data sources on the left. So we see connected devices, we see video and images. We certainly could have put social on the other slide as well, but the Databricks solution lends itself nicely to enabling clients to absorb data or enrich or ingest data, sorry, from multiple different sources into the data lakehouse inside Databricks, to then ultimately be able to deliver on a set of use cases that you see on the right.
The challenge oftentimes is if we’re ingesting all this data, can we organize it in a clean and unique way? And the answer is yes. With our solutions we’re able to leverage our geocoding and spatial processing running natively inside of Databricks to bring that nicely organized hex schematic into play in the lakehouse, so that in this case, an insurance analytics person would be able to perform data science with ease against data sets that are enriched not only with the data we described that could be delivered from Precisely, but also from all those source datasets that are shown in this slide. So what does this look like? Well, it looks like data enrichment that’s delivered intentionally to provide business insights. So when we get to this harmonized place, where we’re feeding all these interesting sources into the lakehouse and we’re able to organize the data from a location perspective, with the Precisely capabilities and enrich data with additional Precisely attributes, we find ourselves in a place where we can build automation and build insights through the Databricks platform to deliver business insights.
This is a really high-level view of what this looks like, but ultimately, the Precisely solution lives and breathes inside of Databricks. It operates as elastic and ephemeral processes that work within the constraints of a Databricks notebook. So our products would spin up with the instance of Databricks. We can perform our functions, execute all the pieces of the Databricks process that you may be wanting to initiate. And then we would shut down along with that Databricks cluster. So the idea here is that we’ve made geocoding and spatial enrichment an easy part of the Databricks ecosystem so that we can ultimately execute on use cases. So I’d like to finish up from here by moving over to an instance of Databricks and actually doing a demonstration at a high level of a potential insurance use case so that we can see the products from Precisely in action on top of Databricks delivering a business use case for an insurance customer.
This particular demo is based upon a make-believe insurance company, Summit Insurance Company, but a very real-life storm. And the storm is Tropical Storm Zeta. It came in and hit the coast of Louisiana and Mississippi. We can see it down here, I think somewhere in October of 2020, it was one of many storms that year, right? Last year, it seemed like they just kept stacking up on top of each other. So the use case here for Summit Insurance Company is that they know that Tropical Storm Zeta has hit the US and that they most likely will have some claims coming in that they will need to pay out in response to the storm. And the challenge is how to, number one, predict the potential impact with some degree of accuracy of the storm on their P and L, number one. Number two, quickly identify the customers that are most likely impacted by the storm, and will probably file a claim, and make sure that they’re super happy.
But then three, provide adequate protection and adequate governance to ensure that claims don’t slip through that someone that may have lost the roof a month ago, suddenly files a claim and claims it’s Hurricane Zeta blew their roof off just yesterday. So, with that context in mind, let’s walk through this demo and see how leveraging location and Precisely geocoding and spatial processes along with a few of our datasets can enable Summit Insurance to handle the storm a little bit better. So we see the storm path, we’ve seen things like this, I’m sure all of us on the weather channel or whatever news you watch, where we have it starts with a predicted storm path, but it ends with something like this, which is the actual storm path, the actual path that the storm took.
The challenge is to understand what addresses were actually impacted by the storm. So in the second step, we actually use one of our datasets, which we call the address fabric. In the US the address fabric is intended as an all-encompassing list of addresses that covers the entire US. We’re at about 207 million addresses right now, they’re all organized by Precisely ID, or in this case, the column is labeled PB key, and we’re taking that data set of the entire known universe of addresses in the US, and then we’re trying to match it up with the storm path. And so, we’re using spatial processes from Precisely running natively inside of Databricks to basically grab all the known addresses that fell within this hurricane or a tropical storm path. And then we decided let’s just limit it for now to Louisiana and Mississippi because we think that was the area that was most likely impacted.
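The step of grabbing every address that falls within the storm path is, at its core, a point-in-polygon filter. The sketch below is a plain-Python illustration using the standard ray-casting test; it is not the Precisely spatial API, and the polygon coordinates, `pb_key` values, and address records are made up for the example.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices. Points exactly on an
    edge are not treated specially in this simplified version.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical storm-path polygon and address records (illustrative only).
storm_path = [(-91.0, 29.0), (-89.0, 29.0), (-88.0, 31.0), (-90.0, 31.0)]
addresses = [
    {"pb_key": "P0001", "state": "LA", "lon": -90.0, "lat": 29.5},
    {"pb_key": "P0002", "state": "MS", "lon": -89.0, "lat": 30.5},
    {"pb_key": "P0003", "state": "LA", "lon": -93.0, "lat": 30.0},
    {"pb_key": "P0004", "state": "AL", "lon": -88.6, "lat": 30.5},
]

# Keep addresses inside the path, limited to Louisiana and Mississippi.
impacted = [
    a["pb_key"]
    for a in addresses
    if a["state"] in {"LA", "MS"}
    and point_in_polygon(a["lon"], a["lat"], storm_path)
]
```

At the scale of roughly 207 million addresses a real job would rely on distributed, indexed spatial processing rather than a Python loop; the sketch only shows the logic of the filter.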
So, after we identified all the potential addresses that fell within the path, the next step was to, ingest the policies in force file from Summit into this Databricks process. We could certainly use some Precisely solutions to go out to say their mainframe application, pull out their last active policies in force file probably from last night, and ingest it into this Databricks cluster. From there we would execute a process that we call geocoding, to take the addresses of that entire file of policies in force, and organize, and one, validate the address, two, apply a latitude and longitude, and three, assign our Precisely ID. And then in this Summit policies with Zeta impact zone command, all we did here is we took a join of the policies in force in Mississippi and Louisiana for Summit Insurance Company, and join them to this bigger universe of addresses that we know were impacted by Tropical Storm Zeta.
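Once geocoding has assigned an ID to every policy, the join described here reduces to matching keys. The following is a minimal plain-Python sketch of that join, assuming the policies already carry a `pb_key` column after geocoding; the policy numbers and keys are hypothetical, and in Databricks this would more likely be a DataFrame join than a list comprehension.

```python
from collections import Counter

# Keys of addresses found inside the storm path (hypothetical values).
impacted_keys = {"P0001", "P0002", "P0007"}

# Policies in force after geocoding has assigned a pb_key (hypothetical).
policies_in_force = [
    {"policy_no": "S-100", "state": "LA", "pb_key": "P0001"},
    {"policy_no": "S-101", "state": "LA", "pb_key": "P0003"},
    {"policy_no": "S-102", "state": "MS", "pb_key": "P0002"},
    {"policy_no": "S-103", "state": "MS", "pb_key": "P0009"},
]

# Inner join on pb_key: keep only policies whose address was in the path.
policies_in_zone = [p for p in policies_in_force if p["pb_key"] in impacted_keys]

# Count of impacted policies per state.
count_by_state = Counter(p["state"] for p in policies_in_zone)
```

Because the key is unique and persistent, no fuzzy address matching is needed at join time; that work is done once, during geocoding.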
So now we have a count of a little over 10,000 addresses in Louisiana and a little over 4,000 addresses in Mississippi that were in the line or in the path of this tropical storm. Now, Summit is an insurance company that wants to be smart about what they do. So they want a little bit more information about these potential customers, other than the fact that they felt a little rain or felt a little wind thanks to Tropical Storm Zeta. So they began a process of enriching these records with data from Precisely. In this case, they enrich with a distance-to-coast calculation. So for every one of these records, we identified how far each of those homes was from a coastline, whether that coastline could be the Gulf, could be the Mississippi River, could be any known large waterway that could potentially produce flooding as a result of this storm.
And then we also captured elevation. So understanding where this particular property stood in relation to sea level, grab a few other fields in terms of residential or business, and a few other things that we thought might be useful as we went through this process. The next thing we did though was take at least these two elements, distance to coast and elevation, and leverage Databricks to create a score on all of these records. And the score basically is, the lower the score, the higher the likelihood that this particular address was impacted by Tropical Storm Zeta. And you can see that of those, what was it? About 15,000 policies in force in Mississippi and Louisiana, 74% of them were not impacted by the storm. And we can see, there was a few other or almost assuredly not impacted by the storm because the score is so high, but we do have roughly 10% of the policies in force that have a score under 50, which would indicate a high likelihood that these policies were impacted by the storm.
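The talk does not give the scoring formula, so the sketch below is only one plausible shape for it: a 0 to 100 score built from distance to coast and elevation, where a lower score means a higher likelihood of impact. The equal weights, the 100 km and 50 m caps, and the sample records are all assumptions made for illustration.

```python
def impact_score(distance_to_coast_km, elevation_m,
                 max_distance_km=100.0, max_elevation_m=50.0):
    """Return a 0-100 score; lower means more likely impacted.

    Assumed formula: each input is capped and scaled to 0..1,
    then the two are averaged with equal weight.
    """
    d = min(distance_to_coast_km / max_distance_km, 1.0)
    e = min(elevation_m / max_elevation_m, 1.0)
    return round(100 * (0.5 * d + 0.5 * e), 1)


# Hypothetical (distance_km, elevation_m) pairs for four policies.
records = [(5, 2), (20, 10), (80, 40), (150, 80)]
scores = [impact_score(d, e) for d, e in records]

# Share of policies under the cutoff of 50 mentioned in the demo.
share_likely_impacted = sum(s < 50 for s in scores) / len(scores)
```

A production score would presumably be calibrated against past claims rather than hand-weighted as it is here.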
So from here, Summit’s able to extract that 10% of the policies and create a new file that gets passed to their financial disbursement group to immediately cut a check, probably for some, at least a nominal amount, so that these policies that were most likely impacted by the storm immediately get funding to help get them back on their feet. Along with that, we’d probably send the same file over to the claims department so that we could make sure that we have claims adjusters moving towards the locations where these most likely impacted policyholders live so that we can react to the storm as quickly as possible. On the other side, taking those addresses at the high end of the scoring, we can flag those addresses as well, send those to the claims department as a second file that alerts claims that, “Hey, these addresses were most likely not impacted by Tropical Storm Zeta.”
Therefore, if they file a claim within the next number of days, let’s flag those claims for a little deeper investigation before we cut any checks or go too far in the claims process. The goal of all of this, and this is a very simplified example of what could be a much more complex solution, but the message here and the idea is that we can leverage the power of Databricks to ingest lots of different kinds of data. We can then make use of the power of Precisely to organize that data, to put it into a structure that is rows and columns of data, which we can then perform easy analytics against to ultimately get us to making some better decisions on behalf of both Summit Insurance Company, as well as the policyholders that were impacted by this event. Again, one of many examples, but with this, we wanted to finish our session and say, thank you for joining. If you have any questions from us or for us, please join us at our booth, our virtual booth. And we’d love to see you there. Thank you.
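The routing described in the demo (low scores to disbursement and claims adjusters, high scores flagged for closer review if a claim arrives) can be sketched as a simple split on the score. The cutoff of 50 comes from the talk; the upper cutoff of 90 and the sample policies are assumptions for illustration.

```python
def route_policies(scored, low_cutoff=50, high_cutoff=90):
    """Split scored policies into the two files described in the demo.

    Returns (likely_impacted, review_if_claimed). Policies between
    the two cutoffs land in neither file.
    """
    likely_impacted = [p for p in scored if p["score"] < low_cutoff]
    review_if_claimed = [p for p in scored if p["score"] >= high_cutoff]
    return likely_impacted, review_if_claimed


# Hypothetical scored policies.
scored_policies = [
    {"policy_no": "S-100", "score": 12.0},
    {"policy_no": "S-102", "score": 64.5},
    {"policy_no": "S-110", "score": 97.0},
]

to_disbursement, to_fraud_review = route_policies(scored_policies)
```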

Tim McKenzie

Tim McKenzie and his team are leading our innovation efforts enabling our customers to take advantage of new cloud technologies and unlock value hidden in massive amounts of business data. With over 2...