Add Historical Analysis of Operational Data with Easy Configurations in Fivetran Automated Data Integration

Download Slides

Fivetran makes it easy to automate data ingestion particularly for operational data sources such as Salesforce, Zendesk, and Oracle Eloqua, no matter how source schemas and APIs change. Achieving historical analysis is cumbersome, time-consuming, and costly to build and maintain manually. A common approach is to include snapshots, which only take into account changes at a given time. Plus, the additional storage requirements can become unwieldy to manage. Type 2 Slowly Changing Dimension (SCD) allows you to track any change at any point in time. This session shows how Fivetran History Mode, which uses Type 2 SCD, can be easily configured and then switched on with 1-click and synchronized for a desired time period. This accelerates time to insights, making it easy to both automate data ingestion and historical analysis.

Speaker: Elesh Mistry


– [Elesh] Hi, and welcome to a video exploring how to add historical analysis of operational data with Fivetran Automated Data Integration. Okay, so introducing myself, first of all, I’m Elesh Mistry, and I’m a senior sales engineer here at Fivetran. So the agenda for this quick demonstration, is to go through what Fivetran is first of all, a setup of Fivetran, the challenge of historical analysis, and how to solve it using five Fivetran’s History Mode. Fivetran is essentially a fully managed automatic data integration platform, from which you can take data from a variety of sources, including applications, databases, events, files, and cloud functions, and pull them together into one single source of truth in your data warehouse, Databricks. Once you’ve got that data into Databricks, then you can run a machine learning, you can run artificial intelligence algorithms over that data to really give you that analytics of exactly the data from all those various sources. What Fivetran’s bringing to the table is the ability to have automatic data updates, as well as bringing that historical data, you can bring across incremental data, automatic schema migrations. These will keep your integrations alive, essentially, because with every incremental sync, what five Fivetran’s doing, it’s comparing your metadata between your source and target, and basically keeping your integration alive, because it’s making the changes on your warehouse, so your data can keep flowing. Automatic failure recovery, that’s all about, if your integration dies at a certain point, and you replayed integration, Fivetran is deduping the data, such that you’re not getting incorrect results in your data warehouse. The concept of idempotency. A micro-batched architecture. So you’re able to take that incremental syncs of data, to remove the need for nightly batch runs to augment your data warehouse data. You can do it throughout the day in smaller batches, such that you’ve got refreshed data more often. And extensible. So we’ve got over 180 native connectors from applications’ databases, but you can actually extend that with our cloud functions, which come in all three flavors of cloud function. The modern data stack, and this is where kind of ELT really comes to form. The ability to move away from that legacy way of working, with the event of the modern data warehouse, you now can afford to bring back all your data from all your sources into one single source of truth, and not have to manage, maintain the burden of multi integrations, which are brittle and liable to break. So as well as buying that initial software, with traditional ETL, you have to maintain it and be subject to very brittle integrations as well. So what Fivetran’s bringing to the table is a very straightforward way of bringing that data from all those sources and delivering it into Databricks. And once it’s there, with Fivetran, you’re able to refresh that data as and when you need it. So we’ll go through this in the demonstration, but what Fivetran does, is that you can actually choose your destination, whether it be Databricks on AWS or Azure, and subsequently, you can actually configure your Fivetran dashboard to bring in data from various sources. Fivetran’s History Mode essentially delivers Slowly Changing Dimensions Type 2 in a very straightforward way. When you’re bringing that historical data from sources, including APIs and databases, you can switch on history mode with a slide of a toggle, and essentially what it’s going to do, it’s gonna store the history of changes for a particular set of roles in your source, and subsequently, you can deliver them to your data warehouse, so you can analyze that historical data over time. And remember, it doesn’t have to be on permanently. You can do it from a particular in time as well. So you can analyze that data for a particular event, or a particular set of dates. And the use cases for history mode are tracking changes to opportunities over a period of time to gain an in depth understanding of your sales cycle. And remember, once you’ve actually created that model to understand those changes, you can actually change the model and replay over that data again and again to refine the model. Again, with account growth, tracking changes to plans and usage over time. So you have information on whether an account is either growing or reducing in usage. And the history of growth as well, to gain an understanding of bookings and cancellations. Prior to history mode, you’d actually lose this data. If a booking went through a lifecycle, so, say, for example, it went through, had an initial value of 10, and then went to a value of 20, and then was canceled, you’d lose that history of changes. And with Type 2 Slowly Changing Dimensions, you can pick and choose which dimensions you want to store and analyze over time. So spending time to insight with the modern data stack. So what the modern data stack really brings to the table, is just a really straightforward way of ingesting data using Fivetran, into a modern data warehouse like Databricks. And then, subsequently, once you’ve got that data in there, you can run DBT over that data. And with the Fivetran’s pre-built packages, you’re gaining an element of those transformations that you’d typically want to run on that data as it’s landed. And with history mode as well, you can actually further analyze that data, which is in your data warehouse, and set your synchronization frequency, such that your data’s being refreshed as an when you want it. So let’s go through. Let’s introduce you to Fivetran, and let’s go through a demonstration of historical analysis. Okay. So I’m just heading over to my Fivetran dashboard here. And what you’ll see is that I’ve got a few connectors already set up, one from SQL server, one from Salesforce, and I’ve got the Fivetran log connector in there as well. And for this particular Fivetran group, I’ve got my destination, which is Databricks on an AWS warehouse. Now let’s head over to Databricks, so you can see this data. So if we go to my data here, it will filter my databases. If I go to this particular example, SCD history, and my lead table. It’s just refreshing. And you can see the sample data here. And it’s just some sample data from my example Salesforce instance here. Okay, so head back to the Fivetran dashboard. I’ve also got my developer account here. So I’ve got my developer Salesforce account. And as you can see, it’s got some leads in there. So let’s go to today’s leads, and let’s go to all users, and you can see I’ve got some example leads in my Salesforce instance. So what I’m going to do, first of all, is head back to Fivetran, and I’m gonna create a connection to my Salesforce instance. So if we, first of all, add a connector, just search for Salesforce. And this is actually the workflow to add all our connectors. Webinar. Authorize that connection. And what it’s gonna do, because I’ve authenticated to Salesforce in my browser, it’s gonna pick up that authentication using all of two. It’s gonna connect to the API, and with the authentication details I’ve provided, it’s gonna connect to that Salesforce API, and test that I’ve got authentication to which parts of the schema, and it’s gonna deliver them back in my connector. Okay, so now if we go to view connector, it allows us to review the schema, and this brings back all the Salesforce objects which I’ve got access to. And you can see that I’ve got access to a number of them. Salesforce has got a massive number of objects, and I’ve also got access to all my custom objects as well. So now I’m gonna save and continue, and then I’m gonna start my initial sync. So within that 32 second element of the demo, what I’ve done, is that I’ve connected to my Salesforce instance, I’ve selected all the objects that I’ve got access to, and subsequently, what I’ve done is that I’ve started my integration to bring back all that data, so it’s starting something called the historical sync. After that, I can go to my setup tab, and it’s gonna go back to Salesforce once the historical sync is completed, and collect any deltas of data every six hours. So it’s gonna refresh my Salesforce data in my warehouse from source every six hours. But I can pull that right down back to five minutes. So you can refresh that data every five minutes in a really straightforward way. And that’s my integration built. And this is the real value of Fivetran. So if we go back to Fivetran here, and let’s go to the docs. Let’s go to Salesforce. What is Fivetran doing in terms of delivering that data? Well, what Fivetran is doing, is that it’s not looking at a object by object basis from the source. It’s taking the entirety of the objects from Salesforce, and it’s actually delivering them into your data warehouse. So what Fivetran’s doing is that, with a traditional integration developer, the value, the quality of the integration would depend on the knowledge of the integration developer. We’re taking that variable away from the integration. We have experts at the Salesforce API and for all our connectors. And we’re bringing that value to our customers, such that we’re delivering a normalized schema into your data warehouse, picking off the entire Salesforce API in this instance. And we’re delivering that in your warehouse. And from there, we deliver documentation, an ERD, and you can just run over this data using simple SQL or DBT. And we actually have pre-built DBT packages, which will allow you to run over the data as well, and really gain that analysis view in a really straightforward way. So now, if we head back to Fivetran, you can see that the integration is running. If we go down to an hour, it started the extract process of the load phases. And what it’s gonna when it finishes, it’s gonna actually land the entire entity relationship diagram in your data warehouse, from which you can report from in a straightforward way. Now, if you expand the number of connectors, so if you, in the same way, you can add connectors to applications, databases, events, and cloud functions, you can take all that data, land in your data warehouse in a right, in a very straightforward way, and then, subsequently, reports on it. So let’s head over to Databricks. So you can actually see I’ve got a database called Salesforce SCD History. This is an integration that I’ve previously built in Fivetran. So if you go to Fivetran, you can see one of my previous Salesforce connectors was called Salesforce SCD history. And this is just to show you one element of that history mode. So if we now head over to the lead table, and do a refresh, what we’ll see in the sample data here is that we’ve got 20 rows of data. And if we go down to the bottom, we’ll see that we’ve got Elesh Mistry, with the same primary key, has got free entries. Yeah. If we head over to Salesforce, you can see that Elesh Mistry, at the moment, the latest lead with Elesh Mistry, is actually Elesh Mistry two. And you can see that I changed it from Elesh to Elesh1, Elesh1 to Elesh2, at different times today. So if I go back to my leads here, what I’ll see, is I’ll only see one Elesh Mistry record. But what I’ve done in my integration, so if I go back to SCD History integration, if I look at the schema, if I just search for lead, I’ve actually turned on history mode for lead. And therefore, if I head over to Databricks, what you’ll see is that I’ve got three records for Elesh Mistry. One of them is the first record, and then the other two iterations. If I head over to the last three columns, you’ll really see the type two. And what you can see is that three extra fields have been added, three extra columns have been added to the end of my table, which is start, end, and active. The start is the initial creation of that first entry. And then the end time is when it was updated. This time corresponds to the next entry when it’s updated, So 12:57, 12:57. So now this entry became live, and it ended at 20:35, and then this entry became live. And because it’s still the current entry, it’s got a kind of a null end date at the moment. And as well as that, you can actually find out which entries, the current active one, it’s got a Boolean, The Fivetran active gives you an indication of which entry is the current present entry in your source database. But all of a sudden, you have a history of that lead over time. And you can also tell when those previous entries were live as well, ’cause you’ve got those timestamps associated. Okay, so if we head back to the first few rows, you’ll see that it went from Elesh1, Elesh2. And that’s how to build up a history mode. Now, if we go back to Fivetran, at this point, you could actually turn off history mode as well. So if we just kind of click that checkbox, and save, it will give you a warning, Essentially what’s happening, is that you can turn on and off history mode at will, and it will give you the choice whether you want to drop the unused history columns, and it’ll give you a kind of a warning, because it’s quite a destructive change to actually turn on and off history mode. And it will give… Fivetran will kind of understand that, and give you the option to either go ahead or cancel your operation. And really what Fivetran’s bringing to the table, in our documentation as well, it’s got some really good examples of how history mode works. So I’ve gone through a kind of a live example there. But also what you’ve got, is you’ve got some great examples in history mode. So if we click on history mode here, you’ve got a great example in our documentation of how history mode can kind of, you know, be taken from databases or applications, and you can build up a historical analysis of your data, and run any machine learning, any analysis you want as and when from whichever source that you’ve picked to bring data from. Okay, so if we go back to the slides now, what we’ll see is that our demo’s now complete, and I thank you for joining me for today’s session. And please send us any feedback, and please reach out to us if you want any more information on Fivetran. Thanks for joining us. Bye.

Watch more Data + AI sessions here
Try Databricks for free
« back
About Elesh Mistry


Elesh is one of the first sales engineer hires at EMEA Fivetran. With a strong drive to do whatever it takes to solve data integration challenges, Elesh has helped Fivetran EMEA grow to over 60 employees and 250 customers with offices in Dublin, London and Munich. Previously, he has worked as a Senior Solutions Engineer & Professional Service Consultant for Confluent, SnapLogic, Tibco and Raytheon.