Databricks SQL demo overview
So today I’m going to show you how easy it is to get started with Databricks SQL. In Databricks SQL, you run queries using SQL endpoints that provide low latency and infinite concurrency for SQL queries. They also magically autoscale. So out of the box, we provide you with access to a SQL editor, dashboards and alerts that are integrated right with your data. And of course, you can also connect to your favorite BI tools. Well, we’re open for business to Tableau Power BI, Looker, Spotfire and so much more. So let’s get started.
Click to expand the transcript →
Now, one thing to note is you can see here that I’m using the default SQL endpoint that comes out of the box with every workspace. But what’s great about these endpoints is that they eliminate the complexity of clusters, you don’t have to think about versions, you don’t have to think about Spark configuration. You just pick a t-shirt size, small, medium, large. And you can assign them to specific teams or departments. So you can streamline government governance and chargeback and soon with new Serverless Endpoints, you don’t need to think about startup time either ready for primetime in just a few seconds.
So now that we’ve got our results, why don’t we take a look at some of the execution details? You see here that we get a nice view of what actually happened. We can also understand some of the previous executions that we’ve run, and go back to a previous version if we’d like to. If we’re really firing on all cylinders, we can also go ahead and leverage any of these keyboard shortcuts, for things like opening and closing new tabs, or even executing our queries, highlighting certain portions, everything designed to make your analysis as quick and easy as possible. Now, of course, sometimes you don’t just want to generate results, you also want to visualize them and share them with other individuals and teams in the organization. So why don’t we go ahead and add a visualization. And you’ll see that’s right side by side with my with my results, let’s just do a quick scatterplot of the average fare by route, see if we can figure out which which routes we probably want to avoid, because they’re a little pricey. So you can see here, I was able to quickly just add that, that visualization. And I can quickly spot any of these outliers that I might want to avoid when I’m planning my trips – probably better to take the subway. So now this is great, I’m starting to generate some insights already. But what if I want to organize these insights per system and kind of come back to them time and time again with my team? Dashboards are a great way to do this. And of course, I’m sure you’ve guessed by now, we provide a number of sample dashboards out of the box. So let’s go ahead and pick our NYC Taxi one.
Now, you’ll see that this dashboard provides a great example of a number of capabilities you might want to take advantage of, whether that’s conditionally coloring certain values in a table, or adding interactivity via parameters and allowing the consumers of these dashboards to set different date ranges. And what’s great is if there’s a key metric I’m tracking like total trips, I can also right within Databricks SQL set up an alert so I know when certain thresholds are met. So here you’ll notice that I’ve selected my query total trips, I’ve decided I want to know as soon as we crossed that magical 300 mark for that time range. And we just want to get notified once when it happens. So I’m going to go ahead and create the alert. I’m going to tell it to send to myself. And I’m also going to go ahead and decide to make sure the entire team knows too, and make sure that message goes to Slack as well.
Alright, so with that, why don’t we jump over to Tableau where I’ve already configured it to query directly from Databricks SQL and to hit those SQL endpoints. You can see here that I was able to develop that same view of my data in a Tableau dashboard. And if I want to open anything up and do some additional analysis, like let’s say I want to understand kind of the difference in route fare by drop offs of code. You can see here that very quickly, I can see which zip codes are very close to each other. You can see very short trips, very cheap trips. But if I go a little further, I can also start to get some more insights and understand where the distance is actually a little longer. And this is all generating live queries against my Databricks SQL data, with the click of a button. So hopefully, this gave you an idea of what’s possible with analytics on the data lake whether you want to use our built in capabilities or your favorite BI tool.