Simon is a Microsoft Data Platform MVP, awarded in recognition of his contributions to the Microsoft Data Platform Community. Simon is a seasoned Cloud Solution Architect and technical lead with well over a decade of Microsoft Analytics experience. A deep techie with a focus on emerging cloud technologies and applying “big data” thinking to traditional analytics problems, Simon also has a passion for bringing it back to the high level and making sense of the bigger picture. When not tinkering with tech, Simon is a death-dodging London cyclist, a sampler of craft beers, an avid chef, and a generally nerdy person.
May 27, 2021 11:00 AM PT
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst's user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features - how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you're truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you'll need to be familiar with and include in your design patterns, and this session will set you on the right path.
May 27, 2021 11:35 AM PT
Tracking which incoming files have been processed has always required thought and design when implementing an ETL framework. The Autoloader feature of Databricks looks to simplify this, taking away the pain of file watching and queue management. However, there can also be a lot of nuance and complexity in setting up Autoloader and managing the process of ingesting data using it. After implementing an automated data loading process in a major US CPMG, Simon has some lessons to share from the experience.
This session will run through the initial setup and configuration of Autoloader in a Microsoft Azure environment, looking at the components used and what is created behind the scenes. We’ll then look at some of the limitations of the feature, before walking through the process of overcoming these limitations. We will build out a practical example that tackles evolving schemas, applies transformations to your stream, extracts telemetry from the process and, finally, merges the incoming data into a Delta table.
After this session you will be better equipped to use Autoloader in a data ingestion platform, simplifying your production workloads and accelerating the time to realise value in your data!
November 18, 2020 04:00 PM PT
It's very easy to be distracted by the latest and greatest approaches in technology, but sometimes there's a reason old approaches stand the test of time. Star Schemas & Kimball is one of those things that isn't going anywhere, but as we move towards the "Data Lakehouse" paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine & Spark 3.0 to maximise its performance?
This session looks through the historical problems of attempting to build star schemas in a lake and steps through a series of technical examples using features such as the Delta file format, Dynamic Partition Pruning and Adaptive Query Execution to tackle these problems.
Speaker: Simon Whiteley