Serge Smertin is a Resident Solutions Architect at Databricks. Over a career of more than 14 years, he has worked on data solutions, cybersecurity, and heterogeneous system integration. His track record includes taking novel ideas from the whiteboard to years of operation in production, such as large-scale malware forensic analysis for cyber-threat intelligence and a real-time data science platform that served as the basis for anomaly detection and decision support systems at an industry-leading payments service provider. At Databricks, Serge’s full-time job is to bring its strategic customers to the next level in their Data and AI journey. In his rare spare time, he leads the Databricks integration with HashiCorp Terraform, the de facto standard for multi-cloud Infrastructure-as-Code, to accelerate Databricks adoption across more customers. To share knowledge, Serge occasionally writes blog posts and speaks at conferences internationally.
May 28, 2021 10:30 AM PT
The long-term success of any part of Scribd's data platform relies on Platform Engineering putting tools in the hands of developers and data scientists so they can "choose their own adventure".
In this session we'll learn about the Databricks (Labs) Terraform integration and how it can automate literally every aspect required for a production-grade platform: data security, permissions, continuous deployment and so on. We'll learn how Scribd offers its internal customers flexibility without acting as a gatekeeper. Just about anything they might need in Databricks is a pull request away. We'll also learn about the typical deployment patterns of Databricks with Terraform across other customers and clouds, and how the project has evolved over time.
Few solutions exist in the open-source community, either as libraries or as complete stand-alone platforms, that can be used to assure data quality, especially when data is imported continuously. Organisations may consider one of the available options: Apache Griffin, Deequ, DDQ and Great Expectations. In this presentation we'll compare these open-source products across several dimensions, such as maturity, documentation, extensibility, and features like data profiling and anomaly detection.
November 17, 2020 04:00 PM PT
In this talk, we’ll compare different techniques for data privacy and the protection of personally identifiable information, and their effects on statistical usefulness, re-identification risks, data schema, format preservation, and read and write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are. We’ll explore the world of suppression, perturbation, obfuscation, encryption, tokenization, and watermarking with elementary code examples, for cases where third-party products cannot be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
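As a rough illustration of the k-anonymity idea mentioned above (a minimal sketch, not taken from the talk itself): a dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k rows. The sample rows, the column names, and the helpers `k_anonymity`, `generalize_age` and `suppress` are all hypothetical. Age bucketing is a generalization step that the abstract does not name explicitly; dropping a column is a simple form of suppression.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share
    the same values for all quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize_age(row, bucket=10):
    """Generalization (assumed technique): replace the exact age with a range."""
    low = (row["age"] // bucket) * bucket
    return {**row, "age": f"{low}-{low + bucket - 1}"}


def suppress(row, column):
    """Suppression: drop a column entirely from the row."""
    return {key: value for key, value in row.items() if key != column}


# Hypothetical sample data; "age" and "zip" act as quasi-identifiers.
rows = [
    {"age": 31, "zip": "10115", "diagnosis": "flu"},
    {"age": 34, "zip": "10115", "diagnosis": "asthma"},
    {"age": 38, "zip": "10115", "diagnosis": "flu"},
    {"age": 52, "zip": "10117", "diagnosis": "diabetes"},
    {"age": 57, "zip": "10117", "diagnosis": "flu"},
]

# Every (age, zip) pair is unique, so each person is re-identifiable.
raw_k = k_anonymity(rows, ["age", "zip"])

# After bucketing ages, rows fall into two larger groups.
generalized = [generalize_age(row) for row in rows]
generalized_k = k_anonymity(generalized, ["age", "zip"])

# After suppressing age, only zip remains as a quasi-identifier.
suppressed = [suppress(row, "age") for row in rows]
suppressed_k = k_anonymity(suppressed, ["zip"])

print(raw_k, generalized_k, suppressed_k)
```

Both transformations raise k here at the cost of statistical detail, which is the kind of trade-off between re-identification risk and usefulness that the talk compares.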
Some of the above-mentioned techniques are barely an inconvenience to implement but difficult to support in the long run. We’ll show on which occasions Databricks Delta can help make your datasets privacy-ready.
Speaker: Serge Smertin