An Experimentation Pipeline for Extracting Topics From Text Data Using PySpark
July 29, 2021 by Srijith Rajamohan, Ph.D. in Engineering Blog
This post is part of a series of posts on topic modeling. Topic modeling is the process of extracting topics from a set...

The Delta Between ML Today and Efficient ML Tomorrow
July 22, 2021 by Marijse van den Berg and Maria Zervou in Engineering Blog
Delta Lake and MLflow both come up frequently in conversation but often as two entirely separate products. This blog will focus on the...

AML Solutions at Scale Using Databricks Lakehouse Platform
July 16, 2021 by Sri Ghattamaneni, Ricardo Portilla and Anindita Mahapatra in Engineering Blog
Anti-Money Laundering (AML) compliance has been undoubtedly one of the top agenda items for regulators providing oversight of financial institutions across the globe...

Get Your Free Copy of Delta Lake: The Definitive Guide (Early Release)
June 22, 2021 by Tathagata Das, Ryan Boyd, Denny Lee and Vini Jaiswal in Engineering Blog
At the Data + AI Summit, we were thrilled to announce the early release of Delta Lake: The Definitive Guide, published by...

What’s New in Apache Spark™ 3.1 Release for Structured Streaming
April 27, 2021 by Yuanjian Li, Shixiong Zhu and Bo Zhang in Engineering Blog
Along with providing the ability for streaming processing based on Spark Core and SQL API, Structured Streaming is one of the most important...