Data + AI Summit 2024
June 10–13, 2024
San Francisco + Virtual

PII Detection at Scale on the Lakehouse

Wednesday, June 28 @11:30 AM

Overview

SEEK is Australia’s largest online employment marketplace and a market leader spanning ten countries across Asia Pacific and Latin America. SEEK provides employment opportunities for roughly 16 million monthly active users and processes 25 million candidate applications to listings. Processing millions of resumes involves handling and managing highly sensitive candidate information, usually entered in a highly unstructured format. With recent high-profile data leaks in Australia, personally identifiable information (PII) protection has become a major focus area for large digital organizations.

The first step is detection. SEEK has developed a custom framework built on HuggingFace transformers and fine-tuned to capture nuances specific to employment data. For example, “Software Engineer at Databricks” is not PII, but “CEO at Databricks” is, since the title points to one identifiable person. After identifying and anonymizing PII in streaming and batch data, SEEK uses Unity Catalog’s data lineage to track PII through reporting, ETL, and other downstream ML use cases, and to govern access control. The result is an organization-wide data management capability driven by deep learning and enforced using Databricks.
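As a rough illustration of the anonymization step that follows detection, here is a minimal Python sketch. It is not SEEK's framework. It assumes detections arrive as dictionaries with `entity_group`, `score`, `start`, and `end` keys, which is the shape a Hugging Face token-classification pipeline returns when span aggregation is enabled. The `redact` helper, the `PERSON` and `ROLE` labels, the 0.5 threshold, and the sample text are all hypothetical, and the detections are hand-written rather than produced by a model.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a placeholder such as [PERSON].

    Assumes spans do not overlap; overlapping spans are not handled here.
    """
    # Keep confident detections and process them right to left, so that
    # replacing one span does not shift the offsets of the spans before it.
    kept = sorted(
        (e for e in entities if e.get("score", 1.0) >= min_score),
        key=lambda e: e["start"],
        reverse=True,
    )
    for e in kept:
        text = text[: e["start"]] + f"[{e['entity_group']}]" + text[e["end"]:]
    return text


# Hand-written stand-in for model output (made-up name, hypothetical labels).
sample = "Jane Doe, CEO at Databricks"
detections = [
    {"entity_group": "PERSON", "score": 0.99, "start": 0, "end": 8},
    {"entity_group": "ROLE", "score": 0.91, "start": 10, "end": 27},
]

print(redact(sample, detections))  # prints "[PERSON], [ROLE]"
```

Working from character offsets rather than matched strings keeps the replacement tied to what the model actually flagged, so a non-identifying phrase such as “Software Engineer at Databricks” passes through unchanged when no span is reported for it.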


Type

  • Breakout

Experience

  • In Person

Track

  • Data Governance, Databricks Experience (DBX)

Industry

  • Enterprise Technology, Healthcare and Life Sciences, Professional Services, Public Sector

Difficulty

  • Intermediate

Duration

  • 40 min

Session Speakers

Ajmal Aziz

Solutions Architect

Databricks

Rachael Straiton

Chief of Data

SEEK Limited
