Data + AI Summit 2022
Obfuscating Sensitive Information from Spark UI and Logs

On Demand

Type

  • Session

Format

  • Virtual

Track

  • Data Security and Governance

Difficulty

  • Intermediate

Duration

  • 0 min

Overview

The Spark UI and logs contain useful information, but they also expose sensitive data that needs to be obfuscated.

To obfuscate this data, we at Workday implemented methods for Apache Spark that let the string representations of the TreeNode class be configured as obfuscated or non-obfuscated. To do this, we added a custom TreeNode printer for the UI and a custom Log4j appender, which uses a list of rules based on class name, package name, and log message regexes to decide whether to obfuscate output from third-party libraries. As a result, Spark plans and column names are obfuscated in both the Spark UI and the logs.
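The rule-based decision described above can be sketched roughly as follows. This is a minimal, language-neutral illustration written in Python, not Workday's implementation, which is a Log4j appender running on the JVM. The `Rule` class, the first-match-wins ordering, the obfuscate-by-default fallback, the `<redacted>` marker, and the `org.example.safe` package are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: every pattern that is set must match for the rule to apply."""
    obfuscate: bool
    logger_pattern: Optional[re.Pattern] = None   # matched from the start of the class/package name
    message_pattern: Optional[re.Pattern] = None  # searched for anywhere in the log message

    def matches(self, logger_name: str, message: str) -> bool:
        if self.logger_pattern is not None and not self.logger_pattern.match(logger_name):
            return False
        if self.message_pattern is not None and not self.message_pattern.search(message):
            return False
        return True


def should_obfuscate(rules: Sequence[Rule], logger_name: str, message: str,
                     default: bool = True) -> bool:
    """Return the decision of the first matching rule (assumed ordering).

    When no rule matches, fall back to `default`; obfuscating by default
    is the conservative choice assumed here.
    """
    for rule in rules:
        if rule.matches(logger_name, message):
            return rule.obfuscate
    return default


def render(rules: Sequence[Rule], logger_name: str, message: str) -> str:
    """Return the text that would be written to the log."""
    # Placeholder redaction: replace the whole message with a fixed marker.
    return "<redacted>" if should_obfuscate(rules, logger_name, message) else message


RULES = [
    # Any message that mentions a plan is hidden, whichever library logged it.
    Rule(obfuscate=True, message_pattern=re.compile(r"(?i)\bplan\b")),
    # A hypothetical third-party package considered safe to log in clear text.
    Rule(obfuscate=False, logger_pattern=re.compile(r"org\.example\.safe(\.|$)")),
]

print(render(RULES, "org.example.safe.Client", "connection established"))
print(render(RULES, "org.example.safe.Client", "Physical plan: Project [ssn]"))
print(render(RULES, "com.other.Lib", "read column ssn"))
```

Placing the message rule before the package rule means a plan is hidden even when it is logged by an allow-listed package. In a real appender the same decision would run once per log event before the message is formatted.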

In this talk, we will go over the steps we took to implement these obfuscation methods and show what the results look like in the Spark UI and logs. The methods have worked well in production at Workday, and other companies can benefit from implementing them.

Session Speakers

Yian Liou

Software Engineer

Workday
