EnterprisePII is a first-of-its-kind large language model (LLM) dataset for testing whether models can detect business-sensitive information.
Detecting and redacting sensitive business data is a significant challenge for enterprises that want to leverage generative AI capabilities. The risk of LLMs unintentionally leaking confidential information to the public, third parties, or unauthorized internal users is well documented, and it hinders enterprise adoption.
Traditional detection models for Personally Identifiable Information (PII) rely on Named Entity Recognition (NER) and identify only personal data such as addresses, phone numbers, or other personal details. They fall short in detecting crucial business-sensitive information such as revenue figures, customer accounts, salary details, project owners, and strategic or commercial relationship notes.
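To make that gap concrete, here is a minimal sketch of a personal-data detector. It uses two regular expressions as a simplified stand-in for a trained NER model; the patterns, the `find_pii` helper, and the sample note are all hypothetical and are not taken from EnterprisePII or any Patronus AI tooling.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# A production detector would use a trained NER model instead.
PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_pii(text):
    """Return a dict mapping each pattern label to the matches found in text."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


# Made-up meeting note mixing personal data with business-sensitive details.
note = (
    "Call Dana at 555-010-4477 or dana@example.com. "
    "Q3 revenue was $4.2M and the Acme renewal is at risk."
)
result = find_pii(note)
print(result)
```

The phone number and email address are flagged, but the revenue figure and the customer relationship note pass through untouched, because nothing in a personal-data detector is looking for them.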
That's why Patronus AI developed EnterprisePII, a first-of-its-kind large language model (LLM) dataset aimed at detecting business-sensitive information. AI researchers and developers can now freely access and use EnterprisePII to test their LLMs' ability to identify confidential data typically found in business documents such as meeting notes, commercial contracts, marketing emails, and performance reviews.
In the coming weeks, MosaicML will incorporate the EnterprisePII dataset into LLM Foundry, an open-source code repository for training, fine-tuning, evaluating, and deploying LLMs. A version of the dataset compatible with our Composer library will also be included in our LLM Eval Gauntlet (a comprehensive method for assessing the quality of LLMs).
To learn more about Patronus AI and EnterprisePII, read their latest announcement.