The three biggest security challenges facing AI and data initiatives

Published: October 31, 2018

In today’s business climate, the ability to anticipate and meet customer needs is central to success. Forward-looking business leaders want to unleash the power of artificial intelligence (AI) to drive innovation, but this requires bringing together diverse teams and large volumes of data. With attackers growing more sophisticated, securing these complex data workflows is a top priority.

Security challenges in AI and big data

Many organizations face the common challenge of how best to manipulate large volumes of sensitive data to gain meaningful insights in a secure way. Data engineers and security teams struggle to give their data scientists and analysts the speed and access to the data they need to drive AI initiatives while ensuring consistent policy management, data governance, and security compliance.

Many opt to build their own advanced analytics solutions by cobbling together a plethora of data processing tools (Spark, Hive, Pig, etc.) and AI/ML tools (SparkML, TensorFlow, PyTorch, etc.), many of which are open source. This can introduce behaviors that increase security risk. According to Gartner, 80% of organizations will fail to develop a consolidated data security policy across silos. In an effort to address this, some companies over-rotate by tightly locking down data. This can be costly, hindering their ability to innovate and to meet customer needs.

As the Chief Information Security Officer (CISO) of Databricks, I help customers establish and secure their AI data pipelines. I see the following three security challenges over and over again:

Teams acting in silos: For many organizations, technology, people, and AI workflows exist in silos. Data engineers and data scientists work with their own toolsets. Oftentimes, these tools are rapidly evolving open-source applications that are poorly integrated across data workflows. This not only kills innovation but also creates large security holes.
Inability to deploy securely at scale: Building a secure, scalable architecture is difficult. You have to manage configuration, monitoring, patching, authentication, and security scanning. Enterprises that have compliance requirements (e.g. SOC 2, ISO, HIPAA, GDPR) face an even more difficult challenge.
Security as an afterthought: AI projects are started with a focus on speed and innovation. Security is often bolted on and considered only later, when a major compliance audit is due. At that point, it may be too late to solve some of the fundamental problems with the deployment.

Unified Security Approach for AI & Data

The Databricks Unified Analytics Platform brings together data and machine learning (ML) with best-in-class security on the most trusted clouds to accelerate innovation while minimizing risk. We built the Databricks Unified Analytics Platform with a security-first mindset to solve the following problems:

1. Our Platform - Knock down silos while keeping data secure

Data scientists and engineers often work in silos using a disparate collection of tools and fragmented data sets. Further, AI work calls for whichever of the latest tools (TensorFlow, PyTorch, Keras, etc.) gives the best or most efficient results for a given model. These tools are rapidly evolving, and staying on top of the latest vulnerabilities and integration errors is cumbersome. Databricks solves these challenges by offering:

Unified Data and AI Workflows - a single platform that brings together the tools used by data scientists and engineers ensuring consistent and compatible security across the entire AI workflow.
Secure and Transparent Collaboration - unified workspaces (Notebooks) enable teams to work together on the same data while providing centralized auditing, tracking and commenting, reducing the security risk while accelerating cross-functional outcomes.

One shared workspace and consistent workflow

Security as a Core Design Principle - separate data and control planes, along with well-thought-out access controls, minimize user errors and unintended access to data. Essential capabilities such as encryption for data (in transit and at rest) and fine-grained access controls are built in, providing the data security that enterprises require. This is a key reason why organizations in heavily regulated industries with highly sensitive data (financial services, healthcare, and government agencies in particular) choose Databricks.
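To make the idea of fine-grained access controls with centralized auditing concrete, here is a minimal sketch in Python. It is not the Databricks API; the `AccessPolicy` class, its method names, and the group, table, and column names are all hypothetical and exist only for illustration. It shows column-level grants per group, with every access decision recorded in an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessPolicy:
    """Hypothetical column-level access policy with an audit trail."""

    # Maps (group, table) -> set of column names that group may read.
    grants: dict = field(default_factory=dict)
    # Every decision (allowed or denied) is appended here.
    audit_log: list = field(default_factory=list)

    def grant(self, group, table, columns):
        """Allow `group` to read the given columns of `table`."""
        self.grants.setdefault((group, table), set()).update(columns)

    def can_read(self, user, groups, table, columns):
        """Return True only if every requested column is granted to
        at least one of the user's groups; log the decision either way."""
        allowed = set()
        for group in groups:
            allowed |= self.grants.get((group, table), set())
        decision = set(columns) <= allowed
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "table": table,
            "columns": sorted(columns),
            "allowed": decision,
        })
        return decision


if __name__ == "__main__":
    policy = AccessPolicy()
    policy.grant("analysts", "patients", {"age", "zip_code"})
    print(policy.can_read("dana", ["analysts"], "patients", ["age"]))
    print(policy.can_read("dana", ["analysts"], "patients", ["ssn"]))
    print(len(policy.audit_log), "decisions audited")
```

The design choice worth noting is that denied requests are logged as well as allowed ones, since failed access attempts are often what an auditor wants to see.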

2. Deploying and operating securely at scale

As the amounts and types of data, users, tools, workloads and ML models increase, the complexity of securing them increases exponentially. As a unified analytics platform, Databricks offers:

Most Secure Clouds - deeply integrated with AWS and Azure best practices and expertise that would be hard and time-consuming for most companies to recreate in-house, delivering security along with scale and elasticity.
Secure Integrations - manage and secure workflows with built-in integrations for popular enterprise security technologies such as Single Sign-On (SSO) and System for Cross-domain Identity Management (SCIM), as well as big data technologies such as data lakes, data warehouses, and business intelligence tools.
User Security at Scale - user management capabilities make it easy to onboard and remove large numbers of users, apply policies, and manage access to data — with full audit capabilities.

Separate data and control planes, workload segregation and secure integrations
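As a rough illustration of what SCIM-based user management involves, the sketch below builds the JSON bodies an identity provider would typically send to onboard a user and later deactivate one. The schema URNs and field names come from the SCIM 2.0 standard (RFC 7643 and RFC 7644); the helper function names and sample users are hypothetical, and no particular endpoint or vendor API is assumed.

```python
import json

# Schema identifiers defined by the SCIM 2.0 RFCs.
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_scim_user(user_name, given_name, family_name, active=True):
    """Return a SCIM 2.0 User resource (as a dict) for onboarding."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "name": {"givenName": given_name, "familyName": family_name},
        "active": active,
    }


def build_deactivate_patch():
    """Return a SCIM 2.0 PatchOp body that marks a user inactive."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [
            {"op": "replace", "path": "active", "value": False},
        ],
    }


if __name__ == "__main__":
    # Hypothetical new hires; in practice these come from the identity provider.
    new_hires = [
        ("ana@example.com", "Ana", "Silva"),
        ("raj@example.com", "Raj", "Patel"),
    ]
    for user_name, given, family in new_hires:
        print(json.dumps(build_scim_user(user_name, given, family)))
    print(json.dumps(build_deactivate_patch()))
```

Because the payloads follow a shared standard, the same onboarding and offboarding logic can be driven from whichever identity provider an organization already uses.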

3. Our Culture - Security at our Core

With many DIY analytics and AI solutions, speed and innovation come first and security is an afterthought. We built Databricks with data security at its core from Day 1.

Built with a Security-First Mindset - hard segregation of the data and control planes in separate AWS accounts with VPC peering leaves the data where it is (in your AWS/Azure account, so we can’t access it), while integrations offer customers the management and control they require based on their preferred security toolset and resources.
Security Through Transparency - Databricks has been validated through third-party penetration testing, and our certifications and compliance attestations include ISO 27001 and SOC 2 Type 2. We also meet the stringent data and policy requirements of highly regulated industries such as healthcare (e.g. Sanford Health), financial services (e.g. FINRA), and the public sector.
Security-Minded Teams - secure development and update management, low engineering attrition rates, and dedicated vulnerability identification resources (an "always be testing" approach) provide our customers with a solid, enterprise-ready solution.

Culture - Security first mindset

Conclusion - Data protection at every level

Databricks has been architected at every layer of our infrastructure to provide advanced security, risk prevention, and management controls for your data, AI, and Apache Spark™ workflows. By combining security and convenience, we bring together teams to realize the promise of AI and drive innovations that enable business transformation. Look for more blog posts on each of these three security pillars in the upcoming weeks.

Try It!

Contact us to find out how Databricks can improve your security posture.
Learn more by downloading our security e-book Protecting Enterprise Data on Apache Spark.
