Engineering blog

Data Exfiltration Protection with Azure Databricks

Learn details of how you could set up a secure Azure Databricks architecture to protect data exfiltration
Bhavin Kukadia
Abhinav Garg
Bruce Nelson
Michal Marusan
Jaroslav Jindrich (Microsoft)

In the previous blog, we discussed how to securely access Azure Data Services from Azure Databricks using Virtual Network Service Endpoints or Private Link. Given a baseline of those best practices, in this article we walk through detailed steps on how to harden your Azure Databricks deployment from a network security perspective in order to prevent data exfiltration.

As per Wikipedia: data exfiltration occurs when malware and/or a malicious actor carries out an unauthorized data transfer from a computer. It is also commonly called data extrusion or data exportation, and it is considered a form of data theft. Since the year 2000, a number of data exfiltration efforts have severely damaged the consumer confidence, corporate valuation, and intellectual property of businesses, and the national security of governments across the world. The problem assumes even more significance as enterprises start storing and processing sensitive data (PII, PHI or Strategic Confidential) with public cloud services.

Solving for data exfiltration can become an unmanageable problem if the PaaS service requires you to store your data with them or it processes the data in the service provider’s network. But with Azure Databricks, our customers get to keep all data in their Azure subscription and process it in their own managed private virtual network(s), all while preserving the PaaS nature of the fastest growing Data & AI service on Azure. We’ve come up with a secure deployment architecture for the platform while working with some of our most security-conscious customers, and it’s time that we share it out broadly.

High-level Data Exfiltration Protection Architecture

We recommend a hub and spoke topology styled reference architecture. The hub virtual network houses the shared infrastructure required to connect to validated sources and optionally to an on-premises environment. And the spoke virtual networks peer with the hub, while housing isolated Azure Databricks workspaces for different business units or segregated teams.

High-level view of art of the possible:

High-level view of the architecture recommended to prevent data exfiltration and secure sensitive information.

Following are high-level steps to set up a secure Azure Databricks deployment (see corresponding diagram below):

  1. Deploy Azure Databricks with secure cluster connectivity (SCC) enabled in a spoke virtual network using VNet injection (azuredatabricks-spoke-vnet in below diagram)
  2. Set up Private Link endpoints for your Azure Data Services in a separate subnet within the Azure Databricks spoke virtual network (privatelink-subnet in below diagram). This would ensure that all workload data is being accessed securely over Azure network backbone with default data exfiltration protection in place (see this for more). Also in general it’s completely fine to deploy these endpoints in another virtual network that’s peered to the one hosting the Azure Databricks workspace.
  3. Optionally, set up an Azure SQL database as an External Hive Metastore to serve as the primary metastore for all clusters in the workspace. This overrides the configuration for the consolidated metastore housed in the control plane.
  4. Deploy Azure Firewall (or other Network Virtual Appliance) in a hub virtual network (shared-infra-hub-vnet in below diagram). With Azure Firewall, you could configure:

Application rules that define fully qualified domain names (FQDNs) that are accessible through the firewall. Some Azure Databricks required traffic could be whitelisted using the application rules.

Network rules that define IP address, port and protocol for endpoints that can’t be configured using FQDNs. Some of the required Azure Databricks traffic needs to be whitelisted using the network rules.

Some of our customers prefer to use a third-party firewall appliance instead of Azure Firewall, which works generally fine. Though please note that each product has its own nuances and it’s better to engage relevant product support and network security teams to troubleshoot any pertinent issues.

• Set up Service Endpoint to Azure Storage for the Azure Firewall subnet, such that all traffic to whitelisted in-region or in-paired-region storage goes over the Azure network backbone (includes endpoints in Azure Databricks control plane if the customer data plane region is a match or paired).

  5. Create a user-defined route table with the required routes and attach it to the Azure Databricks subnets.
  6. Configure virtual network peering between the Azure Databricks spoke and Azure Firewall hub virtual networks.

High-level steps recommended to set up a secure Azure Databricks deployment

Such a hub-and-spoke architecture allows creating multiple spoke VNETs for different purposes and teams. Though we’ve seen some of our customers implement isolation by creating separate subnets for different teams within a large contiguous virtual network. In such instances, it’s totally possible to set up multiple isolated Azure Databricks workspaces in their own subnet pairs, and deploy Azure Firewall in another sister subnet within the same virtual network.

We’ll now discuss the above setup in more detail below.

Secure Azure Databricks Deployment Details

Prerequisites

Please take a note of Azure Databricks control plane endpoints for your workspace from here (map it based on region of your workspace). We’ll need these details to configure Azure Firewall rules later.

| Name | Source | Destination | Protocol:Port | Purpose |
| --- | --- | --- | --- | --- |
| databricks-webapp | Azure Databricks workspace subnets | Region specific Webapp Endpoint | https:443 | Communication with Azure Databricks webapp |
| databricks-observability-eventhub | Azure Databricks workspace subnets | Region specific Observability Event Hub Endpoint | https:9093 | Transit for Azure Databricks on-cluster service specific telemetry |
| databricks-artifact-blob-storage | Azure Databricks workspace subnets | Region specific Artifact Blob Storage Endpoint | https:443 | Stores Databricks Runtime images to be deployed on cluster nodes |
| databricks-dbfs | Azure Databricks workspace subnets | DBFS Blob Storage Endpoint | https:443 | Azure Databricks workspace root storage |
| databricks-sql-metastore (OPTIONAL - please see Step 3 for External Hive Metastore below) | Azure Databricks workspace subnets | Region specific SQL Metastore Endpoint | tcp:3306 | Stores metadata for databases and child objects in an Azure Databricks workspace |

Step 1: Deploy Azure Databricks Workspace in your virtual network

The default deployment of Azure Databricks creates a new virtual network (with two subnets) in a resource group managed by Databricks. So as to make necessary customizations for a secure deployment, the workspace data plane should be deployed in your own virtual network. This quickstart shows how to do that in a few easy steps. Before that, you should create a virtual network named azuredatabricks-spoke-vnet with address space 10.2.1.0/24 in resource group adblabs-rg (names and address space are specific to this test setup).

Step 1 for setting up a secure Azure Databricks deployment: deploying Azure Databricks in your virtual network.

Referring to Azure Databricks deployment documentation:

    • Create an Azure Databricks workspace using the Azure Resource Manager (ARM) all-in-one template.

    • Click Deploy to Azure button which will take you to Azure portal
    • From the Azure portal, select Edit template.

Azure Databricks all-in-one template for VNet Injection

  • Add the following parameter and property to the template. Under the Parameters section, add:

"enableNoPublicIp": {
  "defaultValue": true,
  "type": "bool"
}

Edit template to Enable No Public IP flag parameter

Scroll down all the way to the bottom and, under the workspace properties section, add:

"enableNoPublicIp": {
  "value": "[parameters('enableNoPublicIp')]"
}

    • Save template and review create

ARM template configuration to set up a secure Azure Databricks deployment
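The two template edits above can also be applied programmatically. The following is a minimal sketch, not the official deployment tooling: the function name `enable_no_public_ip` is hypothetical, and it assumes the workspace resource has type `Microsoft.Databricks/workspaces` and keeps its settings under `properties.parameters`. Check both against the template you actually downloaded.

```python
import copy
import json


def enable_no_public_ip(template: dict) -> dict:
    """Return a copy of an ARM template with the enableNoPublicIp flag wired in."""
    patched = copy.deepcopy(template)

    # Top-level template parameter, mirroring the snippet added under "Parameters".
    patched.setdefault("parameters", {})["enableNoPublicIp"] = {
        "defaultValue": True,
        "type": "bool",
    }

    # Assumption: workspace settings live under properties.parameters of the
    # Microsoft.Databricks/workspaces resource.
    for resource in patched.get("resources", []):
        if resource.get("type") == "Microsoft.Databricks/workspaces":
            props = resource.setdefault("properties", {})
            props.setdefault("parameters", {})["enableNoPublicIp"] = {
                "value": "[parameters('enableNoPublicIp')]"
            }
    return patched


if __name__ == "__main__":
    # Tiny stand-in template; a real all-in-one template has many more fields.
    sample = {
        "parameters": {},
        "resources": [
            {"type": "Microsoft.Databricks/workspaces", "properties": {"parameters": {}}}
        ],
    }
    print(json.dumps(enable_no_public_ip(sample), indent=2))
```

Working on a deep copy keeps the original template untouched, so the patched version can be diffed against it before deployment.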

| Setting | Suggested value | Description |
| --- | --- | --- |
| Workspace name | adblabs-ws | Select a name for your Azure Databricks workspace. |
| Subscription | "Your subscription" | Select the Azure subscription that you want to use. |
| Resource group | adblabs-rg | Select the same resource group you used for the virtual network. |
| Location | Central US | Choose the same location as your virtual network. |
| Enable No Public IP | true | Disables public IPs on Azure Databricks cluster nodes. |
| Pricing Tier | Premium | For more information on pricing tiers, see the Azure Databricks pricing page. |
      • Once you've finished entering basic settings, select Next: Networking > and apply the following settings:

| Setting | Value | Description |
| --- | --- | --- |
| Deploy Azure Databricks workspace in your Virtual Network (VNet) | Yes | This setting allows you to deploy an Azure Databricks workspace in your virtual network. |
| Virtual Network | azuredatabricks-spoke-vnet | Select the virtual network you created earlier. |
| Public Subnet Name | public-subnet | Use the default public subnet name, you could use any name though. |
| Public Subnet CIDR Range | 10.2.1.64/26 | Use a CIDR range up to and including /26. |
| Private Subnet Name | private-subnet | Use the default private subnet name, you could use any name though. |
| Private Subnet CIDR Range | 10.2.1.128/26 | Use a CIDR range up to and including /26. |

Click Review and Create. A few things to note:

      • The virtual network must include two subnets dedicated to each Azure Databricks workspace: a private subnet and public subnet (feel free to use a different nomenclature). The public subnet is the source of a private IP for each cluster node’s host VM. The private subnet is the source of a private IP for the Databricks Runtime container deployed on each cluster node. It indicates that each cluster node has two private IP addresses today.
      • Each workspace subnet size is allowed to be anywhere from /18 to /26, and the actual sizing will be based on forecasting for the overall workloads per workspace. The address space could be arbitrary (including non RFC 1918 ones), but it must align with the enterprise on-premises plus cloud network strategy.
      • Azure Databricks will create these subnets for you when you deploy the workspace using Azure portal and will perform subnet delegation to the Microsoft.Databricks/workspaces service. That allows Azure Databricks to create the required Network Security Group (NSG) rules. Azure Databricks will always give advance notice if we need to add or update the scope of an Azure Databricks-managed NSG rule. Please note that if these subnets already exist, the service will use those as such.
      • There is a one-to-one relationship between these subnets and an Azure Databricks workspace. You cannot share multiple workspaces across the same subnet pair, and must use a new subnet pair for each different workspace.
      • Notice the resource group and managed resource group in the Azure Databricks resource overview page on Azure portal. You cannot create any resources in the managed resource group, nor can you edit any existing ones.
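The subnet constraints listed above are easy to check before deploying. The sketch below uses only Python's standard `ipaddress` module and the address ranges from this test setup (including the privatelink-subnet range that Step 2 uses). The `check_plan` helper is a hypothetical name for illustration, not part of any Azure tooling.

```python
import ipaddress

VNET = ipaddress.ip_network("10.2.1.0/24")  # azuredatabricks-spoke-vnet
SUBNETS = {
    "privatelink-subnet": ipaddress.ip_network("10.2.1.0/26"),
    "public-subnet": ipaddress.ip_network("10.2.1.64/26"),
    "private-subnet": ipaddress.ip_network("10.2.1.128/26"),
}
WORKSPACE_SUBNETS = ("public-subnet", "private-subnet")


def check_plan(vnet, subnets, workspace_subnets):
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    names = list(subnets)

    # Every subnet must sit inside the virtual network address space.
    for name in names:
        if not subnets[name].subnet_of(vnet):
            problems.append(f"{name} ({subnets[name]}) is outside {vnet}")

    # No two subnets may overlap.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")

    # Workspace subnets must be sized between /18 and /26.
    for name in workspace_subnets:
        if not 18 <= subnets[name].prefixlen <= 26:
            problems.append(f"{name} must be between /18 and /26")

    return problems


if __name__ == "__main__":
    issues = check_plan(VNET, SUBNETS, WORKSPACE_SUBNETS)
    print("plan OK" if not issues else "\n".join(issues))
```

Running the same check against the planned ranges for each new workspace helps enforce the one-subnet-pair-per-workspace rule early, before anything is deployed.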

Step 2: Set up Private Link Endpoints

As discussed in the Securely Accessing Azure Data Services blog, we’ll use Azure Private Link to securely connect previously created Azure Databricks workspace to your Azure Data Services. We do not recommend setting up access to such data services through a network virtual appliance / firewall, as that has a potential to adversely impact the performance of big data workloads and the intermediate infrastructure.

Please create a subnet privatelink-subnet with address space 10.2.1.0/26 in the virtual network azuredatabricks-spoke-vnet.

Step 2 for setting up a secure Azure Databricks deployment: setting up Private Link Endpoints

For the test setup, we’ll deploy a sample storage account and then create a Private Link endpoint for that. Referring to the setting up private link documentation:

      • On the upper-left side of the screen in the Azure portal, select Create a resource > Storage > Storage account.
      • In Create storage account - Basics, enter or select this information:

| Setting | Value |
| --- | --- |
| PROJECT DETAILS | |
| Subscription | Select your subscription. |
| Resource group | Select adblabs-rg. You created this in the previous section. |
| INSTANCE DETAILS | |
| Storage account name | Enter myteststorageaccount. If this name is taken, please provide a unique name. |
| Region | Select Central US (or the same region you used for Azure Databricks workspace and virtual network). |
| Performance | Leave the default Standard. |
| Replication | Select Read-access geo-redundant storage (RA-GRS). |

Select Next:Networking >

      • In Create a storage account - Networking, connectivity method, select Private Endpoint.
      • In Create a storage account - Networking, select Add Private Endpoint.
      • In Create Private Endpoint, enter or select this information:

| Setting | Value |
| --- | --- |
| PROJECT DETAILS | |
| Subscription | Select your subscription. |
| Resource group | Select adblabs-rg. You created this in the previous section. |
| Location | Select Central US (or the same region you used for Azure Databricks workspace and virtual network). |
| Name | Enter myStoragePrivateEndpoint. |
| Storage sub-resource | Select dfs. |
| NETWORKING | |
| Virtual network | Select azuredatabricks-spoke-vnet from resource group adblabs-rg. |
| Subnet | Select privatelink-subnet. |
| PRIVATE DNS INTEGRATION | |
| Integrate with private DNS zone | Leave the default Yes. |
| Private DNS zone | Leave the default (New) privatelink.dfs.core.windows.net. |

Select OK.

      • Select Review + create. You're taken to the Review + create page where Azure validates your configuration.
      • When you see the Validation passed message, select Create.
      • Browse to the storage account resource that you just created.

It’s possible to create more than one Private Link endpoint for supported Azure Data Services. To configure such endpoints for additional services, please refer to the relevant Azure documentation.
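Once the private endpoint and its DNS zone are in place, the storage account name should resolve to an address inside privatelink-subnet when queried from within the virtual network. Here is a small sketch of that check, assuming the 10.2.1.0/26 range from this test setup. The helper names are hypothetical, and the lookup is only meaningful when run from a machine inside the virtual network (for example, a cluster node).

```python
import ipaddress
import socket

PRIVATELINK_SUBNET = ipaddress.ip_network("10.2.1.0/26")  # privatelink-subnet


def resolves_privately(addresses, subnet=PRIVATELINK_SUBNET):
    """True when at least one address was returned and all of them fall inside the subnet."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    return bool(ips) and all(ip in subnet for ip in ips)


def resolve_ipv4(hostname):
    """Resolve a hostname to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


# Example (run from inside the virtual network; the account name is the one
# used in this walkthrough and may differ in your setup):
#   resolves_privately(resolve_ipv4("myteststorageaccount.dfs.core.windows.net"))
```

If the check returns False from inside the virtual network, the private DNS zone is most likely not linked to the network, and traffic would still be heading for the public endpoint.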

Step 3: Set up External Hive Metastore

Provision Azure SQL database

This step is optional. By default the consolidated regional metastore is used for the Azure Databricks workspace. Please skip to the next step if you would like to avoid managing an Azure SQL database for this end-to-end deployment.

Step 3 for setting up a secure Azure Databricks deployment: setting up external hive metastore.

Referring to provisioning an Azure SQL database documentation, please provision an Azure SQL database which we will use as an external hive metastore for the Azure Databricks workspace.

      • On the upper-left side of the screen in the Azure portal, select Create a resource > Databases > SQL database.
      • In Create SQL database - Basics, enter or select this information:

 

| Setting | Value |
| --- | --- |
| DATABASE DETAILS | |
| Subscription | Select your subscription. |
| Resource group | Select adblabs-rg. You created this in the previous section. |
| INSTANCE DETAILS | |
| Database name | Enter myhivedatabase. If this name is taken, please provide a unique name. |
      • In Server, select Create new.
      • In New server, enter or select this information:
| Setting | Value |
| --- | --- |
| Server name | Enter mysqlserver. If this name is taken, please provide a unique name. |
| Server admin login | Enter an administrator name of your choice. |
| Password | Enter a password of your choice. The password must be at least 8 characters long and meet the defined requirements. |
| Location | Select Central US (or the same region you used for Azure Databricks workspace and virtual network). |

Select OK.

      • Select Review + create. You're taken to the Review + create page where Azure validates your configuration.
      • When you see the Validation passed message, select Create.
Create a Private Link endpoint

In this section, you will add a Private Link endpoint for the Azure SQL database created above. Referring to this source:

      • On the upper-left side of the screen in the Azure portal, select Create a resource > Networking > Private Link Center.
      • In Private Link Center - Overview, on the option to Build a private connection to a service, select Start.
      • In Create a private endpoint - Basics, enter or select this information:
| Setting | Value |
| --- | --- |
| PROJECT DETAILS | |
| Subscription | Select your subscription. |
| Resource group | Select adblabs-rg. You created this in the previous section. |
| INSTANCE DETAILS | |
| Name | Enter mySqlDBPrivateEndpoint. If this name is taken, please provide a unique name. |
| Region | Select Central US (or the same region you used for Azure Databricks workspace and virtual network). |

Select Next: Resource

In Create a private endpoint - Resource, enter or select this information:

| Setting | Value |
| --- | --- |
| Connection method | Select connect to an Azure resource in my directory. |
| Subscription | Select your subscription. |
| Resource type | Select Microsoft.Sql/servers. |
| Resource | Select mysqlserver |
| Target sub-resource | Select sqlServer |

Select Next: Configuration

In Create a private endpoint - Configuration, enter or select this information:

NETWORKING

| Setting | Value |
| --- | --- |
| Virtual network | Select azuredatabricks-spoke-vnet |
| Subnet | Select privatelink-subnet |
| PRIVATE DNS INTEGRATION | |
| Integrate with private DNS zone | Select Yes. |
| Private DNS Zone | Select (New) privatelink.database.windows.net |
      • Select Review + create. You're taken to the Review + create page where Azure validates your configuration.
      • When you see the Validation passed message, select Create.
Configure External Hive Metastore
      • From Azure Portal, search for the adblabs-rg resource group
      • Go to Azure Databricks workspace resource
      • Click Launch Workspace
      • Please follow the instructions documented here to configure the Azure SQL database created above as an external hive metastore for the Azure Databricks workspace.
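For orientation, the cluster Spark configuration for an external Hive metastore on Azure SQL typically comes down to a handful of key/value pairs. The sketch below assembles them from the server and database names used in this walkthrough. Treat it as an illustration under assumptions: the helper name is hypothetical, the Hive version and metastore jars values are deliberately left to the caller, and the linked instructions remain the reference for which values suit your Databricks Runtime.

```python
def external_metastore_conf(server, database, user, password, hive_version, metastore_jars):
    """Build Spark configuration entries that point a cluster at an Azure SQL Hive metastore."""
    # The server FQDN resolves to the private endpoint IP when the private DNS zone is linked.
    url = f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}"
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": url,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "spark.sql.hive.metastore.version": hive_version,
        "spark.sql.hive.metastore.jars": metastore_jars,
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    conf = external_metastore_conf(
        server="mysqlserver",
        database="myhivedatabase",
        user="<admin-login>",
        password="<password-or-secret-reference>",
        hive_version="<hive-version>",
        metastore_jars="<jars-setting>",
    )
    for key, value in conf.items():
        print(f"{key} {value}")
```

Avoid pasting real credentials into cluster configuration in plain text; a secret store is the safer place for the password.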

Step 4: Deploy Azure Firewall

We recommend Azure Firewall as a scalable cloud firewall to act as the filtering device for Azure Databricks control plane traffic, DBFS Storage, and any allowed public endpoints to be accessible from your Azure Databricks workspace.

Step 4 for setting up a secure Azure Databricks deployment: deploying Azure firewall with relevant rules

Referring to the documentation for configuring an Azure Firewall, you could deploy Azure Firewall into a new virtual network. Please create the virtual network named hub-vnet with address space 10.3.1.0/24 in resource group adblabs-rg (names and address space are specific to this test setup). Also create a subnet named AzureFirewallSubnet with address space 10.3.1.0/26 in hub-vnet.

      • On the Azure portal menu or from the Home page, select Create a resource.
      • Type firewall in the search box and press Enter.
      • Select Firewall and then select Create.
      • On the Create a Firewall page, use the following table to configure the firewall:
| Setting | Value |
| --- | --- |
| Subscription | "your subscription" |
| Resource group | adblabs-rg |
| Name | firewall |
| Location | Select Central US (or the same region you used for Azure Databricks workspace and virtual network). |
| Choose a virtual network | Use existing: hub-vnet |
| Public IP address | Add new. The Public IP address must be the Standard SKU type. Name it fw-public-ip |
      • Select Review + create.
      • Review the summary, and then select Create to deploy the firewall.
      • This will take a few minutes.
      • After the deployment completes, go to the adblabs-rg resource group, and select the firewall
      • Note the private IP address. You'll use it later when you create the custom default route from Azure Databricks subnets.
Configure Azure Firewall Rules

With Azure Firewall, you can configure:

      • Application rules that define fully qualified domain names (FQDNs) that can be accessed from a subnet.
      • Network rules that define source address, protocol, destination port, and destination address.
      • Network traffic is subjected to the configured firewall rules when you route your network traffic to the firewall as the subnet default gateway.
Configure Application Rule

We first need to configure application rules to allow outbound access to Log Blob Storage and Artifact Blob Storage endpoints in the Azure Databricks control plane plus the DBFS Root Blob Storage for the workspace.

      • Go to the resource group adblabs-rg, and select the firewall.
      • On the firewall page, under Settings, select Rules.
      • Select the Application rule collection tab.
      • Select Add application rule collection.
      • For Name, type databricks-control-plane-services.
      • For Priority, type 200.
      • For Action, select Allow.
      • Configure the following in Rules -> Target FQDNs
| Name | Source type | Source | Protocol:Port | Target FQDNs |
| --- | --- | --- | --- | --- |
| databricks-spark-log-blob-storage | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | https:443 | Refer notes from Prerequisites above (for Central US) |
| databricks-audit-log-blob-storage | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | https:443 | Refer notes from Prerequisites above (for Central US). This is separate log storage only for US regions today |
| databricks-artifact-blob-storage | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | https:443 | Refer notes from Prerequisites above (for Central US) |
| databricks-dbfs | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | https:443 | Refer notes from Prerequisites above |
| Public Repositories for Python and R Libraries (OPTIONAL - if workspace users are allowed to install libraries from public repos) | IP Address | 10.2.1.128/26,10.2.1.64/26 | https:443 | *pypi.org,*pythonhosted.org,cran.r-project.org. Add any other public repos as desired |
| Used by Ganglia UI | IP Address | 10.2.1.128/26,10.2.1.64/26 | https:443 | cdnjs.com or cdnjs.cloudflare.com |
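To reason about what an application rule collection like the one above would let through, it can help to model it in a few lines. The sketch below is a deliberately simplified model, not Azure Firewall's actual matching logic: it uses shell-style globbing from Python's `fnmatch`, so a pattern such as `*pypi.org` also matches an unrelated name that merely ends in `pypi.org`. The `is_allowed` helper and the trimmed FQDN list are for illustration only.

```python
import ipaddress
from fnmatch import fnmatch

# Azure Databricks workspace subnets from this test setup.
WORKSPACE_SUBNETS = [
    ipaddress.ip_network("10.2.1.128/26"),
    ipaddress.ip_network("10.2.1.64/26"),
]

# Optional public-repo and Ganglia entries from the table above.
ALLOWED_FQDNS = ["*pypi.org", "*pythonhosted.org", "cran.r-project.org", "cdnjs.cloudflare.com"]


def is_allowed(source_ip, fqdn, port=443):
    """Simplified allow check: source in a workspace subnet, port 443, FQDN matches a pattern."""
    source = ipaddress.ip_address(source_ip)
    if port != 443:
        return False
    if not any(source in subnet for subnet in WORKSPACE_SUBNETS):
        return False
    return any(fnmatch(fqdn.lower(), pattern) for pattern in ALLOWED_FQDNS)


if __name__ == "__main__":
    for host in ("pypi.org", "files.pythonhosted.org", "example.com"):
        print(host, is_allowed("10.2.1.70", host))
```

Anything not matched by a rule is denied, which is what turns the allow list into an exfiltration control. Keeping the wildcard entries as narrow as possible shrinks the set of destinations a compromised notebook could reach.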
Configure Network Rule

Some endpoints can’t be configured as application rules using FQDNs. So we’ll set those up as network rules, namely the Observability Event Hub and Webapp.

      • Open the resource group adblabs-rg, and select the firewall.
      • On the firewall page, under Settings, select Rules.
      • Select the Network rule collection tab.
      • Select Add network rule collection.
      • For Name, type databricks-control-plane-services.
      • For Priority, type 200.
      • For Action, select Allow.
      • Configure the following in Rules -> IP Addresses.
| Name | Protocol | Source type | Source | Destination type | Destination Address | Destination Ports |
| --- | --- | --- | --- | --- | --- | --- |
| databricks-webapp | TCP | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | IP Address | Refer notes from Prerequisites above (for Central US) | 443 |
| databricks-observability-eventhub | TCP | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | IP Address | Refer notes from Prerequisites above (for Central US) | 9093 |
| databricks-sql-metastore (OPTIONAL - please see Step 3 for External Hive Metastore above) | TCP | IP Address | Azure Databricks workspace subnets 10.2.1.128/26,10.2.1.64/26 | IP Address | Refer notes from Prerequisites above (for Central US) | 3306 |
Configure Virtual Network Service Endpoints
      • On the hub-vnet page, click Service endpoints and then Add
      • From Services, select “Microsoft.Storage”
      • In Subnets, select AzureFirewallSubnet

Configuring Virtual Network Service Endpoints

Service endpoint would allow traffic from AzureFirewallSubnet to Log Blob Storage, Artifact Blob Storage, and DBFS Storage to go over Azure network backbone, thus eliminating exposure to public networks.

If users are going to access Azure Storage using Service Principals, then we recommend creating an additional service endpoint from Azure Databricks workspace subnets to Microsoft.AzureActiveDirectory.

Step 5: Create User Defined Routes (UDRs)

At this point, the majority of the infrastructure setup for a secure, locked-down deployment has been completed. We now need to route appropriate traffic from Azure Databricks workspace subnets to the Control Plane SCC Relay IP (see FAQ below) and Azure Firewall setup earlier.

Step 5 for setting up a secure Azure Databricks deployment: creating User Defined Routes (UDRs)

Referring to the documentation for user defined routes:

      • On the Azure portal menu, select All services and search for Route Tables. Go to that section.
      • Select Add
      • For Name, type firewall-route-table.
      • For Subscription, select your subscription.
      • For the Resource group, select adblabs-rg.
      • For Location, select the same location that you used previously i.e. Central US
      • Select Create.
      • Select Refresh, and then select the firewall-route-table route table.
      • Select Routes and then select Add.
      • For Route name, add to-firewall.
      • For Address prefix, add 0.0.0.0/0.
      • For Next hop type, select Virtual appliance.
      • For the Next hop address, add the Private IP address for the Azure Firewall that you noted earlier.
      • Select OK.

Now add one more route for Azure Databricks SCC Relay IP.

      • Select Routes and then select Add.
      • For Route name, add to-central-us-databricks-SCC-relay-ip.
      • For Address prefix, add the Control Plane SCC relay service IP address for Central US from here. Please note that there could be more than one IP address for the relay service; in that case, add additional routes to the UDR accordingly. To get the SCC relay IP, run nslookup on the relay service endpoint.
      • For Next hop type, select Internet. Although it says Internet, traffic between the Azure Databricks data plane and the Azure Databricks SCC relay service IP stays on the Azure network and does not travel over the public internet (for more details, please refer to this guide).
      • Select OK.

The route table needs to be associated with both of the Azure Databricks workspace subnets.

      • Go to the firewall-route-table.
      • Select Subnets and then select Associate.
      • Select Virtual network > azuredatabricks-spoke-vnet.
      • For Subnet, select both workspace subnets.
      • Select OK.
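The two routes work together because Azure picks the route with the longest matching prefix: the specific SCC relay route wins for relay traffic, and everything else falls through to the 0.0.0.0/0 route toward the firewall. The sketch below illustrates that selection logic only. The firewall private IP and the relay IP are hypothetical placeholders (the relay address comes from a range reserved for documentation), and real route selection also involves system routes for the virtual network and its peerings, which are left out here.

```python
import ipaddress

FIREWALL_PRIVATE_IP = "10.3.1.4"  # hypothetical; use the private IP you noted in Step 4
SCC_RELAY_IP = "203.0.113.10"     # hypothetical placeholder, not a real relay address

ROUTES = [
    {
        "name": "to-firewall",
        "prefix": "0.0.0.0/0",
        "next_hop_type": "VirtualAppliance",
        "next_hop_ip": FIREWALL_PRIVATE_IP,
    },
    {
        "name": "to-central-us-databricks-SCC-relay-ip",
        "prefix": f"{SCC_RELAY_IP}/32",
        "next_hop_type": "Internet",
        "next_hop_ip": None,
    },
]


def select_route(destination, routes=ROUTES):
    """Pick the route whose prefix contains the destination and is the most specific."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


if __name__ == "__main__":
    for target in (SCC_RELAY_IP, "198.51.100.7"):
        route = select_route(target)
        print(target, "->", route["name"], route["next_hop_type"])
```

If the relay service resolves to several addresses, each one needs its own /32 entry; otherwise that traffic would fall back to the default route and pass through the firewall.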

Step 6: Configure VNET Peering

We are now at the last step. The virtual network azuredatabricks-spoke-vnet and hub-vnet need to be peered so that the route table configured earlier could work properly.

Step 6 for setting up a secure Azure Databricks deployment: configuring VNET peering

Referring to the documentation for configuring VNET peering:

In the search box at the top of the Azure portal, enter virtual networks. When Virtual networks appears in the search results, select it.

      • Go to hub-vnet.
      • Under Settings, select Peerings.
      • Select Add, and enter or select values as follows:
| Name | Value |
| --- | --- |
| Name of the peering from hub-vnet to remote virtual network | from-hub-vnet-to-databricks-spoke-vnet |
| Virtual network deployment model | Resource Manager |
| Subscription | Select your subscription |
| Virtual Network | azuredatabricks-spoke-vnet or select the VNET where Azure Databricks is deployed |
| Name of the peering from remote virtual network to hub-vnet | from-databricks-spoke-vnet-to-hub-vnet |
      • Leave rest of the default values as is and click OK

The setup is now complete.

Step 7: Validate Deployment

It’s time to put everything to the test now.

If the data access worked without any issues, that means you’ve accomplished the optimum secure deployment for Azure Databricks in your subscription. This was quite a bit of manual work, but that was more for a one-time showcase. In practical terms, you would want to automate such a setup using a combination of ARM Templates, Azure CLI, Azure SDK, etc.

Common Questions with Data Exfiltration Protection Architecture

Can I use service endpoint policies to secure data egress to Azure Data Services?

Yes, only with VNet injection. Service Endpoint Policies provide secure and direct connectivity to Azure services over an optimized route on the Azure backbone network. Service Endpoints can be used to restrict connectivity to external Azure resources to only your virtual network. Service Endpoints are secure only if used in conjunction with properly defined network firewall rules for the Azure service using the Service Endpoint. Service Endpoints cannot be used in standard deployments because the virtual network is managed, and they cannot be applied to the Databricks root DBFS storage.

Can I use Network Virtual Appliance (NVA) other than Azure Firewall?

Yes, you could use a third-party NVA as long as network traffic rules are configured as discussed in this article. Please note that we have tested this setup with Azure Firewall only, though some of our customers use other third-party appliances. It’s ideal to deploy the appliance on cloud rather than be on-premises.

Can I have a firewall subnet in the same virtual network as Azure Databricks?

Yes, you can. As per Azure reference architecture, it is advisable to use a hub-spoke virtual network topology to plan better for future. Should you choose to create the Azure Firewall subnet in the same virtual network as Azure Databricks workspace subnets, you wouldn’t need to configure virtual network peering as discussed in Step 6 above.

Can I filter Azure Databricks control plane SCC Relay ip traffic through Azure Firewall?
Yes, you can, but we would not recommend it because:

      1. The traffic between Azure Databricks clusters (data plane) and the SCC Relay service stays on the Azure network and does not flow over the public internet.
      2. The SCC Relay service and the data plane need stable and reliable communication. Placing a firewall or a virtual appliance between them introduces a single point of failure: for example, a firewall rule misconfiguration or scheduled downtime could disrupt cluster bootstrap (a transient firewall issue), prevent new clusters from being created, or affect the scheduling and running of jobs.

Can I analyze accepted or blocked traffic by Azure Firewall?

We recommend using Azure Firewall Logs and Metrics for that requirement.

Getting Started with Data Exfiltration Protection with Azure Databricks

We discussed utilizing cloud-native security controls to implement data exfiltration protection for your Azure Databricks deployments, all of which could be automated to enable data teams at scale. Some other things that you may want to consider and implement as part of this project:

Please reach out to your Microsoft or Databricks account team for any questions.
