Scanning Notebooks for Arbitrary Code (Python)

###################################
###### Create widgets for user inputs ######
###################################
#URL of your Databricks deployment
dbutils.widgets.text(
  name='api_url',
  defaultValue='https://mydeployment.cloud.databricks.com',
  label='URL of your Databricks deployment'
)
#Arbitrary code to be scanned (an extended regular expression passed to grep -E)
dbutils.widgets.text(
  name='grep_string',
  defaultValue='ctx|phpass|mlflow',
  label='Arbitrary code to be scanned'
)

Overview

The goal of this notebook is to scan your notebooks for any arbitrary string.
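
As a rough illustration of what the scan does, the sketch below walks a directory of downloaded notebook source and reports lines that contain "import" and also match a pattern. It is an assumption-laden stand-in, not the notebook's method: the notebook itself runs grep in a %sh cell, the function name scan_tree and its parameters are hypothetical, and Python's re syntax differs from grep -E in some details (simple alternations such as ctx|phpass|mlflow behave the same).

```python
import os
import re


def scan_tree(root, pattern, prefilter="import"):
    """Yield (path, line_number, line) for lines under `root` that contain
    `prefilter` and also match the regular expression `pattern`.

    Hypothetical helper; mirrors `grep -R import ROOT | grep -E PATTERN`.
    """
    regex = re.compile(pattern)  # raises re.error early if the pattern is invalid
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files that are not valid UTF-8
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if prefilter in line and regex.search(line):
                        yield path, number, line.rstrip("\n")
```

Calling list(scan_tree("/tmp/scanning/db-migration/logs/artifacts", "ctx|phpass|mlflow")) after the download step would collect the matches in Python instead of printing them from the shell.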

Configuration

To configure this notebook, see the next cell, "Configuration." You can then go to the final cell and modify the %sh command to scan for what you want.

As this notebook will download all other notebooks, it must be run with the privileges of a user who can view all notebooks. Additionally, it is recommended that it be run on a single-user cluster (or dedicated job cluster) rather than a shared cluster, as it will download potentially sensitive data and store it in shared ephemeral storage on the local machine. It is prudent to use Nitro instances on AWS for local disk encryption, or the encryption flag on Azure Databricks.

Configuring credentials

To use this notebook, you will need an admin-level personal access token. As a secure coding practice, do not hardcode the token in a notebook; store it using the Databricks Secrets capability or a third-party secret manager. Configure the Databricks Secrets service using the Databricks CLI or the API. (Docs for CLI and API)

Example using the CLI, after configuring it:

$ databricks secrets create-scope --scope YOUR_SCOPE_NAME
$ databricks secrets put --scope YOUR_SCOPE_NAME --key YOUR_KEY_NAME

The scope name and key name can be whatever you wish, just provide them below in the configuration section.

After each put, the CLI will open the vi text editor. Press the letter 'i' to switch to "insert" mode and then type in the secret value (here, the personal access token). To leave "insert" mode, press Escape, then save and quit by typing ':wq'. Databricks will automatically remove any leading or trailing whitespace. Confused by vi? You're not alone. vi for beginners
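
For the API route, a minimal sketch is shown below. It assumes the Secrets REST endpoints /api/2.0/secrets/scopes/create and /api/2.0/secrets/put with bearer-token authentication; the helper names build_secret_requests and send are hypothetical, and the scope-creation call is expected to fail if the scope already exists. Check the Secrets API documentation for your deployment before relying on it.

```python
import json
import urllib.request


def build_secret_requests(api_url, scope, key, value):
    """Return (endpoint URL, JSON payload) pairs that create a scope and store one secret.

    Hypothetical helper; endpoint paths are assumed from the Secrets API 2.0.
    """
    base = api_url.rstrip("/")
    return [
        (base + "/api/2.0/secrets/scopes/create", {"scope": scope}),
        (base + "/api/2.0/secrets/put",
         {"scope": scope, "key": key, "string_value": value}),
    ]


def send(url, payload, token):
    """POST one JSON payload using a personal access token; raises on HTTP errors."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the requests separately from sending them makes it possible to inspect the URLs and payloads before anything is submitted; the token passed to send should itself come from a secure source rather than being typed into a notebook.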

Warranty

No warranty is provided for this notebook, and it is not officially supported. For best-effort support, ask your Databricks team to connect you with Security Field Engineering.

###################################
###### Secrets configuration ######
###################################
secret_configuration_personal_access_token = {
  "scope": "databricks-secrets-scanning",
  "key": "pat"
}
 
##################################
###### System Configuration ######
##################################
api_url = dbutils.widgets.get("api_url") #example: "https://mydeployment.cloud.databricks.com" 
# API URL Examples:
# AWS would be similar to: https://mydeployment.cloud.databricks.com
# Azure might be similar to: https://adb-39159328312492314.58.azuredatabricks.net or https://westus.azuredatabricks.net
 
##################################
###### Scan String Configuration ######
##################################
grep_string = dbutils.widgets.get("grep_string") #example: ctx|phpass|pyjwt
 
 
import hashlib, json, os, random, time
import requests

# Read the personal access token from Databricks Secrets rather than hardcoding it
api_token = dbutils.secrets.get(
  scope=secret_configuration_personal_access_token['scope'],
  key=secret_configuration_personal_access_token['key']
)
# Write the CLI configuration file; the with block closes the file on exit
with open("/root/.databrickscfg", "w") as f:
  f.write("[DEFAULT]\n")
  f.write("host = {}\n".format(api_url))
  f.write("token = {}\n".format(api_token))
%sh
if [[ -d /tmp/scanning/db-migration ]] 
then
  rm -rf /tmp/scanning/db-migration
fi
 
if [[ ! -d /tmp/scanning ]] 
then
  mkdir /tmp/scanning
fi
 
cd /tmp/scanning
git clone https://github.com/mrchristine/db-migration
cd db-migration
date
echo "Start Export"
python export_db.py --workspace | wc -l
date
echo "Start Download"
python export_db.py --download --notebook-format SOURCE | wc -l
date
import os
os.environ['grep_string'] = grep_string
print(os.environ['grep_string'])
%sh
grep -R import /tmp/scanning/db-migration/logs/artifacts | grep -E "$grep_string"
###################################
###### Cleanup after your analysis ######
###################################
#%sh
#rm -rf /tmp/scanning