05_spend_optimization(Python)


Overview

In this notebook you:

  • Build a dashboard to visualize the current state of a given campaign and to inform marketing budget reallocation decisions.

Step 1: Configure the Environment

In this step, we will:

  1. Import libraries
  2. Run the utils notebook to gain access to the function get_params
  3. Run get_params and store the returned values in variables
  4. Set the current database so that it doesn't need to be manually specified each time it's used
Step 1.1: Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.4)
 
Step 1.2: Run the utils notebook to gain access to the function get_params
  • %run is a magic command provided within Databricks that enables you to run notebooks from within other notebooks.
  • get_params is a helper function that returns a few parameters used throughout this solution accelerator. Each use of these parameters in this notebook is called out explicitly.
%run ./99_utils
Step 1.3: Run get_params and store values in variables
  • Four of the parameters returned by get_params are used in this notebook. For convenience, we will store the values of these parameters in new variables.

    • database_name: the name of the database created in notebook 02_load_data. The default value can be overridden in the notebook 99_config.
    • gold_user_journey_tbl_path: the path used in 03_load_data to write out gold-level user journey data in Delta format.
    • gold_attribution_tbl_path: the path used in 03_load_data to write out gold-level attribution data in Delta format.
    • gold_ad_spend_tbl_path: the path used in this notebook (Step 2.1) to write out gold-level ad spend data in Delta format.
params = get_params()
database_name = params['database_name']
gold_user_journey_tbl_path = params['gold_user_journey_tbl_path']
gold_attribution_tbl_path = params['gold_attribution_tbl_path']
gold_ad_spend_tbl_path = params['gold_ad_spend_tbl_path']
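For reference, the returned dictionary has roughly the shape sketched below. The keys match the variables above; the values shown are purely hypothetical placeholders (the real defaults are defined in 99_config and 99_utils).
# Hypothetical illustration of the structure returned by get_params();
# the actual names and paths are defined in 99_config / 99_utils.
params = {
    "database_name": "multi_touch_attribution",                   # placeholder name
    "gold_user_journey_tbl_path": "dbfs:/.../gold_user_journey",  # placeholder path
    "gold_attribution_tbl_path": "dbfs:/.../gold_attribution",    # placeholder path
    "gold_ad_spend_tbl_path": "dbfs:/.../gold_ad_spend",          # placeholder path
}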
Step 1.4: Set the current database so that it doesn't need to be manually specified each time it's used
  • Please note that this step is completely optional. An alternative approach is to use the syntax database_name.table_name when querying the respective tables, as sketched after the next cell.
_ = spark.sql("use {}".format(database_name))
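For example, without setting the current database, a query can qualify the table name explicitly (a minimal sketch; gold_user_journey is created earlier in this accelerator):
# Equivalent query without a preceding USE statement:
# qualify the table with the database it lives in.
df = spark.sql(f"SELECT * FROM {database_name}.gold_user_journey LIMIT 10")
display(df)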

Step 2: Create and Populate Ad Spend Table

  1. Create ad spend table
  2. Create widget for specifying the ad spend for a given campaign
  3. Populate ad spend table with synthetic spend data
  4. View campaign ad spend details
  5. Explode map into multiple rows
Step 2.1: Create ad spend table
_ = spark.sql('''
  CREATE OR REPLACE TABLE gold_ad_spend (
    campaign_id STRING, 
    total_spend_in_dollars FLOAT, 
    channel_spend MAP<STRING, FLOAT>, 
    campaign_start_date TIMESTAMP)
  USING DELTA
  LOCATION '{}'
  '''.format(gold_ad_spend_tbl_path))
Step 2.2: Create widget for specifying the ad spend for a given campaign
  • In practice, this data is typically captured directly from each marketing activation channel.
  • After running this command, you will see a widget appear at the top of the notebook. Widgets are useful for making notebooks and dashboards interactive.
  • The arguments passed to dbutils.widgets.text are the name of the widget, its default value, and a label to be displayed next to the widget.
dbutils.widgets.text("adspend", "10000", "Campaign Budget in $")
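SQL cells reference the widget's value as $adspend (as in the next step). From Python, the same value can be read with dbutils.widgets.get, which always returns a string (a minimal sketch):
# Read the widget's current value; widget values are returned as strings
adspend = float(dbutils.widgets.get("adspend"))
print(f"Campaign budget: ${adspend:,.2f}")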
Step 2.3: Populate ad spend table with synthetic ad spend data
  • Note that the widget's value is referenced in the SQL below using $adspend.
  • For illustration purposes, we have distributed spend uniformly across all five marketing channels (20% each).
%sql
INSERT INTO TABLE gold_ad_spend
VALUES ("3d65f7e92e81480cac52a20dfdf64d5b", int($adspend),
          MAP('Social Network', .2,
              'Search Engine Marketing', .2,  
              'Google Display Network', .2, 
              'Affiliates', .2, 
              'Email', .2), 
         make_timestamp(2020, 5, 17, 0, 0, 0));
Step 2.4: View campaign ad spend details
  • The channel spend data is currently stored as a map. We will explode these values into separate rows in the next step.
%sql
SELECT * FROM gold_ad_spend
 
| campaign_id                      | total_spend_in_dollars | channel_spend                                                                                                            | campaign_start_date          |
|----------------------------------|------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------|
| 3d65f7e92e81480cac52a20dfdf64d5b | 10000                  | {"Search Engine Marketing": 0.2, "Email": 0.2, "Social Network": 0.2, "Affiliates": 0.2, "Google Display Network": 0.2} | 2020-05-17T00:00:00.000+0000 |

Step 2.5: Explode map into multiple rows
ad_spend_df = spark.sql('''
  SELECT explode(channel_spend) AS (channel, pct_spend),
         round(total_spend_in_dollars * pct_spend, 2) AS dollar_spend
  FROM gold_ad_spend''')
 
ad_spend_df.createOrReplaceTempView("exploded_gold_ad_spend")
display(ad_spend_df)
 
| channel                 | pct_spend | dollar_spend |
|-------------------------|-----------|--------------|
| Social Network          | 0.2       | 2000         |
| Search Engine Marketing | 0.2       | 2000         |
| Google Display Network  | 0.2       | 2000         |
| Affiliates              | 0.2       | 2000         |
| Email                   | 0.2       | 2000         |
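The same exploded view can also be produced with the DataFrame API instead of SQL (a minimal sketch, equivalent to the query above):
from pyspark.sql import functions as F

# Explode the channel_spend map into one (channel, pct_spend) row per entry,
# then compute the dollar amount allocated to each channel
ad_spend_df = (
    spark.table("gold_ad_spend")
    .select(F.explode("channel_spend").alias("channel", "pct_spend"), "total_spend_in_dollars")
    .withColumn("dollar_spend", F.round(F.col("total_spend_in_dollars") * F.col("pct_spend"), 2))
    .drop("total_spend_in_dollars")
)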

Step 3: View Campaign Performance

In this section, we will create the following charts:

  1. Base conversion rate
  2. Conversions by date
  3. Attribution by model type
  4. Cost per acquisition
Step 3.1: Base conversion rate
%sql
CREATE OR REPLACE TABLE base_conversion_rate
USING DELTA AS
SELECT count(*) as count,
  CASE 
    WHEN conversion = 0
    THEN 'Impression'
    ELSE 'Conversion'
  END AS interaction_type
FROM
  gold_user_journey
GROUP BY
  conversion;
base_conversion_rate_pd = spark.table("base_conversion_rate").toPandas()
 
fig, ax = plt.subplots(figsize=[20, 9])
labels = base_conversion_rate_pd['interaction_type']
plt.pie(x=base_conversion_rate_pd['count'], autopct="%.1f%%", explode=[0.05]*2, labels=labels, pctdistance=0.5)
plt.title("Base Conversion Rate");
Step 3.2: Conversions by date
%sql
CREATE OR REPLACE TABLE conversions_by_date 
USING DELTA AS
SELECT count(*) AS count,
  'Conversion' AS interaction_type,
  date(time) AS date
FROM bronze
WHERE conversion = 1
GROUP BY date
ORDER BY date;
conversions_by_date_pd = spark.table("conversions_by_date").toPandas()
 
plt.figure(figsize=(20,9))
pt = sns.lineplot(x='date',y='count',data=conversions_by_date_pd)
 
pt.tick_params(labelsize=20)
pt.set_xlabel('Date')
pt.set_ylabel('Number of Conversions')
plt.title("Conversions by Date");
Step 3.3: Attribution by model type
%sql
CREATE OR REPLACE TABLE attribution_by_model_type 
USING DELTA AS
SELECT attribution_model, channel, round(attribution_percent * (
    SELECT count(*) FROM gold_user_journey WHERE conversion = 1)) AS conversions_attributed
FROM gold_attribution;
attribution_by_model_type_pd = spark.table("attribution_by_model_type").toPandas()
 
pt = sns.catplot(x='channel',y='conversions_attributed',hue='attribution_model',data=attribution_by_model_type_pd, kind='bar', aspect=4, legend=True)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
 
plt.tick_params(labelsize=15)
plt.ylabel("Number of Conversions")
plt.xlabel("Channels")
plt.title("Channel Performance");
Step 3.4: Cost per acquisition
%sql 
CREATE OR REPLACE TABLE cpa_summary 
USING DELTA
AS
SELECT
  spending.channel,
  spending.dollar_spend,
  attribution_count.attribution_model,
  attribution_count.conversions_attributed,
  round(spending.dollar_spend / attribution_count.conversions_attributed,2) AS CPA_in_Dollars
FROM
  (SELECT explode(channel_spend) AS (channel, spend),
   round(total_spend_in_dollars * spend, 2) AS dollar_spend
   FROM gold_ad_spend) AS spending
JOIN
  (SELECT attribution_model, channel, round(attribution_percent * (
      SELECT count(*) FROM gold_user_journey WHERE conversion = 1)) AS conversions_attributed
   FROM gold_attribution) AS attribution_count
ON spending.channel = attribution_count.channel;
cpa_summary_pd = spark.table("cpa_summary").toPandas()
 
pt = sns.catplot(x='channel', y='CPA_in_Dollars', hue='attribution_model', data=cpa_summary_pd, kind='bar', aspect=4, ci=None)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
 
plt.tick_params(labelsize=15)
plt.ylabel("CPA in $")
plt.xlabel("Channels")
plt.title("Channel Cost per Acquisition");

Step 4: Budget Allocation Optimization

Now that we have assigned credit to our marketing channels using Markov Chains, we can take a data-driven approach for budget allocation.

  • One KPI we can take a look at is Return on Ad Spend (ROAS).
  • In the ecommerce world, ROAS is calculated as:
    ROAS = Revenue ($) from marketing / Advertising ($) spent

In our example, instead of working with exact dollar values, we will divide the % of conversions attributed to a channel by the % of the total ad spend allocated to that channel.

  • ROAS = CHANNEL CONVERSION WEIGHT / CHANNEL BUDGET WEIGHT
    • A ROAS value > 1 signifies that the channel has been allocated less budget than warranted by its conversion rate.
    • A ROAS value < 1 signifies that the channel has been allocated more budget than warranted by its conversion rate.
    • A ROAS value = 1 signifies an optimized budget allocation.

From ROAS, we can calculate the Proposed Budget for each channel

  • Proposed budget = Current budget × ROAS (see the worked example below)
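Using numbers that appear later in this notebook (under the markov_chain model, Affiliates receives 37% of the attribution credit but only 20% of the $10,000 budget), the arithmetic works out as follows:
# Worked example for a single channel (Affiliates, markov_chain model)
channel_conversion_weight = 0.37  # attribution_percent for Affiliates
channel_budget_weight = 0.20      # pct_spend for Affiliates
current_budget = 2000.0           # dollar_spend for Affiliates

roas = channel_conversion_weight / channel_budget_weight  # 1.85 -> channel is under-funded
proposed_budget = current_budget * roas                   # 2000 * 1.85 = $3,700.00
print(f"ROAS: {roas:.2f}, proposed budget: ${proposed_budget:,.2f}")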

To calculate ROAS we will join the following Delta Tables:

  • gold_attribution: This table contains the calculated attribution % per channel based on different attribution models.
  • exploded_gold_ad_spend: This table contains the current budget allocated per channel. The column pct_spend documents the % of the total budget that has been allocated to a given channel.
%sql
SELECT * FROM exploded_gold_ad_spend;
 
| channel                 | pct_spend | dollar_spend |
|-------------------------|-----------|--------------|
| Social Network          | 0.2       | 2000         |
| Search Engine Marketing | 0.2       | 2000         |
| Google Display Network  | 0.2       | 2000         |
| Affiliates              | 0.2       | 2000         |
| Email                   | 0.2       | 2000         |

%sql
SELECT * FROM gold_attribution;
 
| attribution_model | channel                 | attribution_percent |
|-------------------|-------------------------|---------------------|
| first_touch       | Affiliates              | 0.31                |
| first_touch       | Email                   | 0.05                |
| first_touch       | Google Display Network  | 0.14                |
| first_touch       | Search Engine Marketing | 0.29                |
| first_touch       | Social Network          | 0.21                |
| last_touch        | Affiliates              | 0.39                |
| last_touch        | Email                   | 0.01                |
| last_touch        | Google Display Network  | 0.1                 |
| last_touch        | Search Engine Marketing | 0.2                 |
| last_touch        | Social Network          | 0.3                 |
| markov_chain      | Search Engine Marketing | 0.22                |
| markov_chain      | Google Display Network  | 0.12                |
| markov_chain      | Social Network          | 0.26                |
| markov_chain      | Affiliates              | 0.37                |
| markov_chain      | Email                   | 0.02                |
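Before materializing the Delta tables below, the join and the ROAS calculation can be sanity-checked in pandas using the two tables shown above (a minimal sketch):
# Sanity check: join spend and markov_chain attribution in pandas
spend_pd = spark.table("exploded_gold_ad_spend").toPandas()
attr_pd = spark.table("gold_attribution").toPandas()

markov_pd = attr_pd[attr_pd["attribution_model"] == "markov_chain"]
merged = spend_pd.merge(markov_pd, on="channel")
merged["ROAS"] = merged["attribution_percent"] / merged["pct_spend"]
merged["proposed_dollar_spend"] = (merged["dollar_spend"] * merged["ROAS"]).round(2)
display(merged[["channel", "pct_spend", "attribution_percent", "ROAS", "proposed_dollar_spend"]])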

%sql
CREATE OR REPLACE TABLE spend_optimization_view 
USING DELTA
AS
SELECT
  a.channel,
  a.pct_spend,
  b.attribution_percent,
  b.attribution_percent / a.pct_spend as ROAS,
  a.dollar_spend,
  round(
    (b.attribution_percent / a.pct_spend) * a.dollar_spend,
    2
  ) as proposed_dollar_spend
FROM
  exploded_gold_ad_spend a
  JOIN gold_attribution b on a.channel = b.channel
  and attribution_model = 'markov_chain';
  
CREATE OR REPLACE TABLE spend_optimization_final
USING DELTA AS
SELECT
  channel,
  'current_spending' AS spending,
  dollar_spend as budget
FROM exploded_gold_ad_spend
UNION
SELECT
  channel,
  'proposed_spending' AS spending,
  proposed_dollar_spend as budget
FROM
  spend_optimization_view;  
spend_optimization_final_pd = spark.table("spend_optimization_final").toPandas()
 
pt = sns.catplot(x='channel', y='budget', hue='spending', data=spend_optimization_final_pd, kind='bar', aspect=4, ci=None)
 
plt.tick_params(labelsize=15)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
plt.title("Spend Optimization per Channel")
plt.ylabel("Budget in $")
plt.xlabel("Channels");

Copyright Databricks, Inc. [2021]. The source in this notebook is provided subject to the Databricks License. All included or referenced third party libraries are subject to the licenses set forth below.

| Library Name | Library License                          | Library License URL                                      | Library Source URL                       |
|--------------|------------------------------------------|----------------------------------------------------------|------------------------------------------|
| Matplotlib   | Python Software Foundation (PSF) License | https://matplotlib.org/stable/users/license.html         | https://github.com/matplotlib/matplotlib |
| Numpy        | BSD-3-Clause License                     | https://github.com/numpy/numpy/blob/master/LICENSE.txt   | https://github.com/numpy/numpy           |
| Pandas       | BSD 3-Clause License                     | https://github.com/pandas-dev/pandas/blob/master/LICENSE | https://github.com/pandas-dev/pandas     |
| Python       | Python Software Foundation (PSF)         | https://github.com/python/cpython/blob/master/LICENSE    | https://github.com/python/cpython        |
| Seaborn      | BSD-3-Clause License                     | https://github.com/mwaskom/seaborn/blob/master/LICENSE   | https://github.com/mwaskom/seaborn       |
| Spark        | Apache-2.0 License                       | https://github.com/apache/spark/blob/master/LICENSE      | https://github.com/apache/spark          |