05_spend_optimization(Python)


Overview

In this notebook you:

  • Build a dashboard to visualize the current state of a given campaign and to inform marketing budget reallocation decisions.

Step 1: Configure the Environment

In this step, we will:

  1. Import libraries
  2. Run the utils notebook to gain access to the function get_params
  3. Run get_params and store the returned values in variables
  4. Set the current database so that it doesn't need to be manually specified each time it's used
Step 1.1: Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.4)
 
Step 1.2: Run the utils notebook to gain access to the function get_params
  • %run is a magic command provided within Databricks that enables you to run notebooks from within other notebooks.
  • get_params is a helper function that returns a few parameters used throughout this solution accelerator. Each use of these parameters in this notebook is called out explicitly.
%run ./99_utils
Step 1.3: Run get_params and store values in variables
  • Four of the parameters returned by get_params are used in this notebook. For convenience, we will store the values of these parameters in new variables.

    • database_name: the name of the database created in notebook 02_load_data. The default value can be overridden in the notebook 99_config.
    • gold_user_journey_tbl_path: the path used in 03_load_data to write out gold-level user journey data in Delta format.
    • gold_attribution_tbl_path: the path used in 03_load_data to write out gold-level attribution data in Delta format.
    • gold_ad_spend_tbl_path: the path used in this notebook (Step 2.1) to write out gold-level ad spend data in Delta format.
params = get_params()
database_name = params['database_name']
gold_user_journey_tbl_path = params['gold_user_journey_tbl_path']
gold_attribution_tbl_path = params['gold_attribution_tbl_path']
gold_ad_spend_tbl_path = params['gold_ad_spend_tbl_path']
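For reference, the returned dictionary has roughly the shape sketched below. The keys match the variables above; the values shown are purely hypothetical placeholders (the real defaults are defined in 99_config and 99_utils).
# Hypothetical illustration of the structure returned by get_params();
# the actual names and paths are defined in 99_config / 99_utils.
params = {
    "database_name": "multi_touch_attribution",                   # placeholder name
    "gold_user_journey_tbl_path": "dbfs:/.../gold_user_journey",  # placeholder path
    "gold_attribution_tbl_path": "dbfs:/.../gold_attribution",    # placeholder path
    "gold_ad_spend_tbl_path": "dbfs:/.../gold_ad_spend",          # placeholder path
}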
Step 1.4: Set the current database so that it doesn't need to be manually specified each time it's used
  • Please note that this step is completely optional. An alternative approach is to use the syntax database_name.table_name when querying the respective tables, as sketched after the next cell.
_ = spark.sql("use {}".format(database_name))
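For example, without setting the current database, a query can qualify the table name explicitly (a minimal sketch; gold_user_journey is created earlier in this accelerator):
# Equivalent query without a preceding USE statement:
# qualify the table with the database it lives in.
df = spark.sql(f"SELECT * FROM {database_name}.gold_user_journey LIMIT 10")
display(df)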

Step 2: Create and Populate Ad Spend Table

  1. Create ad spend table
  2. Create widget for specifying the ad spend for a given campaign
  3. Populate ad spend table with synthetic spend data
  4. View campaign ad spend details
  5. Explode map into multiple rows
Step 2.1: Create ad spend table
_ = spark.sql('''
  CREATE OR REPLACE TABLE gold_ad_spend (
    campaign_id STRING, 
    total_spend_in_dollars FLOAT, 
    channel_spend MAP<STRING, FLOAT>, 
    campaign_start_date TIMESTAMP)
  USING DELTA
  LOCATION '{}'
  '''.format(gold_ad_spend_tbl_path))
Step 2.2: Create widget for specifying the ad spend for a given campaign
  • In practice, this data is typically captured directly from each marketing activation channel.
  • After running this command, you will see a widget appear at the top of the notebook. Widgets are useful for making notebooks and dashboards interactive.
  • The arguments passed to dbutils.widgets.text are the name of the widget, its default value, and a label to be displayed next to the widget.
dbutils.widgets.text("adspend", "10000", "Campaign Budget in $")
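SQL cells reference the widget's value as $adspend (as in the next step). From Python, the same value can be read with dbutils.widgets.get, which always returns a string (a minimal sketch):
# Read the widget's current value; widget values are returned as strings
adspend = float(dbutils.widgets.get("adspend"))
print(f"Campaign budget: ${adspend:,.2f}")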
Step 2.3: Populate ad spend table with synthetic ad spend data
  • Note that the widget's value is referenced in the SQL below using $adspend.
  • For illustration purposes, we have distributed spend uniformly across all five marketing channels (20% each).
%sql
INSERT INTO TABLE gold_ad_spend
VALUES ("3d65f7e92e81480cac52a20dfdf64d5b", int($adspend),
          MAP('Social Network', .2,
              'Search Engine Marketing', .2,  
              'Google Display Network', .2, 
              'Affiliates', .2, 
              'Email', .2), 
         make_timestamp(2020, 5, 17, 0, 0, 0));
Step 2.4: View campaign ad spend details
  • The channel spend data is currently stored as a map. We will explode these values into separate rows in the next step.
%sql
SELECT * FROM gold_ad_spend
 
| campaign_id                      | total_spend_in_dollars | channel_spend                                                                                                            | campaign_start_date          |
|----------------------------------|------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------|
| 3d65f7e92e81480cac52a20dfdf64d5b | 10000                  | {"Search Engine Marketing": 0.2, "Email": 0.2, "Social Network": 0.2, "Affiliates": 0.2, "Google Display Network": 0.2} | 2020-05-17T00:00:00.000+0000 |

Step 2.5: Explode map into multiple rows
ad_spend_df = spark.sql('''
  SELECT explode(channel_spend) AS (channel, pct_spend),
         round(total_spend_in_dollars * pct_spend, 2) AS dollar_spend
  FROM gold_ad_spend''')
 
ad_spend_df.createOrReplaceTempView("exploded_gold_ad_spend")
display(ad_spend_df)
 
| channel                 | pct_spend | dollar_spend |
|-------------------------|-----------|--------------|
| Social Network          | 0.2       | 2000         |
| Search Engine Marketing | 0.2       | 2000         |
| Google Display Network  | 0.2       | 2000         |
| Affiliates              | 0.2       | 2000         |
| Email                   | 0.2       | 2000         |
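The same exploded view can also be produced with the DataFrame API instead of SQL (a minimal sketch, equivalent to the query above):
from pyspark.sql import functions as F

# Explode the channel_spend map into one (channel, pct_spend) row per entry,
# then compute the dollar amount allocated to each channel
ad_spend_df = (
    spark.table("gold_ad_spend")
    .select(F.explode("channel_spend").alias("channel", "pct_spend"), "total_spend_in_dollars")
    .withColumn("dollar_spend", F.round(F.col("total_spend_in_dollars") * F.col("pct_spend"), 2))
    .drop("total_spend_in_dollars")
)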

Step 3: View Campaign Performance

In this section, we will create the following charts:

  1. Base conversion rate
  2. Conversions by date
  3. Attribution by model type
  4. Cost per acquisition
Step 3.1: Base conversion rate
%sql
CREATE OR REPLACE TABLE base_conversion_rate
USING DELTA AS
SELECT count(*) as count,
  CASE 
    WHEN conversion = 0
    THEN 'Impression'
    ELSE 'Conversion'
  END AS interaction_type
FROM
  gold_user_journey
GROUP BY
  conversion;
base_conversion_rate_pd = spark.table("base_conversion_rate").toPandas()
 
fig, ax = plt.subplots(figsize=[20, 9])
labels = base_conversion_rate_pd['interaction_type']
plt.pie(x=base_conversion_rate_pd['count'], autopct="%.1f%%", explode=[0.05]*2, labels=labels, pctdistance=0.5)
plt.title("Base Conversion Rate");
Step 3.2: Conversions by date
%sql
CREATE OR REPLACE TABLE conversions_by_date 
USING DELTA AS
SELECT count(*) AS count,
  'Conversion' AS interaction_type,
  date(time) AS date
FROM bronze
WHERE conversion = 1
GROUP BY date
ORDER BY date;
conversions_by_date_pd = spark.table("conversions_by_date").toPandas()
 
plt.figure(figsize=(20,9))
pt = sns.lineplot(x='date',y='count',data=conversions_by_date_pd)
 
pt.tick_params(labelsize=20)
pt.set_xlabel('Date')
pt.set_ylabel('Number of Conversions')
plt.title("Conversions by Date");
Step 3.3: Attribution by model type
%sql
CREATE OR REPLACE TABLE attribution_by_model_type 
USING DELTA AS
SELECT attribution_model, channel, round(attribution_percent * (
    SELECT count(*) FROM gold_user_journey WHERE conversion = 1)) AS conversions_attributed
FROM gold_attribution;
attribution_by_model_type_pd = spark.table("attribution_by_model_type").toPandas()
 
pt = sns.catplot(x='channel',y='conversions_attributed',hue='attribution_model',data=attribution_by_model_type_pd, kind='bar', aspect=4, legend=True)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
 
plt.tick_params(labelsize=15)
plt.ylabel("Number of Conversions")
plt.xlabel("Channels")
plt.title("Channel Performance");
Step 3.4: Cost per acquisition
%sql 
CREATE OR REPLACE TABLE cpa_summary 
USING DELTA
AS
SELECT
  spending.channel,
  spending.dollar_spend,
  attribution_count.attribution_model,
  attribution_count.conversions_attributed,
  round(spending.dollar_spend / attribution_count.conversions_attributed,2) AS CPA_in_Dollars
FROM
  (SELECT explode(channel_spend) AS (channel, spend),
   round(total_spend_in_dollars * spend, 2) AS dollar_spend
   FROM gold_ad_spend) AS spending
JOIN
  (SELECT attribution_model, channel, round(attribution_percent * (
      SELECT count(*) FROM gold_user_journey WHERE conversion = 1)) AS conversions_attributed
   FROM gold_attribution) AS attribution_count
ON spending.channel = attribution_count.channel;
cpa_summary_pd = spark.table("cpa_summary").toPandas()
 
pt = sns.catplot(x='channel', y='CPA_in_Dollars', hue='attribution_model', data=cpa_summary_pd, kind='bar', aspect=4, ci=None)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
 
plt.tick_params(labelsize=15)
plt.ylabel("CPA in $")
plt.xlabel("Channels")
plt.title("Channel Cost per Acquisition");

Step 4: Budget Allocation Optimization

Now that we have assigned credit to our marketing channels using Markov Chains, we can take a data-driven approach for budget allocation.

  • One KPI we can take a look at is Return on Ad Spend (ROAS).
  • In the ecommerce world, ROAS is calculated as:
    ROAS = Revenue ($) from marketing / Advertising ($) spent

In our example, instead of working with exact dollar values, we will divide the % of conversions attributed to a channel by the % of the total ad spend allocated to that channel.

  • ROAS = CHANNEL CONVERSION WEIGHT / CHANNEL BUDGET WEIGHT
    • A ROAS value > 1 signifies that the channel has been allocated less budget than warranted by its conversion rate.
    • A ROAS value < 1 signifies that the channel has been allocated more budget than warranted by its conversion rate.
    • A ROAS value = 1 signifies an optimized budget allocation.

From ROAS, we can calculate the Proposed Budget for each channel

  • Proposed budget = Current budget × ROAS (see the worked example below)
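Using numbers that appear later in this notebook (under the markov_chain model, Affiliates receives 37% of the attribution credit but only 20% of the $10,000 budget), the arithmetic works out as follows:
# Worked example for a single channel (Affiliates, markov_chain model)
channel_conversion_weight = 0.37  # attribution_percent for Affiliates
channel_budget_weight = 0.20      # pct_spend for Affiliates
current_budget = 2000.0           # dollar_spend for Affiliates

roas = channel_conversion_weight / channel_budget_weight  # 1.85 -> channel is under-funded
proposed_budget = current_budget * roas                   # 2000 * 1.85 = $3,700.00
print(f"ROAS: {roas:.2f}, proposed budget: ${proposed_budget:,.2f}")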

To calculate ROAS we will join the following Delta Tables:

  • gold_attribution: This table contains the calculated attribution % per channel based on different attribution models.
  • exploded_gold_ad_spend: This table contains the current budget allocated per channel. The column pct_spend documents the % of the total budget that has been allocated to a given channel.
%sql
SELECT * FROM exploded_gold_ad_spend;
 
| channel                 | pct_spend | dollar_spend |
|-------------------------|-----------|--------------|
| Social Network          | 0.2       | 2000         |
| Search Engine Marketing | 0.2       | 2000         |
| Google Display Network  | 0.2       | 2000         |
| Affiliates              | 0.2       | 2000         |
| Email                   | 0.2       | 2000         |

%sql
SELECT * FROM gold_attribution;
 
| attribution_model | channel                 | attribution_percent |
|-------------------|-------------------------|---------------------|
| first_touch       | Affiliates              | 0.31                |
| first_touch       | Email                   | 0.05                |
| first_touch       | Google Display Network  | 0.14                |
| first_touch       | Search Engine Marketing | 0.29                |
| first_touch       | Social Network          | 0.21                |
| last_touch        | Affiliates              | 0.39                |
| last_touch        | Email                   | 0.01                |
| last_touch        | Google Display Network  | 0.1                 |
| last_touch        | Search Engine Marketing | 0.2                 |
| last_touch        | Social Network          | 0.3                 |
| markov_chain      | Search Engine Marketing | 0.22                |
| markov_chain      | Google Display Network  | 0.12                |
| markov_chain      | Social Network          | 0.26                |
| markov_chain      | Affiliates              | 0.37                |
| markov_chain      | Email                   | 0.02                |
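Before materializing the Delta tables below, the join and the ROAS calculation can be sanity-checked in pandas using the two tables shown above (a minimal sketch):
# Sanity check: join spend and markov_chain attribution in pandas
spend_pd = spark.table("exploded_gold_ad_spend").toPandas()
attr_pd = spark.table("gold_attribution").toPandas()

markov_pd = attr_pd[attr_pd["attribution_model"] == "markov_chain"]
merged = spend_pd.merge(markov_pd, on="channel")
merged["ROAS"] = merged["attribution_percent"] / merged["pct_spend"]
merged["proposed_dollar_spend"] = (merged["dollar_spend"] * merged["ROAS"]).round(2)
display(merged[["channel", "pct_spend", "attribution_percent", "ROAS", "proposed_dollar_spend"]])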

%sql
CREATE OR REPLACE TABLE spend_optimization_view 
USING DELTA
AS
SELECT
  a.channel,
  a.pct_spend,
  b.attribution_percent,
  b.attribution_percent / a.pct_spend as ROAS,
  a.dollar_spend,
  round(
    (b.attribution_percent / a.pct_spend) * a.dollar_spend,
    2
  ) as proposed_dollar_spend
FROM
  exploded_gold_ad_spend a
  JOIN gold_attribution b on a.channel = b.channel
  and attribution_model = 'markov_chain';
  
CREATE OR REPLACE TABLE spend_optimization_final
USING DELTA AS
SELECT
  channel,
  'current_spending' AS spending,
  dollar_spend as budget
FROM exploded_gold_ad_spend
UNION
SELECT
  channel,
  'proposed_spending' AS spending,
  proposed_dollar_spend as budget
FROM
  spend_optimization_view;  
spend_optimization_final_pd = spark.table("spend_optimization_final").toPandas()
 
pt = sns.catplot(x='channel', y='budget', hue='spending', data=spend_optimization_final_pd, kind='bar', aspect=4, ci=None)
 
plt.tick_params(labelsize=15)
pt.fig.set_figwidth(20)
pt.fig.set_figheight(9)
plt.title("Spend Optimization per Channel")
plt.ylabel("Budget in $")
plt.xlabel("Channels");

Copyright Databricks, Inc. [2021]. The source in this notebook is provided subject to the Databricks License. All included or referenced third party libraries are subject to the licenses set forth below.

| Library Name | Library License                          | Library License URL                                      | Library Source URL                       |
|--------------|------------------------------------------|----------------------------------------------------------|------------------------------------------|
| Matplotlib   | Python Software Foundation (PSF) License | https://matplotlib.org/stable/users/license.html         | https://github.com/matplotlib/matplotlib |
| Numpy        | BSD-3-Clause License                     | https://github.com/numpy/numpy/blob/master/LICENSE.txt   | https://github.com/numpy/numpy           |
| Pandas       | BSD 3-Clause License                     | https://github.com/pandas-dev/pandas/blob/master/LICENSE | https://github.com/pandas-dev/pandas     |
| Python       | Python Software Foundation (PSF)         | https://github.com/python/cpython/blob/master/LICENSE    | https://github.com/python/cpython        |
| Seaborn      | BSD-3-Clause License                     | https://github.com/mwaskom/seaborn/blob/master/LICENSE   | https://github.com/mwaskom/seaborn       |
| Spark        | Apache-2.0 License                       | https://github.com/apache/spark/blob/master/LICENSE      | https://github.com/apache/spark          |