Creative Gaming: Uplift Modeling

Author

Hanze Zou

Published

October 28, 2025

Case Overview – Creative Gaming: Uplift Modeling

The Creative Gaming: Space Pirates case explores the marketing challenges faced by Creative Gaming, a mobile game company whose hit title, Space Pirates, achieved rapid early success. To maintain player engagement and drive revenue, the company launched a paid campaign expansion called Zalon, priced at $14.99. However, only 5.75% of active users purchased the campaign, raising concerns about pricing and marketing effectiveness. CMO Mi Haruki tasked the analytics team with developing a data-driven targeting strategy to increase campaign adoption without changing the price.

Building on previous “propensity-to-buy” work, the new assignment focuses on uplift modeling—an approach that estimates the incremental impact of marketing interventions by distinguishing between users who buy because of an ad and those who would have purchased anyway. Using experimental data from 60,000 players—half exposed to ads (cg_ad_random) and half in the control group (cg_organic_control)—the objective is to predict which users show the highest causal uplift in conversion probability.

The project involves training multiple machine learning models—logistic regression, neural networks, random forests, and XGBoost—to compare uplift performance with traditional propensity models. The analysis will also determine the optimal share of customers to target in order to maximize incremental profit, not just response rate.

Ultimately, this case highlights how uplift modeling enables Creative Gaming to allocate marketing resources efficiently, improving ROI by focusing only on users whose behavior can be positively influenced by advertising.

Data

Description

import pandas as pd
import pyrsm as rsm

rsm.md("cg_ad_treatment_description.md")

Game telemetry dataset used for the Creative Gaming: Propensity-to-Buy Modeling case

Feature descriptions

  • converted: Purchased the Zalon campaign (“yes” or “no”)
  • GameLevel: Highest level of game achieved by the user
  • NumGameDays: Number of days user played the game in last month (with or without network connection)
  • NumGameDays4Plus: Number of days user played the game in last month with 4 or more total users (this implies using a network connection)
  • NumInGameMessagesSent: Number of in-game messages sent to friends
  • NumFriends: Number of friends to which the user is connected (necessary to crew together in multiplayer mode)
  • NumFriendRequestIgnored: Number of friend requests this user has not replied to since game inception
  • NumSpaceHeroBadges: Number of “Space Hero” badges, the highest distinction for gameplay in Space Pirates
  • AcquiredSpaceship: Flag if the user owns a spaceship, i.e., does not have to crew on another user’s or NPC’s space ship (“no” or “yes”)
  • AcquiredIonWeapon: Flag if the user owns the powerful “ion weapon” (“no” or “yes”)
  • TimesLostSpaceship: The number of times the user destroyed his/her spaceship during gameplay. Spaceships need to be re-acquired if destroyed.
  • TimesKilled: Number of times the user was killed during gameplay
  • TimesCaptain: Number of times in last month that the user played in the role of a captain
  • TimesNavigator: Number of times in last month that the user played in the role of a navigator
  • PurchasedCoinPackSmall: Flag if the user purchased a small pack of Zathium in last month (“no” or “yes”)
  • PurchasedCoinPackLarge: Flag if the user purchased a large pack of Zathium in last month (“no” or “yes”)
  • NumAdsClicked: Number of in-app ads the user has clicked on
  • DaysUser: Number of days since user established a user ID with Creative Gaming (for Space Pirates or previous games)
  • UserConsole: Flag if the user plays Creative Gaming games on a console (“no” or “yes”)
  • UserHasOldOS: Flag if the user has iOS version 10 or earlier (“no” or “yes”)
  • rnd_30k: Dummy variable that randomly selects 30K customers (1) and the remaining 90K (0)

Data Preprocessing for Uplift Modeling

Before building the uplift model, several preprocessing steps were performed to construct a dataset suitable for causal inference.
The goal was to create a unified structure that allows direct comparison between users exposed to the advertisement (treatment group) and those who were not (control group).


1. Labeling Treatment and Control Groups

A binary variable ad was added to both datasets:

# Add a variable “ad” to cg_ad_random and set its value to 1 for all rows
cg_ad_random["ad"] = 1
cg_ad_random.head()

# Add a variable “ad” to cg_organic_control and set its value to 0 for all rows
cg_organic_control["ad"] = 0
cg_organic_control.head()
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon ... TimesKilled TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS ad
0 no 7 18 0 124 0 81 0 yes no ... 0 0 4 no yes 3 2101 no no 0
1 no 10 3 2 60 0 18 479 no no ... 7 0 0 yes no 7 1644 yes no 0
2 no 2 1 0 0 0 0 0 no no ... 0 0 2 no no 8 3197 yes yes 0
3 no 2 11 1 125 0 73 217 no no ... 0 0 0 yes no 6 913 no no 0
4 no 8 15 0 0 0 6 51 yes no ... 0 2 1 yes no 21 2009 yes no 0

5 rows × 21 columns

2. Combining the Datasets

Both groups were stacked vertically into one dataset:

# Create a stacked dataset for the uplift analysis by combining cg_organic_control(Group 1) and cg_ad_random (Group 2). Use cg_rct_stacked as the name for the stacked dataset.
cg_rct_stacked = pd.concat([cg_organic_control, cg_ad_random], axis=0).reset_index(drop=True)
cg_rct_stacked.head()
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon ... TimesKilled TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS ad
0 no 7 18 0 124 0 81 0 yes no ... 0 0 4 no yes 3 2101 no no 0
1 no 10 3 2 60 0 18 479 no no ... 7 0 0 yes no 7 1644 yes no 0
2 no 2 1 0 0 0 0 0 no no ... 0 0 2 no no 8 3197 yes yes 0
3 no 2 11 1 125 0 73 217 no no ... 0 0 0 yes no 6 913 no no 0
4 no 8 15 0 0 0 6 51 yes no ... 0 2 1 yes no 21 2009 yes no 0

5 rows × 21 columns

This creates a single modeling framework where each observation contains:

  • the treatment indicator ad
  • the target variable converted
  • user-level telemetry features (e.g., game activity, purchases, messages, etc.)

3. Creating Training and Test Splits

A new variable training was generated using stratified sampling by both converted and ad:

# Create a training variable (70% training and 30% test). Use 1234 as the seed. Use “converted” and “ad” as the blocking variables, in that order.
cg_rct_stacked["training"] = rsm.model.make_train(
    data = cg_rct_stacked, 
    test_size = 0.3, 
    strat_var=["converted", "ad"], 
    random_state=1234
)
cg_rct_stacked.head()
converted GameLevel NumGameDays NumGameDays4Plus NumInGameMessagesSent NumSpaceHeroBadges NumFriendRequestIgnored NumFriends AcquiredSpaceship AcquiredIonWeapon ... TimesCaptain TimesNavigator PurchasedCoinPackSmall PurchasedCoinPackLarge NumAdsClicked DaysUser UserConsole UserHasOldOS ad training
0 no 7 18 0 124 0 81 0 yes no ... 0 4 no yes 3 2101 no no 0 1.0
1 no 10 3 2 60 0 18 479 no no ... 0 0 yes no 7 1644 yes no 0 1.0
2 no 2 1 0 0 0 0 0 no no ... 0 2 no no 8 3197 yes yes 0 1.0
3 no 2 11 1 125 0 73 217 no no ... 0 0 yes no 6 913 no no 0 0.0
4 no 8 15 0 0 0 6 51 yes no ... 2 1 yes no 21 2009 yes no 0 1.0

5 rows × 22 columns

Stratification ensures that treatment/control and conversion distributions remain balanced between training and test sets, preventing sampling bias.
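The blocking idea can be illustrated without pyrsm. The following is a minimal sketch on made-up data, assuming a plain pandas approach (sampling 70% of rows within each converted × ad cell); it is not the implementation behind rsm.model.make_train, and the toy DataFrame and its conversion rate are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 1,000 control rows (ad = 0) and 1,000 treated rows (ad = 1)
rng = np.random.default_rng(1234)
n = 2000
toy = pd.DataFrame(
    {
        "ad": np.repeat([0, 1], n // 2),
        "converted": rng.choice(["yes", "no"], size=n, p=[0.1, 0.9]),
    }
)

# Draw 70% of the rows inside every (converted, ad) cell for training
train_idx = (
    toy.groupby(["converted", "ad"])
    .sample(frac=0.7, random_state=1234)
    .index
)
toy["training"] = toy.index.isin(train_idx).astype(int)

# Same style of check as the crosstab used in this report
check = pd.crosstab(
    toy["converted"], [toy["ad"], toy["training"]], normalize="columns"
).round(2)
print(check)
```

Because the sampling happens within each cell, the share of "yes" rows should be nearly identical in the training and test columns for both values of ad.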

4. Validating Stratification

A cross-tabulation confirmed that conversion rates were balanced between the training and test sets within both groups:

# Check if the probability of yes/no is similar across the training and test sets for ad ==0 and ad == 1.
pd.crosstab(cg_rct_stacked.converted, [cg_rct_stacked.ad, cg_rct_stacked.training],normalize="columns").round(2)
ad            0           1
training    0.0   1.0   0.0   1.0
converted
yes        0.06  0.06  0.13  0.13
no         0.94  0.94  0.87  0.87

Logistic Regression Uplift Model

To estimate the causal effect of advertising on conversion, two logistic regression models were trained separately for the treatment group (ad = 1) and the control group (ad = 0).
Both models used the same set of explanatory variables derived from in-game telemetry data.
Eight variables that were statistically insignificant in preliminary tests (TimesKilled, TimesNavigator, NumInGameMessagesSent, NumFriendRequestIgnored, DaysUser, AcquiredIonWeapon, PurchasedCoinPackSmall, and UserConsole) were excluded to reduce noise and potential multicollinearity, leaving 11 of the original 19 explanatory variables.

['GameLevel',
 'NumGameDays',
 'NumGameDays4Plus',
 'NumInGameMessagesSent',
 'NumSpaceHeroBadges',
 'NumFriendRequestIgnored',
 'NumFriends',
 'AcquiredSpaceship',
 'AcquiredIonWeapon',
 'TimesLostSpaceship',
 'TimesKilled',
 'TimesCaptain',
 'TimesNavigator',
 'PurchasedCoinPackSmall',
 'PurchasedCoinPackLarge',
 'NumAdsClicked',
 'DaysUser',
 'UserConsole',
 'UserHasOldOS']
# remove insignificant variables
for col in ["TimesKilled", "TimesNavigator","NumInGameMessagesSent","NumFriendRequestIgnored","DaysUser","AcquiredIonWeapon","PurchasedCoinPackSmall","UserConsole"]:
    if col in evar:
        evar.remove(col)

evar
['GameLevel',
 'NumGameDays',
 'NumGameDays4Plus',
 'NumSpaceHeroBadges',
 'NumFriends',
 'AcquiredSpaceship',
 'TimesLostSpaceship',
 'TimesCaptain',
 'PurchasedCoinPackLarge',
 'NumAdsClicked',
 'UserHasOldOS']

Treatment Model:

clf_treatment = rsm.model.logistic(
    data = cg_rct_stacked.query("training == 1 & ad == 1"), 
    rvar = "converted",
    lev="yes",
    evar = evar, 
)
clf_treatment.summary()
Logistic regression (GLM)
Data                 : Not provided
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumSpaceHeroBadges, NumFriends, AcquiredSpaceship, TimesLostSpaceship, TimesCaptain, PurchasedCoinPackLarge, NumAdsClicked, UserHasOldOS
Null hyp.: There is no effect of x on converted
Alt. hyp.: There is an effect of x on converted

                                OR     OR%  coefficient  std.error  z.value p.value     
Intercept                    0.027  -97.3%       -3.597      0.073  -49.297  < .001  ***
AcquiredSpaceship[yes]       1.087    8.7%        0.084      0.049    1.723   0.085    .
PurchasedCoinPackLarge[yes]  1.210   21.0%        0.191      0.048    3.949  < .001  ***
UserHasOldOS[yes]            0.799  -20.1%       -0.224      0.081   -2.756   0.006   **
GameLevel                    1.059    5.9%        0.058      0.009    6.410  < .001  ***
NumGameDays                  1.015    1.5%        0.015      0.004    4.195  < .001  ***
NumGameDays4Plus             1.011    1.1%        0.011      0.006    1.715   0.086    .
NumSpaceHeroBadges           1.028    2.8%        0.027      0.009    2.978   0.003   **
NumFriends                   1.002    0.2%        0.002      0.000    9.638  < .001  ***
TimesLostSpaceship           0.993   -0.7%       -0.007      0.002   -3.159   0.002   **
TimesCaptain                 1.005    0.5%        0.005      0.002    2.111   0.035    *
NumAdsClicked                1.094    9.4%        0.089      0.003   33.177  < .001  ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared (McFadden): 0.096
Pseudo R-squared (McFadden adjusted): 0.095
Area under the RO Curve (AUC): 0.711
Log-likelihood: -7348.152, AIC: 14720.304, BIC: 14815.732
Chi-squared: 1566.121, df(11), p.value < 0.001 
Nr obs: 21,000

Control Model:

clf_control = rsm.model.logistic(
    data = cg_rct_stacked.query("training == 1 & ad == 0"), 
    rvar = "converted",
    lev="yes",
    evar = evar, 
)
clf_control.summary()
Logistic regression (GLM)
Data                 : Not provided
Response variable    : converted
Level                : yes
Explanatory variables: GameLevel, NumGameDays, NumGameDays4Plus, NumSpaceHeroBadges, NumFriends, AcquiredSpaceship, TimesLostSpaceship, TimesCaptain, PurchasedCoinPackLarge, NumAdsClicked, UserHasOldOS
Null hyp.: There is no effect of x on converted
Alt. hyp.: There is an effect of x on converted

                                OR     OR%  coefficient  std.error  z.value p.value     
Intercept                    0.008  -99.2%       -4.797      0.118  -40.594  < .001  ***
AcquiredSpaceship[yes]       1.500   50.0%        0.406      0.071    5.698  < .001  ***
PurchasedCoinPackLarge[yes]  1.408   40.8%        0.342      0.073    4.697  < .001  ***
UserHasOldOS[yes]            0.821  -17.9%       -0.198      0.124   -1.597    0.11     
GameLevel                    1.103   10.3%        0.098      0.014    6.936  < .001  ***
NumGameDays                  1.032    3.2%        0.031      0.005    5.812  < .001  ***
NumGameDays4Plus             1.042    4.2%        0.041      0.008    5.045  < .001  ***
NumSpaceHeroBadges           1.473   47.3%        0.387      0.012   32.428  < .001  ***
NumFriends                   1.001    0.1%        0.001      0.000    3.845  < .001  ***
TimesLostSpaceship           0.942   -5.8%       -0.060      0.006  -10.130  < .001  ***
TimesCaptain                 0.994   -0.6%       -0.006      0.003   -1.742   0.081    .
NumAdsClicked                1.029    2.9%        0.029      0.004    7.675  < .001  ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared (McFadden): 0.192
Pseudo R-squared (McFadden adjusted): 0.189
Area under the RO Curve (AUC): 0.822
Log-likelihood: -3705.11, AIC: 7434.221, BIC: 7529.648
Chi-squared: 1755.473, df(11), p.value < 0.001 
Nr obs: 21,000

The treatment model achieved a McFadden pseudo-R² of approximately 0.096 with an AUC of 0.71, while the control model fit noticeably better (pseudo-R² ≈ 0.19, AUC ≈ 0.82).
In both models, GameLevel, NumGameDays, NumSpaceHeroBadges, and NumAdsClicked have positive, statistically significant coefficients, although the magnitudes differ: the odds ratio for NumSpaceHeroBadges is 1.473 in the control model but only 1.028 in the treatment model, while NumAdsClicked is stronger in the treatment model (1.094 versus 1.029).
These results indicate that highly engaged players (those with frequent gameplay and in-app activity) are more likely to convert regardless of ad exposure.
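As a reading aid for the two summaries, the OR and OR% columns follow from the coefficient column: the odds ratio is the exponential of the coefficient, and OR% is the odds ratio minus one, expressed as a percentage. The sketch below uses the rounded coefficients printed above, so results can differ from the reported odds ratios in the last digit.

```python
import math

# Rounded coefficients copied from the regression summaries above
coefficients = {
    "NumAdsClicked (treatment)": 0.089,
    "NumAdsClicked (control)": 0.029,
    "NumSpaceHeroBadges (treatment)": 0.027,
    "NumSpaceHeroBadges (control)": 0.387,
}


def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the explanatory variable."""
    return math.exp(coef)


for name, coef in coefficients.items():
    ratio = odds_ratio(coef)
    print(f"{name}: OR = {ratio:.3f}, OR% = {(ratio - 1) * 100:.1f}%")
```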

Predicted probabilities from both regressions were combined to compute an uplift score, defined as:

\[ \text{uplift}_i = P(Y=1 \mid \text{ad}=1, X_i) - P(Y=1 \mid \text{ad}=0, X_i) \]

This value represents the incremental likelihood of purchase attributable to advertising.
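To make the definition concrete, the sketch below plugs the rounded coefficients from the two summaries into the logistic function for two hypothetical users. The user profiles are invented for illustration, and any feature not listed is assumed to be zero (or "no" for the flags), so the numbers are only indicative of how the score behaves.

```python
import math

# Rounded coefficients copied from the treatment and control summaries above
TREATMENT = {
    "Intercept": -3.597, "AcquiredSpaceship[yes]": 0.084,
    "PurchasedCoinPackLarge[yes]": 0.191, "UserHasOldOS[yes]": -0.224,
    "GameLevel": 0.058, "NumGameDays": 0.015, "NumGameDays4Plus": 0.011,
    "NumSpaceHeroBadges": 0.027, "NumFriends": 0.002,
    "TimesLostSpaceship": -0.007, "TimesCaptain": 0.005, "NumAdsClicked": 0.089,
}
CONTROL = {
    "Intercept": -4.797, "AcquiredSpaceship[yes]": 0.406,
    "PurchasedCoinPackLarge[yes]": 0.342, "UserHasOldOS[yes]": -0.198,
    "GameLevel": 0.098, "NumGameDays": 0.031, "NumGameDays4Plus": 0.041,
    "NumSpaceHeroBadges": 0.387, "NumFriends": 0.001,
    "TimesLostSpaceship": -0.060, "TimesCaptain": -0.006, "NumAdsClicked": 0.029,
}


def predict(coef, user):
    """Predicted purchase probability from a logistic model."""
    logit = coef["Intercept"] + sum(coef[k] * v for k, v in user.items())
    return 1 / (1 + math.exp(-logit))


def uplift(user):
    """P(buy | ad) minus P(buy | no ad) for one user."""
    return predict(TREATMENT, user) - predict(CONTROL, user)


# Hypothetical users; unlisted features are treated as 0
user_a = {"GameLevel": 5, "NumGameDays": 10, "NumFriends": 50, "NumAdsClicked": 10}
user_b = {**user_a, "NumSpaceHeroBadges": 5}

for label, user in (("user_a", user_a), ("user_b", user_b)):
    print(label, round(predict(TREATMENT, user), 3),
          round(predict(CONTROL, user), 3), round(uplift(user), 3))
```

The second profile shows how a user with a high organic purchase probability (many Space Hero badges) can receive a negative uplift score even though the treatment model still predicts a relatively high purchase probability.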

pred_store = cg_rct_stacked.copy()
pred_store["clf_pred_treatment"] = clf_treatment.predict(cg_rct_stacked)['prediction']
pred_store["clf_pred_control"] = clf_control.predict(cg_rct_stacked)['prediction']
pred_store["clf_uplift_score"] = pred_store.clf_pred_treatment - pred_store.clf_pred_control

Using the uplift_tab function with 20 quantile bins (5% of users each), the resulting uplift table and incremental uplift plot show that the model separates segments that respond positively to ads from those with neutral or negative responses.
The top-scored bins produce the largest gains: estimated uplift is 0.32 in the top 5% and stays above 0.17 through the top 20%, while the bottom two bins show negative uplift. Logistic regression therefore provides a meaningful baseline for uplift-based targeting before testing more complex models such as Random Forest or XGBoost.

clf_tab = rsm.uplift_tab(pred_store.query("training == 0"), "converted", "yes", "clf_uplift_score", "ad", 1, qnt = 20)
clf_tab
pred bins cum_prop T_resp T_n C_resp C_n incremental_resp inc_uplift uplift
0 clf_uplift_score 1 0.05 194 450 67 618 145.213592 1.613484 0.322697
1 clf_uplift_score 2 0.10 318 900 96 1170 244.153846 2.712821 0.223019
2 clf_uplift_score 3 0.15 433 1350 125 1710 334.315789 3.714620 0.201852
3 clf_uplift_score 4 0.20 541 1800 156 2161 411.060157 4.567335 0.171264
4 clf_uplift_score 5 0.25 599 2250 172 2642 452.520061 5.028001 0.095625
5 clf_uplift_score 6 0.30 647 2700 181 3084 488.536965 5.428188 0.086305
6 clf_uplift_score 7 0.35 689 3150 199 3621 515.884838 5.732054 0.059814
7 clf_uplift_score 8 0.40 730 3600 211 4108 545.092502 6.056583 0.066470
8 clf_uplift_score 9 0.45 754 4049 232 4572 548.538933 6.094877 0.008193
9 clf_uplift_score 10 0.50 791 4500 244 5074 574.602680 6.384474 0.058136
10 clf_uplift_score 11 0.55 830 4950 254 5525 602.434389 6.693715 0.064494
11 clf_uplift_score 12 0.60 861 5400 273 6012 615.790419 6.842116 0.029875
12 clf_uplift_score 13 0.65 886 5850 282 6458 630.549396 7.006104 0.035376
13 clf_uplift_score 14 0.70 928 6300 295 6886 658.104560 7.312273 0.062960
14 clf_uplift_score 15 0.75 962 6750 310 7350 677.306122 7.525624 0.043228
15 clf_uplift_score 16 0.80 1007 7200 323 7760 707.309278 7.858992 0.068293
16 clf_uplift_score 17 0.85 1041 7650 334 8232 730.613703 8.117930 0.052250
17 clf_uplift_score 18 0.90 1079 8100 359 8624 741.813080 8.242368 0.020669
18 clf_uplift_score 19 0.95 1131 8550 429 8828 715.509515 7.950106 -0.227582
19 clf_uplift_score 20 1.00 1174 9000 512 9000 662.000000 7.355556 -0.387003
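The derived columns in this table can be reproduced from the cumulative counts. The sketch below uses formulas inferred from the printed values (they match the first four rows to six decimals); it is not taken from the pyrsm source, so treat it as a reading aid rather than the library's implementation.

```python
# Cumulative counts copied from the first four rows of the uplift table above
rows = [
    {"T_resp": 194, "T_n": 450, "C_resp": 67, "C_n": 618},
    {"T_resp": 318, "T_n": 900, "C_resp": 96, "C_n": 1170},
    {"T_resp": 433, "T_n": 1350, "C_resp": 125, "C_n": 1710},
    {"T_resp": 541, "T_n": 1800, "C_resp": 156, "C_n": 2161},
]
TOTAL_TREATED = 9000  # treated users in the test set (T_n in the last row)

results = []
prev = {"T_resp": 0, "T_n": 0, "C_resp": 0, "C_n": 0}
for row in rows:
    # Treated responders minus control responders rescaled to the treated group size
    incremental_resp = row["T_resp"] - row["C_resp"] * row["T_n"] / row["C_n"]
    # Incremental responders as a percentage of all treated users
    inc_uplift = incremental_resp / TOTAL_TREATED * 100
    # Per-bin uplift: response rate difference within this bin only
    t_rate = (row["T_resp"] - prev["T_resp"]) / (row["T_n"] - prev["T_n"])
    c_rate = (row["C_resp"] - prev["C_resp"]) / (row["C_n"] - prev["C_n"])
    results.append(
        {
            "incremental_resp": incremental_resp,
            "inc_uplift": inc_uplift,
            "uplift": t_rate - c_rate,
        }
    )
    prev = row

for r in results:
    print({k: round(v, 6) for k, v in r.items()})
```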

Model Evaluation and Comparison

Uplift Model Performance

The incremental uplift curve demonstrates that the logistic regression uplift model effectively distinguishes between customers who are positively influenced by advertising and those who are not.
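Incremental gains accumulate fastest in the top 20% of users, where per-bin uplift ranges from 0.17 to 0.32.
From the fifth bin onward, per-bin uplift drops below 0.10 and the curve flattens, indicating diminishing marginal returns from additional targeting; cumulative incremental responses peak at about 742 when 90% of users are targeted and decline over the last two bins.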

The uplift distribution plot provides a complementary view: the highest deciles exhibit strong positive uplift, while the lowest segments show negative uplift values.
This implies that targeting low-scoring users could even reduce conversions—highlighting the importance of selective marketing.


Comparison with Propensity Model

When comparing the uplift and propensity-based approaches, the uplift model (clf_uplift_score) consistently outperforms the propensity model (clf_pred_treatment) across nearly all quantile bins.
As seen in the incremental uplift plot, the uplift curve lies above the propensity curve, achieving higher incremental gains for the same proportion of targeted customers.

This shows that the uplift model better identifies causally responsive segments, rather than simply those with high baseline purchase probabilities.

The uplift bar chart further confirms this pattern: although both models capture high-value customers, the uplift model more effectively isolates the truly persuadable users who convert because of the ad.
This distinction is key for efficient campaign allocation.
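The difference between the two rankings can be seen in a deliberately simple, fully deterministic toy population. Everything here is hypothetical: three made-up segments ("sure things", "persuadables", "lost causes"), 100 treated and 100 control users each, and a helper incremental_resp that mimics the table's incremental response calculation for a given targeting share.

```python
import pandas as pd


def make_segment(name, yes_treated, yes_control, n=100):
    """Build n treated and n control rows with fixed numbers of converters."""
    rows = []
    for ad, n_yes in ((1, yes_treated), (0, yes_control)):
        for i in range(n):
            rows.append(
                {
                    "segment": name,
                    "ad": ad,
                    "converted": int(i < n_yes),
                    # Scores a perfectly calibrated model would assign
                    "propensity": yes_treated / n,
                    "uplift": yes_treated / n - yes_control / n,
                }
            )
    return rows


toy = pd.DataFrame(
    make_segment("sure_thing", 50, 50)
    + make_segment("persuadable", 30, 5)
    + make_segment("lost_cause", 5, 5)
)


def incremental_resp(df, score, share):
    """Incremental responders when targeting the top `share` of users by `score`."""
    treated = df[df["ad"] == 1]
    control = df[df["ad"] == 0]
    top_t = treated.nlargest(int(round(share * len(treated))), score)
    top_c = control[control[score] >= top_t[score].min()]
    return top_t["converted"].sum() - top_c["converted"].sum() * len(top_t) / len(top_c)


for score in ("propensity", "uplift"):
    print(score, incremental_resp(toy, score, 1 / 3))
```

Ranking by propensity selects the "sure things", who buy at the same rate with or without the ad, so the incremental response is zero; ranking by uplift selects the "persuadables" instead.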


Profitability Analysis

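Extrapolating from the 9,000 treated users in the test set to a total audience of 120,000 customers, targeting the top 25% (30,000 users) with the uplift model yields approximately $45,443 in additional profit compared to no targeting. The calculation that follows assumes revenue of $14.99 per incremental purchase and a cost of $1.50 per targeted user.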

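# Share of the 120,000-user audience to target (30,000 users = 0.25)
perc = 30000 / 120000
# Cumulative incremental responses at that share of the test sample
clf_incremental = clf_tab[clf_tab["cum_prop"] == perc]["incremental_resp"].item()
# Scale from 9,000 treated test users to 120,000; $14.99 per purchase minus $1.50 per targeted user
clf_extra_profit = clf_incremental * (120000 / 9000) * 14.99 - 30000 * 1.5
print(f"Extra profit: ${clf_extra_profit:.2f}")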
Extra profit: $45443.68
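The same arithmetic can be repeated for every targeting share in the uplift table to see where incremental profit is largest. This is a minimal sketch under the assumptions already used above ($14.99 per incremental purchase, $1.50 per targeted user, scaling from 9,000 treated test users to 120,000); the constant and variable names are illustrative, and the incremental_resp values are copied from the uplift table.

```python
PRICE = 14.99          # revenue per incremental purchase
COST = 1.50            # cost per targeted user
AUDIENCE = 120_000     # users the campaign could reach
TEST_TREATED = 9_000   # treated users in the test set

# incremental_resp column of the uplift table, bins 1 through 20
incremental_resp = [
    145.213592, 244.153846, 334.315789, 411.060157, 452.520061,
    488.536965, 515.884838, 545.092502, 548.538933, 574.602680,
    602.434389, 615.790419, 630.549396, 658.104560, 677.306122,
    707.309278, 730.613703, 741.813080, 715.509515, 662.000000,
]

profits = {}
for i, inc in enumerate(incremental_resp, start=1):
    share = i / len(incremental_resp)  # cumulative share targeted: 0.05, 0.10, ...
    revenue = inc * (AUDIENCE / TEST_TREATED) * PRICE
    profits[share] = revenue - share * AUDIENCE * COST

best_share = max(profits, key=profits.get)
# A bin adds profit only if its uplift exceeds cost / price
break_even_uplift = COST / PRICE

print(f"Most profitable share: {best_share:.2f} (${profits[best_share]:,.2f})")
print(f"Profit at 25%: ${profits[0.25]:,.2f}")
print(f"Break-even uplift per bin: {break_even_uplift:.3f}")
```

Under these assumptions the break-even uplift is about 0.10, which only the first four bins exceed, so the sketch selects the top 20% (roughly $46,157) rather than the 25% implied by a fixed budget of 30,000 users.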

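Compared with propensity-based targeting of the same 30,000 users, the uplift model delivers an estimated $13,457.75 in additional incremental profit. Because both strategies target the same number of users, ad costs cancel out and the difference comes entirely from the extra incremental responses.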

clf_propensity_tab = rsm.uplift_tab(
    pred_store.query("training == 0"),
    "converted",
    "yes",
    "clf_pred_treatment",
    "ad",
    1,
    qnt = 20,
)
clf_propensity_tab

clf_propensity_incremental = clf_propensity_tab[clf_propensity_tab["cum_prop"] == perc]["incremental_resp"].item()
clf_extra_profit_propensity = (clf_incremental-clf_propensity_incremental) * (120000 / 9000) * 14.99

print(f"Expect to earn {clf_extra_profit_propensity:.2f} more by using an uplift model rather than a propensity model")
Expect to earn 13457.75 more by using an uplift model rather than a propensity model

These results confirm that the uplift model provides a more effective and profitable targeting framework, ensuring that advertising resources focus on customers who are most likely to respond positively.


Key Takeaways

  • Uplift modeling captures incremental ad effects rather than overall likelihoods.
  • The top-scored bins show the largest positive uplift; per-bin uplift falls below 0.10 after the top 20%, so targeting much beyond that point becomes inefficient.
  • Negative uplift in the lowest bins warns against indiscriminate ad exposure.
  • Compared to the propensity model, the uplift model increases both conversion efficiency and incremental profit.

Overall, the logistic uplift model demonstrates clear practical value by maximizing ad ROI and avoiding unnecessary spending on non-responsive users.