A Replication of Karlan and List (2007)

Author

Hanze Zou

Published

April 23, 2025

Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the American Economic Review in 2007. The article and supporting data are available from the AEA website and from Innovations for Poverty Action as part of Harvard’s Dataverse.

This study implements a large-scale natural field experiment to investigate the causal effect of matching grants on individual charitable giving behavior. Over 50,000 previous donors to a politically liberal nonprofit organization were randomly assigned to receive one of several direct mail solicitations. The treatment conditions varied along three dimensions: the match ratio (1:1, 2:1, or 3:1), the maximum amount of the matching grant ($25,000, $50,000, $100,000, or unspecified), and the suggested contribution amount (equal to, 1.25×, or 1.5× the donor’s previous highest gift). A control group received an otherwise identical letter that did not mention any matching grant. This design allows for the estimation of the isolated and joint effects of perceived “price” reductions and solicitation framing on giving behavior. The results demonstrate that the presence of a matching grant significantly increases both the likelihood of donation and total revenue per solicitation, although higher match ratios beyond 1:1 do not yield additional increases in giving.

This project seeks to replicate their results.

Data

Description

Summary Statistics

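import pandas as pd
import numpy as np
from IPython.display import display

# Load the replication data (assumes the .dta file is in the working directory)
df = pd.read_stata("karlan_list_2007.dta")

# Transposed summary statistics: one row per numeric variable
desc = df.describe().T.round(4)
display(desc)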

	count	mean	std	min	25%	50%	75%	max
treatment	50083.0	0.6668	0.4714	0.0000	0.0000	1.0000	1.0000	1.0000
control	50083.0	0.3332	0.4714	0.0000	0.0000	0.0000	1.0000	1.0000
ratio2	50083.0	0.2223	0.4158	0.0000	0.0000	0.0000	0.0000	1.0000
ratio3	50083.0	0.2222	0.4157	0.0000	0.0000	0.0000	0.0000	1.0000
size25	50083.0	0.1667	0.3727	0.0000	0.0000	0.0000	0.0000	1.0000
size50	50083.0	0.1666	0.3726	0.0000	0.0000	0.0000	0.0000	1.0000
size100	50083.0	0.1667	0.3727	0.0000	0.0000	0.0000	0.0000	1.0000
sizeno	50083.0	0.1667	0.3728	0.0000	0.0000	0.0000	0.0000	1.0000
askd1	50083.0	0.2223	0.4158	0.0000	0.0000	0.0000	0.0000	1.0000
askd2	50083.0	0.2223	0.4158	0.0000	0.0000	0.0000	0.0000	1.0000
askd3	50083.0	0.2222	0.4157	0.0000	0.0000	0.0000	0.0000	1.0000
ask1	50083.0	71.5018	101.7289	25.0000	35.0000	45.0000	65.0000	1500.0000
ask2	50083.0	91.7927	127.2526	35.0000	45.0000	60.0000	85.0000	1875.0000
ask3	50083.0	111.0463	151.6736	50.0000	55.0000	70.0000	100.0000	2250.0000
amount	50083.0	0.9157	8.7092	0.0000	0.0000	0.0000	0.0000	400.0000
gave	50083.0	0.0206	0.1422	0.0000	0.0000	0.0000	0.0000	1.0000
amountchange	50083.0	-52.6720	1267.2386	-200412.1250	-50.0000	-30.0000	-25.0000	275.0000
hpa	50083.0	59.3850	71.1773	0.0000	30.0000	45.0000	60.0000	1000.0000
ltmedmra	50083.0	0.4937	0.5000	0.0000	0.0000	0.0000	1.0000	1.0000
freq	50083.0	8.0394	11.3945	0.0000	2.0000	4.0000	10.0000	218.0000
years	50082.0	6.0975	5.5035	0.0000	2.0000	5.0000	9.0000	95.0000
year5	50083.0	0.5088	0.4999	0.0000	0.0000	1.0000	1.0000	1.0000
mrm2	50082.0	13.0073	12.0814	0.0000	4.0000	8.0000	19.0000	168.0000
dormant	50083.0	0.5235	0.4995	0.0000	0.0000	1.0000	1.0000	1.0000
female	48972.0	0.2777	0.4479	0.0000	0.0000	0.0000	1.0000	1.0000
couple	48935.0	0.0919	0.2889	0.0000	0.0000	0.0000	0.0000	1.0000
state50one	50083.0	0.0010	0.0316	0.0000	0.0000	0.0000	0.0000	1.0000
nonlit	49631.0	2.4739	1.9615	0.0000	1.0000	3.0000	4.0000	6.0000
cases	49631.0	1.4998	1.1551	0.0000	1.0000	1.0000	2.0000	4.0000
statecnt	50083.0	5.9988	5.7463	0.0020	1.8332	3.5388	9.6070	17.3688
stateresponse	50083.0	0.0206	0.0052	0.0000	0.0182	0.0197	0.0230	0.0769
stateresponset	50083.0	0.0220	0.0063	0.0000	0.0185	0.0217	0.0247	0.1111
stateresponsec	50080.0	0.0177	0.0075	0.0000	0.0129	0.0199	0.0208	0.0526
stateresponsetminc	50080.0	0.0043	0.0091	-0.0476	-0.0014	0.0018	0.0105	0.1111
perbush	50048.0	0.4879	0.0787	0.0909	0.4444	0.4848	0.5253	0.7320
close25	50048.0	0.1857	0.3889	0.0000	0.0000	0.0000	0.0000	1.0000
red0	50048.0	0.4045	0.4908	0.0000	0.0000	0.0000	1.0000	1.0000
blue0	50048.0	0.5955	0.4908	0.0000	0.0000	1.0000	1.0000	1.0000
redcty	49978.0	0.5102	0.4999	0.0000	0.0000	1.0000	1.0000	1.0000
bluecty	49978.0	0.4887	0.4999	0.0000	0.0000	0.0000	1.0000	1.0000
pwhite	48217.0	0.8196	0.1686	0.0094	0.7558	0.8728	0.9388	1.0000
pblack	48047.0	0.0867	0.1359	0.0000	0.0147	0.0366	0.0909	0.9896
page18_39	48217.0	0.3217	0.1030	0.0000	0.2583	0.3055	0.3691	0.9975
ave_hh_sz	48221.0	2.4290	0.3781	0.0000	2.2100	2.4400	2.6600	5.2700
median_hhincome	48209.0	54815.7005	22027.3167	5000.0000	39181.0000	50673.0000	66005.0000	200001.0000
powner	48214.0	0.6694	0.1934	0.0000	0.5602	0.7123	0.8168	1.0000
psch_atlstba	48215.0	0.3917	0.1866	0.0000	0.2356	0.3737	0.5300	1.0000
pop_propurban	48217.0	0.8720	0.2587	0.0000	0.8849	1.0000	1.0000	1.0000

Log Donation Amount by Giving Status

import matplotlib.pyplot as plt
import seaborn as sns
df['log_amount'] = np.log1p(df['amount'])

plt.figure(figsize=(6, 4))
sns.violinplot(x="gave", y="log_amount", data=df)
plt.title("Log Donation Amount by Giving Status")
plt.xlabel("Gave (0 = No, 1 = Yes)")
plt.ylabel("log(Amount + 1)")
plt.show()

The violin plot reveals stark differences in the distribution of log-transformed donation amounts between donors and non-donors. As expected, every non-donor (gave = 0) gave exactly $0, so that group collapses to a single point mass at log(0 + 1) = 0. In contrast, donors (gave = 1) exhibit a wide, right-skewed distribution of giving. The spread among donors reflects considerable heterogeneity, with most gifts at moderate amounts and a long upper tail. This highlights the need to account for both zero-inflation and skewness when modeling donation behavior.

Donation Decision (Raw Counts)

plt.figure(figsize=(6, 4))
sns.countplot(x='gave', data=df)
plt.title('Donation Decision')
plt.xlabel('Gave (1 = Yes, 0 = No)')
plt.ylabel('Count')
plt.show()

This count plot shows the distribution of donation decisions. The vast majority of recipients did not donate: the mean of gave is 0.0206, so only about 2% of letters produced a gift. This zero-inflated outcome must be taken into account in subsequent analyses and modeling.

Treatment Assignment

plt.figure(figsize=(6, 4))
sns.countplot(x='treatment', data=df)
plt.title('Treatment Assignment')
plt.xlabel('Treatment (1 = Treated)')
plt.ylabel('Count')
plt.show()

The bar chart visualizes the allocation of individuals into treatment and control groups. About two-thirds of the sample (roughly 33,400 recipients) were assigned to treatment and one-third (roughly 16,700) to control, reflecting the experiment's design. The groups are deliberately unequal in size, but both are large enough for precise comparisons.

Distribution of Match Ratios

plt.figure(figsize=(6, 4))
sns.countplot(x='ratio', data=df)
plt.title('Distribution of Match Ratios')
plt.xlabel('Match Ratio Type')
plt.ylabel('Count')
plt.show()

This plot displays the distribution of match ratio assignments across the sample. The control group is the largest (about one-third of the sample), while each of the three match ratios (1:1, 2:1, and 3:1) was assigned to roughly 22% of recipients, providing variation in the effective price of giving.

Variable Definitions

Variable	Description
`treatment`	Treatment
`control`	Control
`ratio`	Match ratio
`ratio2`	2:1 match ratio
`ratio3`	3:1 match ratio
`size`	Match threshold
`size25`	$25,000 match threshold
`size50`	$50,000 match threshold
`size100`	$100,000 match threshold
`sizeno`	Unstated match threshold
`ask`	Suggested donation amount
`askd1`	Suggested donation was highest previous contribution
`askd2`	Suggested donation was 1.25 x highest previous contribution
`askd3`	Suggested donation was 1.50 x highest previous contribution
`ask1`	Highest previous contribution (for suggestion)
`ask2`	1.25 x highest previous contribution (for suggestion)
`ask3`	1.50 x highest previous contribution (for suggestion)
`amount`	Dollars given
`gave`	Gave anything
`amountchange`	Change in amount given
`hpa`	Highest previous contribution
`ltmedmra`	Small prior donor: last gift was less than median $35
`freq`	Number of prior donations
`years`	Number of years since initial donation
`year5`	At least 5 years since initial donation
`mrm2`	Number of months since last donation
`dormant`	Already donated in 2005
`female`	Female
`couple`	Couple
`state50one`	State tag: 1 for one observation of each of 50 states; 0 otherwise
`nonlit`	Nonlitigation
`cases`	Court cases from state in 2004-5 in which organization was involved
`statecnt`	Percent of sample from state
`stateresponse`	Proportion of sample from the state who gave
`stateresponset`	Proportion of treated sample from the state who gave
`stateresponsec`	Proportion of control sample from the state who gave
`stateresponsetminc`	stateresponset - stateresponsec
`perbush`	State vote share for Bush
`close25`	State vote share for Bush between 47.5% and 52.5%
`red0`	Red state
`blue0`	Blue state
`redcty`	Red county
`bluecty`	Blue county
`pwhite`	Proportion white within zip code
`pblack`	Proportion black within zip code
`page18_39`	Proportion age 18-39 within zip code
`ave_hh_sz`	Average household size within zip code
`median_hhincome`	Median household income within zip code
`powner`	Proportion house owner within zip code
`psch_atlstba`	Proportion who finished college within zip code
`pop_propurban`	Proportion of population urban within zip code

Balance Test

As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.

Summary Balance Check (Mean Differences and p-values)

We report mean differences between treatment and control groups for a set of pre-treatment covariates, along with p-values from Welch two-sample t-tests (which do not assume equal variances). These tests assess whether randomization achieved balance on observable characteristics at baseline.

import pandas as pd
from scipy.stats import ttest_ind
from IPython.display import display

# Define variables to test
vars_to_test = [
    "mrm2", "hpa", "freq", "years", "dormant", "female", "couple",
    "pwhite", "pblack", "page18_39", "ave_hh_sz", "red0", "redcty"
]

# Split by treatment
df_treat = df[df["treatment"] == 1]
df_control = df[df["treatment"] == 0]

# Collect results
results = []
for var in vars_to_test:
    treat_vals = df_treat[var].dropna()
    control_vals = df_control[var].dropna()
    
    t_stat, p_val = ttest_ind(treat_vals, control_vals, equal_var=False)
    mean_diff = treat_vals.mean() - control_vals.mean()
    
    results.append({
        "Variable": var,
        "Diff": round(mean_diff, 6),
        "p_Value": round(p_val, 6)
    })

balance_df = pd.DataFrame(results)
display(balance_df)

	Variable	Diff	p_Value
0	mrm2	0.013686	0.904855
1	hpa	0.637074	0.331840
2	freq	-0.011979	0.911740
3	years	-0.057549	0.275317
4	dormant	0.000823	0.861961
5	female	-0.007547	0.079523
6	couple	-0.001617	0.560397
7	pwhite	-0.000913	0.576132
8	pblack	0.000129	0.922294
9	page18_39	-0.000124	0.901123
10	ave_hh_sz	0.003012	0.410315
11	red0	0.008727	0.060488
12	redcty	0.004289	0.365931

Regression-Based Balance Test

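To further validate the randomization, we estimate simple OLS regressions of each baseline covariate on the treatment assignment indicator. In each regression, the coefficient on the treatment dummy equals the difference in group means, and its standard error measures the sampling variability of that difference. Because these regressions use classical (equal-variance) standard errors rather than Welch's correction, their p-values differ slightly from the t-tests above (for example, 0.345 versus 0.332 for hpa). None of the estimates are statistically significant at the 5% level, supporting the success of randomization.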
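Why the regression and the t-test tell the same story can be checked with a minimal sketch on simulated data (not the experiment's data; the sample size, allocation, and covariate distribution below are illustrative assumptions). With a single 0/1 regressor, the OLS slope is exactly the difference in group means, and its classical t statistic is exactly the pooled-variance two-sample t statistic.

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulated data (illustrative only): a 0/1 treatment dummy and an unrelated covariate
rng = np.random.default_rng(0)
n = 3000
treat = (rng.random(n) < 2 / 3).astype(float)   # roughly 2:1 allocation, as in the experiment
x = rng.normal(loc=13.0, scale=12.0, size=n)     # independent of treatment by construction

# OLS of x on [1, treat] via least squares
X = np.column_stack([np.ones(n), treat])
beta, *_ = np.linalg.lstsq(X, x, rcond=None)
resid = x - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])        # classical (homoskedastic) error variance
cov = sigma2 * np.linalg.inv(X.T @ X)
t_ols = beta[1] / np.sqrt(cov[1, 1])

# Difference in means and the pooled-variance two-sample t-test
diff_means = x[treat == 1].mean() - x[treat == 0].mean()
t_pooled, p_pooled = ttest_ind(x[treat == 1], x[treat == 0], equal_var=True)

print(f"OLS slope       = {beta[1]:.6f}, t = {t_ols:.4f}")
print(f"Mean difference = {diff_means:.6f}, t = {t_pooled:.4f}")
```

The two printed lines should agree to numerical precision; a Welch test (`equal_var=False`) would give a slightly different t whenever group sizes and variances differ.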

import statsmodels.formula.api as smf
for var in vars_to_test:
    model = smf.ols(f"{var} ~ treatment", data=df).fit()
    print(f"\n{var} ~ treatment")
    print(model.summary().tables[1])


mrm2 ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     12.9981      0.094    138.979      0.000      12.815      13.181
treatment      0.0137      0.115      0.119      0.905      -0.211       0.238
==============================================================================

hpa ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     58.9602      0.551    107.005      0.000      57.880      60.040
treatment      0.6371      0.675      0.944      0.345      -0.685       1.960
==============================================================================

freq ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      8.0473      0.088     91.231      0.000       7.874       8.220
treatment     -0.0120      0.108     -0.111      0.912      -0.224       0.200
==============================================================================

years ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      6.1359      0.043    144.023      0.000       6.052       6.219
treatment     -0.0575      0.052     -1.103      0.270      -0.160       0.045
==============================================================================

dormant ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5229      0.004    135.247      0.000       0.515       0.531
treatment      0.0008      0.005      0.174      0.862      -0.008       0.010
==============================================================================

female ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2827      0.004     80.688      0.000       0.276       0.290
treatment     -0.0075      0.004     -1.758      0.079      -0.016       0.001
==============================================================================

couple ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0930      0.002     41.124      0.000       0.089       0.097
treatment     -0.0016      0.003     -0.584      0.559      -0.007       0.004
==============================================================================

pwhite ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.8202      0.001    616.281      0.000       0.818       0.823
treatment     -0.0009      0.002     -0.560      0.575      -0.004       0.002
==============================================================================

pblack ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0866      0.001     80.617      0.000       0.085       0.089
treatment      0.0001      0.001      0.098      0.922      -0.002       0.003
==============================================================================

page18_39 ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.3218      0.001    395.516      0.000       0.320       0.323
treatment     -0.0001      0.001     -0.124      0.901      -0.002       0.002
==============================================================================

ave_hh_sz ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.4270      0.003    812.995      0.000       2.421       2.433
treatment      0.0030      0.004      0.824      0.410      -0.004       0.010
==============================================================================

red0 ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.3986      0.004    104.893      0.000       0.391       0.406
treatment      0.0087      0.005      1.875      0.061      -0.000       0.018
==============================================================================

redcty ~ treatment
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5074      0.004    130.990      0.000       0.500       0.515
treatment      0.0043      0.005      0.904      0.366      -0.005       0.014
==============================================================================

To verify the validity of the randomization mechanism, we conducted a series of balance tests comparing pre-treatment covariates between the treatment and control groups. The analysis includes demographic characteristics (e.g., gender and couple status), zip-code characteristics (e.g., racial composition and average household size), donation history (e.g., months since last donation, frequency of giving, highest previous contribution), and geographic indicators (e.g., red-state or red-county residence).

None of the observed covariates differ significantly at the 5% level. The smallest p-value is for the binary red0 indicator (p = 0.060), which narrowly misses that threshold, followed by female (p = 0.080); both would be flagged at the 10% level. The remaining covariates show much weaker associations with treatment assignment (e.g., hpa, p = 0.332), and the differences in means are small in magnitude across the board.
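With 13 balance tests, a couple of p-values below 0.10 are roughly what chance alone would produce. A minimal sketch of that arithmetic, using the Welch p-values from the balance table; the "at least one rejection" figure assumes independent tests, which is only approximate here because covariates such as red0 and redcty are correlated.

```python
import numpy as np

# Welch t-test p-values copied from the balance table (13 covariates)
p_values = np.array([
    0.904855, 0.331840, 0.911740, 0.275317, 0.861961, 0.079523, 0.560397,
    0.576132, 0.922294, 0.901123, 0.410315, 0.060488, 0.365931,
])
m = len(p_values)
alpha = 0.10

# If every null hypothesis is true, each test rejects with probability alpha
expected_rejections = m * alpha
# Chance of at least one rejection, assuming (approximately) independent tests
prob_at_least_one = 1 - (1 - alpha) ** m
observed_rejections = int((p_values < alpha).sum())
# Bonferroni threshold that keeps the family-wise error rate at 5%
bonferroni_cutoff = 0.05 / m

print(f"expected rejections at 10%: {expected_rejections:.1f}")
print(f"P(at least one rejection):  {prob_at_least_one:.3f}")
print(f"observed rejections at 10%: {observed_rejections}")
print(f"Bonferroni cutoff (5%):     {bonferroni_cutoff:.4f}")
```

About 1.3 rejections are expected and 2 are observed (female and red0); the chance of at least one is roughly 75%, and no p-value comes close to the Bonferroni cutoff of about 0.0038.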

Together, these results suggest that the random assignment process was successful, and that the treatment and control groups are statistically comparable at baseline. This provides confidence that any post-treatment differences in outcomes can be causally attributed to the experimental interventions rather than pre-existing differences between groups.

Experimental Results

Charitable Contribution Made

First, I analyze whether matched donations lead to an increased response rate of making a donation.

Bar Plot: Donation Rates by Group
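The plotting code for this figure is not shown in the rendered page. A minimal sketch, assuming `df` is the replication data loaded earlier; `donation_rates` and `plot_donation_rates` are hypothetical helper names, not part of the original analysis.

```python
import pandas as pd

def donation_rates(df: pd.DataFrame) -> pd.Series:
    """Share of recipients who gave, by arm (treatment: 0 = control, 1 = matching grant)."""
    rates = df.groupby("treatment")["gave"].mean()
    rates.index = rates.index.map({0: "Control", 1: "Treatment"})
    return rates

def plot_donation_rates(df: pd.DataFrame) -> None:
    """Bar chart of donation rates with each rate printed above its bar."""
    import matplotlib.pyplot as plt  # local import: donation_rates() does not need matplotlib

    rates = donation_rates(df)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(rates.index, rates.values, color=["gray", "steelblue"])
    for i, r in enumerate(rates.values):
        ax.text(i, r, f"{r:.2%}", ha="center", va="bottom")
    ax.set_ylabel("Proportion who donated")
    ax.set_title("Donation Rates by Group")
    plt.tight_layout()
    plt.show()
```

On the replication data, `plot_donation_rates(df)` should show bars near 1.8% (control) and 2.2% (treatment).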

T-Test and Linear Regression
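The code that produced the output below is not shown. A minimal sketch that should reproduce it, assuming `df` is the replication data; `response_rate_tests` is a hypothetical name. Since the reported t-test and OLS t statistics are identical (3.101), a pooled-variance t-test appears to have been used, so the sketch sets `equal_var=True`.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_ind

def response_rate_tests(df: pd.DataFrame):
    """Compare the share who gave between treatment and control."""
    treat = df.loc[df["treatment"] == 1, "gave"]
    control = df.loc[df["treatment"] == 0, "gave"]
    # Pooled-variance t-test: its t statistic equals the classical OLS t on the dummy
    t_stat, p_val = ttest_ind(treat, control, equal_var=True)
    # Linear probability model: intercept = control rate, slope = treatment minus control
    ols = smf.ols("gave ~ treatment", data=df).fit()
    return t_stat, p_val, ols
```

Usage: `t, p, ols = response_rate_tests(df)`, then `print(f"t-test: t = {t:.3f}, p = {p:.4f}")` and `print(ols.summary().tables[1])`.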

t-test: t = 3.101, p = 0.0019
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0179      0.001     16.225      0.000       0.016       0.020
treatment      0.0042      0.001      3.101      0.002       0.002       0.007
==============================================================================

The response rate in the control group is approximately 1.79%, while the treatment group shows a higher rate of about 2.21%. This 0.42 percentage point increase (a relative increase of roughly 23%, since 0.42 / 1.79 ≈ 0.23) is statistically significant (t = 3.10, p = 0.002).

The linear regression confirms this: the coefficient on treatment is 0.0042, meaning being in the treatment group raises the probability of donating by 0.42 percentage points. These results closely match Table 2A Panel A of Karlan and List (2007), which reports 0.018 for control and 0.022 for treatment.

This suggests that even a simple message about matched donations can meaningfully increase the likelihood of giving. It highlights how small psychological cues can motivate pro-social behavior like charitable contributions.

import statsmodels.api as sm

df["intercept"] = 1
probit_model = sm.Probit(df["gave"], df[["intercept", "treatment"]])
result = probit_model.fit()
margeff = result.get_margeff()
print(result.summary())
print(margeff.summary())

Optimization terminated successfully.
         Current function value: 0.100443
         Iterations 7
                          Probit Regression Results                           
==============================================================================
Dep. Variable:                   gave   No. Observations:                50083
Model:                         Probit   Df Residuals:                    50081
Method:                           MLE   Df Model:                            1
Date:                Sun, 06 Jul 2025   Pseudo R-squ.:               0.0009783
Time:                        22:20:20   Log-Likelihood:                -5030.5
converged:                       True   LL-Null:                       -5035.4
Covariance Type:            nonrobust   LLR p-value:                  0.001696
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     -2.1001      0.023    -90.073      0.000      -2.146      -2.054
treatment      0.0868      0.028      3.113      0.002       0.032       0.141
==============================================================================
       Probit Marginal Effects       
=====================================
Dep. Variable:                   gave
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
treatment      0.0043      0.001      3.104      0.002       0.002       0.007
==============================================================================

We ran a probit regression in which the dependent variable is whether a donation was made and the explanatory variable is assignment to treatment. The probit coefficient on treatment is 0.0868 (p = 0.002). Because probit coefficients are on the latent index scale, they are not directly interpretable as changes in probability.

To match Table 3 column 1 in Karlan and List (2007), we therefore compute the marginal effect. The output reports the average marginal effect (statsmodels' default, `at='overall'`), which is 0.0043 with a standard error of 0.001. This is consistent with the reported value of 0.004 (0.001) and supports our replication.

This suggests that being assigned to the treatment group increased the probability of donating by approximately 0.43 percentage points, in line with the 0.42 percentage point estimate from the linear regression.
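Because this probit contains only a constant and the treatment dummy, its implied probabilities can be checked by hand. The sketch below plugs the rounded coefficients from the output into the standard normal CDF and compares two ways of turning them into a probability effect: the discrete change Phi(b0 + b1) - Phi(b0), and the derivative-based average marginal effect that `get_margeff()` reports by default. The 0.6668 treated share is the mean of `treatment` from the summary statistics. As far as I can tell, the small gap between 0.0043 and the 0.0042 OLS estimate comes from this difference in definitions; statsmodels' `get_margeff(dummy=True)` should report the discrete change instead.

```python
from scipy.stats import norm

# Probit coefficients copied (rounded) from the output above
b0, b1 = -2.1001, 0.0868
share_treated = 0.6668  # mean of `treatment` in the summary statistics

# Predicted donation probabilities: P(gave = 1) = Phi(b0 + b1 * treatment)
p_control = norm.cdf(b0)
p_treated = norm.cdf(b0 + b1)

# Discrete change: difference in predicted probabilities between the two arms
discrete_change = p_treated - p_control

# Derivative-based average marginal effect: sample mean of phi(b0 + b1 * treatment) * b1
ame = ((1 - share_treated) * norm.pdf(b0) + share_treated * norm.pdf(b0 + b1)) * b1

print(f"P(gave | control) = {p_control:.4f}")
print(f"P(gave | treated) = {p_treated:.4f}")
print(f"discrete change   = {discrete_change:.4f}")
print(f"average ME (dydx) = {ame:.4f}")
```

The predicted probabilities land close to the raw response rates (about 1.79% and 2.20%), the discrete change is about 0.0042, and the derivative-based average marginal effect is about 0.0043.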

Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

df_ratio = df[df["treatment"] == 1]


gave_1to1 = df_ratio[df_ratio["ratio"] == 1]["gave"]
gave_2to1 = df_ratio[df_ratio["ratio2"] == 1]["gave"]
gave_3to1 = df_ratio[df_ratio["ratio3"] == 1]["gave"]

# compare 2:1 vs 1:1
t21, p21 = ttest_ind(gave_2to1, gave_1to1, equal_var=False)
# compare 3:1 vs 1:1
t31, p31 = ttest_ind(gave_3to1, gave_1to1, equal_var=False)

ttest_table = pd.DataFrame({
    "Comparison": ["2:1 vs 1:1", "3:1 vs 1:1"],
    "t-stat": [round(t21, 3), round(t31, 3)],
    "p-value": [round(p21, 4), round(p31, 4)]
})

display(ttest_table)

	Comparison	t-stat	p-value
0	2:1 vs 1:1	0.965	0.3345
1	3:1 vs 1:1	1.015	0.3101

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0207      0.001     14.912      0.000       0.018       0.023
ratio2         0.0019      0.002      0.958      0.338      -0.002       0.006
ratio3         0.0020      0.002      1.008      0.313      -0.002       0.006
==============================================================================

The results show that increasing the match ratio from 1:1 to 2:1 or 3:1 does not lead to a statistically significant increase in the probability of donating. Both the t-tests and the OLS regression point the same way: the coefficients are small (0.0019 and 0.0020, i.e., about 0.2 percentage points), and the p-values are above 0.3, far from common significance thresholds.

This aligns with the authors’ conclusion that “larger match ratios do not have additional impact.” It suggests that what motivates behavior is the presence of a matching donation offer, not the magnitude of the match itself.
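The regression output above is shown without its code. A minimal sketch, assuming it was fit on the treatment group only with 1:1 as the omitted category (the intercept of 0.0207 is consistent with that); `ratio_regression` is a hypothetical name and `df` is the replication data.

```python
import pandas as pd
import statsmodels.formula.api as smf

def ratio_regression(df: pd.DataFrame):
    """OLS of `gave` on 2:1 and 3:1 dummies within the matched-letter (treatment) group.

    With 1:1 as the omitted category, the intercept is the 1:1 response rate and
    each coefficient is that ratio's rate minus the 1:1 rate.
    """
    treated = df[df["treatment"] == 1]
    return smf.ols("gave ~ ratio2 + ratio3", data=treated).fit()
```

Usage: `model = ratio_regression(df)`; `print(model.summary().tables[1])`; and the 3:1 vs 2:1 difference is `model.params["ratio3"] - model.params["ratio2"]`.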

3:1 vs 2:1 (via model) diff = 0.00010

To assess whether larger match ratios increase the likelihood of giving, we compare response rates computed directly from the data and from the regression coefficients.

Within the treatment group, the donation rate is about 2.07% for 1:1, 2.26% for 2:1, and 2.27% for 3:1 (the regression intercept plus the relevant coefficient). The 2:1 vs 1:1 difference is about 0.19 percentage points and is not statistically significant (p = 0.33), while the 3:1 vs 2:1 difference is only about 0.01 percentage points.

Because a regression on a full set of group dummies reproduces the group means exactly, these differences are the same whether we compute them from raw means or from the fitted OLS coefficients. These findings indicate that higher match ratios do not produce significantly greater effects than lower ones.
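The raw-mean side of this comparison can be sketched as follows, assuming `df` is the replication data. The sketch labels 1:1 letters as treated rows where neither `ratio2` nor `ratio3` is set, which avoids relying on how the categorical `ratio` column was decoded. Both function names are hypothetical.

```python
import pandas as pd

def rates_by_ratio(df: pd.DataFrame) -> pd.Series:
    """Donation rate for each match ratio among treated recipients."""
    treated = df[df["treatment"] == 1]
    labels = pd.Series("1:1", index=treated.index)  # default: neither 2:1 nor 3:1 dummy set
    labels[treated["ratio2"] == 1] = "2:1"
    labels[treated["ratio3"] == 1] = "3:1"
    return treated.groupby(labels)["gave"].mean()

def rate_differences(rates: pd.Series) -> dict:
    """Pairwise differences in response rates between adjacent match ratios."""
    return {
        "2:1 - 1:1": rates["2:1"] - rates["1:1"],
        "3:1 - 2:1": rates["3:1"] - rates["2:1"],
    }
```

Usage: `rate_differences(rates_by_ratio(df))` should match the coefficient differences from the regression on the treated subsample.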

Size of Charitable Contribution

In this subsection, I analyze the effect of the matching-grant offer on the size of the charitable contribution.

Unconditional Amount t-test: t = 1.918, p = 0.0551
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.8133      0.067     12.063      0.000       0.681       0.945
treatment      0.1536      0.083      1.861      0.063      -0.008       0.315
==============================================================================

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     45.5403      2.423     18.792      0.000      40.785      50.296
treatment     -1.6684      2.872     -0.581      0.561      -7.305       3.968
==============================================================================

We first examine donation amounts across treatment and control groups, including the zeros of recipients who did not give. The t-test and regression indicate a somewhat higher mean amount per letter in the treatment group (about $0.97 versus $0.81), but the difference is not significant at the 5% level (t-test p = 0.055; regression p = 0.063); it is only marginally significant at 10%.

Next, we restrict to those who made a donation. The average gift is similar in the two groups (about $45.54 in control versus $43.87 in treatment), and the regression shows no significant difference (p = 0.561). Because the match offer also changes who chooses to give, this conditional comparison is descriptive rather than a clean causal estimate. Taken together, the results suggest that while the match offer increases whether people give, it does not noticeably change how much donors give once they do.
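The code behind these results is not shown. A minimal sketch, assuming `df` is the replication data; `amount_effects` is a hypothetical name. The reported t-test statistic (1.918) differs from the OLS t (1.861), which suggests a Welch test was used, so the sketch sets `equal_var=False`.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_ind

def amount_effects(df: pd.DataFrame):
    """Treatment effects on dollars given, unconditionally and among donors only."""
    treat_amt = df.loc[df["treatment"] == 1, "amount"]
    control_amt = df.loc[df["treatment"] == 0, "amount"]
    # Welch t-test over all recipients (non-donors contribute $0)
    t_stat, p_val = ttest_ind(treat_amt, control_amt, equal_var=False)
    # Full sample: slope = difference in mean dollars per letter
    uncond = smf.ols("amount ~ treatment", data=df).fit()
    # Donors only: descriptive, since treatment also changes who gives
    cond = smf.ols("amount ~ treatment", data=df[df["gave"] == 1]).fit()
    return t_stat, p_val, uncond, cond
```

Usage: `t, p, uncond, cond = amount_effects(df)`, then print `uncond.summary().tables[1]` and `cond.summary().tables[1]`.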

Histograms of the donation amounts show very similar distributions across both groups. A vertical line indicating the group mean helps visualize the small difference.
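These histograms are not reproduced in the rendered page. A minimal sketch of how they could be drawn, assuming `df` is the replication data; `donor_means` and `plot_donor_histograms` are hypothetical names, and the bin count is an arbitrary choice.

```python
import pandas as pd

def donor_means(df: pd.DataFrame) -> pd.Series:
    """Mean amount among donors (gave == 1), indexed by treatment arm (0, 1)."""
    return df[df["gave"] == 1].groupby("treatment")["amount"].mean()

def plot_donor_histograms(df: pd.DataFrame, bins: int = 30) -> None:
    """Side-by-side histograms of gift size among donors, with the arm mean marked."""
    import matplotlib.pyplot as plt  # local import so donor_means() works without matplotlib

    means = donor_means(df)
    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, (arm, title) in zip(axes, [(0, "Control"), (1, "Treatment")]):
        amounts = df.loc[(df["gave"] == 1) & (df["treatment"] == arm), "amount"]
        ax.hist(amounts, bins=bins, color="lightgray", edgecolor="black")
        ax.axvline(means[arm], color="red", linestyle="--", label=f"Mean = ${means[arm]:.2f}")
        ax.set_title(f"{title}: amount among donors")
        ax.set_xlabel("Donation amount ($)")
        ax.legend()
    axes[0].set_ylabel("Count")
    plt.tight_layout()
    plt.show()
```

On the replication data, `donor_means(df)` should match the conditional regression: about $45.54 for control and $43.87 for treatment.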

Simulation Experiment

As a reminder of how the t-statistic “works,” in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made.

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

Law of Large Numbers

control_vals = df[df["treatment"] == 0]["amount"].dropna().values
treat_vals = df[df["treatment"] == 1]["amount"].dropna().values

np.random.seed(42)
control_draws = np.random.choice(control_vals, 10000, replace=True)
treatment_draws = np.random.choice(treat_vals, 10000, replace=True)

diffs = treatment_draws - control_draws

cumulative_avg = np.cumsum(diffs) / np.arange(1, 10001)

true_diff = treat_vals.mean() - control_vals.mean()


plt.figure(figsize=(10, 5))
plt.plot(cumulative_avg, label='Cumulative Average of Differences')
plt.axhline(true_diff, color='red', linestyle='--', label=f'True Mean Diff ({true_diff:.2f})')
plt.title("Law of Large Numbers: Cumulative Avg of Treatment - Control")
plt.xlabel("Number of Simulations")
plt.ylabel("Difference in Means")
plt.legend()
plt.tight_layout()
plt.show()

This plot demonstrates the Law of Large Numbers. Rather than drawing from the Bernoulli distributions described above, the code resamples observed donation amounts: it draws 10,000 values (with replacement) from each group, subtracts each control draw from the paired treatment draw, and tracks the cumulative average of these differences.

The resulting curve fluctuates substantially at first, when only a few draws are averaged, but stabilizes as more draws accumulate. By roughly 3,000 to 4,000 draws, the estimate settles near the full-sample difference in mean amounts (about $0.15, shown by the red dashed line).

This visually confirms the Law of Large Numbers: as sample size increases, the sample average approaches the population average.
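The simulation setup above assumes Bernoulli response probabilities of 0.018 (control) and 0.022 (treatment), while the plotted code resamples dollar amounts. A minimal sketch of the Bernoulli version, using only NumPy and those assumed probabilities rather than the observed data; plotting `cumulative_avg` with a horizontal line at `true_diff` would give the analogous figure.

```python
import numpy as np

# Assumed true response probabilities from the simulation setup
p_control, p_treat = 0.018, 0.022
n_draws = 10_000

rng = np.random.default_rng(42)
control_draws = rng.binomial(1, p_control, size=n_draws)  # 1 = donated
treat_draws = rng.binomial(1, p_treat, size=n_draws)

# Cumulative average of paired differences after 1, 2, ..., n_draws draws
diffs = treat_draws - control_draws
cumulative_avg = np.cumsum(diffs) / np.arange(1, n_draws + 1)

true_diff = p_treat - p_control
print(f"true difference = {true_diff:.4f}")
print(f"cumulative average after {n_draws:,} draws = {cumulative_avg[-1]:.4f}")
```

Each paired difference has a standard deviation of about 0.2, so after 10,000 draws the running average should typically sit within about 0.004 of the true 0.004 gap; noticeably more draws are needed before the sign of such a small effect is pinned down.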

Central Limit Theorem

np.random.seed(42)

control_data = df[df["treatment"] == 0]["amount"].dropna().values
treat_data = df[df["treatment"] == 1]["amount"].dropna().values


sample_sizes = [50, 200, 500, 1000]
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, n in enumerate(sample_sizes):
    diffs = []
    for _ in range(2000):
        control_sample = np.random.choice(control_data, n, replace=True)
        treat_sample = np.random.choice(treat_data, n, replace=True)
        diffs.append(treat_sample.mean() - control_sample.mean())
    
    sns.histplot(diffs, bins=30, ax=axes[idx], kde=True, color="blue")
    axes[idx].axvline(0, color="red", linestyle="--")
    axes[idx].set_title(f"Sample size = {n}")
    axes[idx].set_xlabel("Mean Difference (Treatment - Control)")
    axes[idx].set_ylabel("Frequency")

plt.suptitle("Central Limit Theorem Simulation", fontsize=16)
plt.tight_layout()
plt.subplots_adjust(top=0.92)
plt.show()

This simulation illustrates the Central Limit Theorem by repeatedly sampling from the treatment and control groups at increasing sample sizes: 50, 200, 500, and 1000.

For each sample size, we repeated the following 2,000 times: draw n observations (with replacement) from each group, compute the difference in their means, and record it. We then plotted the histogram of those 2,000 differences.

We observe the following:

At small sample sizes (n=50), the distribution of average differences is wide and irregular, with noticeable skewness and occasional outliers. The red vertical line (at zero) is often not near the center.
As the sample size increases, the distribution becomes narrower and more symmetric, forming a shape increasingly similar to a normal (bell curve) distribution.
At n = 1000, the distribution is much narrower and centered near the full-sample mean difference (about $0.15). Because that difference is small relative to the spread, the red line at zero also sits close to the center of the distribution.

This simulation provides visual evidence that as sample size increases, the sampling distribution of the difference in means approaches a normal distribution, even though the underlying donation amounts are highly skewed (the Central Limit Theorem requires only finite variance). It also shows that larger samples yield more precise estimates.
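The Bernoulli setup from the start of this section gives a version of the exercise in which the theoretical standard error is known: for a difference of two independent sample proportions it is sqrt(p_c(1 - p_c)/n + p_t(1 - p_t)/n). The sketch below (NumPy only, with the assumed probabilities 0.018 and 0.022 rather than the observed data) compares the simulated spread with that formula and reports where zero falls in standard-error units.

```python
import numpy as np

# Assumed true response probabilities and number of replications per sample size
p_control, p_treat = 0.018, 0.022
n_reps = 2000
rng = np.random.default_rng(42)

summary = {}
for n in [50, 200, 500, 1000]:
    # Each replication: share of donors among n control draws and n treated draws
    control_means = rng.binomial(n, p_control, size=n_reps) / n
    treat_means = rng.binomial(n, p_treat, size=n_reps) / n
    diffs = treat_means - control_means
    # Standard error implied by the CLT for a difference of two independent proportions
    theory_se = np.sqrt(p_control * (1 - p_control) / n + p_treat * (1 - p_treat) / n)
    summary[n] = {
        "mean": diffs.mean(),
        "sd": diffs.std(ddof=1),
        "theory_se": theory_se,
        "z_of_zero": (0 - (p_treat - p_control)) / theory_se,  # location of 0 in SE units
    }

for n, s in summary.items():
    print(f"n={n:>5}: mean={s['mean']:.4f}  sd={s['sd']:.4f}  "
          f"CLT se={s['theory_se']:.4f}  zero at z={s['z_of_zero']:.2f}")
```

Under these assumptions zero lies within one standard error of the true difference even at n = 1000 (z ≈ -0.64), so a 0.4 percentage point effect is hard to detect without samples of many thousands per arm, as in this experiment.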