Estimating Marketing ROI in Insurance and Banking Using Observational Data: A Causal Inference Approach

Authors: Chaimae Sriti, Thierry Duchesne, Louis-Paul Rivest

Abstract

Measuring the return on investment (ROI) of marketing campaigns is a critical yet challenging task for insurance and banking institutions. Unlike data from randomized trials, observational data on advertising spending and customer behavior are affected by confounding factors, correlated treatments across media channels, and strong seasonal patterns. In this paper, we formalize and extend a methodology for causal ROI estimation from observational data, focusing on insurance quote requests and banking product outcomes (loan applications, account openings). We present a comprehensive comparison of statistical and causal inference techniques, including propensity score matching and weighting, regression adjustment, doubly robust estimation, and instrumental variables. We also use dimension reduction (principal component analysis) and clustering to handle high-dimensional covariates and improve covariate balance. The methods are illustrated in a detailed case study with simulated data that mimic an insurance company's marketing, and results are reported in comparative tables and figures. The analysis shows how naive models can yield biased ROI estimates when marketing channels are correlated, and how causal methods mitigate these biases. We conclude by discussing practical challenges, such as unobserved confounders and seasonality, and outline future directions for robust marketing ROI analysis in financial services.

1. Introduction

Financial institutions such as insurance companies and banks invest heavily in marketing across multiple channels, from direct mail and phone solicitation to radio, television, and online ads. A fundamental question for these firms is how much each advertising dollar actually contributes to business outcomes such as insurance quote requests, loan applications, or new account openings. Accurately estimating the causal effect of each marketing channel, and hence its ROI, is vital for budget optimization and strategic planning. Measuring causal ROI is difficult, however, when only observational data are available. Unlike in a randomized experiment, where advertising exposure could be randomly assigned, exposure in observational marketing data is not experimentally controlled, and such data are subject to several complications...

2. Background and Problem Formulation

2.1 Causal ROI Analysis in Marketing

Defining ROI and Causal Effect: Marketing ROI is typically defined as the incremental return (in revenue or relevant outcomes) per unit of investment in a marketing activity. In our context, we focus on causal ROI, that is, the increase in the expected number of insurance quotes or banking product sign-ups attributable to an advertising intervention, compared with a scenario with no such intervention (or a lower level of advertising)...
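Written as a formula, and using potential-outcome notation $Y(a)$ for the outcome under spend level $a$ (a notational assumption for this illustration), the causal ROI of moving a channel's spend from a lower level $a_0$ to a higher level $a_1$ is the incremental expected outcome per unit of additional spend, $\{E[Y(a_1)] - E[Y(a_0)]\}/(a_1 - a_0)$. The Python sketch below only evaluates this ratio once the two expected outcomes have been estimated; the function name `causal_roi`, the optional `value_per_outcome` conversion to currency, and the example figures are hypothetical.

```python
def causal_roi(mean_outcome_high, mean_outcome_low, spend_high, spend_low,
               value_per_outcome=None):
    """Incremental expected outcome per unit of additional spend.

    mean_outcome_high / mean_outcome_low: estimated E[Y(a1)] and E[Y(a0)],
    e.g. expected quote requests under the higher and lower spend levels.
    value_per_outcome: hypothetical monetary value of one outcome; if given,
    the result is expressed in currency returned per currency unit spent.
    """
    extra_spend = spend_high - spend_low
    if extra_spend <= 0:
        raise ValueError("spend_high must exceed spend_low")
    roi = (mean_outcome_high - mean_outcome_low) / extra_spend
    if value_per_outcome is not None:
        roi *= value_per_outcome
    return roi


# Hypothetical example: raising weekly radio spend from 10,000 to 15,000
# lifts expected quote requests from 400 to 450.
quotes_per_dollar = causal_roi(450, 400, 15_000, 10_000)
print(quotes_per_dollar)  # 0.01
```

The hard part of causal ROI estimation is obtaining unbiased estimates of the two expected outcomes; the ratio itself is simple arithmetic.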

3. Methodology

In this section, we formalize the estimation problem and then detail each method used to estimate causal effects. We assume we have data indexed by $i = 1, \dots, N$ (which could represent region-period pairs or individual customers). Let $Y_i$ be the outcome of interest, $T_i^{A}$ the treatment variable of primary interest (e.g., spend in Media A), $T_i^{-A}$ the other treatment variables (other media spends), and $X_i$ the vector of observed covariates...
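One of the estimators compared in this paper is doubly robust estimation (Bang & Robins, 2005). A common form is augmented inverse probability weighting (AIPW), sketched below in Python with NumPy under assumptions this section does not state: the Media A spend $T_i^{A}$ has been dichotomized into a hypothetical high/low indicator $A_i$, the propensity model is logistic, the outcome models are linear, and both use a covariate matrix built from $T_i^{-A}$ and $X_i$. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def fit_logistic(Z, a, n_iter=25):
    """Logistic regression of a (0/1) on Z by Newton-Raphson; Z has an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (a - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def aipw_ate(y, a, covariates):
    """Doubly robust (AIPW) estimate of E[Y(1)] - E[Y(0)] for a binary treatment a."""
    n = len(y)
    Z = np.column_stack([np.ones(n), covariates])
    # Propensity score e(Z) = P(A = 1 | Z), clipped to [0.01, 0.99].
    e = np.clip(1.0 / (1.0 + np.exp(-Z @ fit_logistic(Z, a))), 0.01, 0.99)
    # Separate linear outcome models fitted in the treated and control groups.
    b1, *_ = np.linalg.lstsq(Z[a == 1], y[a == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[a == 0], y[a == 0], rcond=None)
    m1, m0 = Z @ b1, Z @ b0
    # AIPW: outcome-model prediction plus inverse-probability-weighted residual.
    mu1 = np.mean(m1 + a * (y - m1) / e)
    mu0 = np.mean(m0 + (1 - a) * (y - m0) / (1 - e))
    return mu1 - mu0


# Synthetic check: x stands in for (T^{-A}, X); the true effect of A is 2.0.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1]))))
y = 2.0 * a + 1.5 * x[:, 0] + x[:, 1] + rng.normal(size=n)
print(aipw_ate(y, a, x))  # close to 2.0; the raw difference in means is biased upward
```

The estimate stays consistent if either the propensity model or the outcome models are correctly specified, which is the sense in which it is "doubly" robust. Clipping the propensity scores is a common, but optional, guard against extreme weights.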

4. Case Study

To illustrate the application of the methods described above, we conduct a case study using simulated data. The simulation is designed to mimic the situation of an insurance company estimating the impact of radio advertising (Media A) on the number of insurance quote requests in different regions...
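A minimal version of such a simulation can be sketched in Python with NumPy. Every number in it is a hypothetical choice for illustration, not the design used in this case study: 50 regions observed for 104 weeks, a sinusoidal seasonal index, a region-size covariate, TV spend as a correlated second channel (playing the role of $T_i^{-A}$), and a true radio effect of 0.02 quote requests per dollar. It contrasts a naive regression on radio spend alone with a regression that also adjusts for TV spend, season, and region size.

```python
import numpy as np

rng = np.random.default_rng(42)
n_regions, n_weeks = 50, 104
region = np.repeat(np.arange(n_regions), n_weeks)
week = np.tile(np.arange(n_weeks), n_regions)

# Hypothetical seasonal demand index and region-size covariate.
season = np.sin(2 * np.pi * week / 52)
size = rng.uniform(0.5, 1.5, n_regions)[region]

# Both channels are planned together: spend rises in high season and in larger
# regions, which makes radio (Media A) and TV spend strongly correlated.
common = 1000 * size + 400 * season + rng.normal(0, 150, region.size)
radio = np.clip(common + rng.normal(0, 150, region.size), 0, None)
tv = np.clip(2 * common + rng.normal(0, 300, region.size), 0, None)

# Poisson quote requests with a linear mean; the true radio effect is 0.02.
TRUE_RADIO_EFFECT = 0.02
mean_quotes = 20 + TRUE_RADIO_EFFECT * radio + 0.01 * tv + 30 * season + 40 * size
quotes = rng.poisson(mean_quotes)


def ols(y, *columns):
    """Least-squares coefficients of y on an intercept and the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


naive = ols(quotes, radio)[1]                       # radio spend only
adjusted = ols(quotes, radio, tv, season, size)[1]  # adjusts for TV, season, size
print(f"naive: {naive:.4f}, adjusted: {adjusted:.4f}, true: {TRUE_RADIO_EFFECT}")
```

Because TV spend, seasonality, and region size all raise both radio spend and quote requests, the naive slope absorbs their effects and overstates the radio ROI several-fold, while the adjusted slope recovers the true value. A Poisson regression would be the more natural model for counts; ordinary least squares is used here only because the simulated mean is linear, so the adjusted model is correctly specified.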

5. Conclusion and Future Work

Accurate estimation of marketing ROI in the insurance and banking sectors requires disentangling causation from correlation in observational data. In this paper, we presented a comprehensive methodology that combines statistical and econometric techniques to achieve this goal...

References

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Li, L. (2011). Propensity score analysis with matching weights. Technical report.
Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962–973.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
Lewis, R. A., & Rao, J. M. (2015). The unfavorable economics of measuring the returns to advertising. Quarterly Journal of Economics, 130(4), 1941–1973.
Abdi, H., & Williams, L. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.
Barter, R. (2016). Confounding in causal inference: what is it, and what to do about it?
Bruce, A., & Bruce, P. (2017). Practical statistics for data scientists: 50 essential concepts. O'Reilly Media, Inc.
Chang, P. T., & Lee, E. S. (1996). A generalized fuzzy weighted least-squares regression. Fuzzy Sets and Systems, 82(3), 289-298.
Charrad, M., Ghazzali, N., Boiteau, V., & Niknafs, A. (2014). NbClust: An R package for determining the relevant number of clusters in a dataset. Journal of Statistical Software.
Dunn, P. K., & Smyth, G. K. (2018). Generalized Linear Models With Examples in R. Springer-Verlag New York.
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics, 21, 768-769.
Hotelling, H. (1933). Analysis of a complex of statistical variables with principal components. Journal of Educational Psychology, 24, 417-441.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Kassambara, A., & Mundt, F. (2017). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.5.
Kulis, B., & Jordan, M. I. (2011). Revisiting k-means: New algorithms via Bayesian nonparametrics. arXiv preprint arXiv:1111.0352.
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (Vol. 5). McGraw-Hill Irwin Boston.
Labarère, J., Bosson, J. L., François, P., & Fine, M. J. (2008). Propensity score analysis in observational research: application to a study of prophylaxis against venous thromboembolism. La Revue de médecine interne, 29(3), 255–258.
Lê, S., Josse, J., & Husson, F. (2008). FactoMineR: A package for multivariate analysis. Journal of Statistical Software.
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2016). Feature selection: A data perspective. ACM Computing Surveys, 50.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Mao, H., & Li, L. (2018). PSW: Propensity Score Weighting Methods for Dichotomous Treatments. R package version 1.1-3.
Hernán, M. A., & Robins, J. M. (2019). Causal Inference. Boca Raton: Chapman & Hall/CRC.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11).
Rubin, D. B. (1977). Assignment to treatment groups on the basis of a covariate. Journal of Educational Statistics, 2(1), 1–26.
Bandyopadhyay, S., & Saha, S. (2012). Unsupervised Classification: Similarity Measures, Classical and Metaheuristic Approaches, and Applications. Springer.
Sarstedt, M., & Mooi, E. (2014). Regression analysis. In A Concise Guide to Market Research (pp. 193–233). Springer.