Customer Churn Prediction


Customer retention is a top priority for many companies, since acquiring new customers is several times more expensive than retaining existing ones.  Therefore, customer churn prediction is one of the most common machine learning applications in business.  In fact, telecommunications and finance businesses were among the earliest and widest adopters of customer retention applications.

Types of Churn

In business, churn can be characterized as either subscription or non-subscription:

  • Subscription:
    • Customers make purchases at discrete intervals, on a contract or autopay
    • Churn event is observed and explicitly recorded
    • Examples: Netflix, cellular phone service providers
  • Non-Subscription:
    • Customers are free to buy or not at any time
    • Churn event is not explicitly observed
    • The challenge lies in defining a clear churn event timestamp, which is often done by choosing a threshold for a period of inactivity
    • Examples: bank accounts, online retailers
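
To make the inactivity-threshold idea concrete, here is a minimal sketch using pandas.  The transaction table, the column names, the `label_churn_by_inactivity` helper, and the 90-day threshold are all assumptions made for illustration; a real threshold would have to be chosen from your own customers' purchase patterns.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "purchase_date": pd.to_datetime([
        "2023-01-05", "2023-06-20", "2023-02-11", "2023-03-01", "2023-06-28",
    ]),
})


def label_churn_by_inactivity(tx, as_of, inactivity_days=90):
    """Flag customers whose last purchase is older than the threshold."""
    last_purchase = tx.groupby("customer_id")["purchase_date"].max()
    days_inactive = (as_of - last_purchase).dt.days
    return pd.DataFrame({
        "last_purchase": last_purchase,
        "days_inactive": days_inactive,
        # Assumed rule: no purchase for more than `inactivity_days` = churned.
        "churned": days_inactive > inactivity_days,
    })


labels = label_churn_by_inactivity(transactions, as_of=pd.Timestamp("2023-07-01"))
print(labels)
```

In this toy data, only customer 2 has been inactive for longer than the assumed threshold, so only that customer is labeled as churned.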


Typical prediction tasks for a subscription-based business:

  • Predict customers’ chance of leaving a cellular phone subscription plan.
  • Predict customers’ risk of leaving a health insurance contract.


Typical prediction tasks for a non-subscription-based business:

  • Predict if a customer will close their checking account.
  • Predict if a customer will stop shopping on a fashion website.
  • Predict if a customer will stop using an app or a game.


In this exercise, we develop machine learning models for a subscription-based business using Neural Networks, Random Forest, and several other algorithms.  Our technology toolbox for this exercise consists of TensorFlow, Keras, and scikit-learn.

Machine Learning Accelerator Framework

Monetize’s Machine Learning Accelerator Framework shortens the life cycle of your data science project through established best practices and the automation of machine learning.  The five major steps of this framework are:


1. Understand your use case and goals

The first and most important step is to clearly understand your business use case and goals.


A well-constructed model can inform a wide range of decisions (not just make a churn prediction), and that information can flow into numerous internal tools and applications for further action.


By deploying a predictive machine learning model over a large set of data points, we can gain new insights into user behavior, fuel new engagement strategies, and improve customer retention.


A high-quality customer churn prediction model provides these capabilities:

  • Measuring feature impacts on the likelihood of churn in order to understand why customers choose to leave, which can inform long-term retention initiatives [feature ranking of churned customers]
  • Measuring feature impacts on the likelihood of churn in order to understand why customers choose to stay [feature ranking of non-churned customers]
  • Predicting the probability of churn and grouping customers by that probability into segments for marketing and email campaigns, so that promotions and discounts can be targeted strategically at customers with a high churn risk [choose ML models that produce probabilities]
  • Integrating outputs with internal applications, such as a customer call center, to provide relevant real-time churn risk information.
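
As an illustration of grouping churn probabilities into campaign segments, here is a small sketch with pandas.  The customer IDs, the probabilities, and the cut points (0.3 and 0.6) are made up for the example; in practice the probabilities would come from a model that outputs them, and the cut points would depend on campaign budget and offer cost.

```python
import pandas as pd

# Hypothetical churn probabilities, e.g. taken from a classifier's output.
scores = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105],
    "churn_probability": [0.05, 0.35, 0.62, 0.91, 0.48],
})

# Illustrative cut points: (0, 0.3] low, (0.3, 0.6] medium, (0.6, 1.0] high.
scores["risk_tier"] = pd.cut(
    scores["churn_probability"],
    bins=[0.0, 0.3, 0.6, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

# Customers who would be candidates for a retention offer.
high_risk = scores[scores["risk_tier"] == "high"]
print(high_risk)
```

The resulting `risk_tier` column can then be handed to a marketing or call center application as a simple segment label.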


Use Case: Your Marketing department has designed a special retention offer to avoid customer attrition.  How do we decide which customers should be offered the special retention deal prior to the expiration of their contracts?

2. Identify stakeholders and users

In our use case, these are the Marketing department, IT, statisticians, and data scientists.

3. Identify metrics to optimize

A common situation in customer churn is class imbalance in the dataset.  For example, 80% of the records may be non-churning customers and only 20% churning customers.


In customer churn prediction, False Negatives are usually worse than False Positives.  False Negatives are customers who will churn but are predicted to stay, so they are left out of the marketing promotion.  False Positives are customers who will not churn but receive the marketing promotion anyway.  A model with fewer False Negatives is usually better.


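For simplicity, we use the accuracy score as the metric for our dataset in this blog.  Keep in mind that with an 80/20 class split, a model that always predicts "no churn" already reaches 80% accuracy, so accuracy alone can hide a large number of False Negatives.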
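
To see why the choice of metric matters on imbalanced data, here is a toy sketch with scikit-learn.  The labels are synthetic and the "model" simply predicts that nobody churns; it is not part of the framework, only an illustration of how accuracy and False Negatives can diverge.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Toy labels with an 80/20 split: 1 = churned, 0 = stayed.
y_true = [0] * 80 + [1] * 20
# A trivial baseline that predicts "no churn" for every customer.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
# Recall on the churn class: the share of actual churners that were caught.
recall = recall_score(y_true, y_pred)
# For binary labels, ravel() yields counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"accuracy={accuracy:.2f} recall={recall:.2f} FN={fn} FP={fp}")
```

The baseline looks respectable on accuracy while missing every churner, which is why recall or the raw False Negative count is worth tracking alongside accuracy.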

4. Design a Data Collection Program

One approach to data collection is to take historical data on prior churn and extract patterns of behavior that help predict which customers are more likely to leave in the future.  Each customer might be described by a large number of attributes, such as age, usage, customer service history, and many other factors.

5. Automated Machine Learning

Preprocess Data


Label Encoding (aka Integer Encoding)

Categorical data are variables that contain label values (e.g. State, Country, Size) rather than numeric values.  Many machine learning algorithms, such as neural networks, cannot work with categorical values directly and require all input and output variables to be numeric.

In our dataset, we have categorical columns such as State, Subscribing to International Plan (Yes/No values), and Subscribing to Voicemail Plan (Yes/No values).  These values are encoded as integers.
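
A minimal sketch of integer encoding with pandas and scikit-learn follows.  The tiny DataFrame and its column names (`state`, `voicemail_plan`, `state_code`) are stand-ins invented for illustration, not the actual dataset schema.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical slice of a churn dataset.
df = pd.DataFrame({
    "state": ["NY", "CA", "TX", "CA"],
    "voicemail_plan": ["Yes", "No", "No", "Yes"],
})

# Binary Yes/No columns map naturally onto 1/0.
df["voicemail_plan"] = df["voicemail_plan"].map({"No": 0, "Yes": 1})

# LabelEncoder assigns integers following the sorted order of the labels.
encoder = LabelEncoder()
df["state_code"] = encoder.fit_transform(df["state"])

print(df)
print(list(encoder.classes_))
```

scikit-learn documents `LabelEncoder` primarily for target labels; `OrdinalEncoder` is the equivalent intended for feature columns and could be swapped in here.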


One-Hot Encoding

In addition, for categorical variables where no ordinal relationship exists, such as State, One-Hot Encoding is applied.  Otherwise, the model may assume a natural ordering between the encoded integers, which can result in poor performance or unexpected predictions.
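
One-hot encoding can be sketched with `pandas.get_dummies`, as shown here.  The column names and values are again hypothetical; each distinct state becomes its own 0/1 column, so no ordering between states is implied.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({
    "state": ["NY", "CA", "TX", "CA"],
    "total_day_minutes": [180.5, 220.0, 95.3, 150.2],
})

# Replace the "state" column with one indicator column per distinct value.
encoded = pd.get_dummies(df, columns=["state"], prefix="state", dtype=int)

print(encoded)
```

For a column with many distinct values, such as all US states, this adds one column per value, which is worth keeping in mind when the dataset is small.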


Feature Scaling (aka Data Normalization)

We also normalize the data to values in the range [-1, 1].
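
One way to scale features into [-1, 1] is scikit-learn's `MinMaxScaler` with a custom `feature_range`.  This is an assumption about how such scaling could be done, since the post does not say which scaler the framework uses, and the numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric features, e.g. minutes used and number of service calls.
X = np.array([
    [100.0, 1.0],
    [200.0, 3.0],
    [300.0, 5.0],
])

# Each column is rescaled independently so its min maps to -1 and its max to 1.
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

print(X_scaled)
```

In a real pipeline the scaler would be fitted on the training split only and then applied to the test split with `scaler.transform`, so that no information leaks from the test data.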

Build Models

The Machine Learning Accelerator Framework automatically builds models using the following machine learning algorithms:

  • Neural Networks
  • Decision Tree
  • Random Forest
  • AdaBoost
  • Logistic Regression
  • kNN
  • Support Vector Machine
  • Gaussian Naive Bayes
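
The idea of building many models in one pass can be sketched with scikit-learn as follows.  This is not the framework's actual code: the data is synthetic (generated with `make_classification` to mimic an 80/20 class split), the hyperparameters are mostly library defaults, and scikit-learn's `MLPClassifier` stands in for the neural network.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a churn dataset with roughly an 80/20 class split.
X, y = make_classification(
    n_samples=1000, n_features=15, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Neural Network (MLP)": MLPClassifier(max_iter=1000, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit every model on the same split and record its test-set accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

# Print the models from most to least accurate.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Because every model sees the same train/test split, the accuracies are directly comparable.  The numbers printed for this synthetic data will not match the results reported for the real churn dataset.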

Select Models


The best results are obtained using tree-based algorithms and neural networks:

  • Random Forest, with an accuracy of 93.8%
  • MLP (multi-layer perceptron, a form of neural network) with scikit-learn, with an accuracy of 89.8%
  • AdaBoost (an ensemble of tree-based learners), with an accuracy of 89.1% on the test set
  • Artificial Neural Network with TensorFlow, with an accuracy of 88.8%


In this blog, we covered different types of churn and illustrated a typical workflow for building your own customer churn prediction model.  We also illustrated how the Machine Learning Accelerator Framework enables your organization to use the industry’s best practices to build and evaluate machine learning models.  Summary results of the different models are given in the Select Models section.


If you are interested in bringing your organization to ****** please email *******

