In the age of big data, wise decisions and actions are more likely when the power of data is combined with professional expertise in the right proportion. One recent example is our use of a machine learning algorithm to address customer cancellations. As we introduced in Mitigate Customer Churn with Data Science, customer cancellation (churn) harms business, and data science has proven to be an effective cure. Algorithms can untangle the complex factors driving cancellations and help us proactively reach out to customers who have not had the right experience. Let’s jump into the details:
The Aim of Our Predictive Churn Model
The model was created for two purposes:
How We Approached this Problem
First, we pulled delivery metrics as well as contract terms over a one-year period for the model to learn patterns from. A solid chunk of historical data is required for many algorithms to discover the more complex nuances in the data.
Our aim is to predict which customers are likely to cancel a product within a near-term timeframe. By splitting each customer’s lifetime into consecutive 30-day windows, we end up with a training dataset that mirrors our desired prediction: the likelihood of cancelling within 30 days. Feeding the data from the most recent 30-day window into the trained model, we can then predict as far as 30 days into the future. This setup gives us a target, or response variable, that can be predicted from a list of explanatory variables, also known as attributes.
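As a minimal sketch of that windowing step (the column names `customer_id`, `event_date`, and `canceled` are illustrative, not our actual schema), each customer’s history is cut into 30-day slices, and each slice is labeled by whether a cancellation landed in the following window:

```python
import pandas as pd

def build_training_windows(events, window_days=30):
    """Cut each customer's history into fixed 30-day windows and label
    each window with whether a cancellation occurred in the *next* one."""
    events = events.sort_values(["customer_id", "event_date"]).copy()
    first_seen = events.groupby("customer_id")["event_date"].transform("min")
    events["window"] = (events["event_date"] - first_seen).dt.days // window_days

    # One record per customer per window, aggregating that window's experience.
    windows = (events.groupby(["customer_id", "window"])
                     .agg(deliveries=("event_date", "count"),
                          canceled=("canceled", "max"))
                     .reset_index())

    # Response variable: cancellation in the *following* 30-day window.
    windows["cancel_next_30d"] = (windows.groupby("customer_id")["canceled"]
                                         .shift(-1)
                                         .fillna(0)
                                         .astype(int))
    return windows
```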
This data preparation approach rests on the assumption that a bad experience in the past 30 days is highly influential in triggering a customer’s decision to cancel. Since the 30-day windows for one customer are not completely independent, we link adjacent windows by including variables that capture the delivery changes between each window and the one before it. These window-over-window difference and lagged variables establish continuity and history in each of the individual records provided to the model for training.
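Continuing the hypothetical `windows` frame from the sketch above, those history variables amount to a lagged copy of each metric plus the change between windows:

```python
def add_window_history(windows, metrics=("deliveries",)):
    """Attach lagged and window-over-window difference variables so each
    training record carries history from the prior window."""
    windows = windows.sort_values(["customer_id", "window"]).copy()
    for m in metrics:
        # Lagged value: the same metric in the previous window.
        windows[f"{m}_prev"] = windows.groupby("customer_id")[m].shift(1)
        # Difference: the change versus the prior window.
        windows[f"{m}_diff"] = windows[m] - windows[f"{m}_prev"]
    return windows
```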
The importance of choosing the right group of explanatory variables for your model can never be over-emphasized. Quite often a simple model performs well simply because it was given the best features. As Luca Massaron, a senior data scientist who ranks high on Kaggle, said, “No algorithm alone, to my knowledge, can supplement the information gain given by correct feature engineering.” The explanatory variables we considered engineering cover the following four themes:
Model Training and Selection
When building a predictive model, it is common practice to try a variety of algorithms and compare them on a level playing field. After quite a few trials of different models, our final choice was the eXtreme Gradient Boosting model (xgboost).
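The comparison itself can be as simple as cross-validating every candidate on the same folds with the same metric; a sketch under the assumption that the candidate list and scoring choice below are illustrative, with `windows` coming from the earlier sketches:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Feature matrix and labels built from the windowed dataset sketched earlier.
feature_cols = [c for c in windows.columns
                if c not in ("customer_id", "window", "canceled", "cancel_next_30d")]
X, y = windows[feature_cols].fillna(0), windows["cancel_next_30d"]

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "xgboost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}

# Same folds and same metric for every candidate: a level playing field.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="average_precision")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```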
XGBoost started as a research project by Tianqi Chen within the Distributed (Deep) Machine Learning Community (DMLC) group. The term eXtreme Gradient means that the optimization method has been modified to work on larger datasets with many variables and missing values that other methods might have trouble handling. The word boosting means that “base” classifiers are learned iteratively and added together to create a final classifier. At each iteration, the newly trained classifier is given a weight proportional to its accuracy, and weights are assigned to the data points so that learning emphasizes the observations misclassified so far, nudging the model in each round to predict them correctly.
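In the gradient flavor of boosting that xgboost builds on, each round fits a small tree to the errors the ensemble is still making. A toy sketch with squared-error loss, which is far simpler than xgboost’s actual regularized objective, makes the additive loop concrete:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Minimal additive boosting: each round fits a shallow tree to the
    current residuals and adds a damped copy of it to the ensemble."""
    pred = np.full(len(y), y.mean(), dtype=float)  # start from a constant model
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                       # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        pred += learning_rate * tree.predict(X)    # focus learning on remaining errors
        trees.append(tree)
    return trees, pred
```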
In our case, we use decision trees as the base classifiers and apply boosting to improve accuracy. Tree models are great to start with because they are easy to interpret, handle sparse or missing data gracefully, and can detect complex, nonlinear breaks in the data. Xgboost tree models handle sparse data particularly well: to find the optimal split for each tree node, the algorithm either greedily enumerates all possible splits for a feature or uses information about the feature’s distribution to compute weighted quantiles, and in either case it sets a default direction for missing values based on what is learned from the data itself. In addition, tree models can rank predictors by their predictive importance, revealing the most influential attributes and guiding us toward steps that address problems with a real-world understanding of the most critical drivers.
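Both properties show up directly when fitting the model; a sketch reusing the hypothetical feature matrix from above, where missing values can stay as NaN and the fitted model exposes importance scores:

```python
from xgboost import XGBClassifier

# NaNs can stay in the matrix: each split learns a default direction for them.
X_raw = windows[feature_cols]
model = XGBClassifier(n_estimators=300, max_depth=4,
                      learning_rate=0.1, eval_metric="logloss")
model.fit(X_raw, y)

# Rank predictors by importance to surface the most influential attributes.
ranked = sorted(zip(feature_cols, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:10]:
    print(f"{name}: {score:.3f}")
```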
Customer cancellations are a relatively infrequent occurrence, making our dataset imbalanced. Note that when catching minority-class instances is the top priority (catching churners, in our case), accuracy is not a good evaluation metric. For this reason, we base performance evaluation and model selection on precision, sensitivity, and other metrics better suited to gauging the effectiveness of a predictive model under imbalanced conditions.
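A sketch of that evaluation on held-out data, continuing with the model and matrices from above (the split parameters are illustrative):

```python
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, stratify=y, random_state=42)

model.fit(X_train, y_train)
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]

# Accuracy is misleading when churners are rare; report metrics that
# reflect how well the minority class is caught.
print("precision:", precision_score(y_test, pred))
print("sensitivity (recall):", recall_score(y_test, pred))
print("average precision:", average_precision_score(y_test, prob))
```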
The final result is a predictive engine that outputs a cancellation risk score for each customer and each product they have purchased. From there it is straightforward to calculate the expected revenue loss based on the churn risk and the customer’s spending. From a business perspective, it makes sense to prioritize customers by expected loss in order to preserve revenue.
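Turning risk scores into a prioritized list is then one multiplication per customer, expected loss being the churn probability times the spend; a sketch reusing the fitted model from above, where the `monthly_spend` column is hypothetical:

```python
# Expected revenue loss = predicted churn risk * customer spend.
# "monthly_spend" is a hypothetical spend column, not our actual schema.
scored = windows.assign(churn_risk=model.predict_proba(X_raw)[:, 1])
scored["expected_loss"] = scored["churn_risk"] * scored["monthly_spend"]

# Prioritize outreach by expected loss to preserve the most revenue.
at_risk = scored.sort_values("expected_loss", ascending=False).head(100)
```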
As an initial test, we’ve decided to deliver a list of 100 at-risk customers along with other metrics describing their experience and portfolio, so that our retention specialists feel knowledgeable and empowered to handle the outreach.
We like to think of our customer retention solution as a cooperation between data science and human expertise, each doing the job it is best at. Models identify potentially problematic customers from data that is complex and difficult for humans to draw insight from, while human effort supplies the context and communication the model lacks, deciding a course of action during a real conversation with our customers. Such cooperation strengthens our bond with customers and helps us provide a better suite of products and services at Homes.com.