In this tutorial, I want to show you how deep neural networks can be used to predict the future behavior of people. This is usually referred to as Predictive Behaviour Modeling.
In particular, we will discuss what Predictive Behavior Modeling actually means, in which business areas it is used and, more importantly, how to use deep neural networks to model future behavior.
1. What exactly is Predictive Behaviour Modeling?
Simply put, predictive behavior modeling is an area of predictive analytics that tries to predict or model the future behavior of people, in particular customers.
The prediction of future behavior can also be formulated as a classification problem.
For example, if there are four possible behaviors or actions a customer can perform, the predictive behavior model, which is a deep neural network, assigns each of these actions a probability score.
The probability score represents the likelihood that the given customer will take the action associated with this probability score.
Predictive behavior modeling is the science of building algorithmic models and training them on historical customer data to predict the future behavior of these customers. In other words, to predict the likelihood that a customer will take a particular action.
In the field of predictive analysis, predictive behavior modeling goes beyond passive customer analysis. Rather than trying to make well-founded assumptions based on the analysis of historical data that is normally done by humans, predictive behavior modeling allows companies to make decisions based on future predictions made by algorithmic models. And as you might already suspect, these algorithmic models are deep neural network models.
Before we delve deeper into the practical implementation of predictive behavior modeling, we should first take a look at some concrete use cases.
2. Use cases for Predictive Behaviour Modeling
Predicting Customer Churn
One important application of predictive behavior modeling is in the area of customer churn. Customer churn occurs when customers or subscribers stop doing business with a company or service and leave.
Given a large amount of customer data such as demographic data, purchase history, service usage and billing data, a neural network trained on this data can classify customers into various categories of risk in terms of future churn.
More precisely, we would train the model on customer data from the past. The data would contain information on customers who stopped doing business with the company, as well as information on customers who are still doing business with it.
The network trained on this data would be able to classify a brand new customer into one of the categories of risk for future churn.
Also known as customer attrition, customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers. Customer retention is generally less expensive as you’ve already earned the trust and loyalty of existing customers. As you can imagine, it is critical for a business to predict a potential churn.
After segmenting and identifying the customers most likely to leave, the company can take the necessary steps (such as marketing or incentives) to convince them to stay customers. In addition, customers will feel more relevant because the company is communicating with them, resulting in greater satisfaction, brand loyalty, and word-of-mouth referrals.
Predicting the Outcome of a Marketing Campaign
Another use case of customer behavior modeling is in the field of marketing. Algorithmic models can be trained on the results of previous marketing campaigns or strategies that targeted a particular group of people.
Some marketing campaigns are more appealing to one group of people, and other campaigns are more appealing to another group. Some people with certain characteristics who are exposed to a particular marketing campaign under certain conditions are more willing to buy or upgrade to a new product or service than others.
The trained algorithmic models can predict exactly that.
We as deep learning engineers would implement a neural network model that can predict which kind of marketing campaigns or actions are more likely to be successful for a particular group of people.
This would also be a classification task. Given the data of a particular customer, we would implement a neural network that classifies this customer into one of several groups.
Each group is associated with the likelihood that the customer in this group will buy a product or service advertised in the marketing campaign.
Predictive behavior modeling enables marketers to know in advance which marketing actions are more likely to be successful.
With this knowledge, less time and money would be spent on people who have no interest in the product or service. Instead, the extra money and time could be spent to attract customers who, according to the neural network’s predictions, are more interested in the product or service advertised in the campaign. And this would result in a better return on investment.
I hope you are now convinced of how important it is to model customer behavior. We have considered the examples of churn modeling and marketing and seen the benefits this technology can bring to the business.
Let’s continue with that and discuss how modeling customer behavior with deep neural networks can be achieved in practice.
3. Predictive Behaviour Modeling with Neural Networks
Predictive behavior modeling is a process that requires a lot of data, more precisely customer data. A company that wants to implement a deep learning model that can predict whether a particular customer will take a certain action in the future requires the data of that customer and also the data of thousands of other customers.
Take a look at the following sample of the famous “Bank Marketing Dataset”:
This dataset contains information on approximately 45,000 customers of a Portuguese bank. The data is related to the bank’s direct marketing campaigns, which were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).
Each data instance contains the information of a particular customer as well as the outcome of the marketing campaign.
The data can be divided into two categories that you are probably already familiar with: the features (blue) and the labels (red).
In this case, the features are the ‘Age’, ‘Education’ or ‘Job’ and also any other characteristics that can describe the customers. In deep learning features are usually abbreviated as x.
The label, on the other hand, is abbreviated as y and represents the action that the customer described by the features x took in the past. In this case, “no” means that the customer did not subscribe to the term deposit and “yes” means that the customer did subscribe.
As you might already have guessed, we cannot use the features from this dataset directly to train the neural network model. Besides numeric data, the features also contain string data, which cannot be fed into a model directly.
Instead, the dataset must be preprocessed in various ways. We have discussed all necessary steps of data preprocessing in the article “Data Preprocessing Steps for Deep Learning in Python”. Please refer to this article if you are interested in this topic.
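To make the idea of preprocessing more concrete, here is a minimal sketch of two common steps: one-hot encoding for string features and min-max scaling for numeric ones. The customer record, the category lists and the age range are made up for illustration and are not taken from the actual dataset. The helper names one_hot and min_max_scale are hypothetical as well.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(value, lowest, highest):
    """Rescale a number into the range [0, 1]."""
    return (value - lowest) / (highest - lowest)


# Hypothetical customer record and category lists (illustration only).
customer = {"age": 35, "job": "technician", "education": "secondary"}
JOBS = ["admin", "technician", "services"]
EDUCATION = ["primary", "secondary", "tertiary"]
AGE_RANGE = (18, 95)  # assumed minimum and maximum age

# Concatenate the scaled number and the one-hot vectors into one feature vector x.
features = (
    [min_max_scale(customer["age"], *AGE_RANGE)]
    + one_hot(customer["job"], JOBS)
    + one_hot(customer["education"], EDUCATION)
)
print(features)
```

The result is a purely numeric vector with seven entries, which is the kind of input a neural network can work with.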
Regarding the label, we would represent it as a vector. Each entry in this vector would represent a possible action or behavior of a customer. An entry value of 1 means that the action associated with this entry was taken by the customer in the past, and 0 means that it was not.
In the case of bank marketing from the previous example, that binary number would tell us whether the person with the given features accepted the offer advertised in the marketing campaign or not.
The same principle would apply to the question if a customer has purchased a certain product, has made a product upgrade or has performed any other action in the past.
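The label encoding described above could look like the following sketch. The order of the outcomes in OUTCOMES is an assumption; any fixed order works as long as it is used consistently. The function name encode_label is hypothetical.

```python
# Assumed order of the possible outcomes; the position defines the vector entry.
OUTCOMES = ["no", "yes"]


def encode_label(outcome):
    """Turn an outcome string into a binary label vector."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return [1 if outcome == o else 0 for o in OUTCOMES]


print(encode_label("yes"))  # [0, 1]
print(encode_label("no"))   # [1, 0]
```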
Given the features and the labels of thousands of customers, we as deep learning engineers can implement a neural network model that can learn from the customer data from the past to be able to predict the probability that a customer, described by features, will take a particular action in the future.
For this, we must show the neural network the customer features x over and over again. For each input x, the network would compute an output y that will be compared with the ground truth label. Based on the error between the label and the prediction, we would improve the neural network with the gradient descent algorithm.
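The training procedure can be sketched in a few lines of plain Python. To keep the example short, it uses a single output neuron with a sigmoid activation instead of a deep network, and it updates the weights after every example (stochastic gradient descent). The toy data, the learning rate and the number of epochs are made up for illustration.

```python
import math


def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Single-neuron model: weighted sum of the features followed by a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, learning_rate=0.5, epochs=500):
    """Gradient descent on the cross-entropy loss, one example at a time."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):  # show the data to the model over and over again
        for x, y in data:
            error = predict(weights, bias, x) - y  # prediction minus label
            # For sigmoid + cross-entropy, the gradient for weight i is error * x_i.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Made-up toy data: (features x, label y), where y = 1 means "took the action".
toy_data = [
    ([0.1, 0.0], 0),
    ([0.2, 0.1], 0),
    ([0.8, 1.0], 1),
    ([0.9, 0.9], 1),
]

weights, bias = train(toy_data)
for x, y in toy_data:
    print(x, y, round(predict(weights, bias, x), 3))
```

After training, the predicted probabilities should be close to 0 for the customers with label 0 and close to 1 for those with label 1. A real model would have hidden layers and would be trained with a deep learning framework, but the loop of predicting, comparing and updating stays the same.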
For a more detailed explanation of how a neural network learns, please refer to the article “What is Deep Learning and how does it work?”
4. Predictive Behavior Modeling is a Classification Task
Since we want to predict probabilities, customer behavior modeling is a classification task. In this case, we are classifying a customer described by the features into one of several groups of possible actions. The action group with the highest predicted probability score is considered the expected behavior of the customer in the future.
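Picking the expected behavior from the predicted scores is a simple argmax. The action names and probabilities below are hypothetical values for a single customer.

```python
# Hypothetical action names and predicted probabilities for one customer.
actions = ["buy", "upgrade", "cancel", "do_nothing"]
probabilities = [0.10, 0.15, 0.05, 0.70]

# Index of the largest probability score.
best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
expected_action = actions[best_index]
print(expected_action)  # do_nothing
```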
During classification, we want to implement a neural network that performs a mapping from input features x to an output y. The output y is the output of the neural network model, which is a probability score between 0 and 1.
This value gives us the probability that the customer described by the features x will take the action associated with the probability y. Given the customer data, all we need to do is build a neural network model that is able to solve the given training objective.
Since we want to predict a probability between 0 and 1, which we compare with the actual label in the dataset, our training objective here would be the minimization of the cross-entropy loss function. (You can read more about loss functions in the article “Loss Functions in Deep Learning”.)
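As a rough illustration, the cross-entropy between a binary label vector and a vector of predicted probabilities can be computed as follows. The predictions are made-up numbers, and the small constant eps is an assumption added here only to avoid taking the logarithm of zero.

```python
import math


def cross_entropy(label, prediction, eps=1e-12):
    """Cross-entropy between a binary label vector and predicted probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(label, prediction))


label = [0, 1]                # the customer took the second action
good_prediction = [0.1, 0.9]  # made-up network outputs
bad_prediction = [0.8, 0.2]

print(cross_entropy(label, good_prediction))  # small loss
print(cross_entropy(label, bad_prediction))   # larger loss
```

The loss is small when the network assigns a high probability to the action that was actually taken, and it grows as that probability shrinks. Minimizing it therefore pushes the predictions toward the labels.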
Now let us discuss how many output neurons the neural network should have for this type of task.
For example, suppose you perform a classification task for only two possible behaviors or actions. Let’s call these actions action A and action B.
In this case, it may be sufficient to use only one output neuron if the label y is a binary number. For example, action A could be represented by a value of 0 and action B by a value of 1, or vice versa.
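With a single output neuron, a sigmoid activation is a common choice for turning the raw output into a probability. The sketch below assumes that the output is interpreted as the probability of action B and that 0.5 is used as the decision threshold. Both are conventions chosen for this example, and the raw output values are made up.

```python
import math


def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_action(raw_output, threshold=0.5):
    """Interpret the single output as P(action B); below the threshold means action A."""
    probability_b = sigmoid(raw_output)
    action = "B" if probability_b >= threshold else "A"
    return action, probability_b


print(predict_action(1.2))   # raw output above 0 maps to action B
print(predict_action(-0.7))  # raw output below 0 maps to action A
```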
However, a common and more appropriate approach would be to use the same number of output neurons as there are actions. For two possible actions, you would have two output neurons. In this case, the output of the network would be a vector with two entries, while the label vector would be a binary vector in which each entry is either 1 or 0.
The entries in the output vector y would contain probabilities in the range between 0 and 1. Since we have mutually exclusive classes during classification, it is recommended to use the Softmax function as the activation function in the last layer. This makes the outputs of the neural network sum to one, which is a convenient way to represent a probability distribution, as we learned earlier in the second module.
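A minimal version of the Softmax function for two output neurons might look like this. The raw output values are made up, and subtracting the maximum before exponentiating is a standard trick for numerical stability that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_outputs = [0.3, 1.7]  # made-up raw outputs for action A and action B
probabilities = softmax(raw_outputs)
print(probabilities, sum(probabilities))
```

The same function works unchanged for more than two actions, since it only depends on the length of the score list.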
- Predictive Behavior Modeling is a crucial task in the field of Predictive Analytics
- Predictive Behavior Modeling is a simple classification task. Each class represents a possible action
- The behavior can be modeled by a simple feedforward neural network