Using VARISTA to predict customer LTV

PUBLISHED
2021.01.10 (Sun)
CATEGORY
Tutorials
TOPIC
Service Analysis
READING TIME
14 min 52 sec

This article has been translated from Japanese into English using DeepL.

Purpose of this article

The purpose of this article is to help you understand the flow of building a model to predict a customer's future LTV segment using VARISTA.

Flow

  • Overview
  • Data Preparation
  • Creating Teacher Data (RFM Analysis)
  • Building a prediction model
  • Checking the prediction model

This method can be used to predict the LTV segment of a customer a little ahead of time, even without information on what the customer has bought.

Overview

It is important to know the lifetime value of your customers.
In this article, we use open data to build a model that takes a customer's purchase information for a 3-month period and predicts which LTV segment that customer will fall into over the following 6 months.


Prepare the data

The data should contain at least the following items:

  • Customer ID
  • Purchase date
  • Purchase amount (quantity, unit price)

In this article, we use the Online Retail Data Set to explain the flow.
This data set contains 541,909 purchase records with eight columns (see the figure below).
[Figure: columns of the Online Retail Data Set]

First, we process the data and perform RFM analysis to rank the customers.
The data covers December 1, 2010 to December 9, 2011. We calculate RFM over a 3-month window, then merge the result with the data for the following 6 months to create the teacher data.
Example: March 1, 2011 to May 31, 2011 (3 months) for RFM, followed by the 6 months from June 1, 2011 to December 1, 2011.
The right window lengths depend on the industry and service, so define them while building and validating the model.
Note, however, that RFM is not suitable for industries where purchases are infrequent (for example, a product that is bought only once every few years).
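
The windowing described above can be sketched in pandas as follows. This is a minimal sketch, not the code used for this article: the tiny tx table stands in for the Online Retail Data Set (column names follow that data set), and split_windows is a hypothetical helper name. The windows are half-open, so the example period of March 1 to May 31 becomes [March 1, June 1).

```python
import pandas as pd

# Tiny stand-in for the Online Retail Data Set (illustrative values only).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime(
        ["2011-03-05", "2011-07-10", "2011-04-20", "2011-05-31", "2011-12-05"]),
    "Quantity": [2, 1, 5, 3, 4],
    "UnitPrice": [10.0, 20.0, 3.0, 4.0, 1.5],
})

def split_windows(df, start, feature_months=3, target_months=6):
    """Return (feature_df, target_df) for two consecutive half-open date windows."""
    start = pd.Timestamp(start)
    mid = start + pd.DateOffset(months=feature_months)   # end of the RFM window
    end = mid + pd.DateOffset(months=target_months)      # end of the LTV window
    dates = df["InvoiceDate"]
    feature_df = df[(dates >= start) & (dates < mid)]
    target_df = df[(dates >= mid) & (dates < end)]
    return feature_df, target_df

# 3 months from March 1, 2011, then the following 6 months.
tx_3m, tx_6m = split_windows(tx, "2011-03-01")
print(len(tx_3m), "rows for RFM,", len(tx_6m), "rows for the 6-month LTV")
```

Changing feature_months and target_months is a simple way to try other window lengths while validating the model.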


Creating Teacher Data

Let's pick an arbitrary 3-month period and calculate RFM for each customer. RFM stands for Recency, Frequency, and Monetary (Revenue).
Segment is obtained by clustering OverallScore with k-means and is divided into Rank 1 to Rank 3.

Note that this Segment is different from the LTVCluster that appears in the second half.
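
One way the RFM step could look in pandas and scikit-learn is sketched below. Several points are assumptions, since the article does not spell them out: OverallScore is taken to be the sum of three per-metric scores (each obtained by k-means on one metric and ordered from worst to best), Rank 1 is taken to be the lowest segment, and ordered_clusters is a hypothetical helper. The transactions are made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative 3-month transactions (column names follow the Online Retail Data Set).
tx_3m = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6],
    "InvoiceDate": pd.to_datetime([
        "2011-03-10", "2011-04-15", "2011-05-30",
        "2011-03-12", "2011-04-18", "2011-05-30",
        "2011-03-20", "2011-04-21", "2011-03-25", "2011-04-21",
        "2011-03-02", "2011-03-02"]),
    "Quantity": [10, 10, 10, 10, 10, 10, 4, 4, 4, 4, 1, 1],
    "UnitPrice": [10.0] * 12,
})
tx_3m["Revenue"] = tx_3m["UnitPrice"] * tx_3m["Quantity"]
snapshot = pd.Timestamp("2011-06-01")  # the day after the 3-month window ends

# Recency: days since the last purchase; Frequency: number of purchase rows;
# Revenue (Monetary): total spend in the window.
rfm = tx_3m.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("InvoiceDate", "count"),
    Revenue=("Revenue", "sum"),
).reset_index()

def ordered_clusters(values, k=3, higher_is_better=True):
    """Cluster a 1-D series with k-means and relabel clusters 0..k-1 from worst to best."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(values.to_numpy(dtype=float).reshape(-1, 1))
    order = km.cluster_centers_.ravel().argsort()  # cluster ids by ascending center
    if not higher_is_better:
        order = order[::-1]
    rank_of = {int(cluster): rank for rank, cluster in enumerate(order)}
    return pd.Series([rank_of[int(label)] for label in labels], index=values.index)

# Assumption: OverallScore is the sum of the three per-metric scores.
rfm["OverallScore"] = (
    ordered_clusters(rfm["Recency"], higher_is_better=False)  # recent buyers score higher
    + ordered_clusters(rfm["Frequency"])
    + ordered_clusters(rfm["Revenue"])
)
# Segment: k-means on OverallScore, reported as Rank 1 (lowest) to Rank 3 (highest).
rfm["Segment"] = ordered_clusters(rfm["OverallScore"]) + 1
print(rfm)
```

Relabeling the clusters by their centers matters because k-means assigns cluster ids arbitrarily; without it, the scores could not be added meaningfully.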

Next, calculate each customer's LTV for the following 6 months and add it to the data.
We simply multiply UnitPrice by Quantity and add the per-customer total as 6_Month_Revenue.
This lets us relate a customer's buying behavior (RFM) during the 3-month period to how much revenue they bring in over the following 6 months.
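
A minimal sketch of this step, with made-up values: the per-customer sum of UnitPrice x Quantity over the 6-month window is joined onto the RFM table. The left join and the choice to fill customers without purchases with 0 are assumptions, not something the article states.

```python
import pandas as pd

# RFM table from the 3-month window (only the columns needed for the illustration).
rfm = pd.DataFrame({"CustomerID": [1, 2, 3], "OverallScore": [6, 3, 0]})

# Transactions from the following 6 months (illustrative values).
tx_6m = pd.DataFrame({
    "CustomerID": [1, 1, 3],
    "Quantity": [2, 1, 4],
    "UnitPrice": [10.0, 5.0, 2.5],
})
tx_6m["Revenue"] = tx_6m["UnitPrice"] * tx_6m["Quantity"]

# Total revenue per customer over the 6-month window.
revenue_6m = (
    tx_6m.groupby("CustomerID")["Revenue"].sum()
    .rename("6_Month_Revenue")
    .reset_index()
)

# Assumption: keep every customer from the RFM table; no purchase means 0 revenue.
data = rfm.merge(revenue_6m, on="CustomerID", how="left")
data["6_Month_Revenue"] = data["6_Month_Revenue"].fillna(0.0)
print(data)
```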

Then cluster the customers into three classes based on their LTV over those 6 months.

  • LTVCluster is added to the last column.

Let's see what the breakdown looks like for each cluster.
  • LTVCluster: a value from 0 to 2; the lower the number, the lower the LTV
  • count: number of customers in the cluster
  • mean: average LTV in the cluster
In the figure below, we can see that cluster 2 generates an average revenue of £8,222, while cluster 0 generates an average of only £396 in 6 months.
[Figure: count and mean of the 6-month LTV for each LTVCluster]
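
The clustering and the per-cluster breakdown could be reproduced roughly as follows. This is a sketch under assumptions: k-means is applied to 6_Month_Revenue alone, cluster ids are reordered so that 0 is the lowest LTV, and the revenue figures are invented, so the output will not match the figure.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative 6-month revenue per customer (not the real data).
data = pd.DataFrame({
    "CustomerID": range(1, 10),
    "6_Month_Revenue": [0, 50, 120, 900, 1100, 1000, 8000, 8500, 7900],
})

km = KMeans(n_clusters=3, n_init=10, random_state=0)
raw_labels = km.fit_predict(data[["6_Month_Revenue"]])

# k-means ids are arbitrary, so relabel them by ascending cluster center:
# 0 = lowest LTV, 2 = highest LTV.
order = km.cluster_centers_.ravel().argsort()
relabel = {int(cluster): rank for rank, cluster in enumerate(order)}
data["LTVCluster"] = [relabel[int(label)] for label in raw_labels]

# count and mean per cluster, as in the breakdown described above.
summary = data.groupby("LTVCluster")["6_Month_Revenue"].describe()
print(summary[["count", "mean"]])
```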

Now, we will proceed to build the model based on the data we just created.

However, 6_Month_Revenue was used to create LTVCluster, the column we want to predict, so keeping it as a feature would cause data leakage. Therefore, we delete 6_Month_Revenue.
Also, although VARISTA transforms categorical variables automatically, we apply one-hot encoding ourselves to speed up training.
The final teacher data looks like this:
[Figure: final teacher data]
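
In pandas, these two steps might look like the sketch below. It assumes that Segment is the categorical column and that it is stored as strings such as "Rank 1"; the values are illustrative.

```python
import pandas as pd

# Illustrative rows of the merged data.
data = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Recency": [2, 41, 91],
    "Frequency": [3, 2, 1],
    "Revenue": [300.0, 80.0, 10.0],
    "OverallScore": [6, 3, 0],
    "Segment": ["Rank 3", "Rank 2", "Rank 1"],
    "6_Month_Revenue": [8000.0, 900.0, 50.0],
    "LTVCluster": [2, 1, 0],
})

# Drop the column that LTVCluster was derived from, to avoid leakage.
train = data.drop(columns=["6_Month_Revenue"])

# One-hot encode the categorical Segment column.
train = pd.get_dummies(train, columns=["Segment"])
print(train.columns.tolist())

# The table can then be saved for upload, for example:
# train.to_csv("ltv_train.csv", index=False)
```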

Building a Predictive Model

We will use VARISTA's AutoML feature to build a prediction model.
Create a project and upload the data to VARISTA.
Let's visualize the data.
The figure below shows each variable's correlation with LTVCluster. LTVCluster is correlated with Revenue and also appears to be correlated with OverallScore.
[Figure: correlation of each variable with LTVCluster]
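
If you want a rough look at the same relationship before uploading, a pandas equivalent is sketched here. It uses the plain Pearson correlation on invented values; how VARISTA computes the correlation it displays is not stated in this article, so the numbers are not expected to match.

```python
import pandas as pd

# Illustrative teacher data (numeric columns only).
train = pd.DataFrame({
    "Recency": [90, 80, 40, 45, 2, 5],
    "Revenue": [10.0, 20.0, 300.0, 400.0, 900.0, 1000.0],
    "OverallScore": [0, 1, 3, 3, 6, 5],
    "LTVCluster": [0, 0, 1, 1, 2, 2],
})

# Pearson correlation of every feature with the target column.
corr = (
    train.corr()["LTVCluster"]
    .drop("LTVCluster")
    .sort_values(ascending=False)
)
print(corr)
```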

Next, we will use VARISTA's AutoML to build the model.
Select Model in the sidebar and select Create AI Model > Start Training.
At this point, make sure that the column to predict is set to LTVCluster.

After a while, training finishes and the model built by VARISTA is displayed. The data set is not that large, so training completes in about 10-20 minutes.

Checking the prediction model

The overall accuracy is shown as 80.5%, and the accuracy for class 0 is 95.4%, but the accuracy for classes 1 and 2 does not look very good.
VARISTA automatically performs cross-validation using 20% of the training data (the default value).
To improve accuracy further, consider adding more variables to the data or engineering new ones.
Also, classes 1 and 2 have far fewer records than class 0, so if more data can be obtained for them, consider increasing their counts to about the same level as class 0.
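
To see why a high overall accuracy can hide weak minority classes, here is a small sketch that computes the overall and per-class accuracy from a confusion matrix. The labels are made up purely for illustration and are not VARISTA's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up, imbalanced labels: 8 customers in class 0, 3 in class 1, 1 in class 2.
y_true = np.array([0] * 8 + [1] * 3 + [2] * 1)
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1,   # class 0: 7 of 8 correct
                   0, 1, 1,                  # class 1: 2 of 3 correct
                   1])                       # class 2: 0 of 1 correct

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])  # rows = true class

overall = np.trace(cm) / cm.sum()            # share of all predictions that are correct
per_class = cm.diagonal() / cm.sum(axis=1)   # share correct within each true class

print(f"overall: {overall:.3f}")
for label, acc in zip([0, 1, 2], per_class):
    print(f"class {label}: {acc:.3f}")
```

Because class 0 dominates the data, its accuracy pulls the overall figure up even when the smaller classes are predicted poorly.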
In addition to the summary displayed in VARISTA, you can also view detailed information about the training.
The basic functions of VARISTA are available free of charge, so please register from the link below and give it a try!
Also, if you have any questions about data creation, please feel free to contact us via chat on our official website.


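VARISTA is a new platform that lets you develop and manage machine learning models efficiently without writing code.
If you have data, you can get started right away, so please feel free to contact us.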
Made by VARISTA Team.
© COLLESTA, Inc. 2021. All rights reserved.