A tutorial on creating a contract prediction model
This article has been translated from Japanese into English using DeepL.
This article explains how to use VARISTA to build a model to predict telemarketing success.
About the Dataset
In this tutorial, we will use a dataset from a Portuguese banking institution's telemarketing campaigns, covering May 2008 to November 2010, which is available from the UCI Machine Learning Repository.
You may have seen this dataset before as it is often used in tutorials.
This dataset records whether each customer contacted in the telemarketing campaign actually signed up for a term deposit.
The dataset contains 41,188 customer records described by 21 columns (20 input features plus the target column "y"), as shown in Table 1 below.
Download the dataset from the Bank Marketing Data Set page of the UCI Machine Learning Repository.
On that page, go to the Data Folder and download the bank-additional.zip file.
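Before uploading, it can help to confirm that the file matches the description above. The following is a minimal sketch using only Python's standard library. It assumes the CSV is semicolon-delimited with a quoted header row; the helper name `summarize_csv`, the inline sample rows, and the commented file path are illustrative assumptions, not part of the dataset or of VARISTA.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, target="y", delimiter=";"):
    """Return (row_count, column_names, target_counts) for a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    target_counts = Counter()
    row_count = 0
    for row in reader:
        row_count += 1
        target_counts[row[target]] += 1
    return row_count, reader.fieldnames, dict(target_counts)


# Tiny made-up sample in the same layout, so the sketch runs without the download.
sample = io.StringIO(
    '"age";"job";"duration";"y"\n'
    '56;"housemaid";261;"no"\n'
    '41;"technician";1042;"yes"\n'
    '37;"services";226;"no"\n'
)
rows, columns, counts = summarize_csv(sample)
print(rows, columns, counts)

# With the real file (the path is an assumption about where you unzipped it):
# with open("bank-additional/bank-additional-full.csv", newline="") as f:
#     print(summarize_csv(f))
```

Run against the real file, the row count and the number of column names can be compared with the 41,188 records and 21 columns described above.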
Uploading data to VARISTA
Create a new project and upload the data you have downloaded.
Once the data has been analyzed, set the prediction column to "y".
Understanding the data
Use VARISTA's Visualize function to review your data.
Select the data you have uploaded and choose Visualize.
On the Visualize screen, you can see the age distribution, occupation, marital status, and other information.
By selecting Correlation, you can check the correlation between the target variable and each of the features.
The housing and loan features do not seem to have much effect on whether a contract is signed.
Looking at duration, it seems that the longer the call, the more likely the customer is to sign a contract.
![blog tutorial visualize 04](//images.ctfassets.net/8qlu80sl3ynp/37ExVyI6RRmXFxe3lk7EFu/cd9add21ffee7ba1bd8660bdc67542c4/blog_tutorial_visualize_04.png)
As you can see, VARISTA's visualize function allows you to immediately visualize and check the distribution and correlation of your data.
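To make the idea of correlating a feature with the target concrete, here is a small sketch that computes the Pearson correlation between call duration and the outcome encoded as 0 or 1. The measure VARISTA uses in its Correlation view is not stated in this article, so Pearson is an assumption here, and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only: call duration in seconds and outcome.
duration = [60, 120, 200, 450, 800, 1000]
signed = ["no", "no", "no", "yes", "no", "yes"]

# Encode the target as 1 for "yes" and 0 for "no" so it can be correlated.
y = [1 if label == "yes" else 0 for label in signed]

r = pearson(duration, y)
print(round(r, 3))
```

A value near 0 suggests little linear relationship, while a clearly positive value, as in this toy sample, means longer calls tend to go with signed contracts.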
Creating a Predictive Model
Next, we will create a predictive model with AutoML, which is included in VARISTA.
In this case, the task is a binary classification between "contract" and "no contract".
VARISTA will automatically determine whether to create a regression or classification (binary or multi-class) model.
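How VARISTA makes this decision is not described in this article. As a rough illustration of the general idea only, the sketch below guesses the task type from the values in the target column. The function name `infer_task_type`, the `max_classes` threshold, and the rule itself are hypothetical and should not be read as VARISTA's behavior.

```python
def infer_task_type(target_values, max_classes=20):
    """Guess the modelling task from the values in the target column.

    Hypothetical rule of thumb, not VARISTA's documented behaviour.
    """
    distinct = set(target_values)
    # Treat bool separately because bool is a subclass of int in Python.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if len(distinct) == 2:
        return "binary classification"
    if all_numeric and len(distinct) > max_classes:
        return "regression"
    return "multi-class classification"


# The target column "y" in this tutorial holds only "yes" and "no".
print(infer_task_type(["no", "yes", "no", "no"]))
```

Under this rule, a column with exactly two distinct values such as "yes" and "no" is treated as binary classification, which matches the task in this tutorial.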
Select Model from the left menu and click the "Create AI Model" button.
Make sure that bank-additional-full.csv is selected and the column to be predicted is set to "y", then click the "Start Learning" button.
VARISTA will automatically create the model using the AutoML function.
After a while, the training will be completed.
When the training is complete, the model details will be displayed.
Let's check each panel.
The score for the model itself, as calculated by VARISTA, is shown as 65.
The panel shows the overall score, the accuracy for customers predicted to sign a contract, and the accuracy for customers predicted not to sign a contract.
VARISTA does cross-validation when it generates the model, and the results are shown here.
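For readers new to cross-validation, the sketch below shows how k-fold splitting divides rows so that each row is held out for validation exactly once. The number of folds VARISTA uses is not stated here, so `k=5` is an assumption, and `k_fold_indices` is an illustrative helper rather than anything from VARISTA.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(10, k=5))
for train, validation in folds:
    print(validation, "held out;", len(train), "rows used for training")
```

In practice the rows are usually shuffled before folding, because a file ordered by date would otherwise put each period into a single fold.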
The feature with the highest impact on whether a customer signs up for a term deposit is duration.
The percentage of data split for cross-validation and the confusion matrix are also displayed like this.
This model uses 20% of the total data as test data (the percentage can be changed in the training settings).
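A hold-out split like this can be sketched in a few lines. The example below shuffles row indices and reserves 20% of them for testing; the helper name `train_test_split_indices`, the fixed `seed`, and the use of a plain random shuffle are assumptions for illustration, since the article does not say how VARISTA selects the test rows.

```python
import random


def train_test_split_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(41188, test_ratio=0.2)
print(len(train_idx), len(test_idx))
```

With 41,188 rows and a ratio of 0.2, this sketch holds out 8,238 rows and keeps 32,950 for training.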
Checking the confusion matrix, there are 218 cases where the model predicted "no contract" for a customer who actually signed a contract. This is about 23.5% of the customers who actually signed, which is a relatively large share.
The reason is that the data is imbalanced: customers who signed a contract make up only a small share of the records.
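To show how such a figure is read off a confusion matrix, the sketch below counts the four cells for a binary problem and computes the miss rate, that is, the share of actual signers the model predicted as "no". The labels are made up for illustration and are not the tutorial's results; `confusion_counts` and `miss_rate` are illustrative helper names.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["tp" if predicted == positive else "fn"] += 1
        else:
            counts["fp" if predicted == positive else "tn"] += 1
    return counts


def miss_rate(counts):
    """Share of actual positives that the model predicted as negative."""
    return counts["fn"] / (counts["fn"] + counts["tp"])


# Made-up labels for illustration; they are not the tutorial's results.
y_true = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
y_pred = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "yes", "no"]

cells = confusion_counts(y_true, y_pred)
print(dict(cells), miss_rate(cells))
```

If the 23.5% above is read as this miss rate, then 218 missed customers would correspond to roughly 218 / 0.235 ≈ 928 customers in the test data who actually signed. That total is inferred here rather than read from the screen.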
The above is the process of creating a model using VARISTA.
We encourage you to download the dataset and try it out with VARISTA.