What is Feature Engineering?
Feature engineering is the process of using domain knowledge to create new features from the existing features in a dataset.
The goal of feature engineering is to improve the quality of data and the prediction performance, or generalization performance, of machine learning models.
Even when we have data for building a machine learning model, the data in its raw form is often of little use for that purpose.
To create a model with good generalization performance, we need good-quality data as well as an algorithm and parameters suited to that data. The quality of the data in particular has a very large impact on the generalization performance of the resulting model.
Therefore, feature engineering is a very important process.
Examples of Feature Engineering
To illustrate feature engineering, let's assume that we have purchase-history data from a store (figure below).
Using this data, we would like to create a model that predicts which customers will return to the store to make a purchase.
Let's assume that the features included in this data are as follows:
With the dataset in its original state, the predictions we can make are limited.
We therefore combine several features to create a new one. This is feature engineering.
For example, if we combine Manufacturer, Date of Purchase, and Customer ID, we can find out who has purchased the same manufacturer's product multiple times.
This feature is added to the new data as "Did you buy from manufacturer A?"
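The combination of Customer ID, Manufacturer, and Date of Purchase can be sketched in a few lines of Python. This is a minimal illustration, assuming purchases are stored as `(customer_id, manufacturer, purchase_date)` tuples; the records and the helper `repeat_manufacturer_flags` are hypothetical. "Multiple times" is interpreted here as purchases on more than one distinct date, which is one reasonable reading rather than the only one.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (customer_id, manufacturer, purchase_date).
# Values are illustrative only.
purchases = [
    ("C001", "A", date(2023, 1, 5)),
    ("C001", "A", date(2023, 2, 11)),
    ("C001", "B", date(2023, 2, 11)),
    ("C002", "A", date(2023, 1, 20)),
    ("C003", "B", date(2023, 3, 2)),
    ("C003", "B", date(2023, 3, 30)),
]


def repeat_manufacturer_flags(records, manufacturer):
    """Return {customer_id: bool}: did the customer buy from `manufacturer`
    on more than one distinct date? (hypothetical helper)"""
    dates = defaultdict(set)  # customer_id -> distinct purchase dates for this manufacturer
    customers = set()
    for customer_id, maker, purchased_on in records:
        customers.add(customer_id)
        if maker == manufacturer:
            dates[customer_id].add(purchased_on)
    # Customers who never bought from the manufacturer get False.
    return {c: len(dates[c]) > 1 for c in sorted(customers)}


flags = repeat_manufacturer_flags(purchases, "A")
print(flags)  # {'C001': True, 'C002': False, 'C003': False}
```

Counting distinct dates rather than rows keeps two items bought in the same visit from being treated as a repeat purchase.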
You can also combine customer ID, product price, product quantity, and brand to generate the total amount spent as a new feature.
Some customers may not shop often, but when they do, they spend a lot.
This feature can be an important piece of information for understanding customer loyalty.
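A total-amount-spent feature is simply the sum of price × quantity, grouped by customer (and optionally by brand). The sketch below assumes rows of the form `(customer_id, brand, unit_price, quantity)`; the data and the function name `total_spent` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, brand, unit_price, quantity).
purchases = [
    ("C001", "X", 1200, 2),
    ("C001", "Y", 300, 1),
    ("C002", "X", 1200, 5),
    ("C003", "Y", 300, 3),
]


def total_spent(records):
    """Total amount spent per customer and per (customer, brand)."""
    per_customer = defaultdict(int)
    per_customer_brand = defaultdict(int)
    for customer_id, brand, unit_price, quantity in records:
        amount = unit_price * quantity  # amount for this line item
        per_customer[customer_id] += amount
        per_customer_brand[(customer_id, brand)] += amount
    return dict(per_customer), dict(per_customer_brand)


by_customer, by_customer_brand = total_spent(purchases)
print(by_customer)  # {'C001': 2700, 'C002': 6000, 'C003': 900}
```

The per-brand breakdown keeps the brand information that the text mentions, while the per-customer total captures the "infrequent but high-spending" pattern.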
Merging the created features with the original features gives data like the following.
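Merging amounts to a left join on the customer ID. Here is a minimal sketch, assuming original rows are dicts and engineered features are stored per customer; all names and values (`merge_features`, `total_spent`, `repeat_buyer_A`) are hypothetical.

```python
def merge_features(rows, features_by_customer):
    """Left-join per-customer engineered features onto the original rows.

    Rows whose customer_id has no engineered features keep only their
    original columns. On a name clash, the engineered value wins.
    """
    merged = []
    for row in rows:
        extra = features_by_customer.get(row["customer_id"], {})
        merged.append({**row, **extra})  # original columns first, then new ones
    return merged


# Hypothetical original rows and engineered features keyed by customer_id.
original_rows = [
    {"customer_id": "C001", "brand": "X", "price": 1200, "quantity": 2},
    {"customer_id": "C002", "brand": "X", "price": 1200, "quantity": 5},
]
engineered = {
    "C001": {"total_spent": 2700, "repeat_buyer_A": True},
    "C002": {"total_spent": 6000, "repeat_buyer_A": False},
}

for row in merge_features(original_rows, engineered):
    print(row)
```

In practice a dataframe library would do this join, but the logic is the same: every original row keeps its columns and gains the engineered ones for its customer.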
In this way, we combine existing features into new ones, build a model, and check its prediction accuracy.
If the prediction accuracy is not sufficient, we perform feature engineering again.
Flow of feature engineering
1. Make a hypothesis about what kind of features should be present.
2. Create the features.
3. Build the model and check its generalization performance. If it does not perform as you want, go back to step 1.
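The three-step cycle can be written as a loop over candidate feature sets. This is a sketch under stated assumptions: `feature_engineering_loop` and `toy_evaluate` are hypothetical, and `toy_evaluate` is a placeholder that merely rewards larger feature sets. A real evaluator would train a model and return a validation score such as cross-validated accuracy.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def feature_engineering_loop(
    hypotheses: Sequence[Sequence[str]],
    evaluate: Callable[[List[str]], float],
    target_score: float,
) -> Tuple[Optional[List[str]], float, List[Tuple[List[str], float]]]:
    """Try candidate feature sets in order until one reaches target_score.

    Each entry in `hypotheses` is a feature set proposed in step 1.
    `evaluate` stands in for steps 2 and 3: it should build the features,
    train a model, and return a validation score (higher is better).
    """
    history: List[Tuple[List[str], float]] = []
    best_features: Optional[List[str]] = None
    best_score = float("-inf")
    for candidate in hypotheses:
        features = list(candidate)
        score = evaluate(features)
        history.append((features, score))
        if score > best_score:
            best_features, best_score = features, score
        if score >= target_score:
            break  # performance is good enough; stop iterating
    return best_features, best_score, history


# Placeholder evaluator, NOT a real metric: it just rewards more features.
def toy_evaluate(features: List[str]) -> float:
    return len(features) / 4


candidates = [
    ["price", "quantity"],
    ["price", "quantity", "total_spent"],
    ["price", "quantity", "total_spent", "repeat_buyer_A"],
]
best, score, history = feature_engineering_loop(candidates, toy_evaluate, target_score=0.75)
print(best, score)  # ['price', 'quantity', 'total_spent'] 0.75
```

Keeping a history of every attempt makes the process reproducible and shows which hypotheses actually helped.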
Why is feature engineering important?
Hyperparameter optimization of the algorithm is one way to improve generalization performance, but in many cases feature engineering has a larger impact on generalization performance than hyperparameter optimization does.
Automating Feature Engineering
Feature engineering is a very large part of the workflow of creating a machine learning model.
It is often estimated that about 70% of the work is spent on this task.
Therefore, automating this task can be beneficial in many cases as it increases efficiency and reproducibility.
However, it is important to note that automation is not necessarily a panacea.
Automated feature engineering is a supporting tool, not a replacement for the data scientist: the final decision on which features to adopt should be made by a data scientist with domain knowledge.
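One simple form of automation is to generate candidate features mechanically, for example every pairwise product and ratio of numeric columns, and leave the selection to a person. The sketch below assumes a single row stored as a dict; `generate_candidates` and the column names are hypothetical. Dedicated tools such as Featuretools apply the same idea with richer building blocks.

```python
from itertools import combinations


def generate_candidates(row, numeric_columns):
    """Automatically generate candidate features: pairwise products and ratios.

    Ratios are skipped when the denominator is zero.
    """
    candidates = {}
    for a, b in combinations(numeric_columns, 2):
        candidates[f"{a}*{b}"] = row[a] * row[b]
        if row[b] != 0:
            candidates[f"{a}/{b}"] = row[a] / row[b]
    return candidates


# Hypothetical row; "visits" is an assumed column for illustration.
row = {"price": 1200, "quantity": 2, "visits": 4}
print(generate_candidates(row, ["price", "quantity", "visits"]))
# {'price*quantity': 2400, 'price/quantity': 600.0, 'price*visits': 4800,
#  'price/visits': 300.0, 'quantity*visits': 8, 'quantity/visits': 0.5}
```

Even three columns yield six candidates, and the count grows quadratically, which is why a data scientist with domain knowledge still has to decide which ones are meaningful.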