Replacing categories with counts or the frequency of observations
In count or frequency encoding, we replace the categories with the count or the fraction of observations showing that category. That is, if 10 out of 100 observations show the blue category for the Color variable, we would replace blue with 10 when doing count encoding, or with 0.1 when performing frequency encoding. These encoding methods are useful when there is a relationship between the category frequency and the target. For example, in sales, the frequency of a product may indicate its popularity.
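To make the blue example concrete, here is a minimal sketch using pandas on a made-up dataset. The DataFrame, the red and green categories, and the Color_count and Color_freq column names are assumptions introduced only for illustration; they are not part of the recipe's data.

```python
import pandas as pd

# Hypothetical toy data: 100 observations, 10 of which show "blue".
df = pd.DataFrame({"Color": ["blue"] * 10 + ["red"] * 60 + ["green"] * 30})

# Count encoding: map each category to its number of observations.
count_map = df["Color"].value_counts().to_dict()
df["Color_count"] = df["Color"].map(count_map)

# Frequency encoding: map each category to its fraction of observations.
freq_map = df["Color"].value_counts(normalize=True).to_dict()
df["Color_freq"] = df["Color"].map(freq_map)

print(df.drop_duplicates("Color"))
```

Here, blue is replaced by 10 in the count-encoded column and by 0.1 in the frequency-encoded column, matching the description above.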
Note
If two different categories are present in the same number of observations, they will be replaced by the same value, which may lead to information loss.
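The collision described in this note can be checked before encoding. The following is a sketch on hypothetical data, where blue and red happen to appear the same number of times; the variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data: "blue" and "red" both appear 3 times.
df = pd.DataFrame({"Color": ["blue"] * 3 + ["red"] * 3 + ["green"] * 4})

# Number of observations per category.
counts = df["Color"].value_counts()

# Count-encode the variable by mapping each category to its count.
encoded = df["Color"].map(counts)

# Categories that share their count with at least one other category
# become indistinguishable after encoding.
collisions = counts[counts.duplicated(keep=False)]

print(collisions)
print(df["Color"].nunique(), "categories ->", encoded.nunique(), "encoded values")
```

If collisions is not empty, the encoded variable has fewer distinct values than the original, which is the information loss the note refers to.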
In this recipe, we will perform count and frequency encoding using pandas and feature-engine.
How to do it...
We’ll start by encoding one variable with pandas and then we’ll automate the process...