Wrapper Method of Feature Selection

You must have often come across big datasets with a huge number of variables and felt clueless about which variables to keep and which to ignore while training a model. In today’s world, most of the data that we deal with is high dimensional. And this high dimensionality (a large number of columns) more often than not proves to be a curse for the performance of machine learning models, because more variables do not always add more discriminative power for inferring the target variable; rather, they can make the model overfit. The presence of irrelevant or redundant features may also add noise, reducing model performance, apart from increasing computation time. This is called the curse of dimensionality. To solve this problem, we perform feature reduction to arrive at an optimal number of features for training the model, based on certain criteria.

Feature reduction is further subdivided into feature selection and feature extraction. Feature extraction is out of the scope of this article, so we will focus on feature selection. Feature selection methods can be optimal, heuristic or randomized, depending on the search approach. And based upon how the feature subsets are evaluated, we can divide them into two types:

  1. Unsupervised or Filter method.
  2. Supervised or Wrapper method.

Here, we will explore the wrapper method of feature subset evaluation. The wrapper method of feature selection falls under the heuristic or greedy feature search approach. It greedily searches through possible feature subset combinations and tests each one against the evaluation criterion of the specific ML algorithm. The evaluation criterion is nothing but the performance metric of the specific model. For example, in classification the evaluation criterion can be accuracy, precision, recall, F1 score etc., and for regression it can be R-squared, adjusted R-squared etc.

This is how the wrapper method of feature selection works: the candidate feature subsets are evaluated by the model itself. The wrapper method is computationally more intensive than the filter method, but it can fine-tune the feature subset to the specific model. That is why the wrapper method of feature selection is a popular way of combating the curse of dimensionality in machine learning. Stepwise regression, a popular form of feature selection in traditional regression analysis, also follows a greedy wrapper-style search.

The wrapper method of feature selection can be further divided into three categories: forward selection, backward selection and exhaustive selection.

Let’s implement the wrapper method in Python to understand better how this works. For that, I will consider the Wine dataset, which contains 14 numeric columns (13 features plus the class label) and is available on Kaggle.

To implement the wrapper method of feature selection, we will be using a Python library called mlxtend. To install this library, you can simply type the following line in the Anaconda command prompt.
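    pip install mlxtend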

Then, import the necessary libraries.
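A typical set of imports for this walkthrough could look like the following (the exact list depends on which of the later steps you run):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from mlxtend.feature_selection import SequentialFeatureSelector as SFS
    from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS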


Next, let’s import the data.
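Assuming the downloaded CSV is saved locally as wine.csv with the class label in a column named 'Class' (both names are assumptions; adjust them to your copy of the file), loading it could look like this:

    # Load the Wine dataset; the file and column names are assumptions
    df = pd.read_csv('wine.csv')
    X = df.drop('Class', axis=1)   # the 13 numeric feature columns
    y = df['Class']                # the wine class label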

Now let’s implement the wrapper method step by step.

Step Forward Feature Selection

In step forward feature selection, one feature is selected in the first step against the evaluation criterion; then combinations of 2 features (which include the first selected feature) are evaluated, and this process goes on till the specified number of features is selected. At each step, the feature that yields the highest model performance is added.

Now, let’s have a look at how we can implement it in Python. But before that, we will need to preprocess the data.

Data Preprocessing

In preprocessing, we will first split the data into train and test sets and check the shapes of the split data.
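A simple 80/20 split reproduces shapes like the ones reported below (the random_state value is an arbitrary choice):

    # Split the data into train and test sets (roughly 80/20)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    print(X_train.shape, X_test.shape)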

So, we have the X_train data shape as (142,13) and X_test data shape as (36,13).

Now, we will remove all the columns having an absolute correlation greater than 0.8 with another column in the X_train data. For that, we will write a custom function and then call it on the X_train data.
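One possible way to write such a helper is sketched below; it collects one column from every pair whose absolute correlation exceeds the threshold and then drops those columns:

    def correlated_columns(data, threshold=0.8):
        # Return a set of columns to drop, one from each highly correlated pair
        corr_matrix = data.corr().abs()
        to_drop = set()
        for i in range(len(corr_matrix.columns)):
            for j in range(i):
                if corr_matrix.iloc[i, j] > threshold:
                    to_drop.add(corr_matrix.columns[i])
        return to_drop

    drop_cols = correlated_columns(X_train, 0.8)
    X_train = X_train.drop(columns=drop_cols)
    X_test = X_test.drop(columns=drop_cols)

Which column of a correlated pair gets dropped depends on the column order, so your run may name a different column than the one reported here.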

The only independent column having correlation higher than 0.8 is “Flavonoids”, so we will remove that.

Now, we will implement the step forward feature selection code. For that, we will use the SequentialFeatureSelector class from the mlxtend library. We shall use the Random Forest Classifier as the estimator, and the evaluation criterion will be ROC-AUC. We will select the best 8 features.
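A sketch of the selector definition is shown below. The cv value and the number of trees are assumptions, since the original code is not reproduced here; also note that the plain 'roc_auc' scorer assumes a binary target, so for the three-class Wine label a multiclass variant such as 'roc_auc_ovr' may be needed:

    sfs_forward = SFS(RandomForestClassifier(n_estimators=100, n_jobs=-1),
                      k_features=8,
                      forward=True,
                      verbose=2,
                      scoring='roc_auc',
                      cv=5,
                      n_jobs=-1)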

Here, n_jobs stands for the number of cores used for execution (-1 means all cores), k_features is the number of features we want (8 here), forward=True means we are doing step forward feature selection, verbose controls the logging of the feature selector's progress, scoring defines the evaluation criterion, and cv is the number of cross-validation folds. Now that we have defined our feature selector model, let's fit it on the training dataset.
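Fitting the selector could then look like this:

    sfs_forward = sfs_forward.fit(X_train, y_train)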

Now let's look at which 8 features were selected by the step forward wrapper method.
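The selected column names can be read from the fitted selector's attributes, for example:

    print(sfs_forward.k_feature_names_)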

The output generated is:

Step Backward Feature Selection

It follows a backward, step-by-step feature elimination approach to select the specified number of features. The preprocessing and the code are the same as for forward selection; we only need to specify forward=False in place of forward=True while implementing backward feature selection, as shown in the sketch below.
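A sketch mirroring the forward selector, with only the forward flag changed (again, cv and the number of trees are assumptions):

    sfs_backward = SFS(RandomForestClassifier(n_estimators=100, n_jobs=-1),
                       k_features=8,
                       forward=False,
                       verbose=2,
                       scoring='roc_auc',
                       cv=5,
                       n_jobs=-1)
    sfs_backward = sfs_backward.fit(X_train, y_train)
    print(sfs_backward.k_feature_names_)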

The output that we obtain is:

Exhaustive Feature Selection

In this method, the best subset of features is selected from all the possible feature subsets. Suppose there are 10 features in a dataset and we want the top 4 features; then the algorithm will evaluate all the possible 4-feature combinations (10C4 = 210 of them) and ultimately settle on the best feature subset. The number of combinations is given by the formula nCr = n!/(r!(n-r)!). It is the most exhaustive search among all the wrapper methods since it tries every combination of feature subsets and selects the best one. A downside of the exhaustive feature selection method is that it is very slow in comparison to the other two methods.

Now, let's see how to implement this feature selection method in Python. To implement this, we will be using the ExhaustiveFeatureSelector class of the mlxtend library. In this class, we can specify the minimum and maximum feature subset size we want to evaluate. Also, I will change the evaluation criterion (scoring) to accuracy here, just for a change. I will also set the cross-validation folds (cv) to None, because exhaustive feature selection is computationally very expensive and would otherwise take a long time to execute.

 The preprocessing will remain the same. Here is the code snippet and the corresponding output we will get for the exhaustive feature selector model training.
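A sketch of the exhaustive selector under those settings. The min/max subset sizes of 4 and 5 follow the subset count discussed below; like the other unstated details, they are assumptions:

    efs = EFS(RandomForestClassifier(n_estimators=100, n_jobs=-1),
              min_features=4,
              max_features=5,
              scoring='accuracy',
              cv=None,      # no cross-validation, to keep the run time manageable
              n_jobs=-1)
    efs = efs.fit(X_train, y_train)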

Here, the model is defined and fitted on the X_train and y_train data. n_jobs (the number of cores used for execution) is kept as -1 (meaning it will use all the CPU cores), and n_estimators is kept as 100.

Now, let’s look at what resultant output we get.

Here, we can see the number of feature subsets trained by the model. A total of 1287 feature subsets have been trained one by one to select the best one. We can also calculate this number mathematically: after dropping the correlated column, the training data has 12 columns, out of which all subsets of size 4 and size 5 are evaluated, so 12C4 + 12C5 = 495 + 792 = 1287. Now, we will see the best features selected through this method.
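The best subset and its score can then be read from the fitted selector, for example:

    print(efs.best_feature_names_)
    print(efs.best_score_)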

Output:

Recursive Feature Elimination

The scikit-learn library provides a class called RFE (recursive feature elimination) in its feature_selection module. In this wrapper method of feature selection, the model is first trained with all the features, and a weight gets assigned to each feature through an estimator (e.g., the coefficients of a linear model). Then, the least important features get pruned from the current set of features. That procedure is repeated recursively on the pruned set until the desired number of features is eventually reached. Scikit-learn also offers a class called RFECV (recursive feature elimination with cross-validation), which performs recursive elimination in a cross-validation loop to find the optimal number of features.

The RFE class and its parameters are documented in the official scikit-learn documentation under sklearn.feature_selection.RFE.
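A minimal usage sketch of RFE with a random forest estimator is given below; the choice of estimator and the number of features to keep are assumptions for illustration:

    from sklearn.feature_selection import RFE

    rfe = RFE(estimator=RandomForestClassifier(n_estimators=100),
              n_features_to_select=8,
              step=1)
    rfe = rfe.fit(X_train, y_train)
    print(X_train.columns[rfe.support_])   # names of the selected features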

Conclusion

So, in this blog we have learnt in detail about the wrapper method of feature selection, a very common technique used widely when building machine learning models. We implemented the step forward, step backward and exhaustive feature selection techniques in Python, and also learnt about the recursive feature elimination technique. All of these fall under the wrapper method of feature selection. Intuitively speaking, we can use the step forward and backward selection methods when the dataset is very large, whereas for a small dataset we can go for the exhaustive feature selection method. Recursive feature elimination is a good choice for classification problems. I hope this blog is beneficial and informative for the readers.
