The decision tree algorithm is one of the most versatile algorithms in machine learning: it can perform both **classification** and **regression** analysis, and when coupled with **ensemble techniques** it performs even better. The algorithm works by dividing the dataset into a *tree-like structure* based on rules and conditions, and then makes predictions by following those conditions. It supports **predictive modeling** with *higher accuracy*, *better stability*, and *ease of interpretation*.
We will use the Iris dataset to get a better understanding of the concept and the process.

Generally, decision trees carry a high risk of overfitting: the model becomes very complex as its *depth* and its *number of splits* grow, which increases its variance. This reduces the training error, but predictions on new data points are relatively poor. This is where the pruning process comes to the rescue.
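To make this concrete, here is a minimal sketch of the setup, using scikit-learn's built-in Iris dataset with an assumed 70/30 train/test split (the article's exact code and split are not shown). A fully grown tree fits the training data perfectly; note that on a dataset as small and clean as Iris the train/test gap may be modest, while on noisier data it is usually pronounced.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load Iris and hold out 30% of the rows for testing (assumed split).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Unpruned tree: grown until every leaf is pure.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # memorizes the training set
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```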

We can clearly see that the model overfits when an unpruned decision tree is fitted: there is a sufficient difference between the training and testing accuracy, and hence the tree needs to be pruned before we rely on the model.

In general, pruning is the process of removing selected parts of a plant, such as buds, branches, or roots. Similarly, **decision tree pruning** trims a full tree down to reduce the complexity and variance of the model. It makes the decision tree versatile enough to adapt to new data fed to it, thereby fixing the problem of overfitting. Pruning reduces the size of the decision tree, which might slightly increase the training error but drastically decrease the testing error, making the model more adaptable.

The example above clearly depicts the difference between an unpruned and a pruned tree. The unpruned tree is denser and more complex, with high variance, and overfits the data; the pruned tree is optimally dense and less complex, with reduced variance and better accuracy on unseen data points.
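The difference in size can be seen directly from the fitted estimators. Below is a hedged sketch comparing an unpruned tree with a depth-limited tree on Iris; the `max_depth=3` choice is purely illustrative, not a recommendation from the article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree vs. a tree pre-pruned by capping its depth.
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print("unpruned:", unpruned.get_depth(), "levels,",
      unpruned.get_n_leaves(), "leaves")
print("pruned:  ", pruned.get_depth(), "levels,",
      pruned.get_n_leaves(), "leaves")
```

`plot_tree` from `sklearn.tree` can be used to visualize both trees side by side, which is the kind of figure the comparison above refers to.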

Tree pruning is generally performed in two ways – by **Pre-pruning** or by **Post-pruning**.

## Pre-pruning

Pre-pruning, also known as forward pruning, stops non-significant branches from being generated. This technique is applied during the construction of the decision tree: it uses a stopping condition to decide when to terminate the splitting of some branches prematurely as the tree is generated.

**Hyperparameter tuning** can be used to find the best-fit values for parameters like *`max_depth`*, *`min_samples_leaf`*, *`min_samples_split`*, etc.
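One common way to do this tuning is a cross-validated grid search. The sketch below uses `GridSearchCV` over the three parameters named above; the grid values and the train/test split are illustrative assumptions, not the article's exact configuration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Candidate stopping conditions for pre-pruning (illustrative grid).
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 2, 5],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```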

The pre-pruned tree definitely shows some improvement in test accuracy, but there is still scope for more.

## Post-pruning

Post-pruning, also known as backward pruning, is the process where the decision tree is generated first and then the non-significant branches are removed. This technique is applied after the construction of the decision tree, and it is useful when the tree has a very large or effectively unbounded depth and the model overfits. In pre-pruning we used parameters like *`max_depth`* and *`min_samples_split`*, but here we prune the branches of the decision tree using the **cost-complexity pruning** technique.

**ccp_alpha**, the cost-complexity parameter, parameterizes this pruning technique: greater values of `ccp_alpha` prune away more of the tree.
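scikit-learn exposes the candidate alpha values directly via `cost_complexity_pruning_path`. A minimal sketch, assuming the same Iris train/test split as before:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The pruning path yields the effective alphas (from 0, the full tree,
# upward) and the total leaf impurity of the subtree at each alpha.
clf = DecisionTreeClassifier(random_state=42)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

print("candidate alphas:", ccp_alphas)
```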

After fitting one tree for each candidate alpha and recording its accuracy, we plot an Accuracy-vs-alpha graph to find the value of alpha that gives the maximum test accuracy while retaining good training accuracy.
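The loop behind that graph might look like the following sketch (same assumed Iris split as before); the accuracy lists collected here are exactly what one would plot against the alphas.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)

# One tree per candidate alpha; record train and test accuracy for each.
train_scores, test_scores = [], []
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    clf.fit(X_train, y_train)
    train_scores.append(clf.score(X_train, y_train))
    test_scores.append(clf.score(X_test, y_test))

# To reproduce the Accuracy-vs-alpha plot, e.g. with matplotlib:
# plt.plot(path.ccp_alphas, train_scores, label="train")
# plt.plot(path.ccp_alphas, test_scores, label="test")
```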

We can choose **ccp_alpha = 0.05**, as we get the maximum **Test Accuracy = 0.93** along with optimum train accuracy with it. Although our **Train Accuracy** has decreased to **0.96**, our model is now more generalized and will perform better on unseen data.

**We can now see that our model is no longer overfitting, and its performance on test data has improved considerably.**

Also, it can be inferred that:

- Pruning plays an important role in fitting models using the Decision Tree algorithm.
- Post-pruning is more efficient than pre-pruning.
- Selecting the correct value of ccp_alpha is the key factor in the post-pruning process.
- Hyperparameter tuning is an important step in the Pre-pruning process.

Code: available on GitHub.

Date: 16/05/2021