All Metrics you need to know for becoming a Data Scientist

When building a model, the foremost priority is to improve the model's performance.

Evaluation metrics are used to score a model and to find out under which circumstances it performs best. They measure the quality of a statistical or machine learning model, and choosing the right one is what makes a model's performance meaningful for the business. Common examples include classification accuracy, logarithmic loss and the confusion matrix. There are several such metrics and scoring parameters, and we will go through them in detail below.

1. Classification Accuracy

  • It is the most common evaluation metric used for classifier models.
  • It summarizes the performance of a classification model as the ratio of the number of correct predictions to the total number of predictions.
  • It is easy to calculate.
  • It is intuitive to understand.
  • It can be misleading when the data is severely skewed or imbalanced: a model that always predicts the majority class can still score highly.
  • It can be implemented using sklearn’s ‘accuracy_score’ method.
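Here is a minimal from-scratch sketch of the accuracy calculation. It is written in plain Python rather than with sklearn so that the arithmetic is visible. The function name `accuracy` and the toy labels are illustrative assumptions. The example also demonstrates the imbalance problem described above.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical imbalanced data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class 0.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95, despite never finding a positive
```

For plain label lists, sklearn's `accuracy_score(y_true, y_pred)` should return the same value.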

2. Logarithmic Loss

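$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

where N = number of samples, M = number of classes,

$y_{ij}$ = 1 if sample i belongs to class j, and 0 otherwise,

$p_{ij}$ = predicted probability of sample i belonging to class j.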

  • Log loss penalizes false classifications, and it penalizes confident wrong predictions especially heavily.
  • It works well for multiclass classification.
  • It has no upper bound, i.e. its range is [0, ∞).
  • A value of 0 corresponds to perfect predictions. The higher the value, the worse the predicted probabilities.
  • It can be misleading on imbalanced datasets, because the average loss is dominated by the majority class.
  • It can be implemented using sklearn’s ‘log_loss’ method.
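The following sketch implements the log loss formula above for integer class labels. Because $y_{ij}$ is 1 only for the true class, only the true-class probability contributes for each sample. The function name `multiclass_log_loss` and the clipping constant `eps` (which avoids log(0)) are assumptions made for illustration.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true: list of integer class indices 0..M-1
    probs:  list of per-sample probability lists (each summing to 1)
    eps:    illustrative clipping constant so that log(0) never occurs
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Only the j = true class term of the inner sum is non-zero.
        p_true = min(max(p[label], eps), 1 - eps)
        total += -math.log(p_true)
    return total / len(y_true)


y_true = [0, 1, 2]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
# Same as above, but the first sample is confidently assigned to the wrong class.
wrong = [[0.05, 0.9, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]

print(multiclass_log_loss(y_true, confident))  # about 0.145
print(multiclass_log_loss(y_true, wrong))      # about 1.108
```

A single confident mistake raises the loss by almost an order of magnitude, which is the heavy penalty on confident wrong predictions.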

3. Confusion Matrix

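  • A confusion matrix describes the complete performance of a classifier by tabulating the actual classes against the predicted classes.
  • For binary classification it contains the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Metrics such as precision, recall and F1 Score are derived from these counts.
  • It can be implemented using sklearn’s ‘confusion_matrix’ method.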
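This sketch builds a confusion matrix by counting (actual, predicted) pairs. It uses the same layout as sklearn's `confusion_matrix`: rows are actual classes and columns are predicted classes. The function name `confusion` and the toy labels are hypothetical.

```python
def confusion(y_true, y_pred, labels):
    """Count (actual, predicted) pairs; rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1]
cm = confusion(y_true, y_pred, labels=[0, 1])
(tn, fp), (fn, tp) = cm  # unpack the 2x2 binary layout

print(cm)  # [[2, 1], [1, 3]]
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```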

4. F1 Score

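  • F1 Score is the harmonic mean of precision and recall: F1 = 2 × Precision × Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
  • It indicates how precise the classifier is (how many of its positive predictions are correct) and how robust it is (how many of the actual positives it finds).
  • It uses the harmonic mean (H.M.) rather than the arithmetic mean (A.M.) because the former punishes extreme values more.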
  • Its range is [0, 1].
  • It is easy to explain to business stakeholders, which in many cases can be a deciding factor.
  • It can be implemented using sklearn’s ‘f1_score’ method.
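This sketch computes precision, recall and F1 for a binary problem. The example is chosen so that precision is perfect but recall is poor, which shows how the harmonic mean punishes the extreme value. The function name `precision_recall_f1` and the data are illustrative assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# 10 positives, 10 negatives; the model flags only one sample, which is correct.
y_true = [1] * 10 + [0] * 10
y_pred = [1] + [0] * 19

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)  # 1.0 0.1 0.18..., whereas (p + r) / 2 would be 0.55
```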

5. ROC - AUC

  • The Receiver Operating Characteristic (ROC) curve is a probability curve. The area under it (AUC) measures the model's ability to distinguish correctly between the classes: it equals the probability that the model ranks a randomly chosen positive sample above a randomly chosen negative one.
  • ROC-AUC is used for binary classification and represents the degree of separability between the classes.
  • It shows the performance of a classification model at all classification thresholds.
  • The AUC score is the area under the plot of True Positive Rate (TPR) against False Positive Rate (FPR).
  • The ROC curve plots TPR (sensitivity) on the y-axis against FPR (1 - specificity) on the x-axis.
  • The greater the ROC-AUC, the better the model separates the classes.
  • Its range is [0, 1], and a value of 0.5 corresponds to random guessing.
  • It can be implemented using sklearn’s ‘roc_auc_score’ method.
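This sketch computes ROC-AUC without plotting the curve. It uses the probabilistic interpretation above: AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counting as half. The function name `roc_auc` is hypothetical. The quadratic pair loop is for clarity only and is not efficient for large datasets.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```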

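7. Mean Absolute Error (MAE)

$$\text{MAE} = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \hat{y}_j \right|$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value, and n = number of samples.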

  • It measures the average magnitude of the errors, i.e. the mean absolute difference between the actual and predicted values.
  • It measures accuracy for continuous variables.
  • The lower the value, the better the model’s performance.
  • It can be implemented using sklearn’s ‘mean_absolute_error’ method.
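This is a minimal sketch of the MAE formula in plain Python. The function name `mae` and the sample values are illustrative assumptions.

```python
def mae(y_true, y_pred):
    """Mean of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))  # 0.5  (errors 0.5, 0.5, 0.0, 1.0 averaged)
```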

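8. Mean Absolute Percentage Error (MAPE)

$$\text{MAPE} = \frac{100\%}{n} \sum_{j=1}^{n} \left| \frac{y_j - \hat{y}_j}{y_j} \right|$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value, and n = number of samples.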

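  • It measures the average absolute error expressed as a percentage of the actual values.
  • It is one of the most popular metrics for evaluating forecasting performance.
  • The lower the MAPE, the better the model fits the data.
  • It is undefined when an actual value is 0, and unstable when actual values are close to 0.
  • It can be implemented using sklearn’s ‘mean_absolute_percentage_error’ method (available since scikit-learn 0.24). That method returns a fraction, so multiply the result by 100 to express it as a percentage.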
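Here is a sketch of the MAPE formula that returns a percentage directly. Unlike sklearn, which returns a fraction, this sketch refuses zero actual values instead of producing a huge number. That choice, the function name `mape` and the data are assumptions.

```python
def mape(y_true, y_pred):
    """MAPE in percent; raises if any actual value is zero (the formula divides by it)."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)


y_true = [100.0, 200.0, 50.0]
y_pred = [110.0, 180.0, 50.0]
print(mape(y_true, y_pred))  # about 6.67 (errors of 10%, 10% and 0%)
```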

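9. Mean Squared Error (MSE)

$$\text{MSE} = \frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value, and n = number of samples.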

  • MSE is the average of the squared differences between the actual and predicted values.
  • Because the errors are squared, large errors are penalized far more than small ones, so MSE is very sensitive to outliers. It is most useful when large errors are particularly undesirable.
  • It can be implemented using sklearn’s ‘mean_squared_error’ method.
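This sketch shows why squaring makes MSE sensitive to outliers. Both prediction sets below have the same MAE of 1.0, but the one with a single large error gets four times the MSE. The function name `mse` and the data are illustrative assumptions.

```python
def mse(y_true, y_pred):
    """Mean of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 10.0, 10.0, 10.0]
spread_out = [11.0, 9.0, 11.0, 9.0]      # four errors of size 1
one_outlier = [10.0, 10.0, 10.0, 14.0]   # a single error of size 4

print(mse(y_true, spread_out))   # 1.0
print(mse(y_true, one_outlier))  # 4.0
```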

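10. Root Mean Squared Error (RMSE)

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2}$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value, and n = number of samples.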

  • It is the square root of the average of the squared differences between the actual and predicted values, i.e. the square root of MSE.
  • It is a standard way of measuring the error of a model that predicts quantitative data, and it is expressed in the same units as the target.
  • It resembles the formula for Euclidean distance and hence never takes negative values.
  • It is commonly used in climatology, forecasting, and regression analysis to verify experimental results.
  • It can be implemented by taking the square root of the result of sklearn’s ‘mean_squared_error’ method.
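This is a minimal sketch of RMSE as the square root of MSE, mirroring the "square root of `mean_squared_error`" approach described above. The function name `rmse` and the data are illustrative assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(rmse(y_true, y_pred))  # about 0.612, in the same units as the target
```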

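11. Root Mean Squared Logarithmic Error (RMSLE)

$$\text{RMSLE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( \log(1 + y_j) - \log(1 + \hat{y}_j) \right)^2}$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value, and n = number of samples.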

  • It is the square root of the average of the squared differences between the logarithms of the actual values and the logarithms of the predicted values (each shifted by 1).
  • Since log(a) - log(b) = log(a/b), it effectively measures the ratio between the true and the predicted values rather than their difference.
  • It penalizes underestimates more than overestimates of the same size, introducing an asymmetry in the error curve.
  • It can be used in regression problems where we don’t want large errors to be penalized significantly more than small errors, for example when the target spans several orders of magnitude.
  • It can be implemented by taking the square root of the result of sklearn’s ‘mean_squared_log_error’ method.
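This sketch of RMSLE uses `math.log1p`, which computes log(1 + x) as in the formula above, and demonstrates the asymmetry: with the same absolute error of 50, the underestimate scores worse than the overestimate. Rejecting negative values is an assumption that matches the log(1 + x) form, and the function name `rmsle` is hypothetical.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared error between log(1 + actual) and log(1 + predicted)."""
    if min(y_true) < 0 or min(y_pred) < 0:
        raise ValueError("RMSLE expects non-negative values")
    total = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


# Same absolute error of 50, once below and once above the actual value of 100.
under = rmsle([100.0], [50.0])
over = rmsle([100.0], [150.0])
print(under, over)  # about 0.683 vs about 0.402: the underestimate costs more
```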

12. Gini Coefficient or Gini Index (G.I.)

$$G = 2 \cdot \text{AUC} - 1$$

  • It is the ratio of the area between the model's curve and the random-model line (A) to the area between the perfect-model curve and the random-model line (A + B), i.e. G = A / (A + B).
  • It is used in classification problems.
  • The Gini Coefficient has the same pros and cons as the ROC-AUC metric.
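This small sketch applies G = 2 · AUC - 1 to show how the Gini coefficient rescales the AUC range: a random model (AUC 0.5) maps to 0, a perfect model maps to 1, and a worse-than-random model becomes negative. The function name `gini_from_auc` is hypothetical.

```python
def gini_from_auc(auc):
    """Gini coefficient from a ROC-AUC value: G = 2 * AUC - 1."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    return 2 * auc - 1


for auc in (0.5, 0.75, 1.0):
    print(auc, gini_from_auc(auc))  # 0.0, 0.5 and 1.0 respectively
```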

13. R-squared

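$$R^2 = 1 - \frac{\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2}{\sum_{j=1}^{n} \left( y_j - \bar{y} \right)^2}$$

where $y_j$ = actual value, $\hat{y}_j$ = predicted value,

$\bar{y}$ = mean of the actual values, and n = number of samples.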

  • It gives information about the goodness of fit of a model.
  • It describes how well the regression predictions approximate the real data points.
  • It measures the proportion of the variation in dependent variable(label) explained by the independent variables(features) for a linear regression model.
  • An R² of 1 indicates that the regression predictions made by the model perfectly fit the data.
  • Its drawback is that it implicitly assumes every variable helps explain the variation in the target, which might not always be true.
  • When features are added to a least-squares linear model, its R² on the training data either remains the same or increases, but it never decreases.
  • It can be implemented using sklearn’s ‘r2_score’ method.
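This sketch computes R² directly from the formula above: one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The function name `r2` and the data are illustrative assumptions.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(r2(y_true, y_pred))  # about 0.949
```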

14. Adjusted R-squared

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where k = number of features,

n = number of samples.

  • Adjusted R² is a modified version of R² that accounts for the number of predictors (independent variables) in the model.
  • It increases only if a new feature improves the model more than would be expected by chance, and it decreases when a feature improves the model by less than that.
  • Unlike R², it penalizes adding features that are not useful for predicting the target.
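This sketch applies the adjusted R² formula to show the penalty for extra features. It uses the same R² of 0.80 on a hypothetical 50-sample dataset, once with 3 features and once with 20. The function name `adjusted_r2` and the numbers are assumptions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n samples and k features."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(adjusted_r2(0.80, n=50, k=3))   # about 0.787
print(adjusted_r2(0.80, n=50, k=20))  # about 0.662
```

The same raw fit is worth less once it needs many more features to achieve it.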

Having gone through the commonly used evaluation metrics, it is clear that the right choice depends on the situation: the type of problem, the balance of the classes, and the cost of different kinds of errors. Among the metrics that suit a problem, choose the one that best reflects what matters for it. Keep the pros and cons of every metric in mind, because in the end it is your choice of metric that determines how much value you and your model deliver.
