 CloudyML

## 1. What do you understand by the term Normal Distribution?

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric about its mean, so values near the mean occur more frequently than values far from it. It is fully described by two parameters, the mean and the standard deviation. When plotted, it appears as a bell curve.
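As a quick check of the "bell curve" shape, the sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+) to compute how much probability lies within 1, 2, and 3 standard deviations of the mean. The helper name `mass_within` is illustrative and not part of the original answer.

```python
from statistics import NormalDist


def mass_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Probability mass within 1, 2, and 3 standard deviations of the mean.
coverage = {k: mass_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")

# Symmetry about the mean: the density is equal at equal distances on either side.
standard = NormalDist()
symmetric = abs(standard.pdf(1.5) - standard.pdf(-1.5)) < 1e-12
```

The three values come out to roughly 68%, 95%, and 99.7%, which is why most observations from a normal distribution cluster close to the mean.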

## 2. Explain the bias-variance trade-off.

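Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to approximate. Variance is the amount by which the estimate of the target function would change if the model were trained on different training data. The trade-off is the tension between these two sources of error: making a model more flexible usually lowers bias but raises variance, while making it simpler does the opposite. The goal is a level of complexity that minimizes the total error on unseen data.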
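For squared-error loss, the trade-off can be written as a standard decomposition. The notation is an assumption introduced here for illustration: $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Only the first two terms depend on the model, so reducing one at the expense of the other is what the trade-off refers to.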

## 3. How can you compute significance using a p-value?

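The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. It is not the probability that the null hypothesis is true, and (1 – the p-value) is not the probability that the alternative hypothesis is true. To assess significance, choose a significance level α (commonly 0.05) before looking at the data, compute the p-value from a suitable test statistic, and reject the null hypothesis if the p-value is below α. The smaller the p-value, the stronger the evidence against the null hypothesis.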
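One way to compute a p-value without distributional assumptions is an exact permutation test. The sketch below is a minimal standard-library illustration, not the only or canonical method: the function name `permutation_p_value`, the toy samples, and the threshold `alpha = 0.05` are all assumptions made for the example. It counts how many relabelings of the pooled data give a difference in means at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)

    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off in ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical toy data: two small groups of measurements.
control = [1, 2, 3, 4, 5]
treated = [6, 7, 8, 9, 10]

alpha = 0.05  # significance level, chosen before looking at the data
p_value = permutation_p_value(control, treated)
significant = p_value < alpha
print(f"p-value = {p_value:.4f}, reject null at alpha={alpha}: {significant}")
```

Full enumeration is only practical for small samples; for larger ones, a random subset of permutations or a parametric test such as a t-test is the usual substitute.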

## 4. Differentiate between a multi-label classification problem and a multi-class classification problem.

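Multi-class (multinomial) classification assigns each instance to exactly one of three or more mutually exclusive classes; with only two classes, the problem is called binary classification. Multi-label classification allows each instance to carry zero or more class labels at the same time, so the labels are not mutually exclusive. For example, deciding whether a photo shows a cat, a dog, or a bird is multi-class, while tagging a photo with every animal that appears in it is multi-label.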
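The difference shows up clearly in how the targets are encoded. The sketch below is an illustrative standard-library example; the class names and the helpers `one_hot` and `multi_hot` are hypothetical and chosen only for this demonstration.

```python
classes = ["cat", "dog", "bird"]


def one_hot(label, classes):
    """Multi-class target: exactly one class is active per instance."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Multi-label target: any number of classes (including none) may be active."""
    return [1 if c in labels else 0 for c in classes]


# Multi-class: each instance has a single label.
multiclass_targets = [one_hot(y, classes) for y in ["cat", "bird"]]

# Multi-label: each instance has a set of labels, possibly empty.
multilabel_targets = [multi_hot(y, classes) for y in [{"cat", "dog"}, set(), {"bird"}]]

print(multiclass_targets)
print(multilabel_targets)
```

In the multi-class case every row sums to 1, while multi-label rows can sum to any value from 0 up to the number of classes. This is also why multi-class models typically use a softmax output and multi-label models use one independent sigmoid per class.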

## 5. If the training loss of your model is high and almost equal to the validation loss, what does it mean? What should you do?

If the training and validation losses are almost equal, the model is not overfitting, which is good. However, if both losses are high, the model is underfitting: it is too simple or too constrained to capture the pattern in the data. In that case, try increasing model capacity (for example, adding layers or units), training for longer, reducing regularization, or adding more informative features, and check that both losses go down.

## 6. Why does L1 regularization cause parameter sparsity whereas L2 regularization does not?

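This is due to the shape of the constraint region formed by each norm. The L1 norm constrains the weights to a diamond (a cross-polytope in higher dimensions) whose corners lie on the axes, so the contours of the loss tend to first touch the region at a corner, where one or more weights are exactly zero. The L2 norm constrains the weights to a circle (a sphere in higher dimensions), which has no corners, so the solution usually lands at a point where the weights are small but non-zero.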
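The same effect can be seen algebraically in one dimension. The sketch below assumes a single weight `w` whose unregularized least-squares solution is `z` (for example, one coordinate under an orthonormal design), so both penalized problems have closed-form solutions. The function names and the parameter `lam` are illustrative, not taken from the original answer.

```python
import math


def l1_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w): soft-thresholding."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def l2_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + 0.5 * lam * w**2: proportional shrinkage."""
    return z / (1.0 + lam)


lam = 0.5
unregularized = [2.0, 0.3, -0.1, -2.0]

l1_weights = [l1_shrink(z, lam) for z in unregularized]
l2_weights = [l2_shrink(z, lam) for z in unregularized]

# L1 sets every weight with |z| <= lam to exactly zero; L2 only scales weights down.
l1_zero_count = sum(1 for w in l1_weights if w == 0.0)
l2_zero_count = sum(1 for w in l2_weights if w == 0.0)
print(l1_weights, l1_zero_count)
print(l2_weights, l2_zero_count)
```

L1 subtracts a fixed amount from each weight and clips at zero, so small weights vanish entirely. L2 multiplies each weight by a constant factor, which never reaches zero unless the weight was already zero.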

## 7. What is the advantage of performing dimensionality reduction before fitting an SVM?

Dimensionality reduction is most useful before fitting an SVM when the number of features is large compared to the number of observations. In that setting the model is prone to overfitting, and a reduced feature space can improve generalization while also lowering training and prediction cost.