Data Scientist Interview QnA
1. Time Series (ARIMA)?
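ARIMA, short for ‘AutoRegressive Integrated Moving Average’, is a forecasting algorithm based on the idea that the past values of a time series alone can be used to predict its future values. It combines an autoregressive (AR) term on lagged values, differencing (I) to make the series stationary, and a moving-average (MA) term on lagged forecast errors.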
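To make the idea concrete, here is a minimal sketch of the simplest useful case, ARIMA(1,1,0): difference the series once, fit an AR(1) model to the differences by least squares, then sum the forecast differences back into levels. The function names and the toy series are illustrative assumptions, and there is no MA term here.

```python
import numpy as np

def fit_ar1_on_differences(series):
    """Fit d_t = c + phi * d_{t-1} by least squares on first differences."""
    diffs = np.diff(series)  # the "I" step with d = 1
    X = np.column_stack([np.ones(len(diffs) - 1), diffs[:-1]])
    c, phi = np.linalg.lstsq(X, diffs[1:], rcond=None)[0]
    return c, phi

def forecast(series, c, phi, steps):
    """Forecast differences recursively, then cumulate them back to levels."""
    last_level = series[-1]
    last_diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        last_level = last_level + last_diff
        out.append(last_level)
    return np.array(out)

# Toy series (an assumption for illustration): linear trend plus an alternating wiggle.
t = np.arange(50)
series = 2.0 * t + 0.5 * (-1.0) ** t

c, phi = fit_ar1_on_differences(series)
pred = forecast(series, c, phi, steps=3)
print(pred)
```

In practice a library implementation (for example the ARIMA model in statsmodels) would normally be used instead, since it also handles the MA term, order selection and diagnostics.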
2. How to reduce overfitting?
Techniques to reduce overfitting:
*Increase training data.
*Reduce model complexity.
*Early stopping during the training phase.
*Ridge (L2) regularization and Lasso (L1) regularization.
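As an illustration of the regularization item, the sketch below fits ridge regression in closed form with NumPy and compares the weights with and without the penalty. The data, the `alpha` values and the function name `ridge_fit` are assumptions chosen for the example. The intercept is omitted for brevity.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge: w = (X^T X + alpha * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
# Only the first feature truly matters; the other nine are noise.
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

w_ols = ridge_fit(X, y, alpha=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # L2 penalty shrinks the weights

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The penalized weight vector has a smaller norm, which is how ridge limits the model's freedom to fit noise. Lasso uses an L1 penalty instead and has no closed form, so it is not shown here.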
3. What are precision and recall?
Precision is the number of true positives divided by the sum of true positives and false positives: TP / (TP + FP). Recall is the number of true positives divided by the sum of true positives and false negatives: TP / (TP + FN).
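A small sketch of both definitions in plain Python follows. The function name, the toy labels and the choice to return 0.0 when a denominator is zero are assumptions made for the example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]

# Here TP = 2, FP = 1, FN = 2.
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```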
4. Dimensionality reduction?
Dimensionality reduction reduces the number of features by representing the data with a smaller set of principal features that retain most of the information.
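One common technique is principal component analysis (PCA). The sketch below implements it with NumPy's SVD on synthetic data in which three observed features are noisy copies of a single hidden factor. The function name `pca` and the data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]            # directions of largest variance
    explained = (S ** 2) / np.sum(S ** 2)     # variance share per component
    return X_centered @ components.T, explained[:n_components]

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
# Three observed features driven by one latent factor, plus small noise.
X = latent @ np.array([[1.0, 2.0, -1.0]]) + rng.normal(scale=0.1, size=(200, 3))

Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Here one component captures almost all of the variance, so three features can be replaced by one with little loss.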
5. Bias and variance?
Bias is error caused by wrong assumptions about the data, such as assuming the data is linear when in reality it follows a more complex function. Variance is error caused by high sensitivity to variations in the training data.
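The trade-off can be estimated empirically. The sketch below repeatedly draws noisy training sets from a sine curve, fits a straight line and a degree-5 polynomial, and measures squared bias and variance of the predictions. The target function, noise level, degrees and trial count are all assumptions picked for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
truth = np.sin(2 * np.pi * x)  # the "complex function" behind the data

def bias_variance(degree, n_trials=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of given degree."""
    preds = np.empty((n_trials, x.size))
    for i in range(n_trials):
        y = truth + rng.normal(scale=noise, size=x.size)  # fresh noisy sample
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x)
    bias_sq = np.mean((preds.mean(axis=0) - truth) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

bias_lin, var_lin = bias_variance(degree=1)    # too simple: high bias
bias_poly, var_poly = bias_variance(degree=5)  # more flexible: higher variance

print(bias_lin, var_lin)
print(bias_poly, var_poly)
```

The linear model misses the curve in the same way on every sample (high bias, low variance), while the flexible model tracks the curve on average but changes more from sample to sample (low bias, higher variance).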
6. Difference between classification and clustering?
In classification, data objects are assigned to classes by learning from objects whose class labels are known, so there is prior knowledge of the attributes of each class. Clustering groups data objects without knowing their class labels, so there is no prior knowledge of the attributes that define each cluster.
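The contrast can be sketched on the same toy data: a nearest-centroid classifier that uses the known labels, and a basic k-means loop that never sees them. The data, the choice of these two algorithms and the helper name `nearest` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs; `labels` is only available to the classifier.
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
labels = np.array([0] * 20 + [1] * 20)

def nearest(points, centroids):
    """Index of the closest centroid for each point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Classification: class centroids are computed from the known labels.
class_centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
predicted_class = nearest(new_points, class_centroids)

# Clustering (k-means): start from two data points and iterate, no labels used.
centroids = X[[0, -1]].copy()
for _ in range(10):
    assignment = nearest(X, centroids)
    centroids = np.array([X[assignment == k].mean(axis=0) for k in (0, 1)])

print(predicted_class, assignment)
```

Cluster ids are arbitrary in general. They line up with the class labels here only because the first starting centroid happens to come from the first blob.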
7. How to deal with imbalanced classification?
Techniques to handle imbalanced data:
*Use the right evaluation metrics
*Use K-fold Cross-Validation in the right way
*Ensemble different resampled datasets
*Resample with different ratios
*Cluster the abundant class
*Design your own models
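As an illustration of the resampling items, the sketch below randomly undersamples the abundant class to a chosen ratio with NumPy. The function name `undersample`, its parameters and the toy data are assumptions, and undersampling is only one of several resampling strategies.

```python
import numpy as np

def undersample(X, y, ratio, majority=0, seed=0):
    """Keep all minority rows and about `ratio` majority rows per minority row."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    n_keep = min(len(maj_idx), int(ratio * len(min_idx)))
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([kept, min_idx]))
    return X[idx], y[idx]

# Toy data: 95 majority rows (class 0) and 5 minority rows (class 1).
y = np.array([0] * 95 + [1] * 5)
X = np.arange(100).reshape(-1, 1)

X_bal, y_bal = undersample(X, y, ratio=1.0)    # 1:1 majority to minority
X_2to1, y_2to1 = undersample(X, y, ratio=2.0)  # 2:1 majority to minority

print(np.bincount(y_bal), np.bincount(y_2to1))
```

Resampling should be applied to the training folds only, so that validation data keeps the original class distribution.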