Data Science Interviews
with questions at a glance

We have talked a lot about data science and related fields, and it’s time now to shed some light on how to land a job in the data science domain. What steps should you follow to get over the line? How many interview rounds are there in general for a data scientist job? What questions are asked? How important is the project section? Does previous experience matter? And so on…

Let’s answer all these questions one by one in the easiest possible format.

Interview Rounds

  • There are generally 2-4 technical rounds followed by 1 HR round.
  • These rounds comprise discussions of Python/R skills, Statistics, Machine Learning, Deep Learning, Projects, etc.
  • A few companies also prefer a coding round as the first round of the interview process, but it’s generally for Jr. Data Scientist positions and is rarely seen in the interview pattern.
  • Amazon has a unique behavioural interview round where they judge you on its Leadership Principles (14 originally, 16 since 2021) and ask scenario-based questions.
  • Very few companies have such unique rounds, which makes them challenging to clear.

Interview Questions

There is a variety of questions asked in data science interviews, covering statistics, machine learning/deep learning algorithms, scenario-based questions, guesstimates, etc. A few of them are listed below.

  1. Why do you want to choose (or why did you choose) data science as your career?
  2. What is the difference between AI, ML and DL?
  3. What is a Python package, and have you created your own Python package?
  4. Can you write a program to print an inverted star pattern in Python?
  5. Write a program to create a data frame and remove elements from it.
  6. Write code to find the 8th highest value in a data frame.
  7. What’s the difference between an array and a list?
  8. Differentiate between supervised, unsupervised and reinforcement learning, with an example algorithm for each.
  9. How would you deal with a feature that has 4 categories and 20% null values?
  10. What is central tendency?
  11. Which measure of central tendency is used when there are outliers?
  12. What is the central limit theorem?
  13. What is the Chi-Square test?
  14. What is A/B testing?
  15. Tell us the difference between the Z and t distributions (linked to A/B testing).
  16. Name some outlier treatment methods.
  17. What is the ANOVA test?
  18. What is cross-validation?
  19. How would you approach a machine learning project if there is a huge imbalance in the data?
  20. State the formula of the sigmoid function.
  21. Can we use the sigmoid function for multi-class classification?
  22. What is area under the curve (AUC)?
  23. Which metric is used to split a node in a decision tree?
  24. Explain ensemble learning.
  25. What is a p-value?
  26. What are histograms?
  27. Tell us about confidence intervals.
  28. What causes high bias or high variance?
  29. Which models generally have high bias or high variance?
  30. Why do we use a validation set separate from the test set?
  31. What are the differences between linear and logistic regression?
  32. Why do we use such a complex cost function for logistic regression?
  33. Differentiate between a random forest and a decision tree.
  34. How would you decide when to stop splitting a tree?
  35. What are the measures of central tendency?
  36. What are the requirements of the k-means algorithm?
  37. Which clustering technique works by combining clusters?
  38. Which is the oldest probability distribution?
  39. What values can a random variable take?
  40. What are the different types of random variables?
  41. Describe normality of residuals.
  42. What is the t-test used for?
  43. How do you perform dimensionality reduction?
  44. What are the assumptions of the linear regression algorithm?
  45. Differentiate between correlation and covariance.
  46. How do you identify and treat outliers and missing values?
  47. Explain the box-and-whisker plot.
  48. Explain any unsupervised learning algorithm.
  49. Describe Random Forest.
  50. What packages in Python can be used for ML? Why do we prefer one over another?
  51. What are the evaluation metrics for testing logistic regression?
  52. NumPy vs Pandas: what is the basic difference?
  53. Tuple vs Dictionary. Where do we use them?
  54. What is NER (Named Entity Recognition)?
  55. Can linear regression be used for classification? If yes, why? If no, why not?
  56. What is the Naive Bayes classifier? Explain the Multinomial, Bernoulli and Gaussian Naive Bayes variants.
  57. Differentiate between oversampling and undersampling.
  58. What is the difference between overfitting and underfitting?
  59. Differentiate between Gini index and entropy.
  60. What are the advantages and disadvantages of PCA?
  61. How do you deal with imbalanced data in classification modelling?
  62. What is gradient descent? What is the learning rate, and why do we need to reduce or increase it? Tell us why the global minimum is reached and why the model doesn’t improve when the learning rate is increased after that point.
  63. What is Log-Loss and ROC-AUC?
  64. Two logistic regression models: one is trained on 70% of the data and the other on 80%, and their accuracy is almost the same. Which one will you choose?
  65. Explain the bias-variance trade-off. How does this affect the model?
  66. What is multicollinearity? How do you identify and remove it?
  67. Differentiate between Sensitivity and Specificity.
  68. What is the difference between K-NN and K-Means clustering?
  69. How to handle missing data? What imputation techniques can be used?
  70. Explain how you would find and tackle an outlier in the dataset. Follow up: What about an inlier?
  71. How do you determine if a coin is biased? Hint: hypothesis testing.
  72. Is interpretability important for a machine learning model? If so, what are the ways to achieve interpretability for machine learning models?
  73. How would you design a data science pipeline?
  74. What does a statistical test do?
  75. Explain topic modelling in NLP and the various methods for performing topic modelling.
  76. Describe backpropagation in a few words, along with its variants.
  77. Explain the architecture of a CNN.
  78. If we apply a 3×3 filter to a 6×6 image, what will be the size of the output image?
  79. What will you do to reduce overfitting in deep learning models?
  80. How would you check if the model is suffering from multicollinearity?
  81. Why is a CNN architecture suitable for image classification while an RNN is not?
  82. What are the approaches for solving the class imbalance problem?
  83. Tell us about transfer learning. What steps would you take to perform transfer learning?
  84. Explain concepts of epoch, batch, and iteration in deep learning.
  85. When sampling, what types of biases can be introduced? How do you control these biases?
  86. What are some of the types of activation functions and specifically when to use them?
  87. Tell us the conditions that must be satisfied for a time series to be stationary.
  88. What is the difference between Batch and Stochastic Gradient Descent?
  89. What happens when neural nets are too small? What happens when they are large enough?
  90. Why do we need a pooling layer in a CNN? What are the common pooling methods?
  91. Are ensemble models better than individual models? Why or why not?
  92. How is random forest different from gradient boosting, given that both are tree-based algorithms?
  93. Describe the steps involved in creating a neural network.
  94. In brief, how would you perform the task of sentiment analysis?
  95. Is XOR data linearly separable?
  96. How do we classify XOR data using logistic regression?
  97. LSTM solves the vanishing gradient problem that RNNs typically have. How?
  98. GRU is faster compared to LSTM. Why?
  99. Use case: Consider you are working for a pen manufacturing company. How would you help the sales team with leads using data analysis?
  100. I have 2 guns with 6 chambers each, and I load a single bullet in each gun. What is the probability that, if I fire the guns simultaneously, at least 1 gun will fire (at least means one or more than one)?
  101. There are 2 groups, g1 and g2. g1 asks g2 to give them 1 member so that both groups will be equal in number; g2 asks g1 to give them 1 member so that g2 will have double the members of g1. How many members are there in each group?
  102. State the order of execution of an SQL query.
  103. SQL question: get the top 2 salaries for employees in each group, using ROW_NUMBER and PARTITION BY.
  104. Differentiate between inner join and cross join.
  105. What is GROUP BY?
  106. Complex SQL query: there are 2 tables, Table1 (cust_id, Name) and Table2 (cust_id, Transaction_amt). Write a query to return the name of the customer with the 8th highest lifetime purchase. Achieve the same using Python.
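
A few of the questions above have short, checkable answers, so here is a rough Python sketch for questions 20, 78, 100 and 101. It is only one possible way to work them out, not a model answer. The function names are made up for illustration, and the assumptions are mine: stride 1 and no padding for the filter question, each gun having an independent 1-in-6 chance of firing for the probability question, and a simple brute-force search for the group puzzle.

```python
import math
from fractions import Fraction


def sigmoid(x: float) -> float:
    """Question 20: sigmoid(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Question 78: output side length for an n x n image and f x f filter.

    Uses the usual formula (n + 2p - f) / s + 1.
    Assumption: stride 1 and no padding unless stated otherwise.
    """
    return (n + 2 * padding - f) // stride + 1


def prob_at_least_one_fires(chambers: int = 6, guns: int = 2) -> Fraction:
    """Question 100: P(at least one fires) = 1 - P(no gun fires).

    Assumption: one bullet per gun, every chamber equally likely,
    and the guns are independent of each other.
    """
    p_no_fire = Fraction(chambers - 1, chambers)
    return 1 - p_no_fire ** guns


def solve_groups(limit: int = 100):
    """Question 101: brute-force search for the group sizes.

    Condition 1: g2 gives one member to g1 and the groups become equal.
    Condition 2: g1 gives one member to g2 and g2 becomes double g1.
    """
    for g1 in range(2, limit):
        for g2 in range(2, limit):
            if g1 + 1 == g2 - 1 and g2 + 1 == 2 * (g1 - 1):
                return g1, g2
    return None


if __name__ == "__main__":
    print(sigmoid(0))                  # 0.5
    print(conv_output_size(6, 3))      # 4, i.e. a 4x4 output
    print(prob_at_least_one_fires())   # 11/36
    print(solve_groups())              # (5, 7)
```

In an interview you would normally be expected to explain the reasoning too, for example that 11/36 comes from 1 - (5/6)², and that the answer to the filter question changes if padding or a larger stride is used.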

Some Data Science Companies (not ranked) for Job Hunting

2. Tredence Analytics
3. Fractal Analytics
4. Tiger Analytics
5. Bridgei2i
6. Ugam
7. Latent View
8. Brillio
9. Abzooba
10. AbsolutData
11. Gramener
12. BluePi
13. Knowledge Foundry
14. Wipro
15. TCS
16. Accenture
17. Purplle
18. AbsoluteData
19. Hansa CEquity
20. Lymbyc
21. IBM
22. PwC
23. EY
24. KPMG
25. Sibia
26. ZS
27. ZF
28. TechVantage
29. L&T Infotech
30. Cognizant
31. Amazon
32. Microsoft
33. Walmart
34. Philips
35. Ford
36. JP Morgan
37. Deloitte
38. Shell
39. Mu Sigma
40. Postman
41. Altrix
42. HP
43. HCL
44. Dell
45. PayPal
46. Fidelity Investments
47. Rakuten
48. Infosys
49. Flipkart
50. Myntra

There may be more questions that you can be asked beyond the ones listed above, and there may be a few more companies where you can apply for a data scientist role. But I am sure you are confident enough by now to face any interview and crack it in the best possible way. I hope this blog answered all your data science interview queries. I can confidently say that you are now interview ready.

Keep applying, keep working hard and you will get what you deserve!
