CloudyML

Data Scientist Interview QnA
Company: Tredence

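1. How does the filter() function work in Python?

filter() is a built-in function that filters an iterable using a function that tests whether each element is true or not. It takes two arguments: function, which is called on each element and returns a truthy or falsy value, and iterable, an iterable such as a set, list, or tuple. It returns an iterator that yields only the elements for which the function returns a truthy value. If function is None, filter() simply drops the elements that are themselves falsy.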
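A short illustration of the behaviour described above. The list and the helper is_even are made up for the example. Note that filter() returns a lazy iterator, so it is wrapped in list() to see the result.

```python
numbers = [1, 2, 3, 4, 5, 6]

def is_even(n: int) -> bool:
    # Predicate: True keeps the element, False drops it
    return n % 2 == 0

evens = list(filter(is_even, numbers))
print(evens)  # [2, 4, 6]

# The predicate can also be a lambda
odds = list(filter(lambda n: n % 2 == 1, numbers))
print(odds)  # [1, 3, 5]

# With None as the function, only truthy elements are kept
truthy = list(filter(None, [0, 1, "", "a", None, [], [2]]))
print(truthy)  # [1, 'a', [2]]
```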

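2. How do you remove duplicate elements from a list?

Start with a list that contains duplicates and create a dictionary that uses the list items as keys. This automatically removes any duplicates, because a dictionary cannot have duplicate keys. Then convert the dictionary's keys back into a list. Since Python 3.7, dictionaries preserve insertion order, so the first occurrence of each item keeps its original position. The items must be hashable for this to work.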
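The dictionary approach above can be written as a one-liner with dict.fromkeys(). The input list is just an example; the set-based variant is shown for contrast because it also removes duplicates but does not keep the original order.

```python
items = [3, 1, 3, 2, 1, 5]

# dict.fromkeys() uses each item as a key; repeated keys collapse into one,
# and dicts keep insertion order (Python 3.7+), so first occurrences win.
unique_ordered = list(dict.fromkeys(items))
print(unique_ordered)  # [3, 1, 2, 5]

# A set also drops duplicates but does not preserve the original order,
# so the result is sorted here to make it deterministic.
unique_sorted = sorted(set(items))
print(unique_sorted)  # [1, 2, 3, 5]
```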

3. What is the difference between a shallow copy and a deep copy?

A shallow copy is faster to make, but it handles pointers and references "lazily": it creates a new outer object and copies only the references (pointers) it contains, not the objects those references point to. As a result, the original and the copy both refer to the same underlying data, so changing a nested object through one is visible through the other. A deep copy clones the underlying data completely and recursively, so nothing is shared between the original and the copy.
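In Python, the two kinds of copy are available as copy.copy() and copy.deepcopy() from the standard library. The sketch below uses a made-up nested list to show that the shallow copy shares its inner lists with the original while the deep copy does not.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list AND new inner lists

# Mutate a nested object through the original
original[0].append(99)

print(shallow[0])  # [1, 2, 99]  (inner list is shared)
print(deep[0])     # [1, 2]      (inner list was cloned)
print(shallow is original, shallow[0] is original[0])  # False True
```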

4. What is TF-IDF vectorization?

TF-IDF, which stands for term frequency–inverse document frequency, is a numerical statistic that measures how important a word is to a document within a collection or corpus. It is frequently used as a weighting factor in information retrieval, text mining, and user modelling. The TF-IDF value increases with the number of times a word appears in a document and is offset by the number of documents in the corpus that contain the word. This compensates for the fact that some words, such as "the", are common everywhere and therefore say little about any particular document. TF-IDF vectorization represents each document as a vector containing one such weight per vocabulary term.
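A from-scratch sketch of the plain textbook variant, assuming tf(t, d) = count of t in d / number of terms in d and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. The three toy documents and the function name tf_idf are made up for illustration. Library implementations such as scikit-learn's TfidfVectorizer use smoothed and normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # tf (relative count in this doc) times idf (rarity across docs)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
vectors = tf_idf(docs)
for vec in vectors:
    print({t: round(w, 3) for t, w in vec.items()})
```

Because "the" occurs in every document, its idf is ln(3/3) = 0 and its weight is 0 everywhere, while "mat", which appears in only one document, gets a higher weight than "cat", which appears in two.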

5. In what ways is DBSCAN better than K-Means clustering?

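DBSCAN handles outliers and noisy datasets well: points in low-density regions are labeled as noise instead of being forced into a cluster. K-means, by contrast, assigns every point to its nearest centroid, so outliers and noise are not handled adequately and can pull the centroids away from the real clusters. DBSCAN also does not require the number of clusters to be specified in advance (it needs a neighborhood radius, eps, and a minimum number of points per dense region, minPts, instead), whereas K-means requires the number of clusters k up front, and the result depends heavily on that choice.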
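To make both points concrete, here is a minimal from-scratch DBSCAN sketch (not scikit-learn's implementation). The function name dbscan, the toy points, and the values eps=1.5 and min_pts=3 are all assumptions chosen for illustration. Noise is labeled -1, which matches the convention scikit-learn's DBSCAN uses.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]  # the last point is an isolated outlier
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Two clusters are found without being told k, and the isolated point is reported as noise. K-means would instead assign every point, including (50, 50), to some cluster.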
