Machine Learning For Data Science

Course Curriculum

Your first assignment covers the basic concepts of Python. Python is the language you will use throughout this course to build machine learning solutions, so a solid grasp of it is essential.

This assignment is designed with beginners in mind. It covers Python variables, numeric operators, logical operators, conditional and loop statements (if, while and for), functions, strings and their operations/methods, and lists and list comprehensions, with reference videos on every topic. Learners practice by solving problems on each concept covered.
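The short sketch below touches each basic topic listed above: variables, numeric and logical operators, an if/elif/else function, for and while loops, a string method and a list comprehension. The variable names and values are illustrative only.

```python
# Variables and numeric operators
price = 250
quantity = 4
total = price * quantity          # multiplication
average = total / quantity        # true division always returns a float
remainder = total % 7             # modulo (remainder after division)
is_bulk = quantity >= 3 and total > 500   # logical operator combining two comparisons

# A function using if / elif / else
def classify(n):
    """Return 'negative', 'zero' or 'positive' for a number."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# for loop over a list
labels = []
for value in [-3, 0, 5]:
    labels.append(classify(value))

# while loop: count down from 3
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

# String methods
name = "  machine learning  "
clean = name.strip().title()      # remove surrounding spaces, capitalize each word

# List comprehension: squares of the even numbers below 10
squares = [x ** 2 for x in range(10) if x % 2 == 0]
```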

Data structures are ways of organizing and storing data, which makes them an important topic. This assignment covers Python's data structures and their implementations: lists and their operations such as slicing, deleting, appending and updating; list comprehensions; sets and their operations such as union, intersection and difference; tuples and their implementation; and dictionaries and their operations such as adding and removing key-value pairs and iterating over items. Along with hands-on problems, reference videos are included for every topic.
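A minimal sketch of the built-in data structures and operations named above (list slicing/appending/updating/deleting, set algebra, tuple unpacking and dictionary add/remove/iterate). The data is made up for illustration.

```python
# Lists: slicing, appending, updating, deleting
scores = [70, 85, 90, 60, 75]
top_three = scores[1:4]          # slice: items at positions 1, 2 and 3
scores.append(95)                # add to the end
scores[0] = 72                   # update in place
del scores[3]                    # delete by index (removes 60)

# Sets: union, intersection, difference
a = {1, 2, 3, 4}
b = {3, 4, 5}
union = a | b
common = a & b
only_a = a - b

# Tuples are immutable and support unpacking
point = (3, 4)
x, y = point

# Dictionaries: add, remove and iterate over key-value pairs
ages = {"asha": 31, "ravi": 27}
ages["meera"] = 29               # add a pair
removed = ages.pop("ravi")       # remove a pair and get its value back
pairs = [f"{person}={age}" for person, age in ages.items()]
```

Dictionaries keep insertion order (guaranteed since Python 3.7), so `pairs` lists the remaining entries in the order they were added.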

NumPy aims to provide an array object that can be up to 50x faster than traditional Python lists for numerical work, and this assignment will help learners optimize their code with it. The NumPy assignment covers defining arrays of different dimensions; NumPy functions for creating arrays such as arange(), eye(), full(), diag() and linspace(); creating arrays with random values; reshaping arrays to different dimensions; array indexing and slicing; the difference between a NumPy copy and a view; bonus operations such as hstack() and vstack(); modifying arrays with insert(), delete() and append(); and mathematical operations and searching in NumPy arrays. It also demonstrates in practice how arrays are faster than lists. Reference links are included on each topic to help you understand it thoroughly.
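A quick tour of the NumPy functions listed above, assuming a recent NumPy with the `default_rng` random generator. The most important line to study is the view/copy pair: writing through a slice changes the original array, while writing to a `copy()` does not.

```python
import numpy as np

a = np.arange(12)                  # 0, 1, ..., 11
grid = a.reshape(3, 4)             # 3 rows x 4 columns (a view of a)
identity = np.eye(3)               # 3x3 identity matrix
sevens = np.full((2, 2), 7)        # 2x2 array filled with 7
diagonal = np.diag([1, 2, 3])      # 1, 2, 3 on the main diagonal
points = np.linspace(0, 1, 5)      # 5 evenly spaced values from 0 to 1
rng = np.random.default_rng(seed=0)
noise = rng.random((2, 3))         # uniform random values in [0, 1)

# A basic slice is a view: writing to it also changes grid and a
view = grid[0, :]
view[0] = 100
# copy() owns independent data: writing to it leaves grid untouched
copied = grid[1, :].copy()
copied[0] = -1

stacked_h = np.hstack([np.ones(2), np.zeros(2)])   # side by side
stacked_v = np.vstack([np.ones(2), np.zeros(2)])   # one on top of the other

# insert, append and delete each return a new array
modified = np.append(np.insert(np.array([1, 2, 4]), 2, 3), 5)
modified = np.delete(modified, 0)
positions = np.where(modified > 3)[0]              # indices of values greater than 3
```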

Pandas provides functions for analyzing, cleaning, exploring and manipulating data, which makes it an important library for data science. This assignment introduces pandas Series and their operations such as sorting, appending and indexing; DataFrames and their operations such as accessing existing rows and columns and adding new rows or columns; converting Series to DataFrames; concatenating two or more DataFrames; accessing DataFrame elements using conditions; DataFrame indexes; loc and iloc; reading CSV files; merging; groupby; and the apply function. Reference videos are included for all topics for more conceptual clarity.
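The sketch below strings together several of the pandas operations named above on a small, made-up sales table. Reading a CSV would normally supply `df` through `pd.read_csv`; an inline DataFrame is used here so the example runs on its own.

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["c", "a", "b"])
sorted_s = s.sort_values()                      # ascending by value
frame_from_series = s.to_frame(name="value")    # Series -> one-column DataFrame

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "sales": [100, 250, 150, 300],
})
df["tax"] = df["sales"] * 0.18                  # add a new column
big = df[df["sales"] > 120]                     # rows selected by a condition
first_row = df.iloc[0]                          # position-based access
pune_sales = df.loc[df["city"] == "Pune", "sales"]   # label/condition-based access

extra = pd.DataFrame({"city": ["Delhi"], "sales": [50], "tax": [9.0]})
combined = pd.concat([df, extra], ignore_index=True)   # append rows

regions = pd.DataFrame({"city": ["Pune", "Delhi", "Mumbai"],
                        "region": ["West", "North", "West"]})
merged = combined.merge(regions, on="city", how="left")
totals = merged.groupby("region")["sales"].sum()       # total sales per region
doubled = merged["sales"].apply(lambda v: v * 2)        # element-wise function
```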

Data cleaning plays an important role in data management as well as in analytics and machine learning. This assignment gives you practical experience in handling dirty data. You will learn how to treat inconsistent or irrelevant columns; handle missing values by dropping empty records or by imputing them with forward fill, backward fill, mean imputation, constant imputation, interpolation and KNN; make shallow and deep copies of pandas DataFrames; work with and optimize code using iterrows and itertuples; rename columns with meaningful labels; treat duplicate values; treat constant (low-variance) columns; and apply regular expressions to textual data to work with different patterns. Reference links are included on each topic for more conceptual clarity.
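A minimal sketch of the missing-value strategies and column fixes listed above, applied to a tiny made-up temperature table. KNN imputation is left out because it needs an extra library (for example scikit-learn's `KNNImputer`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Temp C": [21.0, np.nan, 23.0, np.nan, 25.0],
    "city": ["Pune", "Pune", "Pune", "Pune", "Pune"],   # constant column
})
temp = df["Temp C"]

forward = temp.ffill()                       # copy the previous valid value forward
backward = temp.bfill()                      # copy the next valid value backward
mean_filled = temp.fillna(temp.mean())       # mean of the observed values
constant_filled = temp.fillna(0)             # a fixed constant
interpolated = temp.interpolate()            # linear interpolation between neighbours
dropped = df.dropna()                        # keep only complete rows

clean = df.rename(columns={"Temp C": "temperature_c"})   # meaningful label
clean = clean.loc[:, clean.nunique() > 1]                # drop constant (zero-variance) columns

dupes = pd.DataFrame({"id": [1, 1, 2], "v": ["x", "x", "y"]})
deduped = dupes.drop_duplicates()            # remove exact duplicate rows

deep = df.copy(deep=True)                    # independent copy of the data
```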

Regular expressions (regex or regexp for short) are extremely powerful for searching and manipulating text strings, particularly when processing text data. A single line of regex can often replace several dozen lines of conventional code. In this assignment you will solve easy to hard regex problems such as matching digit and non-digit characters, detecting HTML tags in text, validating IP addresses, detecting email addresses, detecting domain names, whitespace and non-whitespace problems, and substring problems. Reference videos and document links are provided for assistance. Regex is widely used in projects that involve text validation, NLP and text mining, which makes it a useful tool to know.
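Several of the problem types above can be sketched with Python's built-in `re` module. The email and IPv4 patterns here are simplified illustrations, not complete validators (real-world email addresses in particular allow forms these patterns reject).

```python
import re

text = "Order 66 shipped to ravi.k@example.com on 2023-05-01 <b>fast</b>"

digits = re.findall(r"\d", "a1b22")           # every single digit
non_digits = re.findall(r"\D+", "a1b22")      # runs of non-digit characters
tags = re.findall(r"</?(\w+)[^>]*>", text)    # HTML tag names, opening and closing
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

# Simplified IPv4 check: four numbers from 0 to 255 separated by dots,
# with no leading zeros
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")

def is_ipv4(candidate):
    return bool(IPV4.match(candidate))
```

The octet alternatives are ordered from most to least specific so that values such as 256 cannot slip through by matching a shorter prefix.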

Exploratory data analysis (EDA) is a way of visualizing, summarizing and interpreting the information hidden in data stored in rows and columns. In this assignment you will apply the data cleaning techniques you learned in the previous assignment, extract basic statistical information from the data, detect outliers that pollute the data, and remove them using techniques such as IQR and Z-score to obtain a more uniform dataset. You will implement univariate plots such as box plots, bar plots, count plots, histograms and density plots; bivariate plots such as scatter plots, line plots, box plots with respect to a third variable and joint distribution plots; and multivariate plots such as pair plots, multivariate scatter plots, parallel coordinates and heatmaps. Every topic has a YouTube reference link for better conceptual clarity.
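The two outlier-removal rules named above can be sketched in a few lines of pandas. The data and thresholds are illustrative: the IQR rule uses the common 1.5 x IQR fences, and the Z-score rule uses a cut-off of 2 because, with only seven points, the largest attainable |z| is (n - 1) / sqrt(n), about 2.27, so the other common cut-off of 3 could never flag anything.

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 95])   # 95 is an obvious outlier

summary = values.describe()        # count, mean, std, min, quartiles, max

# IQR rule: keep points inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_clean = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]

# Z-score rule: keep points whose standardized distance from the mean is small
z = (values - values.mean()) / values.std()
z_clean = values[z.abs() < 2]
```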

Feature selection in machine learning aims to find the best set of features, reducing computational cost and improving the performance of ML models. This assignment is full of feature selection techniques. You will implement intrinsic methods such as tree-based feature selection using feature importance and SelectFromModel; wrapper methods such as RFE; and filter methods such as SelectKBest, missing value ratio threshold, variance threshold, the chi-squared test and the ANOVA test. You will also learn about the univariate ROC AUC test and about removing multicollinearity with the variance inflation factor (VIF). Reference links are included on each topic.
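Two of the filter-style techniques above, a variance threshold and the variance inflation factor, are sketched here from scratch with NumPy so the arithmetic is visible. The helper names `variance_filter` and `vif` are hypothetical; in practice you might use scikit-learn's `VarianceThreshold` or statsmodels' `variance_inflation_factor` instead. VIF for a column is 1 / (1 - R^2), where R^2 comes from regressing that column on all the others.

```python
import numpy as np

def variance_filter(X, threshold=0.0):
    """Hypothetical helper: indices of columns whose variance exceeds threshold."""
    return np.where(X.var(axis=0) > threshold)[0]

def vif(X):
    """Hypothetical helper: VIF = 1 / (1 - R^2) for each column, where R^2 comes
    from an OLS regression (with intercept) of that column on the others."""
    n, p = X.shape
    scores = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ coef
        r2 = 1 - residuals.var() / y.var()
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)     # almost a copy of a: strongly collinear
const = np.full(200, 3.0)               # zero-variance column
X = np.column_stack([a, b, c, const])

kept = variance_filter(X)               # the constant column is dropped
vifs = vif(X[:, kept])                  # a and c get very large VIFs, b stays near 1
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity; dropping one of `a` or `c` would bring the other back down.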

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models. This assignment gives you the chance to practice feature engineering with a variety of techniques. Using a sample dataset, you will create new features from raw data by calculating sums, differences, means and so on; impute missing values in both textual and numerical columns; detect outliers using percentiles and standard deviation and handle them; apply binning to reduce overfitting; apply encoding techniques such as one-hot encoding and label encoding; apply scaling techniques such as normalization and standardization; transform variables using log, square root, reciprocal and exponential transformations; and engineer date and time features. Reference video links are included.
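A compact sketch of several transformations listed above (feature creation, log and square-root transforms, binning, a date feature, min-max normalization, standardization, one-hot and label encoding) using pandas on a made-up customer table. The bin edges and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [20_000, 35_000, 50_000, 1_000_000],
    "spend": [5_000, 7_000, 20_000, 100_000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "signup": pd.to_datetime(["2023-01-15", "2023-03-02", "2023-07-30", "2023-12-01"]),
})

df["savings"] = df["income"] - df["spend"]        # feature creation
df["log_income"] = np.log(df["income"])           # log compresses the long right tail
df["sqrt_spend"] = np.sqrt(df["spend"])           # square-root transform
df["income_band"] = pd.cut(df["income"], bins=[0, 30_000, 60_000, np.inf],
                           labels=["low", "mid", "high"])   # binning
df["signup_month"] = df["signup"].dt.month        # date-time feature

# Normalization (min-max to [0, 1]) and standardization (zero mean, unit variance)
spend = df["spend"]
df["spend_minmax"] = (spend - spend.min()) / (spend.max() - spend.min())
df["spend_std"] = (spend - spend.mean()) / spend.std(ddof=0)

# One-hot encoding for nominal categories; integer codes for label encoding
one_hot = pd.get_dummies(df["city"], prefix="city")
df["city_label"] = df["city"].astype("category").cat.codes
```

Label encoding imposes an arbitrary order (here alphabetical), so one-hot encoding is usually the safer choice for nominal features in linear models.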

Simple linear regression is a type of regression algorithm. In this assignment you will build your first machine learning model using simple linear regression and learn the mathematics behind how it works. You will be given a dataset in which you can examine the relationship between two variables statistically and visually, find the best-fit line by fitting the model on training data, and inspect the model's intercept and slope. You will then predict on the test dataset, evaluate the model using RMSE, R squared and residual standard error, and learn the basics of overfitting, underfitting and the assumptions of simple linear regression. Reference links are provided on each topic for better conceptual clarity.
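The best-fit line has a closed form: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below applies it to synthetic data (an assumption: y = 3 + 2x plus unit Gaussian noise), then computes RMSE and R squared on a held-out split and the residual standard error on the training data.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares slope and intercept for y ≈ intercept + slope * x."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=100)   # synthetic data

x_train, x_test, y_train, y_test = x[:80], x[80:], y[:80], y[80:]
slope, intercept = fit_simple_linear(x_train, y_train)

# Test-set metrics
pred = intercept + slope * x_test
rss_test = np.sum((y_test - pred) ** 2)
rmse = np.sqrt(rss_test / len(y_test))
r2 = 1 - rss_test / np.sum((y_test - y_test.mean()) ** 2)

# Residual standard error on the training fit (n - 2: two fitted parameters)
train_resid = y_train - (intercept + slope * x_train)
rse = np.sqrt(np.sum(train_resid ** 2) / (len(x_train) - 2))
```

The fitted slope and intercept should land close to the true 2 and 3, and RMSE and RSE close to the noise level of 1.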

Multiple linear regression extends simple linear regression by using more than one predictor variable to predict the response variable. This assignment gives you the opportunity to implement the main steps of a machine learning workflow: data cleaning, feature engineering, feature selection and finally building a multiple linear regression model using ordinary least squares. You will evaluate the model with metrics such as R squared and adjusted R squared and visualize parity, trend and error term plots. Reference links are included on each topic for better conceptual clarity.
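Ordinary least squares for several predictors can be solved with `np.linalg.lstsq` after prepending a column of ones for the intercept. The sketch also computes adjusted R squared, 1 - (1 - R^2)(n - 1)/(n - p - 1), which penalizes adding predictors that do not help. The data is synthetic: the third predictor has a true coefficient of zero.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta                      # beta[0] is the intercept

def r2_scores(X, y, beta):
    n, p = X.shape
    y_hat = beta[0] + X @ beta[1:]
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalizes extra predictors
    return r2, adj_r2

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 5 + X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

beta = fit_ols(X, y)
r2, adj_r2 = r2_scores(X, y, beta)
```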

The advanced regression techniques assignment introduces regularization with the Ridge, Lasso and ElasticNet algorithms to reduce model error and avoid overfitting. In this assignment you will explore the provided dataset using EDA, clean the data, prepare it using data engineering techniques, scale the training data, build Ridge, Lasso and ElasticNet models and tune their hyperparameters using GridSearchCV. You will also build a polynomial regression model to create a generalized model. Models are evaluated and selected using adjusted R squared, MAE, RMSE and R squared. YouTube reference links are provided on each topic for better conceptual clarity.
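Ridge regression has a closed form, beta = (X^T X + alpha I)^-1 X^T y, which makes the effect of the penalty easy to see. The sketch below is a from-scratch illustration, not scikit-learn's `Ridge`: it standardizes with training statistics only, leaves the intercept unpenalized by centring y, and picks alpha with a small grid over a single validation split (GridSearchCV would use cross-validation instead).

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale both sets using statistics from the training data only."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sigma, (X_other - mu) / sigma

def fit_ridge(X, y, alpha):
    """Closed-form ridge on centred data. X is assumed standardized (zero column
    means), so the unpenalized intercept is simply the mean of y."""
    y_mean = y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ (y - y_mean))
    return y_mean, beta

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))
true_beta = np.array([3.0, -2.0] + [0.0] * 8)   # only two informative features
y = 1.0 + X @ true_beta + rng.normal(scale=1.0, size=60)

X_tr, X_val = standardize(X[:40], X[40:])
y_tr, y_val = y[:40], y[40:]

def val_rmse(alpha):
    intercept, beta = fit_ridge(X_tr, y_tr, alpha)
    pred = intercept + X_val @ beta
    return np.sqrt(np.mean((y_val - pred) ** 2))

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
results = {alpha: val_rmse(alpha) for alpha in alphas}   # a tiny grid search
best_alpha = min(results, key=results.get)
```

Larger alpha shrinks the coefficient vector toward zero; Lasso and ElasticNet add an L1 term that can set coefficients exactly to zero, but they have no closed form and are usually fitted by coordinate descent.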

Logistic regression is a go-to method for binary classification problems (problems with two class values). In this assignment you will explore the logistic regression algorithm through a case study in which you will examine and clean the data, prepare it using feature engineering methods, remove outliers, scale the features and build your first classification model with logistic regression. You will then remove multicollinearity using VIF, rebuild the model, evaluate it using a confusion matrix, plot the ROC curve, select a cutoff probability and finalize the best model. Reference links are included on each topic for better conceptual clarity.
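To show what happens inside the model and at evaluation time, here is a from-scratch sketch (not scikit-learn's `LogisticRegression`) that fits logistic regression by batch gradient descent on the log-loss, builds a confusion matrix at a chosen cutoff, and sweeps the cutoff to obtain ROC points. The two-cluster data, learning rate and epoch count are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y      # gradient of log-loss w.r.t. the logit
        w -= lr * X.T @ error / n
        b -= lr * error.mean()
    return b, w

def confusion(y_true, prob, cutoff=0.5):
    """Return (TN, FP, FN, TP) for a given probability cutoff."""
    pred = (prob >= cutoff).astype(int)
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    return tn, fp, fn, tp

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.5, 1.0, size=(100, 2)), rng.normal(1.5, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

b, w = fit_logistic(X, y)
prob = sigmoid(X @ w + b)
tn, fp, fn, tp = confusion(y, prob)
accuracy = (tp + tn) / len(y)
sensitivity = tp / (tp + fn)     # true positive rate
specificity = tn / (tn + fp)     # true negative rate

# Sweeping the cutoff traces the ROC curve as (FPR, TPR) points
roc = []
for c in np.linspace(0, 1, 11):
    c_tn, c_fp, c_fn, c_tp = confusion(y, prob, c)
    roc.append((c_fp / (c_fp + c_tn), c_tp / (c_tp + c_fn)))
```

Choosing the final cutoff is a trade-off between sensitivity and specificity; plotting both against the cutoff is one common way to pick it.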

In this assignment you will explore principal component analysis (PCA), a machine learning method for dimensionality reduction. You will learn and implement the math behind PCA: standardization, covariance matrix computation, computing eigenvectors and eigenvalues and understanding their use, and creating feature vectors, along with a two-dimensional visualization of the result. Finally, you will build principal-component-based data from a given dataset. Reference video links are provided on each topic covered.
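The PCA steps listed above map directly onto a few NumPy calls: standardize, compute the covariance matrix, take its eigenvectors, sort them by eigenvalue, and project. This is a from-scratch sketch on synthetic data in which the first two columns are almost perfectly correlated, so the first component should capture about two thirds of the variance.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigen-decomposition of the covariance of standardized data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardization
    cov = np.cov(Z, rowvar=False)                   # features x features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]          # the feature vector
    projected = Z @ components                      # data in principal-component space
    explained_ratio = eigvals / eigvals.sum()
    return projected, components, explained_ratio

rng = np.random.default_rng(5)
t = rng.normal(size=300)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=300), rng.normal(size=300)])
projected, components, ratio = pca(X)
```

The two columns of `projected` are what you would plot for the two-dimensional visualization.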

The k-nearest neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. In this assignment you will get hands-on experience implementing KNN: you will learn about the meshgrid concept and create a function that tries different values of K on nine different datasets. You will then solve a use case with KNN and build an effective model by choosing the best value of p (the power parameter of the Minkowski distance) and by selecting the best K based on accuracy scores. Reference links are provided on each topic for better conceptual clarity.
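KNN classification is short enough to write from scratch: measure the Minkowski distance to every training point (p = 1 is Manhattan, p = 2 is Euclidean), take the K closest, and return the majority label. The sketch then picks K and p by held-out accuracy. The two-cluster data is synthetic; in practice scikit-learn's `KNeighborsClassifier` exposes the same `n_neighbors` and `p` choices.

```python
import numpy as np
from collections import Counter

def minkowski(a, B, p):
    """Distance from point a to every row of B; p=1 Manhattan, p=2 Euclidean."""
    return np.sum(np.abs(B - a) ** p, axis=1) ** (1.0 / p)

def knn_predict(X_train, y_train, X_query, k=3, p=2):
    preds = []
    for q in X_query:
        nearest = np.argsort(minkowski(q, X_train, p))[:k]
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])   # majority vote
    return np.array(preds)

rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0, 1, size=(60, 2)), rng.normal(3, 1, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

# Choose K and p by accuracy on the held-out points
scores = {}
for k in (1, 3, 5, 7, 9):
    for p in (1, 2):
        pred = knn_predict(X[train], y[train], X[test], k=k, p=p)
        scores[(k, p)] = np.mean(pred == y[test])
best_k, best_p = max(scores, key=scores.get)
```

Odd values of K avoid ties in binary majority votes.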

The decision tree algorithm uses a tree representation to solve classification or regression problems. In this assignment you will solve a use case using a decision tree. You will learn about splitting criteria such as homogeneity, entropy, the Gini index and information gain; learn about the hyperparameters of a decision tree and tune them to improve model performance; and visualize your tree structure to understand the logic behind its splits. You will build your decision tree model, evaluate it using a confusion matrix, and prevent overfitting through pruning with the minimal cost-complexity method. Reference videos on each topic make your learning smooth.
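The splitting criteria above can be computed directly. Entropy is -sum(p * log2(p)), Gini impurity is 1 - sum(p^2), and information gain is the parent's impurity minus the size-weighted impurity of the children. The sketch compares a perfectly homogeneous split with one that leaves both children as mixed as the parent.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1 - np.sum(p ** 2))

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pure_split = (parent[:4], parent[4:])            # both children homogeneous
useless_split = (parent[::2], parent[1::2])      # each child keeps a 50/50 mix
```

A tree-growing algorithm evaluates candidate splits this way and keeps the one with the largest gain.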

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes' theorem, used in a wide variety of classification tasks. In this assignment you will gain a thorough understanding of the Naïve Bayes algorithm and its essential concepts. You will implement the available Naïve Bayes variants, namely Bernoulli, multinomial and Gaussian, to solve a case study. You will learn how to vectorize words in textual data using CountVectorizer, and the concepts of Bayes' theorem and Laplace smoothing will be explained. Reference links are provided on each topic for better conceptual clarity.
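A from-scratch multinomial Naïve Bayes text classifier with Laplace smoothing, using only the standard library. Whitespace tokenization stands in for CountVectorizer here, and the four training sentences are made up. Smoothing adds `alpha` to every word count so that a word never seen in a class does not force that class's probability to zero.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count words per class; alpha is the Laplace (add-one) smoothing constant."""
    vocab = {w for d in docs for w in tokenize(d)}
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(tokenize(doc))
    likelihood = {}
    for c in classes:
        total = sum(counts[c].values())
        likelihood[c] = {w: (counts[c][w] + alpha) / (total + alpha * len(vocab))
                         for w in vocab}
    return priors, likelihood, vocab

def predict(text, priors, likelihood, vocab):
    scores = {}
    for c in priors:
        # Sum log-probabilities to avoid underflow; words outside the vocabulary are skipped
        scores[c] = math.log(priors[c]) + sum(
            math.log(likelihood[c][w]) for w in tokenize(text) if w in vocab)
    return max(scores, key=scores.get)

docs = ["win cash prize now", "cheap prize win", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
```

Usage: `predict("win a prize", *model)` returns `"spam"`, while `predict("meeting notes", *model)` returns `"ham"`.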

Bagging is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. In this assignment you will learn bagging concepts by solving a case study. You will learn about weak learners, the bias-variance tradeoff, the bagging meta-estimator, random forest, the bootstrap method, bootstrap aggregation, estimated performance and variable importance. Reference links are provided on each topic for better conceptual clarity.
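Bootstrap aggregation can be sketched with a deliberately weak learner: a one-split decision stump. Each stump is trained on a bootstrap sample (n rows drawn with replacement), and the ensemble predicts by majority vote. The stump and the synthetic data are illustrative assumptions; scikit-learn's `BaggingClassifier` and `RandomForestClassifier` implement the same idea with full trees.

```python
import numpy as np

def fit_stump(X, y):
    """Weak learner: the best single-feature threshold split for labels in {0, 1}."""
    best = (np.inf, 0, 0.0, 1)   # (errors, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = (polarity * (X[:, j] - t) > 0).astype(int)
                errors = np.sum(pred != y)
                if errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]

def predict_stump(stump, X):
    j, t, polarity = stump
    return (polarity * (X[:, j] - t) > 0).astype(int)

def bagging(X, y, n_estimators=25, seed=0):
    """Fit each stump on its own bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def predict_bagging(stumps, X):
    votes = np.mean([predict_stump(s, X) for s in stumps], axis=0)
    return (votes >= 0.5).astype(int)       # majority vote (ties go to class 1)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
stumps = bagging(X, y)
accuracy = np.mean(predict_bagging(stumps, X) == y)
```

Because each bootstrap sample leaves out roughly a third of the rows, those out-of-bag rows can also be used to estimate performance without a separate test set.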

Boosting algorithms often outperform simpler models such as logistic regression and decision trees. Boosting is a general ensemble method that creates a strong classifier from a number of weak classifiers. In this assignment you will solve a case study using boosting algorithms such as AdaBoost (Adaptive Boosting), gradient tree boosting, XGBoost, LightGBM and CatBoost. You will learn how these algorithms differ in the way they work and how to select the best model for your problem. Reference video links are provided on each topic for better conceptual clarity.
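The core AdaBoost loop, sketched from scratch with decision stumps as weak learners and labels in {-1, +1}: each round fits a stump to the current sample weights, gives it a vote alpha = 0.5 * ln((1 - err) / err), and up-weights the points it misclassified so the next stump concentrates on them. This is discrete AdaBoost on synthetic data, not scikit-learn's `AdaBoostClassifier`; gradient boosting, XGBoost, LightGBM and CatBoost instead fit each new tree to the gradient of a loss.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Stump minimizing the weighted error for labels in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)   # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - t) > 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds=30):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, j, t, polarity = fit_weighted_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # vote strength of this weak learner
        pred = np.where(polarity * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)         # misclassified points gain weight
        w /= w.sum()
        model.append((alpha, j, t, polarity))
    return model

def predict_adaboost(model, X):
    score = sum(alpha * np.where(pol * (X[:, j] - t) > 0, 1, -1)
                for alpha, j, t, pol in model)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)   # a diagonal boundary no single stump fits
model = adaboost(X, y)
accuracy = np.mean(predict_adaboost(model, X) == y)
```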

Cluster analysis, or clustering, is an unsupervised machine learning task. It involves automatically discovering natural groupings in data.

This assignment gives you the opportunity to implement various clustering methods to solve a case study. You will learn by doing, covering affinity propagation, agglomerative clustering, BIRCH, DBSCAN, K-Means, mini-batch K-Means, mean shift, OPTICS, spectral clustering, Mixture of Gaussians, the Hopkins test and hierarchical clustering methods. Reference video links are provided on each topic for better conceptual clarity.
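As a concrete example of one method from the list, here is a from-scratch K-Means sketch: k-means++ seeding, then Lloyd's iterations that alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The three-blob data is synthetic; scikit-learn's `KMeans` is the usual production choice.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # k-means++ seeding: each new centroid is drawn with probability proportional
    # to its squared distance from the nearest centroid chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    # Lloyd's iterations: assign points, then move centroids to cluster means
    for _ in range(iters):
        labels = assign(X, centroids)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(X, centroids)
    inertia = float(np.sum((X - centroids[labels]) ** 2))   # within-cluster sum of squares
    return labels, centroids, inertia

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ((0, 0), (4, 0), (0, 4))])
labels, centroids, inertia = kmeans(X, k=3)
```

Running this for several values of k and plotting inertia gives the familiar elbow plot for choosing the number of clusters.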

The objective of the support vector machine (SVM) algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points; SVM variants can be used for both classification and regression problems. In this assignment you will explore SVM by solving a hands-on case study. You will learn about hyperplanes, the mathematics behind SVM, support vectors, SVM hyperparameters such as C and gamma, kernels such as linear, RBF and polynomial, slack variables, and the advantages and disadvantages of using SVM. Reference videos on each topic are provided, along with interview videos, for better conceptual clarity.
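A soft-margin linear SVM can be trained in its primal form by minimizing 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i (w . x_i + b))). The hinge term is where the slack variables live: a point with margin below 1 contributes slack. The sketch below uses plain full-batch subgradient descent on synthetic, separable data; it is an illustration only, since libraries such as scikit-learn's `SVC` solve the dual problem, which is what enables the RBF and polynomial kernels. The RBF kernel function itself is shown for reference.

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.001, epochs=2000):
    """Soft-margin linear SVM (primal) via full-batch subgradient descent, y in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1              # inside the margin or misclassified (slack > 0)
        grad_w = w - C * (y[violators][:, None] * X[violators]).sum(axis=0)
        grad_b = -C * y[violators].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: similarity decays with squared distance; larger gamma decays faster."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)), rng.normal(2, 0.5, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = fit_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
# Approximate support vectors: points on or inside the margin
# (approximate because subgradient descent only gets close to the optimum)
support = np.where(y * (X @ w + b) <= 1.05)[0]
```

A larger C penalizes slack more heavily and narrows the margin; a smaller C tolerates more margin violations in exchange for a wider margin.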

Gradient descent is an optimization algorithm used when training machine learning models, which makes it important to understand. In this assignment you will learn gradient descent by doing: defining cost functions, implementing batch gradient descent and stochastic gradient descent, optimization, comparing the closed-form solution with gradient descent, evaluating by plotting cost versus time, choosing the learning rate, rescaling inputs, making a few passes over the data and plotting the mean cost. Reference links are provided on each topic for better conceptual clarity.
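The sketch below compares the closed-form (normal equation) solution for linear regression with batch gradient descent, which uses every sample per step, and stochastic gradient descent, which updates after each sample over a few shuffled passes. The data is synthetic, with the input already on a [0, 1] scale, since rescaling inputs keeps the problem well conditioned and lets a larger learning rate converge. The `history` list records the cost after every batch step and could be plotted against iterations.

```python
import numpy as np

def cost(X, y, theta):
    """Mean squared error cost J(theta) = (1 / 2m) * sum((X @ theta - y)^2)."""
    residual = X @ theta - y
    return float(residual @ residual / (2 * len(y)))

def batch_gd(X, y, lr=0.5, epochs=2000):
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(epochs):
        gradient = X.T @ (X @ theta - y) / len(y)    # uses every sample each step
        theta -= lr * gradient
        history.append(cost(X, y, theta))
    return theta, history

def sgd(X, y, lr=0.05, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):                          # one pass over shuffled data per epoch
        for i in rng.permutation(len(y)):
            theta -= lr * (X[i] @ theta - y[i]) * X[i]   # update from a single sample
    return theta

rng = np.random.default_rng(6)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([np.ones(200), x])               # bias column + one rescaled feature
y = 4.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

closed_form = np.linalg.solve(X.T @ X, X.T @ y)      # normal equation
theta_batch, history = batch_gd(X, y)
theta_sgd = sgd(X, y)
```

Batch gradient descent converges to essentially the closed-form answer; SGD ends up close to it but keeps jittering around the optimum because each step sees only one noisy sample.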
