Data Science World
Roadmap to Data-Science
Data Scientist Roadmap Written By Rahul
Hi guys, I’m Rahul, and in this blog I’ll be discussing a very general question that comes to mind for anyone who thinks of pursuing data science as a career. I’m sure that question is in your head right now, isn’t it?
Well, that general question is “How do I approach data science?”. Besides this, we will cover related questions like “Where do I begin?” and “How long does it take?”.
Before diving into the topic, let me tell you why I am a good fit to discuss it. I graduated in physics, then moved into yoga and finished a master’s in yoga science. I worked as a yoga therapist in Rishikesh for nearly a year. At the beginning of the pandemic, I decided to learn Python and then came to know about data science. I had the same questions in my head, and I struggled a lot to get where I am right now (I’m currently working at a startup, CloudyML, as a Data Scientist). I’m still learning every day, and I want to share this so that it can help those who are just beginning.
Let’s dive into the journey of data science from scratch.
What is Data-Science?
This is the first question everybody should be clear about before jumping in. In simple words, data science is the study of data. It involves developing methods of storing and analyzing data to extract useful information. The purpose of data science is to gain insights and knowledge from any type of data, whether structured or unstructured, and use them to make profits for the organization.
Although data science is a separate field with connections to computer science, it is more closely related to mathematics and statistics. Having a good grasp of math or statistics is helpful. But don’t worry if you are not good at them, because you can get better with practice.
Why should I choose data science?
Even if you have already opted for data science as your career, you should ask yourself this question and strengthen your determination so that you can stick with it in the long run. Ask yourself: “Why did I choose it? Is it the money, or do I love it?” Whatever the reason, be sure about it and then work on it.
Which components of data science do you need to know?
Data science is a vast field, so you need to choose the role you feel comfortable with, whether it’s data analyst, machine learning engineer, business analyst, etc. All these roles have 50-60% of their components in common. For example, people in all these roles learn Python and SQL, and they know some data visualization tools like Excel and Tableau.
In this blog, we will cover what you need to know across all these roles so that you can target your favorite one. We will cover Python, machine learning, deep learning, SQL, and some data visualization tools like Excel and Tableau.
Want to Become a Data Scientist?
Here We Offers You a Data Science Course With projects.Please take a look of Our Course
1st Block of Data Science: PYTHON
Come on, let’s first discuss what Python is.
According to the official Python documentation, Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Here, interpreted means the code is processed at runtime by the interpreter. You do not need to compile your program before executing it, as you do in C/C++.
Object-oriented means a programming technique that encapsulates data and code within objects; in short, you can define your own objects.
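To make “define your own objects” concrete, here is a minimal sketch. The class name `Dataset` and its fields are made up for illustration; they are not from any library.

```python
class Dataset:
    """A hypothetical object that bundles data with the code that works on it."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        """Return the average of the stored values."""
        return sum(self.values) / len(self.values)


# No compile step: save this as a .py file and run it with the interpreter.
heights = Dataset("heights_cm", [150, 160, 170])
print(heights.name, heights.mean())
```

The data (`values`) and the behavior (`mean`) travel together inside one object, which is the encapsulation idea described above.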
Why are we using Python?
Python’s simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. In addition, Python supports modules and packages, which encourages program modularity and code reuse, and that makes the life of a programmer more cool and less nerdy.
Which modules/libraries/concepts of Python do you need to know for data science?
The main things you should know are Python’s data structures, functions, and a basic understanding of OOP (object-oriented programming). These concepts are called the fundamentals of Python.
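As a rough picture of what these fundamentals look like in practice, here is a small sketch of the built-in data structures and one function. All names and values are invented for the example.

```python
# The four built-in containers you will use most often.
scores = [72, 88, 95]                 # list: ordered, mutable
point = (3, 4)                        # tuple: ordered, immutable
ages = {"asha": 31, "ravi": 27}       # dict: key -> value lookup
tags = {"python", "sql", "python"}    # set: duplicates are dropped


def average(numbers):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(numbers) / len(numbers)


print(average(scores))   # mean of the list
print(ages["ravi"])      # dictionary lookup by key
print(len(tags))         # the set keeps only unique items
```

If you can read and write code like this comfortably, you are ready to move on to the scientific libraries.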
After covering the fundamentals, there are some scientific libraries you must know: numpy, pandas, matplotlib, seaborn, sklearn, scipy, stats, etc. These libraries will help you throughout data science whenever you work with any kind of data. If you plan to go deeper into the machine learning domain, there are also some famous deep learning frameworks like tensorflow, keras, and pytorch.
Which courses can help you with Python?
There are many Python courses available on Coursera with financial aid, but two of them are especially worth investing time in to get hands-on with Python:
1. Crash Course on Python (by Google)
2. Python for Everybody
Which websites should you follow to build logical thinking in Python?
HackerRank, HackerEarth, LeetCode, and CodeChef (you can start with HackerRank, as it’s the easiest to begin with).
2nd block: MACHINE LEARNING
After Python, we will dive into machine learning. The first thing to do is understand how to deal with data: how to play with arrays, series, and dataframes. To play with arrays, we need to understand numpy methods, since numpy is made specifically for manipulating arrays. It will take you a few weeks to understand 50-60% of numpy’s methods, and the rest will come along your journey in data science.
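Here is a small taste of what playing with arrays looks like. The numbers are made up; the point is only to show a few of the numpy operations you will meet first.

```python
import numpy as np

# Build a 2-D array (3 rows, 2 columns) from a nested list.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

print(data.shape)          # dimensions of the array
print(data.mean(axis=0))   # column-wise mean
print(data * 10)           # vectorized arithmetic, no explicit loop
print(data[data > 2.5])    # boolean mask selects matching elements
```

Shapes, axis-wise aggregation, vectorized math, and boolean masks cover a large share of everyday numpy usage.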
Once you have understood arrays, you move on to dataframes, and this is where the pandas library comes in. Pandas helps you deal with a series (which is like a single column of a dataframe) and with the dataframe as a whole. Learning pandas may be very annoying in the beginning, but with some patience and hard work you will get it, and once you understand it you will use it like a pro. Even if you don’t remember the methods of numpy, pandas, etc., never feel low, because you can just google them and get the job done. Nobody memorizes every method; we learn how to do things by googling.
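To show the difference between a series and a dataframe, here is a minimal sketch on a tiny made-up table. The column names and values are purely illustrative.

```python
import pandas as pd

# A small made-up table; column names are purely illustrative.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "sales": [120, 200, 80, 150],
})

city_col = df["city"]                          # one column is a Series
big_sales = df[df["sales"] > 100]              # filter rows with a boolean mask
per_city = df.groupby("city")["sales"].sum()   # aggregate per group

print(type(city_col).__name__)
print(big_sales)
print(per_city)
```

Selecting a column, filtering rows, and grouping are the three moves you will repeat on almost every dataset.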
After you have learnt how to deal with arrays, series, and dataframes, next comes visualization. Obviously you are not going to understand the data just by staring at dataframes. You need to plot charts and graphs, and for this we have the matplotlib library. It has tons of methods for histograms, bar plots, pie charts, scatter plots, line charts, etc. In the beginning it may look very tedious, and you may feel you just don’t get it, but trust me, there’s a pattern in everything. Once you cross the threshold of this painful learning process, you will see the pattern in how a topic is learnt, and then you will start loving it.
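Below is a minimal sketch of two of those chart types. The data is invented, and I’m assuming you want to save the figure to a file (here `charts.png`, an arbitrary name) rather than open a window.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # made-up sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# One figure with two side-by-side plots.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")
ax_scatter.scatter(x, y)
ax_scatter.set_title("Scatter plot")
fig.tight_layout()
fig.savefig("charts.png")
```

In a notebook you would usually drop the `matplotlib.use("Agg")` line and let the plot display inline.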
There’s another data visualization library, seaborn, which offers amazing plots with some added features. It is also very useful and widely used. Seaborn is like the big brother of matplotlib: it is built on top of matplotlib and comes with beautiful themes that present data in an elegant way, which helps in visualizing. You are going to love seaborn for sure; it is easy to understand and learn.
Book to learn these libraries:
- Python for Data Analysis (O’Reilly)
Once you unlock playing with data, you get the eyes of the Oracle from The Matrix: your mind can build the model and predict the future. If you have not seen “The Matrix”, you must watch it to understand that last line. As an AI lover, you will enjoy this recommendation. After the analysis part comes the sklearn library, which helps you split your data and import models, so you can build different models depending on what you want to do with the data.
Using sklearn is very easy, because if you have come this far you have crossed the threshold I was talking about. There’s also a stats library, which will give you the third eye of the data scientist. Understanding it is very helpful in the long run: it helps you dive deeper into the statistical significance of the data, which is normally not visible in charts and plots.
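As one example of checking statistical significance, here is a sketch of a two-sample t-test. I’m assuming `scipy.stats` as a stand-in for the stats library mentioned above (the same idea is available elsewhere), and both samples are made up.

```python
from scipy import stats

# Two made-up samples, e.g. a measurement under two different conditions.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [6.1, 5.9, 6.0, 6.2, 5.8]

# Independent two-sample t-test: are the two means plausibly equal?
result = stats.ttest_ind(group_a, group_b)

print(result.statistic)   # negative here because group_a has the smaller mean
print(result.pvalue)      # a small p-value suggests the difference is not chance
```

A chart would show the two groups sitting apart; the test tells you how unlikely such a gap would be if the groups were really the same.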
Besides exploring the libraries mentioned above, you need to be strong in the concepts of machine learning, and for that you should know how ML (machine learning) algorithms work behind the scenes.
Machine learning is broadly classified into 3 types:
Supervised, Unsupervised and Reinforcement.
For supervised learning, you should ask yourself:
What is supervised learning?
Which algorithms are supervised?
Why is it important? etc.
Supervised learning covers regression and classification tasks. Some examples of supervised algorithms are linear regression, logistic regression, decision tree, random forest, kNN, SVM, etc.
Ask the same what, why, and which questions for unsupervised learning. Immerse yourself in the subject until you feel merged into it, and then you will start loving it. Some popular unsupervised techniques are k-means, Apriori, hierarchical clustering, PCA, etc.
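To see what “unsupervised” means in code, here is a minimal k-means sketch with scikit-learn. The six points are made up, and notice that we never give the model any labels.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up 2-D points forming two obvious groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Ask for two clusters; random_state only makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

print(labels)                  # cluster index assigned to each point
print(model.cluster_centers_)  # the two learned centers
```

The algorithm discovers the two groups on its own, which is the key difference from supervised learning, where each training example comes with the right answer.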
Reinforcement learning is also a very interesting domain of ML, where we are literally working towards building AI that can think on its own. It is based on a reward and penalty system: we reward the agent when it takes a good step towards learning and penalize it when it takes a bad one. In short, it is based on learning from experience. I haven’t explored it much, so I leave it up to you.
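The reward and penalty idea can be sketched with a toy example in plain Python. This is a simple epsilon-greedy agent choosing between two actions; the environment, the probabilities, and the names are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical environment: two actions with different chances of a good outcome.
reward_prob = {"left": 0.2, "right": 0.8}

value = {"left": 0.0, "right": 0.0}   # the agent's running estimate per action
counts = {"left": 0, "right": 0}
epsilon = 0.1                          # how often the agent explores at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.choice(["left", "right"])   # explore
    else:
        action = max(value, key=value.get)          # exploit the best estimate
    # Reward of +1 for a good outcome, penalty of -1 otherwise.
    reward = 1 if random.random() < reward_prob[action] else -1
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    value[action] += (reward - value[action]) / counts[action]

print(value)
print(counts)
```

After enough steps the agent’s estimate for the better action ends up higher, so it picks that action most of the time. That is learning from experience in its simplest form.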
Now we are at the end of talking about ML, so it’s time to put all the knowledge learnt so far into action. Take a project and go through all the steps: understanding the data, cleaning, visualizing, splitting the data into train and test sets, building a model, and then evaluating the model on the test data. Once you have done this whole process multiple times, your hesitation will go away; until then, there will be some room for it. I remember my own hesitation, and it went away after I tried all these steps on a few datasets.
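Here is a compressed sketch of the split, build, and evaluate steps using scikit-learn. I’m assuming the built-in iris dataset and a decision tree purely for illustration; a real project would add the data understanding, cleaning, and visualization steps first.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small, already-clean dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the model is judged on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Build the model on the training split only.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test split.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Try swapping in another model or another dataset; the shape of the workflow stays the same.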
Reference book: Hands-On Machine Learning (O’Reilly)
Reference course: the famous Machine Learning course by Andrew Ng on Coursera
3rd block: DEEP LEARNING
When it comes to deep learning, we see two mindsets in the tech community: some say it’s not important to learn deep learning, and others say it is. I’m not going to prove who is right or wrong here. I feel that if a subject is interesting and connected to our domain, we should dive into it.
Let’s see some useful applications of deep learning:
Have you ever thought about how self-driving cars estimate the distance to other objects?
Deep learning plays a big role in object detection and scene perception in autonomous vehicles. When ALVINN, one of the earliest self-driving vehicles, was created in 1989, it used neural networks to detect lane lines, segment the ground, and drive.
Did it ever occur to you how the media can understand your emotions and sentiments using survey forms?
Sentiment analysis is a process in which a company uses natural language processing to understand how its customers feel. Through survey forms, your reviews on IMDb, or posts on any other platform, your sentiments can be classified as positive or negative. For example, during an election, tweets on Twitter can be collected and classified to see what percentage of people tweet positively about a particular political party, and this information can be used to estimate which government is most likely to be formed in the next term.
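To show the idea of classifying text as positive or negative, here is a toy sketch. Note that it uses a classical bag-of-words Naive Bayes model from scikit-learn, not a deep network, and the handful of “reviews” are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; real systems learn from thousands of reviews.
reviews = [
    "I loved this movie",
    "great acting and a great story",
    "what a wonderful film",
    "I hated this movie",
    "terrible acting and a boring story",
    "what an awful film",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# Turn each text into word counts, then fit a Naive Bayes classifier on them.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(reviews, labels)

predictions = classifier.predict(["a wonderful story, I loved it",
                                  "boring and awful"])
print(list(predictions))
```

Deep learning models replace the word counts with learned representations, but the goal stays the same: text in, sentiment label out.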
Do you know how Amazon Alexa and Apple Siri work?
Amazon Alexa and Apple Siri are known as virtual assistants. The main technologies these virtual assistants are built on are speech recognition and speech-to-text conversion combined with natural language processing. Google Home and Microsoft Cortana fall into the same category.
How do social media platforms like Facebook, Twitter, and Instagram keep their users engaged?
Twitter has access to a lot of data from its users, so it analyzes that data with deep neural networks and other sophisticated algorithms to learn what its users prefer.
Facebook uses deep learning to recommend products, pages, friends, etc. It also uses an artificial neural network for facial recognition, which makes accurate tagging possible.
Instagram uses deep learning to fight cyberbullying and remove offensive comments.
Can we use deep learning in healthcare?
Deep learning is revolutionizing healthcare systems. Using wearable sensors, we can collect a lot of data about a patient’s health, such as blood sugar level, blood pressure, heart rate, and various other medical measurements. From this data we can derive insights into the likelihood of future diseases and act in advance to prevent them.
Now let’s discuss the important topics to cover in deep learning. Before starting deep learning, it’s good to know the basics of TensorFlow.
Reference Video Link:
After this video, you will have a basic understanding of TensorFlow, which will help you understand deep learning code later. The first building block of deep learning is the neural network.
You should understand what it is, why we use it, and what its architecture looks like.
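To get a feel for the architecture, here is a minimal sketch of one forward pass through a tiny network, written in plain numpy rather than TensorFlow. The layer sizes, the input, and the random weights are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2])        # one input example with 2 features

# Hidden layer: 2 inputs -> 3 neurons. Output layer: 3 neurons -> 1 output.
W1 = rng.normal(size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

hidden = np.maximum(0, x @ W1 + b1)              # weighted sum, then ReLU
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))   # weighted sum, then sigmoid

print(hidden)
print(output)   # a value between 0 and 1, usable as a probability
```

Training a network means adjusting `W1`, `b1`, `W2`, and `b2` so that the output matches known answers; frameworks like TensorFlow automate that part.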
As usual, in the beginning you might feel you are not getting it, but hold on: it’s a complex topic, so it takes time to sink in. You should learn how to do hyperparameter tuning, regularization, and optimization in deep learning, just as in machine learning. Apply all of this to a dataset to get hands-on and feel confident, and learn how to structure your project from beginning to end.
After covering all the basic components, learn about CNNs (convolutional neural networks), as they are the foundation of computer vision. CNNs have applications in autonomous driving, face recognition, reading radiology images in the health sector, etc. Once you understand how a CNN works, apply it to visual detection and recognition tasks. You will also learn how to use pre-trained convnets. Learn how to apply all of this to images, video, and other 2-D and 3-D data.
Later, cover RNNs, GRUs, and LSTMs for NLP too. All these topics are covered by the Coursera courses recommended in this section.
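To understand how a CNN works, it helps to see the core operation of a convolutional layer by hand. This is a bare numpy sketch (no padding, stride 1), not how a framework implements it; the tiny image and the edge-detecting kernel are made up.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation that CNN layers compute in practice.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 4x4 "image": dark left half (0), bright right half (1).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# A 1x2 kernel that responds where brightness increases from left to right.
kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d(image, kernel)
print(feature_map)
```

The feature map lights up only in the column where the dark region meets the bright one. In a real CNN the kernel values are not hand-picked like this; they are learnt during training.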
Best courses:
1) Deep Learning Specialization by DeepLearning.AI on Coursera
2) DeepLearning.AI TensorFlow Developer Professional Certificate on Coursera
Best book I suggest: Deep Learning with Python, 2nd edition (Manning)
While studying the NLP part of deep learning, if you start loving the semantics of language and feel drawn towards how speech recognition works, you can explore the domain of natural language processing (NLP). There are two Python libraries, NLTK and spaCy, which will help you deal with text, sentences, and paragraphs. NLTK is a string processing library that offers a variety of algorithms to choose from for a particular problem, whereas spaCy takes an object-oriented approach and provides the most up-to-date algorithms for the problem. To get better at NLP, it’s good to know Python’s regex library (the re module), which acts like butter in NLP.
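Here is a small sketch of the kind of text cleaning that regex makes easy, using only Python’s built-in `re` module. The sample sentence is made up.

```python
import re

text = "Contact us at help@example.com or call 98765-43210. Offer ends 31/12/2024!"

# Pull out every run of letters as a lowercase word token.
words = re.findall(r"[a-z]+", text.lower())

# Find dates written as dd/mm/yyyy.
dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)

# Replace everything except letters and spaces, then collapse repeated spaces.
cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
cleaned = re.sub(r"\s+", " ", cleaned).strip()

print(words)
print(dates)
print(cleaned)
```

Steps like these usually come before handing text to NLTK, spaCy, or a model.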
Best course on NLP: Natural Language Processing Specialization by DeepLearning.AI on Coursera
I highly recommend spending enough time on these courses and doing some projects for practice. Please don’t try to just finish them, because you need to process all of this, and that takes time.
After covering deep learning, you will probably have a rough idea of which domain you like and where you would want to spend most of your working time. Personally, I liked computer vision, but once I started learning NLP, I began to love it even more.
4th block: SQL
Whenever it comes to data, we should also ask “What type of data?”, “Where do we store it?”, “In which format?”, and many more such questions. Think through them as if you were in love with the subject.
Have you ever wondered how much data is created each day?
Well, it’s roughly one quintillion (10^18) bytes.
Google alone creates more than 20 petabytes of data every day from 3.5 billion search queries. Google stores all this data so that the next time we search, it can show us results based on our previous search behavior. Why am I talking about this? Just by looking at these numbers, you can see why data is important for companies and for today’s world. If you can develop an interest in this subject, you can do something great for yourself and for society too.
Now, there are different types of data: structured, unstructured, and semi-structured. There are different databases for different types of data. Just remember, once you are in the field of data, databases become very important, because in the end you are going to store data and run operations on it.
In the beginning, you should know at least one database, such as MySQL. Once you start with it, you will also become aware of other relational databases (for structured data) and non-relational databases (for unstructured data).
In my humble opinion, you should never underestimate SQL. Many beginners leave it for the end and then face difficulty in interviews. I made the mistake of leaving SQL for last, and I could not crack an interview because of it.
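To give a taste of the queries interviews tend to ask about, here is a small sketch. I’m using Python’s built-in `sqlite3` module as a stand-in so it runs without installing a MySQL server; the tables and rows are made up, and the SELECT shown is standard SQL that would look the same in MySQL.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0)])

# Join the two tables, then count and total the orders per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

print(rows)
conn.close()
```

SELECT, JOIN, GROUP BY, and ORDER BY are the building blocks worth practicing first.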
Best books for SQL:
Head First SQL, SQL Cookbook
5th block: EXCEL
If you are more interested in the analysis part of data science and not in too much coding, then tools like Excel are very important for you. The basics of Excel are a must for everyone, but the intermediate and advanced parts are necessary for data analyst aspirants. Some of the common topics you must know are computing functions like SUM and AVERAGE, conditional formatting, plotting charts, worksheet design, and formulas like VLOOKUP and CONCAT. In short, if I give you a dataset or an Excel sheet and ask you to summarize it and do certain tasks, you should know how to proceed in Excel and how to think the task through. It does not matter if you have to google things; you should have enough confidence to get it done.
6th block: TABLEAU
Tableau is also more important for data analyst aspirants. Even if you are not interested in the analyst role, having got your hands dirty with it is a plus point. It’s an interesting piece of software that lets you visualize data very easily. To plot the same things in matplotlib and seaborn you have to write multiple lines of code, whereas here you just drag and drop and get a beautiful plot. Let me make a simple comparison: learning the basics of matplotlib and seaborn may take a few days, but Tableau is very easy to learn, and you can pick up basic plotting in a day. I’m not a pro at Tableau, but I have spent some time on an analysis project, so I can say this: if you love the analysis part, you will love Tableau.
At the basic level, you should be aware of field types, aggregation, filters, joins, data blending, calculated fields, the difference between .twb and .twbx files, etc.
Here we come to the end of the blog. I hope I gave you all some useful information. Please do share your feedback, guys.