#DATA SCIENCE & ANALYTICS TOPICS

Module 1 - SQL

SQL is the most underrated skill of a Data Engineer. It is used to pull required data points from the complex database of clients. You need to write long queries using joins to get the relevant data points.

Module 2: Python

Python is the backbone of Data Science. It is the most widely used language for DE. It’s very easy to learn compared to other languages and non-tech people can also learn it. You need not become an expert in it. You should mainly know how to manipulate data using it.

Module 3: Maths & Statistics

Statistics is what makes Data Science unique. Lots of Data Science problems are solved using statistics tests. Also understanding the dataset is done using statistics. It is very important for interviews.

Module 4: Tableau

Tableau is a software that offers collaborative data visualization for organizations working with business information analytics. In this course we have covered tableau from basics to advance which will help any individual to clear data analyst interviews.

Module 5: PowerBi

Microsoft Power BI is a business intelligence platform that provides nontechnical business users with tools for aggregating, analyzing, visualizing and sharing data. In our course we have covered all aspects of powerbi necessary to clear data analyst interviews with different case studies to showcase in your resume.

Module 6: MS Excel

Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications. Microsoft Excel enables users to format, organize and calculate data in a spreadsheet. Its features calculation or computation capabilities, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications (VBA).

Module 7: Introduction to Data Science

If you are starting going to start your career in Data Domain then you must have understand of this domain completely. In this section you will get the complete overview of the data science domain and its different key components.

Module 8: Machine Learning

Machine learning is core of Data Science. These are mathematical algorithms which try to find patterns and relationships in the input and output of the given dataset. You need to know the inner workings of the algorithms and also how to do hyper parameter tuning.

Module 9: Deep Learning

Deep Learning is a subset of Machine learning. It deals with neural networks which solves complex problems of Computer Vision, Natural Language Processing and time series predictions. Someone will to target advance Data Science roles must have this skill.

Module 10: AWS QuickSight

Amazon QuickSight allows everyone in the organization to understand data by asking questions in natural language, exploring through interactive dashboards, or automatically looking for patterns and outliers powered by machine learning. It is also an add on tool for resume and shows your desire for learning.

Module 11: Google Data Studio

Data Studio is a free tool that turns your data into informative, easy to read, easy to share, and fully customizable dashboards and reports. It is not a necessary tool for you but it's good to add extra tool in your resume to show your fire for learning. It can be helpful if your company decides to use this tool for data analysis.

Module 12: DSA

DSA is not a prime necessity of Data Science but some companies do ask DSA related questions specifically the product based companies like Amazon

Module 13: Generative AI

Generative artificial intelligence (AI) describes algorithms (such as ChatGPT) that can be used to create new content, including audio, code, images, text, simulations, and videos.

Module 14: R Language

R is not widely used as compared to Python but still some companies use it. It’s very power when it comes to plotting variety of graphics. The Exploratory Data Analysis is done better with R as compared to python.

Module 15: Kaggle Optimization Course

Kaggle is a good place to participate in machine learning/ deep learning competitions. This course covers about kaggle platform and how you can utilise kaggle to build your portfolio. It explains how you can use dataset, notebooks, competition here to get medals and boost your portfolio.

Module 16: Google Data Studio

# DATA ENGINEERING TOPICS

Module 17 - Big Data Introduction

✅ Intro to Big data
✅ Hadoop and its evolution
✅ HDFS Architecture
✅ Hadoop ecosystem intro
✅ Linux commands
✅ HDFS commands

Module 18: Map Reduce

✅ Intro to Map Reduce
✅ Different phases of Map Reduce
✅ Combiners and Partitioners
✅ Hash Function in Map Reduce
✅ Shuffling and sorting in Map Reduce
✅ Map Reduce Use Case

Module 19: Hive

✅ What is Hive
✅ Hive Query Language
✅ Comparison Hive vs RDBMS
✅ Hive Architecture
✅ Hive Views
✅ Hive Subqueries
✅ Built-in Functions
✅ Partitioning
✅ Bucketing
✅ Ranking
✅ Sorting
✅ Hive File Formats

Module 20: Sqoop

✅ Introduction
✅ Sqoop Import
✅ Sqoop Eval
✅ Sqoop Export
✅ Connecting to MySQL
✅ Sqoop Incremental
✅ Sqoop job creation

Module 21: HBASE

✅ Introduction
✅ Properties of HBase
✅ RDBMS vs HBASE
✅ HBASE Architecture
✅ HFile
✅ Zookeeper
✅ Update HBASE Data
✅ Delete HBASE Data
✅ Cassandra Overview
✅ HBASE vs Cassandra
✅ Filters in HBase.

Module 22: Scala

✅ Scala Introduction
✅ Why Scala
✅ Datatypes
✅ Strings
✅ If/else
✅ For Loop
✅ While Loop
✅ Functions
✅ Arrays
✅ Lists
✅ Tuples
✅ SetMap
✅ Functional Program
✅ Anonymous Function
✅ Recursion
✅ Scala Operators
✅ Scala Type System

Module 23: Spark

✅ What is Spark
✅ Spark comparison with Map Reduce
✅ RDD/DAG
✅ Immutability
✅ RDD Lineage
✅ Accumulators
✅ Spark Stages
✅ Spark on Yarn
✅ Spark Storge
✅ Intro to SparkSQL
✅ Handling columns in Dataframe/dataset
✅ Aggregations
✅ Window Aggregations
✅ Joins using Data Frame
✅ Broad Cast Join
✅ Shuffle sort-merge join
✅ Spark optimization
✅ Spark Streaming

Module 24: Kafka

✅ Introduction
✅ Kafka Architecture
✅ Index
✅ Cluster
✅ Integrating Kafka with Spark

Module 25: Airflow

✅ Intro to Apache Airflow
✅ Airflow Architecture
✅ Airflow Installation
✅ Creating and viewing DAG
✅ Cron job creation
✅ Logs Viewing
✅ Sensors

Module 26: Amazon Web Services(AWS)

✅ AWS EMR
✅ OnPrem vs Cloud
✅ HDFS vs S3
✅ What is S3
✅ EC2
✅ Elastic IP
✅ AWS storage, networking
✅ S3 and EBS
✅ Athena
✅ AWS Glue
✅ AWS Redshift

Module 27: Azure Databricks

✅ Introduction to Databricks
✅ Databricks Workspace Assets
✅ Databricks Architecture Overview
✅ DBFS Overview
✅ Data Utility in Databricks
✅ File System Utility
✅ Widgets Utility in Databricks
✅ Data Utility in Databricks
✅ Creating a Mount Point
✅ Mount Azure Blob Storage to DBFS
✅ Secret Utility in Databricks
✅ Access ADLS Gen2 Storage Using Account Key
✅ Access Data Lake Storage Gen2 or Blob Storage
✅ Access ADLS Gen2 or Blob Storage Using a SAS Token

Module 28: SnowFlake

✅Introduction to Snowflake
✅Architecture
✅Loading Data
✅Copying Options
✅Loading Unstructured Data
✅Performance Optimization
✅Loading Data from Azure
✅Snowpipe
✅Snowpipe for Azure
✅Time Travel
✅Fail Safe
✅Type of Tables
✅Zero-Copy Cloning
✅Data Sharing
✅Data Sampling
✅Scheduling Tasks

Module 29: Google Cloud Platform

✅ Introduction to GCP
✅ Bigquery
✅ Pub/sub

#LANGCHAIN LANGGRAPH & MULTI AGENT SYSTEMS TOPICS

Module 30: Foundations of Generative AI

Focus: Mastering the ecosystem, prompt engineering, and the development environment.

(1.1) The Generative AI Ecosystem :

• LLM Architecture: Understanding Transformers, Tokens, and Context Windows.
• Open vs. Closed Source: Navigating HuggingFace vs. OpenAI/Anthropic APIs.
• Environment Setup: Python virtual environments, API Key management, and Jupyter bestpractices.

‍(1.2) Advanced Prompt Engineering :‍Prompting Strategies: Zero-shot, Few-shot, and Chain-of-Thought (CoT).
• System Prompts: Defining robust personas and operational boundaries.
• Project: Building a ”Language Tutor” using advanced persona prompting.

Module 31: RAG Pipelines & Vector Databases

Focus: Grounding AI in private data to eliminate hallucinations.

(2.1) Data Engineering for AI :

• Embeddings Explained: Transforming text into vector representations.

• Vector Stores: Implementation with ChromaDB and Pinecone.
• Chunking Strategies: Recursive Character Splitting vs. Semantic Splitting.

‍(2.2) Retrieval Architectures :

• RAG Logic: The Retrieve-Augment-Generate workflow.
• Advanced Retrieval: Implementing Hybrid Search and Reranking.
• Project: Building a ”PDF Chatbot” capable of querying complex documents.

Module 32: Fine-Tuning & Specialized Models

Focus: Customizing models for specific tasks and modalities.

‍(3.1) Fine-Tuning LLMs :

• When to Fine-tune: Trade-offs between RAG and Fine-tuning.

• PEFT Techniques: Efficient training using LoRA and QLoRA.
• Dataset Preparation: Formatting JSONL data for training.

‍(3.2) Multimodal AI (Computer Vision) :

• Vision Models: Working with GPT-4o Vision and Open Source alternatives.
• Image Generation: Basics of Diffusion models.
• Project: Building a ”Visual Q&A System” that analyzes images.

Module 33: Autonomous Agents & LangGraph

Focus: Moving from linear Chains to cyclic, stateful Graphs.

‍(4.1) Tools & Function Calling :

• Function Calling API: Teaching LLMs to use calculators, search, and APIs.

• Custom Tools: Using decorators to wrap Python functions for agents.
• The ReAct Loop: Reason → Act → Observe architectures.

‍(4.2) LangGraph Architecture (NEW) :

• Chains vs. Graphs: Why production agents need loops, not just lines.

• State Management: Defining a global TypedDict state schema.
• Cyclic Flows: Implementing ”Self-Correction” loops (e.g., if code fails, try again).

• Persistence: Adding ”Memory” to agents using Database Checkpointers.

Module 34: Multi-Agent Orchestration

Focus: Orchestrating teams of agents for complex enterprise tasks.

‍(5.1)Multi-Agent Patterns (NEW) :

‍• The Supervisor Pattern: Building a central ”Manager” agent that routes tasks to workers.• Reliability Engineering: Using Pydantic for strict Structured Output (JSON).• Handoffs: Techniques for passing state between specialized agents (e.g., Researcher → Writer).

‍(5.2) Capstone Project: Autonomous Competitor Analyst :

• Objective: Build a Supervisor-Worker system that autonomously researches a company andwrites a report.
Architecture:

• Supervisor: Orchestrates the workflow.
• Research Agent: Uses Tavily Search API to gather live data.
• Writer Agent: Compiles findings into a markdown report.
• Outcome: A fully functional, self-correcting multi-agent system.

#MLOPS TOPICS

Module 35: Introduction to MLOps

# Exploring MLOps Concepts
# Significance and Benefits
# Real-World MLOps Instances on AWS

Module 36: Development and Sharing Tools

# PyCharm, Streamlit, and GitHub Essentials
# Creating Interactive Apps using Streamlit

Module 37: Efficient Workflows: Building and Deploying Pipelines

# Constructing Step-by-Step Workflows
# Model Deployment and Sharing Techniques

Module 38: Creating Web Applications: Flask and Postman Basics

# Understanding Flask and Postman
# Building Your Own Web Application

Module 39: Managing Machine Learning Experiments with MLflow

# MLflow Fundamentals and Detailed Study# Organizing and Tracking Experiments

Module 40: Simplified Deployment and Scalability with Docker and Kubernetes on AWS

# Introduction to Docker and Containerization
# Streamlining Deployment using Docker
# Setting Up CI/CD Pipelines
# Scaling Applications with Kubernetes on AWS

Module 41: Effective Design of Machine Learning Systems: A Comprehensive Case Study

# Applying MLOps Principles in a Real-World Scenario

Data Domain OTT

Our Unique Features

Hands-On Learning

Doubt Clearance Support

Other Important Course Features

Pre-recordedVideos

Certification

Self-Paced