# AI & Machine Learning – Complete Self-Learning Syllabus

This syllabus is a self-learning introduction to AI & Machine Learning. If you are new to programming or do not yet have a strong foundation in it, complete the programming fundamentals course [C, Java & OOPS](https://app.gitbook.com/o/5nOLt3aBCJD1a86UoFud/s/PtFThxELOLa73jiJHqt2/ "mention") before starting ML.

***

## 0. Orientation & Foundations

***

### 0.1 Artificial Intelligence (AI)

#### Overview

* What is Artificial Intelligence
* Ability of machines to mimic human intelligence
* Real-world AI examples

***

### 0.2 Machine Learning (ML)

#### Overview

* What is Machine Learning
* Subset of Artificial Intelligence
* Uses data to solve tasks
* Learns patterns from past data

***

### 0.3 Deep Learning (DL)

#### Overview

* What is Deep Learning
* Subset of Machine Learning
* Uses neural networks inspired by the human brain

***

### 0.4 Comparison of Concepts

#### AI vs ML vs DL

* AI as a broad concept of intelligent systems
* ML as data-driven statistical learning
* DL as neural-network-based learning

#### ML vs Data Roles

* Machine Learning vs Data Science
* Machine Learning vs Data Analyst

***

### 0.5 Traditional Programming vs Machine Learning

#### Traditional Programming

* Rules + Data → Output
* Fixed logic
* No learning from data

#### Machine Learning

* Data + Output → Model
* Model learns rules automatically
* Improves with experience
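
The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not a real ML algorithm: the pass mark of 40, the function names, and the sample data are all assumptions made for the example. The first function encodes a rule by hand; the second derives the rule (a threshold) from labeled examples.

```python
# Traditional programming: a human writes the rule.
def rule_based_pass(score):
    return score >= 40  # fixed logic; 40 is an illustrative assumption


# Machine learning (toy version): the rule is derived from data + known outputs.
def learn_threshold(scores, labels):
    """Return the candidate threshold that classifies the most examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(scores)):
        correct = sum((s >= candidate) == y for s, y in zip(scores, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Hypothetical historical data: exam scores and whether the student passed.
scores = [12, 25, 33, 38, 47, 52, 68, 90]
labels = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(scores, labels)  # the "model" learned from data
prediction = 55 >= threshold                 # prediction for a new score
print(threshold, prediction)
```

With different historical data the learned threshold changes, while the hand-written rule stays fixed until someone edits the code.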

***

### 0.6 Usage of Machine Learning

#### Why Machine Learning is Used

* Handles large amounts of data
* Works with structured and unstructured data
* Learns automatically without explicit rules
* Improves performance over time

#### When Machine Learning Should NOT Be Used

* Very small datasets
* Simple rule-based problems
* No clear objective

#### Real-World ML Systems Overview

* Recommendation systems
* Fraud detection systems
* Quality control systems
* Autonomous decision systems

***

## 1. Introduction to Machine Learning

***

### 1.1 Basics

#### Definition & Concept

* Definition of Machine Learning
* How machines learn from data
* Learning from historical data
* Predictive capability
* Examples of ML in daily life

#### ML Models

* Trained using data
* Based on probability, statistics, and linear algebra

***

### 1.2 Data in Machine Learning

#### Why ML Handles Data

* Handles large amounts of data
* Improves performance with experience

#### Real-World Data Examples

* Structured data (Rows, columns, databases)
* Unstructured data (Text, images, audio)
* E-commerce data (Sales reports)
* Customer datasets (Age, Gender, Location)

***

## 2. Types of Machine Learning & Algorithms

***

### 2.1 Supervised Learning

#### Overview

* Definition: Uses labeled data where input and output are known
* Target variable is known

#### 2.1.1 Regression

* **Definition**: Predicts continuous values
* **Examples**: House price prediction, Temperature prediction, Stock prices
* **Algorithms**:
  * Linear Regression
  * Multiple Linear Regression
* **Evaluation Metrics**:
  * Mean Squared Error (MSE)
  * R² Score
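
As a rough sketch of how regression and its metrics fit together, the following plain-Python example fits a one-feature linear regression by least squares and computes MSE and R². The function names and the house-size data are hypothetical, chosen only for illustration; in practice a library such as scikit-learn would normally be used.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical data: house size vs. price (arbitrary units)
sizes = [5, 7, 9, 11, 13]
prices = [11, 14, 20, 22, 28]

slope, intercept = fit_simple_linear_regression(sizes, prices)
predicted = [slope * x + intercept for x in sizes]
mse = mean_squared_error(prices, predicted)
r2 = r2_score(prices, predicted)
print(slope, intercept, mse, r2)
```

A lower MSE means predictions are closer to the actual values; an R² close to 1 means the model explains most of the variation in the target.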

#### 2.1.2 Classification

* **Definition**: Predicts categorical values
* **Examples**: Spam detection, Disease diagnosis, Pass/Fail
* **Algorithms**:
  * Logistic Regression
  * K-Nearest Neighbors (KNN)
  * Decision Tree
  * Random Forest
  * Support Vector Machine (SVM)
  * Naive Bayes
* **Evaluation Metrics**:
  * Accuracy
  * Precision
  * Recall
  * F1-Score
  * Confusion Matrix
  * Classification Report
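
The evaluation metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below computes them in plain Python; the function names and the label lists are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical spam labels: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts, metrics)
```

Precision answers "of the items predicted positive, how many were right?", while recall answers "of the actual positives, how many were found?".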

***

### 2.2 Unsupervised Learning

#### Overview

* Definition: Uses unlabeled data to find hidden patterns

#### 2.2.1 Clustering

* **Definition**: Groups similar data points
* **Examples**: Customer segmentation, Product grouping
* **Algorithms**:
  * K-Means Clustering
  * Hierarchical Clustering
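
To make the idea of grouping similar points concrete, here is a minimal one-dimensional K-Means sketch. It assumes the starting centroids are supplied by the caller (real implementations usually pick them randomly or with a smarter initialization), and the function name and data are hypothetical.

```python
def k_means_1d(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: customer spend values forming two obvious groups
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, initial_centroids=[1, 12])
print(centroids, clusters)
```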

#### 2.2.2 Dimensionality Reduction

* **Definition**: Reduces the number of features to aid visualization and improve performance
* **Algorithms**:
  * Principal Component Analysis (PCA)
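
One common way to compute PCA is through the singular value decomposition of the mean-centered data. The sketch below uses NumPy; the function name `pca` and the toy dataset are assumptions for illustration, and library implementations handle more edge cases.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)            # PCA requires centered data
    # Rows of Vt are the principal directions, ordered by importance
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance[:n_components]


# Hypothetical data: two perfectly correlated features (second = 2 * first)
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])

projected, components, variance = pca(X, n_components=1)
print(projected.shape)   # 4 rows reduced from 2 features to 1
print(components)        # direction along which the data varies most
```

Because the two features here carry the same information, one component is enough to retain all of the variance.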

***

### 2.3 Semi-Supervised Learning

#### Overview

* Uses a combination of labeled and unlabeled data
* Used when labeling is expensive

***

### 2.4 Reinforcement Learning

#### Overview

* Learns using trial and error

#### Components

* Agent
* Environment
* Actions
* Rewards
* Policy

#### Applications

* Game playing
* Robotics
* Autonomous systems

***

### 2.5 Comparison of ML Types

#### Analysis

* Differences between supervised, unsupervised, semi-supervised, and reinforcement learning
* Use-cases for each ML type

***

## 3. Applications of Machine Learning

***

### 3.1 Industry Use-Cases

#### Key Areas

* Recommendation systems
* Image recognition
* Speech recognition
* Natural Language Processing (NLP)
* Fraud detection
* Healthcare
* Manufacturing & quality control
* Autonomous systems
* Chatbots & virtual assistants

***

## 4. Machine Learning Workflow

***

### 4.1 End-to-End ML Pipeline

#### Steps

1. **Data Collection**
2. **Data Pre-processing**
   * Cleaning the data after collecting it
     * Handling missing values
     * Removing duplicate values
     * Handling other anomalies such as skewed data, outliers, noise, etc.
3. **Exploratory Data Analysis (EDA)**
   * Understanding and studying the data
   * Gaining strong knowledge about the dataset
   * Analyzing data distributions, relationships, and patterns
4. **Feature Engineering**
   * Creating or adding new columns (features) into the dataset if required
   * Feature Encoding:
     * The data may contain categorical values (for example, string data). These need to be converted into numerical values.
     * Several encoding methods are available for this conversion:
       * One-Hot Encoding
       * Dummy Encoding
       * Label Encoding
       * etc.
5. **Feature Selection**
   * The dataset may contain many unnecessary or irrelevant columns
     * Feature selection is the process of keeping only the columns that are useful for the task
     * Several statistical and model-based techniques are available to perform feature selection
6. **Split into Training and Testing Sets**
   * A common choice is to use 80% of the data for training
   * The remaining 20% of the data is then used for testing
   * The same data must not be used for both training and testing
7. **Feature Scaling**
   * The dataset may contain values in different units or ranges
     * Bringing these values onto a comparable scale is called feature scaling
   * Techniques used for feature scaling in machine learning include
     * Standard Scaling
     * Min-Max Scaling
     * etc.
8. **Building the Machine Learning Model**
   * Machine learning offers many algorithms for tasks such as regression, classification, and clustering
     * Linear Regression
     * Logistic Regression
     * K-Means Clustering
     * etc.
   * After understanding the problem and task, an appropriate ML algorithm is selected to build the model
9. **Model Evaluation**
   * After building the model, it must be tested or evaluated
     * Various model evaluation metrics are used to measure performance
10. **Hyperparameter Tuning**
    * If the model's performance is not sufficient, it is improved by adjusting the model's hyperparameters (settings chosen before training) or by training it further
11. **Model Saving**
    * Once evaluation confirms that the model performs well, the trained model is saved (machine learning libraries provide methods to save models)
12. **Testing with Unseen Data**
    * After all previous steps are complete, the model is tested with unseen data
    * Training and evaluation use the data that is already available; fresh data the model has never encountered is called unseen data
13. **Model Deployment**
    * This is the final stage of the machine learning workflow
    * The trained model is implemented in a real-world application
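
Several of the steps above (splitting, scaling, model building, evaluation, and saving) can be sketched end to end with NumPy. This is a simplified illustration under stated assumptions: the data is synthetic, the model is a plain least-squares linear regression, and the dictionary used to store the model is a made-up format rather than any library's convention.

```python
import pickle

import numpy as np

rng = np.random.default_rng(seed=0)

# Step 1 (data collection), simulated: 100 rows, 2 features, noisy linear target
X = rng.uniform(0, 100, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1.0, size=100)

# Step 6: shuffle, then split 80% train / 20% test with no overlap
indices = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = indices[:split], indices[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Step 7: standard scaling, using statistics from the training set only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std


def add_bias(features):
    """Prepend a column of ones so the model can learn an intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Step 8: build the model (least-squares linear regression)
weights, *_ = np.linalg.lstsq(add_bias(X_train_scaled), y_train, rcond=None)

# Step 9: evaluate on the held-out test set
predictions = add_bias(X_test_scaled) @ weights
test_mse = float(np.mean((y_test - predictions) ** 2))

# Step 11: serialize the model (pickle.dump would write it to a file instead)
model = {"weights": weights, "mean": mean, "std": std}
restored = pickle.loads(pickle.dumps(model))
print(test_mse)
```

Computing the scaling statistics from the training set alone keeps information about the test set from leaking into the model.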

***

### 4.2 Data Preprocessing

#### Tasks

* Handling missing values
* Handling outliers
* Removing duplicates
* Encoding categorical variables
* Feature scaling
  * Normalization
  * Standardization
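
The encoding and scaling tasks above can be illustrated in plain Python. The helper names and sample values below are hypothetical; libraries such as pandas and scikit-learn provide ready-made equivalents.

```python
def label_encode(values):
    """Map each category to an integer (categories sorted alphabetically)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]


def one_hot_encode(values):
    """Represent each category as a vector with a single 1."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories]
                        for v in values]


def min_max_scale(values):
    """Normalization: rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to mean 0 and scale to standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


# Hypothetical customer columns
cities = ["Chennai", "Delhi", "Chennai", "Mumbai"]
ages = [20, 30, 40, 60]

mapping, city_labels = label_encode(cities)
categories, city_one_hot = one_hot_encode(cities)
ages_normalized = min_max_scale(ages)
ages_standardized = standardize(ages)
print(city_labels, city_one_hot, ages_normalized)
```

Label encoding implies an order between categories, so one-hot encoding is usually the safer choice when the categories have no natural ranking.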

***

### 4.3 Exploratory Data Analysis (EDA)

#### Techniques

* Dataset overview
* Data types and shape
* Statistical summary
* Data distribution
* Central tendency
* Data spread
* Correlation analysis
* Visualization

***

## 5. Python Programming for Machine Learning

***

### 5.1 Python Basics

#### Fundamentals

* Python introduction
* Variables
* Keywords
* Comments
* Indentation

***

### 5.2 Data Types

#### Types

* Integer
* Float
* String
* Boolean
* Type conversion

***

### 5.3 Data Structures

#### Structures

* List
* Tuple
* Set
* Dictionary
* Differences between data structures
* Use-cases of data structures in ML

***

### 5.4 Operations on Data Structures

#### Operations

* Insert
* Update
* Delete
* Indexing
* Slicing

***

### 5.5 Operators & Control Flow

#### Operators

* Arithmetic operators
* Relational operators
* Logical operators
* Assignment operators
* Membership operators

#### Control Statements

* if
* if-else
* elif
* for loop
* while loop
* break
* continue
* pass
* Difference between for loop and while loop

***

### 5.6 Functions & IO

#### Built-in Functions

* len()
* sum()
* min()
* max()
* sorted()
* type()

#### Input & Output

* input()
* print()
* Formatted output

***

### 5.7 Logical Practice Problems

#### Practice

* Prime number
* Palindrome
* Armstrong number
* Fibonacci series
* Factorial
* Unique elements in list
* Frequency counting
* Largest and smallest element
* Pattern problems
* Array and string problems

***

## 6. Python Libraries for Machine Learning

***

### 6.1 NumPy

#### Concepts

* Arrays
* Array operations
* Vectorized operations
* Mathematical functions
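
The following sketch shows what vectorized operations look like in practice: arithmetic is applied to the whole array at once, with no explicit loop. The variable names and values are illustrative.

```python
import numpy as np

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Vectorized operations: applied element by element without a Python loop
heights_m = heights_cm / 100
centered = heights_cm - heights_cm.mean()

# Mathematical functions operate on whole arrays too
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Two-dimensional arrays: rows are samples, columns are features
table = np.array([[1, 2, 3], [4, 5, 6]])
column_sums = table.sum(axis=0)   # sum down each column
row_means = table.mean(axis=1)    # mean across each row

print(heights_m, centered, roots, table.shape, column_sums, row_means)
```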

***

### 6.2 Pandas

#### Concepts

* Series
* DataFrame
* Data loading
* Data cleaning
* Data manipulation
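
A minimal sketch of loading, cleaning, and manipulating data with Pandas is shown below. To keep the example self-contained, the CSV content is an in-memory string with made-up rows; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; note the duplicate row and the missing age
csv_text = """name,age,city
Asha,25,Delhi
Ben,32,Pune
Ben,32,Pune
Chitra,,Delhi
"""

# Data loading: read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Data cleaning: drop duplicate rows, then fill the missing age with the median
cleaned = df.drop_duplicates().copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# Data manipulation: average age per city
mean_age_by_city = cleaned.groupby("city")["age"].mean()

ages = cleaned["age"]   # a single column is a Series
print(cleaned)
print(mean_age_by_city)
```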

***

### 6.3 Data Visualization

#### Concepts

* Matplotlib
* Seaborn

#### Plots

* Line plot
* Bar plot
* Histogram
* Box plot

***

### 6.4 Scikit-Learn

#### Concepts

* Introduction to scikit-learn
* Datasets
* Model training
* Model prediction
* Model evaluation
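
The typical scikit-learn flow of loading a dataset, training, predicting, and evaluating might look like the sketch below. The choice of the Iris dataset, the KNN classifier, and the parameter values (`test_size=0.2`, `n_neighbors=5`, `random_state=42`) are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Datasets: scikit-learn ships with small built-in datasets
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing; stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model training
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Model prediction
predictions = model.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
```

Most scikit-learn estimators follow this same `fit` / `predict` pattern, so swapping in a different algorithm usually changes only the line that creates the model.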

***

## 7. Statistics for Machine Learning

***

### 7.1 Descriptive Statistics

#### Measures

* Mean
* Median
* Mode
* Range
* Variance
* Standard deviation
* Quartiles
* Interquartile range (IQR)
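
Python's standard `statistics` module can compute each of these measures. The data below is made up, and the population versions of variance and standard deviation are used here as an assumption. Quartile conventions differ between tools, so other libraries may report slightly different quartile values for the same data.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)

# Population variance and standard deviation (divide by n)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartiles split the sorted data into four parts; IQR is the middle 50%
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, value_range, variance, std_dev, q1, q3, iqr)
```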

***

### 7.2 Statistical Equations

#### Formulas

* Mean formula
* Variance formula
* Standard deviation formula
* Quartile calculation
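
For reference, the standard formulas for a dataset $$x_1, \dots, x_n$$ are given below. The population form of the variance is shown as an assumption; the sample variance divides by $$n - 1$$ instead of $$n$$.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
\sigma = \sqrt{\sigma^2}
$$

As a worked example, for the values 2, 4, 6 the mean is $$\bar{x} = 4$$, the variance is $$\sigma^2 = \frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3} = \frac{8}{3}$$, and the standard deviation is $$\sigma = \sqrt{8/3} \approx 1.63$$.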

***

### 7.3 Usage of Statistics in ML

#### Applications

* Mean for normalization
* Median for outlier handling
* Variance for identifying uninformative (near-constant) features
* Standard deviation for scaling
* Quartiles for data distribution analysis

***

## 8. Mathematics for Machine Learning

***

### 8.1 Vectors

#### Concepts

* Vector definition
* Vector representation
* Vector addition
* Scalar multiplication
* Dot product
* Vector usage in ML
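
These vector operations map directly onto NumPy arrays. In the sketch below the vectors, weights, and bias are arbitrary illustrative numbers; the last lines show one typical use in ML, where a linear model's prediction is a dot product plus a bias.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

vector_sum = u + v        # vector addition, element by element
scaled = 2 * u            # scalar multiplication
dot = np.dot(u, v)        # dot product: 1*4 + 2*5 + 3*6

# Vector usage in ML: a sample is a feature vector, and a linear model
# predicts with dot(weights, features) + bias
weights = np.array([0.5, -1.0, 2.0])
bias = 1.0
prediction = np.dot(weights, u) + bias

print(vector_sum, scaled, dot, prediction)
```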

***

### 8.2 Matrices

#### Concepts

* Matrix representation
* Matrix addition
* Matrix multiplication
* Matrix transpose
* Identity matrix
* Matrix usage in ML
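
The matrix operations above can be tried out with NumPy as follows. The matrices are arbitrary examples; the final lines sketch a common ML use, where a dataset is stored as a matrix (one row per sample) and multiplied by a weight vector to get one prediction per row.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matrix_sum = A + B        # element-wise addition
product = A @ B           # matrix multiplication (rows times columns)
transposed = A.T          # rows become columns
identity = np.eye(2)      # identity matrix: A @ I leaves A unchanged

# Matrix usage in ML: 3 samples with 2 features each
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
weights = np.array([0.5, 1.0])
predictions = X @ weights

print(matrix_sum, product, transposed, predictions, sep="\n")
```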

***

## 9. Probability for Machine Learning

***

### 9.1 Probability Basics

#### Concepts

* Probability definition
* Sample space
* Events

***

### 9.2 Types of Events

#### Categories

* Independent events
* Dependent events
* Conditional probability

***

### 9.3 Probability in Machine Learning

#### Applications

* Classification problems
* Prediction confidence
* Risk and uncertainty
* Naive Bayes intuition

***

### 9.4 Advanced Probability Topics

#### Topics

* Bayes' theorem
* Random variables
* Probability distributions
* Normal distribution
* Binomial distribution

***

## 10. Model Evaluation & Optimization

***

### 10.1 Evaluation Concepts

#### Topics

* Training data vs testing data
* Overfitting
* Underfitting
* Bias vs variance
* Cross-validation
* Hyperparameter tuning
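
Cross-validation rotates which part of the data is held out, so every sample is used for testing exactly once. The sketch below only generates the index splits for k-fold cross-validation; the function name is hypothetical, and a real workflow would usually shuffle the data first and then train and evaluate a model on each split.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    # Distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=3)
for train, test in folds:
    print("train:", train, "test:", test)
```

Averaging the evaluation metric across all folds gives a more stable estimate than a single train/test split, which helps when comparing hyperparameter settings.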

***

## 11. Machine Learning Projects

***

### 11.1 Project List

#### Projects

* Student performance prediction
* House price prediction
* Customer segmentation
* Spam detection
* Recommendation system (basic)
* Quality defect prediction
* End-to-end ML project workflow

***

## 12. MLOps & Deployment (Introductory)

***

### 12.1 Deployment Basics

#### Concepts

* Model saving and loading
* Basic deployment concepts
* ML lifecycle overview
* Monitoring models

***

## 13. Ethics & Responsible AI

***

### 13.1 Responsible AI

#### Topics

* Bias in ML models
* Fairness
* Explainability
* Privacy concerns
