# AI & Machine Learning – Complete Self-Learning Syllabus

This syllabus introduces AI and Machine Learning for self-study. If you are new to programming or do not have a strong foundation in the basics, complete the programming fundamentals course ([C, Java & OOPS](https://foundation.coding.shidan.magmc.in/)) before starting ML.

***

## 0. Orientation & Foundations

***

### 0.1 Artificial Intelligence (AI)

#### Overview

* What is Artificial Intelligence
* Ability of machines to mimic human intelligence
* Real-world AI examples

***

### 0.2 Machine Learning (ML)

#### Overview

* What is Machine Learning
* Subset of Artificial Intelligence
* Uses data to solve tasks
* Learns patterns from past data

***

### 0.3 Deep Learning (DL)

#### Overview

* What is Deep Learning
* Subset of Machine Learning
* Uses neural networks inspired by the human brain

***

### 0.4 Comparison of Concepts

#### AI vs ML vs DL

* AI as a broad concept of intelligent systems
* ML as data-driven statistical learning
* DL as neural-network-based learning

#### ML vs Data Roles

* Machine Learning vs Data Science
* Machine Learning vs Data Analyst

***

### 0.5 Traditional Programming vs Machine Learning

#### Traditional Programming

* Rules + Data → Output
* Fixed logic
* No learning from data

#### Machine Learning

* Data + Output → Model
* Model learns rules automatically
* Improves with experience
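
The contrast above can be illustrated with a toy pass/fail example. This is only a sketch under stated assumptions: the function names, the threshold of 40, and the sample marks are hypothetical, and the "learning" step is a deliberately simple midpoint rule rather than a real ML algorithm.

```python
# Traditional programming: the rule (threshold) is written by hand.
def passes_by_rule(marks, threshold=40):
    return marks >= threshold


# Machine learning (toy version): the rule is inferred from labeled examples.
def learn_threshold(marks, labels):
    """Return the midpoint between the highest failing and lowest passing mark."""
    highest_fail = max(m for m, passed in zip(marks, labels) if not passed)
    lowest_pass = min(m for m, passed in zip(marks, labels) if passed)
    return (highest_fail + lowest_pass) / 2


# Data + Output (labels) go in; a "model" (the learned threshold) comes out.
marks = [12, 25, 33, 38, 45, 52, 67, 90]
labels = [False, False, False, False, True, True, True, True]
learned = learn_threshold(marks, labels)  # (38 + 45) / 2 = 41.5
```

If the labeled examples change, the learned threshold changes with them, while the hand-written rule stays fixed until someone edits the code.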

***

### 0.6 Usage of Machine Learning

#### Why Machine Learning is Used

* Handles large amounts of data
* Works with structured and unstructured data
* Learns automatically without explicit rules
* Improves performance over time

#### When Machine Learning Should NOT Be Used

* Very small datasets
* Simple rule-based problems
* No clear objective

#### Real-World ML Systems Overview

* Recommendation systems
* Fraud detection systems
* Quality control systems
* Autonomous decision systems

***

## 1. Introduction to Machine Learning

***

### 1.1 Basics

#### Definition & Concept

* Definition of Machine Learning
* How machines learn from data
* Learning from historical data
* Predictive capability
* Examples of ML in daily life

#### ML Models

* Trained using data
* Based on probability, statistics, and linear algebra

***

### 1.2 Data in Machine Learning

#### Why Data Matters in ML

* Handles large amounts of data
* Improves performance with experience

#### Real-World Data Examples

* Structured data (Rows, columns, databases)
* Unstructured data (Text, images, audio)
* E-commerce data (Sales reports)
* Customer datasets (Age, Gender, Location)

***

## 2. Types of Machine Learning & Algorithms

***

### 2.1 Supervised Learning

#### Overview

* Definition: Uses labeled data where input and output are known
* Target variable is known

#### 2.1.1 Regression

* **Definition**: Predicts continuous values
* **Examples**: House price prediction, Temperature prediction, Stock prices
* **Algorithms**:
  * Linear Regression
  * Multiple Linear Regression
* **Evaluation Metrics**:
  * Mean Squared Error (MSE)
  * R² Score
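
The two regression metrics listed above can be computed by hand as a minimal sketch. The function names and sample values are illustrative assumptions; libraries such as scikit-learn offer equivalent, more robust implementations.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]   # actual values
y_pred = [2.0, 5.0, 8.0, 9.0]   # hypothetical model predictions

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4 = 0.5
r2 = r2_score(y_true, y_pred)             # 1 - 2 / 20 = 0.9
```

A lower MSE means smaller errors. An R² close to 1 means the model explains most of the variation in the target.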

#### 2.1.2 Classification

* **Definition**: Predicts categorical values
* **Examples**: Spam detection, Disease diagnosis, Pass/Fail
* **Algorithms**:
  * Logistic Regression
  * K-Nearest Neighbors (KNN)
  * Decision Tree
  * Random Forest
  * Support Vector Machine (SVM)
  * Naive Bayes
* **Evaluation Metrics**:
  * Accuracy
  * Precision
  * Recall
  * F1-Score
  * Confusion Matrix
  * Classification Report
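
The classification metrics above all derive from the four cells of a binary confusion matrix. The sketch below assumes a binary problem with `1` as the positive class. The function name and sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
# tp=3, fn=1, fp=2, tn=4 -> accuracy 0.7, precision 0.6, recall 0.75
```

Precision answers "of the items predicted positive, how many were correct?", while recall answers "of the actual positives, how many were found?".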

***

### 2.2 Unsupervised Learning

#### Overview

* Definition: Uses unlabeled data to find hidden patterns

#### 2.2.1 Clustering

* **Definition**: Groups similar data points
* **Examples**: Customer segmentation, Product grouping
* **Algorithms**:
  * K-Means Clustering
  * Hierarchical Clustering
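
To make the idea of grouping similar points concrete, here is a minimal one-dimensional K-Means sketch. It assumes the starting centroids are supplied by hand and runs a fixed number of iterations. Real implementations choose starting points more carefully and stop once the centroids no longer move.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
# centroids -> [2.0, 11.0]; clusters -> [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```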

#### 2.2.2 Dimensionality Reduction

* **Definition**: Reduces the number of features to simplify visualization and improve performance
* **Algorithms**:
  * Principal Component Analysis (PCA)

***

### 2.3 Semi-Supervised Learning

#### Overview

* Uses a combination of labeled and unlabeled data
* Used when labeling is expensive

***

### 2.4 Reinforcement Learning

#### Overview

* Learns using trial and error

#### Components

* Agent
* Environment
* Actions
* Rewards
* Policy

#### Applications

* Game playing
* Robotics
* Autonomous systems

***

### 2.5 Comparison of ML Types

#### Analysis

* Differences between supervised, unsupervised, semi-supervised, and reinforcement learning
* Use-cases for each ML type

***

## 3. Applications of Machine Learning

***

### 3.1 Industry Use-Cases

#### Key Areas

* Recommendation systems
* Image recognition
* Speech recognition
* Natural Language Processing (NLP)
* Fraud detection
* Healthcare
* Manufacturing & quality control
* Autonomous systems
* Chatbots & virtual assistants

***

## 4. Machine Learning Workflow

***

### 4.1 End-to-End ML Pipeline

#### Steps

1. **Data Collection**
2. **Data Pre-processing**
   * Cleaning the data after collecting it
     * Handling missing values
     * Removing duplicate values
     * Handling other anomalies such as skewed data, outliers, noise, etc.
3. **Exploratory Data Analysis (EDA)**
   * Understanding and studying the data
   * Gaining strong knowledge about the dataset
   * Analyzing data distributions, relationships, and patterns
4. **Feature Engineering**
   * Creating or adding new columns (features) into the dataset if required
   * Feature Encoding:
     * The data may contain categorical values (for example, string data). These need to be converted into numerical values.
     * Several encoding methods are available for this conversion:
       * One-Hot Encoding
       * Dummy Encoding
       * Label Encoding
       * etc.
5. **Feature Selection**
   * The dataset may contain many unnecessary or unwanted columns
     * Feature selection is the process of selecting only the necessary columns
     * Many machine learning algorithms are available to perform feature selection
6. **Split into Training and Testing Sets**
   * A common choice is to use 80% of the data for training
   * The remaining 20% of the data is used for testing
   * The same data should not be used for both training and testing
7. **Feature Scaling**
   * The dataset may contain values in different units or ranges
     * Bringing these values onto a comparable scale is called feature scaling
   * Techniques used for feature scaling in machine learning include
     * Standard Scaling
     * Min-Max Scaling
     * etc.
8. **Building the Machine Learning Model**
   * Many algorithms are available for regression, classification, and clustering tasks, for example:
     * Linear Regression
     * Logistic Regression
     * K-Means Clustering
     * etc.
   * After understanding the problem and task, an appropriate ML algorithm is selected to build the model
9. **Model Evaluation**
   * After building the model, it must be tested or evaluated
     * Various model evaluation metrics are used to measure performance
10. **Hyperparameter Tuning**
    * If the model's performance is not good enough, it is improved by adjusting the model's hyperparameters (settings chosen before training) or by training on more data
11. **Model Saving**
    * Once evaluation confirms that the model performs well, the trained model is saved
      * Machine learning libraries provide methods to save models
12. **Testing with Unseen Data**
    * After all previous steps are complete, the model is tested with unseen data
    * Training and evaluation use the data that is already available. Fresh data that the model has never encountered is called unseen data
13. **Model Deployment**
    * This is the final stage of the machine learning workflow
    * The trained model is implemented in a real-world application
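
Two of the steps above, feature encoding (step 4) and the train/test split (step 6), can be sketched in plain Python. The helper names, the sample city values, and the seed are assumptions made for illustration. In practice, libraries such as pandas and scikit-learn are normally used for both steps.

```python
import random


def one_hot_encode(values):
    """Turn each category into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out test_ratio of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


cities = ["Chennai", "Delhi", "Chennai", "Mumbai", "Delhi"]
encoded, categories = one_hot_encode(cities)
# categories -> ["Chennai", "Delhi", "Mumbai"]; encoded[0] -> [1, 0, 0]

rows = list(range(10))            # stand-in for 10 dataset rows
train, test = train_test_split(rows)
# 8 rows for training, 2 for testing, with no row in both sets
```

Shuffling before splitting avoids accidentally putting, for example, only the newest records into the test set. A fixed seed makes the split reproducible.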

***

### 4.2 Data Preprocessing

#### Tasks

* Handling missing values
* Handling outliers
* Removing duplicates
* Encoding categorical variables
* Feature scaling
  * Normalization
  * Standardization
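
The two feature-scaling techniques listed above differ in what they guarantee. Normalization (min-max scaling) maps values into the range 0 to 1, while standardization rescales them to mean 0 and standard deviation 1. The following is a minimal sketch with hypothetical function names and sample ages.

```python
import statistics


def min_max_scale(values):
    """Normalization: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50, 60]
normalized = min_max_scale(ages)   # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = standardize(ages)   # mean 0, standard deviation 1
```

In a real pipeline, the minimum, maximum, mean, and standard deviation are computed on the training set only and then reused to transform the test set.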

***

### 4.3 Exploratory Data Analysis (EDA)

#### Techniques

* Dataset overview
* Data types and shape
* Statistical summary
* Data distribution
* Central tendency
* Data spread
* Correlation analysis
* Visualization

***

## 5. Python Programming for Machine Learning

***

### 5.1 Python Basics

#### Fundamentals

* Python introduction
* Variables
* Keywords
* Comments
* Indentation

***

### 5.2 Data Types

#### Types

* Integer
* Float
* String
* Boolean
* Type conversion

***

### 5.3 Data Structures

#### Structures

* List
* Tuple
* Set
* Dictionary
* Differences between data structures
* Use-cases of data structures in ML

***

### 5.4 Operations on Data Structures

#### Operations

* Insert
* Update
* Delete
* Indexing
* Slicing
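
The operations above look like this on a list and a dictionary. The variable names and values are made up for illustration.

```python
marks = [55, 72, 68]

marks.append(90)       # insert at the end      -> [55, 72, 68, 90]
marks.insert(1, 60)    # insert at position 1   -> [55, 60, 72, 68, 90]
marks[0] = 58          # update by index        -> [58, 60, 72, 68, 90]
del marks[2]           # delete by index        -> [58, 60, 68, 90]

first = marks[0]       # indexing from the start -> 58
last = marks[-1]       # indexing from the end   -> 90
middle = marks[1:3]    # slicing (end excluded)  -> [60, 68]

student = {"name": "Asha", "age": 20}
student["city"] = "Chennai"   # insert a new key
student["age"] = 21           # update an existing key
del student["name"]           # delete a key
```

Tuples support indexing and slicing but not insert, update, or delete, because they are immutable. Sets support insert and delete but not indexing, because they are unordered.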

***

### 5.5 Operators & Control Flow

#### Operators

* Arithmetic operators
* Relational operators
* Logical operators
* Assignment operators
* Membership operators

#### Control Statements

* if
* if-else
* elif
* for loop
* while loop
* break
* continue
* pass
* Difference between for loop and while loop

***

### 5.6 Functions & IO

#### Inbuilt Functions

* len()
* sum()
* min()
* max()
* sorted()
* type()

#### Input & Output

* input()
* print()
* Formatted output

***

### 5.7 Logical Practice Problems

#### Practice

* Prime number
* Palindrome
* Armstrong number
* Fibonacci series
* Factorial
* Unique elements in list
* Frequency counting
* Largest and smallest element
* Pattern problems
* Array and string problems
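
As a starting point, here are possible solutions to a few of the problems above. They are sketches rather than the only correct answers, and the function names are arbitrary.

```python
from collections import Counter


def is_prime(n):
    """A prime has no divisor between 2 and its square root."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_palindrome(text):
    """A palindrome reads the same forwards and backwards."""
    return text == text[::-1]


def is_armstrong(n):
    """Sum of each digit raised to the number of digits equals the number."""
    digits = str(n)
    return n == sum(int(d) ** len(digits) for d in digits)


def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0."""
    series, a, b = [], 0, 1
    for _ in range(count):
        series.append(a)
        a, b = b, a + b
    return series


def frequency(items):
    """Count how often each element occurs."""
    return dict(Counter(items))
```

For example, `is_armstrong(153)` is true because 1³ + 5³ + 3³ = 153, and `fibonacci(6)` returns `[0, 1, 1, 2, 3, 5]`.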

***

## 6. Python Libraries for Machine Learning

***

### 6.1 NumPy

#### Concepts

* Arrays
* Array operations
* Vectorized operations
* Mathematical functions

***

### 6.2 Pandas

#### Concepts

* Series
* DataFrame
* Data loading
* Data cleaning
* Data manipulation

***

### 6.3 Data Visualization

#### Concepts

* Matplotlib
* Seaborn

#### Plots

* Line plot
* Bar plot
* Histogram
* Box plot

***

### 6.4 Scikit-Learn

#### Concepts

* Introduction to scikit-learn
* Datasets
* Model training
* Model prediction
* Model evaluation

***

## 7. Statistics for Machine Learning

***

### 7.1 Descriptive Statistics

#### Measures

* Mean
* Median
* Mode
* Range
* Variance
* Standard deviation
* Quartiles
* Interquartile range (IQR)
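
Python's built-in `statistics` module covers all of the measures above. The sample scores are made up, and the population versions of variance and standard deviation are used here. Note that several conventions exist for computing quartiles, so other tools may report slightly different quartile values for the same data.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)            # 40 / 8 = 5
median = statistics.median(scores)        # (4 + 5) / 2 = 4.5
mode = statistics.mode(scores)            # 4 occurs most often
value_range = max(scores) - min(scores)   # 9 - 2 = 7
variance = statistics.pvariance(scores)   # 32 / 8 = 4
std_dev = statistics.pstdev(scores)       # sqrt(4) = 2.0

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                             # spread of the middle 50% of the data
```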

***

### 7.2 Statistical Equations

#### Formulas

* Mean formula
* Variance formula
* Standard deviation formula
* Quartile calculation

***

### 7.3 Usage of Statistics in ML

#### Applications

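* Mean for centering features during standardization
* Median for outlier handling (it is robust to extreme values)
* Variance for feature selection (near-constant features carry little information)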
* Standard deviation for scaling
* Quartiles for data distribution analysis

***

## 8. Mathematics for Machine Learning

***

### 8.1 Vectors

#### Concepts

* Vector definition
* Vector representation
* Vector addition
* Scalar multiplication
* Dot product
* Vector usage in ML
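
The vector operations above can be written in a few lines of plain Python, with a list standing in for a vector. The function names are illustrative, and NumPy arrays provide the same operations much more efficiently.

```python
def add(u, v):
    """Vector addition: add matching components."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Scalar multiplication: multiply every component by c."""
    return [c * a for a in v]


def dot(u, v):
    """Dot product: sum of the products of matching components."""
    return sum(a * b for a, b in zip(u, v))


u = [1, 2, 3]
v = [4, 5, 6]

total = add(u, v)       # [5, 7, 9]
doubled = scale(2, u)   # [2, 4, 6]
product = dot(u, v)     # 1*4 + 2*5 + 3*6 = 32
```

In ML, one row of a dataset is typically a feature vector, and a linear model's prediction is the dot product of that vector with the model's weights, plus a bias term.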

***

### 8.2 Matrices

#### Concepts

* Matrix representation
* Matrix addition
* Matrix multiplication
* Matrix transpose
* Identity matrix
* Matrix usage in ML
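
The matrix operations above, sketched with nested lists (one inner list per row). The function names are illustrative, and the code assumes the matrix shapes are compatible.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Entry (i, j) is the dot product of row i of a and column j of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    """n x n matrix with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]

a_t = transpose(a)               # [[1, 3], [2, 4]]
ab = matmul(a, b)                # [[19, 22], [43, 50]]
unchanged = matmul(a, identity(2))  # multiplying by the identity returns a
```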

***

## 9. Probability for Machine Learning

***

### 9.1 Probability Basics

#### Concepts

* Probability definition
* Sample space
* Events

***

### 9.2 Types of Events

#### Categories

* Independent events
* Dependent events
* Conditional probability
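
The difference between independent and dependent events can be checked by enumerating the sample space of two dice rolls. This sketch uses exact fractions, and the event and function names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event, given=None):
    """P(event), or P(event | given) when a condition is supplied."""
    space = [o for o in outcomes if given is None or given(o)]
    favourable = [o for o in space if event(o)]
    return Fraction(len(favourable), len(space))


def first_even(o):
    return o[0] % 2 == 0


def second_even(o):
    return o[1] % 2 == 0


def sum_is_8(o):
    return o[0] + o[1] == 8


# Independent: knowing the first die tells us nothing about the second.
p_second = probability(second_even)                          # 1/2
p_second_given_first = probability(second_even, first_even)  # 1/2

# Dependent: knowing the first die is even changes the chance of a sum of 8.
p_sum8 = probability(sum_is_8)                               # 5/36
p_sum8_given_first = probability(sum_is_8, first_even)       # 1/6
```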

***

### 9.3 Probability in Machine Learning

#### Applications

* Classification problems
* Prediction confidence
* Risk and uncertainty
* Naive Bayes intuition

***

### 9.4 Advanced Probability Topics

#### Topics

* Bayes theorem
* Random variables
* Probability distributions
* Normal distribution
* Binomial distribution
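
Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), is easiest to grasp with numbers. The sketch below uses a hypothetical medical test, and all three input probabilities are invented for the example.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives + false positives.
    p_positive = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / p_positive


posterior = bayes_posterior(
    prior=0.01,                # P(disease): 1% of people have the disease
    likelihood=0.90,           # P(positive | disease)
    false_positive_rate=0.05,  # P(positive | no disease)
)
# posterior is about 0.154: most positive results are still false alarms
```

The same update rule, applied per feature under an independence assumption, is the intuition behind Naive Bayes.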

***

## 10. Model Evaluation & Optimization

***

### 10.1 Evaluation Concepts

#### Topics

* Training data vs testing data
* Overfitting
* Underfitting
* Bias vs variance
* Cross-validation
* Hyperparameter tuning
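
Cross-validation repeats the train/test split several times so that every sample is used for testing exactly once. The following is a minimal sketch of how k-fold index splitting might work. The function name is an assumption, and no shuffling is applied, which real tools usually offer as an option.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=5)
# First fold: train on indices 2..9, test on indices 0 and 1
```

The model is trained and evaluated once per fold, and the k scores are averaged to give a more stable performance estimate than a single split.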

***

## 11. Machine Learning Projects

***

### 11.1 Project List

#### Projects

* Student performance prediction
* House price prediction
* Customer segmentation
* Spam detection
* Recommendation system (basic)
* Quality defect prediction
* End-to-end ML project workflow

***

## 12. MLOps & Deployment (Introductory)

***

### 12.1 Deployment Basics

#### Concepts

* Model saving and loading
* Basic deployment concepts
* ML lifecycle overview
* Monitoring models
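
Model saving and loading can be sketched with Python's built-in `pickle` module. The "model" here is only a stand-in dictionary of parameters, and the file name is arbitrary. Libraries such as scikit-learn have their own recommendations for persisting real models.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: parameters of the line y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 1.0}


def predict(model, x):
    return model["slope"] * x + model["intercept"]


# Save the model to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (for example, inside a deployed application), load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

prediction = predict(restored, 3)  # 2.0 * 3 + 1.0 = 7.0
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.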

***

## 13. Ethics & Responsible AI

***

### 13.1 Responsible AI

#### Topics

* Bias in ML models
* Fairness
* Explainability
* Privacy concerns

