🧠 Supervised Learning
📘 What Is Supervised Learning?
Supervised Learning is a type of Machine Learning where the model learns from labeled data — that is, data with known input (features) and output (target) values.
The goal is to train a model that can predict the output for new, unseen inputs.
Examples:
- Predicting house prices based on area, location, and rooms 🏠
- Classifying emails as spam or not spam 📧
- Predicting if a patient has a disease based on medical data 🧬
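No matter the task, the code follows the same pattern: fit a model on labeled pairs, then predict outputs for unseen inputs. Here is a minimal sketch of that workflow (the house-price numbers are made up purely for illustration):
from sklearn.linear_model import LinearRegression
# Labeled training data: inputs (features) paired with known outputs (targets).
# The values below are illustrative only.
X = [[1400, 3], [1600, 4], [1700, 3]]  # area (sq ft), number of rooms
y = [240000, 330000, 310000]           # known house prices (the labels)
model = LinearRegression()
model.fit(X, y)                    # learn the X -> y mapping from labeled pairs
print(model.predict([[1500, 3]]))  # predict the price of an unseen house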
📈 Linear Regression
Concept
Linear Regression is used to predict a continuous numeric value.
It assumes a linear relationship between input features (X) and the target variable (y).
Equation:
y = m * x + b
Where:
- m is the slope (how much y changes with x)
- b is the intercept (value of y when x = 0)
Example in Python
import numpy as np
from sklearn.linear_model import LinearRegression
# Training data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
# Create and train model
model = LinearRegression()
model.fit(X, y)
# Predict
prediction = model.predict([[6]])
print("Predicted value for x=6:", prediction)
Use case: Predicting prices, sales, or other continuous outcomes.
📊 Logistic Regression
Concept
Despite its name, Logistic Regression is used for classification, not regression.
It predicts the probability that an input belongs to a category, mapping features to discrete outcomes (like yes/no, 0/1).
Equation:
P(y=1) = 1 / (1 + e^-(b0 + b1*x))
If P(y=1) > 0.5, classify as 1; otherwise, classify as 0.
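The sigmoid function itself is a one-liner. A small sketch showing how it squashes any real number into a probability between 0 and 1 (b0 and b1 here are arbitrary illustration values, not learned coefficients):
import numpy as np
def sigmoid(z):
    # maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))
b0, b1 = -4.0, 1.2          # illustrative coefficients
x = 3.5
p = sigmoid(b0 + b1 * x)    # P(y=1) for this input
print(p)
print(1 if p > 0.5 else 0)  # apply the 0.5 threshold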
Example in Python
from sklearn.linear_model import LogisticRegression
# Example data
X = [[1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1]
model = LogisticRegression()
model.fit(X, y)
# Prediction
print(model.predict([[3.5]])) # Output: 0 or 1
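To see the underlying probabilities instead of the hard 0/1 label, scikit-learn provides predict_proba. Continuing the example above:
print(model.predict_proba([[3.5]]))  # [[P(y=0), P(y=1)]]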
Use case: Spam detection, credit approval, disease diagnosis.
🌳 Decision Trees
Concept
Decision Trees split data into branches based on feature conditions, forming a tree-like model of decisions.
Each branch represents a rule, and each leaf represents an outcome.
Advantages:
- Easy to interpret and visualize
- Handles both numerical and categorical data
Example in Python
from sklearn.tree import DecisionTreeClassifier
X = [[0, 0], [1, 1]]  # two training points, two features each
y = [0, 1]            # their class labels
model = DecisionTreeClassifier()
model.fit(X, y)
print(model.predict([[0.5, 0.5]]))  # classify a new, unseen point
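Because the model is a set of rules, those rules can be printed directly. Continuing the example, scikit-learn's export_text renders each split as a readable condition (the feature names here are placeholders of my choosing):
from sklearn.tree import export_text
print(export_text(model, feature_names=["feature_0", "feature_1"]))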
Use case: Customer segmentation, loan approval, fraud detection.
🌲 Random Forests
Concept
A Random Forest is an ensemble method that builds multiple decision trees and combines their results (via averaging or voting).
This reduces overfitting and improves accuracy.
Advantages:
- More accurate and stable than a single decision tree
- Works well on complex datasets
Example in Python
from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 1], [1, 0], [0, 1]]  # four training points
y = [0, 1, 1, 0]                      # class labels (XOR-like pattern)
model = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees
model.fit(X, y)
print(model.predict([[0.2, 0.8]]))    # classify a new point
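Each of the 100 trees casts a vote, and the forest combines them. Continuing the example, the individual trees are available as estimators_, and predict_proba reports the class probabilities averaged across all of them:
print(len(model.estimators_))             # the individual trees (100 here)
print(model.predict_proba([[0.2, 0.8]]))  # vote shares averaged over the trees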
Use case: Credit scoring, medical diagnosis, stock market predictions.
📍 K-Nearest Neighbors (KNN)
Concept
The K-Nearest Neighbors (KNN) algorithm classifies a data point based on how its neighbors are classified.
It looks at the ‘k’ closest data points and assigns the majority label among them.
Steps:
- Choose number of neighbors (k).
- Measure distance (usually Euclidean).
- Predict the most common class among the neighbors (a from-scratch sketch follows below).
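These steps are simple enough to implement by hand. A from-scratch sketch for a single query point, using plain Euclidean distance (illustrative only; the scikit-learn version follows):
import numpy as np
def knn_predict(X, y, query, k=3):
    # Step 2: Euclidean distance from the query to every training point
    distances = np.linalg.norm(np.array(X) - np.array(query), axis=1)
    # Step 3: take the k closest points and return their majority label
    nearest = np.argsort(distances)[:k]
    labels = [y[i] for i in nearest]
    return max(set(labels), key=labels.count)
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
print(knn_predict(X, y, [1.6], k=3))  # two of the three nearest are class 1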
Example in Python
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]                     # one-dimensional inputs
y = [0, 0, 1, 1]                             # class labels
model = KNeighborsClassifier(n_neighbors=3)  # k = 3
model.fit(X, y)
print(model.predict([[1.5]])) # Predict class for new value
Use case: Image recognition, recommendation systems, pattern classification.
⚙️ Support Vector Machines (SVM)
Concept
Support Vector Machines are powerful models that find the best boundary (hyperplane) separating different classes.
SVMs try to maximize the margin between data points of different classes.
Advantages:
- Effective in high-dimensional spaces
- Works well with small datasets
Example in Python
from sklearn import svm
X = [[0, 0], [1, 1]]              # two training points
y = [0, 1]                        # class labels
model = svm.SVC(kernel='linear')  # linear kernel: a straight-line boundary
model.fit(X, y)
print(model.predict([[0.5, 0.5]]))  # classify a point midway between classes
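The signed score returned by decision_function shows which side of the hyperplane a point falls on and, up to scaling, how far from it the point lies. Continuing the example (the point [0.5, 0.5] sits midway between the two training points, so its score is near zero):
print(model.decision_function([[0.5, 0.5]]))  # ≈ 0: right on the boundary
print(model.decision_function([[2, 2]]))      # clearly positive: class 1 side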
Use case: Text classification, handwriting recognition, face detection.
🧠 Summary
| Algorithm | Type | Main Use | Key Idea |
|---|---|---|---|
| Linear Regression | Regression | Predict continuous values | Fits a straight line to data |
| Logistic Regression | Classification | Binary outcomes | Uses sigmoid function for probabilities |
| Decision Tree | Classification/Regression | Rule-based predictions | Splits data by conditions |
| Random Forest | Classification/Regression | Ensemble learning | Combines multiple trees |
| KNN | Classification | Pattern recognition, recommendations | Majority vote among k nearest neighbors |
| SVM | Classification | Separates classes | Maximizes margin between data points |
Supervised learning algorithms form the foundation of predictive analytics.
Choosing the right algorithm depends on the type of data, problem goal, and performance requirements.
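In practice, that choice is often made empirically: train several candidates and compare them with cross-validation. A minimal sketch using scikit-learn's built-in iris dataset:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
# 5-fold cross-validation: mean accuracy on held-out folds for each model
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")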