Gridscript

🤖 Introduction to Machine Learning

📘 What Is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that enables computers to learn from data and make predictions or decisions without being explicitly programmed.
Instead of following fixed rules, ML systems identify patterns in data and improve automatically with experience.

Key idea: Provide data ➜ let the computer learn ➜ make predictions on new data.

Examples of Machine Learning Applications

🧠 Supervised vs. Unsupervised Learning

Machine Learning algorithms are generally classified into two main categories: supervised and unsupervised learning.

1. Supervised Learning

Supervised learning uses labeled data: the input data comes with the correct answers (targets).
The goal is to learn a mapping from inputs (X) to outputs (y).

Examples:

Types of supervised learning:

| Type | Description | Example |
|---|---|---|
| Regression | Predicts continuous values | Predicting prices, temperatures |
| Classification | Predicts categories or labels | Spam detection, sentiment analysis |

Example using scikit-learn:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Example data
X = [[1000], [1500], [2000], [2500]]
y = [200000, 250000, 300000, 350000]

# Split data into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
print(predictions)
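
The example above is a regression task. As a counterpart, here is a minimal classification sketch. The data (hours studied mapped to pass/fail) is hypothetical and invented purely for illustration, and LogisticRegression is just one of several classifiers that could be used here.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied -> failed (0) or passed (1)
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Train a classifier on the labeled examples
clf = LogisticRegression()
clf.fit(X, y)

# Predict category labels (not continuous values) for two unseen inputs
new_hours = [[1.5], [8.5]]
predicted_labels = clf.predict(new_hours)
print(predicted_labels)
```

Unlike the regression model, the output here is a class label for each input rather than a number on a continuous scale.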

2. Unsupervised Learning

Unsupervised learning uses unlabeled data: there are no predefined outputs.
The goal is to find patterns or structure within the data.

Examples:

Types of unsupervised learning:

| Type | Description | Example |
|---|---|---|
| Clustering | Groups similar data points | Customer segmentation |
| Dimensionality Reduction | Simplifies data by reducing variables | PCA (Principal Component Analysis) |

Example using KMeans clustering:

from sklearn.cluster import KMeans
import numpy as np

data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]])

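# n_init is set explicitly so the behavior does not depend on the library's version-specific default
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)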
kmeans.fit(data)

print("Cluster centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
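
Dimensionality reduction can be sketched in a similar way. The snippet below reuses the same six toy points and applies PCA to project them from two dimensions down to one. Keeping a single component is an assumption made for illustration, not a recommendation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy points as in the clustering example
data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]])

# Project the 2-D points onto the single direction with the most variance
pca = PCA(n_components=1)
reduced = pca.fit_transform(data)

print("Reduced shape:", reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio indicates how much of the spread in the original data survives the reduction.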

🧩 Train-Test Split

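To evaluate how well a model performs, we divide our dataset into two parts: a training set, used to fit the model, and a test set, held back to evaluate it.

Scoring the model on data it has not seen shows whether it generalizes or has merely memorized the training data.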

Example:

from sklearn.model_selection import train_test_split
import numpy as np

X = np.random.rand(100, 5)
y = np.random.rand(100)

# Split 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set size:", len(X_train))
print("Test set size:", len(X_test))
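
To see why the held-out test set matters, here is a rough sketch of memorization. It assumes synthetic data with random labels (so there is no real pattern to learn) and uses a 1-nearest-neighbor classifier, chosen here only because it effectively stores its training points.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic features with random 0/1 labels: nothing meaningful to learn
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# With one neighbor, each training point is its own nearest neighbor
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("Training accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

The training accuracy is perfect because the model is recalling answers it has already seen. The test accuracy should land much closer to chance, and only the test set exposes that gap.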

📏 Evaluation Metrics

Once a model is trained, we need to measure how well it performs.
The choice of evaluation metric depends on the type of problem (classification or regression). The metrics below all apply to classification.

1. Accuracy

The percentage of correctly predicted labels.

Accuracy = (Number of correct predictions) / (Total predictions)

Example: If a model correctly predicts 90 out of 100 test samples,
Accuracy = 90 / 100 = 0.9 (or 90%)

In scikit-learn:

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
print("Accuracy:", accuracy_score(y_true, y_pred))

2. Precision

Precision measures how many of the positive predictions were actually correct.

Precision = True Positives / (True Positives + False Positives)

Use case: Important when false positives are costly (e.g., spam detection).

3. Recall

Recall measures how many of the actual positives were correctly predicted.

Recall = True Positives / (True Positives + False Negatives)

Use case: Important when missing a positive case is costly (e.g., detecting diseases).
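
The precision and recall formulas can be worked through by hand. The sketch below counts true positives, false positives, and false negatives in plain Python. The labels are hypothetical and chosen only so that the two metrics come out different.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, actually positive
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, actually positive

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("TP, FP, FN:", tp, fp, fn)
print("Precision:", precision)
print("Recall:", recall)
```

Here two of the five positive predictions are wrong, which lowers precision, while only one of the four actual positives is missed, which keeps recall higher. Note that this sketch does not guard against division by zero when there are no positive predictions or no actual positives.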

4. Combined Example in Python

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1 Score:", f1_score(y_true, y_pred))
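
The F1 score printed above combines the other two metrics: it is the harmonic mean of precision and recall, F1 = 2 × Precision × Recall / (Precision + Recall). As a quick check, the sketch below recomputes it by hand from the same labels and compares it with scikit-learn's result.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# Harmonic mean of precision and recall
f1_by_hand = 2 * p * r / (p + r)

print("F1 by hand:", f1_by_hand)
print("F1 from scikit-learn:", f1_score(y_true, y_pred))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model needs both good precision and good recall to score a high F1.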

🧠 Summary

| Concept | Description |
|---|---|
| Machine Learning | Enables systems to learn from data and make predictions |
| Supervised Learning | Learns from labeled data (e.g., regression, classification) |
| Unsupervised Learning | Finds hidden patterns in unlabeled data (e.g., clustering) |
| Train-Test Split | Divides data into training and testing sets to evaluate models |
| Accuracy | Measures overall correctness |
| Precision | How many predicted positives were correct |
| Recall | How many actual positives were found |

By understanding these foundational ML concepts, you can start building, training, and evaluating models that learn from data, the heart of modern Artificial Intelligence!