📊 Model Evaluation & Improvement
📘 Introduction
Building a machine learning model is only part of the process; we also need to evaluate how well it performs and improve it when necessary.
Good model evaluation ensures that the model generalizes well to new, unseen data rather than just memorizing the training data.
🧪 Cross-Validation
Concept
Cross-validation is a technique used to evaluate a model's performance more reliably.
Instead of using a single train-test split, the data is divided into multiple folds, and the model is trained and tested multiple times.
How It Works
- Split data into k equal parts (folds).
- Train the model on k-1 folds and test it on the remaining fold.
- Repeat the process k times, changing the test fold each time.
- Average the performance results.
Common choice: k = 5 or k = 10 (5-fold or 10-fold cross-validation).
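To make these steps concrete, here is a minimal sketch of the manual loop using scikit-learn's KFold splitter (the cross_val_score helper shown in the next example wraps all of this into a single call):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Split the data into 5 folds; shuffle so folds are not ordered by class
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, test on the held-out fold
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print("Per-fold accuracy:", fold_scores)
print("Average accuracy:", sum(fold_scores) / len(fold_scores))
```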
Example in Python
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Model
model = LogisticRegression(max_iter=200)

# 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Average accuracy:", scores.mean())
```
Advantages:
- Provides a more accurate estimate of model performance.
- Reduces bias due to a particular train/test split.
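For classifiers, cross_val_score stratifies the folds by default when cv is an integer; you can also pass an explicit splitter and a different scoring metric. A small variation on the example above ("f1_macro" is one of scikit-learn's built-in scorer names):

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Explicit stratified splitter and macro-averaged F1 instead of accuracy
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
print("Macro-F1 per fold:", scores)
```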
⚙️ Hyperparameter Tuning (GridSearchCV)
Concept
Each model has settings that are chosen before training rather than learned from the data; these are called hyperparameters, and tuning them can improve performance.
Finding the best combination of these parameters is known as hyperparameter tuning.
Examples:
- For a decision tree: `max_depth`, `min_samples_split`
- For KNN: `n_neighbors`
- For SVM: `kernel`, `C`, `gamma`
Grid Search
GridSearchCV systematically tries all possible combinations of hyperparameter values, evaluates each one using cross-validation, and selects the best-performing model.
Example in Python
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Define model and parameter grid
model = SVC()
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.001, 0.01, 0.1]
}

# Grid search with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
```
Advantages:
- Automates hyperparameter selection.
- Ensures model optimization based on cross-validation results.
Alternative methods:
- RandomizedSearchCV: samples a fixed number of parameter combinations at random, trading exhaustiveness for speed (see the sketch after this list).
- Bayesian optimization: a more advanced approach that uses results from earlier trials to choose promising values efficiently.
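As a minimal sketch of the first alternative, reusing the SVC setup from the grid search example (the ranges below are illustrative): n_iter caps how many random combinations are tried, and scipy's loguniform samples C and gamma on a log scale.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions (or lists) to sample from, instead of an exhaustive grid
param_distributions = {
    'C': loguniform(1e-2, 1e2),
    'kernel': ['linear', 'rbf'],
    'gamma': loguniform(1e-4, 1e0),
}

# Try 10 random combinations, each evaluated with 5-fold cross-validation
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=10,
                            cv=5, random_state=42)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```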
🧩 Avoiding Overfitting and Underfitting
Overfitting
Occurs when a model learns the training data too well, including its noise and details, leading to poor generalization.
Symptoms:
- High accuracy on training data
- Low accuracy on test data
Solutions:
- Use cross-validation
- Reduce model complexity (simplify model or fewer features)
- Apply regularization (e.g., L1 or L2 penalties; see the sketch after this list)
- Add more training data
- Use techniques like dropout (for neural networks)
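As a concrete look at the regularization point above, here is a minimal sketch with LogisticRegression, where C is the inverse regularization strength (the values are arbitrary): smaller C means a stronger L2 penalty and a simpler model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# In scikit-learn, C is the inverse regularization strength:
# smaller C = stronger L2 penalty = simpler decision boundary
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=500)
    model.fit(X_train, y_train)
    print(f"C={C}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
```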
Underfitting
Occurs when a model is too simple and cannot capture the patterns in the data.
Symptoms:
- Low accuracy on both training and test data
Solutions:
- Increase model complexity (add layers or features; see the sketch after this list)
- Reduce regularization
- Ensure sufficient feature representation
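One way to add complexity without switching models is to expand the features. A minimal sketch on synthetic data: a straight line underfits a quadratic relationship, while adding polynomial features captures it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with a little noise
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# A plain linear model underfits (low R^2)...
linear = LinearRegression().fit(X, y)
print("Linear R^2:", round(linear.score(X, y), 2))

# ...while adding squared terms captures the pattern
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Polynomial R^2:", round(poly.score(X, y), 2))
```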
Visualizing Overfitting vs Underfitting
| Model Type | Training Accuracy | Test Accuracy | Behavior |
|---|---|---|---|
| Underfitted | Low | Low | Model too simple |
| Good Fit | High | High | Balanced performance |
| Overfitted | Very High | Low | Model memorized data |
Example: Detecting Overfitting
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train models with different depths and compare train vs. test accuracy
for depth in [2, 5, 10]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"Depth={depth}: Train Accuracy={train_acc:.2f}, Test Accuracy={test_acc:.2f}")
```
Interpretation:
- If training accuracy >> test accuracy → overfitting
- If both are low → underfitting
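scikit-learn's validation_curve automates this train-versus-test comparison across a range of one hyperparameter, using cross-validation rather than a single split. A short sketch with the same classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Cross-validated train/test scores for each candidate depth
depths = [1, 2, 3, 5, 8, 12]
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, te in zip(depths, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"max_depth={d}: train={tr:.2f}, cv={te:.2f}")  # growing gap = overfitting
```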
🧠 Summary
| Concept | Description | Benefit / Remedy |
|---|---|---|
| Cross-Validation | Splits data into folds for more reliable model evaluation | Reduces bias and improves reliability |
| GridSearchCV | Tests multiple hyperparameter combinations | Optimizes model performance |
| Overfitting | Model too complex, memorizes training data | Use regularization, simplify model |
| Underfitting | Model too simple, misses patterns | Add complexity, more features |
By combining cross-validation, hyperparameter tuning, and regularization techniques, you can build machine learning models that are accurate, robust, and generalize well to unseen data.
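As a closing sketch of that workflow (the grid values are illustrative): tune with cross-validation on the training split, then report performance once on a held-out test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale inside the pipeline so each CV fold is scaled on its own training part
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': [0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out test accuracy:", search.score(X_test, y_test))
```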