📊 Model Evaluation & Improvement
📘 Introduction
Building a machine learning model is only part of the process; we also need to evaluate how well it performs and improve it when necessary.
Good model evaluation ensures that the model generalizes well to new, unseen data rather than just memorizing the training data.
🧪 Cross-Validation
Concept
Cross-validation is a technique used to evaluate a model's performance more reliably.
Instead of using a single train-test split, the data is divided into multiple folds, and the model is trained and tested multiple times.
How It Works
- Split data into k equal parts (folds).
- Train the model on k-1 folds and test it on the remaining fold.
- Repeat the process k times, changing the test fold each time.
- Average the performance results.
Common choice: k = 5 or k = 10 (5-fold or 10-fold cross-validation).
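To make these steps concrete, here is a minimal sketch of the manual loop using scikit-learn's KFold splitter (the cross_val_score helper shown in the next example wraps all of this into a single call):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Split the data into 5 folds; shuffle so folds are not ordered by class
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, test on the held-out fold
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print("Per-fold accuracy:", fold_scores)
print("Average accuracy:", sum(fold_scores) / len(fold_scores))
```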
Example in Python
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Model
model = LogisticRegression(max_iter=200)

# 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Average accuracy:", scores.mean())
```
Advantages:
- Provides a more accurate estimate of model performance.
- Reduces bias due to a particular train/test split.
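For classifiers, cross_val_score stratifies the folds by default when cv is an integer; you can also pass an explicit splitter and a different scoring metric. A small variation on the example above ("f1_macro" is one of scikit-learn's built-in scorer names):

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Explicit stratified splitter and macro-averaged F1 instead of accuracy
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
print("Macro-F1 per fold:", scores)
```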
⚙️ Hyperparameter Tuning (GridSearchCV)
Concept
Each model has settings that are chosen before training rather than learned from the data; these are called hyperparameters, and tuning them can improve performance.
Finding the best combination of these parameters is known as hyperparameter tuning.
Examples:
- For a decision tree: `max_depth`, `min_samples_split`
- For KNN: `n_neighbors`
- For SVM: `kernel`, `C`, `gamma`
Grid Search
GridSearchCV systematically tries all possible combinations of hyperparameter values, evaluates each one using cross-validation, and selects the best-performing model.
Example in Python
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Define model and parameter grid
model = SVC()
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.001, 0.01, 0.1]
}

# Grid search with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
```
Advantages:
- Automates hyperparameter selection.
- Ensures model optimization based on cross-validation results.
Alternative methods:
- RandomizedSearchCV: samples a fixed number of parameter combinations at random, trading exhaustiveness for speed (see the sketch after this list).
- Bayesian optimization: a more advanced approach that uses results from earlier trials to choose promising values efficiently.
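As a minimal sketch of the first alternative, reusing the SVC setup from the grid search example (the ranges below are illustrative): n_iter caps how many random combinations are tried, and scipy's loguniform samples C and gamma on a log scale.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions (or lists) to sample from, instead of an exhaustive grid
param_distributions = {
    'C': loguniform(1e-2, 1e2),
    'kernel': ['linear', 'rbf'],
    'gamma': loguniform(1e-4, 1e0),
}

# Try 10 random combinations, each evaluated with 5-fold cross-validation
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=10,
                            cv=5, random_state=42)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```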
🧩 Avoiding Overfitting and Underfitting
Overfitting
Occurs when a model learns the training data too well, including its noise and details, leading to poor generalization.
Symptoms:
- High accuracy on training data
- Low accuracy on test data
Solutions:
- Use cross-validation
- Reduce model complexity (simplify model or fewer features)
- Apply regularization (e.g., L1 or L2 penalties; see the sketch after this list)
- Add more training data
- Use techniques like dropout (for neural networks)
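As a concrete look at the regularization point above, here is a minimal sketch with LogisticRegression, where C is the inverse regularization strength (the values are arbitrary): smaller C means a stronger L2 penalty and a simpler model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# In scikit-learn, C is the inverse regularization strength:
# smaller C = stronger L2 penalty = simpler decision boundary
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=500)
    model.fit(X_train, y_train)
    print(f"C={C}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
```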
Underfitting
Occurs when a model is too simple and cannot capture the patterns in the data.
Symptoms:
- Low accuracy on both training and test data
Solutions:
- Increase model complexity (add layers or features; see the sketch after this list)
- Reduce regularization
- Ensure sufficient feature representation
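One way to add complexity without switching models is to expand the features. A minimal sketch on synthetic data: a straight line underfits a quadratic relationship, while adding polynomial features captures it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with a little noise
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# A plain linear model underfits (low R^2)...
linear = LinearRegression().fit(X, y)
print("Linear R^2:", round(linear.score(X, y), 2))

# ...while adding squared terms captures the pattern
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Polynomial R^2:", round(poly.score(X, y), 2))
```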
Visualizing Overfitting vs Underfitting
| Model Type | Training Accuracy | Test Accuracy | Behavior |
|---|---|---|---|
| Underfitted | Low | Low | Model too simple |
| Good Fit | High | High | Balanced performance |
| Overfitted | Very High | Low | Model memorized data |
Example: Detecting Overfitting
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train models with different depths and compare train vs. test accuracy
for depth in [2, 5, 10]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"Depth={depth}: Train Accuracy={train_acc:.2f}, Test Accuracy={test_acc:.2f}")
```
Interpretation:
- If training accuracy >> test accuracy → overfitting
- If both are low → underfitting
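scikit-learn's validation_curve automates this train-versus-test comparison across a range of one hyperparameter, using cross-validation rather than a single split. A short sketch with the same classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Cross-validated train/test scores for each candidate depth
depths = [1, 2, 3, 5, 8, 12]
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, te in zip(depths, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"max_depth={d}: train={tr:.2f}, cv={te:.2f}")  # growing gap = overfitting
```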
🧠 Summary
| Concept | Description | Benefit / Remedy |
|---|---|---|
| Cross-Validation | Splits data into folds for more reliable model evaluation | Reduces bias and improves reliability |
| GridSearchCV | Tests multiple hyperparameter combinations | Optimizes model performance |
| Overfitting | Model too complex, memorizes training data | Use regularization, simplify model |
| Underfitting | Model too simple, misses patterns | Add complexity, more features |
By combining cross-validation, hyperparameter tuning, and regularization techniques, you can build machine learning models that are accurate, robust, and generalize well to unseen data.
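As a closing sketch of that workflow (the grid values are illustrative): tune with cross-validation on the training split, then report performance once on a held-out test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale inside the pipeline so each CV fold is scaled on its own training part
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': [0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out test accuracy:", search.score(X_test, y_test))
```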