Data Visualization in Python
Introduction
Data visualization is the process of representing data graphically to make insights easier to understand and communicate.
In Data Science, visualization helps reveal patterns, trends, and relationships that might not be obvious from raw data.
Two of the most widely used libraries for visualization in Python are:
- Matplotlib: a foundational library for creating static, animated, and interactive plots.
- Seaborn: built on top of Matplotlib, it provides a simpler and more visually appealing interface.
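To make the "built on top of Matplotlib" relationship concrete, here is a minimal sketch. The bill and tip values are made up for illustration; the point is only that a Seaborn plotting function returns a regular Matplotlib Axes, which can then be customized with Matplotlib methods.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.axes import Axes

# Made-up values, used only for illustration.
bill = [10.5, 20.0, 15.2, 30.8]
tip = [1.5, 3.0, 2.2, 5.0]

# Seaborn draws onto a Matplotlib Axes and returns that Axes.
ax = sns.scatterplot(x=bill, y=tip)
print(isinstance(ax, Axes))  # True

# Because it is a regular Axes, Matplotlib methods can customize the plot.
ax.set_title("Seaborn plot, Matplotlib title")
ax.set_xlabel("Bill")
ax.set_ylabel("Tip")
```

Call `plt.show()` afterwards to display the figure.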
Plotting with Matplotlib
1. Line Plot
Line plots show how a value changes over time or along another continuous variable.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]
plt.plot(x, y, marker='o')
plt.title("Line Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
Use case: Displaying trends such as sales over time, temperature changes, or stock prices.
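Trend plots often compare more than one series. The sketch below uses Matplotlib's object-oriented interface (`plt.subplots()`), which is an alternative to the `plt.*` calls above, and adds a legend to tell the lines apart. The two product series are invented for illustration.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5]
product_a = [10, 20, 25, 30, 40]  # made-up values
product_b = [15, 18, 22, 21, 28]  # made-up values

# plt.subplots() returns a Figure and an Axes to draw on.
fig, ax = plt.subplots()
ax.plot(months, product_a, marker="o", label="Product A")
ax.plot(months, product_b, marker="s", label="Product B")

ax.set_title("Monthly Sales by Product")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_xticks(months)  # one tick per month
ax.grid(True)
ax.legend()            # uses the label= given to each line
```

Call `plt.show()` afterwards to display the figure.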
2. Bar Plot
Bar plots are used to compare quantities of different categories.
categories = ["A", "B", "C", "D"]
values = [23, 45, 56, 78]
plt.bar(categories, values, color='skyblue')
plt.title("Bar Chart Example")
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()
Use case: Comparing sales by product type, or counts across different groups.
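Comparisons are usually easier to read when the bars are sorted and labeled with their values. The following is a sketch with made-up numbers; `Axes.bar_label` is assumed to be available, which requires Matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [45, 23, 78, 56]  # made-up values

# Sort the categories by value, largest first.
pairs = sorted(zip(categories, values), key=lambda pair: pair[1], reverse=True)
sorted_categories = [category for category, _ in pairs]
sorted_values = [value for _, value in pairs]

fig, ax = plt.subplots()
bars = ax.bar(sorted_categories, sorted_values, color="skyblue")

# Write each bar's value above it (Matplotlib 3.4+).
labels = ax.bar_label(bars)

ax.set_title("Bar Chart Sorted by Value")
ax.set_xlabel("Category")
ax.set_ylabel("Value")
```

Call `plt.show()` afterwards to display the figure.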
3. Scatter Plot
Scatter plots show the relationship between two numeric variables.
x = [5, 7, 8, 10, 12]
y = [12, 14, 15, 18, 22]
plt.scatter(x, y, color='green')
plt.title("Scatter Plot Example")
plt.xlabel("Variable X")
plt.ylabel("Variable Y")
plt.show()
Use case: Finding correlations (e.g., age vs. income, study hours vs. exam scores).
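A scatter plot suggests a correlation visually; a number can back it up. This sketch reuses the x and y values from the example above, computes the Pearson correlation coefficient with NumPy, and overlays a least-squares line. The fitted line is an addition for illustration and not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 7, 8, 10, 12])
y = np.array([12, 14, 15, 18, 22])

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Straight-line (degree 1) least-squares fit: y is approximately slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, color="green")
ax.plot(x, slope * x + intercept, color="gray", linestyle="--", label="Least-squares fit")
ax.set_title(f"Scatter Plot (r = {r:.2f})")
ax.set_xlabel("Variable X")
ax.set_ylabel("Variable Y")
ax.legend()
```

A value of `r` close to 1 indicates a strong positive linear relationship. It does not by itself show that one variable causes the other.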
4. Histogram
Histograms show the distribution of a single numeric variable.
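import numpy as np
# Draw 1,000 samples from a standard normal distribution (seeded so the plot is reproducible)
rng = np.random.default_rng(42)
data = rng.standard_normal(1000)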
plt.hist(data, bins=30, color='purple', alpha=0.7)
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Use case: Understanding data distribution, such as exam scores, salaries, or customer ages.
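To see what the `bins` argument does, it can help to compute the histogram without plotting it. The sketch below, using seeded random data, shows that `np.histogram` returns the per-bin counts and the bin edges, and that these match what `hist` draws.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

# 30 bins means 31 edges; every sample falls into exactly one bin.
counts, edges = np.histogram(data, bins=30)
print(len(counts), len(edges), counts.sum())  # 30 31 1000

# hist() returns the same counts and edges that it draws.
fig, ax = plt.subplots()
plotted_counts, plotted_edges, _ = ax.hist(data, bins=30, color="purple", alpha=0.7)
ax.set_title("Histogram with 30 Bins")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

Too few bins can hide the shape of the distribution and too many can make it noisy, so it is worth trying a few values.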
Plotting with Seaborn
Seaborn works directly with pandas DataFrames and applies sensible default styling, so common statistical plots take less code than in plain Matplotlib.
import seaborn as sns
import matplotlib.pyplot as plt
# Example dataset (downloaded and cached on first use, which requires an internet connection)
tips = sns.load_dataset("tips")
# Basic scatter plot
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.title("Total Bill vs Tip (Seaborn)")
plt.show()
1. Bar Plot (Seaborn)
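# Bar height is the mean total_bill per day. Recent seaborn versions expect hue whenever
# palette is set (seaborn 0.13+; on older versions, drop hue and legend).
sns.barplot(x="day", y="total_bill", hue="day", data=tips, palette="coolwarm", legend=False)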
plt.title("Average Total Bill by Day")
plt.show()
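The title says "Average" because `sns.barplot` aggregates by default: each bar is the mean of the y values in that category. The sketch below checks this against a pandas `groupby` on a small, made-up DataFrame (used instead of the tips dataset so that it runs offline).

```python
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Made-up data standing in for the tips dataset.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 12.0, 18.0, 25.0, 35.0],
})

# The per-day means that barplot computes internally by default.
means = df.groupby("day", sort=False)["total_bill"].mean()
print(means.to_dict())  # {'Thur': 15.0, 'Fri': 15.0, 'Sat': 30.0}

# One bar per day; the bar height is the mean, the line on top is an error bar.
ax = sns.barplot(x="day", y="total_bill", data=df)
ax.set_title("Average Total Bill by Day")
```

If the data is already aggregated, Matplotlib's `plt.bar` plots the values as they are.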
2. Histogram / Distribution Plot
sns.histplot(tips["total_bill"], bins=20, kde=True, color="orange")
plt.title("Distribution of Total Bill")
plt.show()
3. Box Plot
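Box plots summarize a distribution: the box spans the interquartile range, the line inside marks the median, and points beyond the whiskers are drawn as outliers.
# hue="day" with legend=False avoids the palette-without-hue warning (seaborn 0.13+)
sns.boxplot(x="day", y="total_bill", hue="day", data=tips, palette="Set2", legend=False)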
plt.title("Box Plot of Total Bill by Day")
plt.show()
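The numbers behind a box plot can be computed directly. This sketch uses made-up data and assumes the common default whisker rule of 1.5 times the interquartile range (`whis=1.5` in Matplotlib); values outside those fences are the points drawn as outliers.

```python
import numpy as np

# Made-up values with one unusually large observation.
data = np.array([12, 15, 16, 18, 19, 20, 22, 24, 25, 60])

# The box spans Q1 to Q3, with a line at the median.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences under the 1.5 * IQR rule; points outside them count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3)             # 16.5 19.5 23.5
print(lower_fence, upper_fence)   # 6.0 34.0
print(outliers.tolist())          # [60]
```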
4. Pair Plot
Pair plots show pairwise relationships across multiple variables.
sns.pairplot(tips, hue="sex", palette="husl")
plt.show()
Understanding How to Tell a Story with Charts
Creating charts isn't just about displaying data; it's about communicating insights clearly.
A good data visualization tells a story that highlights key findings and patterns.
1. Define Your Purpose
Ask: What do I want to show?
- Comparison: use bar charts
- Trends: use line plots
- Distribution: use histograms
- Relationships: use scatter plots
2. Keep It Simple
Avoid clutter and unnecessary decorations. Focus on clarity.
- Use clear labels and titles.
- Choose colors wisely (avoid too many).
- Use grid lines and legends only when needed.
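The guidelines above could be applied as in the following sketch: a single color, clear labels, no legend (there is only one series), and the top and right borders removed. The data is made up, and hiding the spines is one possible way to reduce clutter, not a required step.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [23, 45, 56, 78]  # made-up values

fig, ax = plt.subplots()

# One color for all bars: color is not encoding anything here.
ax.bar(categories, values, color="steelblue")

# Clear title and axis labels.
ax.set_title("Value by Category")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

# Remove the top and right borders, which carry no information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# No grid and no legend: with a single series they add nothing.
ax.grid(False)
```

Call `plt.show()` afterwards to display the figure.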
3. Highlight Key Insights
Use color or annotations to emphasize important points.
x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 25, 30]
plt.plot(x, y, marker='o', color='blue')
plt.title("Sales Over Time")
plt.xlabel("Month")
plt.ylabel("Sales ($)")
# Highlight the maximum point
max_index = y.index(max(y))
plt.annotate(
    "Peak Sales",
    xy=(x[max_index], y[max_index]),
    # Place the label to the left of the peak so it stays inside the plot area
    xytext=(x[max_index] - 1.5, y[max_index] - 1),
    arrowprops=dict(facecolor='red', shrink=0.05),
)
plt.show()
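Color can do the same job as an annotation. In this sketch, which uses the same sales figures with assumed month names, every bar is muted gray except the maximum, which is drawn in a contrasting color.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]  # assumed labels for months 1 to 5
sales = [10, 20, 15, 25, 30]

# Gray for context, one accent color for the bar to emphasize.
peak = max(sales)
colors = ["tab:red" if value == peak else "lightgray" for value in sales]

fig, ax = plt.subplots()
bars = ax.bar(months, sales, color=colors)
ax.set_title("Sales Peaked in May")
ax.set_xlabel("Month")
ax.set_ylabel("Sales ($)")
```

Stating the finding in the title ("Sales Peaked in May") tells the reader what to look for before they study the chart.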
4. Choose the Right Chart Type
| Goal | Recommended Chart |
|---|---|
| Compare categories | Bar Chart |
| Show trend over time | Line Plot |
| Show data distribution | Histogram |
| Show correlation | Scatter Plot |
| Show part-to-whole | Pie Chart (use sparingly) |
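The table above can be kept at hand as a small lookup. This is only an illustrative sketch: `CHART_FOR_GOAL` and `recommend_chart` are hypothetical names, not part of Matplotlib or Seaborn.

```python
# Hypothetical helper that encodes the goal-to-chart table as a dictionary.
CHART_FOR_GOAL = {
    "compare categories": "Bar Chart",
    "show trend over time": "Line Plot",
    "show data distribution": "Histogram",
    "show correlation": "Scatter Plot",
    "show part-to-whole": "Pie Chart (use sparingly)",
}


def recommend_chart(goal: str) -> str:
    """Return the recommended chart type for a goal listed in the table."""
    key = goal.strip().lower()
    if key not in CHART_FOR_GOAL:
        known = ", ".join(sorted(CHART_FOR_GOAL))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}")
    return CHART_FOR_GOAL[key]


print(recommend_chart("Show correlation"))  # Scatter Plot
```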
Summary
| Concept | Description |
|---|---|
| Matplotlib | Foundation library for flexible plotting |
| Seaborn | High-level API for attractive statistical graphics |
| Line Plot | Shows trends over time |
| Bar Plot | Compares categories |
| Scatter Plot | Reveals relationships between variables |
| Histogram | Shows data distribution |
| Storytelling | Focus on clear insights, simplicity, and appropriate chart choice |
Effective data visualization transforms complex data into actionable insights.
Use charts to inform, not overwhelm, and always design with your audience in mind.