Gridscript

📊 Introduction to Statistics & Probability

📘 What Is Statistics?

Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data.
It helps us make sense of large datasets and draw conclusions or make predictions.

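There are two main types of statistics: descriptive statistics, which summarize the data you have (for example with a mean or a chart), and inferential statistics, which use a sample to draw conclusions about a larger population.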

🎯 What Is Probability?

Probability measures the likelihood of an event occurring.
It ranges from 0 to 1, where 0 means the event is impossible and 1 means the event is certain.

Formula (when all outcomes are equally likely):

P(Event) = (Number of favorable outcomes) / (Total number of outcomes)

Example: If you roll a 6-sided die, the probability of getting a “3” is:

P(3) = 1 / 6 ≈ 0.1667
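
The same calculation can be sketched in Python. This assumes a fair die whose faces are all equally likely; the helper name `probability` is made up for illustration and is not part of any library.

```python
from fractions import Fraction

def probability(favorable, total):
    """Classical probability: favorable outcomes over total outcomes (hypothetical helper)."""
    return Fraction(favorable, total)

outcomes = [1, 2, 3, 4, 5, 6]  # faces of a fair six-sided die
favorable = [face for face in outcomes if face == 3]

p_three = probability(len(favorable), len(outcomes))
print(p_three)                   # 1/6
print(round(float(p_three), 4))  # 0.1667
```

Using `Fraction` keeps the result exact, so the decimal approximation only appears when the value is printed.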

📈 Mean, Median, Mode, and Variance

1. Mean (Average)

The mean is the sum of all values divided by the number of values.

data = [10, 20, 30, 40]
mean = sum(data) / len(data)
print(mean)  # 25.0

Formula:

Mean = (x₁ + x₂ + ... + xₙ) / n

2. Median

The median is the middle value when the data is sorted.
If there’s an even number of values, it’s the average of the two middle values.

import numpy as np

data = [5, 8, 12, 20, 25]
median = np.median(data)
print(median)  # 12.0
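
The even-count case described above can be illustrated with a small sketch. It uses the standard library's `statistics.median` rather than NumPy, purely to keep the example dependency-free; the four-value list is an invented example.

```python
from statistics import median

data = [5, 8, 12, 20]          # four values, so there is no single middle value
middle_average = median(data)  # averages the two middle values, 8 and 12
print(middle_average)          # 10.0
```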

3. Mode

The mode is the most frequent value in a dataset.

from statistics import mode

data = [1, 2, 2, 3, 4]
print(mode(data))  # 2
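
A dataset can have more than one most frequent value. As a minimal sketch (assuming Python 3.8 or later, where `statistics.multimode` is available), the following returns every value tied for the highest count. The data is an invented example.

```python
from statistics import multimode

data = [1, 2, 2, 3, 3, 4]  # 2 and 3 both appear twice
modes = multimode(data)    # all values tied for most frequent, in order of first appearance
print(modes)               # [2, 3]
```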

4. Variance and Standard Deviation

Variance measures how spread out the data is from the mean.
Standard deviation is the square root of variance.

import numpy as np

data = [10, 12, 23, 23, 16, 23, 21, 16]
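variance = np.var(data)  # population variance (divides by n, matching the formula below)
std_dev = np.std(data)   # population standard deviation
print("Variance:", variance)           # 24.0
print("Standard Deviation:", std_dev)  # about 4.899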

Formulas:

Variance (σ²) = Σ(xᵢ - μ)² / n
Standard Deviation (σ) = √Variance
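
To see how the formulas work step by step, here is a sketch that applies them by hand, without NumPy, to the same eight values. Here μ is the mean and n is the number of values; the variable names are illustrative.

```python
import math

data = [10, 12, 23, 23, 16, 23, 21, 16]
n = len(data)

mu = sum(data) / n                                  # mean: 144 / 8 = 18.0
squared_deviations = [(x - mu) ** 2 for x in data]  # (x_i - mu)^2 for each value
variance = sum(squared_deviations) / n              # 192 / 8 = 24.0
std_dev = math.sqrt(variance)                       # square root of the variance

print(variance)           # 24.0
print(round(std_dev, 3))  # 4.899
```

Dividing by n gives the population variance, which is what the formula above describes.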

📊 Probability Basics

1. Independent Events

Two events are independent if the outcome of one does not affect the other.
Example: Rolling two dice — the result of one die doesn’t influence the other.

P(A and B) = P(A) × P(B)
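
As a quick check of the multiplication rule, the sketch below lists all 36 outcomes of rolling two fair dice and counts the ones where both show a 6. The choice of "both dice show 6" as the event is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
rolls = list(product(faces, repeat=2))  # all 36 equally likely (first, second) pairs

p_a = Fraction(1, 6)  # A: first die shows 6
p_b = Fraction(1, 6)  # B: second die shows 6

# Count the pairs where both events happen and divide by the total.
both_six = sum(1 for first, second in rolls if first == 6 and second == 6)
p_both = Fraction(both_six, len(rolls))

print(p_both)               # 1/36
print(p_both == p_a * p_b)  # True
```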

2. Dependent Events

Two events are dependent if one affects the other.
Example: Drawing cards from a deck without replacement.

P(A and B) = P(A) × P(B|A)
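
For a worked example of the conditional rule, consider drawing two aces in a row from a standard 52-card deck without replacement. The sketch below assumes a well-shuffled deck with four aces.

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)               # P(A): 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # P(B|A): 3 aces left among 51 cards

p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```

The second factor changes because the first draw removed an ace from the deck, which is exactly what makes the events dependent.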

3. Mutually Exclusive Events

Events that cannot happen at the same time.
Example: Getting “Heads” or “Tails” on a single coin flip.

P(A or B) = P(A) + P(B)
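
The addition rule can be checked the same way. This sketch assumes a single roll of a fair die and compares adding the two probabilities with counting the outcomes directly; rolling a 1 and rolling a 2 cannot both happen on one roll.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # one roll of a fair six-sided die

p_one = Fraction(1, 6)  # A: roll a 1
p_two = Fraction(1, 6)  # B: roll a 2
p_one_or_two = p_one + p_two  # addition rule for mutually exclusive events

# Direct count of outcomes that are either a 1 or a 2.
counted = Fraction(sum(1 for face in outcomes if face in (1, 2)), len(outcomes))

print(p_one_or_two)             # 1/3
print(p_one_or_two == counted)  # True
```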

🔔 Normal Distribution

The Normal Distribution (or Gaussian Distribution) is a continuous probability distribution that is symmetric around the mean.
Many real-world measurements (such as height or test scores) are approximately normally distributed.

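Key Properties:

The curve is bell-shaped and symmetric around the mean.
The mean, median, and mode are equal.
About 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.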

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 1  # mean and standard deviation
data = np.random.normal(mu, sigma, 1000)

plt.hist(data, bins=30, density=True, alpha=0.6)
plt.title("Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Density")  # density=True normalizes the histogram, so the y-axis is a density, not a count
plt.show()
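
A well-known property of the normal distribution is that roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean. As a rough sketch, this can be checked exactly (without random sampling) using `statistics.NormalDist` from the standard library, assuming Python 3.8 or later.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Probability that a value falls between -k and +k standard deviations.
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {within[k]:.4f}")

# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```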

🔗 Correlation vs. Causation

Correlation

Correlation measures the strength and direction of a relationship between two variables.
It does not mean that one variable causes the other.

import pandas as pd

data = {
    "Hours_Studied": [2, 4, 6, 8, 10],
    "Exam_Score": [50, 55, 65, 70, 85]
}
df = pd.DataFrame(data)

print(df.corr())
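
To show what a correlation coefficient is made of, the sketch below computes Pearson's r by hand for the same two columns. Pearson's r is also the default method of `DataFrame.corr()`; the variable names here are illustrative.

```python
import math

hours = [2, 4, 6, 8, 10]
scores = [50, 55, 65, 70, 85]

mean_x = sum(hours) / len(hours)    # 6.0
mean_y = sum(scores) / len(scores)  # 65.0

# Sum of products of deviations, and sums of squared deviations.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sum_xx = sum((x - mean_x) ** 2 for x in hours)
sum_yy = sum((y - mean_y) ** 2 for y in scores)

r = sum_xy / math.sqrt(sum_xx * sum_yy)
print(round(r, 4))  # 0.9815
```

A value this close to +1 indicates a strong positive linear relationship between hours studied and exam score in this small sample.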

Interpretation of correlation coefficient (r):

r value    Interpretation
+1         Perfect positive correlation
0          No correlation
-1         Perfect negative correlation

Causation

Causation means one variable directly affects another.
For example, we can say that increasing study time causes better exam performance only if a controlled experiment supports it.

⚠️ Remember: Correlation does not imply causation!
Just because two variables move together doesn’t mean one causes the other.
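
One common way two variables end up correlated without either causing the other is a shared third factor. The sketch below uses purely simulated data (not real measurements): a hypothetical `temperature` variable drives both `ice_cream` sales and `sunburns`, which never influence each other. The `pearson_r` helper and all variable names are assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated third factor that influences both variables.
temperature = [random.gauss(0, 1) for _ in range(1000)]

# Each variable depends on temperature plus its own independent noise.
ice_cream = [t + random.gauss(0, 0.3) for t in temperature]
sunburns = [t + random.gauss(0, 0.3) for t in temperature]

r = pearson_r(ice_cream, sunburns)
print(f"correlation between ice_cream and sunburns: {r:.2f}")
```

The correlation comes out strongly positive even though neither simulated variable appears in the formula for the other.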

🧠 Summary

Concept                      Description
Mean / Median / Mode         Measures of central tendency
Variance / Std Dev           Measure how data spreads around the mean
Probability                  Likelihood of an event happening
Normal Distribution          Bell-shaped curve describing natural variations
Correlation vs. Causation    Correlation shows relationship; causation shows cause-effect

By understanding these core statistical and probability concepts, you’ll have a strong foundation for data analysis, hypothesis testing, and machine learning!