Python for Data Science
Why Python for Data Science
Python is one of the most popular programming languages for Data Science due to its:
- Simplicity and readability
- Large collection of powerful libraries
- Strong community support
- Flexibility for data analysis, visualization, and machine learning
Python lets you go from data collection to model building to visualization, all within one language.
Python Syntax & Data Types
Python uses a clean, easy-to-read syntax.
Here are some basic concepts and data types used in almost every program.
Basic Syntax
# This is a comment
print("Hello, Data Science!")
Variables
Variables are used to store data.
name = "Alice"
age = 25
height = 1.68
Data Types
| Type | Example | Description |
|---|---|---|
| int | 10 | Integer (whole number) |
| float | 3.14 | Decimal number |
| str | "Data" | Text (string) |
| bool | True / False | Boolean values |
| list | [1, 2, 3] | Ordered collection |
| dict | {"name": "Bob", "age": 30} | Key-value pairs |
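As a quick check of the table above, the built-in `type()` function reports the type of any value, and `int()`, `float()`, and `str()` convert between types. This is a minimal sketch; the variable names and sample values are made up for illustration.

```python
# Sample values (illustrative only), one per row of the table above.
whole = 10
decimal = 3.14
text = "Data"
flag = True
items = [1, 2, 3]
record = {"name": "Bob", "age": 30}

# type(value).__name__ gives the type's name as a string.
print(type(whole).__name__)    # int
print(type(decimal).__name__)  # float
print(type(text).__name__)     # str
print(type(flag).__name__)     # bool
print(type(items).__name__)    # list
print(type(record).__name__)   # dict

# Explicit conversions between types.
as_int = int("42")        # str -> int
as_float = float(as_int)  # int -> float
as_text = str(decimal)    # float -> str
print(as_int, as_float, as_text)
```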
Lists, Dictionaries, Loops, and Functions
Lists
Lists are used to store multiple items in one variable.
numbers = [10, 20, 30]
print(numbers[0]) # Access the first element
numbers.append(40) # Add a new element
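Beyond indexing and `append`, a few other list operations come up constantly in data work. The sketch below uses its own sample list (an assumption, not data from this guide) to show negative indexing, slicing, `len()`, and a list comprehension.

```python
# Illustrative list; the values are arbitrary.
numbers = [10, 20, 30, 40]

last = numbers[-1]        # Negative indexes count from the end -> 40
first_two = numbers[:2]   # Slicing returns a new list -> [10, 20]
size = len(numbers)       # Number of elements -> 4

# A list comprehension builds a new list from an existing one.
doubled = [n * 2 for n in numbers]

print(last, first_two, size)
print(doubled)  # [20, 40, 60, 80]
```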
Dictionaries
Dictionaries store data as key-value pairs.
person = {"name": "Alice", "age": 25}
print(person["name"]) # Access value by key
person["city"] = "London" # Add a new key-value pair
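Accessing a missing key with square brackets raises a `KeyError`, so it can help to know the safer alternatives. This is a small sketch with a made-up `person` record showing `get()` with a default, the `in` membership test, and iteration with `items()`.

```python
# Illustrative record; keys and values are made up.
person = {"name": "Alice", "age": 25, "city": "London"}

# get() returns a default instead of raising KeyError for a missing key.
email = person.get("email", "unknown")

# "in" checks whether a key exists.
has_city = "city" in person

# items() yields (key, value) pairs in insertion order.
for key, value in person.items():
    print(f"{key}: {value}")

print(email, has_city)  # unknown True
```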
Loops
Loops help repeat actions efficiently.
For loop
for i in range(5):
    print(i)  # Prints 0, 1, 2, 3, 4
While loop
count = 0
while count < 3:
    print("Count:", count)
    count += 1  # Without this increment the loop would never end
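In practice, loops usually run over a collection rather than a fixed range. The following sketch, which assumes a hypothetical list of `scores`, uses `enumerate()` to get a position alongside each item and accumulates a running total.

```python
# Hypothetical scores, used only for illustration.
scores = [72, 88, 95]

total = 0
# enumerate() pairs each item with a counter; start=1 makes it 1-based.
for position, score in enumerate(scores, start=1):
    print(f"Score {position}: {score}")
    total += score

average = total / len(scores)
print("Average:", average)  # Average: 85.0
```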
Functions
Functions group reusable blocks of code.
def greet(name):
    return f"Hello, {name}!"

print(greet("Data Scientist"))
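Functions become more useful once they take several parameters, some with default values. As a hedged illustration, here is a hypothetical `normalize` helper (not part of any library) that rescales a list of numbers with min-max scaling.

```python
def normalize(values, scale=1.0):
    """Rescale values to the range 0..scale using min-max scaling.

    `scale` has a default, so callers may omit it.
    """
    lowest = min(values)
    highest = max(values)
    span = highest - lowest
    if span == 0:
        # All values are equal; avoid dividing by zero.
        return [0.0 for _ in values]
    return [scale * (v - lowest) / span for v in values]


print(normalize([10, 20, 30]))             # [0.0, 0.5, 1.0]
print(normalize([10, 20, 30], scale=100))  # [0.0, 50.0, 100.0]
```

Giving `scale` a default keeps the common call short while still allowing the caller to override it by keyword.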
Using Jupyter Notebook and VS Code
Jupyter Notebook
Jupyter Notebook is an interactive environment for data analysis, visualization, and experimentation.
You can organize code in cells and display charts or tables inline.
It's commonly used for:
- Exploratory Data Analysis (EDA)
- Machine learning experiments
- Tutorials and reports
To install and launch Jupyter from a terminal:
pip install jupyter
jupyter notebook
VS Code
VS Code is a general-purpose code editor with extensions for Python and Data Science.
It provides:
- Syntax highlighting and autocompletion
- Debugging tools
- Integration with Jupyter Notebooks
- Git version control
Recommended Extensions:
- Python
- Jupyter
- Pylance (for IntelliSense)
- GitLens
Popular Python Libraries for Data Science
1. NumPy
Used for numerical and matrix operations.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.mean()) # Average
print(arr * 2) # Vectorized operations
Key features:
- Fast mathematical operations
- Multidimensional arrays
- Foundation for, or interoperable with, many other libraries (like pandas and TensorFlow)
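To make the "multidimensional arrays" point concrete, the sketch below builds a small 2-D array and aggregates along each axis. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (arbitrary values).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3)

# axis=0 aggregates down the rows, giving one result per column.
column_means = matrix.mean(axis=0)

# axis=1 aggregates across the columns, giving one result per row.
row_sums = matrix.sum(axis=1)

# The @ operator performs matrix multiplication; .T is the transpose.
product = matrix @ matrix.T

print(column_means)  # [2.5 3.5 4.5]
print(row_sums)      # [ 6 15]
print(product)
```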
2. pandas
Used for data manipulation and analysis.
import pandas as pd
data = {"Name": ["Alice", "Bob"], "Age": [25, 30]}
df = pd.DataFrame(data)
print(df)
print(df["Age"].mean()) # Average age
Key features:
- Works with tabular data (rows and columns)
- Tools for filtering, grouping, and aggregating
- Can read/write CSV and Excel files and SQL databases
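The filtering, grouping, and CSV features listed above can be sketched on a tiny table. The names, cities, and ages here are invented for the example, and the CSV is written to a string rather than a file to keep the snippet self-contained.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "City": ["London", "Paris", "London", "Paris"],
    "Age": [25, 30, 35, 40],
})

# Filtering: keep only the rows where the condition is True.
over_28 = df[df["Age"] > 28]

# Grouping and aggregating: average age per city.
mean_age_by_city = df.groupby("City")["Age"].mean()

# Writing CSV: with no path given, to_csv() returns the text as a string.
csv_text = df.to_csv(index=False)

print(over_28)
print(mean_age_by_city)
print(csv_text)
```

Passing a file path to `to_csv()` writes a file instead, and `pd.read_csv()` reads one back into a DataFrame.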
3. Matplotlib
Used for data visualization.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y, marker='o')
plt.title("Simple Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Key features:
- Create line, bar, scatter, and pie charts
- Highly customizable
- Often used with pandas and NumPy
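As one example of another chart type, here is a minimal bar chart sketch using Matplotlib's `subplots()` interface. The categories, counts, and the output file name `bar_chart.png` are assumptions made for the example; replace `fig.savefig(...)` with `plt.show()` to open an interactive window instead.

```python
import matplotlib.pyplot as plt

# Made-up categories and counts for illustration.
categories = ["A", "B", "C"]
counts = [5, 12, 8]

# subplots() returns a Figure and an Axes to draw on.
fig, ax = plt.subplots()
ax.bar(categories, counts, color="steelblue")
ax.set_title("Simple Bar Chart")
ax.set_xlabel("Category")
ax.set_ylabel("Count")

# Save the chart to an image file (hypothetical file name).
fig.savefig("bar_chart.png")
```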
Summary
| Concept | Description |
|---|---|
| Python | One of the most popular programming languages for data science |
| Data Types | int, float, str, bool, list, dict |
| Core Concepts | Loops, functions, variables |
| Tools | Jupyter Notebook (interactive), VS Code (editor) |
| Libraries | NumPy (math), pandas (data analysis), Matplotlib (visualization) |
With these foundations, you're ready to start working with real datasets and exploring the world of Data Science!