Pandas DataFrames for Real‑World Data Analysis

Pandas is the standard Python library for working with tabular data. It builds on NumPy and gives you high-level tools to load, clean, transform and summarize datasets.

Series & DataFrame Basics

A Series is a one‑dimensional labeled array, while a DataFrame is a two‑dimensional table with labeled rows and columns. You can think of a DataFrame as a collection of Series that share the same index.

Under the hood, pandas stores data in column-oriented blocks, which makes column-wise operations like aggregations and filtering efficient. A well-chosen index (for example, a date or an ID) speeds up label lookups, and pandas uses it to align rows from different tables when you join or concatenate them.

import pandas as pd

data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
}

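# Each dictionary key becomes a column; rows get the default index 0, 1, 2
df = pd.DataFrame(data)

print(df)             # the full table
print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for the numeric columns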
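
The idea that a DataFrame is a collection of Series sharing one index, and that pandas aligns data by index label, can be sketched as follows. This is a minimal illustration, not part of the original example: the employee IDs "e1" to "e3" and the raise_pct column are hypothetical names chosen for the demonstration.

```python
import pandas as pd

# Hypothetical employee IDs used as the index; these labels are illustrative only.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["e1", "e2", "e3"],
)

# Each column is a Series that carries the DataFrame's index.
age = df["age"]
same_index = age.index.equals(df.index)

# A Series with its labels in a different order, and with "e2" missing.
raise_pct = pd.Series({"e3": 0.10, "e1": 0.05})

# Assignment aligns on index labels, not on position;
# "e2" has no matching label, so it becomes NaN.
df["raise_pct"] = raise_pct

print(same_index)
print(df)
```

Because alignment follows labels, the order of raise_pct does not matter, and rows without a match are filled with NaN rather than silently shifted.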

Indexing, Filtering & Sorting

Pandas supports multiple indexing methods: .loc (label‑based), .iloc (position‑based) and Boolean indexing. Choosing the right one makes your code clearer and less error‑prone.

# Boolean filter
high_earners = df[df["salary"] >= 60000]

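# loc: label-based
# Selects the row whose index label is 1 (Bob); it is the second row only
# because the default index is 0, 1, 2
row_bob = df.loc[1]
subset = df.loc[:, ["name", "age"]]   # all rows, two named columns

# iloc: position-based
first_two_rows = df.iloc[0:2, :]      # positions 0 and 1; the end position is excluded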

print(high_earners)
print(subset)
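
Sorting is a useful place to see the difference between labels and positions. The sketch below rebuilds the small employee table from earlier, sorts it with sort_values, and then reads the "same" row through .iloc and .loc. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
})

# Sort by salary, highest first; the original index labels travel with their rows.
by_salary = df.sort_values("salary", ascending=False)

top_by_position = by_salary.iloc[0]   # first row after sorting
row_label_zero = by_salary.loc[0]     # row whose label is 0, wherever it now sits

print(by_salary)
print(top_by_position["name"], row_label_zero["name"])
```

After the sort, position 0 and label 0 point to different rows. If you want labels to match positions again, calling reset_index(drop=True) on the sorted frame is one option.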

GroupBy & Aggregations

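GroupBy splits the data into groups by key, applies an operation (like sum or mean) to each group and then combines the results into a new table. This split-apply-combine pattern is the basis of most reporting summaries and of per-group features in feature engineering.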

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance"],
    "salary": [60000, 65000, 45000, 47000, 70000],
    "bonus":  [5000, 7000, 3000, 3500, 9000]
})

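# Named aggregation: output_column=(input_column, function)
dept_stats = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    total_bonus=("bonus", "sum")
).reset_index()  # turn the department index back into a regular column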

print(dept_stats)
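
For feature engineering, you often want the group statistic attached to every original row instead of one row per group. One way to do this is groupby followed by transform, sketched here on a reduced version of the table above. The column names dept_avg_salary and salary_vs_dept are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance"],
    "salary": [60000, 65000, 45000, 47000, 70000],
})

# transform returns one value per original row,
# so the result can be stored directly as a new column.
df["dept_avg_salary"] = df.groupby("department")["salary"].transform("mean")

# Hypothetical feature: how far each salary sits from its department average.
df["salary_vs_dept"] = df["salary"] - df["dept_avg_salary"]

print(df)
```

Compared with agg, which returns one row per department, transform keeps the original shape, which avoids a separate merge step when building per-row features.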
