Data Wrangling Intermediate
~15 min read

Reshaping & Transforming Data with Pandas

Data wrangling is the art of turning raw tables into the exact shapes you need for analysis and modeling: long vs wide, aggregated vs detailed, joined and enriched.

Melt & Pivot

Melt converts wide data (many columns) into long format (row per observation), while pivot does the opposite. Many visualization and modeling tasks prefer long format.

Thinking in terms of tidy data, each variable should have its own column and each observation its own row. Melt and pivot are the core operations that let you move between human‑friendly report layouts and machine‑friendly tidy tables without losing information.

import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "IN", "IN"],
    "year": [2023, 2024, 2023, 2024],
    "sales_a": [10, 12, 8, 9],
    "sales_b": [5, 7, 4, 6]
})

long = df.melt(
    id_vars=["country", "year"],
    value_vars=["sales_a", "sales_b"],
    var_name="product",
    value_name="sales"
)

wide = long.pivot(
    index=["country", "year"],
    columns="product",
    values="sales"
).reset_index()

Merging Multiple Tables

pd.merge implements SQL‑style joins (inner, left, right, outer). It is essential when combining fact tables with dimension tables like users, products or time.

users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["US", "IN", "US"]
})

orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "user_id": [1, 1, 3],
    "amount": [50, 30, 80]
})

joined = pd.merge(
    orders,
    users,
    on="user_id",
    how="left"
)

print(joined)