Pandas DataFrames for Real‑World Data Analysis
Pandas is one of the most widely used Python libraries for working with tabular data. It builds on NumPy and gives you high‑level tools to load, clean, transform and summarize datasets.
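As a rough illustration of that load, clean, transform and summarize cycle, here is a minimal sketch. The CSV text is made-up sample data read through io.StringIO so the example needs no file on disk; filling the gap with the median is just one possible cleaning choice.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """name,age,salary
Alice,25,50000
Bob,,60000
Charlie,35,70000
"""

# Load: read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Clean: fill the missing age with the column median (NaN is skipped)
df["age"] = df["age"].fillna(df["age"].median())

# Transform: derive a new column from an existing one
df["salary_k"] = df["salary"] / 1000

# Summarize: a single aggregate over a column
print(df)
print(df["salary_k"].mean())  # 60.0
```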
Series & DataFrame Basics
A Series is a one‑dimensional labeled array, while a DataFrame is a two‑dimensional table with labeled rows and columns. You can think of a DataFrame as a collection of Series that share the same index.
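To make the "collection of Series" view concrete, the sketch below builds a DataFrame from two Series that share index labels and then pulls one column back out. The labels "e1" and "e2" are hypothetical employee IDs chosen for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical IDs)
names = pd.Series(["Alice", "Bob"], index=["e1", "e2"])
ages = pd.Series([25, 30], index=["e1", "e2"])

# A DataFrame assembled from Series: each Series becomes a column
df = pd.DataFrame({"name": names, "age": ages})

# Selecting one column returns a Series that carries the shared index
age_col = df["age"]
print(type(age_col).__name__)  # Series
print(age_col.index.tolist())  # ['e1', 'e2']
print(df.shape)                # (2, 2)
```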
Under the hood, pandas stores data in column‑oriented blocks, which makes column‑wise operations like aggregations and filtering very efficient. A well‑chosen index (for example, a date or an ID) can speed up lookups and align data from different tables when you join or concatenate them.
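The following sketch shows index-based lookup and alignment, assuming two small hypothetical tables keyed by an emp_id column. The rows of the second table are deliberately out of order: join matches them by index label, not by position, and a label with no match on the right gets NaN.

```python
import pandas as pd

# Hypothetical tables, both indexed by an employee ID
salaries = pd.DataFrame(
    {"emp_id": [1, 2, 3], "salary": [50000, 60000, 70000]}
).set_index("emp_id")
bonuses = pd.DataFrame(
    {"emp_id": [3, 1], "bonus": [9000, 5000]}
).set_index("emp_id")

# Label lookup through the index
print(salaries.loc[2, "salary"])  # 60000

# join (a left join by default) aligns rows on index labels, not row order;
# emp_id 2 has no bonus row, so its bonus becomes NaN
combined = salaries.join(bonuses)
print(combined)
```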
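import pandas as pd

# Build a DataFrame from a dict: each key becomes a column name
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
}
df = pd.DataFrame(data)

print(df)             # the table itself, with a default 0..2 row index
print(df.dtypes)      # the data type of each column
print(df.describe())  # summary statistics for the numeric columns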
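Note that describe() skips the text column by default. The short sketch below rebuilds the same sample table and compares the default summary with include="all", which also reports count, unique, top and freq for non-numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
})

# By default describe() summarizes only the numeric columns
numeric_summary = df.describe()
print(numeric_summary.columns.tolist())  # ['age', 'salary']

# include="all" adds non-numeric columns with count/unique/top/freq rows
full_summary = df.describe(include="all")
print(full_summary)
```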
Indexing, Filtering & Sorting
Pandas supports multiple indexing methods: .loc (label‑based), .iloc (position‑based) and Boolean indexing. Choosing the right one makes your code clearer and less error‑prone.
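# Boolean filter: keep the rows where the condition is True
high_earners = df[df["salary"] >= 60000]
# loc: label-based (label 1 is Bob only because df has the default 0..n-1 index)
row_bob = df.loc[1]
subset = df.loc[:, ["name", "age"]]  # all rows, two columns selected by name
# iloc: position-based (the end position is excluded, so this is rows 0 and 1)
first_two_rows = df.iloc[0:2, :]
print(high_earners)
print(subset)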
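The difference between .loc and .iloc only becomes visible when the row labels are not 0, 1, 2, .... The sketch below assumes hypothetical labels 10, 20 and 30 to show that label slices include their end point while position slices do not. It also covers sorting with sort_values, which the heading lists but the example above does not show.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "salary": [50000, 60000, 70000]},
    index=[10, 20, 30],  # hypothetical non-default integer labels
)

# loc uses labels, and a label slice includes the end label
print(df.loc[10:20, "name"].tolist())  # ['Alice', 'Bob']

# iloc uses positions, and a position slice excludes the end position
print(df.iloc[0:1]["name"].tolist())  # ['Alice']

# With these labels df.loc[1] would raise KeyError, while iloc[1] is Bob
print(df.iloc[1]["name"])  # Bob

# A Boolean mask inside loc filters rows and picks a column in one step
print(df.loc[df["salary"] >= 60000, "name"].tolist())  # ['Bob', 'Charlie']

# Sorting: sort_values returns a new DataFrame ordered by a column
by_salary = df.sort_values("salary", ascending=False)
print(by_salary["name"].tolist())  # ['Charlie', 'Bob', 'Alice']
```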
GroupBy & Aggregations
GroupBy splits the data into groups, applies an operation (like sum or mean) and then combines the results. This is crucial for reporting and feature engineering.
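# A new example DataFrame (this replaces the earlier df)
df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance"],
    "salary": [60000, 65000, 45000, 47000, 70000],
    "bonus": [5000, 7000, 3000, 3500, 9000],
})
# Named aggregation: output_name=(input_column, aggregation_function)
dept_stats = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    total_bonus=("bonus", "sum"),
).reset_index()  # turn the group keys back into a regular column
print(dept_stats)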
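agg() collapses each group to a single row, which suits reporting. For feature engineering you often want the group statistic attached to every original row instead. A minimal sketch of that with transform() follows; the column names dept_avg_salary and salary_vs_dept are illustrative choices, not standard names.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance"],
    "salary": [60000, 65000, 45000, 47000, 70000],
})

# Split: inspect which row labels fall into each group
print(df.groupby("department").groups)

# transform returns one value per original row (not one per group),
# so the group mean can be stored directly as a new column
df["dept_avg_salary"] = df.groupby("department")["salary"].transform("mean")

# A derived feature: how far each salary sits from its department mean
df["salary_vs_dept"] = df["salary"] - df["dept_avg_salary"]
print(df)
```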