Descriptive Statistics for Exploratory Data Analysis (EDA)
Descriptive statistics summarize a dataset with a few numbers. They are typically the first tools to apply in a data science project to understand your data.
Measures of Central Tendency
Central tendency measures tell you where the “center” of the data lies.
- Mean: arithmetic average, sensitive to outliers.
- Median: middle value, robust to outliers.
- Mode: most frequent value.
import numpy as np
from scipy import stats
data = np.array([10, 12, 13, 13, 14, 100]) # 100 is an outlier
mean = data.mean() # pulled upward by the outlier
median = np.median(data) # middle of the sorted values
# keepdims requires SciPy 1.9 or newer; .mode[0] extracts the most frequent value
mode = stats.mode(data, keepdims=True).mode[0]
print("Data:", data)
print("Mean :", mean)
print("Median:", median)
print("Mode :", mode)
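To see why the mean is described as sensitive to outliers and the median as robust, a minimal sketch can compare both statistics with and without the extreme value. It uses Python's standard-library statistics module instead of NumPy, and the variable names are illustrative only.

```python
import statistics

# Same values as the example above; 100 is the outlier
with_outlier = [10, 12, 13, 13, 14, 100]
without_outlier = [x for x in with_outlier if x != 100]

mean_with = statistics.mean(with_outlier)
mean_without = statistics.mean(without_outlier)
median_with = statistics.median(with_outlier)
median_without = statistics.median(without_outlier)

print("Mean   with / without outlier:", mean_with, mean_without)
print("Median with / without outlier:", median_with, median_without)
```

Dropping the single outlier moves the mean from 27 to 12.4, while the median stays at 13.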
Measures of Spread (Dispersion)
Spread tells you how variable your data is. Two datasets can have the same mean with very different spreads.
- Range: max − min.
- Variance & Standard Deviation: variance is the average squared deviation from the mean; standard deviation is its square root, in the same units as the data.
- Percentiles & IQR: robust spread measures (IQR = Q3 − Q1).
import numpy as np
data = np.array([10, 12, 13, 13, 14, 100])
data_min, data_max = data.min(), data.max()
data_range = data_max - data_min
variance = np.var(data, ddof=1) # sample variance
std_dev = np.std(data, ddof=1) # sample standard deviation
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print("Range :", data_range)
print("Variance :", round(variance, 2))
print("Std Dev :", round(std_dev, 2))
print("Q1, Q3, IQR:", q1, q3, iqr)
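The point that two datasets can share a mean yet differ widely in spread can be checked with a small sketch. The two lists are made-up illustrative values, summarize is a hypothetical helper, and the code uses the standard-library statistics module rather than NumPy. statistics.stdev computes the sample standard deviation, the same quantity as np.std with ddof=1.

```python
import statistics

def summarize(values):
    """Return the mean, sample standard deviation, and range of a list."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),       # sample std (divides by n - 1)
        "range": max(values) - min(values),
    }

# Illustrative data: both lists are centered on 50
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in [("narrow", narrow), ("wide", wide)]:
    s = summarize(values)
    print(f"{name}: mean={s['mean']}, std={s['std']:.2f}, range={s['range']}")
```

Both lists have a mean of 50, but the standard deviation and range of the second are far larger, which is why a measure of center alone is not enough to describe a dataset.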
Quick Summary with pandas.describe()
In real projects you rarely compute each statistic by hand. Instead, you use pandas.DataFrame.describe() to get a quick overview: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.
import pandas as pd
df = pd.DataFrame({
"age": [23, 25, 31, 40, 29, 37, 45],
"salary": [35000, 42000, 50000, 70000, 48000, 65000, 90000]
})
print(df.describe())
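The table returned by describe() is itself a DataFrame, so individual statistics can be read from it by row label. The sketch below rebuilds the same small DataFrame and derives the median and IQR from the summary; the variable names summary, medians, and iqr are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [23, 25, 31, 40, 29, 37, 45],
    "salary": [35000, 42000, 50000, 70000, 48000, 65000, 90000],
})

summary = df.describe()

# The "50%" row holds the median of each numeric column
medians = summary.loc["50%"]

# IQR = Q3 - Q1, taken from the "75%" and "25%" rows
iqr = summary.loc["75%"] - summary.loc["25%"]

print(medians)
print(iqr)
```

This ties the summary table back to the earlier measures: the "50%" row is the median, and the difference between the "75%" and "25%" rows is the IQR.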