Given a data set, the three most common measures of center are the mean, median, and mode.
| Measure | Definition | When to prefer |
|---|---|---|
| Mean (x̄) | Sum of all values ÷ count | Symmetric data, no extreme outliers |
| Median | Middle value when sorted | Skewed data or outliers present |
| Mode | Most frequent value | Categorical data or multimodal distributions |
Data set: 4, 7, 7, 9, 12, 15, 15, 15, 20 (n = 9)
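A minimal sketch that computes all three measures for this data set with Python's built-in `statistics` module. Note that `statistics.mode` returns a single value; this data set has only one mode, so that is not an issue here.

```python
import statistics

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]

mean = statistics.mean(data)      # 104 / 9
median = statistics.median(data)  # n = 9 is odd, so the 5th sorted value
mode = statistics.mode(data)      # the most frequent value

print(f"mean={mean:.3f}, median={median}, mode={mode}")
# mean=11.556, median=12, mode=15
```

The mean is pulled above the median by the cluster of 15s and the 20.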
Center alone doesn't describe a distribution. Two data sets can have the same mean but very different variability.
Range = Maximum − Minimum. Simple but sensitive to outliers.
The variance (s²) measures the average squared deviation from the mean. The standard deviation (s) is its square root, expressed in the same units as the data.
Data: 2, 4, 4, 4, 5, 5, 7, 9 (n = 8, x̄ = 5)
Population vs. sample: Divide by N for a population variance (σ²) and by n−1 for a sample variance (s²). Dividing by n−1 instead of n (Bessel's correction) makes s² an unbiased estimator of σ².
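The arithmetic for the data 2, 4, 4, 4, 5, 5, 7, 9 can be sketched as follows. Both conventions start from the same sum of squared deviations (32) and differ only in the divisor. The cross-check against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n−1) is included to show that the standard library follows the same two conventions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                       # 5.0

ss = sum((x - mean) ** 2 for x in data)    # sum of squared deviations: 32.0

pop_var = ss / n             # divide by N:   sigma^2
samp_var = ss / (n - 1)      # divide by n-1: s^2
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# The standard library uses the same two conventions.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))

print(pop_var, pop_sd)                        # 4.0 2.0
print(round(samp_var, 3), round(samp_sd, 3))  # 4.571 2.138
```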
IQR = Q3 − Q1. It measures the spread of the middle 50% of the data and is robust to outliers.
An outlier is commonly defined as a value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
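A sketch of the 1.5×IQR rule. Quartile conventions vary between textbooks and software. This example assumes the `"inclusive"` method of `statistics.quantiles`, so hand calculations that use a different convention (for example, medians of the lower and upper halves) can give slightly different quartiles. The data are the variance example above with a hypothetical extra value of 30 added.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (Q1, Q3, lower fence, upper fence) for the k*IQR outlier rule."""
    # Assumption: "inclusive" quartile convention; needs at least 2 values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr

def outliers(values, k=1.5):
    """Values strictly below the lower fence or strictly above the upper fence."""
    _, _, low, high = iqr_fences(values, k)
    return [x for x in values if x < low or x > high]

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical: variance-example data plus a 30
print(iqr_fences(data))  # (4.0, 7.0, -0.5, 11.5)
print(outliers(data))    # [30]
```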
Data: 3, 3, 5, 8, 10, 12, 85. Find the mean and median. Which is a better measure of center here?
Data: 1, 3, 4, 5, 6, 7, 9, 40. Find Q1, Q3, IQR. Is 40 an outlier?
A data set is: 5, 8, 8, 10, 14. What is the median?
Mean = (5+8+8+10+14)/5 = 45/5 = 9. But the median is the middle value when sorted: 5, 8, 8, 10, 14 → 8.
With 5 data points, the median is the 3rd value: 5, 8, 8, 10, 14 → 8.
Averaging adjacent middle values applies when n is even. For n=5 (odd), the median is exactly the 3rd value: 8.
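The odd/even rule for the median can be written as a short function. The appended value of 20 is hypothetical and only illustrates the even-n case.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2   # even n: average the two middle values

data = [5, 8, 8, 10, 14]
print(sum(data) / len(data))  # 9.0 (the mean, a different statistic)
print(median(data))           # 8
print(median(data + [20]))    # even n: (8 + 10) / 2 = 9.0
```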
Standard deviation is calculated using n−1 for a sample (instead of n) because:
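n−1 (Bessel's correction) corrects a downward bias. Deviations are measured from the sample mean x̄ rather than the unknown population mean μ, and x̄ is the value that minimizes the sum of squared deviations, so dividing that sum by n would systematically underestimate σ². Dividing by n−1 makes s² an unbiased estimator of σ².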
IQR and robust statistics handle outliers. Dividing by n−1 is specifically about unbiased variance estimation.
The data does not need to sum to zero. The n−1 denominator corrects for the fact that dividing by n would tend to underestimate the population variance.
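Bessel's correction can be checked exactly on a toy example. The sketch below treats the variance-example data (2, 4, 4, 4, 5, 5, 7, 9) as a whole population with σ² = 4. It then enumerates every possible sample of size 3 drawn with replacement and averages both variance formulas over all of them using exact fractions. The sample size of 3 and sampling with replacement are assumptions. They keep the enumeration small and the draws independent.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (divide by N)

n = 3  # assumed sample size; draws are independent (with replacement)
samples = list(product(population, repeat=n))  # every equally likely ordered sample

def ss(sample):
    """Sum of squared deviations from the *sample* mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

avg_div_n = sum(ss(s) / n for s in samples) / len(samples)
avg_div_n1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n1)  # 4 8/3 4
```

On average, dividing by n gives 8/3, which equals (n−1)/n × σ², while dividing by n−1 recovers σ² = 4 exactly.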
The IQR is preferred over the range for measuring spread when:
A normal distribution is symmetric and produces few outliers, so the range is often adequate. The IQR is most useful when the distribution is skewed or has outliers.
Sample size affects precision, not the choice of IQR vs. range. Outliers and skewness drive the preference for IQR.
Repeated values matter for the mode. The choice between IQR and range is about sensitivity to outliers.
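A small comparison of how one extreme value affects the range versus the IQR. The value 90 is hypothetical, and the quartiles again assume the `"inclusive"` method of `statistics.quantiles`.

```python
import statistics

def spread(values):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]  # hypothetical: the 9 replaced by 90

print(spread(clean))         # (7, 1.5)
print(spread(with_outlier))  # (88, 1.5)
```

The range jumps from 7 to 88, while the IQR is unchanged because the outlier lies outside the middle 50%.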
If you add a constant c to every value in a data set, which statistic does NOT change?
Adding c to every value gives new mean = old mean + c, so the mean does change.
Adding c shifts every value, including the middle value. Median = old median + c.
The most frequent value increases by c, so mode = old mode + c. Only spread measures (range, IQR, variance, standard deviation) remain unchanged.
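A quick check of the shift rule. The constant c = 100 is an arbitrary, hypothetical choice. Each statistic is computed before and after the shift, and the sketch records how much it changed.

```python
import statistics

def data_range(values):
    return max(values) - min(values)

data = [5, 8, 8, 10, 14]
c = 100                          # hypothetical constant
shifted = [x + c for x in data]

stats = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "range": data_range,
    "stdev": statistics.stdev,
}
changes = {name: f(shifted) - f(data) for name, f in stats.items()}
print(changes)
# Centers change by c; spreads change by 0.
```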
Which measure of center is most affected by extreme outliers?
The median depends only on the order of the values, not their magnitudes. An extreme outlier sits at one end of the sorted list, so it moves the middle value little, if at all.
An outlier typically appears only once, so it rarely becomes the mode. The mode is resistant to outliers.
The IQR is a measure of spread, not center. It is the most robust of the common spread measures because it ignores the lowest and highest 25% of the values.
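A sketch comparing the three measures of center when one value becomes extreme. The replacement value 140 is hypothetical.

```python
import statistics

data = [5, 8, 8, 10, 14]
extreme = [5, 8, 8, 10, 140]  # hypothetical: the 14 replaced by an extreme 140

for f in (statistics.mean, statistics.median, statistics.mode):
    print(f.__name__, f(data), f(extreme))
# mean 9 34.2
# median 8 8
# mode 8 8
```

Only the mean moves, because it is the only one of the three that uses every value's magnitude.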