MATH 201: Statistics & Probability › Lesson 1 of 10

Descriptive Statistics: Mean, Median, Mode, Spread

Lesson 1 · OKSTEM College · AS Computer Science

Measures of Center

Given a data set, the three most common measures of center are the mean, median, and mode.

Measure | Definition | When to prefer
Mean (x̄) | Sum of all values ÷ count | Symmetric data, no extreme outliers
Median | Middle value when sorted | Skewed data or outliers present
Mode | Most frequent value | Categorical data or multimodal distributions

Worked Example — Find Mean, Median, Mode

Data set: 4, 7, 7, 9, 12, 15, 15, 15, 20 (n = 9)

Mean: Sum = 4+7+7+9+12+15+15+15+20 = 104. Mean = 104 ÷ 9 ≈ 11.56
Median: n=9 (odd) → middle is the 5th value. Sorted: 4,7,7,9,12,15,15,15,20 → Median = 12
Mode: 15 appears 3 times (most frequent) → Mode = 15
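The three results above can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the lesson.

```python
import statistics

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]

mean = statistics.mean(data)      # sum of the values divided by the count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(round(mean, 2), median, mode)  # 11.56 12 15
```

If a data set has several equally frequent values, `statistics.multimode` returns all of them instead of just one.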

Measures of Spread

Center alone doesn't describe a distribution. Two data sets can have the same mean but very different variability.
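As a quick illustration, the sketch below compares two small data sets that share a mean but differ in spread. Both data sets are hypothetical and were chosen only to make the arithmetic easy.

```python
import statistics

# Hypothetical data sets for illustration; both have a mean of 10.
tight = [9, 10, 11]
wide = [0, 10, 20]

tight_mean = statistics.mean(tight)
wide_mean = statistics.mean(wide)

# Sample standard deviations differ by a factor of ten.
tight_sd = statistics.stdev(tight)
wide_sd = statistics.stdev(wide)

print(tight_mean, wide_mean)  # same center
print(tight_sd, wide_sd)      # very different spread
```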

Range

Range = Maximum − Minimum. Simple but sensitive to outliers.
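The sensitivity to outliers is easy to see in a short sketch. It reuses the data set from the first worked example and appends one extreme value; the value 100 and the helper name `value_range` are assumptions made for illustration.

```python
def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]
with_outlier = data + [100]  # 100 is a hypothetical extreme value

range_before = value_range(data)
range_after = value_range(with_outlier)

print(range_before, range_after)  # 16 96
```

A single added value multiplies the range by six, even though the other nine values are unchanged.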

Variance and Standard Deviation

The variance (s²) measures the average squared deviation from the mean. The standard deviation (s) is its square root, so it is expressed in the same units as the data.

Worked Example — Standard Deviation

Data: 2, 4, 4, 4, 5, 5, 7, 9 (n = 8, x̄ = 5)

Squared deviations from the mean: (2−5)²=9, (4−5)²=1, (4−5)²=1, (4−5)²=1, (5−5)²=0, (5−5)²=0, (7−5)²=4, (9−5)²=16
Sum of squared deviations: 9+1+1+1+0+0+4+16 = 32
Sample variance: s² = 32 ÷ (8−1) = 32 ÷ 7 ≈ 4.57 (divide by n−1 for sample)
Standard deviation: s = √4.57 ≈ 2.14
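The steps of this worked example can be sketched in Python as follows. The variable names are illustrative, and the final line simply cross-checks the hand calculation against the standard library.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                            # 5.0
squared_devs = [(x - mean) ** 2 for x in data]  # squared deviation of each value
ss = sum(squared_devs)                          # sum of squared deviations
sample_var = ss / (n - 1)                       # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_var)               # back to the units of the data

# Cross-check against the standard library's sample variance.
library_var = statistics.variance(data)

print(ss, round(sample_var, 2), round(sample_sd, 2))
```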

Population vs. Sample: Divide by N for a population (σ²), by n−1 for a sample (s²). The −1 correction (Bessel's correction) makes s² an unbiased estimator of the population variance.
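Python's `statistics` module exposes both conventions, which makes the difference easy to see. The sketch below treats the worked-example data first as a full population and then as a sample; which interpretation applies depends on where the data came from.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as an entire population: divide by N.
pop_var = statistics.pvariance(data)  # 32 / 8
pop_sd = statistics.pstdev(data)

# Treating the data as a sample: divide by n - 1.
samp_var = statistics.variance(data)  # 32 / 7
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd)
print(round(samp_var, 2), round(samp_sd, 2))
```

The sample versions are always slightly larger, and the gap shrinks as n grows.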

Interquartile Range (IQR)

IQR = Q3 − Q1. It measures the spread of the middle 50% of the data and is robust to outliers.

An outlier is commonly defined as a value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
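The 1.5×IQR rule can be sketched as below. The helper names `quartiles` and `outliers` are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data. One assumption to note: when n is odd, this sketch leaves the overall median out of both halves, and other conventions exist.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: for odd n, the overall median is excluded from both halves.
    Needs at least two values.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[-half:]
    return statistics.median(lower), statistics.median(upper)

def outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]
q1, q3 = quartiles(data)
flagged = outliers(data)

print(q1, q3, flagged)
```

For this data set the fences are −5 and 27, so no value is flagged.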

Practice Problems

Problem 1 — Mean & Median

Data: 3, 3, 5, 8, 10, 12, 85. Find the mean and median. Which is a better measure of center here?

Step 1: Sum = 3+3+5+8+10+12+85 = ?
Step 2: Mean = sum ÷ 7
Step 3: Median = middle value (4th of 7, already sorted)
Answer: Sum = 126. Mean = 18, Median = 8. The value 85 is an outlier that pulls the mean up. The median (8) is a better center here.
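To see how much the single value 85 matters, the sketch below recomputes both measures with and without it. The variable names are illustrative.

```python
import statistics

data = [3, 3, 5, 8, 10, 12, 85]
without_85 = [x for x in data if x != 85]

mean_all = statistics.mean(data)
median_all = statistics.median(data)

mean_trimmed = statistics.mean(without_85)
median_trimmed = statistics.median(without_85)

print(mean_all, median_all)                    # 18 8
print(round(mean_trimmed, 2), median_trimmed)  # 6.83 6.5
```

Dropping one value moves the mean by more than 11 but the median by only 1.5.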

Problem 2 — IQR & Outlier Detection

Data: 1, 3, 4, 5, 6, 7, 9, 40. Find Q1, Q3, IQR. Is 40 an outlier?

Step 1: Q1 = median of lower half (1,3,4,5) = (3+4)÷2
Step 2: Q3 = median of upper half (6,7,9,40) = (7+9)÷2
Step 3: IQR = Q3 − Q1. Upper fence = Q3 + 1.5×IQR
Answer: Q1=3.5, Q3=8, IQR=4.5. Upper fence = 8 + 1.5×4.5 = 8 + 6.75 = 14.75. Since 40 > 14.75, 40 is an outlier.
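Software may not reproduce Q1 = 3.5 and Q3 = 8 exactly, because several quartile conventions are in use. The sketch below compares the median-of-halves method used in this problem with the two interpolation methods offered by Python's `statistics.quantiles`. The helper `upper_fence` is a hypothetical name.

```python
import statistics

data = [1, 3, 4, 5, 6, 7, 9, 40]  # already sorted

def upper_fence(q1, q3):
    """Upper outlier fence: Q3 + 1.5 * IQR."""
    return q3 + 1.5 * (q3 - q1)

# Method used in this problem: medians of the lower and upper halves.
half = len(data) // 2
q1_halves = statistics.median(data[:half])
q3_halves = statistics.median(data[half:])

# Library methods: interpolate between neighbouring sorted values.
q1_exc, _, q3_exc = statistics.quantiles(data, n=4)  # default method="exclusive"
q1_inc, _, q3_inc = statistics.quantiles(data, n=4, method="inclusive")

fences = {
    "halves": upper_fence(q1_halves, q3_halves),
    "exclusive": upper_fence(q1_exc, q3_exc),
    "inclusive": upper_fence(q1_inc, q3_inc),
}
print(fences)
```

The quartiles and fences differ slightly between methods, but 40 lies above every one of the fences, so the conclusion does not change.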

Knowledge Check

A data set is: 5, 8, 8, 10, 14. What is the median?

Correct — the middle (3rd) value of 5 sorted values is 8.
9 is the mean, not the median. The median is the positional middle.
10 is the 4th value, not the 3rd (middle) for n=5.
8.5 would be the average of the two middle values, but n=5 (odd) has a single middle value.

📖 Quick Recap

Mean = (5+8+8+10+14)/5 = 45/5 = 9. But the median is the middle value when sorted: 5, 8, 8, 10, 14 → 8.

With 5 data points, the median is the 3rd value: 5, 8, 8, 10, 14 → 8.

Averaging adjacent middle values applies when n is even. For n=5 (odd), the median is exactly the 3rd value: 8.
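The odd and even cases can be written out explicitly. This is a minimal sketch of the rule, not a replacement for `statistics.median`; the four-value data set is hypothetical and included only to show the even case.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]                # single middle value
    return (s[mid - 1] + s[mid]) / 2  # average of the two middle values

odd = [5, 8, 8, 10, 14]
even = [5, 8, 10, 14]  # hypothetical data set with an even count

median_odd = median(odd)
median_even = median(even)

print(median_odd, median_even)  # 8 9.0
```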

Standard deviation is calculated using n−1 for a sample (instead of n) because:

Simplicity isn't the reason — n−1 corrects a statistical bias.
Correct — dividing by n−1 is Bessel's correction to remove bias.
Bessel's correction is not specifically about outliers.
The data sum is unrelated to the n−1 correction.

📖 Quick Recap

n−1 (Bessel's correction) corrects a downward bias. Deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does, so dividing by n underestimates the population variance on average. Dividing by n−1 gives an unbiased estimator of the variance.

IQR and robust statistics handle outliers. Dividing by n−1 is specifically about unbiased variance estimation.

The data does not need to sum to zero. n−1 in the denominator adjusts for the fact that a sample underestimates population variance.
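The bias can be checked exactly on a small case instead of by simulation. The sketch below treats the eight values from the standard deviation worked example as a complete population, enumerates every possible sample of size 2 drawn with replacement, and averages the two variance estimates. The setup (sample size 2, sampling with replacement) is an assumption chosen to keep the enumeration small.

```python
import itertools
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma2 = statistics.pvariance(population)  # true population variance

# Every ordered pair of values: all samples of size 2 drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average estimate when dividing by n (biased) and by n - 1 (Bessel's correction).
avg_div_n = sum(statistics.pvariance(s) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```

Dividing by n averages to half the true variance here, which matches the factor (n−1)/n for n = 2, while dividing by n−1 averages to the true variance.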

The IQR is preferred over the range for measuring spread when:

IQR works for any distribution, but its main advantage is outlier robustness.
Small sample size alone doesn't determine which spread measure to use.
Correct — IQR uses only the middle 50%, making it resistant to extreme values.
Repeated values relate to mode, not the IQR vs. range choice.

📖 Quick Recap

The normal distribution is symmetric and has few outliers, so the range is often adequate. IQR truly shines when the distribution is skewed or has outliers.

Sample size affects precision, not the choice of IQR vs. range. Outliers and skewness drive the preference for IQR.

Repeats matter for mode. IQR vs. range is about outlier sensitivity.
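A short sketch makes the difference in outlier sensitivity concrete. It takes the data from Problem 2 and a copy in which 40 is replaced by 10; the replacement value and the helper name `spread` are assumptions for illustration. Quartiles here come from `statistics.quantiles` with `method="inclusive"`, so the IQR differs from the median-of-halves value used in Problem 2.

```python
import statistics

def spread(values):
    """Return (range, IQR) using the library's inclusive quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

with_outlier = [1, 3, 4, 5, 6, 7, 9, 40]
without_outlier = [1, 3, 4, 5, 6, 7, 9, 10]  # 40 replaced by a hypothetical 10

range_a, iqr_a = spread(with_outlier)
range_b, iqr_b = spread(without_outlier)

print(range_a, iqr_a)  # 39 3.75
print(range_b, iqr_b)  # 9 3.75
```

The range changes from 39 to 9, while the IQR stays the same because the extreme value never enters the quartile calculation.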

If you add a constant c to every value in a data set, which statistic does NOT change?

The mean shifts by c.
The median also shifts by c.
Correct — adding a constant shifts all values equally; deviations from the new mean are unchanged.
Mode also shifts by c.

📖 Quick Recap

Adding c to every value: new mean = old mean + c. Mean does change.

Adding c shifts every value, including the middle value. Median = old median + c.

The most frequent value increases by c, so mode = old mode + c. Only spread measures remain unchanged.
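The effect of adding a constant can be verified directly. The sketch below shifts the data from the standard deviation worked example by c = 10; the constant and the helper name `summary` are hypothetical choices for illustration.

```python
import statistics

def summary(values):
    """Collect the measures of center and spread discussed in this lesson."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]
c = 10  # hypothetical constant added to every value
shifted = [x + c for x in data]

before = summary(data)
after = summary(shifted)

for name in before:
    print(name, before[name], after[name])
```

Mean, median, and mode each increase by c, while the standard deviation and range are the same before and after.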

Which measure of center is most affected by extreme outliers?

Median is resistant to outliers since it only depends on position.
Correct — the mean uses every value, so one large outlier can distort it greatly.
Mode uses frequency; a single extreme outlier rarely matches the most common value.
IQR uses only Q1 and Q3, ignoring extreme tails, and it measures spread rather than center.

📖 Quick Recap

The median depends on rank, not magnitude. An extreme outlier sits at one end of the sorted list and shifts the middle value little, if at all.

An outlier typically appears once and doesn't become the mode. Mode is resistant to outliers.

IQR is a robust measure of spread, not a measure of center. It ignores the lowest 25% and the highest 25% of the values.
