Descriptive Statistics: Mean, Median, Mode, Spread

Lesson 1 · OKSTEM College · AS Computer Science

Measures of Center

Given a data set, the three most common measures of center are the mean, median, and mode.

Measure	Definition	When to prefer
Mean (x̄)	Sum of all values ÷ count	Symmetric data, no extreme outliers
Median	Middle value when sorted	Skewed data or outliers present
Mode	Most frequent value	Categorical data or multimodal distributions

Worked Example: Find Mean, Median, Mode

Data set: 4, 7, 7, 9, 12, 15, 15, 15, 20 (n = 9)

Mean: Sum = 4+7+7+9+12+15+15+15+20 = 104. Mean = 104 ÷ 9 ≈ 11.56

Median: n=9 (odd) → middle is the 5th value. Sorted: 4,7,7,9,12,15,15,15,20 → Median = 12

Mode: 15 appears 3 times (most frequent) → Mode = 15
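
The same three results can be checked in code. The snippet below is a minimal sketch that assumes Python 3.8 or later and uses the standard-library statistics module; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]

center_mean = mean(data)        # sum of values divided by count: 104 / 9
center_median = median(data)    # middle (5th) value of the sorted data
center_modes = multimode(data)  # list of every most-frequent value

print(round(center_mean, 2))  # 11.56
print(center_median)          # 12
print(center_modes)           # [15]
```

multimode is used here instead of mode because it returns a list, so a data set with two or more equally frequent values would show all of them.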

Measures of Spread

Center alone doesn't describe a distribution. Two data sets can have the same mean but very different variability.

Range

Range = Maximum − Minimum. Simple but sensitive to outliers.
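
To make this concrete, here is a small sketch with two made-up data sets (tight and wide are hypothetical, chosen only for illustration) that share the same mean but have very different ranges. The helper name value_range is also an assumption of this example.

```python
from statistics import mean

def value_range(values):
    """Return the maximum minus the minimum."""
    return max(values) - min(values)

tight = [9, 10, 11]   # hypothetical data, clustered near 10
wide = [0, 10, 20]    # hypothetical data, same mean but more spread out

print(mean(tight), mean(wide))                # both means are 10
print(value_range(tight), value_range(wide))  # 2 20

# A single extreme value is enough to inflate the range.
with_outlier = tight + [100]
print(value_range(with_outlier))              # 91
```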

Variance and Standard Deviation

The variance (s²) measures the average squared deviation from the mean. The standard deviation (s) is its square root, in the same units as the data.

Worked Example: Standard Deviation

Data: 2, 4, 4, 4, 5, 5, 7, 9 (n = 8, x̄ = 5)

Square each deviation from the mean: (2−5)²=9, (4−5)²=1, (4−5)²=1, (4−5)²=1, (5−5)²=0, (5−5)²=0, (7−5)²=4, (9−5)²=16

Sum of squared deviations: 9+1+1+1+0+0+4+16 = 32

Sample variance: s² = 32 ÷ (8−1) = 32 ÷ 7 ≈ 4.57 (divide by n−1 for a sample)

Standard deviation: s = √4.57 ≈ 2.14
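
The steps of this worked example can be sketched in Python as follows. The variable names are illustrative, and the final lines cross-check the hand calculation against the standard-library functions statistics.variance and statistics.stdev, which both divide by n−1.

```python
from math import isclose, sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

x_bar = sum(data) / n                            # mean: 40 / 8 = 5.0
squared_devs = [(x - x_bar) ** 2 for x in data]  # 9, 1, 1, 1, 0, 0, 4, 16
sum_sq = sum(squared_devs)                       # 32.0

sample_var = sum_sq / (n - 1)                    # divide by n - 1 for a sample
sample_sd = sqrt(sample_var)

print(round(sample_var, 2))  # 4.57
print(round(sample_sd, 2))   # 2.14

# Cross-check against the library versions of the same formulas.
print(isclose(sample_var, variance(data)))  # True
print(isclose(sample_sd, stdev(data)))      # True
```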

Population vs. Sample: Divide by N for a population (σ²), by n−1 for a sample (s²). The −1 correction (Bessel's correction) removes the downward bias that dividing by n would introduce into the variance estimate.
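
One way to see the bias is to enumerate every possible sample from a tiny population and average the two estimates. The sketch below assumes a hypothetical population of 1, 2, 3 and samples of size 2 drawn with replacement; Fraction is used so the averages are exact rather than rounded.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical population, chosen small enough to enumerate every sample.
population = [Fraction(1), Fraction(2), Fraction(3)]
sigma_sq = pvariance(population)   # population variance, divides by N

# All 9 ordered samples of size 2, drawn with replacement.
samples = [list(s) for s in product(population, repeat=2)]

avg_divide_by_n_minus_1 = mean([variance(s) for s in samples])  # Bessel's correction
avg_divide_by_n = mean([pvariance(s) for s in samples])         # no correction

print(sigma_sq)                 # 2/3
print(avg_divide_by_n_minus_1)  # 2/3, matches the population variance
print(avg_divide_by_n)          # 1/3, too small on average
```

Averaged over all samples, the n−1 version recovers the population variance exactly, while the divide-by-n version comes out too small. This holds for the averaged variance; an individual sample can still land above or below the true value.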

Interquartile Range (IQR)

IQR = Q3 − Q1. Measures the spread of the middle 50% of the data. Robust to outliers.

An outlier is commonly defined as a value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
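
The 1.5×IQR rule can be sketched as a pair of small functions. The names quartiles and find_outliers are hypothetical, and the quartile convention is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, with the overall median left out of both halves when n is odd. Other conventions exist and can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for odd n the overall median belongs to neither half.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # smallest half of the data
    upper = ordered[-half:]   # largest half of the data
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values lying outside the 1.5 x IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]
print(quartiles(data))             # (7.0, 15.0), so IQR = 8.0
print(find_outliers(data))         # [] because the fences are -5.0 and 27.0
print(find_outliers(data + [60]))  # [60]
```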

Practice Problems

Problem 1: Mean & Median

Data: 3, 3, 5, 8, 10, 12, 85. Find the mean and median. Which is a better measure of center here?

Step 1: Sum = 3+3+5+8+10+12+85 = ?

Step 2: Mean = sum ÷ 7

Step 3: Median = middle value (4th of 7, already sorted)

Sum = 126. Mean = 18, Median = 8. The value 85 is an outlier that pulls the mean up. The median (8) is a better center here.
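
As a quick check of this answer, the sketch below recomputes both measures and then recomputes them with the value 85 left out, to show how far a single value moves each one. The variable names are illustrative.

```python
from statistics import mean, median

data = [3, 3, 5, 8, 10, 12, 85]

print(sum(data))     # 126
print(mean(data))    # 18
print(median(data))  # 8

# Leave out the suspected outlier and compare.
without_85 = [x for x in data if x != 85]
print(round(mean(without_85), 2))  # 6.83, the mean drops by more than 11
print(median(without_85))          # 6.5, the median moves by only 1.5
```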

Problem 2: IQR & Outlier Detection

Data: 1, 3, 4, 5, 6, 7, 9, 40. Find Q1, Q3, IQR. Is 40 an outlier?

Step 1: Q1 = median of lower half (1,3,4,5) = (3+4)÷2

Step 2: Q3 = median of upper half (6,7,9,40) = (7+9)÷2

Step 3: IQR = Q3 − Q1. Upper fence = Q3 + 1.5×IQR

Q1=3.5, Q3=8, IQR=4.5. Upper fence = 8 + 1.5×4.5 = 8 + 6.75 = 14.75. Since 40 > 14.75, 40 is an outlier.
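
The three steps translate almost line for line into Python. This is a sketch that follows the median-of-halves approach used in the steps above; library routines such as statistics.quantiles use different interpolation conventions by default and may report slightly different quartiles for the same data.

```python
from statistics import median

data = sorted([1, 3, 4, 5, 6, 7, 9, 40])
half = len(data) // 2

q1 = median(data[:half])        # lower half 1, 3, 4, 5
q3 = median(data[half:])        # upper half 6, 7, 9, 40
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

print(q1, q3, iqr, upper_fence)  # 3.5 8.0 4.5 14.75
print(40 > upper_fence)          # True, so 40 is flagged as an outlier
```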

Knowledge Check

A data set is: 5, 8, 8, 10, 14. What is the median?

Correct: the middle (3rd) value of 5 sorted values is 8.

9 is the mean, not the median. The median is the positional middle.

10 is the 4th value, not the 3rd (middle) for n=5.

8.5 would be the average of the two middle values, but n=5 (odd) has a single middle value.

📖 Quick Recap

Mean = (5+8+8+10+14)/5 = 45/5 = 9. But the median is the middle value when sorted: 5, 8, 8, 10, 14 → 8.

📖 Quick Recap

With 5 data points, the median is the 3rd value: 5, 8, 8, 10, 14 → 8.

📖 Quick Recap

Averaging adjacent middle values applies when n is even. For n=5 (odd), the median is exactly the 3rd value: 8.
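
The odd and even cases can be written out explicitly. The function below is a minimal sketch (median_by_position is a hypothetical name), and the four-value list in the second call is made up to illustrate the even case.

```python
def median_by_position(values):
    """Return the median by locating the middle of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: there is exactly one middle value.
        return ordered[mid]
    # Even n: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_position([5, 8, 8, 10, 14]))  # 8
print(median_by_position([5, 8, 10, 14]))     # 9.0
```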

Standard deviation is calculated using n−1 for a sample (instead of n) because:

Simplicity isn't the reason: n−1 corrects a statistical bias.

Correct: dividing by n−1 is Bessel's correction to remove bias.

Bessel's correction is not specifically about outliers.

The data sum is unrelated to the nâˆ’1 correction.

📖 Quick Recap

n−1 (Bessel's correction) corrects a downward bias: deviations measured from the sample mean are, on average, smaller than deviations from the true population mean, so dividing by n underestimates the population variance. Dividing by n−1 gives an unbiased estimator of the variance.

📖 Quick Recap

IQR and robust statistics handle outliers. Dividing by n−1 is specifically about unbiased variance estimation.

📖 Quick Recap

The data does not need to sum to zero. n−1 in the denominator adjusts for the fact that dividing by n would underestimate the population variance.

The IQR is preferred over the range for measuring spread when:

IQR works for any distribution, but its main advantage is outlier robustness.

Small sample size alone doesn't determine which spread measure to use.

Correct: IQR uses only the middle 50%, making it resistant to extreme values.

Repeated values relate to mode, not the IQR vs. range choice.

📖 Quick Recap

The normal distribution is symmetric and has few outliers, so the range is often adequate. IQR truly shines when the distribution is skewed or has outliers.

📖 Quick Recap

Sample size affects precision, not the choice of IQR vs. range. Outliers and skewness drive the preference for IQR.

📖 Quick Recap

Repeats matter for mode. IQR vs. range is about outlier sensitivity.
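
The difference in outlier sensitivity is easy to see side by side. In this sketch the two lists are illustrative and differ only in their largest value, the helper name range_and_iqr is hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves.

```python
from statistics import median

def range_and_iqr(values):
    """Return (range, IQR) using medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    iqr = median(ordered[-half:]) - median(ordered[:half])
    return max(ordered) - min(ordered), iqr

print(range_and_iqr([1, 3, 4, 5, 6, 7, 9, 10]))  # (9, 4.5)
print(range_and_iqr([1, 3, 4, 5, 6, 7, 9, 40]))  # (39, 4.5)
```

Replacing 10 with 40 more than quadruples the range, while the IQR does not move at all.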

If you add a constant c to every value in a data set, which statistic does NOT change?

The mean shifts by c.

The median also shifts by c.

Correct: adding a constant shifts all values equally; deviations from the new mean are unchanged.

Mode also shifts by c.

📖 Quick Recap

Adding c to every value: new mean = old mean + c. Mean does change.

📖 Quick Recap

Adding c shifts every value, including the middle value. Median = old median + c.

📖 Quick Recap

The most frequent value increases by c, so mode = old mode + c. Only spread measures remain unchanged.
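
A short experiment confirms this behaviour. The sketch below assumes an arbitrary shift of c = 10 and an illustrative data set; any other constant or data set would show the same pattern.

```python
from math import isclose
from statistics import mean, median, mode, stdev

data = [4, 7, 7, 9, 12, 15, 15, 15, 20]
c = 10                            # arbitrary constant chosen for this example
shifted = [x + c for x in data]

# Measures of center all move by c.
print(isclose(mean(shifted), mean(data) + c))   # True
print(median(shifted) - median(data))           # 10
print(mode(shifted) - mode(data))               # 10

# Measures of spread stay the same.
print(isclose(stdev(shifted), stdev(data)))     # True
print((max(shifted) - min(shifted)) == (max(data) - min(data)))  # True
```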

Which measure of center is most affected by extreme outliers?

Median is resistant to outliers since it only depends on position.

Correct: the mean uses every value, so one large outlier can distort it greatly.

Mode uses frequency; a single extreme outlier rarely matches the most common value.

IQR uses only Q1 and Q3, ignoring extreme tails.

📖 Quick Recap

The median uses only rank, not magnitude. An extreme outlier sits at one end of the sorted list, so it rarely changes the middle value.

📖 Quick Recap

An outlier typically appears once and doesn't become the mode. Mode is resistant to outliers.

📖 Quick Recap

IQR is a measure of spread, not of center. It is robust because it ignores the lowest 25% and the highest 25% of the data.
