Why Statistics Matter
Every day, numbers compete for your attention: sports averages, poll results, test scores, temperature forecasts. Statistics gives you the tools to look past individual data points and see the overall shape of the data. Before you can tackle more advanced topics like standard deviation or regression, you need to master three foundational measures of center (mean, median, and mode) plus the simplest measure of spread, the range. These concepts appear in middle school math, high school algebra, college entrance exams, and much of the published scientific literature.
This lesson walks you through each measure from scratch, shows when to prefer one over another, and builds toward reading and interpreting simple data displays.
Step-by-Step Lesson
The mean is what most people mean when they say "average." Add up all the values, then divide by how many values there are.
Worked Example — Test Scores
A student received these scores on five quizzes: 72, 85, 90, 78, 95.
Step 1 — Sum: 72 + 85 + 90 + 78 + 95 = 420
Step 2 — Divide by count: 420 ÷ 5 = 84
The mean score is 84.
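The two steps above translate directly into code. The following is a minimal Python sketch; the function name `mean` and the variable `quiz_scores` are illustrative names chosen here, not part of the lesson.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Quiz scores from the worked example above.
quiz_scores = [72, 85, 90, 78, 95]

total = sum(quiz_scores)      # Step 1: sum -> 420
count = len(quiz_scores)      # number of values -> 5
print(mean(quiz_scores))      # Step 2: 420 / 5 -> 84.0
```

Python's `/` operator always returns a float, which is why the result prints as 84.0 rather than 84.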
The median is the middle value of an ordered data set. It is resistant to outliers, making it better than the mean when data is skewed.
Procedure:
- Sort the data in ascending order.
- If n is odd: the median is the value at position (n + 1) / 2.
- If n is even: the median is the mean of the two middle values.
Worked Example — Odd Count (7 values)
Position = (7 + 1) / 2 = 4th value = 12. Median = 12.
Worked Example — Even Count (6 values)
Two middle values: 11 and 14. Median = (11 + 14) / 2 = 12.5.
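The sort-then-pick-the-middle procedure can be sketched in Python as follows. The function name `median` and both data lists are assumptions made for illustration (the lists reuse the quiz scores from the mean example, not the data from the worked examples above).

```python
def median(values):
    """Return the median by sorting and then picking the middle value(s)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # sort ascending
    n = len(ordered)
    middle = n // 2                   # 0-based index of the upper-middle value
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based counting is index n // 2.
        return ordered[middle]
    # Even n: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Illustrative data (an assumption, not the lesson's worked examples).
odd_scores = [72, 85, 90, 78, 95]     # sorted: 72, 78, 85, 90, 95
even_scores = [72, 85, 90, 78]        # sorted: 72, 78, 85, 90

print(median(odd_scores))             # 85
print(median(even_scores))            # (78 + 85) / 2 = 81.5
```

Note the shift between the lesson's 1-based positions and Python's 0-based indexes: the 3rd value of five lives at index 2.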
The mode is the value that appears most often. A data set can have no mode, one mode (unimodal), two modes (bimodal), or more (multimodal).
Worked Example
Count each: 4→1, 7→3, 9→3, 3→1, 2→1.
Both 7 and 9 appear 3 times — this data set is bimodal: mode = 7 and 9.
Mode is especially useful for categorical data. "What size shoe is purchased most often?" has a mode, not a meaningful mean.
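Finding the mode amounts to tallying each value and keeping whatever ties for the highest count. Here is a minimal Python sketch of that idea. The function name `modes` is illustrative, the data list is rebuilt from the tally in the worked example (its ordering is an assumption, since only the counts are given), and the shoe-size labels are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once: no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Rebuilt from the tally 4→1, 7→3, 9→3, 3→1, 2→1; the ordering is assumed.
data = [4, 7, 7, 7, 9, 9, 9, 3, 2]
print(modes(data))                       # [7, 9] -> bimodal

# Categorical data works the same way (hypothetical size labels).
print(modes(["M", "L", "M", "S", "M"]))  # ['M']
```

Returning a list rather than a single value keeps the unimodal, bimodal, and no-mode cases uniform: the length of the result tells you which case you are in.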
Range is the simplest measure of spread (variability) in a data set: the maximum value minus the minimum value.
Worked Example
Max = 67, Min = 11. Range = 67 − 11 = 56.
Range is quick but fragile. One extremely large or small outlier inflates the range without telling you much about the typical spread. A class of 25 students where 24 score 80–95 but one scores 12 has a misleadingly large range.
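A short Python sketch makes this fragility concrete. The function name `value_range` and the class scores below are hypothetical, chosen only to mirror the scenario described above.

```python
def value_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical class scores, all between 80 and 95.
scores = [80, 84, 88, 91, 95]
print(value_range(scores))             # 95 - 80 = 15

# Adding a single low score of 12 stretches the range dramatically.
print(value_range(scores + [12]))      # 95 - 12 = 83
```

One added value moves the range from 15 to 83, even though the other scores have not changed at all.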
This is where statistics becomes judgment, not just calculation. The right measure depends on the shape of your data and the question being asked.
| Situation | Best Measure | Why |
|---|---|---|
| Symmetric, no outliers | Mean | Uses all data efficiently |
| Skewed data or outliers present | Median | Not distorted by extreme values |
| Categorical data | Mode | Only applicable measure |
| Typical vs. extreme comparison | Mean + Median together | Gap reveals skew direction |
| Spread / variability | Range (as a first estimate) | Quick, easy to communicate |
An outlier is a value that lies far from the rest of the data. Outliers can be legitimate (an unusually tall person) or errors (a data-entry mistake). Before summarizing data, always look for outliers and decide how to handle them.
Outlier Effect Example
Monthly salaries at a small company (in $1,000s): 45, 48, 50, 52, 53, 310.
With outlier: Mean = (45+48+50+52+53+310) / 6 = 558 / 6 = $93K. Median = (50+52)/2 = $51K.
Without outlier: Mean = (45+48+50+52+53) / 5 = 248 / 5 = $49.6K. Median = $50K.
The CEO's $310K salary inflates the mean by about 88% (from $49.6K to $93K) but moves the median by only $1K. Here, the median is the fairer summary.
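The same comparison can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative, and the outlier is removed by filtering on its value, which only works here because 310 appears once.

```python
from statistics import mean, median

# Salaries in $1,000s, taken from the calculation above.
salaries = [45, 48, 50, 52, 53, 310]
without_ceo = [s for s in salaries if s != 310]

mean_with, median_with = mean(salaries), median(salaries)
mean_without, median_without = mean(without_ceo), median(without_ceo)

# Percentage by which the outlier raises the mean.
inflation_pct = (mean_with - mean_without) / mean_without * 100

print(mean_with, median_with)          # mean 93, median 51
print(mean_without, median_without)    # mean 49.6, median 50
print(round(inflation_pct, 1))         # 87.5
```

The mean nearly doubles while the median shifts by a single unit, which is the gap the lesson describes.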
Practice Problems
Problem 1
Find the mean, median, and mode of: 8, 3, 5, 3, 7, 9, 3, 6.
Sort: 3, 3, 3, 5, 6, 7, 8, 9 (n = 8)
Mean: (3+3+3+5+6+7+8+9) / 8 = 44 / 8 = 5.5
Median: (5+6)/2 = 5.5 (4th and 5th values)
Mode: 3 appears 3 times — mode = 3
Note: the mean equals the median here, so no extreme value is pulling the mean away from the center, even though the mode (3) sits at the low end of the data.
Problem 2
A list of daily temperatures (°F) for a week: 68, 72, 71, 69, 88, 70, 72. Find the range and explain whether the mean or median better represents typical daily temperature.
Range: 88 − 68 = 20°F
Sort: 68, 69, 70, 71, 72, 72, 88
Mean: (68+72+71+69+88+70+72) / 7 = 510 / 7 ≈ 72.9°F
Median: 4th value = 71°F
Better measure: The 88°F day is an outlier (unusually hot). The median (71°F) better reflects the typical temperature; the mean is pulled up by the one hot day.
Problem 3
A shoe store sells these sizes in one hour: 7, 9, 8, 10, 9, 7, 9, 8, 11, 9. What is the mode, and why is it more useful than the mean for the store manager?
Tally: 7→2, 8→2, 9→4, 10→1, 11→1
Mode = 9 (appears 4 times)
Mean: (7+9+8+10+9+7+9+8+11+9) / 10 = 87 / 10 = 8.7
The mean of 8.7 does not correspond to any real shoe size. The mode (size 9) tells the manager which size to stock the most. Mode wins for categorical/discrete inventory decisions.
Problem 4
Data: 15, 22, 18, 25, 19, 23, 17. Calculate all four statistics (mean, median, mode, range).
Sort: 15, 17, 18, 19, 22, 23, 25 (n = 7)
Mean: (15+22+18+25+19+23+17) / 7 = 139 / 7 ≈ 19.86
Median: 4th value = 19
Mode: Each value appears once — no mode
Range: 25 − 15 = 10
Mean ≈ median indicates roughly symmetric data — no strong skew.
Problem 5
The mean of five numbers is 16. Four of the numbers are 12, 18, 14, and 20. What is the fifth number?
Mean = Sum ÷ Count → Sum = Mean × Count = 16 × 5 = 80
Sum of known values: 12 + 18 + 14 + 20 = 64
Fifth number = 80 − 64 = 16
This "reverse mean" technique is useful in competitions and tests.
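The rearrangement Sum = Mean × Count can be wrapped in a small helper. This is a minimal Python sketch; the function name `missing_value` and its parameters are hypothetical names introduced for illustration.

```python
def missing_value(target_mean, total_count, known_values):
    """Return the one unknown value that makes the data set hit target_mean."""
    if len(known_values) != total_count - 1:
        raise ValueError("exactly one value must be unknown")
    required_sum = target_mean * total_count   # Sum = Mean × Count
    return required_sum - sum(known_values)


# Problem 5: mean 16, five numbers, four of them known.
print(missing_value(16, 5, [12, 18, 14, 20]))  # 80 - 64 = 16
```

The length check guards against the miscounting mistake: if the number of known values does not equal the count minus one, the subtraction would silently give a wrong answer.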
5 Common Mistakes
1. Forgetting to sort before finding the median
   The median requires an ordered list. Taking the "middle" of an unsorted list gives an arbitrary value, not the true median.
2. Assuming the mean is always the best average
   In skewed distributions or data with outliers, the median is more representative. Always inspect the data before choosing.
3. Miscounting n (the number of values)
   Adding one extra value or missing one changes both the mean and the median position calculation. Count carefully, especially when values repeat.
4. Declaring "no mode" when values tie
   If two values tie for the highest count, the data is bimodal and both are modes. Only when every value appears exactly once is there truly no mode.
5. Confusing range with spread in general
   Range only measures total span. Two data sets can have the same range but very different distributions. Range is a starting point, not a complete picture of variability.