Lesson 9 of 10

Real-World Data Analysis 🌐

🎯 Grades 6–8 ⏱ ~35 minutes 💚 Intermediate

What You'll Learn

Follow the full data analysis workflow
Clean and filter data with Python
Combine statistics and charts to tell a story
Draw conclusions supported by evidence

The Data Analysis Workflow

Professional data scientists follow a repeatable workflow for every project:

Ask a question — what do you want to find out?
Collect data — CSV, APIs, manual entry
Clean data — fix missing values, convert types
Analyse — statistics, filtering, grouping
Visualise — charts that reveal patterns
Communicate findings — tell a clear story
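
The six steps above can be sketched in a few lines of Python. This is only an illustration: the temperatures are invented, the data is typed in by hand instead of loaded from a file, and a text bar chart stands in for a real chart.

```python
# 1. Ask a question: which day of the week was hottest?

# 2. Collect data (typed in by hand here; these values are invented).
raw = {"Mon": "68", "Tue": "", "Wed": "75", "Thu": "72"}

# 3. Clean data: drop missing values and convert strings to numbers.
temps = {day: float(t) for day, t in raw.items() if t != ""}

# 4. Analyse: find the day with the highest temperature.
hottest_day = max(temps, key=temps.get)

# 5. Visualise: a simple text bar chart, one # per 10 degrees.
for day, t in temps.items():
    print(day, "#" * int(t // 10), t)

# 6. Communicate findings in one clear sentence.
print(f"{hottest_day} was the hottest day at {temps[hottest_day]:.0f}°F.")
```

Notice that the question in step 1 decides what every later step has to do.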

💡

Step 1 Is Most Important

A clear question guides every other step. "What is the average temperature?" is a clear question. "Tell me something about this data" is not.

Cleaning & Filtering Data

Real data is messy. Common issues include missing values (empty strings, ""), wrong types (numbers stored as strings), and outliers (values far away from the rest). List comprehensions can filter rows and convert types in one line:

python

# Remove empty entries
clean = [v for v in data if v != ""]

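# Filter: only rows where temp > 70 (blank temps are skipped first,
# because float("") raises a ValueError)
hot_days = [row for row in rows if row["temp"] != "" and float(row["temp"]) > 70]

# Convert column to numbers, again skipping blanks
temps = [float(row["temp"]) for row in rows if row["temp"] != ""]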
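
Here is one way the cleaning steps might look on a small, complete example. The rows are made up, and to_float_or_none is a hypothetical helper written for this sketch, not a built-in function. It handles both blank values and text that is not a number.

```python
# Hypothetical sample rows, shaped like what csv.DictReader would produce.
rows = [
    {"day": "Mon", "temp": "68.5"},
    {"day": "Tue", "temp": ""},       # missing value
    {"day": "Wed", "temp": "74"},
    {"day": "Thu", "temp": "n/a"},    # not a number
    {"day": "Fri", "temp": "81.2"},
]


def to_float_or_none(text):
    """Return text as a float, or None if it cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None


# Clean: keep only rows whose temp converts to a number.
clean_rows = [row for row in rows if to_float_or_none(row["temp"]) is not None]

# Convert, then filter: temperatures above 70.
temps = [float(row["temp"]) for row in clean_rows]
hot_days = [row["day"] for row in clean_rows if float(row["temp"]) > 70]

print(temps)      # [68.5, 74.0, 81.2]
print(hot_days)   # ['Wed', 'Fri']
```

Cleaning first means the later steps never have to worry about bad values.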

Putting It All Together

A complete analysis loads data, cleans it, computes statistics, and creates charts — all in one Python script:

python

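import csv
import statistics

import matplotlib.pyplot as plt

# Load: read the temp_f column, skipping rows where it is blank
temps = []
with open("weather.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["temp_f"] != "":
            temps.append(float(row["temp_f"]))

# Analyse: summary statistics
print("Mean:", statistics.mean(temps))
print("Median:", statistics.median(temps))

# Visualise: histogram of the temperatures
plt.hist(temps, bins=8, color="steelblue")
plt.title("Temperature Distribution")
plt.xlabel("Temperature (°F)")
plt.ylabel("Number of readings")
plt.show()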
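
The script above expects a weather.csv file. If you do not have one yet, a sketch like the following lets you try the load, clean, and analyse steps on a small table written directly in the code. The city names and temperatures are invented, and io.StringIO is used so that csv.DictReader can read the text as if it were a file.

```python
import csv
import io
import statistics

# Made-up sample data standing in for weather.csv (one blank temp_f on purpose).
sample_csv = """city,temp_f
Austin,88
Boston,61
Chicago,
Denver,70
Miami,91
"""

temps = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    value = row["temp_f"].strip()
    if value != "":          # skip missing values instead of crashing
        temps.append(float(value))

print("Count:", len(temps))                  # Count: 4
print("Mean:", statistics.mean(temps))       # Mean: 77.5
print("Median:", statistics.median(temps))   # Median: 79.0
```

Five rows go in but only four temperatures come out, because the blank Chicago reading is skipped.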

🆕

Tell a Story

The best data analyses end with a clear sentence: "Cities in the south averaged 15°F warmer than cities in the north." Numbers alone don't communicate — interpretation does.
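
A sentence like that usually comes from comparing groups. The sketch below shows one possible way to do it: group the readings by region, average each group, and put the difference into a sentence. The readings are invented for illustration and are not real weather data.

```python
import statistics

# Made-up readings for illustration only: (region, temperature in °F).
readings = [
    ("north", 52.0), ("north", 58.0), ("north", 55.0),
    ("south", 71.0), ("south", 69.0), ("south", 70.0),
]

# Group temperatures by region.
by_region = {}
for region, temp in readings:
    by_region.setdefault(region, []).append(temp)

# Compute the mean for each group, then compare.
north_mean = statistics.mean(by_region["north"])
south_mean = statistics.mean(by_region["south"])
difference = south_mean - north_mean

story = (
    f"In this sample, southern cities averaged {difference:.0f}°F "
    f"warmer than northern cities ({south_mean:.0f}°F vs {north_mean:.0f}°F)."
)
print(story)
```

The final sentence includes the evidence (both averages), so a reader can check the claim.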

🎉 Check Your Understanding

1. What is the first step in a data analysis workflow?

Create a chart

Write the Python code

Ask a clear question

Load the CSV file

2. Why do we need to write float(row["temp"]) before comparing to a number?

CSV files only store integers

Python's csv module reads every value as a string

float() makes numbers bigger

Python can't read CSV files

3. Which of these filters rows where the score is above 80?

[row for row in data if row["score"] == 80]

[row for row in data if int(row["score"]) > 80]

filter(data, score > 80)

data.filter(80)

4. What is the final step in a data analysis workflow?

Import matplotlib

Delete the CSV file

Print all the raw numbers

Tell a clear story about what the data reveals
