Lesson 8: Scatter Plots & Correlation
A scatter plot shows the relationship between two numeric variables. Each data point is one observation plotted at its (x, y) coordinates. The pattern of dots reveals whether the variables are correlated.
Key Concepts
Scatter Plots
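import matplotlib.pyplot as plt
plt.scatter(x_values, y_values, color="purple", alpha=0.6, s=50)
alpha controls transparency, from 0 (fully transparent) to 1 (fully opaque); lower values make overlapping points easier to see. s controls dot size, given as the marker area in points squared. Each point represents one data record.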
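The call above can be tried end to end with a small dataset. The following is a minimal sketch: the study-time numbers and the variable names (hours, scores) are made up for illustration and are not part of the lesson's data.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: hours studied (x) and test score (y) for 8 students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 72, 75, 81]

fig, ax = plt.subplots()

# One dot per student, using the same styling as in the lesson
points = ax.scatter(hours, scores, color="purple", alpha=0.6, s=50)

# Label the axes so the reader knows what each variable is
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score")
ax.set_title("Study time vs. test score")

# Call plt.show() afterwards to display the figure in a window or notebook
```

Here the dots rise from lower left to upper right, which is the visual signature of a positive relationship.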
Correlation
Positive correlation: as x increases, y tends to increase (height vs. shoe size). Negative correlation: as x increases, y tends to decrease (hours of TV vs. test score). No correlation: random scatter with no visible trend.
Correlation Coefficient (r)
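r ranges from -1 to +1. r = 1 → perfect positive. r = -1 → perfect negative. r = 0 → no linear correlation. |r| > 0.7 is commonly treated as strong, although this is a rule of thumb rather than a fixed threshold. Correlation ≠ causation: a strong r shows that two variables move together, not that one causes the other.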
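One way to compute r in Python is NumPy's np.corrcoef. The sketch below uses invented TV-hours and test-score values (not real data) and also recomputes r from its definition, so the two results can be compared.

```python
import numpy as np

# Hypothetical sample data: daily hours of TV and test score for 6 students
hours_tv = np.array([0, 1, 2, 3, 4, 5])
test_score = np.array([95, 88, 84, 75, 70, 62])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(hours_tv, test_score)[0, 1]

# Same value from the definition: sum of products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations
dx = hours_tv - hours_tv.mean()
dy = test_score - test_score.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Apply the lesson's rule of thumb for a "strong" correlation
strength = "strong" if abs(r) > 0.7 else "weak or moderate"
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.3f} ({strength} {direction} correlation)")
```

Because the scores fall steadily as TV time rises, r comes out close to -1 for this data.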
Line of Best Fit (Trend Line)
import numpy as np
m, b = np.polyfit(x, y, 1)  # degree 1 = straight line; returns slope m and intercept b
plt.plot(x, [m*xi+b for xi in x], color="red")
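This draws the least-squares straight line through the cloud of points: the line y = m*x + b that minimizes the sum of squared vertical distances to the points.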
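Putting the scatter plot and the trend line together might look like the sketch below. The data values are hypothetical, and storing x and y as NumPy arrays (rather than lists) is a choice made here so the line can be computed without a loop.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: hours studied (x) and test score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([52, 55, 61, 60, 68, 72, 75, 81])

# Fit a degree-1 polynomial: polyfit returns the slope first, then the intercept
m, b = np.polyfit(x, y, 1)

# Height of the fitted line at every x value
y_fit = m * x + b

fig, ax = plt.subplots()
ax.scatter(x, y, color="purple", alpha=0.6, s=50)  # the raw observations
ax.plot(x, y_fit, color="red")                     # the trend line
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score")

print(f"slope = {m:.2f}, intercept = {b:.2f}")
```

The slope m reads as "points gained per extra hour" for this made-up dataset. A least-squares line always passes through the point (mean of x, mean of y), which is a quick sanity check on the fit.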
🔬 Interactive Lab: Scatter Plot & Correlation Explorer
Click to add points on the chart. The correlation coefficient r updates live. Try to create a strong positive or negative correlation.
✅ Check Your Understanding
1. An r value of -0.85 indicates:
2. Correlation does NOT imply:
3. Which chart shows the relationship between two numeric variables?