Database Design & Normalization
Design clean, efficient schemas by eliminating redundancy through normalization — the foundation of every real-world database.
Normalization Explorer
Click each normal form to see the transformation from messy to clean schema design.
What Is Normalization?
Normalization is the process of organizing a database to reduce data redundancy and improve data integrity. A well-normalized database stores each fact exactly once, making updates, inserts, and deletes safe and consistent.
Edgar F. Codd introduced normalization theory in 1970. The process moves a table through a series of normal forms, each eliminating a specific type of anomaly.
Without normalization you risk: Update anomalies (changing one fact requires updating many rows), Insert anomalies (you can't add data without unrelated data), and Delete anomalies (deleting a row removes unrelated facts).
First Normal Form (1NF)
A table is in 1NF when every column holds atomic (indivisible) values — no comma-separated lists, no repeating groups. Each row must be uniquely identifiable by a primary key.
Example violation: storing courses = "Math, Science, English" in one column. Fix: create a separate row for each course enrollment.
Second Normal Form (2NF)
A table is in 2NF when it is in 1NF and every non-key column is fully functionally dependent on the entire primary key — not just part of it. Partial dependencies only appear in tables with composite primary keys.
Example: in an enrollment(student_id, course_id, student_name, course_name) table, student_name depends only on student_id, not the full composite key. Fix: split into students and courses tables.
Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and no non-key column is transitively dependent on the primary key through another non-key column.
Example: orders(order_id, customer_id, customer_city, customer_zip) — customer_city depends on customer_zip, not directly on order_id. Fix: move city/zip to a customers table.
Entity-Relationship (ER) Diagrams
ER diagrams are visual blueprints of a database schema. Key components:
- Entities — rectangles representing tables (e.g., Student, Course)
- Attributes — ovals or columns inside entities (e.g., student_id, name)
- Relationships — diamonds or lines showing how entities relate
- Cardinality — 1:1, 1:N, or M:N notation on relationship lines
Primary & Foreign Keys in Schema Design
Every table should have a primary key — a column (or composite) that uniquely identifies each row. Surrogate keys (id INT AUTO_INCREMENT) are often preferred over natural keys because they never change.
Foreign keys enforce referential integrity: a course_enrollments.student_id column must reference a valid row in students.id. The RDBMS rejects invalid inserts and can cascade updates/deletes.
Check Your Understanding
1. A table has the column phone_numbers = "555-1234, 555-5678". Which normal form does this violate?
2. In a table with composite key (order_id, product_id), the column product_name depends only on product_id. Which normal form is violated?
3. A table has employee_id → department_id → department_budget. The column department_budget depends on department_id, not directly on employee_id. Which normal form is violated?
4. Which statement about foreign keys is TRUE?