Database Management Systems
Unit 4: Normalization
Functional Dependencies, Normal Forms (1NF โ BCNF), Decomposition & Denormalization โ the science of designing databases that don't break.
๐ข Oracle & PostgreSQL | ๐ 15 MCQs (Bloom's) | ๐ฌ 5 Lab Exercises | ๐ผ Interview Prep
Why This Chapter Pays Your Salary
Normalization separates database coders from database architects. A poorly normalized schema causes data corruption, storage waste, and query nightmares. A well-normalized schema is elegant, consistent, and maintainable. GATE CS dedicates 5-8 marks to normalization every year. Every DBA interview at TCS, Infosys, Oracle, and product companies asks normalization questions.
๐ข Industry Snapshot
SBI Core Banking โ When SBI migrated from legacy to CBS (Core Banking Solution), normalization consultants spent 6 months redesigning the schema. The original system stored customer address in 15 different tables โ a classic redundancy nightmare. After normalization to 3NF, the same data lived in ONE address table referenced by FK everywhere. Result: 60% reduction in storage, zero data inconsistency.
Flipkart โ Their product catalog is normalized to 3NF for the source-of-truth (OLTP) database. But their search/display layer is intentionally denormalized โ product name, price, image URL, seller name, and rating all flattened into one table for sub-100ms page loads. They normalize for correctness and denormalize for speed.
Learning Outcomes โ Bloom's Taxonomy
| Bloom's Level | Outcome Statement |
|---|---|
| L1 โ Remember | Define 1NF, 2NF, 3NF, BCNF; list Armstrong's axioms; recall the definition of functional dependency, candidate key, and prime attribute |
| L2 โ Understand | Explain why update anomalies occur in unnormalized tables; describe the difference between partial and transitive dependencies; explain lossy vs lossless decomposition |
| L3 โ Apply | Compute attribute closure, find candidate keys from a set of FDs, determine the highest normal form of a given relation, and decompose a relation to 3NF/BCNF |
| L4 โ Analyze | Analyze whether a decomposition is lossless and dependency-preserving; compare 3NF and BCNF trade-offs |
| L5 โ Evaluate | Justify when to stop at 3NF vs pursuing BCNF; evaluate denormalization decisions for performance-critical Indian enterprise systems |
| L6 โ Create | Take a raw unnormalized dataset and design a complete normalized schema (BCNF) with SQL implementation and sample data |
Concept Explanations
3.1 The Problem โ Update Anomalies
๐ Why Bad Schema Design Destroys Data
Consider a university stores everything in ONE table:
Unnormalized Table: student_courses
student_id | student_name | dept | dept_hod | course_id | course_name | instructor | grade
101 | Rahul | CSE | Dr. Sharma | CS301 | DBMS | Dr. Patel | A
101 | Rahul | CSE | Dr. Sharma | CS302 | OS | Dr. Kumar | B+
102 | Priya | ECE | Dr. Reddy | EC201 | Signals | Dr. Singh | A+
103 | Amit | CSE | Dr. Sharma | CS301 | DBMS | Dr. Patel | B
103 | Amit | CSE | Dr. Sharma | CS303 | Networks | Dr. Gupta | A
๐ฅ THREE DEADLY ANOMALIES
| Anomaly | Problem | Example |
|---|---|---|
| Insertion Anomaly | Can't insert partial data | New department "AI" created but no students enrolled yet โ can't INSERT because student_id (PK) can't be NULL. A department can't exist without a student! |
| Update Anomaly | Must update multiple rows for one fact | CSE HOD changes from Dr. Sharma to Dr. Verma โ must update ALL rows where dept='CSE'. Miss one? Now CSE has TWO different HODs in the database. Data inconsistency. |
| Deletion Anomaly | Deleting one fact accidentally removes another | Priya (102) drops her only course. Delete her row โ we LOSE the fact that ECE's HOD is Dr. Reddy. Department information vanishes with the student. |
Decompose the one big table into multiple smaller tables, each storing exactly one fact. Students in one table, departments in another, courses in another, enrollments linking them. No redundancy, no anomalies.
3.2 Functional Dependencies (FDs)
๐ Functional Dependency โ "X Determines Y"
A functional dependency X โ Y means: if two tuples have the same value for attribute(s) X, they MUST have the same value for attribute(s) Y. X determines Y. X is the determinant; Y is the dependent.
Aadhaar number โ Name, DOB, Address. If you know someone's Aadhaar number, you can determine their name, DOB, and address. Two people can't have the same Aadhaar number with different names. That's a functional dependency.
โ๏ธ EXAMPLES FROM STUDENT_COURSES TABLEFunctional Dependencies
student_id โ student_name, dept (student ID determines name and dept)
dept โ dept_hod (department determines its HOD)
course_id โ course_name, instructor (course ID determines course details)
{student_id, course_id} โ grade (student + course determines grade)
NON-dependencies:
student_name โ student_id (two students can have same name)
dept โ student_id (department has many students)
๐ TYPES OF FDs
| Type | Definition | Example |
|---|---|---|
| Full FD | Y depends on ALL of X (not a subset) | {student_id, course_id} โ grade. Grade depends on BOTH โ can't remove either. |
| Partial FD | Y depends on only PART of a composite key | {student_id, course_id} โ student_name. student_name depends ONLY on student_id, not the full key. Violates 2NF. |
| Transitive FD | X โ Y and Y โ Z, so X โ Z transitively | student_id โ dept and dept โ dept_hod, so student_id โ dept_hod transitively. Violates 3NF. |
| Trivial FD | Y is a subset of X | {student_id, name} โ student_id. Always true โ trivial, not interesting. |
3.3 Attribute Closure & Armstrong's Axioms
๐ Attribute Closure โ Finding What X Can Determine
The closure of X (written Xโบ) is the set of ALL attributes that can be determined from X using the given FDs. It's the key tool for finding candidate keys and testing normal forms.
โ๏ธ STEP-BY-STEP EXAMPLEClosure Computation
Given: R(A, B, C, D, E)
FDs: { A โ B, B โ C, A โ D, D โ E }
Find Aโบ (closure of A):
Step 0: Aโบ = {A} (start with A itself)
Step 1: A โ B โ Aโบ = {A, B}
Step 2: B โ C โ Aโบ = {A, B, C}
Step 3: A โ D โ Aโบ = {A, B, C, D}
Step 4: D โ E โ Aโบ = {A, B, C, D, E}
Aโบ = {A, B, C, D, E} = ALL attributes!
โด A is a CANDIDATE KEY (can determine every attribute)
Find Bโบ:
Bโบ = {B} โ B โ C โ {B, C}
No more FDs apply. Bโบ = {B, C}
B is NOT a candidate key (doesn't determine A, D, E)
๐ ARMSTRONG'S AXIOMS (for GATE CS)
| Axiom | Rule | Formal | Example |
|---|---|---|---|
| Reflexivity | If Y โ X, then X โ Y | {A,B} โ A | A set always determines its subsets (trivial) |
| Augmentation | If X โ Y, then XZ โ YZ | A โ B โ AC โ BC | Adding the same attribute to both sides preserves FD |
| Transitivity | If X โ Y and Y โ Z, then X โ Z | A โ B, B โ C โ A โ C | Chain of dependencies |
Derived rules (from the 3 axioms):
| Rule | Formal | Example |
|---|---|---|
| Union | X โ Y, X โ Z โ X โ YZ | A โ B, A โ C โ A โ BC |
| Decomposition | X โ YZ โ X โ Y, X โ Z | A โ BC โ A โ B, A โ C |
| Pseudo-transitivity | X โ Y, WY โ Z โ WX โ Z | A โ B, CB โ D โ CA โ D |
Finding Candidate Keys โ Algorithm
Algorithm: Finding Candidate Keys
Given: R(A, B, C, D, E, F), FDs: {AB โ C, C โ D, D โ E, CF โ B}
Step 1: Find attributes that appear ONLY on LEFT side of FDs โ must be in every key
A appears only on left (or nowhere on right) โ A must be in every key
F appears only on left โ F must be in every key
Step 2: Find closure of mandatory attributes
{A, F}โบ:
Start: {A, F}
No FD with just A or F on left gives new attrs...
{AF}โบ = {A, F} โ Not all attributes! Need more.
Step 3: Add remaining attributes one by one
Try {A, F, B}โบ:
AB โ C โ {A, B, C, F}
C โ D โ {A, B, C, D, F}
D โ E โ {A, B, C, D, E, F} = ALL โ
{A, B, F} is a superkey.
Try {A, F, C}โบ:
C โ D โ {A, C, D, F}
D โ E โ {A, C, D, E, F}
CF โ B โ {A, B, C, D, E, F} = ALL โ
{A, C, F} is a superkey.
Step 4: Check minimality (no proper subset is a superkey)
{A, B, F}: Can we remove B? {A,F}โบ = {A,F} โ NOT all. So B is needed. โ Minimal.
{A, C, F}: Can we remove C? {A,F}โบ = {A,F} โ NOT all. So C is needed. โ Minimal.
โด Candidate Keys: {A, B, F} and {A, C, F}
Prime attributes: A, B, C, F (part of at least one CK)
Non-prime attributes: D, E
3.4 Normal Forms โ The Normalization Ladder
Normal Form Hierarchy
UNF (Unnormalized)
โ Remove repeating groups / make atomic
1NF (First Normal Form)
โ Remove partial dependencies
2NF (Second Normal Form)
โ Remove transitive dependencies
3NF (Third Normal Form)
โ Every determinant is a candidate key
BCNF (Boyce-Codd Normal Form)
โ Remove multi-valued dependencies (rare)
4NF
โ Remove join dependencies (very rare)
5NF
๐ก Industry standard: 3NF or BCNF is sufficient for 99% of databases.
First Normal Form โ Atomic Values
Rule: Every cell must contain exactly one value (atomic). No repeating groups, no arrays, no comma-separated lists, no nested tables.
โ Violates 1NF:
NOT 1NF
student_id | name | phone_numbers | courses
101 | Rahul | 9876543210, 9123456789 | CS301, CS302, CS303
102 | Priya | 9888777666 | EC201
Problems: phone_numbers has multiple values in one cell. courses is a repeating group.
โ Fixed โ 1NF:
1NF โ Decomposed
Table: students Table: student_phones Table: enrollments
student_id | name student_id | phone student_id | course_id
101 | Rahul 101 | 9876543210 101 | CS301
102 | Priya 101 | 9123456789 101 | CS302
102 | 9888777666 101 | CS303
102 | EC201
"Store phone numbers as VARCHAR(100) with commas: '9876543210,9123456789'." This technically passes 1NF by hiding the violation inside a string, but it's terrible design. You can't search individual phones, can't enforce uniqueness, can't index them. Always use a separate table for multi-valued attributes.
Second Normal Form โ No Partial Dependencies
Rule: Must be in 1NF AND every non-prime attribute must be fully functionally dependent on the ENTIRE primary key. No non-prime attribute should depend on only PART of a composite key.
When does 2NF matter? Only when the PK is composite (2+ columns). If PK is a single column, 1NF โ 2NF automatically.
โ Violates 2NF:
NOT 2NF
Table: enrollments (PK: {student_id, course_id})
student_id | course_id | student_name | dept | course_name | instructor | grade
FDs:
{student_id, course_id} โ grade โ Full dependency โ
student_id โ student_name, dept โ PARTIAL dependency! โ
course_id โ course_name, instructor โ PARTIAL dependency! โ
student_name depends only on student_id, not the full PK.
โ Fixed โ 2NF (remove partial dependencies):
2NF โ Decomposed
Table: students (PK: student_id) Table: courses (PK: course_id)
student_id | student_name | dept course_id | course_name | instructor
101 | Rahul | CSE CS301 | DBMS | Dr. Patel
102 | Priya | ECE CS302 | OS | Dr. Kumar
103 | Amit | CSE CS303 | Networks | Dr. Gupta
Table: enrollments (PK: {student_id, course_id})
student_id | course_id | grade
101 | CS301 | A
101 | CS302 | B+
102 | EC201 | A+
Each non-prime attribute now depends on the FULL PK of its table. โ
Third Normal Form โ No Transitive Dependencies
Rule: Must be in 2NF AND no non-prime attribute should depend on another non-prime attribute. Formally: For every FD X โ Y, at least one must be true: (1) X is a superkey, OR (2) Y is a prime attribute (part of some candidate key).
โ Violates 3NF:
NOT 3NF
Table: students (PK: student_id)
student_id | student_name | dept | dept_hod
FDs:
student_id โ student_name, dept, dept_hod โ OK (student_id is PK)
dept โ dept_hod โ TRANSITIVE! dept is non-prime, dept_hod is non-prime.
Chain: student_id โ dept โ dept_hod
This is a transitive dependency: student_id determines dept_hod THROUGH dept.
Problem: If Dr. Sharma leaves as CSE HOD, must update EVERY CSE student's row.
โ Fixed โ 3NF (remove transitive dependencies):
3NF โ Decomposed
Table: students (PK: student_id) Table: departments (PK: dept)
student_id | student_name | dept dept | dept_hod
101 | Rahul | CSE CSE | Dr. Sharma
102 | Priya | ECE ECE | Dr. Reddy
103 | Amit | CSE
Now dept โ dept_hod is in its OWN table where dept IS the PK (superkey). โ
Updating the HOD requires changing ONE row in departments table. Zero redundancy.
Boyce-Codd Normal Form โ The Strictest Practical Form
Rule: For every non-trivial FD X โ Y, X MUST be a superkey. No exceptions. (3NF allows Y to be a prime attribute even if X isn't a superkey; BCNF doesn't.)
When does 3NF โ BCNF? (The rare case)
3NF but NOT BCNF
Table: student_advisor (describes: students, subjects, advisors)
student_id | subject | advisor
FDs:
{student_id, subject} โ advisor (each student has one advisor per subject)
advisor โ subject (each advisor advises only one subject)
Candidate Keys: {student_id, subject} and {student_id, advisor}
Prime attributes: student_id, subject, advisor โ ALL are prime!
Check 3NF: advisor โ subject. advisor is NOT a superkey, BUT subject IS a prime attribute.
โด Satisfies 3NF (the "or A is prime" exception saves it). โ
Check BCNF: advisor โ subject. advisor is NOT a superkey.
โด Violates BCNF. โ
โ BCNF Decomposition:
BCNF โ Decomposed
Table: advisor_subject (PK: advisor) Table: student_advisor (PK: {student_id, advisor})
advisor | subject student_id | advisor
Dr. Patel | DBMS 101 | Dr. Patel
Dr. Kumar | OS 101 | Dr. Kumar
Dr. Singh | Signals 102 | Dr. Singh
Now every determinant IS a superkey in its table. โ
โ ๏ธ TRADE-OFF: The FD {student_id, subject} โ advisor is LOST!
It can't be checked within a single table anymore.
This is why sometimes we STOP at 3NF โ it preserves all dependencies.
3NF vs BCNF โ When to Choose Which
| Feature | 3NF | BCNF |
|---|---|---|
| Strictness | Allows non-superkey determinant IF dependent is prime | Every determinant MUST be a superkey. No exceptions. |
| Dependency preservation | โ Always possible to preserve all FDs | โ May lose some FDs during decomposition |
| Lossless decomposition | โ Always achievable | โ Always achievable |
| Redundancy | Slight redundancy possible (rare) | Zero redundancy from FDs |
| Industry practice | โ Most databases aim for 3NF minimum | โ Preferred when dependency loss is acceptable |
3.5 Decomposition โ Lossless & Dependency-Preserving
๐ Decomposition Rules โ Don't Lose Data or Dependencies
When decomposing R into R1 and R2, the decomposition is lossless if and only if: R1 โฉ R2 โ R1 or R1 โฉ R2 โ R2 (the common attributes must be a superkey of at least one of the resulting tables).
Lossless Test
R(student_id, student_name, dept, dept_hod)
Decompose into:
R1(student_id, student_name, dept) โ students table
R2(dept, dept_hod) โ departments table
Common attributes: R1 โฉ R2 = {dept}
Is {dept} a superkey of R2? R2 has PK = dept. Yes! โ
โด Decomposition is LOSSLESS.
If we JOIN R1 and R2 on dept, we get back EXACTLY the original data.
No extra spurious tuples are generated.
๐ LOSSY DECOMPOSITION (BAD!)
Lossy Decomposition โ WRONG!
R(A, B, C) with FD: A โ B
Decompose into:
R1(A, C)
R2(B, C)
Common: R1 โฉ R2 = {C}
Is C a key of R1 or R2? NO! C doesn't determine anything.
โด This is LOSSY โ JOINing R1 and R2 on C produces EXTRA rows (spurious tuples).
NEVER do lossy decomposition. Always verify the lossless condition.
๐ DEPENDENCY-PRESERVING DECOMPOSITION
A decomposition preserves dependencies if every FD can be checked within a single resulting table (without needing to JOIN tables). 3NF decomposition is always dependency-preserving. BCNF decomposition may NOT be.
3NF Decomposition Algorithm (Synthesis)
Algorithm: 3NF Decomposition
Input: R(A, B, C, D, E), FDs: {A โ BC, C โ D, D โ E}
Step 1: Find minimal cover (canonical form) of FDs
Already minimal: {A โ B, A โ C, C โ D, D โ E}
(Decompose right sides, remove redundant FDs, remove extraneous attributes)
Step 2: Create a table for each FD
T1(A, B) from A โ B
T2(A, C) from A โ C
T3(C, D) from C โ D
T4(D, E) from D โ E
Step 3: Merge tables with same determinant (left side)
T1 and T2 have same determinant A โ merge:
T12(A, B, C) PK = A
Final tables: T12(A, B, C), T3(C, D), T4(D, E)
Step 4: Ensure a candidate key is present in at least one table
CK = {A} (since Aโบ = {A,B,C,D,E})
A is in T12 โ
Result: {R1(A,B,C), R2(C,D), R3(D,E)} โ all in 3NF, lossless, dependency-preserving.
3.6 Denormalization โ When to Break the Rules
๐ Denormalization โ Trading Correctness for Speed
Denormalization is the intentional introduction of redundancy into a normalized schema to improve read performance. It's NOT about skipping normalization โ it's about normalizing first, then strategically de-normalizing specific tables for speed.
๐ข WHEN TO DENORMALIZE โ Indian Industry Examples| Scenario | Normalized Design | Denormalized Design | Why |
|---|---|---|---|
| Flipkart product listing | 5-table JOIN: products โ categories โ sellers โ images โ ratings | Single product_listing table with name, price, image_url, seller_name, avg_rating | Homepage loads in 100ms instead of 500ms. Updated via background job. |
| SBI account balance | SUM(transactions) per account | Cached current_balance column in accounts table, updated on each transaction | Balance lookup is O(1) instead of scanning millions of transactions. |
| IRCTC seat availability | COUNT available seats from bookings table | Pre-computed available_seats counter per train-date-class | Tatkal booking page loads in milliseconds during peak. |
- Data inconsistency if redundant copies aren't kept in sync
- Increased storage (minor cost in 2025)
- More complex write operations (must update all copies)
- Must handle sync failures gracefully
ALTER TABLE orders ADD total_amount GENERATED ALWAYS AS (quantity * unit_price) STORED (PostgreSQL) โ automatically maintained, no manual sync. (3) JSONB columns โ Store pre-computed summaries as JSON alongside normalized data. (4) CQRS pattern โ Separate read model (denormalized) from write model (normalized). Used by Razorpay, Swiggy, and modern Indian fintechs.Industry Problems
๐ข Industry Problem #1 โ Normalize a University ERP Table
Scenario: A university stores everything in one table:
UNF Table
student_id | name | dept | dept_hod | course_id | course_name | instructor | credits | grade | hostel | room
FDs:
student_id โ name, dept, hostel, room
dept โ dept_hod
course_id โ course_name, instructor, credits
{student_id, course_id} โ grade
hostel โ warden (each hostel has one warden)
Task: Normalize step-by-step from UNF โ 1NF โ 2NF โ 3NF โ BCNF. Show each table at each stage.
๐ก Complete Solution
Step-by-step Normalization
1NF: Already 1NF (atomic values, no repeating groups assumed)
2NF: Remove partial dependencies from PK = {student_id, course_id}
Partial FDs:
student_id โ name, dept, hostel, room (depends on PART of PK)
course_id โ course_name, instructor, credits (depends on PART of PK)
Decompose:
students(student_id, name, dept, hostel, room) PK: student_id
courses(course_id, course_name, instructor, credits) PK: course_id
enrollments(student_id, course_id, grade) PK: {student_id, course_id}
3NF: Remove transitive dependencies
In students: student_id โ dept โ dept_hod (transitive โ but dept_hod
isn't in students anymore if we didn't include it. Let's assume
the original had dept_hod in students table)
students(student_id, name, dept, hostel, room) โ dept_hod removed
departments(dept, dept_hod) PK: dept
In students: student_id โ hostel โ warden (transitive if warden exists)
students(student_id, name, dept, hostel, room)
hostels(hostel, warden) PK: hostel
BCNF: Check โ is every determinant a superkey?
students: student_id โ all. student_id is PK. โ
departments: dept โ dept_hod. dept is PK. โ
courses: course_id โ all. course_id is PK. โ
enrollments: {student_id, course_id} โ grade. Composite PK. โ
hostels: hostel โ warden. hostel is PK. โ
All in BCNF! โ
SQL โ Final Normalized Schema
CREATE TABLE departments (
dept VARCHAR2(10) PRIMARY KEY,
dept_hod VARCHAR2(50)
);
CREATE TABLE hostels (
hostel VARCHAR2(20) PRIMARY KEY,
warden VARCHAR2(50)
);
CREATE TABLE students (
student_id NUMBER(10) PRIMARY KEY,
name VARCHAR2(50) NOT NULL,
dept VARCHAR2(10) REFERENCES departments(dept),
hostel VARCHAR2(20) REFERENCES hostels(hostel),
room VARCHAR2(10)
);
CREATE TABLE courses (
course_id VARCHAR2(10) PRIMARY KEY,
course_name VARCHAR2(50) NOT NULL,
instructor VARCHAR2(50),
credits NUMBER(1) CHECK (credits BETWEEN 1 AND 5)
);
CREATE TABLE enrollments (
student_id NUMBER(10) REFERENCES students(student_id),
course_id VARCHAR2(10) REFERENCES courses(course_id),
grade CHAR(2),
PRIMARY KEY (student_id, course_id)
);
๐ข Industry Problem #2 โ GATE CS: Find Candidate Keys & Highest NF
Scenario: R(A, B, C, D, E) with FDs: {AB โ C, C โ D, D โ E, E โ A}. Find all candidate keys and the highest normal form.
๐ก Complete Solution
Solution
Step 1: Attribute analysis
B appears ONLY on LEFT side โ B must be in every candidate key.
Step 2: Find closures starting with B + each other attribute
{A, B}โบ: ABโC โ {A,B,C}. CโD โ {A,B,C,D}. DโE โ {A,B,C,D,E} โ ALL!
{B, C}โบ: CโD โ {B,C,D}. DโE โ {B,C,D,E}. EโA โ {A,B,C,D,E} โ ALL!
{B, D}โบ: DโE โ {B,D,E}. EโA โ {A,B,D,E}. ABโC โ {A,B,C,D,E} โ ALL!
{B, E}โบ: EโA โ {A,B,E}. ABโC โ {A,B,C,E}. CโD โ {A,B,C,D,E} โ ALL!
All are superkeys. Check minimality:
Bโบ = {B} โ NOT all. So B alone is not a key.
โด All of {AB, BC, BD, BE} are CANDIDATE KEYS (minimal).
Step 3: Prime and non-prime attributes
Prime: A, B, C, D, E โ ALL are prime! (every attribute is in some CK)
Non-prime: NONE
Step 4: Check normal forms
1NF: โ (assumed atomic)
2NF: Check partial dependencies on composite CKs.
ABโC: full (C depends on all of AB) โ
But CโD: C is part of CK {BC}. D depends on just C, not full BC.
However, D is a prime attribute โ 2NF only restricts NON-prime partials.
Since ALL attributes are prime, 2NF is satisfied. โ
3NF: For each FD, check: is LHS a superkey OR is RHS prime?
ABโC: AB is superkey โ
CโD: C is NOT a superkey. But D IS prime. โ (3NF exception)
DโE: D is NOT a superkey. But E IS prime. โ
EโA: E is NOT a superkey. But A IS prime. โ
โด 3NF satisfied. โ
BCNF: For each FD, is LHS a superkey?
CโD: C is NOT a superkey (Cโบ = {C,D,E,A} โ all, missing B). โ
โด NOT in BCNF. โ
Answer: Candidate Keys = {AB, BC, BD, BE}. Highest NF = 3NF (not BCNF).
๐ข Industry Problem #3 โ E-Commerce Denormalization Design
Scenario: Flipkart's normalized product catalog (5 tables: products, categories, sellers, images, reviews) takes 200ms for a JOIN query. The homepage needs <50ms response time. Design a denormalization strategy.
๐ก Solution
SQL โ Denormalized Read Table
-- Normalized source tables (OLTP โ source of truth)
-- products, categories, sellers, product_images, reviews
-- Denormalized materialized view (for read performance)
CREATE MATERIALIZED VIEW mv_product_listing
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND AS
SELECT
p.product_id, p.product_name, p.price,
c.category_name,
s.seller_name,
(SELECT image_url FROM product_images pi
WHERE pi.product_id = p.product_id AND pi.is_primary = 1
AND ROWNUM = 1) AS primary_image,
(SELECT ROUND(AVG(rating),1) FROM reviews r
WHERE r.product_id = p.product_id) AS avg_rating,
(SELECT COUNT(*) FROM reviews r
WHERE r.product_id = p.product_id) AS review_count
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN sellers s ON p.seller_id = s.seller_id;
-- Refresh every 15 minutes via scheduled job
-- Homepage queries hit mv_product_listing (single table, no JOINs)
-- Write operations (new product, review) go to normalized tables
Architecture: Writes โ Normalized OLTP tables (3NF). Reads โ Denormalized materialized view. This is the standard CQRS (Command Query Responsibility Segregation) pattern used by Flipkart, Amazon, and Swiggy.
Lab Exercises
Exercise 1: Identify Anomalies & FDs
Given Table: hospital(doctor_id, doctor_name, dept, dept_phone, patient_id, patient_name, diagnosis, bill_amount)
Tasks:
- Identify all functional dependencies
- Give one example each of insertion, update, and deletion anomaly
- Identify the candidate key(s)
- Classify each FD as full, partial, or transitive
Exercise 2: Compute Attribute Closure & Find Candidate Keys
Given: R(A, B, C, D, E, F) with FDs: {A โ B, BC โ D, D โ E, CF โ A}
Tasks:
- Compute Aโบ, Bโบ, Cโบ, {BC}โบ, {CF}โบ, {ACF}โบ
- Find ALL candidate keys
- List prime and non-prime attributes
- Determine the highest normal form
Hint: Which attributes appear only on the LEFT side of FDs? They must be in every key.
Exercise 3: Normalize a Railway Booking Table (UNF โ BCNF)
Given UNF table:
bookings(pnr, passenger_name, phone, train_no, train_name,
class, seat_no, journey_date, from_station, to_station, fare, status)
FDs: pnr โ all passenger+journey details; train_no โ train_name; {train_no, class} โ fare_per_km; {train_no, journey_date, class, seat_no} โ pnr
Tasks:
- Identify all FDs, candidate keys, prime/non-prime attributes
- Normalize step-by-step: 1NF โ 2NF โ 3NF โ BCNF
- Write CREATE TABLE SQL for each final table
- Verify lossless-join property for each decomposition step
Exercise 4: Test Lossless & Dependency-Preserving Decomposition
Given: R(A, B, C, D) with FDs: {A โ B, B โ C, C โ D}
Decompositions to test:
- D1: R1(A, B), R2(B, C), R3(C, D) โ Is this lossless? Dependency-preserving?
- D2: R1(A, B, C), R2(C, D) โ Is this lossless? Dependency-preserving?
- D3: R1(A, C), R2(B, D) โ Is this lossless? Dependency-preserving?
For each: Apply the lossless-join test and check if all FDs can be verified in a single table.
Exercise 5: Complete Schema Design โ E-Commerce (UNF โ 3NF + Denormalization)
Given UNF: A single spreadsheet with: order_id, order_date, customer_name, customer_phone, customer_city, product_name, category, seller_name, seller_gst, quantity, unit_price, discount_pct, gst_pct, total_amount, payment_mode, delivery_status
Tasks:
- Identify all FDs and candidate keys
- Normalize to 3NF/BCNF โ create at least 6 tables
- Write complete CREATE TABLE SQL with all constraints
- INSERT 10+ sample rows per table with realistic Indian data
- Create one materialized view for a "daily sales dashboard" (denormalization)
MCQ Assessment Bank โ 15 Questions
Hover to reveal answer and explanation.
A functional dependency X โ Y means:
- X and Y always have the same value
- For any two tuples with the same X value, they must have the same Y value โ X uniquely determines Y
- Y uniquely determines X
- X and Y are primary keys
๐ข GATE CS defines FDs formally. Know this definition precisely.
Which of Armstrong's axioms states: if X โ Y and Y โ Z, then X โ Z?
- Reflexivity
- Augmentation
- Transitivity
- Decomposition
๐ข Armstrong's axioms are guaranteed GATE questions. Memorize all three + derived rules.
A relation is in 2NF if:
- It is in 1NF and has no repeating groups
- It is in 1NF and no non-prime attribute is partially dependent on any candidate key
- It is in 1NF and has no transitive dependencies
- Every determinant is a superkey
๐ข Know the exact definition of each normal form โ exams test precise definitions.
Why does a transitive dependency cause problems?
- It doesn't โ transitive dependencies are fine
- Because a non-key attribute determines another non-key attribute, causing the same fact to be stored redundantly across multiple rows. Updating one copy but not others causes data inconsistency (update anomaly)
- It violates 1NF
- It causes the database to crash
๐ข This is the most intuitive way to explain why normalization matters โ use this in interviews.
What is the difference between 3NF and BCNF?
- They are identical
- 3NF allows a non-trivial FD XโA where X is not a superkey, IF A is a prime attribute. BCNF requires X to be a superkey for every non-trivial FD โ no exceptions. BCNF is strictly stronger than 3NF
- BCNF is weaker than 3NF
- 3NF requires all attributes to be prime
๐ข GATE CS 2020, 2022 asked exactly this difference. Know the formal definitions.
A decomposition of R into R1 and R2 is lossless if and only if:
- R1 and R2 have the same number of columns
- R1 โฉ R2 is a superkey of R1 or R2 (the common attributes can determine all attributes of at least one of the decomposed tables)
- R1 and R2 have no common attributes
- All FDs are preserved
๐ข The lossless-join test is a standard GATE question worth 2-5 marks.
Given R(A, B, C, D) with FDs: {A โ B, B โ C, A โ D}. Compute Aโบ.
- {A}
- {A, B}
- {A, B, C, D}
- {A, D}
๐ข Closure computation is the most frequently tested algorithm in GATE DBMS.
Given R(A, B, C, D, E) with FDs: {AB โ C, C โ D, D โ E}. What is the highest normal form?
- 1NF
- 2NF
- 3NF
- BCNF
๐ข Correction: The answer is 2NF (not 1NF). It satisfies 2NF (no partial deps since AB is fully needed for C) but violates 3NF (CโD where C is non-prime). Always trace through each NF systematically.
Decompose R(A, B, C, D) with FDs: {A โ B, B โ C, A โ D} into 3NF using the synthesis algorithm.
- R1(A, B, C, D) โ no decomposition needed
- R1(A, B, D) from merging AโB and AโD; R2(B, C) from BโC. CK {A} is in R1. Lossless (R1โฉR2={B}, B is key of R2) and dependency-preserving (all FDs in a single table)
- R1(A, B), R2(B, C), R3(A, D) โ three tables
- R1(A, C), R2(B, D)
๐ข The 3NF synthesis algorithm is a procedural GATE question. Practice 5+ examples.
Given R(A, B, C, D, E) with FDs: {AB โ CDE, C โ A}. Candidate Keys: {AB, BC}. Is R in BCNF? Analyze.
- Yes, it's in BCNF
- No โ ABโCDE: AB is a superkey โ. But CโA: C is NOT a superkey (Cโบ = {C,A} โ all). CโA violates BCNF. However, A IS prime (part of CK {AB}), so 3NF is satisfied. Result: 3NF but NOT BCNF
- It's not even in 3NF
- Cannot determine without more FDs
๐ข This exact pattern appeared in GATE CS 2021. Memorize: "3NF but not BCNF when a non-superkey determines a prime attribute."
R(A,B,C) with FDs: {AโB, BโC}. Decomposition D1: R1(A,B), R2(A,C). D2: R1(A,B), R2(B,C). Analyze both for lossless-join and dependency preservation.
- Both are lossless and dependency-preserving
- D1: Lossless (A is key of both), dependency-preserving (AโB in R1, but BโC? B not in R1 with C... only AโB in R1 and AโC derivable in R2. But original FD BโC is LOST โ can't be checked in a single table). D2: Lossless (B is key of R2), dependency-preserving (AโB in R1, BโC in R2 โ both preserved). D2 is superior.
- D1 is better
- Neither is lossless
๐ข Analyzing decomposition quality (lossless + preserving) is a 5-mark GATE question.
A BCNF decomposition of R loses the FD {student_id, subject} โ advisor. The DBA team debates: stay at 3NF (preserves the FD) or go to BCNF (loses it). Evaluate.
- Always go to BCNF
- Stay at 3NF if the lost FD represents a critical business rule that must be enforced. In BCNF, you'd need a trigger or application-level check to enforce the lost FD, adding complexity. 3NF preserves all dependencies within tables, making constraint enforcement simpler. The minor redundancy in 3NF is acceptable if it preserves important business rules
- Dependencies don't matter
- Use 1NF instead
๐ข This is a real architectural decision in enterprise database design. Know the trade-offs for senior-level interviews.
A startup's CTO argues: "Normalization is outdated. Just store everything in one big table โ modern SSDs and cloud databases are fast enough." Evaluate this claim.
- The CTO is correct โ normalization is outdated
- Wrong for OLTP (transactional systems). Normalization prevents data anomalies โ no amount of hardware speed fixes inconsistent data. A denormalized OLTP system will eventually have rows where the same customer has two different phone numbers, or a department has two different HODs. However, for OLAP (analytical/reporting systems), flat denormalized tables (star schema, data warehouses) are indeed standard โ because they're read-only and don't face update anomalies
- Normalization causes slow queries
- One table is always better
๐ข This debate comes up in system design interviews. The answer is always: "normalize for correctness, denormalize for performance, never skip normalization."
Given an unnormalized invoice table: invoice_id, date, customer_name, phone, items (comma-separated), quantities, prices, gst_no, total. Design a 3NF schema.
- Keep as one table
- Decompose into: customers(customer_id PK, name, phone, gst_no), invoices(invoice_id PK, customer_id FK, date, total), invoice_items(item_id PK, invoice_id FK, product_name, quantity, unit_price, line_total). Products could be further extracted: products(product_id PK, product_name, unit_price), with invoice_items referencing product_id
- Two tables: invoices and items
- Store as JSON
๐ข Invoice/billing schema design is a common campus placement and TCS/Infosys project assignment.
Design a complete normalized schema for a hospital management system (patients, doctors, departments, appointments, prescriptions, medicines). Start from a single unnormalized table and show the final 3NF/BCNF tables with SQL.
- One table is sufficient
- 6+ tables: departments(dept_id PK, name, hod), doctors(doctor_id PK, name, specialization, dept_id FK, salary), patients(patient_id PK, name, phone, dob, blood_group), appointments(appt_id PK, patient_id FK, doctor_id FK, date, status, fee), medicines(med_id PK, name, manufacturer, price), prescriptions(rx_id PK, appt_id FK, med_id FK, dosage, duration). All in BCNF โ every determinant is a PK/superkey
- Three tables maximum
- Use a spreadsheet
๐ข Hospital/university schema design from UNF to 3NF is a guaranteed lab exam question.
Chapter Summary
๐ฏ 3 Skills This Chapter Unlocks
- Schema Design Quality โ You can identify and eliminate redundancy in any database. This prevents data corruption that costs companies lakhs in debugging and fixes.
- GATE CS Normalization โ Closure computation, candidate key finding, NF determination, decomposition โ these topics carry 5-8 marks in GATE CS every year.
- Architecture Decisions โ Knowing when to normalize (OLTP) vs denormalize (OLAP) is a senior-level skill that differentiates you in interviews at product companies.
๐ Normalization Quick Reference
CLOSURE: Xโบ = start{X}, apply all FDs iteratively, stop when no change
CANDIDATE KEY: minimal set where Xโบ = all attributes
Find: attrs only on LHS must be in every key. Add others until Xโบ = all.
NORMAL FORMS:
1NF: atomic values (no arrays, no CSV, no nested)
2NF: 1NF + no partial deps (non-prime depends on full CK, not part)
3NF: 2NF + for XโA: X is superkey OR A is prime
BCNF: for XโA: X is ALWAYS superkey (strictest)
DECOMPOSITION TEST:
Lossless: R1 โฉ R2 โ R1 or R1 โฉ R2 โ R2
Dependency-preserving: all FDs within single decomposed table
3NF SYNTHESIS: minimal cover โ table per FD โ merge same LHS โ add CK table
DENORMALIZE WHEN: read-heavy dashboards, sub-100ms response needed,
data freshness of 15min+ acceptable. NEVER for OLTP source-of-truth.
Interview & Career Preparation
Q1: What is normalization? Why is it needed?
Model Answer: Normalization is the process of organizing a database to reduce redundancy and eliminate anomalies (insertion, update, deletion). It decomposes large tables into smaller, well-structured tables connected by foreign keys. Needed because: without it, the same fact is stored multiple times, leading to inconsistency when one copy is updated but others aren't. Example: storing department HOD in every student row โ changing the HOD requires updating hundreds of rows.
Q2: Explain 1NF, 2NF, 3NF with examples.
Model Answer: 1NF: every cell is atomic (one value). Fix: no comma-separated phone numbers โ use a separate table. 2NF: no partial dependencies โ non-prime attributes depend on the FULL composite key. Fix: student_name depends only on student_id, not {student_id, course_id} โ move to students table. 3NF: no transitive dependencies โ non-prime doesn't determine non-prime. Fix: dept โ dept_hod is transitive through student_id โ dept โ dept_hod โ create departments table.
Q3: What is BCNF? How does it differ from 3NF?
Model Answer: BCNF requires that for every non-trivial FD XโY, X must be a superkey. No exceptions. 3NF has an exception: if Y is a prime attribute (part of some candidate key), the FD is allowed even if X isn't a superkey. BCNF is stricter โ eliminates all FD-based redundancy. Trade-off: BCNF decomposition may lose some FDs (can't be checked in a single table), while 3NF always preserves all FDs.
Q4: How do you find candidate keys from FDs?
Model Answer: Step 1: Identify attributes that appear ONLY on the left side of FDs โ they must be in every candidate key. Step 2: Compute their closure. If closure = all attributes, they form a candidate key. Step 3: If not, add remaining attributes one by one, compute closure each time. Step 4: Check minimality โ remove each attribute and verify the remaining is still a superkey. If removing any attribute breaks it, the set is a minimal candidate key.
Q5: What is a lossless-join decomposition?
Model Answer: A decomposition of R into R1 and R2 is lossless if JOINing R1 and R2 gives back exactly R โ no extra (spurious) tuples. Test: the common attributes (R1 โฉ R2) must be a superkey of at least one of the decomposed tables. If they're not, the JOIN produces extra rows that weren't in the original table. Always verify lossless property when decomposing.
Q6: What is the 3NF synthesis algorithm?
Model Answer: Step 1: Find minimal cover of FDs (decompose RHS, remove redundant FDs, remove extraneous attributes). Step 2: Create a table for each FD โ left side becomes PK. Step 3: Merge tables with the same determinant. Step 4: If no table contains a candidate key, add a table with just the CK. This always produces a lossless, dependency-preserving 3NF decomposition.
Q7: When should you denormalize?
Model Answer: Denormalize when: (1) Read performance is critical (dashboards, search pages needing sub-100ms). (2) The query involves many JOINs across large tables. (3) Slight data staleness is acceptable (15-minute-old data for reports). Techniques: materialized views, computed columns, read replicas with denormalized schema. Rule: always normalize first (source of truth), then denormalize strategically for read performance. Never skip normalization for OLTP systems.
Q8: What are Armstrong's axioms?
Model Answer: Three core axioms: (1) Reflexivity: if Y โ X then XโY (trivial). (2) Augmentation: if XโY then XZโYZ (adding attributes to both sides). (3) Transitivity: if XโY and YโZ then XโZ. These are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Derived rules: Union (XโY, XโZ โ XโYZ), Decomposition (XโYZ โ XโY, XโZ), Pseudo-transitivity (XโY, WYโZ โ WXโZ).
Q9: Can you give a real-world example of each normal form violation?
Model Answer: 1NF violation: Excel sheet with "Phone: 9876,9123" in one cell โ not atomic. 2NF violation: University table with {student_id, course_id} as PK, but student_name depends only on student_id (partial). 3NF violation: Employee table where emp_id โ dept โ dept_location (transitive โ dept_location is a fact about dept, not about employee). BCNF violation: Advisor table where advisor โ subject but advisor isn't a superkey (rare, involves overlapping candidate keys).
Q10: Normalize vs Denormalize โ how do companies like Flipkart handle this?
Model Answer: Flipkart uses a dual-model architecture: (1) Normalized PostgreSQL database for OLTP โ order placement, payment processing, inventory updates. Full 3NF/BCNF. ACID transactions. (2) Denormalized data warehouse (BigQuery/Redshift) for OLAP โ sales analytics, recommendation engine, reporting. Star schema with fact and dimension tables. The normalized system feeds the denormalized system via ETL/streaming pipelines. This is the industry-standard CQRS pattern.
๐ GATE CS Normalization Strategy
- Always start by computing closures and finding candidate keys
- Classify attributes as prime/non-prime before checking NFs
- Check NFs bottom-up: 1NF โ 2NF โ 3NF โ BCNF. Stop at the first failure.
- Practice: 15-20 problems from previous GATE papers (2015-2024 have normalization every year)