Database Management Systems

Unit 4: Normalization

Functional Dependencies, Normal Forms (1NF → BCNF), Decomposition & Denormalization — the science of designing databases that don't break.

🏢 Oracle & PostgreSQL | 📝 15 MCQs (Bloom's) | 🔬 5 Lab Exercises | 💼 Interview Prep

Section 1

Why This Chapter Pays Your Salary

Normalization separates database coders from database architects. A poorly normalized schema causes data corruption, storage waste, and query nightmares. A well-normalized schema is elegant, consistent, and maintainable. GATE CS dedicates 5-8 marks to normalization every year. Every DBA interview at TCS, Infosys, Oracle, and product companies asks normalization questions.

🏢 Industry Snapshot

SBI Core Banking — When SBI migrated from legacy to CBS (Core Banking Solution), normalization consultants spent 6 months redesigning the schema. The original system stored customer address in 15 different tables — a classic redundancy nightmare. After normalization to 3NF, the same data lived in ONE address table referenced by FK everywhere. Result: 60% reduction in storage, zero data inconsistency.

Flipkart — Their product catalog is normalized to 3NF for the source-of-truth (OLTP) database. But their search/display layer is intentionally denormalized — product name, price, image URL, seller name, and rating all flattened into one table for sub-100ms page loads. They normalize for correctness and denormalize for speed.

🇮🇳 SBI | 🇮🇳 Flipkart | 🇮🇳 TCS | 🇮🇳 IRCTC | 🇮🇳 Razorpay | GATE CS

E.F. Codd (IBM) introduced 1NF, 2NF, and 3NF in 1970-72. Boyce and Codd together defined BCNF in 1974. 4NF (by Ronald Fagin, 1977) and 5NF (also Fagin, 1979) handle rare multi-valued and join dependencies. In practice, 3NF/BCNF is sufficient for 99% of real-world databases.

Section 2

Learning Outcomes — Bloom's Taxonomy

Bloom's Level	Outcome Statement
L1 — Remember	Define 1NF, 2NF, 3NF, BCNF; list Armstrong's axioms; recall the definition of functional dependency, candidate key, and prime attribute
L2 — Understand	Explain why update anomalies occur in unnormalized tables; describe the difference between partial and transitive dependencies; explain lossy vs lossless decomposition
L3 — Apply	Compute attribute closure, find candidate keys from a set of FDs, determine the highest normal form of a given relation, and decompose a relation to 3NF/BCNF
L4 — Analyze	Analyze whether a decomposition is lossless and dependency-preserving; compare 3NF and BCNF trade-offs
L5 — Evaluate	Justify when to stop at 3NF vs pursuing BCNF; evaluate denormalization decisions for performance-critical Indian enterprise systems
L6 — Create	Take a raw unnormalized dataset and design a complete normalized schema (BCNF) with SQL implementation and sample data

Section 3

Concept Explanations

3.1 The Problem — Update Anomalies

📌 Why Bad Schema Design Destroys Data

📌 THE VILLAIN: UNNORMALIZED TABLE

Consider a university that stores everything in ONE table:

Unnormalized Table: student_courses
student_id | student_name | dept     | dept_hod     | course_id | course_name      | instructor  | grade
101        | Rahul        | CSE      | Dr. Sharma   | CS301     | DBMS             | Dr. Patel   | A
101        | Rahul        | CSE      | Dr. Sharma   | CS302     | OS               | Dr. Kumar   | B+
102        | Priya        | ECE      | Dr. Reddy    | EC201     | Signals          | Dr. Singh   | A+
103        | Amit         | CSE      | Dr. Sharma   | CS301     | DBMS             | Dr. Patel   | B
103        | Amit         | CSE      | Dr. Sharma   | CS303     | Networks         | Dr. Gupta   | A

🔥 THREE DEADLY ANOMALIES

Anomaly	Problem	Example
Insertion Anomaly	Can't insert partial data	New department "AI" is created but no students are enrolled yet. The row can't be inserted because student_id and course_id (the composite PK) can't be NULL. A department can't exist without a student!
Update Anomaly	Must update multiple rows for one fact	CSE HOD changes from Dr. Sharma to Dr. Verma — must update ALL rows where dept='CSE'. Miss one? Now CSE has TWO different HODs in the database. Data inconsistency.
Deletion Anomaly	Deleting one fact accidentally removes another	Priya (102) drops her only course. Delete her row → we LOSE the fact that ECE's HOD is Dr. Reddy. Department information vanishes with the student.
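The three anomalies can be reproduced on the sample table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the NOT NULL constraints on the key columns and the HOD name 'Dr. Rao' for the new AI department are assumptions made for the demo, not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student_courses (
        student_id INTEGER NOT NULL, student_name TEXT, dept TEXT, dept_hod TEXT,
        course_id TEXT NOT NULL, course_name TEXT, instructor TEXT, grade TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.executemany(
    "INSERT INTO student_courses VALUES (?,?,?,?,?,?,?,?)",
    [
        (101, "Rahul", "CSE", "Dr. Sharma", "CS301", "DBMS", "Dr. Patel", "A"),
        (101, "Rahul", "CSE", "Dr. Sharma", "CS302", "OS", "Dr. Kumar", "B+"),
        (102, "Priya", "ECE", "Dr. Reddy", "EC201", "Signals", "Dr. Singh", "A+"),
        (103, "Amit", "CSE", "Dr. Sharma", "CS301", "DBMS", "Dr. Patel", "B"),
        (103, "Amit", "CSE", "Dr. Sharma", "CS303", "Networks", "Dr. Gupta", "A"),
    ],
)

# Insertion anomaly: a department without students has no key values to offer.
try:
    conn.execute(
        "INSERT INTO student_courses (dept, dept_hod) VALUES ('AI', 'Dr. Rao')"
    )
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Update anomaly: the HOD change reaches only one student's rows.
conn.execute(
    "UPDATE student_courses SET dept_hod = 'Dr. Verma' "
    "WHERE dept = 'CSE' AND student_id = 101"
)
cse_hods = {
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT dept_hod FROM student_courses WHERE dept = 'CSE'"
    )
}

# Deletion anomaly: removing Priya's only enrollment also removes ECE's HOD.
conn.execute("DELETE FROM student_courses WHERE student_id = 102")
ece_rows = conn.execute(
    "SELECT COUNT(*) FROM student_courses WHERE dept = 'ECE'"
).fetchone()[0]

print(insert_blocked, sorted(cse_hods), ece_rows)
```

After the partial update, CSE has two different HODs on record, and after the delete no row mentions ECE at all.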

💡 THE SOLUTION: NORMALIZATION

Decompose the one big table into multiple smaller tables, each describing one kind of thing. Students in one table, departments in another, courses in another, enrollments linking them. Each fact is stored once, so the anomalies disappear.

3.2 Functional Dependencies (FDs)

📌 Functional Dependency — "X Determines Y"

📌 WHAT IT IS

A functional dependency X → Y means: if two tuples have the same value for attribute(s) X, they MUST have the same value for attribute(s) Y. X determines Y. X is the determinant; Y is the dependent.

🌍 REAL-WORLD ANALOGY

Aadhaar number → Name, DOB, Address. If you know someone's Aadhaar number, you can determine their name, DOB, and address. Two people can't have the same Aadhaar number with different names. That's a functional dependency.

⚙️ EXAMPLES FROM STUDENT_COURSES TABLE

Functional Dependencies
student_id → student_name, dept         (student ID determines name and dept)
dept → dept_hod                         (department determines its HOD)
course_id → course_name, instructor     (course ID determines course details)
{student_id, course_id} → grade         (student + course determines grade)

NON-dependencies:
student_name ↛ student_id              (two students can have same name)
dept ↛ student_id                       (department has many students)
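An FD can be tested against sample rows. The sketch below is one minimal way to do it; the helper name fd_holds is hypothetical and the rows are the student_courses sample with a few columns left out. Sample data can only refute an FD: the last call returns True because no two students in this sample share a name, even though student_name → student_id is not a real dependency.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ["student_id", "student_name", "dept", "dept_hod", "course_id", "grade"]
DATA = [
    (101, "Rahul", "CSE", "Dr. Sharma", "CS301", "A"),
    (101, "Rahul", "CSE", "Dr. Sharma", "CS302", "B+"),
    (102, "Priya", "ECE", "Dr. Reddy", "EC201", "A+"),
    (103, "Amit", "CSE", "Dr. Sharma", "CS301", "B"),
    (103, "Amit", "CSE", "Dr. Sharma", "CS303", "A"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

print(fd_holds(rows, ["student_id"], ["student_name", "dept"]))  # True
print(fd_holds(rows, ["dept"], ["dept_hod"]))                    # True
print(fd_holds(rows, ["student_id", "course_id"], ["grade"]))    # True
print(fd_holds(rows, ["dept"], ["student_id"]))                  # False
print(fd_holds(rows, ["student_name"], ["student_id"]))          # True, on this sample only
```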

📌 TYPES OF FDs

Type	Definition	Example
Full FD	Y depends on ALL of X (not a subset)	`{student_id, course_id} → grade`. Grade depends on BOTH — can't remove either.
Partial FD	Y depends on only PART of a composite key	`{student_id, course_id} → student_name`. student_name depends ONLY on student_id, not the full key. Violates 2NF.
Transitive FD	X → Y and Y → Z, so X → Z transitively	`student_id → dept` and `dept → dept_hod`, so `student_id → dept_hod` transitively. Violates 3NF.
Trivial FD	Y is a subset of X	`{student_id, name} → student_id`. Always true — trivial, not interesting.

3.3 Attribute Closure & Armstrong's Axioms

📌 Attribute Closure — Finding What X Can Determine

📌 WHAT IT IS

The closure of X (written X⁺) is the set of ALL attributes that can be determined from X using the given FDs. It's the key tool for finding candidate keys and testing normal forms.

⚙️ STEP-BY-STEP EXAMPLE

Closure Computation
Given: R(A, B, C, D, E)
FDs: { A → B, B → C, A → D, D → E }

Find A⁺ (closure of A):
  Step 0: A⁺ = {A}             (start with A itself)
  Step 1: A → B  ⇒ A⁺ = {A, B}
  Step 2: B → C  ⇒ A⁺ = {A, B, C}
  Step 3: A → D  ⇒ A⁺ = {A, B, C, D}
  Step 4: D → E  ⇒ A⁺ = {A, B, C, D, E}

A⁺ = {A, B, C, D, E} = ALL attributes!
∴ A is a CANDIDATE KEY (can determine every attribute)

Find B⁺:
  Start: B⁺ = {B}. B → C ⇒ B⁺ = {B, C}
  No more FDs apply. B⁺ = {B, C}
  B is NOT a candidate key (doesn't determine A, D, E)
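The closure steps above can be sketched as a small fixed-point loop. This is an illustrative version, assuming each attribute is a single character so that a string such as "AB" stands for the set {A, B}; the function name closure is my own.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole left side is already known
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E")]

print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("B", fds)))  # ['B', 'C']
```

The loop repeats until a full pass adds nothing, which matches "No more FDs apply" in the hand computation.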

📌 ARMSTRONG'S AXIOMS (for GATE CS)

Axiom	Rule	Example	Meaning
Reflexivity	If Y ⊆ X, then X → Y	{A,B} → A	A set always determines its subsets (trivial)
Augmentation	If X → Y, then XZ → YZ	A → B ⇒ AC → BC	Adding the same attribute to both sides preserves FD
Transitivity	If X → Y and Y → Z, then X → Z	A → B, B → C ⇒ A → C	Chain of dependencies

Derived rules (from the 3 axioms):

Rule	Formal	Example
Union	X → Y, X → Z ⇒ X → YZ	A → B, A → C ⇒ A → BC
Decomposition	X → YZ ⇒ X → Y, X → Z	A → BC ⇒ A → B, A → C
Pseudo-transitivity	X → Y, WY → Z ⇒ WX → Z	A → B, CB → D ⇒ CA → D
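A practical way to check whether an FD follows from a given set is the closure test: X → Y is implied exactly when Y ⊆ X⁺. The sketch below applies that test to the three derived-rule examples; implies is a hypothetical helper name and attributes are assumed to be single characters.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds (rhs lies inside the closure of lhs)."""
    return set(rhs) <= closure(lhs, fds)


# Union: A → B, A → C ⇒ A → BC
print(implies([("A", "B"), ("A", "C")], "A", "BC"))    # True
# Decomposition: A → BC ⇒ A → B
print(implies([("A", "BC")], "A", "B"))                # True
# Pseudo-transitivity: A → B, CB → D ⇒ CA → D
print(implies([("A", "B"), ("CB", "D")], "CA", "D"))   # True
# FDs are one-directional: A → B does not give B → A
print(implies([("A", "B")], "B", "A"))                 # False
```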

Finding Candidate Keys — Algorithm

Algorithm: Finding Candidate Keys
Given: R(A, B, C, D, E, F), FDs: {AB → C, C → D, D → E, CF → B}

Step 1: Find attributes that appear ONLY on LEFT side of FDs → must be in every key
  A appears only on left (or nowhere on right) → A must be in every key
  F appears only on left → F must be in every key
  
Step 2: Find closure of mandatory attributes
  {A, F}⁺:
    Start: {A, F}
    No FD with just A or F on left gives new attrs...
    {A, F}⁺ = {A, F}  ← Not all attributes! Need more.

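Step 3: Add remaining attributes one by one
  Try {A, F, B}⁺:
    AB → C ⇒ {A, B, C, F}
    C → D ⇒ {A, B, C, D, F}
    D → E ⇒ {A, B, C, D, E, F} = ALL ✓
  {A, B, F} is a superkey.
  
  Try {A, F, C}⁺:
    C → D ⇒ {A, C, D, F}
    D → E ⇒ {A, C, D, E, F}
    CF → B ⇒ {A, B, C, D, E, F} = ALL ✓
  {A, C, F} is a superkey.

  Try {A, F, D}⁺:
    D → E ⇒ {A, D, E, F}. No other FD applies. NOT all ✗
  Try {A, F, E}⁺:
    No FD applies. {A, E, F} is NOT all ✗
  Adding both D and E still gives only {A, D, E, F}, so every key needs B or C.

Step 4: Check minimality (no proper subset is a superkey)
  A and F are in every key, so only the added attribute can be dropped.
  {A, B, F}: Can we remove B? {A, F}⁺ = {A, F} is NOT all. So B is needed. ✓ Minimal.
  {A, C, F}: Can we remove C? {A, F}⁺ = {A, F} is NOT all. So C is needed. ✓ Minimal.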

∴ Candidate Keys: {A, B, F} and {A, C, F}
  Prime attributes: A, B, C, F (part of at least one CK)
  Non-prime attributes: D, E
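The four steps above can be sketched as a brute-force search. This is an illustrative version, not an optimized one: it fixes the attributes that never appear on a right side, then tries additions in increasing size and skips any set that already contains a smaller key. Function names are my own and attributes are assumed to be single characters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys of the relation as a list of sets."""
    attributes = set(attributes)
    on_right = set().union(*(set(rhs) for _, rhs in fds))
    core = attributes - on_right          # Step 1: must be in every key
    rest = sorted(attributes - core)
    keys = []
    for size in range(len(rest) + 1):     # Steps 2-3: grow the core
        for extra in combinations(rest, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue                  # Step 4: contains a smaller key
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [("AB", "C"), ("C", "D"), ("D", "E"), ("CF", "B")]
keys = candidate_keys("ABCDEF", fds)
prime = set().union(*keys)

print(sorted("".join(sorted(k)) for k in keys))  # ['ABF', 'ACF']
print(sorted(prime))                             # ['A', 'B', 'C', 'F']
```

Because candidates are tried smallest first, any superkey that survives the "contains a smaller key" check is minimal. The search is exponential in the number of non-core attributes, which is fine for exam-sized relations.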

3.4 Normal Forms — The Normalization Ladder

Normal Form Hierarchy
UNF (Unnormalized)
 ↓  Remove repeating groups / make atomic
1NF (First Normal Form)
 ↓  Remove partial dependencies
2NF (Second Normal Form)
 ↓  Remove transitive dependencies
3NF (Third Normal Form)
 ↓  Every determinant is a superkey
BCNF (Boyce-Codd Normal Form)
 ↓  Remove multi-valued dependencies (rare)
4NF
 ↓  Remove join dependencies (very rare)
5NF

💡 Industry standard: 3NF or BCNF is sufficient for 99% of databases.

1NF

First Normal Form — Atomic Values

Rule: Every cell must contain exactly one value (atomic). No repeating groups, no arrays, no comma-separated lists, no nested tables.

❌ Violates 1NF:

NOT 1NF
student_id | name  | phone_numbers          | courses
101        | Rahul | 9876543210, 9123456789 | CS301, CS302, CS303
102        | Priya | 9888777666             | EC201

Problems: phone_numbers has multiple values in one cell. courses is a repeating group.

✅ Fixed — 1NF:

1NF — Decomposed
Table: students               Table: student_phones         Table: enrollments
student_id | name              student_id | phone            student_id | course_id
101        | Rahul             101        | 9876543210       101        | CS301
102        | Priya             101        | 9123456789       101        | CS302
                               102        | 9888777666       101        | CS303
                                                             102        | EC201

Common mistake: "Store phone numbers as VARCHAR(100) with commas: '9876543210,9123456789'." This technically passes 1NF by hiding the violation inside a string, but it's terrible design. You can't search individual phones, can't enforce uniqueness, can't index them. Always use a separate table for multi-valued attributes.

2NF

Second Normal Form — No Partial Dependencies

Rule: Must be in 1NF AND every non-prime attribute must be fully functionally dependent on every candidate key. No non-prime attribute may depend on only PART of a composite key.

When does 2NF matter? Only when some candidate key is composite (2+ columns). If every candidate key is a single column, 1NF ⇒ 2NF automatically.

❌ Violates 2NF:

NOT 2NF
Table: enrollments (PK: {student_id, course_id})
student_id | course_id | student_name | dept | course_name | instructor | grade

FDs:
  {student_id, course_id} → grade         ← Full dependency ✓
  student_id → student_name, dept          ← PARTIAL dependency! ✗
  course_id → course_name, instructor      ← PARTIAL dependency! ✗

student_name depends only on student_id, not the full PK.

✅ Fixed — 2NF (remove partial dependencies):

2NF — Decomposed
Table: students (PK: student_id)           Table: courses (PK: course_id)
student_id | student_name | dept            course_id | course_name | instructor
101        | Rahul        | CSE             CS301     | DBMS        | Dr. Patel
102        | Priya        | ECE             CS302     | OS          | Dr. Kumar
103        | Amit         | CSE             CS303     | Networks    | Dr. Gupta
                                            EC201     | Signals     | Dr. Singh

Table: enrollments (PK: {student_id, course_id})
student_id | course_id | grade
101        | CS301     | A
101        | CS302     | B+
102        | EC201     | A+
103        | CS301     | B
103        | CS303     | A

Each non-prime attribute now depends on the FULL PK of its table. ✓

3NF

Third Normal Form — No Transitive Dependencies

Rule: Must be in 2NF AND no non-prime attribute should depend on another non-prime attribute. Formally: for every non-trivial FD X → Y, at least one must be true: (1) X is a superkey, OR (2) Y is a prime attribute (part of some candidate key).

❌ Violates 3NF:

NOT 3NF
Table: students (PK: student_id)
student_id | student_name | dept | dept_hod

FDs:
  student_id → student_name, dept, dept_hod    ← OK (student_id is PK)
  dept → dept_hod                              ← TRANSITIVE! dept is non-prime, dept_hod is non-prime.

Chain: student_id → dept → dept_hod
This is a transitive dependency: student_id determines dept_hod THROUGH dept.

Problem: If Dr. Sharma leaves as CSE HOD, must update EVERY CSE student's row.

✅ Fixed — 3NF (remove transitive dependencies):

3NF — Decomposed
Table: students (PK: student_id)      Table: departments (PK: dept)
student_id | student_name | dept       dept | dept_hod
101        | Rahul        | CSE        CSE  | Dr. Sharma
102        | Priya        | ECE        ECE  | Dr. Reddy
103        | Amit         | CSE

Now dept → dept_hod is in its OWN table where dept IS the PK (superkey). ✓
Updating the HOD requires changing ONE row in departments table. Zero redundancy.

3NF Definition (Exam-ready): A relation R is in 3NF if and only if, for every non-trivial FD X → A in R, either X is a superkey of R, or A is a prime attribute (A is part of some candidate key). The "or A is prime" clause is what differentiates 3NF from BCNF.

BCNF

Boyce-Codd Normal Form — The Strictest Practical Form

Rule: For every non-trivial FD X → Y, X MUST be a superkey. No exceptions. (3NF allows Y to be a prime attribute even if X isn't a superkey; BCNF doesn't.)

When does 3NF ≠ BCNF? (The rare case)

3NF but NOT BCNF
Table: student_advisor (describes: students, subjects, advisors)
student_id | subject    | advisor

FDs:
  {student_id, subject} → advisor    (each student has one advisor per subject)
  advisor → subject                  (each advisor advises only one subject)

Candidate Keys: {student_id, subject} and {student_id, advisor}
Prime attributes: student_id, subject, advisor — ALL are prime!

Check 3NF: advisor → subject. advisor is NOT a superkey, BUT subject IS a prime attribute.
  ∴ Satisfies 3NF (the "or A is prime" exception saves it). ✓

Check BCNF: advisor → subject. advisor is NOT a superkey.
  ∴ Violates BCNF. ✗

✅ BCNF Decomposition:

BCNF — Decomposed
Table: advisor_subject (PK: advisor)       Table: student_advisor (PK: {student_id, advisor})
advisor    | subject                        student_id | advisor
Dr. Patel  | DBMS                           101        | Dr. Patel
Dr. Kumar  | OS                             101        | Dr. Kumar
Dr. Singh  | Signals                        102        | Dr. Singh

Now every determinant IS a superkey in its table. ✓

⚠️ TRADE-OFF: The FD {student_id, subject} → advisor is LOST!
It can't be checked within a single table anymore.
This is why sometimes we STOP at 3NF — it preserves all dependencies.

3NF vs BCNF — When to Choose Which

Feature	3NF	BCNF
Strictness	Allows non-superkey determinant IF dependent is prime	Every determinant MUST be a superkey. No exceptions.
Dependency preservation	✅ Always possible to preserve all FDs	❌ May lose some FDs during decomposition
Lossless decomposition	✅ Always achievable	✅ Always achievable
Redundancy	Slight redundancy possible (rare)	Zero redundancy from FDs
Industry practice	✅ Most databases aim for 3NF minimum	✅ Preferred when dependency loss is acceptable

Industry rule of thumb: Start with BCNF. If decomposition loses important FDs (that you need to enforce via constraints), fall back to 3NF. For 99% of real-world schemas, 3NF and BCNF are identical — the rare case where they differ involves overlapping composite candidate keys.
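The normal-form checks in this section can be combined into one routine. The sketch below is an illustrative implementation under the assumption that the relation is already in 1NF; it tests BCNF and 3NF on the given FDs and tests 2NF by taking closures of proper subsets of each candidate key. All function names are my own.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    attributes = set(attributes)
    core = attributes - set().union(*(set(rhs) for _, rhs in fds))
    rest = sorted(attributes - core)
    keys = []
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = attributes - prime
    # Keep only the non-trivial part of each FD
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if not set(r) <= set(l)]

    def is_superkey(attrs):
        return closure(attrs, fds) == attributes

    if all(is_superkey(l) for l, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(l) or r <= prime for l, r in nontrivial):
        return "3NF"
    for key in keys:  # 2NF: no part of a key may determine a non-prime attribute
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & non_prime:
                    return "1NF"
    return "2NF"


advisor_fds = [(("student_id", "subject"), ("advisor",)), (("advisor",), ("subject",))]
print(highest_normal_form(["student_id", "subject", "advisor"], advisor_fds))  # 3NF
print(highest_normal_form("ABCDE", [("AB", "C"), ("C", "D"), ("D", "E")]))     # 2NF
```

The first call reproduces the student_advisor result: the "A is prime" clause lets it pass 3NF while advisor → subject fails BCNF.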

3.5 Decomposition — Lossless & Dependency-Preserving

📌 Decomposition Rules — Don't Lose Data or Dependencies

📌 LOSSLESS-JOIN DECOMPOSITION

When decomposing R into R1 and R2, the decomposition is lossless if and only if: R1 ∩ R2 → R1 or R1 ∩ R2 → R2 (the common attributes must be a superkey of at least one of the resulting tables).

Lossless Test
R(student_id, student_name, dept, dept_hod)

Decompose into:
  R1(student_id, student_name, dept)   — students table
  R2(dept, dept_hod)                   — departments table

Common attributes: R1 ∩ R2 = {dept}
Is {dept} a superkey of R2? R2 has PK = dept. Yes! ✓
∴ Decomposition is LOSSLESS.

If we JOIN R1 and R2 on dept, we get back EXACTLY the original data.
No extra spurious tuples are generated.

📌 LOSSY DECOMPOSITION (BAD!)

Lossy Decomposition — WRONG!
R(A, B, C) with FD: A → B

Decompose into:
  R1(A, C)
  R2(B, C)

Common: R1 ∩ R2 = {C}
Is C a key of R1 or R2? NO! C doesn't determine anything.
∴ This is LOSSY — JOINing R1 and R2 on C produces EXTRA rows (spurious tuples).

NEVER do lossy decomposition. Always verify the lossless condition.
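Both examples above can be checked mechanically. The sketch below applies the binary lossless-join test (common attributes must determine all of R1 or all of R2) and then shows spurious tuples for the lossy case; the two sample rows for R(A, B, C) are made-up data chosen so that A → B holds, and the function names are my own.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: closure of R1 ∩ R2 must cover all of R1 or all of R2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


student_fds = [
    (("student_id",), ("student_name", "dept")),
    (("dept",), ("dept_hod",)),
]
print(is_lossless(("student_id", "student_name", "dept"),
                  ("dept", "dept_hod"), student_fds))   # True
print(is_lossless("AC", "BC", [("A", "B")]))            # False

# Spurious tuples for the lossy case, on hypothetical rows of R(A, B, C)
original = {(1, "x", "c"), (2, "y", "c")}
r1 = {(a, c) for a, b, c in original}                   # project onto (A, C)
r2 = {(b, c) for a, b, c in original}                   # project onto (B, C)
joined = {(a, b, c) for a, c in r1 for b, c2 in r2 if c == c2}
spurious = joined - original

print(sorted(spurious))  # [(1, 'y', 'c'), (2, 'x', 'c')]
```

The join returns four rows for two original rows. "Lossy" means the original rows can no longer be told apart from the extra ones.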

📌 DEPENDENCY-PRESERVING DECOMPOSITION

A decomposition preserves dependencies if every FD can be checked within a single resulting table (without needing to JOIN tables), or is implied by FDs that can. The 3NF synthesis algorithm always produces a dependency-preserving decomposition. A BCNF decomposition may NOT be.
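Dependency preservation can also be tested without computing every projected FD. The sketch below uses the standard iterative check: start from the left side of an FD and repeatedly add whatever each table can derive using only its own attributes. Function names are my own; the two examples are the BCNF and 3NF decompositions from section 3.4.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fd_preserved(fd, schemas, fds):
    """True if fd can be enforced using only FDs local to the given tables."""
    lhs, rhs = fd
    known = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            schema = set(schema)
            # What this table alone can derive from the attributes it shares
            gained = (closure(known & schema, fds) & schema) - known
            if gained:
                known |= gained
                changed = True
    return set(rhs) <= known


def dependency_preserving(schemas, fds):
    return all(fd_preserved(fd, schemas, fds) for fd in fds)


advisor_fds = [(("student_id", "subject"), ("advisor",)), (("advisor",), ("subject",))]
bcnf_tables = [("advisor", "subject"), ("student_id", "advisor")]
print(dependency_preserving(bcnf_tables, advisor_fds))   # False

student_fds = [(("student_id",), ("student_name", "dept")), (("dept",), ("dept_hod",))]
third_nf_tables = [("student_id", "student_name", "dept"), ("dept", "dept_hod")]
print(dependency_preserving(third_nf_tables, student_fds))  # True
```

The first result is the trade-off described for student_advisor: after the BCNF split, {student_id, subject} → advisor can only be checked by joining the two tables.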

3NF Decomposition Algorithm (Synthesis)

Algorithm: 3NF Decomposition
Input: R(A, B, C, D, E), FDs: {A → BC, C → D, D → E}

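Step 1: Find minimal cover (canonical form) of FDs
  (Split right sides, remove extraneous left-side attributes, remove redundant FDs)
  Splitting A → BC gives {A → B, A → C, C → D, D → E}.
  No left side has an extra attribute and no FD is redundant, so this is the minimal cover.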

Step 2: Create a table for each FD
  T1(A, B)    from A → B
  T2(A, C)    from A → C
  T3(C, D)    from C → D
  T4(D, E)    from D → E

Step 3: Merge tables with same determinant (left side)
  T1 and T2 have same determinant A → merge:
  T12(A, B, C)  PK = A

  Final tables: T12(A, B, C), T3(C, D), T4(D, E)

Step 4: Ensure a candidate key is present in at least one table (if not, add a table holding one candidate key)
  CK = {A} (since A⁺ = {A,B,C,D,E})
  A is in T12 ✓

Result: {R1(A,B,C), R2(C,D), R3(D,E)} — all in 3NF, lossless, dependency-preserving.
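The synthesis steps can be sketched in code. This is an illustrative implementation under simplifying assumptions: attributes are single characters, and when several minimal covers exist the one found depends on processing order. Function names are my own.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1a: split right sides into single attributes
    cover = [(set(lhs), {a}) for lhs, rhs in fds for a in rhs if a not in lhs]
    # Step 1b: drop extraneous left-side attributes
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)
    # Step 1c: drop FDs that the remaining ones already imply
    for fd in list(cover):
        others = [f for f in cover if f is not fd]
        if fd[1] <= closure(fd[0], others):
            cover = others
    return cover


def synthesize_3nf(attributes, fds):
    attributes = set(attributes)
    cover = minimal_cover(fds)
    # Steps 2-3: one table per distinct left side (merges same-determinant FDs)
    tables = {}
    for lhs, rhs in cover:
        tables.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = []
    for schema in tables.values():
        if schema not in schemas:
            schemas.append(schema)
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Step 4: add a key table if no table contains a candidate key
    if not any(closure(s, cover) == attributes for s in schemas):
        key = set(attributes)
        for attr in sorted(attributes):
            if closure(key - {attr}, cover) == attributes:
                key -= {attr}
        schemas.append(key)
    return schemas


result = synthesize_3nf("ABCDE", [("A", "BC"), ("C", "D"), ("D", "E")])
print(sorted("".join(sorted(s)) for s in result))  # ['ABC', 'CD', 'DE']
```

The output matches the hand-derived tables T12(A, B, C), T3(C, D), T4(D, E). Step 4 only adds a table when needed, for example R(A, B, C) with the single FD A → B yields (A, B) plus the key table (A, C).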

3.6 Denormalization — When to Break the Rules

📌 Denormalization — Trading Correctness for Speed

📌 WHAT IT IS

Denormalization is the intentional introduction of redundancy into a normalized schema to improve read performance. It's NOT about skipping normalization — it's about normalizing first, then strategically de-normalizing specific tables for speed.

🏢 WHEN TO DENORMALIZE — Indian Industry Examples

Scenario	Normalized Design	Denormalized Design	Why
Flipkart product listing	5-table JOIN: products → categories → sellers → images → ratings	Single `product_listing` table with name, price, image_url, seller_name, avg_rating	Homepage loads in 100ms instead of 500ms. Updated via background job.
SBI account balance	SUM(transactions) per account	Cached `current_balance` column in accounts table, updated on each transaction	Balance lookup is O(1) instead of scanning millions of transactions.
IRCTC seat availability	COUNT available seats from bookings table	Pre-computed `available_seats` counter per train-date-class	Tatkal booking page loads in milliseconds during peak.

⚠️ RISKS

Data inconsistency if redundant copies aren't kept in sync
Increased storage (minor cost in 2025)
More complex write operations (must update all copies)
Must handle sync failures gracefully

Modern denormalization techniques:
(1) Materialized views: the DBMS stores the query result. Oracle can refresh it on commit or on a schedule; PostgreSQL needs an explicit REFRESH MATERIALIZED VIEW.
(2) Generated/computed columns: ALTER TABLE orders ADD COLUMN total_amount NUMERIC GENERATED ALWAYS AS (quantity * unit_price) STORED (PostgreSQL 12+). Automatically maintained, no manual sync.
(3) JSONB columns: store pre-computed summaries as JSON alongside normalized data.
(4) CQRS pattern: separate the read model (denormalized) from the write model (normalized). Used by Razorpay, Swiggy, and modern Indian fintechs.
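The cached-balance idea from the table above can be sketched with a trigger that keeps the redundant column in sync. The example below uses Python's built-in sqlite3 module; the table layout, the trigger name trg_txn_insert, and the amounts are hypothetical, and real banking systems are far more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id      INTEGER PRIMARY KEY,
        current_balance INTEGER NOT NULL DEFAULT 0  -- redundant, derived value
    );
    CREATE TABLE transactions (
        txn_id     INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(account_id),
        amount     INTEGER NOT NULL                 -- negative means debit
    );
    -- Keep the cached balance in step with every new transaction
    CREATE TRIGGER trg_txn_insert AFTER INSERT ON transactions
    BEGIN
        UPDATE accounts
        SET current_balance = current_balance + NEW.amount
        WHERE account_id = NEW.account_id;
    END;
""")

conn.execute("INSERT INTO accounts (account_id) VALUES (1)")
conn.executemany(
    "INSERT INTO transactions (account_id, amount) VALUES (?, ?)",
    [(1, 50000), (1, -12000), (1, 3000)],
)

# Fast path: read the cached column. Slow path: recompute from the source of truth.
cached = conn.execute(
    "SELECT current_balance FROM accounts WHERE account_id = 1"
).fetchone()[0]
recomputed = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE account_id = 1"
).fetchone()[0]

print(cached, recomputed)  # 41000 41000
```

This trigger only covers inserts. If transactions could be updated or deleted, matching triggers would be needed, which is the "more complex write operations" risk listed above.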

Section 4

Industry Problems

🏢 Industry Problem #1 — Normalize a University ERP Table

Scenario: A university stores everything in one table:

UNF Table
student_id | name | dept | dept_hod | course_id | course_name | instructor | credits | grade | hostel | warden | room

FDs:

student_id → name, dept, hostel, room
dept → dept_hod
course_id → course_name, instructor, credits
{student_id, course_id} → grade
hostel → warden  (each hostel has one warden)

Task: Normalize step-by-step from UNF → 1NF → 2NF → 3NF → BCNF. Show each table at each stage.

💡 Complete Solution

Step-by-step Normalization
1NF: Already 1NF (atomic values, no repeating groups assumed)
  Candidate key: {student_id, course_id}

2NF: Remove partial dependencies on PK = {student_id, course_id}
  Partial FDs (everything each part of the PK determines):
    student_id → name, dept, dept_hod, hostel, warden, room
    course_id → course_name, instructor, credits
  
  Decompose:
    students(student_id, name, dept, dept_hod, hostel, warden, room)  PK: student_id
    courses(course_id, course_name, instructor, credits)  PK: course_id
    enrollments(student_id, course_id, grade)  PK: {student_id, course_id}

3NF: Remove transitive dependencies
  In students: student_id → dept → dept_hod (transitive)
    Move dept_hod into its own table:
    departments(dept, dept_hod)  PK: dept
    
  In students: student_id → hostel → warden (transitive)
    Move warden into its own table:
    hostels(hostel, warden)  PK: hostel

  What remains:
    students(student_id, name, dept, hostel, room)  PK: student_id

BCNF: Check — is every determinant a superkey?
  students: student_id → all. student_id is PK. ✓
  departments: dept → dept_hod. dept is PK. ✓
  courses: course_id → all. course_id is PK. ✓
  enrollments: {student_id, course_id} → grade. Composite PK. ✓
  hostels: hostel → warden. hostel is PK. ✓
  All in BCNF! ✓

SQL — Final Normalized Schema
CREATE TABLE departments (
    dept       VARCHAR2(10)  PRIMARY KEY,
    dept_hod   VARCHAR2(50)
);
CREATE TABLE hostels (
    hostel     VARCHAR2(20)  PRIMARY KEY,
    warden     VARCHAR2(50)
);
CREATE TABLE students (
    student_id NUMBER(10)    PRIMARY KEY,
    name       VARCHAR2(50)  NOT NULL,
    dept       VARCHAR2(10)  REFERENCES departments(dept),
    hostel     VARCHAR2(20)  REFERENCES hostels(hostel),
    room       VARCHAR2(10)
);
CREATE TABLE courses (
    course_id  VARCHAR2(10)  PRIMARY KEY,
    course_name VARCHAR2(50) NOT NULL,
    instructor VARCHAR2(50),
    credits    NUMBER(1)     CHECK (credits BETWEEN 1 AND 5)
);
CREATE TABLE enrollments (
    student_id NUMBER(10)    REFERENCES students(student_id),
    course_id  VARCHAR2(10)  REFERENCES courses(course_id),
    grade      CHAR(2),
    PRIMARY KEY (student_id, course_id)
);

🏢 Industry Problem #2 — GATE CS: Find Candidate Keys & Highest NF

Scenario: R(A, B, C, D, E) with FDs: {AB → C, C → D, D → E, E → A}. Find all candidate keys and the highest normal form.

💡 Complete Solution

Solution
Step 1: Attribute analysis
  B appears ONLY on LEFT side → B must be in every candidate key.

Step 2: Find closures starting with B + each other attribute
  {A, B}⁺: AB→C ⇒ {A,B,C}. C→D ⇒ {A,B,C,D}. D→E ⇒ {A,B,C,D,E} ✓ ALL!
  {B, C}⁺: C→D ⇒ {B,C,D}. D→E ⇒ {B,C,D,E}. E→A ⇒ {A,B,C,D,E} ✓ ALL!
  {B, D}⁺: D→E ⇒ {B,D,E}. E→A ⇒ {A,B,D,E}. AB→C ⇒ {A,B,C,D,E} ✓ ALL!
  {B, E}⁺: E→A ⇒ {A,B,E}. AB→C ⇒ {A,B,C,E}. C→D ⇒ {A,B,C,D,E} ✓ ALL!

  All are superkeys. Check minimality:
  B⁺ = {B} — NOT all. So B alone is not a key.
  ∴ All of {AB, BC, BD, BE} are CANDIDATE KEYS (minimal).

Step 3: Prime and non-prime attributes
  Prime: A, B, C, D, E — ALL are prime! (every attribute is in some CK)
  Non-prime: NONE

Step 4: Check normal forms
  1NF: ✓ (assumed atomic)
  2NF: Check partial dependencies on composite CKs.
    AB→C: full (C depends on all of AB) ✓
    But C→D: C is part of CK {BC}. D depends on just C, not full BC.
    However, D is a prime attribute — 2NF only restricts NON-prime partials.
    Since ALL attributes are prime, 2NF is satisfied. ✓
  3NF: For each FD, check: is LHS a superkey OR is RHS prime?
    AB→C: AB is superkey ✓
    C→D: C is NOT a superkey. But D IS prime. ✓ (3NF exception)
    D→E: D is NOT a superkey. But E IS prime. ✓
    E→A: E is NOT a superkey. But A IS prime. ✓
    ∴ 3NF satisfied. ✓
  BCNF: For each FD, is LHS a superkey?
    C→D: C is NOT a superkey (C⁺ = {C,D,E,A} ≠ all, missing B). ✗
    ∴ NOT in BCNF. ✗

Answer: Candidate Keys = {AB, BC, BD, BE}. Highest NF = 3NF (not BCNF).

🏢 Industry Problem #3 — E-Commerce Denormalization Design

Scenario: Flipkart's normalized product catalog (5 tables: products, categories, sellers, images, reviews) takes 200ms for a JOIN query. The homepage needs <50ms response time. Design a denormalization strategy.

💡 Solution

SQL — Denormalized Read Table
-- Normalized source tables (OLTP — source of truth)
-- products, categories, sellers, product_images, reviews

-- Denormalized materialized view for read performance
-- (Oracle syntax: BUILD/REFRESH clauses and ROWNUM)
CREATE MATERIALIZED VIEW mv_product_listing
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND AS
SELECT
    p.product_id, p.product_name, p.price,
    c.category_name,
    s.seller_name,
    (SELECT image_url FROM product_images pi 
     WHERE pi.product_id = p.product_id AND pi.is_primary = 1
     AND ROWNUM = 1) AS primary_image,
    (SELECT ROUND(AVG(rating),1) FROM reviews r 
     WHERE r.product_id = p.product_id) AS avg_rating,
    (SELECT COUNT(*) FROM reviews r 
     WHERE r.product_id = p.product_id) AS review_count
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN sellers s ON p.seller_id = s.seller_id;

-- Refresh every 15 minutes via scheduled job
-- Homepage queries hit mv_product_listing (single table, no JOINs)
-- Write operations (new product, review) go to normalized tables

Architecture: Writes → Normalized OLTP tables (3NF). Reads → Denormalized materialized view. This is the standard CQRS (Command Query Responsibility Segregation) pattern used by Flipkart, Amazon, and Swiggy.

Section 5

Lab Exercises

Exercise 1: Identify Anomalies & FDs

⏱ 30 minutes | 🟢 Beginner

Given Table: hospital(doctor_id, doctor_name, dept, dept_phone, patient_id, patient_name, diagnosis, bill_amount)

Tasks:

Identify all functional dependencies
Give one example each of insertion, update, and deletion anomaly
Identify the candidate key(s)
Classify each FD as full, partial, or transitive

Exercise 2: Compute Attribute Closure & Find Candidate Keys

⏱ 40 minutes | 🟡 Intermediate

Given: R(A, B, C, D, E, F) with FDs: {A → B, BC → D, D → E, CF → A}

Tasks:

Compute A⁺, B⁺, C⁺, {BC}⁺, {CF}⁺, {ACF}⁺
Find ALL candidate keys
List prime and non-prime attributes
Determine the highest normal form

Hint: Which attributes appear only on the LEFT side of FDs? They must be in every key.

Exercise 3: Normalize a Railway Booking Table (UNF → BCNF)

⏱ 50 minutes | 🟡 Intermediate

Given UNF table:

bookings(pnr, passenger_name, phone, train_no, train_name, 
  class, seat_no, journey_date, from_station, to_station, fare_per_km, fare, status)

FDs: pnr → all passenger+journey details; train_no → train_name; {train_no, class} → fare_per_km; {train_no, journey_date, class, seat_no} → pnr

Tasks:

Identify all FDs, candidate keys, prime/non-prime attributes
Normalize step-by-step: 1NF → 2NF → 3NF → BCNF
Write CREATE TABLE SQL for each final table
Verify lossless-join property for each decomposition step

Exercise 4: Test Lossless & Dependency-Preserving Decomposition

⏱ 35 minutes | 🟡 Intermediate

Given: R(A, B, C, D) with FDs: {A → B, B → C, C → D}

Decompositions to test:

D1: R1(A, B), R2(B, C), R3(C, D) — Is this lossless? Dependency-preserving?
D2: R1(A, B, C), R2(C, D) — Is this lossless? Dependency-preserving?
D3: R1(A, C), R2(B, D) — Is this lossless? Dependency-preserving?

For each: Apply the lossless-join test and check if all FDs can be verified in a single table.

Exercise 5: Complete Schema Design — E-Commerce (UNF → 3NF + Denormalization)

⏱ 60 minutes | 🔴 Advanced

Given UNF: A single spreadsheet with: order_id, order_date, customer_name, customer_phone, customer_city, product_name, category, seller_name, seller_gst, quantity, unit_price, discount_pct, gst_pct, total_amount, payment_mode, delivery_status

Tasks:

Identify all FDs and candidate keys
Normalize to 3NF/BCNF — create at least 6 tables
Write complete CREATE TABLE SQL with all constraints
INSERT 10+ sample rows per table with realistic Indian data
Create one materialized view for a "daily sales dashboard" (denormalization)

Section 6

MCQ Assessment Bank — 15 Questions


A functional dependency X → Y means:

A. X and Y always have the same value
B. For any two tuples with the same X value, they must have the same Y value: X uniquely determines Y
C. Y uniquely determines X
D. X and Y are primary keys

✅ B. FD X → Y: if two rows agree on X, they MUST agree on Y. It's one-directional — X determines Y, not the reverse. Example: student_id → name (same student_id always maps to same name). X need not be a primary key — any attribute can be a determinant.
🏢 GATE CS defines FDs formally. Know this definition precisely.

L1 — Remember | FD

Which of Armstrong's axioms states: if X → Y and Y → Z, then X → Z?

A. Reflexivity
B. Augmentation
C. Transitivity
D. Decomposition

✅ C. Transitivity. The three axioms: Reflexivity (Y⊆X ⇒ X→Y), Augmentation (X→Y ⇒ XZ→YZ), Transitivity (X→Y, Y→Z ⇒ X→Z). Decomposition is a derived rule (X→YZ ⇒ X→Y and X→Z), not a core axiom.
🏢 Armstrong's axioms are guaranteed GATE questions. Memorize all three + derived rules.

L1 — Remember | Armstrong

A relation is in 2NF if:

A. It is in 1NF and has no repeating groups
B. It is in 1NF and no non-prime attribute is partially dependent on any candidate key
C. It is in 1NF and has no transitive dependencies
D. Every determinant is a superkey

✅ B. 2NF = 1NF + no partial dependencies. A partial dependency exists when a non-prime attribute depends on only PART of a composite candidate key. If all candidate keys are single-attribute, 1NF automatically implies 2NF. Option C describes 3NF. Option D describes BCNF.
🏢 Know the exact definition of each normal form — exams test precise definitions.

L1 — Remember | Normal Forms

Why does a transitive dependency cause problems?

A. It doesn't; transitive dependencies are fine
B. Because a non-key attribute determines another non-key attribute, causing the same fact to be stored redundantly across multiple rows. Updating one copy but not others causes data inconsistency (update anomaly)
C. It violates 1NF
D. It causes the database to crash

✅ B. Example: student_id → dept → dept_hod. The HOD fact is repeated in every student row of that department. If CSE has 500 students, "Dr. Sharma is CSE HOD" is stored 500 times. Updating the HOD requires modifying 500 rows — miss one and the database is inconsistent. 3NF eliminates this by creating a separate departments table.
🏢 This is the most intuitive way to explain why normalization matters — use this in interviews.

L2 — Understand | Anomalies

What is the difference between 3NF and BCNF?

A. They are identical
B. 3NF allows a non-trivial FD X→A where X is not a superkey, IF A is a prime attribute. BCNF requires X to be a superkey for every non-trivial FD, with no exceptions. BCNF is strictly stronger than 3NF
C. BCNF is weaker than 3NF
D. 3NF requires all attributes to be prime

✅ B. The "escape clause" in 3NF: if the dependent attribute (RHS) is prime (part of some candidate key), the FD is allowed even if the determinant isn't a superkey. BCNF has no such exception. In practice, 3NF = BCNF for most schemas. They differ only when there are overlapping composite candidate keys.
🏢 GATE CS 2020, 2022 asked exactly this difference. Know the formal definitions.

L2 — Understand | Normal Forms

A decomposition of R into R1 and R2 is lossless if and only if:

A. R1 and R2 have the same number of columns
B. R1 ∩ R2 is a superkey of R1 or R2 (the common attributes can determine all attributes of at least one of the decomposed tables)
C. R1 and R2 have no common attributes
D. All FDs are preserved

✅ B. For lossless-join: the common attributes (R1 ∩ R2) must functionally determine all of R1 or all of R2 — i.e., they must be a superkey of at least one decomposed table. Without this, JOINing R1 and R2 produces spurious (extra) tuples not in the original R. Option D describes dependency preservation, a separate property.
🏢 The lossless-join test is a standard GATE question worth 2-5 marks.

L2 — Understand | Decomposition

Given R(A, B, C, D) with FDs: {A → B, B → C, A → D}. Compute A⁺.

{A}
{A, B}
{A, B, C, D}
{A, D}

✅ C. A⁺: Start {A}. A→B ⇒ {A,B}. B→C ⇒ {A,B,C}. A→D ⇒ {A,B,C,D}. A⁺ = {A,B,C,D} = all attributes, so A is a superkey. Being a single attribute it is also minimal, and therefore a candidate key.
🏢 Closure computation is the most frequently tested algorithm in GATE DBMS.

L3 — Apply | Closure
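The closure computation traced above can be sketched as a short fixed-point loop. The representation (each FD as a pair of Python sets) and the function name `closure` are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # An FD fires when its whole left side is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# R(A, B, C, D) with {A -> B, B -> C, A -> D}, as in the question above.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"D"})]
a_plus = closure({"A"}, fds)
b_plus = closure({"B"}, fds)
```

Here `a_plus` contains all four attributes, while `b_plus` is only {B, C}, which is why A is a key and B is not.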

Given R(A, B, C, D, E) with FDs: {AB → C, C → D, D → E}. What is the highest normal form?

1NF
2NF
3NF
BCNF

✅ B. 2NF. Step 1: Find the CK. {AB}⁺: AB→C ⇒ {A,B,C}. C→D ⇒ {A,B,C,D}. D→E ⇒ {A,B,C,D,E}. A and B appear on no right-hand side, so every key must contain both; CK = {AB}. Step 2: Prime = {A,B}, Non-prime = {C,D,E}. Step 3: Check 2NF. A⁺ = {A} and B⁺ = {B}, so neither part of the key determines a non-prime attribute, and AB→C is a full dependency. 2NF ✓. Step 4: Check 3NF. C→D: C is not a superkey and D is not prime, so this transitive dependency violates 3NF (D→E fails for the same reason). Highest normal form = 2NF.
🏢 The relation satisfies 2NF (no partial dependencies, since all of AB is needed to determine C) but violates 3NF (C→D, where C is non-prime). Always trace through each NF systematically.

L3 — Apply | Normal Forms
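The step-by-step check above can be automated for small relations. The following is a brute-force sketch (exponential in the number of attributes) that assumes the relation is already in 1NF; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Try subsets by increasing size and keep only the minimal superkeys."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == attrs:
                keys.append(subset)
    return keys


def highest_normal_form(attrs, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' for a relation assumed to be in 1NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    # 2NF: no proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(set(part), fds) - prime:
                    return "1NF"
    # 3NF / BCNF: inspect every given FD whose determinant is not a superkey.
    bcnf = third = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue
        for attr in rhs - lhs:  # ignore the trivial part of the FD
            bcnf = False
            if attr not in prime:
                third = False
    if bcnf:
        return "BCNF"
    return "3NF" if third else "2NF"


# R(A, B, C, D, E) with {AB -> C, C -> D, D -> E}, as in the question above.
attrs = set("ABCDE")
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"E"})]
keys = candidate_keys(attrs, fds)
level = highest_normal_form(attrs, fds)
```

For this relation `keys` holds the single key {A, B} and `level` is "2NF", matching the manual trace.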

Decompose R(A, B, C, D) with FDs: {A → B, B → C, A → D} into 3NF using the synthesis algorithm.

R1(A, B, C, D) — no decomposition needed
R1(A, B, D) from merging A→B and A→D; R2(B, C) from B→C. CK {A} is in R1. Lossless (R1∩R2={B}, B is key of R2) and dependency-preserving (each FD can be checked within one table)
R1(A, B), R2(B, C), R3(A, D) — three tables
R1(A, C), R2(B, D)

✅ B. Synthesis algorithm: (1) Minimal cover: {A→B, A→D, B→C}. (2) Create table per FD: T1(A,B), T2(A,D), T3(B,C). (3) Merge same determinant: T1+T2 → R1(A,B,D). (4) CK={A} is in R1 ✓. Result: R1(A,B,D) PK=A, R2(B,C) PK=B. Lossless: R1∩R2={B}, B→C so B is key of R2 ✓. All FDs preserved: A→B in R1, A→D in R1, B→C in R2. ✓
🏢 The 3NF synthesis algorithm is a procedural GATE question. Practice 5+ examples.

L3 — Apply | Decomposition
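Steps (2) to (4) of the synthesis algorithm can be sketched as follows. This example assumes the FDs passed in are already a minimal cover (step 1 is not implemented here), and the names `synthesize_3nf` and `find_key` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(attrs, fds):
    """Shrink the full attribute set to one candidate key."""
    key = set(attrs)
    for attr in sorted(attrs):
        if closure(key - {attr}, fds) == attrs:
            key.discard(attr)  # still a superkey without it, so drop it
    return key


def synthesize_3nf(attrs, minimal_cover):
    """3NF synthesis, assuming `minimal_cover` really is a minimal cover."""
    # Steps 2 and 3: one table per determinant, merging FDs with the same LHS.
    groups = {}
    for lhs, rhs in minimal_cover:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    tables = [set(lhs) | rhs for lhs, rhs in groups.items()]
    # Step 4: a table contains a candidate key exactly when it is a superkey of R.
    if not any(closure(table, minimal_cover) == attrs for table in tables):
        tables.append(find_key(attrs, minimal_cover))
    return tables


# R(A, B, C, D) with minimal cover {A -> B, A -> D, B -> C}.
cover = [({"A"}, {"B"}), ({"A"}, {"D"}), ({"B"}, {"C"})]
tables = synthesize_3nf(set("ABCD"), cover)
```

On the question's input this yields the two tables {A, B, D} and {B, C}, and no extra key table is needed because {A, B, D} already contains the candidate key A.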

Q10

Given R(A, B, C, D, E) with FDs: {AB → CDE, C → A}. Candidate Keys: {AB, BC}. Is R in BCNF? Analyze.

Yes, it's in BCNF
No — AB→CDE: AB is a superkey ✓. But C→A: C is NOT a superkey (C⁺ = {C,A} ≠ all). C→A violates BCNF. However, A IS prime (part of CK {AB}), so 3NF is satisfied. Result: 3NF but NOT BCNF
It's not even in 3NF
Cannot determine without more FDs

✅ B. Check BCNF: every determinant must be a superkey. AB→CDE: {AB}⁺ = all ✓. C→A: {C}⁺ = {C,A}. C is not a superkey ✗. BCNF violated. Check 3NF: C→A: C is not superkey, but A is prime (part of CK {AB}). The 3NF exception applies ✓. Result: in 3NF, not in BCNF. This is the classic scenario where 3NF ≠ BCNF.
🏢 This exact pattern appeared in GATE CS 2021. Memorize: "3NF but not BCNF when a non-superkey determines a prime attribute."

L4 — Analyze | 3NF vs BCNF
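To see what moving this relation from 3NF to BCNF would involve, here is a sketch of one BCNF decomposition step on the violating FD C→A. The helper names `is_bcnf` and `split_on` are hypothetical, and the BCNF test enumerates attribute subsets, so it is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(rel, fds):
    """A (sub-)relation is in BCNF if every subset X determines nothing new or all of rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel
            if determined != x and determined != rel:
                return False
    return True


def split_on(rel, lhs, fds):
    """One decomposition step on a determinant `lhs` that is not a superkey."""
    x_plus = closure(lhs, fds) & rel
    r1 = x_plus                # lhs together with everything it determines
    r2 = (rel - x_plus) | lhs  # the remainder, keeping lhs as the join column
    return r1, r2


# R(A, B, C, D, E) with {AB -> CDE, C -> A}, as in Q10.
rel = set("ABCDE")
fds = [({"A", "B"}, {"C", "D", "E"}), ({"C"}, {"A"})]
r1, r2 = split_on(rel, {"C"}, fds)
```

The split gives R1(A, C) and R2(B, C, D, E), both in BCNF. The price is that AB→CDE can no longer be checked in a single table, since A is absent from R2 and B, D, E are absent from R1.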

Q11

R(A,B,C) with FDs: {A→B, B→C}. Decomposition D1: R1(A,B), R2(A,C). D2: R1(A,B), R2(B,C). Analyze both for lossless-join and dependency preservation.

Both are lossless and dependency-preserving
D1: Lossless (A is a key of both R1 and R2) but NOT dependency-preserving: A→B can be checked in R1 and the derived A→C in R2, but the original FD B→C is LOST because B and C never appear in the same table. D2: Lossless (B is key of R2) and dependency-preserving (A→B in R1, B→C in R2). D2 is superior.
D1 is better
Neither is lossless

✅ B. D1: R1∩R2 = {A}. A is key of R1(A,B) ✓ → lossless. But FD B→C: B is in R1, C is in R2 — this FD spans two tables and is LOST. D2: R1∩R2 = {B}. B is key of R2(B,C) ✓ → lossless. FDs: A→B in R1 ✓, B→C in R2 ✓ → all preserved. D2 is the correct decomposition. This shows why the synthesis algorithm matters — it guarantees dependency preservation.
🏢 Analyzing decomposition quality (lossless + preserving) is a 5-mark GATE question.

L4 — Analyze | Decomposition
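Both properties analyzed in Q11 can be tested mechanically. The sketch below uses the binary lossless-join test and the standard closure-based preservation test; the function names are assumptions for this example.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: the common attributes must determine all of R1 or all of R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


def preserves(fd, tables, fds):
    """Grow the LHS using only what each table can see; the FD is kept if RHS is reached."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for table in tables:
            gained = closure(result & table, fds) & table
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


# R(A, B, C) with {A -> B, B -> C}.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
d1 = [{"A", "B"}, {"A", "C"}]
d2 = [{"A", "B"}, {"B", "C"}]

d1_lossless = is_lossless(d1[0], d1[1], fds)
d2_lossless = is_lossless(d2[0], d2[1], fds)
d1_preserved = [preserves(fd, d1, fds) for fd in fds]
d2_preserved = [preserves(fd, d2, fds) for fd in fds]
```

Both decompositions pass the lossless test, but `d1_preserved` is `[True, False]` (B→C is lost) while `d2_preserved` is `[True, True]`.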

Q12

A BCNF decomposition of R loses the FD {student_id, subject} → advisor. The DBA team debates: stay at 3NF (preserves the FD) or go to BCNF (loses it). Evaluate.

Always go to BCNF
Stay at 3NF if the lost FD represents a critical business rule that must be enforced. In BCNF, you'd need a trigger or application-level check to enforce the lost FD, adding complexity. 3NF preserves all dependencies within tables, making constraint enforcement simpler. The minor redundancy in 3NF is acceptable if it preserves important business rules
Dependencies don't matter
Use 1NF instead

✅ B. This is a real-world trade-off. BCNF eliminates all FD-based redundancy but may lose dependencies. 3NF preserves all FDs (can enforce via table-level constraints) but may retain slight redundancy. Decision factors: (1) How critical is the lost FD? If it's a billing rule at SBI, losing it is unacceptable. (2) Can the lost FD be enforced via triggers? Yes, but adds complexity. (3) How much redundancy does 3NF introduce? Usually minimal.
🏢 This is a real architectural decision in enterprise database design. Know the trade-offs for senior-level interviews.

L5 — Evaluate | Design

Q13

A startup's CTO argues: "Normalization is outdated. Just store everything in one big table — modern SSDs and cloud databases are fast enough." Evaluate this claim.

The CTO is correct — normalization is outdated
Wrong for OLTP (transactional systems). Normalization prevents data anomalies, and no amount of hardware speed fixes inconsistent data. A denormalized OLTP system will eventually have rows where the same customer has two different phone numbers, or a department has two different HODs. However, for OLAP (analytical/reporting systems), flat denormalized tables (star schema, data warehouses) are indeed standard, because they are bulk-loaded from the normalized source and then queried read-mostly, so update anomalies rarely arise
Normalization causes slow queries
One table is always better

✅ B. Normalization is not primarily a performance question; it is about data CORRECTNESS. A bank can't have two different balances for the same account, and hardware can't fix logical errors. The correct approach: normalize for OLTP (writes), denormalize for OLAP (reads). This dual-system architecture is common at large tech companies, including Indian ones such as Flipkart, Razorpay, and PhonePe: a normalized relational database (e.g., PostgreSQL) for transactions and a denormalized data warehouse (e.g., BigQuery or Redshift) for analytics.
🏢 This debate comes up in system design interviews. The answer is always: "normalize for correctness, denormalize for performance, never skip normalization."

L5 — Evaluate | Architecture

Q14

Given an unnormalized invoice table: invoice_id, date, customer_name, phone, items (comma-separated), quantities, prices, gst_no, total. Design a 3NF schema.

Keep as one table
Decompose into: customers(customer_id PK, name, phone, gst_no), invoices(invoice_id PK, customer_id FK, date, total), invoice_items(item_id PK, invoice_id FK, product_name, quantity, unit_price, line_total). Products could be further extracted: products(product_id PK, product_name, unit_price), with invoice_items referencing product_id
Two tables: invoices and items
Store as JSON

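✅ B. Normalization steps: (1) 1NF: remove the comma-separated items, quantities, and prices → one row per item in a separate invoice_items table. (2) 2NF: with one row per item the key becomes {invoice_id, item}, but date and the customer details depend on invoice_id alone (partial dependency) → keep them in invoices, not in invoice_items. (3) 3NF: customer_name, phone, and gst_no are facts about the customer, not the invoice → separate customers table; product_name and the list unit_price are facts about the product → separate products table. Each table then has a single responsibility with no partial or transitive dependencies. Note: total and line_total are derived from quantity and unit_price, so storing them is a small, deliberate redundancy; a strict 3NF design computes them in a query or view.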
🏢 Invoice/billing schema design is a common campus placement and TCS/Infosys project assignment.

L6 — Create | Schema Design
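The schema in option B can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the column types, the `invoice_date` column name, and the sample rows are hypothetical, and the derived `total` and `line_total` columns are computed by a query instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT,
    gst_no      TEXT
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    unit_price   NUMERIC NOT NULL        -- current list price
);
CREATE TABLE invoices (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
    invoice_date TEXT NOT NULL
);
CREATE TABLE invoice_items (
    item_id    INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(invoice_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL,
    unit_price NUMERIC NOT NULL          -- price charged on this invoice
);
""")

# Hypothetical sample data: one customer, two products, one invoice.
conn.execute("INSERT INTO customers VALUES (1, 'Asha Traders', '9876500000', NULL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Pen", 10), (2, "Notebook", 50)])
conn.execute("INSERT INTO invoices VALUES (100, 1, '2024-01-15')")
conn.executemany("INSERT INTO invoice_items VALUES (?, ?, ?, ?, ?)",
                 [(1, 100, 1, 3, 10), (2, 100, 2, 2, 50)])

# The invoice total is derived on demand, so it can never disagree with the items.
total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM invoice_items WHERE invoice_id = ?",
    (100,),
).fetchone()[0]
```

With foreign keys enabled, an invoice that points at a non-existent customer is rejected with `sqlite3.IntegrityError`, which is the kind of constraint the single comma-separated table could not express.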

Q15

Design a complete normalized schema for a hospital management system (patients, doctors, departments, appointments, prescriptions, medicines). Start from a single unnormalized table and show the final 3NF/BCNF tables with SQL.

One table is sufficient
6+ tables: departments(dept_id PK, name, hod), doctors(doctor_id PK, name, specialization, dept_id FK, salary), patients(patient_id PK, name, phone, dob, blood_group), appointments(appt_id PK, patient_id FK, doctor_id FK, date, status, fee), medicines(med_id PK, name, manufacturer, price), prescriptions(rx_id PK, appt_id FK, med_id FK, dosage, duration). All in BCNF — every determinant is a PK/superkey
Three tables maximum
Use a spreadsheet

✅ B. Each entity has its own table (no mixing of facts). FDs flow naturally: dept_id→dept details, doctor_id→doctor details, patient_id→patient details, appt_id→appointment details, med_id→medicine details. Prescriptions is the junction between appointments and medicines. Each determinant is a PK → all in BCNF. This mirrors the core schema of a typical hospital information system (HIS).
🏢 Hospital/university schema design from UNF to 3NF is a guaranteed lab exam question.

L6 — Create | Schema Design
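The six tables from option B can be written out as SQL and exercised with Python's built-in `sqlite3` module. This is a sketch only: the column types, the `appt_date` column name, and every sample row (names, phone, manufacturer) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL, hod TEXT);
CREATE TABLE doctors (
    doctor_id INTEGER PRIMARY KEY, name TEXT NOT NULL, specialization TEXT,
    dept_id INTEGER NOT NULL REFERENCES departments(dept_id), salary NUMERIC);
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT,
    dob TEXT, blood_group TEXT);
CREATE TABLE appointments (
    appt_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
    doctor_id INTEGER NOT NULL REFERENCES doctors(doctor_id),
    appt_date TEXT NOT NULL, status TEXT, fee NUMERIC);
CREATE TABLE medicines (
    med_id INTEGER PRIMARY KEY, name TEXT NOT NULL, manufacturer TEXT, price NUMERIC);
CREATE TABLE prescriptions (
    rx_id INTEGER PRIMARY KEY,
    appt_id INTEGER NOT NULL REFERENCES appointments(appt_id),
    med_id INTEGER NOT NULL REFERENCES medicines(med_id),
    dosage TEXT, duration TEXT);
""")

# Hypothetical sample rows, inserted parent-first so every FK resolves.
conn.execute("INSERT INTO departments VALUES (1, 'Cardiology', 'Dr. Iyer')")
conn.execute("INSERT INTO doctors VALUES (10, 'Dr. Mehta', 'Cardiologist', 1, 250000)")
conn.execute("INSERT INTO patients VALUES (100, 'Ravi', '9000000000', '1990-05-01', 'O+')")
conn.execute("INSERT INTO appointments VALUES (1000, 100, 10, '2024-02-01', 'completed', 500)")
conn.execute("INSERT INTO medicines VALUES (5, 'Paracetamol', 'Hypothetical Pharma', 20)")
conn.execute("INSERT INTO prescriptions VALUES (1, 1000, 5, '500 mg twice daily', '5 days')")

# Each fact is stored once; a report reassembles them with joins.
rows = conn.execute("""
    SELECT p.name, d.name, m.name, rx.dosage
    FROM prescriptions rx
    JOIN appointments a ON a.appt_id = rx.appt_id
    JOIN patients p     ON p.patient_id = a.patient_id
    JOIN doctors d      ON d.doctor_id = a.doctor_id
    JOIN medicines m    ON m.med_id = rx.med_id
""").fetchall()
```

Renaming a medicine or changing a department's HOD touches exactly one row, and the join picks up the new value everywhere.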

Section 7

Chapter Summary

NORMALIZATION
│
├── WHY NORMALIZE?
│   ├── Insertion Anomaly: can't add data without unrelated data
│   ├── Update Anomaly: same fact updated in some rows but not all
│   └── Deletion Anomaly: removing one fact accidentally removes another
│
├── FUNCTIONAL DEPENDENCIES (FDs)
│   ├── X → Y: same X ⇒ same Y
│   ├── Types: Full, Partial, Transitive, Trivial
│   └── Armstrong's Axioms: Reflexivity, Augmentation, Transitivity
│       + Derived: Union, Decomposition, Pseudo-transitivity
│
├── ATTRIBUTE CLOSURE (X⁺)
│   ├── Algorithm: start with X, apply FDs iteratively
│   ├── Use: find candidate keys (X⁺ = all attrs ⇒ X is superkey)
│   └── Minimize superkey to get candidate key
│
├── NORMAL FORMS
│   ├── 1NF: atomic values, no repeating groups
│   ├── 2NF: 1NF + no partial dependencies on composite CK
│   ├── 3NF: 2NF + no transitive dependencies
│   │   (X→A: X is superkey OR A is prime)
│   ├── BCNF: every determinant is a superkey (no exceptions)
│   └── 3NF vs BCNF: 3NF preserves FDs, BCNF may lose them
│
├── DECOMPOSITION
│   ├── Lossless: R1 ∩ R2 is superkey of R1 or R2
│   ├── Dependency-preserving: all FDs checkable in one table
│   ├── 3NF Synthesis Algorithm:
│   │   1. Find minimal cover
│   │   2. Create table per FD (merge same determinants)
│   │   3. Ensure CK is in at least one table
│   └── BCNF Decomposition: split on violating FD, repeat
│
└── DENORMALIZATION
    ├── Intentional redundancy for read performance
    ├── Techniques: materialized views, computed columns, CQRS
    ├── Use: dashboards, search pages, cached balances
    └── Rule: normalize first, then strategically denormalize

🎯 3 Skills This Chapter Unlocks

Schema Design Quality — You can identify and eliminate redundancy in any database. This prevents data corruption that costs companies lakhs in debugging and fixes.
GATE CS Normalization — Closure computation, candidate key finding, NF determination, decomposition — these topics carry 5-8 marks in GATE CS every year.
Architecture Decisions — Knowing when to normalize (OLTP) vs denormalize (OLAP) is a senior-level skill that differentiates you in interviews at product companies.

📋 Normalization Quick Reference

CLOSURE: X⁺ = start{X}, apply all FDs iteratively, stop when no change

CANDIDATE KEY: minimal set where X⁺ = all attributes
  Find: attrs that never appear on any RHS must be in every key. Add others until X⁺ = all.

NORMAL FORMS:
  1NF: atomic values (no arrays, no CSV, no nested)
  2NF: 1NF + no partial deps (non-prime depends on full CK, not part)
  3NF: 2NF + for every non-trivial X→A: X is superkey OR A is prime
  BCNF: for every non-trivial X→A: X is ALWAYS superkey (strictest)

DECOMPOSITION TEST:
  Lossless: R1 ∩ R2 → R1 or R1 ∩ R2 → R2
  Dependency-preserving: every FD checkable within a single decomposed table (or implied by FDs that are)

3NF SYNTHESIS: minimal cover → table per FD → merge same LHS → add CK table

DENORMALIZE WHEN: read-heavy dashboards, sub-100ms response needed,
  data staleness of 15+ min acceptable. NEVER for OLTP source-of-truth.

Section 8

Interview & Career Preparation

Q1: What is normalization? Why is it needed?

Model Answer: Normalization is the process of organizing a database to reduce redundancy and eliminate anomalies (insertion, update, deletion). It decomposes large tables into smaller, well-structured tables connected by foreign keys. Needed because: without it, the same fact is stored multiple times, leading to inconsistency when one copy is updated but others aren't. Example: storing department HOD in every student row — changing the HOD requires updating hundreds of rows.

Q2: Explain 1NF, 2NF, 3NF with examples.

Model Answer: 1NF: every cell is atomic (one value). Fix: no comma-separated phone numbers — use a separate table. 2NF: no partial dependencies — non-prime attributes depend on the FULL composite key. Fix: student_name depends only on student_id, not {student_id, course_id} — move to students table. 3NF: no transitive dependencies — non-prime doesn't determine non-prime. Fix: dept → dept_hod is transitive through student_id → dept → dept_hod — create departments table.

Q3: What is BCNF? How does it differ from 3NF?

Model Answer: BCNF requires that for every non-trivial FD X→Y, X must be a superkey. No exceptions. 3NF has an exception: if Y is a prime attribute (part of some candidate key), the FD is allowed even if X isn't a superkey. BCNF is stricter and eliminates all FD-based redundancy. Trade-off: a BCNF decomposition may lose some FDs (they can no longer be checked in a single table), while a lossless, dependency-preserving 3NF decomposition always exists (the synthesis algorithm produces one).

Q4: How do you find candidate keys from FDs?

Model Answer: Step 1: Identify attributes that never appear on the right side of any FD (left-side-only attributes, plus attributes that appear in no FD at all); they must be in every candidate key. Step 2: Compute their closure. If closure = all attributes, they form the only candidate key. Step 3: If not, add attributes that appear on both sides of FDs, one combination at a time, computing the closure each time; attributes that appear only on the right side never belong to a key. Step 4: Check minimality: remove each attribute in turn and verify whether the rest is still a superkey. If removing any attribute breaks it, the set is a candidate key.

Q5: What is a lossless-join decomposition?

Model Answer: A decomposition of R into R1 and R2 is lossless if JOINing R1 and R2 gives back exactly R — no extra (spurious) tuples. Test: the common attributes (R1 ∩ R2) must be a superkey of at least one of the decomposed tables. If they're not, the JOIN produces extra rows that weren't in the original table. Always verify lossless property when decomposing.
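Spurious tuples are easiest to understand by producing some. The sketch below uses a hypothetical two-row instance of R(A, B, C) and hypothetical helpers `project` and `natural_join`; it compares a split on B (not a key of either part) with a split on A (a key in this instance).

```python
def project(rows, cols):
    """Projection with duplicate elimination, as in relational algebra."""
    return {tuple(row[c] for c in cols) for row in rows}


def natural_join(left, left_cols, right, right_cols):
    """Join two sets of tuples on the columns they have in common."""
    common = [c for c in left_cols if c in right_cols]
    out_cols = left_cols + [c for c in right_cols if c not in common]
    joined = set()
    for l_tuple in left:
        l_row = dict(zip(left_cols, l_tuple))
        for r_tuple in right:
            r_row = dict(zip(right_cols, r_tuple))
            if all(l_row[c] == r_row[c] for c in common):
                merged = {**l_row, **r_row}
                joined.add(tuple(merged[c] for c in out_cols))
    return joined


# Hypothetical instance of R(A, B, C): both rows share the same B value.
R = [{"A": 1, "B": "x", "C": "p"}, {"A": 2, "B": "x", "C": "q"}]
original = project(R, ["A", "B", "C"])

# Lossy split: the common attribute B determines neither A nor C.
lossy = natural_join(project(R, ["A", "B"]), ["A", "B"],
                     project(R, ["B", "C"]), ["B", "C"])
spurious = lossy - original

# Lossless split: the common attribute A determines both B and C here.
lossless = natural_join(project(R, ["A", "B"]), ["A", "B"],
                        project(R, ["A", "C"]), ["A", "C"])
```

The lossy join returns four rows, two of which (`spurious`) were never in R, while the split on A reconstructs exactly the original two rows.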

Q6: What is the 3NF synthesis algorithm?

Model Answer: Step 1: Find a minimal cover of the FDs (split each RHS into single attributes, remove extraneous LHS attributes, then remove redundant FDs). Step 2: Create a table for each FD, with the left side as its key. Step 3: Merge tables with the same determinant. Step 4: If no table contains a candidate key of R, add a table with just a CK. This always produces a lossless, dependency-preserving 3NF decomposition.

Q7: When should you denormalize?

Model Answer: Denormalize when: (1) Read performance is critical (dashboards, search pages needing sub-100ms). (2) The query involves many JOINs across large tables. (3) Slight data staleness is acceptable (15-minute-old data for reports). Techniques: materialized views, computed columns, read replicas with denormalized schema. Rule: always normalize first (source of truth), then denormalize strategically for read performance. Never skip normalization for OLTP systems.

Q8: What are Armstrong's axioms?

Model Answer: Three core axioms: (1) Reflexivity: if Y ⊆ X then X→Y (trivial). (2) Augmentation: if X→Y then XZ→YZ (adding attributes to both sides). (3) Transitivity: if X→Y and Y→Z then X→Z. These are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Derived rules: Union (X→Y, X→Z ⇒ X→YZ), Decomposition (X→YZ ⇒ X→Y, X→Z), Pseudo-transitivity (X→Y, WY→Z ⇒ WX→Z).

Q9: Can you give a real-world example of each normal form violation?

Model Answer: 1NF violation: Excel sheet with "Phone: 9876,9123" in one cell (not atomic). 2NF violation: University table with {student_id, course_id} as PK, but student_name depends only on student_id (partial). 3NF violation: Employee table where emp_id → dept → dept_location (transitive: dept_location is a fact about dept, not about employee). BCNF violation: Advising table (student_id, subject, advisor) where advisor → subject but advisor isn't a superkey (rare, involves overlapping candidate keys).

Q10: Normalize vs Denormalize — how do companies like Flipkart handle this?

Model Answer: A large e-commerce company such as Flipkart typically runs a dual-model architecture: (1) A normalized relational database for OLTP: order placement, payment processing, inventory updates. Full 3NF/BCNF, ACID transactions. (2) A denormalized data warehouse (e.g., BigQuery or Redshift) for OLAP: sales analytics, recommendation engine, reporting. Star schema with fact and dimension tables. The normalized system feeds the denormalized system via ETL/streaming pipelines. Separating the write-optimized model from the read-optimized model is the same idea that underlies the CQRS pattern.

🎓 GATE CS Normalization Strategy

Always start by computing closures and finding candidate keys
Classify attributes as prime/non-prime before checking NFs
Check NFs bottom-up: 1NF → 2NF → 3NF → BCNF. Stop at the first failure.
Practice: 15-20 problems from previous GATE papers (2015-2024 have normalization every year)