Database Management Systems

Unit 4: Normalization

Functional Dependencies, Normal Forms (1NF โ†’ BCNF), Decomposition & Denormalization โ€” the science of designing databases that don't break.

๐Ÿข Oracle & PostgreSQL  |  ๐Ÿ“ 15 MCQs (Bloom's)  |  ๐Ÿ”ฌ 5 Lab Exercises  |  ๐Ÿ’ผ Interview Prep

Section 1

Why This Chapter Pays Your Salary

Normalization separates database coders from database architects. A poorly normalized schema causes data corruption, storage waste, and query nightmares. A well-normalized schema is elegant, consistent, and maintainable. GATE CS dedicates 5-8 marks to normalization every year. Every DBA interview at TCS, Infosys, Oracle, and product companies asks normalization questions.

๐Ÿข Industry Snapshot

SBI Core Banking โ€” When SBI migrated from legacy to CBS (Core Banking Solution), normalization consultants spent 6 months redesigning the schema. The original system stored customer address in 15 different tables โ€” a classic redundancy nightmare. After normalization to 3NF, the same data lived in ONE address table referenced by FK everywhere. Result: 60% reduction in storage, zero data inconsistency.

Flipkart โ€” Their product catalog is normalized to 3NF for the source-of-truth (OLTP) database. But their search/display layer is intentionally denormalized โ€” product name, price, image URL, seller name, and rating all flattened into one table for sub-100ms page loads. They normalize for correctness and denormalize for speed.

๐Ÿ‡ฎ๐Ÿ‡ณ SBI๐Ÿ‡ฎ๐Ÿ‡ณ Flipkart๐Ÿ‡ฎ๐Ÿ‡ณ TCS๐Ÿ‡ฎ๐Ÿ‡ณ IRCTC๐Ÿ‡ฎ๐Ÿ‡ณ RazorpayGATE CS
E.F. Codd (IBM) introduced 1NF, 2NF, and 3NF in 1970-72. Boyce and Codd together defined BCNF in 1974. 4NF (by Ronald Fagin, 1977) and 5NF (also Fagin, 1979) handle rare multi-valued and join dependencies. In practice, 3NF/BCNF is sufficient for 99% of real-world databases.
Section 2

Learning Outcomes โ€” Bloom's Taxonomy

Bloom's LevelOutcome Statement
L1 โ€” RememberDefine 1NF, 2NF, 3NF, BCNF; list Armstrong's axioms; recall the definition of functional dependency, candidate key, and prime attribute
L2 โ€” UnderstandExplain why update anomalies occur in unnormalized tables; describe the difference between partial and transitive dependencies; explain lossy vs lossless decomposition
L3 โ€” ApplyCompute attribute closure, find candidate keys from a set of FDs, determine the highest normal form of a given relation, and decompose a relation to 3NF/BCNF
L4 โ€” AnalyzeAnalyze whether a decomposition is lossless and dependency-preserving; compare 3NF and BCNF trade-offs
L5 โ€” EvaluateJustify when to stop at 3NF vs pursuing BCNF; evaluate denormalization decisions for performance-critical Indian enterprise systems
L6 โ€” CreateTake a raw unnormalized dataset and design a complete normalized schema (BCNF) with SQL implementation and sample data
Section 3

Concept Explanations

3.1 The Problem โ€” Update Anomalies

๐Ÿ“Œ Why Bad Schema Design Destroys Data

๐Ÿ“Œ THE VILLAIN: UNNORMALIZED TABLE

Consider a university stores everything in ONE table:

Unnormalized Table: student_courses
student_id | student_name | dept     | dept_hod     | course_id | course_name      | instructor  | grade
101        | Rahul        | CSE      | Dr. Sharma   | CS301     | DBMS             | Dr. Patel   | A
101        | Rahul        | CSE      | Dr. Sharma   | CS302     | OS               | Dr. Kumar   | B+
102        | Priya        | ECE      | Dr. Reddy    | EC201     | Signals          | Dr. Singh   | A+
103        | Amit         | CSE      | Dr. Sharma   | CS301     | DBMS             | Dr. Patel   | B
103        | Amit         | CSE      | Dr. Sharma   | CS303     | Networks         | Dr. Gupta   | A
๐Ÿ”ฅ THREE DEADLY ANOMALIES
AnomalyProblemExample
Insertion AnomalyCan't insert partial dataNew department "AI" created but no students enrolled yet โ€” can't INSERT because student_id (PK) can't be NULL. A department can't exist without a student!
Update AnomalyMust update multiple rows for one factCSE HOD changes from Dr. Sharma to Dr. Verma โ€” must update ALL rows where dept='CSE'. Miss one? Now CSE has TWO different HODs in the database. Data inconsistency.
Deletion AnomalyDeleting one fact accidentally removes anotherPriya (102) drops her only course. Delete her row โ†’ we LOSE the fact that ECE's HOD is Dr. Reddy. Department information vanishes with the student.
๐Ÿ’ก THE SOLUTION: NORMALIZATION

Decompose the one big table into multiple smaller tables, each storing exactly one fact. Students in one table, departments in another, courses in another, enrollments linking them. No redundancy, no anomalies.

3.2 Functional Dependencies (FDs)

๐Ÿ“Œ Functional Dependency โ€” "X Determines Y"

๐Ÿ“Œ WHAT IT IS

A functional dependency X โ†’ Y means: if two tuples have the same value for attribute(s) X, they MUST have the same value for attribute(s) Y. X determines Y. X is the determinant; Y is the dependent.

๐ŸŒ REAL-WORLD ANALOGY

Aadhaar number โ†’ Name, DOB, Address. If you know someone's Aadhaar number, you can determine their name, DOB, and address. Two people can't have the same Aadhaar number with different names. That's a functional dependency.

โš™๏ธ EXAMPLES FROM STUDENT_COURSES TABLE
Functional Dependencies
student_id โ†’ student_name, dept         (student ID determines name and dept)
dept โ†’ dept_hod                         (department determines its HOD)
course_id โ†’ course_name, instructor     (course ID determines course details)
{student_id, course_id} โ†’ grade         (student + course determines grade)

NON-dependencies:
student_name โ†› student_id              (two students can have same name)
dept โ†› student_id                       (department has many students)
๐Ÿ“Œ TYPES OF FDs
TypeDefinitionExample
Full FDY depends on ALL of X (not a subset){student_id, course_id} โ†’ grade. Grade depends on BOTH โ€” can't remove either.
Partial FDY depends on only PART of a composite key{student_id, course_id} โ†’ student_name. student_name depends ONLY on student_id, not the full key. Violates 2NF.
Transitive FDX โ†’ Y and Y โ†’ Z, so X โ†’ Z transitivelystudent_id โ†’ dept and dept โ†’ dept_hod, so student_id โ†’ dept_hod transitively. Violates 3NF.
Trivial FDY is a subset of X{student_id, name} โ†’ student_id. Always true โ€” trivial, not interesting.

3.3 Attribute Closure & Armstrong's Axioms

๐Ÿ“Œ Attribute Closure โ€” Finding What X Can Determine

๐Ÿ“Œ WHAT IT IS

The closure of X (written Xโบ) is the set of ALL attributes that can be determined from X using the given FDs. It's the key tool for finding candidate keys and testing normal forms.

โš™๏ธ STEP-BY-STEP EXAMPLE
Closure Computation
Given: R(A, B, C, D, E)
FDs: { A โ†’ B, B โ†’ C, A โ†’ D, D โ†’ E }

Find Aโบ (closure of A):
  Step 0: Aโบ = {A}             (start with A itself)
  Step 1: A โ†’ B  โ‡’ Aโบ = {A, B}
  Step 2: B โ†’ C  โ‡’ Aโบ = {A, B, C}
  Step 3: A โ†’ D  โ‡’ Aโบ = {A, B, C, D}
  Step 4: D โ†’ E  โ‡’ Aโบ = {A, B, C, D, E}

Aโบ = {A, B, C, D, E} = ALL attributes!
โˆด A is a CANDIDATE KEY (can determine every attribute)

Find Bโบ:
  Bโบ = {B} โ†’ B โ†’ C โ‡’ {B, C}
  No more FDs apply. Bโบ = {B, C}
  B is NOT a candidate key (doesn't determine A, D, E)
๐Ÿ“Œ ARMSTRONG'S AXIOMS (for GATE CS)
AxiomRuleFormalExample
ReflexivityIf Y โІ X, then X โ†’ Y{A,B} โ†’ AA set always determines its subsets (trivial)
AugmentationIf X โ†’ Y, then XZ โ†’ YZA โ†’ B โ‡’ AC โ†’ BCAdding the same attribute to both sides preserves FD
TransitivityIf X โ†’ Y and Y โ†’ Z, then X โ†’ ZA โ†’ B, B โ†’ C โ‡’ A โ†’ CChain of dependencies

Derived rules (from the 3 axioms):

RuleFormalExample
UnionX โ†’ Y, X โ†’ Z โ‡’ X โ†’ YZA โ†’ B, A โ†’ C โ‡’ A โ†’ BC
DecompositionX โ†’ YZ โ‡’ X โ†’ Y, X โ†’ ZA โ†’ BC โ‡’ A โ†’ B, A โ†’ C
Pseudo-transitivityX โ†’ Y, WY โ†’ Z โ‡’ WX โ†’ ZA โ†’ B, CB โ†’ D โ‡’ CA โ†’ D

Finding Candidate Keys โ€” Algorithm

Algorithm: Finding Candidate Keys
Given: R(A, B, C, D, E, F), FDs: {AB โ†’ C, C โ†’ D, D โ†’ E, CF โ†’ B}

Step 1: Find attributes that appear ONLY on LEFT side of FDs โ†’ must be in every key
  A appears only on left (or nowhere on right) โ†’ A must be in every key
  F appears only on left โ†’ F must be in every key
  
Step 2: Find closure of mandatory attributes
  {A, F}โบ:
    Start: {A, F}
    No FD with just A or F on left gives new attrs...
    {AF}โบ = {A, F}  โ† Not all attributes! Need more.

Step 3: Add remaining attributes one by one
  Try {A, F, B}โบ:
    AB โ†’ C โ‡’ {A, B, C, F}
    C โ†’ D โ‡’ {A, B, C, D, F}
    D โ†’ E โ‡’ {A, B, C, D, E, F} = ALL โœ“
  {A, B, F} is a superkey.
  
  Try {A, F, C}โบ:
    C โ†’ D โ‡’ {A, C, D, F}
    D โ†’ E โ‡’ {A, C, D, E, F}
    CF โ†’ B โ‡’ {A, B, C, D, E, F} = ALL โœ“
  {A, C, F} is a superkey.

Step 4: Check minimality (no proper subset is a superkey)
  {A, B, F}: Can we remove B? {A,F}โบ = {A,F} โ€” NOT all. So B is needed. โœ“ Minimal.
  {A, C, F}: Can we remove C? {A,F}โบ = {A,F} โ€” NOT all. So C is needed. โœ“ Minimal.

โˆด Candidate Keys: {A, B, F} and {A, C, F}
  Prime attributes: A, B, C, F (part of at least one CK)
  Non-prime attributes: D, E

3.4 Normal Forms โ€” The Normalization Ladder

Normal Form Hierarchy
UNF (Unnormalized)
 โ†“  Remove repeating groups / make atomic
1NF (First Normal Form)
 โ†“  Remove partial dependencies
2NF (Second Normal Form)
 โ†“  Remove transitive dependencies
3NF (Third Normal Form)
 โ†“  Every determinant is a candidate key
BCNF (Boyce-Codd Normal Form)
 โ†“  Remove multi-valued dependencies (rare)
4NF
 โ†“  Remove join dependencies (very rare)
5NF

๐Ÿ’ก Industry standard: 3NF or BCNF is sufficient for 99% of databases.
1NF

First Normal Form โ€” Atomic Values

Rule: Every cell must contain exactly one value (atomic). No repeating groups, no arrays, no comma-separated lists, no nested tables.

โŒ Violates 1NF:
NOT 1NF
student_id | name  | phone_numbers          | courses
101        | Rahul | 9876543210, 9123456789 | CS301, CS302, CS303
102        | Priya | 9888777666             | EC201

Problems: phone_numbers has multiple values in one cell. courses is a repeating group.

โœ… Fixed โ€” 1NF:
1NF โ€” Decomposed
Table: students               Table: student_phones         Table: enrollments
student_id | name              student_id | phone            student_id | course_id
101        | Rahul             101        | 9876543210       101        | CS301
102        | Priya             101        | 9123456789       101        | CS302
                               102        | 9888777666       101        | CS303
                                                             102        | EC201

"Store phone numbers as VARCHAR(100) with commas: '9876543210,9123456789'." This technically passes 1NF by hiding the violation inside a string, but it's terrible design. You can't search individual phones, can't enforce uniqueness, can't index them. Always use a separate table for multi-valued attributes.

2NF

Second Normal Form โ€” No Partial Dependencies

Rule: Must be in 1NF AND every non-prime attribute must be fully functionally dependent on the ENTIRE primary key. No non-prime attribute should depend on only PART of a composite key.

When does 2NF matter? Only when the PK is composite (2+ columns). If PK is a single column, 1NF โ‡’ 2NF automatically.

โŒ Violates 2NF:
NOT 2NF
Table: enrollments (PK: {student_id, course_id})
student_id | course_id | student_name | dept | course_name | instructor | grade

FDs:
  {student_id, course_id} โ†’ grade         โ† Full dependency โœ“
  student_id โ†’ student_name, dept          โ† PARTIAL dependency! โœ—
  course_id โ†’ course_name, instructor      โ† PARTIAL dependency! โœ—

student_name depends only on student_id, not the full PK.
โœ… Fixed โ€” 2NF (remove partial dependencies):
2NF โ€” Decomposed
Table: students (PK: student_id)           Table: courses (PK: course_id)
student_id | student_name | dept            course_id | course_name | instructor
101        | Rahul        | CSE             CS301     | DBMS        | Dr. Patel
102        | Priya        | ECE             CS302     | OS          | Dr. Kumar
103        | Amit         | CSE             CS303     | Networks    | Dr. Gupta

Table: enrollments (PK: {student_id, course_id})
student_id | course_id | grade
101        | CS301     | A
101        | CS302     | B+
102        | EC201     | A+

Each non-prime attribute now depends on the FULL PK of its table. โœ“
3NF

Third Normal Form โ€” No Transitive Dependencies

Rule: Must be in 2NF AND no non-prime attribute should depend on another non-prime attribute. Formally: For every FD X โ†’ Y, at least one must be true: (1) X is a superkey, OR (2) Y is a prime attribute (part of some candidate key).

โŒ Violates 3NF:
NOT 3NF
Table: students (PK: student_id)
student_id | student_name | dept | dept_hod

FDs:
  student_id โ†’ student_name, dept, dept_hod    โ† OK (student_id is PK)
  dept โ†’ dept_hod                              โ† TRANSITIVE! dept is non-prime, dept_hod is non-prime.

Chain: student_id โ†’ dept โ†’ dept_hod
This is a transitive dependency: student_id determines dept_hod THROUGH dept.

Problem: If Dr. Sharma leaves as CSE HOD, must update EVERY CSE student's row.
โœ… Fixed โ€” 3NF (remove transitive dependencies):
3NF โ€” Decomposed
Table: students (PK: student_id)      Table: departments (PK: dept)
student_id | student_name | dept       dept | dept_hod
101        | Rahul        | CSE        CSE  | Dr. Sharma
102        | Priya        | ECE        ECE  | Dr. Reddy
103        | Amit         | CSE

Now dept โ†’ dept_hod is in its OWN table where dept IS the PK (superkey). โœ“
Updating the HOD requires changing ONE row in departments table. Zero redundancy.
3NF Definition (Exam-ready): A relation R is in 3NF if and only if, for every non-trivial FD X โ†’ A in R, either X is a superkey of R, or A is a prime attribute (A is part of some candidate key). The "or A is prime" clause is what differentiates 3NF from BCNF.
BC

Boyce-Codd Normal Form โ€” The Strictest Practical Form

Rule: For every non-trivial FD X โ†’ Y, X MUST be a superkey. No exceptions. (3NF allows Y to be a prime attribute even if X isn't a superkey; BCNF doesn't.)

When does 3NF โ‰  BCNF? (The rare case)
3NF but NOT BCNF
Table: student_advisor (describes: students, subjects, advisors)
student_id | subject    | advisor

FDs:
  {student_id, subject} โ†’ advisor    (each student has one advisor per subject)
  advisor โ†’ subject                  (each advisor advises only one subject)

Candidate Keys: {student_id, subject} and {student_id, advisor}
Prime attributes: student_id, subject, advisor โ€” ALL are prime!

Check 3NF: advisor โ†’ subject. advisor is NOT a superkey, BUT subject IS a prime attribute.
  โˆด Satisfies 3NF (the "or A is prime" exception saves it). โœ“

Check BCNF: advisor โ†’ subject. advisor is NOT a superkey.
  โˆด Violates BCNF. โœ—
โœ… BCNF Decomposition:
BCNF โ€” Decomposed
Table: advisor_subject (PK: advisor)       Table: student_advisor (PK: {student_id, advisor})
advisor    | subject                        student_id | advisor
Dr. Patel  | DBMS                           101        | Dr. Patel
Dr. Kumar  | OS                             101        | Dr. Kumar
Dr. Singh  | Signals                        102        | Dr. Singh

Now every determinant IS a superkey in its table. โœ“

โš ๏ธ TRADE-OFF: The FD {student_id, subject} โ†’ advisor is LOST!
It can't be checked within a single table anymore.
This is why sometimes we STOP at 3NF โ€” it preserves all dependencies.

3NF vs BCNF โ€” When to Choose Which

Feature3NFBCNF
StrictnessAllows non-superkey determinant IF dependent is primeEvery determinant MUST be a superkey. No exceptions.
Dependency preservationโœ… Always possible to preserve all FDsโŒ May lose some FDs during decomposition
Lossless decompositionโœ… Always achievableโœ… Always achievable
RedundancySlight redundancy possible (rare)Zero redundancy from FDs
Industry practiceโœ… Most databases aim for 3NF minimumโœ… Preferred when dependency loss is acceptable
Industry rule of thumb: Start with BCNF. If decomposition loses important FDs (that you need to enforce via constraints), fall back to 3NF. For 99% of real-world schemas, 3NF and BCNF are identical โ€” the rare case where they differ involves overlapping composite candidate keys.

3.5 Decomposition โ€” Lossless & Dependency-Preserving

๐Ÿ“Œ Decomposition Rules โ€” Don't Lose Data or Dependencies

๐Ÿ“Œ LOSSLESS-JOIN DECOMPOSITION

When decomposing R into R1 and R2, the decomposition is lossless if and only if: R1 โˆฉ R2 โ†’ R1 or R1 โˆฉ R2 โ†’ R2 (the common attributes must be a superkey of at least one of the resulting tables).

Lossless Test
R(student_id, student_name, dept, dept_hod)

Decompose into:
  R1(student_id, student_name, dept)   โ€” students table
  R2(dept, dept_hod)                   โ€” departments table

Common attributes: R1 โˆฉ R2 = {dept}
Is {dept} a superkey of R2? R2 has PK = dept. Yes! โœ“
โˆด Decomposition is LOSSLESS.

If we JOIN R1 and R2 on dept, we get back EXACTLY the original data.
No extra spurious tuples are generated.
๐Ÿ“Œ LOSSY DECOMPOSITION (BAD!)
Lossy Decomposition โ€” WRONG!
R(A, B, C) with FD: A โ†’ B

Decompose into:
  R1(A, C)
  R2(B, C)

Common: R1 โˆฉ R2 = {C}
Is C a key of R1 or R2? NO! C doesn't determine anything.
โˆด This is LOSSY โ€” JOINing R1 and R2 on C produces EXTRA rows (spurious tuples).

NEVER do lossy decomposition. Always verify the lossless condition.
๐Ÿ“Œ DEPENDENCY-PRESERVING DECOMPOSITION

A decomposition preserves dependencies if every FD can be checked within a single resulting table (without needing to JOIN tables). 3NF decomposition is always dependency-preserving. BCNF decomposition may NOT be.

3NF Decomposition Algorithm (Synthesis)

Algorithm: 3NF Decomposition
Input: R(A, B, C, D, E), FDs: {A โ†’ BC, C โ†’ D, D โ†’ E}

Step 1: Find minimal cover (canonical form) of FDs
  Already minimal: {A โ†’ B, A โ†’ C, C โ†’ D, D โ†’ E}
  (Decompose right sides, remove redundant FDs, remove extraneous attributes)

Step 2: Create a table for each FD
  T1(A, B)    from A โ†’ B
  T2(A, C)    from A โ†’ C
  T3(C, D)    from C โ†’ D
  T4(D, E)    from D โ†’ E

Step 3: Merge tables with same determinant (left side)
  T1 and T2 have same determinant A โ†’ merge:
  T12(A, B, C)  PK = A

  Final tables: T12(A, B, C), T3(C, D), T4(D, E)

Step 4: Ensure a candidate key is present in at least one table
  CK = {A} (since Aโบ = {A,B,C,D,E})
  A is in T12 โœ“

Result: {R1(A,B,C), R2(C,D), R3(D,E)} โ€” all in 3NF, lossless, dependency-preserving.

3.6 Denormalization โ€” When to Break the Rules

๐Ÿ“Œ Denormalization โ€” Trading Correctness for Speed

๐Ÿ“Œ WHAT IT IS

Denormalization is the intentional introduction of redundancy into a normalized schema to improve read performance. It's NOT about skipping normalization โ€” it's about normalizing first, then strategically de-normalizing specific tables for speed.

๐Ÿข WHEN TO DENORMALIZE โ€” Indian Industry Examples
ScenarioNormalized DesignDenormalized DesignWhy
Flipkart product listing5-table JOIN: products โ†’ categories โ†’ sellers โ†’ images โ†’ ratingsSingle product_listing table with name, price, image_url, seller_name, avg_ratingHomepage loads in 100ms instead of 500ms. Updated via background job.
SBI account balanceSUM(transactions) per accountCached current_balance column in accounts table, updated on each transactionBalance lookup is O(1) instead of scanning millions of transactions.
IRCTC seat availabilityCOUNT available seats from bookings tablePre-computed available_seats counter per train-date-classTatkal booking page loads in milliseconds during peak.
โš ๏ธ RISKS
  • Data inconsistency if redundant copies aren't kept in sync
  • Increased storage (minor cost in 2025)
  • More complex write operations (must update all copies)
  • Must handle sync failures gracefully
Modern denormalization techniques: (1) Materialized views โ€” DBMS handles the refresh automatically. (2) Generated/computed columns โ€” ALTER TABLE orders ADD total_amount GENERATED ALWAYS AS (quantity * unit_price) STORED (PostgreSQL) โ€” automatically maintained, no manual sync. (3) JSONB columns โ€” Store pre-computed summaries as JSON alongside normalized data. (4) CQRS pattern โ€” Separate read model (denormalized) from write model (normalized). Used by Razorpay, Swiggy, and modern Indian fintechs.

Section 4

Industry Problems

๐Ÿข Industry Problem #1 โ€” Normalize a University ERP Table

Scenario: A university stores everything in one table:

UNF Table
student_id | name | dept | dept_hod | course_id | course_name | instructor | credits | grade | hostel | room

FDs:

student_id โ†’ name, dept, hostel, room
dept โ†’ dept_hod
course_id โ†’ course_name, instructor, credits
{student_id, course_id} โ†’ grade
hostel โ†’ warden  (each hostel has one warden)

Task: Normalize step-by-step from UNF โ†’ 1NF โ†’ 2NF โ†’ 3NF โ†’ BCNF. Show each table at each stage.

๐Ÿ’ก Complete Solution
Step-by-step Normalization
1NF: Already 1NF (atomic values, no repeating groups assumed)

2NF: Remove partial dependencies from PK = {student_id, course_id}
  Partial FDs:
    student_id โ†’ name, dept, hostel, room    (depends on PART of PK)
    course_id โ†’ course_name, instructor, credits  (depends on PART of PK)
  
  Decompose:
    students(student_id, name, dept, hostel, room)  PK: student_id
    courses(course_id, course_name, instructor, credits)  PK: course_id
    enrollments(student_id, course_id, grade)  PK: {student_id, course_id}

3NF: Remove transitive dependencies
  In students: student_id โ†’ dept โ†’ dept_hod (transitive โ€” but dept_hod 
    isn't in students anymore if we didn't include it. Let's assume 
    the original had dept_hod in students table)
    
    students(student_id, name, dept, hostel, room) โ€” dept_hod removed
    departments(dept, dept_hod)  PK: dept
    
  In students: student_id โ†’ hostel โ†’ warden (transitive if warden exists)
    students(student_id, name, dept, hostel, room)
    hostels(hostel, warden)  PK: hostel

BCNF: Check โ€” is every determinant a superkey?
  students: student_id โ†’ all. student_id is PK. โœ“
  departments: dept โ†’ dept_hod. dept is PK. โœ“
  courses: course_id โ†’ all. course_id is PK. โœ“
  enrollments: {student_id, course_id} โ†’ grade. Composite PK. โœ“
  hostels: hostel โ†’ warden. hostel is PK. โœ“
  All in BCNF! โœ“
SQL โ€” Final Normalized Schema
CREATE TABLE departments (
    dept       VARCHAR2(10)  PRIMARY KEY,
    dept_hod   VARCHAR2(50)
);
CREATE TABLE hostels (
    hostel     VARCHAR2(20)  PRIMARY KEY,
    warden     VARCHAR2(50)
);
CREATE TABLE students (
    student_id NUMBER(10)    PRIMARY KEY,
    name       VARCHAR2(50)  NOT NULL,
    dept       VARCHAR2(10)  REFERENCES departments(dept),
    hostel     VARCHAR2(20)  REFERENCES hostels(hostel),
    room       VARCHAR2(10)
);
CREATE TABLE courses (
    course_id  VARCHAR2(10)  PRIMARY KEY,
    course_name VARCHAR2(50) NOT NULL,
    instructor VARCHAR2(50),
    credits    NUMBER(1)     CHECK (credits BETWEEN 1 AND 5)
);
CREATE TABLE enrollments (
    student_id NUMBER(10)    REFERENCES students(student_id),
    course_id  VARCHAR2(10)  REFERENCES courses(course_id),
    grade      CHAR(2),
    PRIMARY KEY (student_id, course_id)
);

๐Ÿข Industry Problem #2 โ€” GATE CS: Find Candidate Keys & Highest NF

Scenario: R(A, B, C, D, E) with FDs: {AB โ†’ C, C โ†’ D, D โ†’ E, E โ†’ A}. Find all candidate keys and the highest normal form.

๐Ÿ’ก Complete Solution
Solution
Step 1: Attribute analysis
  B appears ONLY on LEFT side โ†’ B must be in every candidate key.

Step 2: Find closures starting with B + each other attribute
  {A, B}โบ: ABโ†’C โ‡’ {A,B,C}. Cโ†’D โ‡’ {A,B,C,D}. Dโ†’E โ‡’ {A,B,C,D,E} โœ“ ALL!
  {B, C}โบ: Cโ†’D โ‡’ {B,C,D}. Dโ†’E โ‡’ {B,C,D,E}. Eโ†’A โ‡’ {A,B,C,D,E} โœ“ ALL!
  {B, D}โบ: Dโ†’E โ‡’ {B,D,E}. Eโ†’A โ‡’ {A,B,D,E}. ABโ†’C โ‡’ {A,B,C,D,E} โœ“ ALL!
  {B, E}โบ: Eโ†’A โ‡’ {A,B,E}. ABโ†’C โ‡’ {A,B,C,E}. Cโ†’D โ‡’ {A,B,C,D,E} โœ“ ALL!

  All are superkeys. Check minimality:
  Bโบ = {B} โ€” NOT all. So B alone is not a key.
  โˆด All of {AB, BC, BD, BE} are CANDIDATE KEYS (minimal).

Step 3: Prime and non-prime attributes
  Prime: A, B, C, D, E โ€” ALL are prime! (every attribute is in some CK)
  Non-prime: NONE

Step 4: Check normal forms
  1NF: โœ“ (assumed atomic)
  2NF: Check partial dependencies on composite CKs.
    ABโ†’C: full (C depends on all of AB) โœ“
    But Cโ†’D: C is part of CK {BC}. D depends on just C, not full BC.
    However, D is a prime attribute โ€” 2NF only restricts NON-prime partials.
    Since ALL attributes are prime, 2NF is satisfied. โœ“
  3NF: For each FD, check: is LHS a superkey OR is RHS prime?
    ABโ†’C: AB is superkey โœ“
    Cโ†’D: C is NOT a superkey. But D IS prime. โœ“ (3NF exception)
    Dโ†’E: D is NOT a superkey. But E IS prime. โœ“
    Eโ†’A: E is NOT a superkey. But A IS prime. โœ“
    โˆด 3NF satisfied. โœ“
  BCNF: For each FD, is LHS a superkey?
    Cโ†’D: C is NOT a superkey (Cโบ = {C,D,E,A} โ‰  all, missing B). โœ—
    โˆด NOT in BCNF. โœ—

Answer: Candidate Keys = {AB, BC, BD, BE}. Highest NF = 3NF (not BCNF).

๐Ÿข Industry Problem #3 โ€” E-Commerce Denormalization Design

Scenario: Flipkart's normalized product catalog (5 tables: products, categories, sellers, images, reviews) takes 200ms for a JOIN query. The homepage needs <50ms response time. Design a denormalization strategy.

๐Ÿ’ก Solution
SQL โ€” Denormalized Read Table
-- Normalized source tables (OLTP โ€” source of truth)
-- products, categories, sellers, product_images, reviews

-- Denormalized materialized view (for read performance)
CREATE MATERIALIZED VIEW mv_product_listing
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND AS
SELECT
    p.product_id, p.product_name, p.price,
    c.category_name,
    s.seller_name,
    (SELECT image_url FROM product_images pi 
     WHERE pi.product_id = p.product_id AND pi.is_primary = 1
     AND ROWNUM = 1) AS primary_image,
    (SELECT ROUND(AVG(rating),1) FROM reviews r 
     WHERE r.product_id = p.product_id) AS avg_rating,
    (SELECT COUNT(*) FROM reviews r 
     WHERE r.product_id = p.product_id) AS review_count
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN sellers s ON p.seller_id = s.seller_id;

-- Refresh every 15 minutes via scheduled job
-- Homepage queries hit mv_product_listing (single table, no JOINs)
-- Write operations (new product, review) go to normalized tables

Architecture: Writes โ†’ Normalized OLTP tables (3NF). Reads โ†’ Denormalized materialized view. This is the standard CQRS (Command Query Responsibility Segregation) pattern used by Flipkart, Amazon, and Swiggy.

Section 5

Lab Exercises

Exercise 1: Identify Anomalies & FDs

โฑ 30 minutes๐ŸŸข Beginner

Given Table: hospital(doctor_id, doctor_name, dept, dept_phone, patient_id, patient_name, diagnosis, bill_amount)

Tasks:

  1. Identify all functional dependencies
  2. Give one example each of insertion, update, and deletion anomaly
  3. Identify the candidate key(s)
  4. Classify each FD as full, partial, or transitive

Exercise 2: Compute Attribute Closure & Find Candidate Keys

โฑ 40 minutes๐ŸŸก Intermediate

Given: R(A, B, C, D, E, F) with FDs: {A โ†’ B, BC โ†’ D, D โ†’ E, CF โ†’ A}

Tasks:

  1. Compute Aโบ, Bโบ, Cโบ, {BC}โบ, {CF}โบ, {ACF}โบ
  2. Find ALL candidate keys
  3. List prime and non-prime attributes
  4. Determine the highest normal form

Hint: Which attributes appear only on the LEFT side of FDs? They must be in every key.

Exercise 3: Normalize a Railway Booking Table (UNF โ†’ BCNF)

โฑ 50 minutes๐ŸŸก Intermediate

Given UNF table:

bookings(pnr, passenger_name, phone, train_no, train_name, 
  class, seat_no, journey_date, from_station, to_station, fare, status)

FDs: pnr โ†’ all passenger+journey details; train_no โ†’ train_name; {train_no, class} โ†’ fare_per_km; {train_no, journey_date, class, seat_no} โ†’ pnr

Tasks:

  1. Identify all FDs, candidate keys, prime/non-prime attributes
  2. Normalize step-by-step: 1NF โ†’ 2NF โ†’ 3NF โ†’ BCNF
  3. Write CREATE TABLE SQL for each final table
  4. Verify lossless-join property for each decomposition step

Exercise 4: Test Lossless & Dependency-Preserving Decomposition

โฑ 35 minutes๐ŸŸก Intermediate

Given: R(A, B, C, D) with FDs: {A โ†’ B, B โ†’ C, C โ†’ D}

Decompositions to test:

  1. D1: R1(A, B), R2(B, C), R3(C, D) โ€” Is this lossless? Dependency-preserving?
  2. D2: R1(A, B, C), R2(C, D) โ€” Is this lossless? Dependency-preserving?
  3. D3: R1(A, C), R2(B, D) โ€” Is this lossless? Dependency-preserving?

For each: Apply the lossless-join test and check if all FDs can be verified in a single table.

Exercise 5: Complete Schema Design โ€” E-Commerce (UNF โ†’ 3NF + Denormalization)

โฑ 60 minutes๐Ÿ”ด Advanced

Given UNF: A single spreadsheet with: order_id, order_date, customer_name, customer_phone, customer_city, product_name, category, seller_name, seller_gst, quantity, unit_price, discount_pct, gst_pct, total_amount, payment_mode, delivery_status

Tasks:

  1. Identify all FDs and candidate keys
  2. Normalize to 3NF/BCNF โ€” create at least 6 tables
  3. Write complete CREATE TABLE SQL with all constraints
  4. INSERT 10+ sample rows per table with realistic Indian data
  5. Create one materialized view for a "daily sales dashboard" (denormalization)
Section 6

MCQ Assessment Bank โ€” 15 Questions

Hover to reveal answer and explanation.

Q1

A functional dependency X โ†’ Y means:

  1. X and Y always have the same value
  2. For any two tuples with the same X value, they must have the same Y value โ€” X uniquely determines Y
  3. Y uniquely determines X
  4. X and Y are primary keys
โœ… B. FD X โ†’ Y: if two rows agree on X, they MUST agree on Y. It's one-directional โ€” X determines Y, not the reverse. Example: student_id โ†’ name (same student_id always maps to same name). X need not be a primary key โ€” any attribute can be a determinant.
๐Ÿข GATE CS defines FDs formally. Know this definition precisely.
L1 โ€” RememberFD
Q2

Which of Armstrong's axioms states: if X โ†’ Y and Y โ†’ Z, then X โ†’ Z?

  1. Reflexivity
  2. Augmentation
  3. Transitivity
  4. Decomposition
โœ… C. Transitivity. The three axioms: Reflexivity (YโІX โ‡’ Xโ†’Y), Augmentation (Xโ†’Y โ‡’ XZโ†’YZ), Transitivity (Xโ†’Y, Yโ†’Z โ‡’ Xโ†’Z). Decomposition is a derived rule (Xโ†’YZ โ‡’ Xโ†’Y and Xโ†’Z), not a core axiom.
๐Ÿข Armstrong's axioms are guaranteed GATE questions. Memorize all three + derived rules.
L1 โ€” RememberArmstrong
Q3

A relation is in 2NF if:

  1. It is in 1NF and has no repeating groups
  2. It is in 1NF and no non-prime attribute is partially dependent on any candidate key
  3. It is in 1NF and has no transitive dependencies
  4. Every determinant is a superkey
โœ… B. 2NF = 1NF + no partial dependencies. A partial dependency exists when a non-prime attribute depends on only PART of a composite candidate key. If all candidate keys are single-attribute, 1NF automatically implies 2NF. Option C describes 3NF. Option D describes BCNF.
๐Ÿข Know the exact definition of each normal form โ€” exams test precise definitions.
L1 โ€” RememberNormal Forms
Q4

Why does a transitive dependency cause problems?

  1. It doesn't โ€” transitive dependencies are fine
  2. Because a non-key attribute determines another non-key attribute, causing the same fact to be stored redundantly across multiple rows. Updating one copy but not others causes data inconsistency (update anomaly)
  3. It violates 1NF
  4. It causes the database to crash
โœ… B. Example: student_id โ†’ dept โ†’ dept_hod. The HOD fact is repeated in every student row of that department. If CSE has 500 students, "Dr. Sharma is CSE HOD" is stored 500 times. Updating the HOD requires modifying 500 rows โ€” miss one and the database is inconsistent. 3NF eliminates this by creating a separate departments table.
๐Ÿข This is the most intuitive way to explain why normalization matters โ€” use this in interviews.
L2 โ€” UnderstandAnomalies
Q5

What is the difference between 3NF and BCNF?

  1. They are identical
  2. 3NF allows a non-trivial FD Xโ†’A where X is not a superkey, IF A is a prime attribute. BCNF requires X to be a superkey for every non-trivial FD โ€” no exceptions. BCNF is strictly stronger than 3NF
  3. BCNF is weaker than 3NF
  4. 3NF requires all attributes to be prime
โœ… B. The "escape clause" in 3NF: if the dependent attribute (RHS) is prime (part of some candidate key), the FD is allowed even if the determinant isn't a superkey. BCNF has no such exception. In practice, 3NF = BCNF for most schemas. They differ only when there are overlapping composite candidate keys.
๐Ÿข GATE CS 2020, 2022 asked exactly this difference. Know the formal definitions.
L2 โ€” UnderstandNormal Forms
Q6

A decomposition of R into R1 and R2 is lossless if and only if:

  1. R1 and R2 have the same number of columns
  2. R1 โˆฉ R2 is a superkey of R1 or R2 (the common attributes can determine all attributes of at least one of the decomposed tables)
  3. R1 and R2 have no common attributes
  4. All FDs are preserved
โœ… B. For lossless-join: the common attributes (R1 โˆฉ R2) must functionally determine all of R1 or all of R2 โ€” i.e., they must be a superkey of at least one decomposed table. Without this, JOINing R1 and R2 produces spurious (extra) tuples not in the original R. Option D describes dependency preservation, a separate property.
๐Ÿข The lossless-join test is a standard GATE question worth 2-5 marks.
L2 โ€” UnderstandDecomposition
Q7

Given R(A, B, C, D) with FDs: {A โ†’ B, B โ†’ C, A โ†’ D}. Compute Aโบ.

  1. {A}
  2. {A, B}
  3. {A, B, C, D}
  4. {A, D}
โœ… C. Aโบ: Start {A}. Aโ†’B โ‡’ {A,B}. Bโ†’C โ‡’ {A,B,C}. Aโ†’D โ‡’ {A,B,C,D}. Aโบ = {A,B,C,D} = all attributes. Therefore A is a candidate key.
๐Ÿข Closure computation is the most frequently tested algorithm in GATE DBMS.
L3 โ€” ApplyClosure
Q8

Given R(A, B, C, D, E) with FDs: {AB โ†’ C, C โ†’ D, D โ†’ E}. What is the highest normal form?

  1. 1NF
  2. 2NF
  3. 3NF
  4. BCNF
โœ… A. 1NF. Step 1: Find CK. {AB}โบ: ABโ†’C โ‡’ {A,B,C}. Cโ†’D โ‡’ {A,B,C,D}. Dโ†’E โ‡’ {A,B,C,D,E}. CK = {AB}. Step 2: Prime = {A,B}, Non-prime = {C,D,E}. Step 3: Check 2NF. Partial dep: Does A alone or B alone determine any non-prime? Given FDs don't show Aโ†’anything or Bโ†’anything... Actually, AB is the only determinant for C. BUT Cโ†’D: C is non-prime determining D (non-prime) โ€” this is transitive. Dโ†’E: D is non-prime determining E. These violate 3NF. For 2NF: no partial dependencies are present (no single part of AB determines non-prime directly from given FDs). So it's in 2NF but NOT 3NF due to Cโ†’D transitive chain. Wait, let me recheck: 2NF requires no partial deps on CK. ABโ†’C (full), Cโ†’D and Dโ†’E aren't partial on CK. So 2NF โœ“. But Cโ†’D violates 3NF (C is non-prime, D is non-prime, C not superkey). Highest = 2NF.
๐Ÿข Correction: The answer is 2NF (not 1NF). It satisfies 2NF (no partial deps since AB is fully needed for C) but violates 3NF (Cโ†’D where C is non-prime). Always trace through each NF systematically.
L3 โ€” ApplyNormal Forms
Q9

Decompose R(A, B, C, D) with FDs: {A โ†’ B, B โ†’ C, A โ†’ D} into 3NF using the synthesis algorithm.

  1. R1(A, B, C, D) โ€” no decomposition needed
  2. R1(A, B, D) from merging Aโ†’B and Aโ†’D; R2(B, C) from Bโ†’C. CK {A} is in R1. Lossless (R1โˆฉR2={B}, B is key of R2) and dependency-preserving (all FDs in a single table)
  3. R1(A, B), R2(B, C), R3(A, D) โ€” three tables
  4. R1(A, C), R2(B, D)
โœ… B. Synthesis algorithm: (1) Minimal cover: {Aโ†’B, Aโ†’D, Bโ†’C}. (2) Create table per FD: T1(A,B), T2(A,D), T3(B,C). (3) Merge same determinant: T1+T2 โ†’ R1(A,B,D). (4) CK={A} is in R1 โœ“. Result: R1(A,B,D) PK=A, R2(B,C) PK=B. Lossless: R1โˆฉR2={B}, Bโ†’C so B is key of R2 โœ“. All FDs preserved: Aโ†’B in R1, Aโ†’D in R1, Bโ†’C in R2. โœ“
๐Ÿข The 3NF synthesis algorithm is a procedural GATE question. Practice 5+ examples.
L3 โ€” ApplyDecomposition
Q10

Given R(A, B, C, D, E) with FDs: {AB โ†’ CDE, C โ†’ A}. Candidate Keys: {AB, BC}. Is R in BCNF? Analyze.

  1. Yes, it's in BCNF
  2. No โ€” ABโ†’CDE: AB is a superkey โœ“. But Cโ†’A: C is NOT a superkey (Cโบ = {C,A} โ‰  all). Cโ†’A violates BCNF. However, A IS prime (part of CK {AB}), so 3NF is satisfied. Result: 3NF but NOT BCNF
  3. It's not even in 3NF
  4. Cannot determine without more FDs
โœ… B. Check BCNF: every determinant must be a superkey. ABโ†’CDE: {AB}โบ = all โœ“. Cโ†’A: {C}โบ = {C,A}. C is not a superkey โœ—. BCNF violated. Check 3NF: Cโ†’A: C is not superkey, but A is prime (part of CK {AB}). The 3NF exception applies โœ“. Result: in 3NF, not in BCNF. This is the classic scenario where 3NF โ‰  BCNF.
๐Ÿข This exact pattern appeared in GATE CS 2021. Memorize: "3NF but not BCNF when a non-superkey determines a prime attribute."
L4 โ€” Analyze3NF vs BCNF
Q11

R(A,B,C) with FDs: {Aโ†’B, Bโ†’C}. Decomposition D1: R1(A,B), R2(A,C). D2: R1(A,B), R2(B,C). Analyze both for lossless-join and dependency preservation.

  1. Both are lossless and dependency-preserving
  2. D1: Lossless (A is key of both), dependency-preserving (Aโ†’B in R1, but Bโ†’C? B not in R1 with C... only Aโ†’B in R1 and Aโ†’C derivable in R2. But original FD Bโ†’C is LOST โ€” can't be checked in a single table). D2: Lossless (B is key of R2), dependency-preserving (Aโ†’B in R1, Bโ†’C in R2 โ€” both preserved). D2 is superior.
  3. D1 is better
  4. Neither is lossless
โœ… B. D1: R1โˆฉR2 = {A}. A is key of R1(A,B) โœ“ โ†’ lossless. But FD Bโ†’C: B is in R1, C is in R2 โ€” this FD spans two tables and is LOST. D2: R1โˆฉR2 = {B}. B is key of R2(B,C) โœ“ โ†’ lossless. FDs: Aโ†’B in R1 โœ“, Bโ†’C in R2 โœ“ โ†’ all preserved. D2 is the correct decomposition. This shows why the synthesis algorithm matters โ€” it guarantees dependency preservation.
๐Ÿข Analyzing decomposition quality (lossless + preserving) is a 5-mark GATE question.
L4 โ€” AnalyzeDecomposition
Q12

A BCNF decomposition of R loses the FD {student_id, subject} โ†’ advisor. The DBA team debates: stay at 3NF (preserves the FD) or go to BCNF (loses it). Evaluate.

  1. Always go to BCNF
  2. Stay at 3NF if the lost FD represents a critical business rule that must be enforced. In BCNF, you'd need a trigger or application-level check to enforce the lost FD, adding complexity. 3NF preserves all dependencies within tables, making constraint enforcement simpler. The minor redundancy in 3NF is acceptable if it preserves important business rules
  3. Dependencies don't matter
  4. Use 1NF instead
โœ… B. This is a real-world trade-off. BCNF eliminates all FD-based redundancy but may lose dependencies. 3NF preserves all FDs (can enforce via table-level constraints) but may retain slight redundancy. Decision factors: (1) How critical is the lost FD? If it's a billing rule at SBI, losing it is unacceptable. (2) Can the lost FD be enforced via triggers? Yes, but adds complexity. (3) How much redundancy does 3NF introduce? Usually minimal.
๐Ÿข This is a real architectural decision in enterprise database design. Know the trade-offs for senior-level interviews.
L5 โ€” EvaluateDesign
Q13

A startup's CTO argues: "Normalization is outdated. Just store everything in one big table โ€” modern SSDs and cloud databases are fast enough." Evaluate this claim.

  1. The CTO is correct โ€” normalization is outdated
  2. Wrong for OLTP (transactional systems). Normalization prevents data anomalies โ€” no amount of hardware speed fixes inconsistent data. A denormalized OLTP system will eventually have rows where the same customer has two different phone numbers, or a department has two different HODs. However, for OLAP (analytical/reporting systems), flat denormalized tables (star schema, data warehouses) are indeed standard โ€” because they're read-only and don't face update anomalies
  3. Normalization causes slow queries
  4. One table is always better
โœ… B. Normalization โ‰  performance overhead. It's about data CORRECTNESS. A bank can't have two different balances for the same account. Hardware can't fix logical errors. The correct approach: normalize for OLTP (writes), denormalize for OLAP (reads). This dual-system architecture is used by every major Indian tech company โ€” Flipkart, Razorpay, PhonePe all use normalized PostgreSQL for transactions and denormalized data warehouses (BigQuery/Redshift) for analytics.
๐Ÿข This debate comes up in system design interviews. The answer is always: "normalize for correctness, denormalize for performance, never skip normalization."
L5 โ€” EvaluateArchitecture
Q14

Given an unnormalized invoice table: invoice_id, date, customer_name, phone, items (comma-separated), quantities, prices, gst_no, total. Design a 3NF schema.

  1. Keep as one table
  2. Decompose into: customers(customer_id PK, name, phone, gst_no), invoices(invoice_id PK, customer_id FK, date, total), invoice_items(item_id PK, invoice_id FK, product_name, quantity, unit_price, line_total). Products could be further extracted: products(product_id PK, product_name, unit_price), with invoice_items referencing product_id
  3. Two tables: invoices and items
  4. Store as JSON
โœ… B. Normalization steps: (1) 1NF: remove comma-separated items โ†’ separate invoice_items table. (2) 2NF: customer_name and phone depend on customer (not invoice) โ†’ separate customers table. (3) 3NF: product_name and unit_price are facts about the product, not the invoice โ†’ separate products table. Each table has a clear single responsibility with no transitive or partial dependencies.
๐Ÿข Invoice/billing schema design is a common campus placement and TCS/Infosys project assignment.
L6 โ€” CreateSchema Design
Q15

Design a complete normalized schema for a hospital management system (patients, doctors, departments, appointments, prescriptions, medicines). Start from a single unnormalized table and show the final 3NF/BCNF tables with SQL.

  1. One table is sufficient
  2. 6+ tables: departments(dept_id PK, name, hod), doctors(doctor_id PK, name, specialization, dept_id FK, salary), patients(patient_id PK, name, phone, dob, blood_group), appointments(appt_id PK, patient_id FK, doctor_id FK, date, status, fee), medicines(med_id PK, name, manufacturer, price), prescriptions(rx_id PK, appt_id FK, med_id FK, dosage, duration). All in BCNF โ€” every determinant is a PK/superkey
  3. Three tables maximum
  4. Use a spreadsheet
โœ… B. Each entity has its own table (no mixing facts). FDs flow naturally: dept_idโ†’dept details, doctor_idโ†’doctor details, patient_idโ†’patient details, appt_idโ†’appointment details, med_idโ†’medicine details. Prescriptions is the junction between appointments and medicines. Each determinant is a PK โ†’ all in BCNF. This is the standard hospital ERP schema used in AIIMS, Apollo, and Fortis HIS systems.
๐Ÿข Hospital/university schema design from UNF to 3NF is a guaranteed lab exam question.
L6 โ€” CreateSchema Design
Section 7

Chapter Summary

NORMALIZATION โ”‚ โ”œโ”€โ”€ WHY NORMALIZE? โ”‚ โ”œโ”€โ”€ Insertion Anomaly: can't add data without unrelated data โ”‚ โ”œโ”€โ”€ Update Anomaly: same fact updated in some rows but not all โ”‚ โ””โ”€โ”€ Deletion Anomaly: removing one fact accidentally removes another โ”‚ โ”œโ”€โ”€ FUNCTIONAL DEPENDENCIES (FDs) โ”‚ โ”œโ”€โ”€ X โ†’ Y: same X โ‡’ same Y โ”‚ โ”œโ”€โ”€ Types: Full, Partial, Transitive, Trivial โ”‚ โ””โ”€โ”€ Armstrong's Axioms: Reflexivity, Augmentation, Transitivity โ”‚ + Derived: Union, Decomposition, Pseudo-transitivity โ”‚ โ”œโ”€โ”€ ATTRIBUTE CLOSURE (Xโบ) โ”‚ โ”œโ”€โ”€ Algorithm: start with X, apply FDs iteratively โ”‚ โ”œโ”€โ”€ Use: find candidate keys (Xโบ = all attrs โ‡’ X is superkey) โ”‚ โ””โ”€โ”€ Minimize superkey to get candidate key โ”‚ โ”œโ”€โ”€ NORMAL FORMS โ”‚ โ”œโ”€โ”€ 1NF: atomic values, no repeating groups โ”‚ โ”œโ”€โ”€ 2NF: 1NF + no partial dependencies on composite CK โ”‚ โ”œโ”€โ”€ 3NF: 2NF + no transitive dependencies โ”‚ โ”‚ (Xโ†’A: X is superkey OR A is prime) โ”‚ โ”œโ”€โ”€ BCNF: every determinant is a superkey (no exceptions) โ”‚ โ””โ”€โ”€ 3NF vs BCNF: 3NF preserves FDs, BCNF may lose them โ”‚ โ”œโ”€โ”€ DECOMPOSITION โ”‚ โ”œโ”€โ”€ Lossless: R1 โˆฉ R2 is superkey of R1 or R2 โ”‚ โ”œโ”€โ”€ Dependency-preserving: all FDs checkable in one table โ”‚ โ”œโ”€โ”€ 3NF Synthesis Algorithm: โ”‚ โ”‚ 1. Find minimal cover โ”‚ โ”‚ 2. Create table per FD (merge same determinants) โ”‚ โ”‚ 3. Ensure CK is in at least one table โ”‚ โ””โ”€โ”€ BCNF Decomposition: split on violating FD, repeat โ”‚ โ””โ”€โ”€ DENORMALIZATION โ”œโ”€โ”€ Intentional redundancy for read performance โ”œโ”€โ”€ Techniques: materialized views, computed columns, CQRS โ”œโ”€โ”€ Use: dashboards, search pages, cached balances โ””โ”€โ”€ Rule: normalize first, then strategically denormalize

๐ŸŽฏ 3 Skills This Chapter Unlocks

  1. Schema Design Quality โ€” You can identify and eliminate redundancy in any database. This prevents data corruption that costs companies lakhs in debugging and fixes.
  2. GATE CS Normalization โ€” Closure computation, candidate key finding, NF determination, decomposition โ€” these topics carry 5-8 marks in GATE CS every year.
  3. Architecture Decisions โ€” Knowing when to normalize (OLTP) vs denormalize (OLAP) is a senior-level skill that differentiates you in interviews at product companies.

๐Ÿ“‹ Normalization Quick Reference

CLOSURE: Xโบ = start{X}, apply all FDs iteratively, stop when no change

CANDIDATE KEY: minimal set where Xโบ = all attributes
  Find: attrs only on LHS must be in every key. Add others until Xโบ = all.

NORMAL FORMS:
  1NF: atomic values (no arrays, no CSV, no nested)
  2NF: 1NF + no partial deps (non-prime depends on full CK, not part)
  3NF: 2NF + for Xโ†’A: X is superkey OR A is prime
  BCNF: for Xโ†’A: X is ALWAYS superkey (strictest)

DECOMPOSITION TEST:
  Lossless: R1 โˆฉ R2 โ†’ R1 or R1 โˆฉ R2 โ†’ R2
  Dependency-preserving: all FDs within single decomposed table

3NF SYNTHESIS: minimal cover โ†’ table per FD โ†’ merge same LHS โ†’ add CK table

DENORMALIZE WHEN: read-heavy dashboards, sub-100ms response needed,
  data freshness of 15min+ acceptable. NEVER for OLTP source-of-truth.
Section 8

Interview & Career Preparation

Q1: What is normalization? Why is it needed?

Model Answer: Normalization is the process of organizing a database to reduce redundancy and eliminate anomalies (insertion, update, deletion). It decomposes large tables into smaller, well-structured tables connected by foreign keys. Needed because: without it, the same fact is stored multiple times, leading to inconsistency when one copy is updated but others aren't. Example: storing department HOD in every student row โ€” changing the HOD requires updating hundreds of rows.

Q2: Explain 1NF, 2NF, 3NF with examples.

Model Answer: 1NF: every cell is atomic (one value). Fix: no comma-separated phone numbers โ€” use a separate table. 2NF: no partial dependencies โ€” non-prime attributes depend on the FULL composite key. Fix: student_name depends only on student_id, not {student_id, course_id} โ€” move to students table. 3NF: no transitive dependencies โ€” non-prime doesn't determine non-prime. Fix: dept โ†’ dept_hod is transitive through student_id โ†’ dept โ†’ dept_hod โ€” create departments table.

Q3: What is BCNF? How does it differ from 3NF?

Model Answer: BCNF requires that for every non-trivial FD Xโ†’Y, X must be a superkey. No exceptions. 3NF has an exception: if Y is a prime attribute (part of some candidate key), the FD is allowed even if X isn't a superkey. BCNF is stricter โ€” eliminates all FD-based redundancy. Trade-off: BCNF decomposition may lose some FDs (can't be checked in a single table), while 3NF always preserves all FDs.

Q4: How do you find candidate keys from FDs?

Model Answer: Step 1: Identify attributes that appear ONLY on the left side of FDs โ€” they must be in every candidate key. Step 2: Compute their closure. If closure = all attributes, they form a candidate key. Step 3: If not, add remaining attributes one by one, compute closure each time. Step 4: Check minimality โ€” remove each attribute and verify the remaining is still a superkey. If removing any attribute breaks it, the set is a minimal candidate key.

Q5: What is a lossless-join decomposition?

Model Answer: A decomposition of R into R1 and R2 is lossless if JOINing R1 and R2 gives back exactly R โ€” no extra (spurious) tuples. Test: the common attributes (R1 โˆฉ R2) must be a superkey of at least one of the decomposed tables. If they're not, the JOIN produces extra rows that weren't in the original table. Always verify lossless property when decomposing.

Q6: What is the 3NF synthesis algorithm?

Model Answer: Step 1: Find minimal cover of FDs (decompose RHS, remove redundant FDs, remove extraneous attributes). Step 2: Create a table for each FD โ€” left side becomes PK. Step 3: Merge tables with the same determinant. Step 4: If no table contains a candidate key, add a table with just the CK. This always produces a lossless, dependency-preserving 3NF decomposition.

Q7: When should you denormalize?

Model Answer: Denormalize when: (1) Read performance is critical (dashboards, search pages needing sub-100ms). (2) The query involves many JOINs across large tables. (3) Slight data staleness is acceptable (15-minute-old data for reports). Techniques: materialized views, computed columns, read replicas with denormalized schema. Rule: always normalize first (source of truth), then denormalize strategically for read performance. Never skip normalization for OLTP systems.

Q8: What are Armstrong's axioms?

Model Answer: Three core axioms: (1) Reflexivity: if Y โІ X then Xโ†’Y (trivial). (2) Augmentation: if Xโ†’Y then XZโ†’YZ (adding attributes to both sides). (3) Transitivity: if Xโ†’Y and Yโ†’Z then Xโ†’Z. These are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Derived rules: Union (Xโ†’Y, Xโ†’Z โ‡’ Xโ†’YZ), Decomposition (Xโ†’YZ โ‡’ Xโ†’Y, Xโ†’Z), Pseudo-transitivity (Xโ†’Y, WYโ†’Z โ‡’ WXโ†’Z).

Q9: Can you give a real-world example of each normal form violation?

Model Answer: 1NF violation: Excel sheet with "Phone: 9876,9123" in one cell โ€” not atomic. 2NF violation: University table with {student_id, course_id} as PK, but student_name depends only on student_id (partial). 3NF violation: Employee table where emp_id โ†’ dept โ†’ dept_location (transitive โ€” dept_location is a fact about dept, not about employee). BCNF violation: Advisor table where advisor โ†’ subject but advisor isn't a superkey (rare, involves overlapping candidate keys).

Q10: Normalize vs Denormalize โ€” how do companies like Flipkart handle this?

Model Answer: Flipkart uses a dual-model architecture: (1) Normalized PostgreSQL database for OLTP โ€” order placement, payment processing, inventory updates. Full 3NF/BCNF. ACID transactions. (2) Denormalized data warehouse (BigQuery/Redshift) for OLAP โ€” sales analytics, recommendation engine, reporting. Star schema with fact and dimension tables. The normalized system feeds the denormalized system via ETL/streaming pipelines. This is the industry-standard CQRS pattern.

๐ŸŽ“ GATE CS Normalization Strategy

  • Always start by computing closures and finding candidate keys
  • Classify attributes as prime/non-prime before checking NFs
  • Check NFs bottom-up: 1NF โ†’ 2NF โ†’ 3NF โ†’ BCNF. Stop at the first failure.
  • Practice: 15-20 problems from previous GATE papers (2015-2024 have normalization every year)