Data Structures & Algorithms: Industry Edition

Unit 5: Heaps, Hashing & Graphs

HeapSort, Hash Tables, BFS, DFS, Dijkstra's & Floyd-Warshall — with real examples from Ola, PhonePe, Google Maps & IRCTC.

🏢 Real Projects | 💻 4 Lab Programs (Python + C) | 📝 25 MCQs | 🎯 3 Interview Questions

Section 1

Industry Hook — The Real-World Problem First

🗺️ Three Problems That Power India's Digital Infrastructure

Problem 1 — Ola (Heaps): When you book a ride, Ola must find the nearest driver from 500,000 active drivers — in under 5ms. Scanning all 500K drivers linearly is too slow. A min-heap (priority queue) gives the nearest driver in O(1) and updates in O(log n). That's ~19 operations instead of 500,000.

Problem 2 — PhonePe (Hashing): PhonePe processes 7 billion UPI transactions monthly. Every payment must check: "Has this transaction ID been seen before?" Searching 7 billion records linearly = impossible. A hash table answers in O(1) — constant time regardless of table size.

Problem 3 — Google Maps (Graphs): When you search "Delhi to Mumbai" on Google Maps, it explores millions of road segments to find the shortest route. Roads are edges, intersections are vertices — this is a graph. Dijkstra's algorithm finds the shortest path in O((V+E) log V). IRCTC uses the same algorithm to find the best train route with connections.

This is exactly the problem heaps, hashing, and graphs solve. Let's master them all.

🇮🇳 Ola🇮🇳 PhonePeGoogle Maps🇮🇳 IRCTC

Section 2

Concept Explanation — Theory, Earned

2.1 Heaps & HeapSort

Layer 1 — Intuition

A heap is like a VIP hospital queue. The patient with the highest severity (priority) is always treated first. You can add new patients and the most critical one automatically "bubbles up" to the front. A max-heap keeps the maximum at the root; a min-heap keeps the minimum.

Layer 2 — Visual

Max-Heap (parent ≥ children)
           90
         /    \
       80      70
      /  \    /
    50   60  30

Array: [90, 80, 70, 50, 60, 30]  (level-order)

For index i:
  Parent:      (i-1)/2
  Left child:  2i + 1
  Right child: 2i + 2

Heap Property: arr[parent] ≥ arr[child] (max-heap)
               arr[parent] ≤ arr[child] (min-heap)

Heap Operations

Operation	Time	How
Get Max/Min	O(1)	Root is always max/min
Insert	O(log n)	Add at end, bubble UP (sift up)
Extract Max/Min	O(log n)	Remove root, replace with last, bubble DOWN (heapify)
Build Heap	O(n)	Bottom-up heapify (not O(n log n)!)
HeapSort	O(n log n)	Build heap + n extractions

Building a heap from an array is O(n), not O(n log n). This is counter-intuitive — you'd expect n insertions × O(log n) each = O(n log n). But bottom-up heapify is smarter: most nodes are near the bottom and need very few swaps. The math proves it converges to O(n). This was proven by Floyd in 1964.

2.2 Hashing & Hash Tables

Layer 1 — Intuition

A hash table is like a library with numbered shelves. Instead of searching every shelf, you compute a shelf number from the book title: "Harry Potter" → shelf 7. Go directly to shelf 7. Done in O(1).

Layer 2 — Hash Function & Collisions

Hashing Concept
key → hash_function(key) → index → store in array[index]

Example: hash(key) = key % table_size

Insert keys: 25, 37, 42, 55, 73  (table_size = 10)
  25 % 10 = 5  → slot 5
  37 % 10 = 7  → slot 7
  42 % 10 = 2  → slot 2
  55 % 10 = 5  → COLLISION! Slot 5 already has 25!
  73 % 10 = 3  → slot 3

Collision Resolution Techniques

Technique	How It Works	Pros	Cons
Separate Chaining	Each slot holds a linked list	Simple, never "full"	Extra pointer space, cache unfriendly
Linear Probing	Try next slot: (h+1), (h+2), ...	Cache friendly	Primary clustering
Quadratic Probing	Try (h+1²), (h+2²), (h+3²), ...	Less clustering	Secondary clustering, may not find empty slot
Double Hashing	h₂(key) as step: h₁ + i·h₂	Best distribution	Complex, needs good h₂

Real consequence: PhonePe with 7B monthly transactions: hash table lookup = O(1) ≈ 50 nanoseconds. Linear search = O(n) ≈ 7 billion × 7ns = 49 seconds per lookup. At 2,700 transactions/second, the system would need 49 × 2,700 = 132,300 seconds of compute per second. Physically impossible without hashing.

2.3 Graphs — Vertices, Edges, Connections

Layer 1 — Intuition

A graph is a social network. Each person is a vertex, each friendship is an edge. Unlike trees, graphs can have cycles (A→B→C→A), disconnected components, and edges with weights (distance between cities).

Layer 2 — Representations

Example Graph
     A ——(4)—— B
     |         |
    (2)       (3)
     |         |
     C ——(1)—— D

Adjacency Matrix
     A  B  C  D
A  [ 0  4  2  0 ]
B  [ 4  0  0  3 ]
C  [ 2  0  0  1 ]
D  [ 0  3  1  0 ]
Space: O(V²). Good for dense graphs.

Adjacency List
A → [(B,4), (C,2)]
B → [(A,4), (D,3)]
C → [(A,2), (D,1)]
D → [(B,3), (C,1)]
Space: O(V+E). Good for sparse graphs.

BFS vs DFS

BFS (Breadth-First)	DFS (Depth-First)
Uses a Queue	Uses a Stack (or recursion)
Visits level by level	Goes deep before backtracking
Finds shortest path (unweighted)	Finds all connected components
Google Maps: "How many stops?"	Maze solving: "Is there a path?"
Space: O(V)	Space: O(V)
Time: O(V+E)	Time: O(V+E)

2.4 Shortest Path Algorithms

Dijkstra's Algorithm — Single Source Shortest Path

Find shortest path from A to all vertices:

Graph: A-(4)-B, A-(2)-C, B-(3)-D, C-(1)-D

Step 0: dist = {A:0, B:∞, C:∞, D:∞}     visited = {}
Step 1: Visit A (dist=0)
        Update B: 0+4=4, C: 0+2=2
        dist = {A:0, B:4, C:2, D:∞}       visited = {A}
Step 2: Visit C (dist=2, smallest unvisited)
        Update D: 2+1=3
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C}
Step 3: Visit D (dist=3)
        Update B: 3+3=6 > 4, no update
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C,D}
Step 4: Visit B (dist=4)
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C,D,B}

Shortest: A→B=4, A→C=2, A→D=3 (via C!)

Time: O((V+E) log V) with min-heap. Without heap: O(V²).

Floyd-Warshall — All-Pairs Shortest Path

Finds shortest path between every pair of vertices. Uses dynamic programming: "Can going through vertex k improve the path from i to j?"

Core Idea
for k in all vertices:
    for i in all vertices:
        for j in all vertices:
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

Time: O(V³)   Space: O(V²)
Used when: You need ALL shortest paths (e.g., IRCTC route planning between ALL station pairs).

Dijkstra's finds shortest paths from ONE source to ALL destinations — O((V+E) log V). Floyd-Warshall finds shortest paths between ALL pairs — O(V³). Google Maps uses Dijkstra (one source: your location). IRCTC precomputes all station-pair distances using Floyd-Warshall. When should you use which?

Section 3

Lab Program Implementations

LAB PROGRAM 1: Heap Operations & HeapSort

📘 Syllabus: Unit 5 — Heaps🏢 Industry: Ola driver selection, OS task scheduler, Top-K queries

Python
class MaxHeap:
    """Max-Heap using array. Parent ≥ both children.
    Industry: Ola's driver pool — extract nearest (max priority) in O(1)."""
    def __init__(self):
        self.heap = []

    def _parent(self, i): return (i - 1) // 2
    def _left(self, i):   return 2 * i + 1
    def _right(self, i):  return 2 * i + 2

    def insert(self, val):
        """Add value, then bubble UP. O(log n)."""
        self.heap.append(val)
        i = len(self.heap) - 1
        while i > 0 and self.heap[self._parent(i)] < self.heap[i]:
            p = self._parent(i)
            self.heap[i], self.heap[p] = self.heap[p], self.heap[i]
            i = p

    def extract_max(self):
        """Remove and return max (root). O(log n)."""
        if not self.heap: return None
        max_val = self.heap[0]
        self.heap[0] = self.heap[-1]    # Move last to root
        self.heap.pop()
        self._heapify_down(0)           # Bubble down
        return max_val

    def _heapify_down(self, i):
        """Restore heap property by bubbling down."""
        n = len(self.heap)
        largest = i
        l, r = self._left(i), self._right(i)
        if l < n and self.heap[l] > self.heap[largest]: largest = l
        if r < n and self.heap[r] > self.heap[largest]: largest = r
        if largest != i:
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            self._heapify_down(largest)


def heap_sort(arr):
    """HeapSort: Build max-heap, repeatedly extract max. O(n log n)."""
    n = len(arr)
    # Build max-heap (bottom-up) — O(n)
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Extract elements one by one
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]   # Move max to end
        heapify(arr, i, 0)                # Heapify reduced array

def heapify(arr, n, i):
    largest = i
    l, r = 2*i+1, 2*i+2
    if l < n and arr[l] > arr[largest]: largest = l
    if r < n and arr[r] > arr[largest]: largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

data = [12, 11, 13, 5, 6, 7]
heap_sort(data)
print("HeapSort:", data)

HeapSort: [5, 6, 7, 11, 12, 13]

C — HeapSort

C
#include <stdio.h>
void swap(int* a, int* b) { int t=*a; *a=*b; *b=t; }

void heapify(int a[], int n, int i) {
    int big=i, l=2*i+1, r=2*i+2;
    if(l<n && a[l]>a[big]) big=l;
    if(r<n && a[r]>a[big]) big=r;
    if(big!=i){ swap(&a[i],&a[big]); heapify(a,n,big); }
}
void heapSort(int a[], int n) {
    for(int i=n/2-1;i>=0;i--) heapify(a,n,i);
    for(int i=n-1;i>0;i--){ swap(&a[0],&a[i]); heapify(a,i,0); }
}
int main() {
    int a[]={12,11,13,5,6,7};
    heapSort(a,6);
    for(int i=0;i<6;i++) printf("%d ",a[i]);
    return 0;
}

LAB PROGRAM 2: Hash Table with Collision Resolution

📘 Syllabus: Unit 5 — Hashing🏢 Industry: PhonePe transaction dedup, DNS lookup, database indexing

Python
class HashTable:
    """Hash table with separate chaining (linked list per bucket).
    Industry: PhonePe checks transaction uniqueness in O(1)."""
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # Array of lists
        self.count = 0

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        """Insert or update key-value pair. O(1) average."""
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)  # Update
                return
        self.table[idx].append((key, value))       # New entry
        self.count += 1

    def get(self, key):
        """Retrieve value by key. O(1) average."""
        idx = self._hash(key)
        for k, v in self.table[idx]:
            if k == key: return v
        return None

    def delete(self, key):
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx].pop(i)
                self.count -= 1; return True
        return False

    def display(self):
        for i, bucket in enumerate(self.table):
            if bucket:
                print(f"  [{i}]: {' → '.join(f'{k}:{v}' for k,v in bucket)}")


ht = HashTable(7)
ht.put("TXN001", 500)
ht.put("TXN002", 1200)
ht.put("TXN003", 300)
ht.put("TXN004", 750)
print("Hash Table:"); ht.display()
print(f"Get TXN002: ₹{ht.get('TXN002')}")
ht.delete("TXN003")
print(f"After deleting TXN003: {ht.get('TXN003')}")

Hash Table: [0]: TXN002:1200 [2]: TXN001:500 [4]: TXN004:750 [6]: TXN003:300 Get TXN002: ₹1200 After deleting TXN003: None

LAB PROGRAM 3: BFS and DFS on a Graph

📘 Syllabus: Unit 5 — Graph Traversals🏢 Industry: Google Maps routing, LinkedIn connections, web crawlers

Python
from collections import deque

class Graph:
    """Graph using adjacency list representation."""
    def __init__(self):
        self.adj = {}   # {vertex: [neighbors]}

    def add_edge(self, u, v):
        self.adj.setdefault(u, []).append(v)
        self.adj.setdefault(v, []).append(u)  # Undirected

    def bfs(self, start):
        """Breadth-First Search using queue. O(V+E).
        Visits vertices level by level — finds shortest path (unweighted).
        Industry: Google Maps 'how many stops to destination?'"""
        visited = set([start])
        queue = deque([start])
        order = []

        while queue:
            vertex = queue.popleft()       # FIFO — visit oldest first
            order.append(vertex)
            for neighbor in self.adj.get(vertex, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

        return order

    def dfs(self, start, visited=None):
        """Depth-First Search using recursion. O(V+E).
        Goes deep before backtracking — detects cycles, topological sort.
        Industry: Web crawler follows links as deep as possible."""
        if visited is None: visited = set()
        visited.add(start)
        result = [start]
        for neighbor in self.adj.get(start, []):
            if neighbor not in visited:
                result.extend(self.dfs(neighbor, visited))
        return result


g = Graph()
for u, v in [("Delhi","Jaipur"), ("Delhi","Agra"), ("Jaipur","Udaipur"),
             ("Agra","Lucknow"), ("Lucknow","Varanasi")]:
    g.add_edge(u, v)

print("BFS from Delhi:", g.bfs("Delhi"))
print("DFS from Delhi:", g.dfs("Delhi"))

BFS from Delhi: ['Delhi', 'Jaipur', 'Agra', 'Udaipur', 'Lucknow', 'Varanasi'] DFS from Delhi: ['Delhi', 'Jaipur', 'Udaipur', 'Agra', 'Lucknow', 'Varanasi']

C — BFS

C
#include <stdio.h>
#define V 6

int adj[V][V], visited[V], queue[V], front=0, rear=-1;

void bfs(int start) {
    visited[start] = 1; queue[++rear] = start;
    while(front <= rear) {
        int v = queue[front++];
        printf("%d ", v);
        for(int i=0;i<V;i++)
            if(adj[v][i] && !visited[i]) { visited[i]=1; queue[++rear]=i; }
    }
}

int main() {
    adj[0][1]=adj[1][0]=1; adj[0][2]=adj[2][0]=1;
    adj[1][3]=adj[3][1]=1; adj[2][4]=adj[4][2]=1; adj[4][5]=adj[5][4]=1;
    printf("BFS: "); bfs(0);
    return 0;
}

LAB PROGRAM 4: Dijkstra's Shortest Path Algorithm

📘 Syllabus: Unit 5 — Shortest Path🏢 Industry: Google Maps, IRCTC train routes, network packet routing

Python
import heapq

def dijkstra(graph, start):
    """Dijkstra's algorithm using min-heap (priority queue).
    Finds shortest distance from start to ALL other vertices.
    Time: O((V+E) log V) with heap. Without: O(V²).
    
    Industry: Google Maps finds shortest route from your location
    to destination. IRCTC finds optimal train routes."""
    
    dist = {v: float('inf') for v in graph}
    dist[start] = 0
    prev = {v: None for v in graph}    # To reconstruct path
    pq = [(0, start)]                    # (distance, vertex)
    visited = set()
    
    while pq:
        d, u = heapq.heappop(pq)          # Get nearest unvisited
        if u in visited: continue
        visited.add(u)
        
        for v, weight in graph[u]:
            new_dist = d + weight
            if new_dist < dist[v]:
                dist[v] = new_dist        # Found shorter path!
                prev[v] = u
                heapq.heappush(pq, (new_dist, v))
    
    return dist, prev


def get_path(prev, target):
    path = []
    while target:
        path.append(target)
        target = prev[target]
    return path[::-1]


# Indian railway network (simplified)
railway = {
    "Delhi":     [("Jaipur", 281), ("Agra", 200), ("Lucknow", 500)],
    "Jaipur":    [("Delhi", 281), ("Ahmedabad", 625)],
    "Agra":      [("Delhi", 200), ("Lucknow", 335)],
    "Lucknow":   [("Delhi", 500), ("Agra", 335), ("Varanasi", 300)],
    "Ahmedabad": [("Jaipur", 625), ("Mumbai", 524)],
    "Varanasi":  [("Lucknow", 300)],
    "Mumbai":    [("Ahmedabad", 524)],
}

dist, prev = dijkstra(railway, "Delhi")
print("=== IRCTC Route Planner: Delhi to All Cities ===")
for city, d in sorted(dist.items(), key=lambda x: x[1]):
    path = get_path(prev, city)
    print(f"  {city:12s}: {d:4d} km  via {' → '.join(path)}")

=== IRCTC Route Planner: Delhi to All Cities === Delhi : 0 km via Delhi Agra : 200 km via Delhi → Agra Jaipur : 281 km via Delhi → Jaipur Lucknow : 500 km via Delhi → Lucknow Varanasi : 800 km via Delhi → Lucknow → Varanasi Ahmedabad : 906 km via Delhi → Jaipur → Ahmedabad Mumbai : 1430 km via Delhi → Jaipur → Ahmedabad → Mumbai

Notice Delhi→Lucknow is 500 km directly, but Delhi→Agra→Lucknow is 200+335 = 535 km. Dijkstra correctly picks the direct route. But Delhi→Mumbai goes through Jaipur→Ahmedabad (1430 km) because that's shorter than Delhi→Agra→Lucknow→ ∞ (no direct Lucknow-Mumbai link). This is exactly how Google Maps works — it doesn't always pick the "obvious" route.

Section 4

Industry Case Studies

Case Study 1: Ola — Driver Matching with Min-Heap

🏢 CompanyOla (🇮🇳 2M+ rides/day, 500K active drivers)

⚙️ SystemReal-time Driver Matching Engine

🔴 ProblemWhen a rider requests a ride, find the nearest available driver from 500K active drivers in under 5ms. Drivers' locations update every 3 seconds.

🟢 DSA UsedGeospatial min-heap: drivers indexed by distance to rider. Extract-min gives nearest driver in O(1). When a driver moves, update their position in O(log n). Combined with geohashing to limit search to nearby grid cells.

📊 Numbers500K drivers, 2M rides/day. Heap extract-min: O(1) = ~100ns. Linear scan: O(500K) = ~3.5ms. At 23 rides/second, heap saves 80ms/second of compute.

✅ OutcomeDriver matched within 3 seconds of request. Rider sees "Driver assigned" almost instantly.

💡 Student LessonHeaps are for "give me the best element NOW." Whenever you need the min, max, nearest, cheapest, or most urgent item, think heap.

Case Study 2: PhonePe — Transaction Deduplication with Hash Tables

🏢 CompanyPhonePe (🇮🇳 7B+ monthly UPI transactions)

⚙️ SystemTransaction Deduplication & Idempotency Layer

🔴 ProblemNetwork timeouts cause payment retries. If ₹500 is charged twice, the user loses money. Every transaction must be checked: "Has this exact transaction ID been processed before?"

🟢 DSA UsedDistributed hash table (Bloom filter + hash map). Transaction ID → hash → O(1) lookup. If found → reject duplicate. If not → process and store.

📊 Numbers7B transactions/month = ~2,700/second. Each lookup: O(1) ≈ 50ns. Without hashing: O(n) scan of billions = impossible.

✅ OutcomeZero duplicate charges. Every retry is safely rejected without double-processing.

💡 Student LessonHash tables are for "have I seen this before?" — deduplication, caching, counting, and lookup. Python's dict, Java's HashMap, and Redis are all hash tables.

Case Study 3: Google Maps — Shortest Path with Dijkstra

🏢 CompanyGoogle Maps (Global — 1B+ monthly users)

⚙️ SystemRoute Planning Engine

🔴 ProblemFind the shortest/fastest route between two points on a road network with billions of edges. Must handle real-time traffic, road closures, and turn restrictions.

🟢 DSA UsedA* search (Dijkstra + heuristic) on a contraction hierarchy. The road graph is preprocessed into a hierarchy: local roads → city roads → highways. Dijkstra runs on this compressed graph, reducing billions of edges to thousands.

📊 Numbers1 billion+ route queries/day. Road network: ~1 billion intersections, ~2 billion road segments globally. Route computed in under 500ms.

✅ OutcomeInstant route suggestions with real-time traffic. The same algorithm powers Uber, Ola, and every navigation app.

💡 Student LessonDijkstra's is the foundation of every navigation system. Lab Program 4's IRCTC route planner is a simplified version of exactly what Google Maps does.

Section 5

Chapter Project — "Build It Yourself"

🗺️ Campus Navigation System with Dijkstra

Pitch: Build a shortest-path navigation system for your college campus — find the quickest route between any two buildings, with real walking distances.

Steps

Step 1 — Model campus as a graph: Buildings = vertices (Library, Canteen, Lab, Hostel, etc.). Paths = weighted edges (walking time in minutes).
Step 2 — Adjacency list: Use Lab Program 3's graph structure to store the campus map.
Step 3 — Dijkstra: Use Lab Program 4 to find the shortest path between any two buildings.
Step 4 — Path reconstruction: Print the complete walking route: "Library → Main Gate → Canteen (7 min)".
Step 5 — Interactive menu: User inputs source and destination, system outputs shortest route.

Extensions

⭐ Easy: Add all-pairs shortest paths using Floyd-Warshall — precompute a routing table
⭐⭐ Medium: Account for "blocked paths" (construction, restricted areas) by removing edges dynamically
⭐⭐⭐ Hard: Visualize the campus graph and highlight the shortest path using matplotlib

Section 6

Interview Corner — "Crack It at Google/Amazon"

Q1: Top K Frequent Elements ⭐⭐ [Amazon + Google]

Python
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    """Find k most frequent elements. O(n log k).
    Uses hash map (count) + min-heap (top K).
    Industry: Twitter trending topics, YouTube trending videos."""
    count = Counter(nums)                     # O(n) — hash map
    return heapq.nlargest(k, count.keys(), key=count.get)

print(top_k_frequent([1,1,1,2,2,3], 2))  # [1, 2]

Time: O(n log k). Hash map counts in O(n). Heap maintains top K in O(n log k). Much better than sorting O(n log n) for large n, small k.

Q2: Two Sum ⭐ [Most Asked Interview Question Worldwide]

Python
def two_sum(nums, target):
    """Find two indices whose values add to target. O(n).
    Hash map stores: value → index. For each num, check if
    (target - num) exists in the map."""
    seen = {}                              # value → index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]    # Found pair!
        seen[num] = i                      # Store for future lookup
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]

Time: O(n). One pass. Hash map lookup is O(1). Brute force is O(n²) with nested loops. This is the #1 most-asked interview question (LeetCode #1).

Q3: Number of Islands (BFS/DFS on Grid) ⭐⭐ [GATE + Interview]

Python
def num_islands(grid):
    """Count connected components of '1's in a 2D grid.
    Each '1' is land, '0' is water. Connected '1's form an island.
    Uses DFS to 'flood fill' each island."""
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    count = 0

    def dfs(r, c):
        if r<0 or r>=rows or c<0 or c>=cols or grid[r][c]!='1':
            return
        grid[r][c] = '0'               # Mark visited
        dfs(r+1,c); dfs(r-1,c); dfs(r,c+1); dfs(r,c-1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1              # New island found
                dfs(r, c)              # Sink the entire island
    return count

Time: O(R×C). Each cell visited once. This is the graph traversal applied to a 2D grid — a matrix IS a graph where each cell connects to its 4 neighbors.

Section 7

MCQ Assessment Bank — 25 Questions

Level 1 — Recall ⭐ (5 Questions)

In a max-heap, the root contains:

The minimum element
The maximum element
The median element
A random element

✅ (b) The maximum element
📖 Max-heap property: parent ≥ children. Root has no parent, so it's the maximum.
❌ (a) That's a min-heap.

University⭐ Easy

Average time complexity of hash table lookup:

O(n)
O(log n)
O(1)
O(n²)

✅ (c) O(1)
📖 Hash function computes the index directly. No searching needed (on average, with good hash function and load factor).
❌ (a) Worst case with all collisions. (b) That's BST.

GATE⭐ Easy

BFS uses which data structure?

Stack
Queue
Heap
Array

✅ (b) Queue
📖 BFS processes vertices in FIFO order — the order they were discovered. This ensures level-by-level traversal.
❌ (a) Stack is used by DFS.

University⭐ Easy

Dijkstra's algorithm finds:

Minimum spanning tree
Shortest path from one source to all vertices
All-pairs shortest paths
Longest path

✅ (b)
📖 Single-source shortest path. For all-pairs, use Floyd-Warshall. For MST, use Kruskal/Prim.
❌ (c) That's Floyd-Warshall. (a) That's Prim's/Kruskal's.

GATE⭐ Easy

HeapSort's time complexity is:

O(n) best, O(n²) worst
O(n log n) in all cases
O(n²) in all cases
O(n log n) avg, O(n²) worst

✅ (b) O(n log n) in all cases
📖 Build heap: O(n). Extract n elements: n × O(log n). Total: O(n log n) regardless of input.
❌ (d) That's Quick Sort, not HeapSort.

GATE⭐ Easy

Level 2 — Conceptual ⭐⭐ (6 Questions)

In separate chaining, a hash table with 10 slots and 30 elements has an average chain length of:

3
10
30
0.33

✅ (a) 3
📖 Load factor α = n/m = 30/10 = 3. Average chain length = α = 3. Expected search time: O(1 + α) = O(4).
❌ (b)(c) Not ratios. (d) Inverted.

GATE⭐⭐ Medium

Linear probing suffers from "primary clustering." This means:

The hash function is bad
Consecutive occupied slots form long chains, degrading search to O(n)
The table is too small
Keys are sorted

✅ (b)
📖 When slots near a collision get filled, they create a "cluster." Any new key hashing into this cluster must probe through the entire cluster — the bigger it gets, the slower it gets.
❌ (a) Can happen even with good hash functions. (c) Related but not the definition.

Interview⭐⭐ Medium

DFS on a graph uses which data structure internally (when using recursion)?

Queue
Call stack (implicit stack)
Heap
Hash table

✅ (b) Call stack
📖 Each recursive DFS call pushes a frame onto the call stack. When it returns (backtracks), the frame is popped. DFS can also be implemented with an explicit stack.
❌ (a) Queue is BFS.

GATE⭐⭐ Medium

Q9 ⚠️ TRAP

Dijkstra's algorithm works correctly with negative edge weights.

True — it finds shortest paths regardless
False — negative weights can cause Dijkstra to miss shorter paths; use Bellman-Ford instead
True — just use absolute values
False — graphs can't have negative weights

✅ (b)
📖 Trap! Dijkstra's greedy approach assumes: once a vertex is "finalized," no shorter path exists. Negative weights violate this — a later path through a negative edge could be shorter. Bellman-Ford handles negative weights in O(VE).
❌ (a)(c) Dangerous misconceptions. (d) Graphs CAN have negative weights.

⚠️ TRAPGATE⭐⭐ Medium

Q10

Floyd-Warshall's time complexity is:

O(V + E)
O(V² log V)
O(V³)
O(E log V)

✅ (c) O(V³)
📖 Three nested loops over V vertices: for k, for i, for j. Simple but cubic. For V=1000: 1 billion operations. For V=10000: 1 trillion — impractical.
❌ (a) BFS/DFS. (d) Dijkstra with heap.

GATE⭐⭐ Medium

Q11

A heap is best represented as an array because:

Arrays are always faster
A complete binary tree maps perfectly to array indices with no wasted space
Heaps can't use pointers
Arrays support O(1) insert

✅ (b)
📖 Heaps are complete binary trees — all levels filled left-to-right. Array indices 2i+1 (left) and 2i+2 (right) perfectly represent parent-child relationships without pointer overhead.
❌ (a) Not universally true. (c) Could use pointers but wasteful. (d) Heap insert is O(log n).

University⭐⭐ Medium

Level 3 — Application ⭐⭐ (8 Questions)

Q12

Ola needs the 5 nearest drivers from 500K. Best approach?

Sort all 500K by distance — O(n log n)
Min-heap of all drivers, extract 5 times — O(n + 5 log n)
Max-heap of size 5, scan all drivers — O(n log 5)
Both (b) and (c) work, but (c) is more space-efficient

✅ (d)
📖 (b) builds a full heap: O(n) build + 5 extractions. (c) maintains a tiny heap of 5 elements, comparing each driver: O(n log 5) ≈ O(n). Both are O(n), but (c) uses O(5) space vs O(n).
❌ (a) O(n log n) is unnecessary — we only need top 5, not full sort.

Interview⭐⭐ Medium

Q13

After inserting keys 10, 20, 30 into a hash table of size 7 with h(k)=k%7, the slots occupied are:

3, 6, 2
10, 20, 30
0, 1, 2
3, 6, 2

✅ (a) and (d) — 3, 6, 2
📖 10%7=3, 20%7=6, 30%7=2. No collisions — all different slots.
❌ (b) Slots are indices, not values. (c) Wrong computation.

University⭐⭐ Medium

Q14

BFS from vertex A in an unweighted graph finds:

The longest path to each vertex
The shortest path (minimum edges) to each vertex
All cycles in the graph
The minimum spanning tree

✅ (b) Shortest path (unweighted)
📖 BFS visits vertices level-by-level. The first time it reaches a vertex is via the shortest path (fewest edges). For weighted graphs, you need Dijkstra.
❌ Others are different problems.

GATE⭐⭐ Medium

Q15 ⚠️ TRAP

A hash table with load factor > 1 using separate chaining will fail (overflow).

True — can't store more elements than slots
False — separate chaining handles any load factor via linked lists
True — hash functions can't handle load factor > 1
False — but performance degrades linearly

✅ (b) with (d) also partially correct
📖 Trap! Open addressing (linear probing) DOES fail at load factor > 1. But separate chaining uses linked lists — each slot can hold unlimited elements. Load factor > 1 means average chain length > 1, so lookups slow down but DON'T fail.
❌ (a)(c) Only true for open addressing.

⚠️ TRAPGATE⭐⭐ Medium

Q16

Adjacency matrix vs adjacency list: for a sparse graph (V=10000, E=20000), which uses less memory?

Matrix: O(V²) = 100M entries
List: O(V+E) = 30,000 entries
Same memory for both
Matrix is always better

✅ (b) Adjacency list
📖 Matrix: 10000² = 100,000,000 entries (400 MB for int). List: 10000 + 20000 = 30,000 entries (~240 KB). For sparse graphs, list uses 1000x less memory.
❌ (a) States matrix size but doesn't call it better. (d) Matrix is better for dense graphs only.

GATE⭐⭐ Medium

Q17

DFS can be used to detect cycles in a graph. BFS cannot.

True — only DFS detects cycles
False — both can detect cycles
True — BFS is only for shortest paths
False — neither can detect cycles

✅ (b) False — both can detect cycles
📖 DFS detects back edges (revisiting an ancestor). BFS detects cycles when it encounters an already-visited non-parent vertex. Both work for undirected graphs. For directed graphs, DFS with coloring (white/gray/black) is standard.
❌ (a)(c) BFS can also detect cycles.

Interview⭐⭐ Medium

Q18

Python's dict is implemented as:

A balanced BST
A hash table with open addressing
A sorted array
A linked list

✅ (b) Hash table with open addressing
📖 Python's dict uses hash table with open addressing (specifically, a variant of linear probing with perturbation). Java's HashMap uses separate chaining. Both achieve O(1) average lookup.
❌ (a) C++ std::map uses Red-Black tree, but Python dict doesn't.

Interview⭐⭐ Medium

Q19

Google Maps uses Dijkstra's algorithm directly on the full road graph.

True — Dijkstra is fast enough
False — it uses a preprocessed contraction hierarchy with A* search for speed
True — modern CPUs can handle it
False — Google uses BFS instead

✅ (b)
📖 The full road graph has billions of nodes. Even Dijkstra with a heap would take seconds. Google preprocesses the graph into a hierarchy (contraction hierarchies) and uses A* (Dijkstra + heuristic) on this compressed graph. Result: millisecond routing.
❌ (a)(c) Too slow for real-time. (d) BFS doesn't handle weights.

Interview⭐⭐ Medium

Level 4 — Analysis ⭐⭐⭐ (6 Questions)

Q20

For finding the shortest path between ALL pairs of 500 railway stations, Dijkstra vs Floyd-Warshall:

Run Dijkstra 500 times: O(500 × (V+E) log V)
Floyd-Warshall once: O(V³) = O(500³) = 125M operations
Both are equivalent
(b) is better for dense graphs, (a) for sparse graphs

✅ (d)
📖 Dijkstra × V: O(V(V+E)log V). Floyd: O(V³). For dense graphs (E≈V²): Dijkstra = O(V³ log V) > Floyd = O(V³). For sparse graphs (E≈V): Dijkstra = O(V² log V) < Floyd = O(V³). The right choice depends on graph density.
❌ (c) They're different for different graph types.

GATE⭐⭐⭐ Hard

Q21

HeapSort is not stable. This matters when:

Never — stability is irrelevant
When sorting records by a secondary key while preserving primary key order
Only for linked lists
Only for strings

✅ (b)
📖 Stability preserves relative order of equal elements. Example: sorting students by name (primary) then by grade (secondary). An unstable sort may reorder students with the same grade, breaking the name ordering.
❌ (a) Stability matters in many practical scenarios.

Interview⭐⭐⭐ Hard

Q22 ⚠️ TRAP

A hash table with O(1) lookup is always faster than a BST with O(log n) lookup.

True — O(1) < O(log n) mathematically
False — hash tables don't support range queries, ordered iteration, or finding min/max efficiently
True — hash tables are universally superior
False — but only because hash tables use more memory

✅ (b)
📖 Trap! Hash tables win for exact-key lookup. But BSTs support: range queries ("all prices between ₹100-500"), ordered iteration (sorted output), and min/max in O(log n). Hash tables need O(n) for all of these. Choose based on your access pattern.
❌ (a)(c) Ignores BST advantages.

⚠️ TRAPInterview⭐⭐⭐ Hard

Q23

Time complexity of BFS and DFS on a graph represented as an adjacency list:

O(V)
O(E)
O(V + E)
O(V × E)

✅ (c) O(V + E)
📖 Each vertex is enqueued/pushed once: O(V). Each edge is examined once (twice for undirected): O(E). Total: O(V + E). With adjacency matrix: O(V²) because you check all V neighbors for each vertex.
❌ (a)(b) Missing one component. (d) Way too high.

GATE⭐⭐⭐ Hard

Q24

Building a heap from n elements is O(n), not O(n log n), because:

Most nodes are near the bottom and need very few swaps — the sum converges to O(n)
The heap is already sorted
Building only takes one pass
It's a mathematical coincidence

✅ (a)
📖 Bottom-up heapify: leaves (n/2 nodes) need 0 swaps. Level above (n/4 nodes) need ≤1 swap. Next (n/8) need ≤2. Sum: n/4·1 + n/8·2 + n/16·3 + ... = n·Σ(i/2^i) converges to O(n). This is Floyd's linear-time build-heap proof.
❌ (b)(c)(d) Wrong explanations.

GATE⭐⭐⭐ Hard

Q25

Which correctly summarizes the "choose your DSA" decision?

Always use hash tables — O(1) is unbeatable
Always use arrays — simplest is best
Match the data structure to your access pattern: arrays for index access, hash tables for key lookup, heaps for priority, trees for ordered data, graphs for relationships
It doesn't matter — modern hardware makes everything fast

✅ (c)
📖 This is the master lesson of the entire course. There is no universally "best" data structure. Each has trade-offs. The engineer's job is to match the structure to the problem's access pattern.
❌ "Always" answers are always wrong in engineering.

Interview⭐⭐⭐ Hard

Section 8

Chapter Summary — Course Finale

5 Things Every Engineer Must Remember

Heaps give O(1) access to the extreme element — need the minimum, maximum, nearest, cheapest, or most urgent? Use a heap. Ola, Uber, and every OS scheduler uses this.
Hash tables give O(1) key-value lookup — need "has this been seen before?" or "what's the value for this key?" Use hashing. Python's dict, Redis, and every database cache is a hash table.
Graphs model relationships — any problem with "connections" (roads, friendships, dependencies) is a graph problem. BFS for shortest unweighted path, DFS for exploration and cycle detection.
Dijkstra's is the foundation of navigation — every map app, network router, and logistics optimizer uses it. The min-heap optimization reduces O(V²) to O((V+E) log V).
No data structure is universally best — arrays for index access, linked lists for insert/delete, trees for ordered data, hash tables for key lookup, heaps for priority, graphs for relationships. The engineer's job is to match the structure to the problem.

Complete DSA Course — Master Table

Structure	Best For	Search	Insert	Delete	Used At
Array	Index access	O(n) / O(log n)*	O(n)	O(n)	Flipkart, every program
Linked List	Insert/Delete	O(n)	O(1)**	O(1)**	Spotify, Linux kernel
Stack	LIFO / Undo	—	O(1)	O(1)	Paytm, compilers
Queue	FIFO / Buffering	—	O(1)	O(1)	Swiggy, Amazon SQS
BST	Ordered dynamic data	O(log n)	O(log n)	O(log n)	MongoDB, databases
Heap	Priority / Extremes	O(n)	O(log n)	O(log n)	Ola, OS scheduler
Hash Table	Key-value lookup	O(1)	O(1)	O(1)	PhonePe, Redis, DNS
Graph	Relationships	O(V+E)	O(1)	O(V+E)	Google Maps, IRCTC

*Binary search on sorted array. **At known position.

🎓 Congratulations — Course Complete!

You've completed all 5 units of "Data Structures & Algorithms: Industry Edition." You now understand:

How Flipkart stores 150 million products (arrays)
How Spotify manages playlists (linked lists)
How Paytm rolls back failed payments (stacks)
How Swiggy processes 50K orders/hour (queues)
How Nykaa organizes 5 million products (trees)
How Ola finds nearest drivers (heaps)
How PhonePe prevents duplicate charges (hashing)
How Google Maps finds shortest routes (graphs)

Every data structure you've learned powers real systems used by hundreds of millions of people. You're not just learning theory — you're learning how India's and the world's tech infrastructure works.