Data Structures & Algorithms: Industry Edition

Unit 5: Heaps, Hashing & Graphs

HeapSort, Hash Tables, BFS, DFS, Dijkstra's & Floyd-Warshall โ€” with real examples from Ola, PhonePe, Google Maps & IRCTC.

๐Ÿข Real Projects  |  ๐Ÿ’ป 4 Lab Programs (Python + C)  |  ๐Ÿ“ 25 MCQs  |  ๐ŸŽฏ 3 Interview Questions

Section 1

Industry Hook โ€” The Real-World Problem First

๐Ÿ—บ๏ธ Three Problems That Power India's Digital Infrastructure

Problem 1 โ€” Ola (Heaps): When you book a ride, Ola must find the nearest driver from 500,000 active drivers โ€” in under 5ms. Scanning all 500K drivers linearly is too slow. A min-heap (priority queue) gives the nearest driver in O(1) and updates in O(log n). That's ~19 operations instead of 500,000.

Problem 2 โ€” PhonePe (Hashing): PhonePe processes 7 billion UPI transactions monthly. Every payment must check: "Has this transaction ID been seen before?" Searching 7 billion records linearly = impossible. A hash table answers in O(1) โ€” constant time regardless of table size.

Problem 3 โ€” Google Maps (Graphs): When you search "Delhi to Mumbai" on Google Maps, it explores millions of road segments to find the shortest route. Roads are edges, intersections are vertices โ€” this is a graph. Dijkstra's algorithm finds the shortest path in O((V+E) log V). IRCTC uses the same algorithm to find the best train route with connections.

This is exactly the problem heaps, hashing, and graphs solve. Let's master them all.

๐Ÿ‡ฎ๐Ÿ‡ณ Ola๐Ÿ‡ฎ๐Ÿ‡ณ PhonePeGoogle Maps๐Ÿ‡ฎ๐Ÿ‡ณ IRCTC
Section 2

Concept Explanation โ€” Theory, Earned

2.1 Heaps & HeapSort

Layer 1 โ€” Intuition

A heap is like a VIP hospital queue. The patient with the highest severity (priority) is always treated first. You can add new patients and the most critical one automatically "bubbles up" to the front. A max-heap keeps the maximum at the root; a min-heap keeps the minimum.

Layer 2 โ€” Visual

Max-Heap (parent โ‰ฅ children)
           90
         /    \
       80      70
      /  \    /
    50   60  30

Array: [90, 80, 70, 50, 60, 30]  (level-order)

For index i:
  Parent:      (i-1)/2
  Left child:  2i + 1
  Right child: 2i + 2

Heap Property: arr[parent] โ‰ฅ arr[child] (max-heap)
               arr[parent] โ‰ค arr[child] (min-heap)

Heap Operations

OperationTimeHow
Get Max/MinO(1)Root is always max/min
InsertO(log n)Add at end, bubble UP (sift up)
Extract Max/MinO(log n)Remove root, replace with last, bubble DOWN (heapify)
Build HeapO(n)Bottom-up heapify (not O(n log n)!)
HeapSortO(n log n)Build heap + n extractions

Building a heap from an array is O(n), not O(n log n). This is counter-intuitive โ€” you'd expect n insertions ร— O(log n) each = O(n log n). But bottom-up heapify is smarter: most nodes are near the bottom and need very few swaps. The math proves it converges to O(n). This was proven by Floyd in 1964.

2.2 Hashing & Hash Tables

Layer 1 โ€” Intuition

A hash table is like a library with numbered shelves. Instead of searching every shelf, you compute a shelf number from the book title: "Harry Potter" โ†’ shelf 7. Go directly to shelf 7. Done in O(1).

Layer 2 โ€” Hash Function & Collisions

Hashing Concept
key โ†’ hash_function(key) โ†’ index โ†’ store in array[index]

Example: hash(key) = key % table_size

Insert keys: 25, 37, 42, 55, 73  (table_size = 10)
  25 % 10 = 5  โ†’ slot 5
  37 % 10 = 7  โ†’ slot 7
  42 % 10 = 2  โ†’ slot 2
  55 % 10 = 5  โ†’ COLLISION! Slot 5 already has 25!
  73 % 10 = 3  โ†’ slot 3

Collision Resolution Techniques

TechniqueHow It WorksProsCons
Separate ChainingEach slot holds a linked listSimple, never "full"Extra pointer space, cache unfriendly
Linear ProbingTry next slot: (h+1), (h+2), ...Cache friendlyPrimary clustering
Quadratic ProbingTry (h+1ยฒ), (h+2ยฒ), (h+3ยฒ), ...Less clusteringSecondary clustering, may not find empty slot
Double Hashinghโ‚‚(key) as step: hโ‚ + iยทhโ‚‚Best distributionComplex, needs good hโ‚‚
Real consequence: PhonePe with 7B monthly transactions: hash table lookup = O(1) โ‰ˆ 50 nanoseconds. Linear search = O(n) โ‰ˆ 7 billion ร— 7ns = 49 seconds per lookup. At 2,700 transactions/second, the system would need 49 ร— 2,700 = 132,300 seconds of compute per second. Physically impossible without hashing.

2.3 Graphs โ€” Vertices, Edges, Connections

Layer 1 โ€” Intuition

A graph is a social network. Each person is a vertex, each friendship is an edge. Unlike trees, graphs can have cycles (Aโ†’Bโ†’Cโ†’A), disconnected components, and edges with weights (distance between cities).

Layer 2 โ€” Representations

Example Graph
     A โ€”โ€”(4)โ€”โ€” B
     |         |
    (2)       (3)
     |         |
     C โ€”โ€”(1)โ€”โ€” D

Adjacency Matrix
     A  B  C  D
A  [ 0  4  2  0 ]
B  [ 4  0  0  3 ]
C  [ 2  0  0  1 ]
D  [ 0  3  1  0 ]
Space: O(Vยฒ). Good for dense graphs.

Adjacency List
A โ†’ [(B,4), (C,2)]
B โ†’ [(A,4), (D,3)]
C โ†’ [(A,2), (D,1)]
D โ†’ [(B,3), (C,1)]
Space: O(V+E). Good for sparse graphs.

BFS vs DFS

BFS (Breadth-First)DFS (Depth-First)
Uses a QueueUses a Stack (or recursion)
Visits level by levelGoes deep before backtracking
Finds shortest path (unweighted)Finds all connected components
Google Maps: "How many stops?"Maze solving: "Is there a path?"
Space: O(V)Space: O(V)
Time: O(V+E)Time: O(V+E)

2.4 Shortest Path Algorithms

Dijkstra's Algorithm โ€” Single Source Shortest Path

Find shortest path from A to all vertices:

Graph: A-(4)-B, A-(2)-C, B-(3)-D, C-(1)-D

Step 0: dist = {A:0, B:โˆž, C:โˆž, D:โˆž}     visited = {}
Step 1: Visit A (dist=0)
        Update B: 0+4=4, C: 0+2=2
        dist = {A:0, B:4, C:2, D:โˆž}       visited = {A}
Step 2: Visit C (dist=2, smallest unvisited)
        Update D: 2+1=3
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C}
Step 3: Visit D (dist=3)
        Update B: 3+3=6 > 4, no update
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C,D}
Step 4: Visit B (dist=4)
        dist = {A:0, B:4, C:2, D:3}       visited = {A,C,D,B}

Shortest: Aโ†’B=4, Aโ†’C=2, Aโ†’D=3 (via C!)

Time: O((V+E) log V) with min-heap. Without heap: O(Vยฒ).

Floyd-Warshall โ€” All-Pairs Shortest Path

Finds shortest path between every pair of vertices. Uses dynamic programming: "Can going through vertex k improve the path from i to j?"

Core Idea
for k in all vertices:
    for i in all vertices:
        for j in all vertices:
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

Time: O(Vยณ)   Space: O(Vยฒ)
Used when: You need ALL shortest paths (e.g., IRCTC route planning between ALL station pairs).

Dijkstra's finds shortest paths from ONE source to ALL destinations โ€” O((V+E) log V). Floyd-Warshall finds shortest paths between ALL pairs โ€” O(Vยณ). Google Maps uses Dijkstra (one source: your location). IRCTC precomputes all station-pair distances using Floyd-Warshall. When should you use which?

Section 3

Lab Program Implementations

LAB PROGRAM 1: Heap Operations & HeapSort

๐Ÿ“˜ Syllabus: Unit 5 โ€” Heaps๐Ÿข Industry: Ola driver selection, OS task scheduler, Top-K queries
Python
class MaxHeap:
    """Max-Heap using array. Parent โ‰ฅ both children.
    Industry: Ola's driver pool โ€” extract nearest (max priority) in O(1)."""
    def __init__(self):
        self.heap = []

    def _parent(self, i): return (i - 1) // 2
    def _left(self, i):   return 2 * i + 1
    def _right(self, i):  return 2 * i + 2

    def insert(self, val):
        """Add value, then bubble UP. O(log n)."""
        self.heap.append(val)
        i = len(self.heap) - 1
        while i > 0 and self.heap[self._parent(i)] < self.heap[i]:
            p = self._parent(i)
            self.heap[i], self.heap[p] = self.heap[p], self.heap[i]
            i = p

    def extract_max(self):
        """Remove and return max (root). O(log n)."""
        if not self.heap: return None
        max_val = self.heap[0]
        self.heap[0] = self.heap[-1]    # Move last to root
        self.heap.pop()
        self._heapify_down(0)           # Bubble down
        return max_val

    def _heapify_down(self, i):
        """Restore heap property by bubbling down."""
        n = len(self.heap)
        largest = i
        l, r = self._left(i), self._right(i)
        if l < n and self.heap[l] > self.heap[largest]: largest = l
        if r < n and self.heap[r] > self.heap[largest]: largest = r
        if largest != i:
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            self._heapify_down(largest)


def heap_sort(arr):
    """HeapSort: Build max-heap, repeatedly extract max. O(n log n)."""
    n = len(arr)
    # Build max-heap (bottom-up) โ€” O(n)
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Extract elements one by one
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]   # Move max to end
        heapify(arr, i, 0)                # Heapify reduced array

def heapify(arr, n, i):
    largest = i
    l, r = 2*i+1, 2*i+2
    if l < n and arr[l] > arr[largest]: largest = l
    if r < n and arr[r] > arr[largest]: largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

data = [12, 11, 13, 5, 6, 7]
heap_sort(data)
print("HeapSort:", data)
HeapSort: [5, 6, 7, 11, 12, 13]

C โ€” HeapSort

C
#include <stdio.h>
void swap(int* a, int* b) { int t=*a; *a=*b; *b=t; }

void heapify(int a[], int n, int i) {
    int big=i, l=2*i+1, r=2*i+2;
    if(l<n && a[l]>a[big]) big=l;
    if(r<n && a[r]>a[big]) big=r;
    if(big!=i){ swap(&a[i],&a[big]); heapify(a,n,big); }
}
void heapSort(int a[], int n) {
    for(int i=n/2-1;i>=0;i--) heapify(a,n,i);
    for(int i=n-1;i>0;i--){ swap(&a[0],&a[i]); heapify(a,i,0); }
}
int main() {
    int a[]={12,11,13,5,6,7};
    heapSort(a,6);
    for(int i=0;i<6;i++) printf("%d ",a[i]);
    return 0;
}

LAB PROGRAM 2: Hash Table with Collision Resolution

๐Ÿ“˜ Syllabus: Unit 5 โ€” Hashing๐Ÿข Industry: PhonePe transaction dedup, DNS lookup, database indexing
Python
class HashTable:
    """Hash table with separate chaining (linked list per bucket).
    Industry: PhonePe checks transaction uniqueness in O(1)."""
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # Array of lists
        self.count = 0

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        """Insert or update key-value pair. O(1) average."""
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)  # Update
                return
        self.table[idx].append((key, value))       # New entry
        self.count += 1

    def get(self, key):
        """Retrieve value by key. O(1) average."""
        idx = self._hash(key)
        for k, v in self.table[idx]:
            if k == key: return v
        return None

    def delete(self, key):
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx].pop(i)
                self.count -= 1; return True
        return False

    def display(self):
        for i, bucket in enumerate(self.table):
            if bucket:
                print(f"  [{i}]: {' โ†’ '.join(f'{k}:{v}' for k,v in bucket)}")


ht = HashTable(7)
ht.put("TXN001", 500)
ht.put("TXN002", 1200)
ht.put("TXN003", 300)
ht.put("TXN004", 750)
print("Hash Table:"); ht.display()
print(f"Get TXN002: โ‚น{ht.get('TXN002')}")
ht.delete("TXN003")
print(f"After deleting TXN003: {ht.get('TXN003')}")
Hash Table: [0]: TXN002:1200 [2]: TXN001:500 [4]: TXN004:750 [6]: TXN003:300 Get TXN002: โ‚น1200 After deleting TXN003: None

LAB PROGRAM 3: BFS and DFS on a Graph

๐Ÿ“˜ Syllabus: Unit 5 โ€” Graph Traversals๐Ÿข Industry: Google Maps routing, LinkedIn connections, web crawlers
Python
from collections import deque

class Graph:
    """Graph using adjacency list representation."""
    def __init__(self):
        self.adj = {}   # {vertex: [neighbors]}

    def add_edge(self, u, v):
        self.adj.setdefault(u, []).append(v)
        self.adj.setdefault(v, []).append(u)  # Undirected

    def bfs(self, start):
        """Breadth-First Search using queue. O(V+E).
        Visits vertices level by level โ€” finds shortest path (unweighted).
        Industry: Google Maps 'how many stops to destination?'"""
        visited = set([start])
        queue = deque([start])
        order = []

        while queue:
            vertex = queue.popleft()       # FIFO โ€” visit oldest first
            order.append(vertex)
            for neighbor in self.adj.get(vertex, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

        return order

    def dfs(self, start, visited=None):
        """Depth-First Search using recursion. O(V+E).
        Goes deep before backtracking โ€” detects cycles, topological sort.
        Industry: Web crawler follows links as deep as possible."""
        if visited is None: visited = set()
        visited.add(start)
        result = [start]
        for neighbor in self.adj.get(start, []):
            if neighbor not in visited:
                result.extend(self.dfs(neighbor, visited))
        return result


g = Graph()
for u, v in [("Delhi","Jaipur"), ("Delhi","Agra"), ("Jaipur","Udaipur"),
             ("Agra","Lucknow"), ("Lucknow","Varanasi")]:
    g.add_edge(u, v)

print("BFS from Delhi:", g.bfs("Delhi"))
print("DFS from Delhi:", g.dfs("Delhi"))
BFS from Delhi: ['Delhi', 'Jaipur', 'Agra', 'Udaipur', 'Lucknow', 'Varanasi'] DFS from Delhi: ['Delhi', 'Jaipur', 'Udaipur', 'Agra', 'Lucknow', 'Varanasi']

C โ€” BFS

C
#include <stdio.h>
#define V 6

int adj[V][V], visited[V], queue[V], front=0, rear=-1;

void bfs(int start) {
    visited[start] = 1; queue[++rear] = start;
    while(front <= rear) {
        int v = queue[front++];
        printf("%d ", v);
        for(int i=0;i<V;i++)
            if(adj[v][i] && !visited[i]) { visited[i]=1; queue[++rear]=i; }
    }
}

int main() {
    adj[0][1]=adj[1][0]=1; adj[0][2]=adj[2][0]=1;
    adj[1][3]=adj[3][1]=1; adj[2][4]=adj[4][2]=1; adj[4][5]=adj[5][4]=1;
    printf("BFS: "); bfs(0);
    return 0;
}

LAB PROGRAM 4: Dijkstra's Shortest Path Algorithm

๐Ÿ“˜ Syllabus: Unit 5 โ€” Shortest Path๐Ÿข Industry: Google Maps, IRCTC train routes, network packet routing
Python
import heapq

def dijkstra(graph, start):
    """Dijkstra's algorithm using min-heap (priority queue).
    Finds shortest distance from start to ALL other vertices.
    Time: O((V+E) log V) with heap. Without: O(Vยฒ).
    
    Industry: Google Maps finds shortest route from your location
    to destination. IRCTC finds optimal train routes."""
    
    dist = {v: float('inf') for v in graph}
    dist[start] = 0
    prev = {v: None for v in graph}    # To reconstruct path
    pq = [(0, start)]                    # (distance, vertex)
    visited = set()
    
    while pq:
        d, u = heapq.heappop(pq)          # Get nearest unvisited
        if u in visited: continue
        visited.add(u)
        
        for v, weight in graph[u]:
            new_dist = d + weight
            if new_dist < dist[v]:
                dist[v] = new_dist        # Found shorter path!
                prev[v] = u
                heapq.heappush(pq, (new_dist, v))
    
    return dist, prev


def get_path(prev, target):
    path = []
    while target:
        path.append(target)
        target = prev[target]
    return path[::-1]


# Indian railway network (simplified)
railway = {
    "Delhi":     [("Jaipur", 281), ("Agra", 200), ("Lucknow", 500)],
    "Jaipur":    [("Delhi", 281), ("Ahmedabad", 625)],
    "Agra":      [("Delhi", 200), ("Lucknow", 335)],
    "Lucknow":   [("Delhi", 500), ("Agra", 335), ("Varanasi", 300)],
    "Ahmedabad": [("Jaipur", 625), ("Mumbai", 524)],
    "Varanasi":  [("Lucknow", 300)],
    "Mumbai":    [("Ahmedabad", 524)],
}

dist, prev = dijkstra(railway, "Delhi")
print("=== IRCTC Route Planner: Delhi to All Cities ===")
for city, d in sorted(dist.items(), key=lambda x: x[1]):
    path = get_path(prev, city)
    print(f"  {city:12s}: {d:4d} km  via {' โ†’ '.join(path)}")
=== IRCTC Route Planner: Delhi to All Cities === Delhi : 0 km via Delhi Agra : 200 km via Delhi โ†’ Agra Jaipur : 281 km via Delhi โ†’ Jaipur Lucknow : 500 km via Delhi โ†’ Lucknow Varanasi : 800 km via Delhi โ†’ Lucknow โ†’ Varanasi Ahmedabad : 906 km via Delhi โ†’ Jaipur โ†’ Ahmedabad Mumbai : 1430 km via Delhi โ†’ Jaipur โ†’ Ahmedabad โ†’ Mumbai

Notice Delhiโ†’Lucknow is 500 km directly, but Delhiโ†’Agraโ†’Lucknow is 200+335 = 535 km. Dijkstra correctly picks the direct route. But Delhiโ†’Mumbai goes through Jaipurโ†’Ahmedabad (1430 km) because that's shorter than Delhiโ†’Agraโ†’Lucknowโ†’ โˆž (no direct Lucknow-Mumbai link). This is exactly how Google Maps works โ€” it doesn't always pick the "obvious" route.

Section 4

Industry Case Studies

Case Study 1: Ola โ€” Driver Matching with Min-Heap

๐Ÿข CompanyOla (๐Ÿ‡ฎ๐Ÿ‡ณ 2M+ rides/day, 500K active drivers)
โš™๏ธ SystemReal-time Driver Matching Engine
๐Ÿ”ด ProblemWhen a rider requests a ride, find the nearest available driver from 500K active drivers in under 5ms. Drivers' locations update every 3 seconds.
๐ŸŸข DSA UsedGeospatial min-heap: drivers indexed by distance to rider. Extract-min gives nearest driver in O(1). When a driver moves, update their position in O(log n). Combined with geohashing to limit search to nearby grid cells.
๐Ÿ“Š Numbers500K drivers, 2M rides/day. Heap extract-min: O(1) = ~100ns. Linear scan: O(500K) = ~3.5ms. At 23 rides/second, heap saves 80ms/second of compute.
โœ… OutcomeDriver matched within 3 seconds of request. Rider sees "Driver assigned" almost instantly.
๐Ÿ’ก Student LessonHeaps are for "give me the best element NOW." Whenever you need the min, max, nearest, cheapest, or most urgent item, think heap.

Case Study 2: PhonePe โ€” Transaction Deduplication with Hash Tables

๐Ÿข CompanyPhonePe (๐Ÿ‡ฎ๐Ÿ‡ณ 7B+ monthly UPI transactions)
โš™๏ธ SystemTransaction Deduplication & Idempotency Layer
๐Ÿ”ด ProblemNetwork timeouts cause payment retries. If โ‚น500 is charged twice, the user loses money. Every transaction must be checked: "Has this exact transaction ID been processed before?"
๐ŸŸข DSA UsedDistributed hash table (Bloom filter + hash map). Transaction ID โ†’ hash โ†’ O(1) lookup. If found โ†’ reject duplicate. If not โ†’ process and store.
๐Ÿ“Š Numbers7B transactions/month = ~2,700/second. Each lookup: O(1) โ‰ˆ 50ns. Without hashing: O(n) scan of billions = impossible.
โœ… OutcomeZero duplicate charges. Every retry is safely rejected without double-processing.
๐Ÿ’ก Student LessonHash tables are for "have I seen this before?" โ€” deduplication, caching, counting, and lookup. Python's dict, Java's HashMap, and Redis are all hash tables.

Case Study 3: Google Maps โ€” Shortest Path with Dijkstra

๐Ÿข CompanyGoogle Maps (Global โ€” 1B+ monthly users)
โš™๏ธ SystemRoute Planning Engine
๐Ÿ”ด ProblemFind the shortest/fastest route between two points on a road network with billions of edges. Must handle real-time traffic, road closures, and turn restrictions.
๐ŸŸข DSA UsedA* search (Dijkstra + heuristic) on a contraction hierarchy. The road graph is preprocessed into a hierarchy: local roads โ†’ city roads โ†’ highways. Dijkstra runs on this compressed graph, reducing billions of edges to thousands.
๐Ÿ“Š Numbers1 billion+ route queries/day. Road network: ~1 billion intersections, ~2 billion road segments globally. Route computed in under 500ms.
โœ… OutcomeInstant route suggestions with real-time traffic. The same algorithm powers Uber, Ola, and every navigation app.
๐Ÿ’ก Student LessonDijkstra's is the foundation of every navigation system. Lab Program 4's IRCTC route planner is a simplified version of exactly what Google Maps does.
Section 5

Chapter Project โ€” "Build It Yourself"

๐Ÿ—บ๏ธ Campus Navigation System with Dijkstra

Pitch: Build a shortest-path navigation system for your college campus โ€” find the quickest route between any two buildings, with real walking distances.

Steps

  1. Step 1 โ€” Model campus as a graph: Buildings = vertices (Library, Canteen, Lab, Hostel, etc.). Paths = weighted edges (walking time in minutes).
  2. Step 2 โ€” Adjacency list: Use Lab Program 3's graph structure to store the campus map.
  3. Step 3 โ€” Dijkstra: Use Lab Program 4 to find the shortest path between any two buildings.
  4. Step 4 โ€” Path reconstruction: Print the complete walking route: "Library โ†’ Main Gate โ†’ Canteen (7 min)".
  5. Step 5 โ€” Interactive menu: User inputs source and destination, system outputs shortest route.

Extensions

  • โญ Easy: Add all-pairs shortest paths using Floyd-Warshall โ€” precompute a routing table
  • โญโญ Medium: Account for "blocked paths" (construction, restricted areas) by removing edges dynamically
  • โญโญโญ Hard: Visualize the campus graph and highlight the shortest path using matplotlib
Section 6

Interview Corner โ€” "Crack It at Google/Amazon"

Q1: Top K Frequent Elements โญโญ [Amazon + Google]

Python
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    """Find k most frequent elements. O(n log k).
    Uses hash map (count) + min-heap (top K).
    Industry: Twitter trending topics, YouTube trending videos."""
    count = Counter(nums)                     # O(n) โ€” hash map
    return heapq.nlargest(k, count.keys(), key=count.get)

print(top_k_frequent([1,1,1,2,2,3], 2))  # [1, 2]

Time: O(n log k). Hash map counts in O(n). Heap maintains top K in O(n log k). Much better than sorting O(n log n) for large n, small k.

Q2: Two Sum โญ [Most Asked Interview Question Worldwide]

Python
def two_sum(nums, target):
    """Find two indices whose values add to target. O(n).
    Hash map stores: value โ†’ index. For each num, check if
    (target - num) exists in the map."""
    seen = {}                              # value โ†’ index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]    # Found pair!
        seen[num] = i                      # Store for future lookup
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]

Time: O(n). One pass. Hash map lookup is O(1). Brute force is O(nยฒ) with nested loops. This is the #1 most-asked interview question (LeetCode #1).

Q3: Number of Islands (BFS/DFS on Grid) โญโญ [GATE + Interview]

Python
def num_islands(grid):
    """Count connected components of '1's in a 2D grid.
    Each '1' is land, '0' is water. Connected '1's form an island.
    Uses DFS to 'flood fill' each island."""
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    count = 0

    def dfs(r, c):
        if r<0 or r>=rows or c<0 or c>=cols or grid[r][c]!='1':
            return
        grid[r][c] = '0'               # Mark visited
        dfs(r+1,c); dfs(r-1,c); dfs(r,c+1); dfs(r,c-1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1              # New island found
                dfs(r, c)              # Sink the entire island
    return count

Time: O(Rร—C). Each cell visited once. This is the graph traversal applied to a 2D grid โ€” a matrix IS a graph where each cell connects to its 4 neighbors.

Section 7

MCQ Assessment Bank โ€” 25 Questions

Level 1 โ€” Recall โญ (5 Questions)

Q1

In a max-heap, the root contains:

  1. The minimum element
  2. The maximum element
  3. The median element
  4. A random element
โœ… (b) The maximum element
๐Ÿ“– Max-heap property: parent โ‰ฅ children. Root has no parent, so it's the maximum.
โŒ (a) That's a min-heap.
Universityโญ Easy
Q2

Average time complexity of hash table lookup:

  1. O(n)
  2. O(log n)
  3. O(1)
  4. O(nยฒ)
โœ… (c) O(1)
๐Ÿ“– Hash function computes the index directly. No searching needed (on average, with good hash function and load factor).
โŒ (a) Worst case with all collisions. (b) That's BST.
GATEโญ Easy
Q3

BFS uses which data structure?

  1. Stack
  2. Queue
  3. Heap
  4. Array
โœ… (b) Queue
๐Ÿ“– BFS processes vertices in FIFO order โ€” the order they were discovered. This ensures level-by-level traversal.
โŒ (a) Stack is used by DFS.
Universityโญ Easy
Q4

Dijkstra's algorithm finds:

  1. Minimum spanning tree
  2. Shortest path from one source to all vertices
  3. All-pairs shortest paths
  4. Longest path
โœ… (b)
๐Ÿ“– Single-source shortest path. For all-pairs, use Floyd-Warshall. For MST, use Kruskal/Prim.
โŒ (c) That's Floyd-Warshall. (a) That's Prim's/Kruskal's.
GATEโญ Easy
Q5

HeapSort's time complexity is:

  1. O(n) best, O(nยฒ) worst
  2. O(n log n) in all cases
  3. O(nยฒ) in all cases
  4. O(n log n) avg, O(nยฒ) worst
โœ… (b) O(n log n) in all cases
๐Ÿ“– Build heap: O(n). Extract n elements: n ร— O(log n). Total: O(n log n) regardless of input.
โŒ (d) That's Quick Sort, not HeapSort.
GATEโญ Easy

Level 2 โ€” Conceptual โญโญ (6 Questions)

Q6

In separate chaining, a hash table with 10 slots and 30 elements has an average chain length of:

  1. 3
  2. 10
  3. 30
  4. 0.33
โœ… (a) 3
๐Ÿ“– Load factor ฮฑ = n/m = 30/10 = 3. Average chain length = ฮฑ = 3. Expected search time: O(1 + ฮฑ) = O(4).
โŒ (b)(c) Not ratios. (d) Inverted.
GATEโญโญ Medium
Q7

Linear probing suffers from "primary clustering." This means:

  1. The hash function is bad
  2. Consecutive occupied slots form long chains, degrading search to O(n)
  3. The table is too small
  4. Keys are sorted
โœ… (b)
๐Ÿ“– When slots near a collision get filled, they create a "cluster." Any new key hashing into this cluster must probe through the entire cluster โ€” the bigger it gets, the slower it gets.
โŒ (a) Can happen even with good hash functions. (c) Related but not the definition.
Interviewโญโญ Medium
Q8

DFS on a graph uses which data structure internally (when using recursion)?

  1. Queue
  2. Call stack (implicit stack)
  3. Heap
  4. Hash table
โœ… (b) Call stack
๐Ÿ“– Each recursive DFS call pushes a frame onto the call stack. When it returns (backtracks), the frame is popped. DFS can also be implemented with an explicit stack.
โŒ (a) Queue is BFS.
GATEโญโญ Medium
Q9 โš ๏ธ TRAP

Dijkstra's algorithm works correctly with negative edge weights.

  1. True โ€” it finds shortest paths regardless
  2. False โ€” negative weights can cause Dijkstra to miss shorter paths; use Bellman-Ford instead
  3. True โ€” just use absolute values
  4. False โ€” graphs can't have negative weights
โœ… (b)
๐Ÿ“– Trap! Dijkstra's greedy approach assumes: once a vertex is "finalized," no shorter path exists. Negative weights violate this โ€” a later path through a negative edge could be shorter. Bellman-Ford handles negative weights in O(VE).
โŒ (a)(c) Dangerous misconceptions. (d) Graphs CAN have negative weights.
โš ๏ธ TRAPGATEโญโญ Medium
Q10

Floyd-Warshall's time complexity is:

  1. O(V + E)
  2. O(Vยฒ log V)
  3. O(Vยณ)
  4. O(E log V)
โœ… (c) O(Vยณ)
๐Ÿ“– Three nested loops over V vertices: for k, for i, for j. Simple but cubic. For V=1000: 1 billion operations. For V=10000: 1 trillion โ€” impractical.
โŒ (a) BFS/DFS. (d) Dijkstra with heap.
GATEโญโญ Medium
Q11

A heap is best represented as an array because:

  1. Arrays are always faster
  2. A complete binary tree maps perfectly to array indices with no wasted space
  3. Heaps can't use pointers
  4. Arrays support O(1) insert
โœ… (b)
๐Ÿ“– Heaps are complete binary trees โ€” all levels filled left-to-right. Array indices 2i+1 (left) and 2i+2 (right) perfectly represent parent-child relationships without pointer overhead.
โŒ (a) Not universally true. (c) Could use pointers but wasteful. (d) Heap insert is O(log n).
Universityโญโญ Medium

Level 3 โ€” Application โญโญ (8 Questions)

Q12

Ola needs the 5 nearest drivers from 500K. Best approach?

  1. Sort all 500K by distance โ€” O(n log n)
  2. Min-heap of all drivers, extract 5 times โ€” O(n + 5 log n)
  3. Max-heap of size 5, scan all drivers โ€” O(n log 5)
  4. Both (b) and (c) work, but (c) is more space-efficient
โœ… (d)
๐Ÿ“– (b) builds a full heap: O(n) build + 5 extractions. (c) maintains a tiny heap of 5 elements, comparing each driver: O(n log 5) โ‰ˆ O(n). Both are O(n), but (c) uses O(5) space vs O(n).
โŒ (a) O(n log n) is unnecessary โ€” we only need top 5, not full sort.
Interviewโญโญ Medium
Q13

After inserting keys 10, 20, 30 into a hash table of size 7 with h(k)=k%7, the slots occupied are:

  1. 3, 6, 2
  2. 10, 20, 30
  3. 0, 1, 2
  4. 3, 6, 2
โœ… (a) and (d) โ€” 3, 6, 2
๐Ÿ“– 10%7=3, 20%7=6, 30%7=2. No collisions โ€” all different slots.
โŒ (b) Slots are indices, not values. (c) Wrong computation.
Universityโญโญ Medium
Q14

BFS from vertex A in an unweighted graph finds:

  1. The longest path to each vertex
  2. The shortest path (minimum edges) to each vertex
  3. All cycles in the graph
  4. The minimum spanning tree
โœ… (b) Shortest path (unweighted)
๐Ÿ“– BFS visits vertices level-by-level. The first time it reaches a vertex is via the shortest path (fewest edges). For weighted graphs, you need Dijkstra.
โŒ Others are different problems.
GATEโญโญ Medium
Q15 โš ๏ธ TRAP

A hash table with load factor > 1 using separate chaining will fail (overflow).

  1. True โ€” can't store more elements than slots
  2. False โ€” separate chaining handles any load factor via linked lists
  3. True โ€” hash functions can't handle load factor > 1
  4. False โ€” but performance degrades linearly
โœ… (b) with (d) also partially correct
๐Ÿ“– Trap! Open addressing (linear probing) DOES fail at load factor > 1. But separate chaining uses linked lists โ€” each slot can hold unlimited elements. Load factor > 1 means average chain length > 1, so lookups slow down but DON'T fail.
โŒ (a)(c) Only true for open addressing.
โš ๏ธ TRAPGATEโญโญ Medium
Q16

Adjacency matrix vs adjacency list: for a sparse graph (V=10000, E=20000), which uses less memory?

  1. Matrix: O(Vยฒ) = 100M entries
  2. List: O(V+E) = 30,000 entries
  3. Same memory for both
  4. Matrix is always better
โœ… (b) Adjacency list
๐Ÿ“– Matrix: 10000ยฒ = 100,000,000 entries (400 MB for int). List: 10000 + 20000 = 30,000 entries (~240 KB). For sparse graphs, list uses 1000x less memory.
โŒ (a) States matrix size but doesn't call it better. (d) Matrix is better for dense graphs only.
GATEโญโญ Medium
Q17

DFS can be used to detect cycles in a graph. BFS cannot.

  1. True โ€” only DFS detects cycles
  2. False โ€” both can detect cycles
  3. True โ€” BFS is only for shortest paths
  4. False โ€” neither can detect cycles
โœ… (b) False โ€” both can detect cycles
๐Ÿ“– DFS detects back edges (revisiting an ancestor). BFS detects cycles when it encounters an already-visited non-parent vertex. Both work for undirected graphs. For directed graphs, DFS with coloring (white/gray/black) is standard.
โŒ (a)(c) BFS can also detect cycles.
Interviewโญโญ Medium
Q18

Python's dict is implemented as:

  1. A balanced BST
  2. A hash table with open addressing
  3. A sorted array
  4. A linked list
โœ… (b) Hash table with open addressing
๐Ÿ“– Python's dict uses hash table with open addressing (specifically, a variant of linear probing with perturbation). Java's HashMap uses separate chaining. Both achieve O(1) average lookup.
โŒ (a) C++ std::map uses Red-Black tree, but Python dict doesn't.
Interviewโญโญ Medium
Q19

Google Maps uses Dijkstra's algorithm directly on the full road graph.

  1. True โ€” Dijkstra is fast enough
  2. False โ€” it uses a preprocessed contraction hierarchy with A* search for speed
  3. True โ€” modern CPUs can handle it
  4. False โ€” Google uses BFS instead
โœ… (b)
๐Ÿ“– The full road graph has billions of nodes. Even Dijkstra with a heap would take seconds. Google preprocesses the graph into a hierarchy (contraction hierarchies) and uses A* (Dijkstra + heuristic) on this compressed graph. Result: millisecond routing.
โŒ (a)(c) Too slow for real-time. (d) BFS doesn't handle weights.
Interviewโญโญ Medium

Level 4 โ€” Analysis โญโญโญ (6 Questions)

Q20

For finding the shortest path between ALL pairs of 500 railway stations, Dijkstra vs Floyd-Warshall:

  1. Run Dijkstra 500 times: O(500 ร— (V+E) log V)
  2. Floyd-Warshall once: O(Vยณ) = O(500ยณ) = 125M operations
  3. Both are equivalent
  4. (b) is better for dense graphs, (a) for sparse graphs
โœ… (d)
๐Ÿ“– Dijkstra ร— V: O(V(V+E)log V). Floyd: O(Vยณ). For dense graphs (Eโ‰ˆVยฒ): Dijkstra = O(Vยณ log V) > Floyd = O(Vยณ). For sparse graphs (Eโ‰ˆV): Dijkstra = O(Vยฒ log V) < Floyd = O(Vยณ). The right choice depends on graph density.
โŒ (c) They're different for different graph types.
GATEโญโญโญ Hard
Q21

HeapSort is not stable. This matters when:

  1. Never โ€” stability is irrelevant
  2. When sorting records by a secondary key while preserving primary key order
  3. Only for linked lists
  4. Only for strings
โœ… (b)
๐Ÿ“– Stability preserves relative order of equal elements. Example: sorting students by name (primary) then by grade (secondary). An unstable sort may reorder students with the same grade, breaking the name ordering.
โŒ (a) Stability matters in many practical scenarios.
Interviewโญโญโญ Hard
Q22 โš ๏ธ TRAP

A hash table with O(1) lookup is always faster than a BST with O(log n) lookup.

  1. True โ€” O(1) < O(log n) mathematically
  2. False โ€” hash tables don't support range queries, ordered iteration, or finding min/max efficiently
  3. True โ€” hash tables are universally superior
  4. False โ€” but only because hash tables use more memory
โœ… (b)
๐Ÿ“– Trap! Hash tables win for exact-key lookup. But BSTs support: range queries ("all prices between โ‚น100-500"), ordered iteration (sorted output), and min/max in O(log n). Hash tables need O(n) for all of these. Choose based on your access pattern.
โŒ (a)(c) Ignores BST advantages.
โš ๏ธ TRAPInterviewโญโญโญ Hard
Q23

Time complexity of BFS and DFS on a graph represented as an adjacency list:

  1. O(V)
  2. O(E)
  3. O(V + E)
  4. O(V ร— E)
โœ… (c) O(V + E)
๐Ÿ“– Each vertex is enqueued/pushed once: O(V). Each edge is examined once (twice for undirected): O(E). Total: O(V + E). With adjacency matrix: O(Vยฒ) because you check all V neighbors for each vertex.
โŒ (a)(b) Missing one component. (d) Way too high.
GATEโญโญโญ Hard
Q24

Building a heap from n elements is O(n), not O(n log n), because:

  1. Most nodes are near the bottom and need very few swaps โ€” the sum converges to O(n)
  2. The heap is already sorted
  3. Building only takes one pass
  4. It's a mathematical coincidence
โœ… (a)
๐Ÿ“– Bottom-up heapify: leaves (n/2 nodes) need 0 swaps. Level above (n/4 nodes) need โ‰ค1 swap. Next (n/8) need โ‰ค2. Sum: n/4ยท1 + n/8ยท2 + n/16ยท3 + ... = nยทฮฃ(i/2^i) converges to O(n). This is Floyd's linear-time build-heap proof.
โŒ (b)(c)(d) Wrong explanations.
GATEโญโญโญ Hard
Q25

Which correctly summarizes the "choose your DSA" decision?

  1. Always use hash tables โ€” O(1) is unbeatable
  2. Always use arrays โ€” simplest is best
  3. Match the data structure to your access pattern: arrays for index access, hash tables for key lookup, heaps for priority, trees for ordered data, graphs for relationships
  4. It doesn't matter โ€” modern hardware makes everything fast
โœ… (c)
๐Ÿ“– This is the master lesson of the entire course. There is no universally "best" data structure. Each has trade-offs. The engineer's job is to match the structure to the problem's access pattern.
โŒ "Always" answers are always wrong in engineering.
Interviewโญโญโญ Hard
Section 8

Chapter Summary โ€” Course Finale

5 Things Every Engineer Must Remember

  • Heaps give O(1) access to the extreme element โ€” need the minimum, maximum, nearest, cheapest, or most urgent? Use a heap. Ola, Uber, and every OS scheduler uses this.
  • Hash tables give O(1) key-value lookup โ€” need "has this been seen before?" or "what's the value for this key?" Use hashing. Python's dict, Redis, and every database cache is a hash table.
  • Graphs model relationships โ€” any problem with "connections" (roads, friendships, dependencies) is a graph problem. BFS for shortest unweighted path, DFS for exploration and cycle detection.
  • Dijkstra's is the foundation of navigation โ€” every map app, network router, and logistics optimizer uses it. The min-heap optimization reduces O(Vยฒ) to O((V+E) log V).
  • No data structure is universally best โ€” arrays for index access, linked lists for insert/delete, trees for ordered data, hash tables for key lookup, heaps for priority, graphs for relationships. The engineer's job is to match the structure to the problem.

Complete DSA Course โ€” Master Table

StructureBest ForSearchInsertDeleteUsed At
ArrayIndex accessO(n) / O(log n)*O(n)O(n)Flipkart, every program
Linked ListInsert/DeleteO(n)O(1)**O(1)**Spotify, Linux kernel
StackLIFO / Undoโ€”O(1)O(1)Paytm, compilers
QueueFIFO / Bufferingโ€”O(1)O(1)Swiggy, Amazon SQS
BSTOrdered dynamic dataO(log n)O(log n)O(log n)MongoDB, databases
HeapPriority / ExtremesO(n)O(log n)O(log n)Ola, OS scheduler
Hash TableKey-value lookupO(1)O(1)O(1)PhonePe, Redis, DNS
GraphRelationshipsO(V+E)O(1)O(V+E)Google Maps, IRCTC

*Binary search on sorted array. **At known position.

๐ŸŽ“ Congratulations โ€” Course Complete!

You've completed all 5 units of "Data Structures & Algorithms: Industry Edition." You now understand:

  • How Flipkart stores 150 million products (arrays)
  • How Spotify manages playlists (linked lists)
  • How Paytm rolls back failed payments (stacks)
  • How Swiggy processes 50K orders/hour (queues)
  • How Nykaa organizes 5 million products (trees)
  • How Ola finds nearest drivers (heaps)
  • How PhonePe prevents duplicate charges (hashing)
  • How Google Maps finds shortest routes (graphs)

Every data structure you've learned powers real systems used by hundreds of millions of people. You're not just learning theory โ€” you're learning how India's and the world's tech infrastructure works.