Data Structures & Algorithms: Industry Edition
Unit 5: Heaps, Hashing & Graphs
HeapSort, Hash Tables, BFS, DFS, Dijkstra's & Floyd-Warshall โ with real examples from Ola, PhonePe, Google Maps & IRCTC.
๐ข Real Projects | ๐ป 4 Lab Programs (Python + C) | ๐ 25 MCQs | ๐ฏ 3 Interview Questions
Industry Hook โ The Real-World Problem First
๐บ๏ธ Three Problems That Power India's Digital Infrastructure
Problem 1 โ Ola (Heaps): When you book a ride, Ola must find the nearest driver from 500,000 active drivers โ in under 5ms. Scanning all 500K drivers linearly is too slow. A min-heap (priority queue) gives the nearest driver in O(1) and updates in O(log n). That's ~19 operations instead of 500,000.
Problem 2 โ PhonePe (Hashing): PhonePe processes 7 billion UPI transactions monthly. Every payment must check: "Has this transaction ID been seen before?" Searching 7 billion records linearly = impossible. A hash table answers in O(1) โ constant time regardless of table size.
Problem 3 โ Google Maps (Graphs): When you search "Delhi to Mumbai" on Google Maps, it explores millions of road segments to find the shortest route. Roads are edges, intersections are vertices โ this is a graph. Dijkstra's algorithm finds the shortest path in O((V+E) log V). IRCTC uses the same algorithm to find the best train route with connections.
This is exactly the problem heaps, hashing, and graphs solve. Let's master them all.
Concept Explanation โ Theory, Earned
2.1 Heaps & HeapSort
Layer 1 โ Intuition
A heap is like a VIP hospital queue. The patient with the highest severity (priority) is always treated first. You can add new patients and the most critical one automatically "bubbles up" to the front. A max-heap keeps the maximum at the root; a min-heap keeps the minimum.
Layer 2 โ Visual
Max-Heap (parent โฅ children)
90
/ \
80 70
/ \ /
50 60 30
Array: [90, 80, 70, 50, 60, 30] (level-order)
For index i:
Parent: (i-1)/2
Left child: 2i + 1
Right child: 2i + 2
Heap Property: arr[parent] โฅ arr[child] (max-heap)
arr[parent] โค arr[child] (min-heap)
Heap Operations
| Operation | Time | How |
|---|---|---|
| Get Max/Min | O(1) | Root is always max/min |
| Insert | O(log n) | Add at end, bubble UP (sift up) |
| Extract Max/Min | O(log n) | Remove root, replace with last, bubble DOWN (heapify) |
| Build Heap | O(n) | Bottom-up heapify (not O(n log n)!) |
| HeapSort | O(n log n) | Build heap + n extractions |
Building a heap from an array is O(n), not O(n log n). This is counter-intuitive โ you'd expect n insertions ร O(log n) each = O(n log n). But bottom-up heapify is smarter: most nodes are near the bottom and need very few swaps. The math proves it converges to O(n). This was proven by Floyd in 1964.
2.2 Hashing & Hash Tables
Layer 1 โ Intuition
A hash table is like a library with numbered shelves. Instead of searching every shelf, you compute a shelf number from the book title: "Harry Potter" โ shelf 7. Go directly to shelf 7. Done in O(1).
Layer 2 โ Hash Function & Collisions
Hashing Concept
key โ hash_function(key) โ index โ store in array[index]
Example: hash(key) = key % table_size
Insert keys: 25, 37, 42, 55, 73 (table_size = 10)
25 % 10 = 5 โ slot 5
37 % 10 = 7 โ slot 7
42 % 10 = 2 โ slot 2
55 % 10 = 5 โ COLLISION! Slot 5 already has 25!
73 % 10 = 3 โ slot 3
Collision Resolution Techniques
| Technique | How It Works | Pros | Cons |
|---|---|---|---|
| Separate Chaining | Each slot holds a linked list | Simple, never "full" | Extra pointer space, cache unfriendly |
| Linear Probing | Try next slot: (h+1), (h+2), ... | Cache friendly | Primary clustering |
| Quadratic Probing | Try (h+1ยฒ), (h+2ยฒ), (h+3ยฒ), ... | Less clustering | Secondary clustering, may not find empty slot |
| Double Hashing | hโ(key) as step: hโ + iยทhโ | Best distribution | Complex, needs good hโ |
2.3 Graphs โ Vertices, Edges, Connections
Layer 1 โ Intuition
A graph is a social network. Each person is a vertex, each friendship is an edge. Unlike trees, graphs can have cycles (AโBโCโA), disconnected components, and edges with weights (distance between cities).
Layer 2 โ Representations
Example Graph
A โโ(4)โโ B
| |
(2) (3)
| |
C โโ(1)โโ D
Adjacency Matrix
A B C D
A [ 0 4 2 0 ]
B [ 4 0 0 3 ]
C [ 2 0 0 1 ]
D [ 0 3 1 0 ]
Space: O(Vยฒ). Good for dense graphs.
Adjacency List
A โ [(B,4), (C,2)]
B โ [(A,4), (D,3)]
C โ [(A,2), (D,1)]
D โ [(B,3), (C,1)]
Space: O(V+E). Good for sparse graphs.
BFS vs DFS
| BFS (Breadth-First) | DFS (Depth-First) |
|---|---|
| Uses a Queue | Uses a Stack (or recursion) |
| Visits level by level | Goes deep before backtracking |
| Finds shortest path (unweighted) | Finds all connected components |
| Google Maps: "How many stops?" | Maze solving: "Is there a path?" |
| Space: O(V) | Space: O(V) |
| Time: O(V+E) | Time: O(V+E) |
2.4 Shortest Path Algorithms
Dijkstra's Algorithm โ Single Source Shortest Path
Find shortest path from A to all vertices:
Graph: A-(4)-B, A-(2)-C, B-(3)-D, C-(1)-D
Step 0: dist = {A:0, B:โ, C:โ, D:โ} visited = {}
Step 1: Visit A (dist=0)
Update B: 0+4=4, C: 0+2=2
dist = {A:0, B:4, C:2, D:โ} visited = {A}
Step 2: Visit C (dist=2, smallest unvisited)
Update D: 2+1=3
dist = {A:0, B:4, C:2, D:3} visited = {A,C}
Step 3: Visit D (dist=3)
Update B: 3+3=6 > 4, no update
dist = {A:0, B:4, C:2, D:3} visited = {A,C,D}
Step 4: Visit B (dist=4)
dist = {A:0, B:4, C:2, D:3} visited = {A,C,D,B}
Shortest: AโB=4, AโC=2, AโD=3 (via C!)
Time: O((V+E) log V) with min-heap. Without heap: O(Vยฒ).
Floyd-Warshall โ All-Pairs Shortest Path
Finds shortest path between every pair of vertices. Uses dynamic programming: "Can going through vertex k improve the path from i to j?"
Core Idea
for k in all vertices:
for i in all vertices:
for j in all vertices:
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
Time: O(Vยณ) Space: O(Vยฒ)
Used when: You need ALL shortest paths (e.g., IRCTC route planning between ALL station pairs).
Dijkstra's finds shortest paths from ONE source to ALL destinations โ O((V+E) log V). Floyd-Warshall finds shortest paths between ALL pairs โ O(Vยณ). Google Maps uses Dijkstra (one source: your location). IRCTC precomputes all station-pair distances using Floyd-Warshall. When should you use which?
Lab Program Implementations
LAB PROGRAM 1: Heap Operations & HeapSort
Python
class MaxHeap:
"""Max-Heap using array. Parent โฅ both children.
Industry: Ola's driver pool โ extract nearest (max priority) in O(1)."""
def __init__(self):
self.heap = []
def _parent(self, i): return (i - 1) // 2
def _left(self, i): return 2 * i + 1
def _right(self, i): return 2 * i + 2
def insert(self, val):
"""Add value, then bubble UP. O(log n)."""
self.heap.append(val)
i = len(self.heap) - 1
while i > 0 and self.heap[self._parent(i)] < self.heap[i]:
p = self._parent(i)
self.heap[i], self.heap[p] = self.heap[p], self.heap[i]
i = p
def extract_max(self):
"""Remove and return max (root). O(log n)."""
if not self.heap: return None
max_val = self.heap[0]
self.heap[0] = self.heap[-1] # Move last to root
self.heap.pop()
self._heapify_down(0) # Bubble down
return max_val
def _heapify_down(self, i):
"""Restore heap property by bubbling down."""
n = len(self.heap)
largest = i
l, r = self._left(i), self._right(i)
if l < n and self.heap[l] > self.heap[largest]: largest = l
if r < n and self.heap[r] > self.heap[largest]: largest = r
if largest != i:
self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
self._heapify_down(largest)
def heap_sort(arr):
"""HeapSort: Build max-heap, repeatedly extract max. O(n log n)."""
n = len(arr)
# Build max-heap (bottom-up) โ O(n)
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# Extract elements one by one
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0] # Move max to end
heapify(arr, i, 0) # Heapify reduced array
def heapify(arr, n, i):
largest = i
l, r = 2*i+1, 2*i+2
if l < n and arr[l] > arr[largest]: largest = l
if r < n and arr[r] > arr[largest]: largest = r
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
heapify(arr, n, largest)
data = [12, 11, 13, 5, 6, 7]
heap_sort(data)
print("HeapSort:", data)
C โ HeapSort
C
#include <stdio.h>
void swap(int* a, int* b) { int t=*a; *a=*b; *b=t; }
void heapify(int a[], int n, int i) {
int big=i, l=2*i+1, r=2*i+2;
if(l<n && a[l]>a[big]) big=l;
if(r<n && a[r]>a[big]) big=r;
if(big!=i){ swap(&a[i],&a[big]); heapify(a,n,big); }
}
void heapSort(int a[], int n) {
for(int i=n/2-1;i>=0;i--) heapify(a,n,i);
for(int i=n-1;i>0;i--){ swap(&a[0],&a[i]); heapify(a,i,0); }
}
int main() {
int a[]={12,11,13,5,6,7};
heapSort(a,6);
for(int i=0;i<6;i++) printf("%d ",a[i]);
return 0;
}
LAB PROGRAM 2: Hash Table with Collision Resolution
Python
class HashTable:
"""Hash table with separate chaining (linked list per bucket).
Industry: PhonePe checks transaction uniqueness in O(1)."""
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)] # Array of lists
self.count = 0
def _hash(self, key):
return hash(key) % self.size
def put(self, key, value):
"""Insert or update key-value pair. O(1) average."""
idx = self._hash(key)
for i, (k, v) in enumerate(self.table[idx]):
if k == key:
self.table[idx][i] = (key, value) # Update
return
self.table[idx].append((key, value)) # New entry
self.count += 1
def get(self, key):
"""Retrieve value by key. O(1) average."""
idx = self._hash(key)
for k, v in self.table[idx]:
if k == key: return v
return None
def delete(self, key):
idx = self._hash(key)
for i, (k, v) in enumerate(self.table[idx]):
if k == key:
self.table[idx].pop(i)
self.count -= 1; return True
return False
def display(self):
for i, bucket in enumerate(self.table):
if bucket:
print(f" [{i}]: {' โ '.join(f'{k}:{v}' for k,v in bucket)}")
ht = HashTable(7)
ht.put("TXN001", 500)
ht.put("TXN002", 1200)
ht.put("TXN003", 300)
ht.put("TXN004", 750)
print("Hash Table:"); ht.display()
print(f"Get TXN002: โน{ht.get('TXN002')}")
ht.delete("TXN003")
print(f"After deleting TXN003: {ht.get('TXN003')}")
LAB PROGRAM 3: BFS and DFS on a Graph
Python
from collections import deque
class Graph:
"""Graph using adjacency list representation."""
def __init__(self):
self.adj = {} # {vertex: [neighbors]}
def add_edge(self, u, v):
self.adj.setdefault(u, []).append(v)
self.adj.setdefault(v, []).append(u) # Undirected
def bfs(self, start):
"""Breadth-First Search using queue. O(V+E).
Visits vertices level by level โ finds shortest path (unweighted).
Industry: Google Maps 'how many stops to destination?'"""
visited = set([start])
queue = deque([start])
order = []
while queue:
vertex = queue.popleft() # FIFO โ visit oldest first
order.append(vertex)
for neighbor in self.adj.get(vertex, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return order
def dfs(self, start, visited=None):
"""Depth-First Search using recursion. O(V+E).
Goes deep before backtracking โ detects cycles, topological sort.
Industry: Web crawler follows links as deep as possible."""
if visited is None: visited = set()
visited.add(start)
result = [start]
for neighbor in self.adj.get(start, []):
if neighbor not in visited:
result.extend(self.dfs(neighbor, visited))
return result
g = Graph()
for u, v in [("Delhi","Jaipur"), ("Delhi","Agra"), ("Jaipur","Udaipur"),
("Agra","Lucknow"), ("Lucknow","Varanasi")]:
g.add_edge(u, v)
print("BFS from Delhi:", g.bfs("Delhi"))
print("DFS from Delhi:", g.dfs("Delhi"))
C โ BFS
C
#include <stdio.h>
#define V 6
int adj[V][V], visited[V], queue[V], front=0, rear=-1;
void bfs(int start) {
visited[start] = 1; queue[++rear] = start;
while(front <= rear) {
int v = queue[front++];
printf("%d ", v);
for(int i=0;i<V;i++)
if(adj[v][i] && !visited[i]) { visited[i]=1; queue[++rear]=i; }
}
}
int main() {
adj[0][1]=adj[1][0]=1; adj[0][2]=adj[2][0]=1;
adj[1][3]=adj[3][1]=1; adj[2][4]=adj[4][2]=1; adj[4][5]=adj[5][4]=1;
printf("BFS: "); bfs(0);
return 0;
}
LAB PROGRAM 4: Dijkstra's Shortest Path Algorithm
Python
import heapq
def dijkstra(graph, start):
"""Dijkstra's algorithm using min-heap (priority queue).
Finds shortest distance from start to ALL other vertices.
Time: O((V+E) log V) with heap. Without: O(Vยฒ).
Industry: Google Maps finds shortest route from your location
to destination. IRCTC finds optimal train routes."""
dist = {v: float('inf') for v in graph}
dist[start] = 0
prev = {v: None for v in graph} # To reconstruct path
pq = [(0, start)] # (distance, vertex)
visited = set()
while pq:
d, u = heapq.heappop(pq) # Get nearest unvisited
if u in visited: continue
visited.add(u)
for v, weight in graph[u]:
new_dist = d + weight
if new_dist < dist[v]:
dist[v] = new_dist # Found shorter path!
prev[v] = u
heapq.heappush(pq, (new_dist, v))
return dist, prev
def get_path(prev, target):
path = []
while target:
path.append(target)
target = prev[target]
return path[::-1]
# Indian railway network (simplified)
railway = {
"Delhi": [("Jaipur", 281), ("Agra", 200), ("Lucknow", 500)],
"Jaipur": [("Delhi", 281), ("Ahmedabad", 625)],
"Agra": [("Delhi", 200), ("Lucknow", 335)],
"Lucknow": [("Delhi", 500), ("Agra", 335), ("Varanasi", 300)],
"Ahmedabad": [("Jaipur", 625), ("Mumbai", 524)],
"Varanasi": [("Lucknow", 300)],
"Mumbai": [("Ahmedabad", 524)],
}
dist, prev = dijkstra(railway, "Delhi")
print("=== IRCTC Route Planner: Delhi to All Cities ===")
for city, d in sorted(dist.items(), key=lambda x: x[1]):
path = get_path(prev, city)
print(f" {city:12s}: {d:4d} km via {' โ '.join(path)}")
Notice DelhiโLucknow is 500 km directly, but DelhiโAgraโLucknow is 200+335 = 535 km. Dijkstra correctly picks the direct route. But DelhiโMumbai goes through JaipurโAhmedabad (1430 km) because that's shorter than DelhiโAgraโLucknowโ โ (no direct Lucknow-Mumbai link). This is exactly how Google Maps works โ it doesn't always pick the "obvious" route.
Industry Case Studies
Case Study 1: Ola โ Driver Matching with Min-Heap
Case Study 2: PhonePe โ Transaction Deduplication with Hash Tables
Case Study 3: Google Maps โ Shortest Path with Dijkstra
Chapter Project โ "Build It Yourself"
๐บ๏ธ Campus Navigation System with Dijkstra
Pitch: Build a shortest-path navigation system for your college campus โ find the quickest route between any two buildings, with real walking distances.
Steps
- Step 1 โ Model campus as a graph: Buildings = vertices (Library, Canteen, Lab, Hostel, etc.). Paths = weighted edges (walking time in minutes).
- Step 2 โ Adjacency list: Use Lab Program 3's graph structure to store the campus map.
- Step 3 โ Dijkstra: Use Lab Program 4 to find the shortest path between any two buildings.
- Step 4 โ Path reconstruction: Print the complete walking route: "Library โ Main Gate โ Canteen (7 min)".
- Step 5 โ Interactive menu: User inputs source and destination, system outputs shortest route.
Extensions
- โญ Easy: Add all-pairs shortest paths using Floyd-Warshall โ precompute a routing table
- โญโญ Medium: Account for "blocked paths" (construction, restricted areas) by removing edges dynamically
- โญโญโญ Hard: Visualize the campus graph and highlight the shortest path using matplotlib
Interview Corner โ "Crack It at Google/Amazon"
Q1: Top K Frequent Elements โญโญ [Amazon + Google]
Python
import heapq
from collections import Counter
def top_k_frequent(nums, k):
"""Find k most frequent elements. O(n log k).
Uses hash map (count) + min-heap (top K).
Industry: Twitter trending topics, YouTube trending videos."""
count = Counter(nums) # O(n) โ hash map
return heapq.nlargest(k, count.keys(), key=count.get)
print(top_k_frequent([1,1,1,2,2,3], 2)) # [1, 2]
Time: O(n log k). Hash map counts in O(n). Heap maintains top K in O(n log k). Much better than sorting O(n log n) for large n, small k.
Q2: Two Sum โญ [Most Asked Interview Question Worldwide]
Python
def two_sum(nums, target):
"""Find two indices whose values add to target. O(n).
Hash map stores: value โ index. For each num, check if
(target - num) exists in the map."""
seen = {} # value โ index
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i] # Found pair!
seen[num] = i # Store for future lookup
return []
print(two_sum([2, 7, 11, 15], 9)) # [0, 1]
Time: O(n). One pass. Hash map lookup is O(1). Brute force is O(nยฒ) with nested loops. This is the #1 most-asked interview question (LeetCode #1).
Q3: Number of Islands (BFS/DFS on Grid) โญโญ [GATE + Interview]
Python
def num_islands(grid):
"""Count connected components of '1's in a 2D grid.
Each '1' is land, '0' is water. Connected '1's form an island.
Uses DFS to 'flood fill' each island."""
if not grid: return 0
rows, cols = len(grid), len(grid[0])
count = 0
def dfs(r, c):
if r<0 or r>=rows or c<0 or c>=cols or grid[r][c]!='1':
return
grid[r][c] = '0' # Mark visited
dfs(r+1,c); dfs(r-1,c); dfs(r,c+1); dfs(r,c-1)
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
count += 1 # New island found
dfs(r, c) # Sink the entire island
return count
Time: O(RรC). Each cell visited once. This is the graph traversal applied to a 2D grid โ a matrix IS a graph where each cell connects to its 4 neighbors.
MCQ Assessment Bank โ 25 Questions
Level 1 โ Recall โญ (5 Questions)
In a max-heap, the root contains:
- The minimum element
- The maximum element
- The median element
- A random element
๐ Max-heap property: parent โฅ children. Root has no parent, so it's the maximum.
โ (a) That's a min-heap.
Average time complexity of hash table lookup:
- O(n)
- O(log n)
- O(1)
- O(nยฒ)
๐ Hash function computes the index directly. No searching needed (on average, with good hash function and load factor).
โ (a) Worst case with all collisions. (b) That's BST.
BFS uses which data structure?
- Stack
- Queue
- Heap
- Array
๐ BFS processes vertices in FIFO order โ the order they were discovered. This ensures level-by-level traversal.
โ (a) Stack is used by DFS.
Dijkstra's algorithm finds:
- Minimum spanning tree
- Shortest path from one source to all vertices
- All-pairs shortest paths
- Longest path
๐ Single-source shortest path. For all-pairs, use Floyd-Warshall. For MST, use Kruskal/Prim.
โ (c) That's Floyd-Warshall. (a) That's Prim's/Kruskal's.
HeapSort's time complexity is:
- O(n) best, O(nยฒ) worst
- O(n log n) in all cases
- O(nยฒ) in all cases
- O(n log n) avg, O(nยฒ) worst
๐ Build heap: O(n). Extract n elements: n ร O(log n). Total: O(n log n) regardless of input.
โ (d) That's Quick Sort, not HeapSort.
Level 2 โ Conceptual โญโญ (6 Questions)
In separate chaining, a hash table with 10 slots and 30 elements has an average chain length of:
- 3
- 10
- 30
- 0.33
๐ Load factor ฮฑ = n/m = 30/10 = 3. Average chain length = ฮฑ = 3. Expected search time: O(1 + ฮฑ) = O(4).
โ (b)(c) Not ratios. (d) Inverted.
Linear probing suffers from "primary clustering." This means:
- The hash function is bad
- Consecutive occupied slots form long chains, degrading search to O(n)
- The table is too small
- Keys are sorted
๐ When slots near a collision get filled, they create a "cluster." Any new key hashing into this cluster must probe through the entire cluster โ the bigger it gets, the slower it gets.
โ (a) Can happen even with good hash functions. (c) Related but not the definition.
DFS on a graph uses which data structure internally (when using recursion)?
- Queue
- Call stack (implicit stack)
- Heap
- Hash table
๐ Each recursive DFS call pushes a frame onto the call stack. When it returns (backtracks), the frame is popped. DFS can also be implemented with an explicit stack.
โ (a) Queue is BFS.
Dijkstra's algorithm works correctly with negative edge weights.
- True โ it finds shortest paths regardless
- False โ negative weights can cause Dijkstra to miss shorter paths; use Bellman-Ford instead
- True โ just use absolute values
- False โ graphs can't have negative weights
๐ Trap! Dijkstra's greedy approach assumes: once a vertex is "finalized," no shorter path exists. Negative weights violate this โ a later path through a negative edge could be shorter. Bellman-Ford handles negative weights in O(VE).
โ (a)(c) Dangerous misconceptions. (d) Graphs CAN have negative weights.
Floyd-Warshall's time complexity is:
- O(V + E)
- O(Vยฒ log V)
- O(Vยณ)
- O(E log V)
๐ Three nested loops over V vertices: for k, for i, for j. Simple but cubic. For V=1000: 1 billion operations. For V=10000: 1 trillion โ impractical.
โ (a) BFS/DFS. (d) Dijkstra with heap.
A heap is best represented as an array because:
- Arrays are always faster
- A complete binary tree maps perfectly to array indices with no wasted space
- Heaps can't use pointers
- Arrays support O(1) insert
๐ Heaps are complete binary trees โ all levels filled left-to-right. Array indices 2i+1 (left) and 2i+2 (right) perfectly represent parent-child relationships without pointer overhead.
โ (a) Not universally true. (c) Could use pointers but wasteful. (d) Heap insert is O(log n).
Level 3 โ Application โญโญ (8 Questions)
Ola needs the 5 nearest drivers from 500K. Best approach?
- Sort all 500K by distance โ O(n log n)
- Min-heap of all drivers, extract 5 times โ O(n + 5 log n)
- Max-heap of size 5, scan all drivers โ O(n log 5)
- Both (b) and (c) work, but (c) is more space-efficient
๐ (b) builds a full heap: O(n) build + 5 extractions. (c) maintains a tiny heap of 5 elements, comparing each driver: O(n log 5) โ O(n). Both are O(n), but (c) uses O(5) space vs O(n).
โ (a) O(n log n) is unnecessary โ we only need top 5, not full sort.
After inserting keys 10, 20, 30 into a hash table of size 7 with h(k)=k%7, the slots occupied are:
- 3, 6, 2
- 10, 20, 30
- 0, 1, 2
- 3, 6, 2
๐ 10%7=3, 20%7=6, 30%7=2. No collisions โ all different slots.
โ (b) Slots are indices, not values. (c) Wrong computation.
BFS from vertex A in an unweighted graph finds:
- The longest path to each vertex
- The shortest path (minimum edges) to each vertex
- All cycles in the graph
- The minimum spanning tree
๐ BFS visits vertices level-by-level. The first time it reaches a vertex is via the shortest path (fewest edges). For weighted graphs, you need Dijkstra.
โ Others are different problems.
A hash table with load factor > 1 using separate chaining will fail (overflow).
- True โ can't store more elements than slots
- False โ separate chaining handles any load factor via linked lists
- True โ hash functions can't handle load factor > 1
- False โ but performance degrades linearly
๐ Trap! Open addressing (linear probing) DOES fail at load factor > 1. But separate chaining uses linked lists โ each slot can hold unlimited elements. Load factor > 1 means average chain length > 1, so lookups slow down but DON'T fail.
โ (a)(c) Only true for open addressing.
Adjacency matrix vs adjacency list: for a sparse graph (V=10000, E=20000), which uses less memory?
- Matrix: O(Vยฒ) = 100M entries
- List: O(V+E) = 30,000 entries
- Same memory for both
- Matrix is always better
๐ Matrix: 10000ยฒ = 100,000,000 entries (400 MB for int). List: 10000 + 20000 = 30,000 entries (~240 KB). For sparse graphs, list uses 1000x less memory.
โ (a) States matrix size but doesn't call it better. (d) Matrix is better for dense graphs only.
DFS can be used to detect cycles in a graph. BFS cannot.
- True โ only DFS detects cycles
- False โ both can detect cycles
- True โ BFS is only for shortest paths
- False โ neither can detect cycles
๐ DFS detects back edges (revisiting an ancestor). BFS detects cycles when it encounters an already-visited non-parent vertex. Both work for undirected graphs. For directed graphs, DFS with coloring (white/gray/black) is standard.
โ (a)(c) BFS can also detect cycles.
Python's dict is implemented as:
- A balanced BST
- A hash table with open addressing
- A sorted array
- A linked list
๐ Python's dict uses hash table with open addressing (specifically, a variant of linear probing with perturbation). Java's HashMap uses separate chaining. Both achieve O(1) average lookup.
โ (a) C++ std::map uses Red-Black tree, but Python dict doesn't.
Google Maps uses Dijkstra's algorithm directly on the full road graph.
- True โ Dijkstra is fast enough
- False โ it uses a preprocessed contraction hierarchy with A* search for speed
- True โ modern CPUs can handle it
- False โ Google uses BFS instead
๐ The full road graph has billions of nodes. Even Dijkstra with a heap would take seconds. Google preprocesses the graph into a hierarchy (contraction hierarchies) and uses A* (Dijkstra + heuristic) on this compressed graph. Result: millisecond routing.
โ (a)(c) Too slow for real-time. (d) BFS doesn't handle weights.
Level 4 โ Analysis โญโญโญ (6 Questions)
For finding the shortest path between ALL pairs of 500 railway stations, Dijkstra vs Floyd-Warshall:
- Run Dijkstra 500 times: O(500 ร (V+E) log V)
- Floyd-Warshall once: O(Vยณ) = O(500ยณ) = 125M operations
- Both are equivalent
- (b) is better for dense graphs, (a) for sparse graphs
๐ Dijkstra ร V: O(V(V+E)log V). Floyd: O(Vยณ). For dense graphs (EโVยฒ): Dijkstra = O(Vยณ log V) > Floyd = O(Vยณ). For sparse graphs (EโV): Dijkstra = O(Vยฒ log V) < Floyd = O(Vยณ). The right choice depends on graph density.
โ (c) They're different for different graph types.
HeapSort is not stable. This matters when:
- Never โ stability is irrelevant
- When sorting records by a secondary key while preserving primary key order
- Only for linked lists
- Only for strings
๐ Stability preserves relative order of equal elements. Example: sorting students by name (primary) then by grade (secondary). An unstable sort may reorder students with the same grade, breaking the name ordering.
โ (a) Stability matters in many practical scenarios.
A hash table with O(1) lookup is always faster than a BST with O(log n) lookup.
- True โ O(1) < O(log n) mathematically
- False โ hash tables don't support range queries, ordered iteration, or finding min/max efficiently
- True โ hash tables are universally superior
- False โ but only because hash tables use more memory
๐ Trap! Hash tables win for exact-key lookup. But BSTs support: range queries ("all prices between โน100-500"), ordered iteration (sorted output), and min/max in O(log n). Hash tables need O(n) for all of these. Choose based on your access pattern.
โ (a)(c) Ignores BST advantages.
Time complexity of BFS and DFS on a graph represented as an adjacency list:
- O(V)
- O(E)
- O(V + E)
- O(V ร E)
๐ Each vertex is enqueued/pushed once: O(V). Each edge is examined once (twice for undirected): O(E). Total: O(V + E). With adjacency matrix: O(Vยฒ) because you check all V neighbors for each vertex.
โ (a)(b) Missing one component. (d) Way too high.
Building a heap from n elements is O(n), not O(n log n), because:
- Most nodes are near the bottom and need very few swaps โ the sum converges to O(n)
- The heap is already sorted
- Building only takes one pass
- It's a mathematical coincidence
๐ Bottom-up heapify: leaves (n/2 nodes) need 0 swaps. Level above (n/4 nodes) need โค1 swap. Next (n/8) need โค2. Sum: n/4ยท1 + n/8ยท2 + n/16ยท3 + ... = nยทฮฃ(i/2^i) converges to O(n). This is Floyd's linear-time build-heap proof.
โ (b)(c)(d) Wrong explanations.
Which correctly summarizes the "choose your DSA" decision?
- Always use hash tables โ O(1) is unbeatable
- Always use arrays โ simplest is best
- Match the data structure to your access pattern: arrays for index access, hash tables for key lookup, heaps for priority, trees for ordered data, graphs for relationships
- It doesn't matter โ modern hardware makes everything fast
๐ This is the master lesson of the entire course. There is no universally "best" data structure. Each has trade-offs. The engineer's job is to match the structure to the problem's access pattern.
โ "Always" answers are always wrong in engineering.
Chapter Summary โ Course Finale
5 Things Every Engineer Must Remember
- Heaps give O(1) access to the extreme element โ need the minimum, maximum, nearest, cheapest, or most urgent? Use a heap. Ola, Uber, and every OS scheduler uses this.
- Hash tables give O(1) key-value lookup โ need "has this been seen before?" or "what's the value for this key?" Use hashing. Python's dict, Redis, and every database cache is a hash table.
- Graphs model relationships โ any problem with "connections" (roads, friendships, dependencies) is a graph problem. BFS for shortest unweighted path, DFS for exploration and cycle detection.
- Dijkstra's is the foundation of navigation โ every map app, network router, and logistics optimizer uses it. The min-heap optimization reduces O(Vยฒ) to O((V+E) log V).
- No data structure is universally best โ arrays for index access, linked lists for insert/delete, trees for ordered data, hash tables for key lookup, heaps for priority, graphs for relationships. The engineer's job is to match the structure to the problem.
Complete DSA Course โ Master Table
| Structure | Best For | Search | Insert | Delete | Used At |
|---|---|---|---|---|---|
| Array | Index access | O(n) / O(log n)* | O(n) | O(n) | Flipkart, every program |
| Linked List | Insert/Delete | O(n) | O(1)** | O(1)** | Spotify, Linux kernel |
| Stack | LIFO / Undo | โ | O(1) | O(1) | Paytm, compilers |
| Queue | FIFO / Buffering | โ | O(1) | O(1) | Swiggy, Amazon SQS |
| BST | Ordered dynamic data | O(log n) | O(log n) | O(log n) | MongoDB, databases |
| Heap | Priority / Extremes | O(n) | O(log n) | O(log n) | Ola, OS scheduler |
| Hash Table | Key-value lookup | O(1) | O(1) | O(1) | PhonePe, Redis, DNS |
| Graph | Relationships | O(V+E) | O(1) | O(V+E) | Google Maps, IRCTC |
*Binary search on sorted array. **At known position.
๐ Congratulations โ Course Complete!
You've completed all 5 units of "Data Structures & Algorithms: Industry Edition." You now understand:
- How Flipkart stores 150 million products (arrays)
- How Spotify manages playlists (linked lists)
- How Paytm rolls back failed payments (stacks)
- How Swiggy processes 50K orders/hour (queues)
- How Nykaa organizes 5 million products (trees)
- How Ola finds nearest drivers (heaps)
- How PhonePe prevents duplicate charges (hashing)
- How Google Maps finds shortest routes (graphs)
Every data structure you've learned powers real systems used by hundreds of millions of people. You're not just learning theory โ you're learning how India's and the world's tech infrastructure works.