Computer Organization & Architecture
Unit 6: Memory Unit
From registers to hard drives: master the memory hierarchy, cache mapping techniques, and virtual memory, and solve GATE-level numericals with confidence.
⏱️ 8 hrs theory + 5 hrs lab | 🎯 GATE ~4 marks | 🖥️ Snapdragon Cache
💼 Jobs this unlocks: VLSI Design Engineer (₹6–12 LPA) | Embedded Systems Developer (₹5–10 LPA) | SoC Verification Engineer (₹8–15 LPA)
Opening Hook: Why Does Your Laptop Slow Down with 100 Tabs?
🖥️ The Mystery of the 100-Tab Slowdown
You've done it. We all have. You open Chrome, start with 5 tabs, then 20, then 50… and by the time you hit 100 tabs, your laptop turns into a space heater that can barely scroll. Your fancy 16 GB RAM machine is now slower than a ₹5,000 phone. Why?
The answer lies in the memory hierarchy. Your CPU doesn't just grab data from RAM. It first checks its tiny, ultra-fast L1 cache (32 KB, ~1 ns). Miss? It checks the L2 cache (256 KB, ~5 ns). Still a miss? L3 cache (8 MB, ~20 ns). All misses? It finally goes to RAM (16 GB, ~100 ns). But with 100 tabs, even RAM fills up, and the OS starts using your SSD as swap space for virtual memory. An SSD access (~50 μs) is roughly 500× slower than a RAM access (~100 ns). That's the slowdown.
Qualcomm's Snapdragon 8 Gen 3 chip (inside your Samsung Galaxy S24) has a 12 MB L3 cache designed by Indian engineers in Hyderabad. Apple's M3 has a 36 MB L2. Every nanosecond saved in cache design translates to billions of dollars in market advantage. This chapter teaches you exactly how that works.
Learning Outcomes: Bloom's Taxonomy Mapped
| Bloom's Level | Learning Outcome |
|---|---|
| 🔵 Remember | List the levels of the memory hierarchy with access times, sizes, and cost per bit |
| 🔵 Remember | Define cache memory, hit ratio, miss penalty, TLB, and page fault |
| 🟢 Understand | Explain how direct mapping, fully associative, and set-associative mapping work with tag/line/word fields |
| 🟢 Understand | Describe virtual memory organisation including page tables, TLB, and demand paging |
| 🟡 Apply | Compute tag, line, and word bits for a given cache configuration and calculate AMAT |
| 🟡 Apply | Trace a reference string through cache using FIFO replacement and calculate hit rate |
| 🟠 Analyze | Compare write-through vs write-back policies and analyse their performance trade-offs |
| 🟠 Analyze | Analyse why set-associative mapping is preferred over direct and fully associative in modern CPUs |
| 🔴 Evaluate | Evaluate the cache design trade-offs in Snapdragon vs Apple Silicon processors |
| 🔴 Evaluate | Assess the impact of page size on TLB miss rate and internal fragmentation |
| 🟣 Create | Design a 2-level cache hierarchy for a given workload with AMAT constraints |
| 🟣 Create | Simulate a cache replacement algorithm for a given reference string and propose optimisations |
Concept Explanation: Memory Unit from Scratch
1. Memory Hierarchy: The Speed-Size-Cost Pyramid
Imagine a library. You keep your most-used notes on your desk (registers: fastest, tiny). Books you need today are on the shelf beside you (cache). The library room has thousands of books (RAM). The basement archive has millions (SSD/HDD). Each level is bigger but slower. A computer's memory works exactly the same way.
🔺 The Complete Memory Hierarchy Pyramid
┌────────────┐
│ REGISTERS  │ ← 0.3 ns | 256 B–2 KB  | ₹₹₹₹₹ (on-chip)
│   (CPU)    │   Flip-flops, zero latency for ALU
├────────────┤
│  L1 CACHE  │ ← 1 ns   | 32–64 KB    | ₹₹₹₹ (on-chip SRAM)
│ (per core) │   Split: I-cache + D-cache
├────────────┤
│  L2 CACHE  │ ← 5 ns   | 256 KB–1 MB | ₹₹₹ (on-chip SRAM)
│ (per core) │   Unified instruction + data
├────────────┤
│  L3 CACHE  │ ← 20 ns  | 4–36 MB     | ₹₹ (shared SRAM)
│  (shared)  │   Shared across all cores
├────────────┤
│  MAIN MEM  │ ← 100 ns | 4–64 GB     | ₹ (DRAM)
│   (RAM)    │   Volatile, row/column addressing
├────────────┤
│    SSD     │ ← 50 μs  | 256 GB–4 TB | ₹/10 (NAND Flash)
│(secondary) │   Non-volatile, no moving parts
├────────────┤
│    HDD     │ ← 10 ms  | 1–20 TB     | ₹/100 (magnetic)
│(secondary) │   Spinning platters, mechanical arm
└────────────┘
Speed:  ◄── FASTEST ──────────────── SLOWEST ──►
Size:   ◄── SMALLEST ─────────────── LARGEST ──►
Cost/b: ◄── MOST EXPENSIVE ───────── CHEAPEST ─►
| Level | Technology | Access Time | Typical Size | Cost/GB (approx.) | Volatile? |
|---|---|---|---|---|---|
| Registers | Flip-flops | 0.3 ns | ~1 KB | n/a | Yes |
| L1 Cache | SRAM | 1 ns | 32–64 KB | ~₹5,00,000 | Yes |
| L2 Cache | SRAM | 5 ns | 256 KB–1 MB | ~₹2,00,000 | Yes |
| L3 Cache | SRAM | 20 ns | 4–36 MB | ~₹50,000 | Yes |
| RAM | DRAM | 100 ns | 4–64 GB | ~₹250 | Yes |
| SSD | NAND Flash | 50 μs | 256 GB–4 TB | ~₹5 | No |
| HDD | Magnetic | 10 ms | 1–20 TB | ~₹2 | No |
2. Cache Memory: Structure & Organisation
Cache memory is a small, fast SRAM buffer between the CPU and main memory. Its job: keep the most frequently accessed data close to the CPU so the processor doesn't waste 100 ns waiting for RAM every time.
Cache Memory Block Diagram
  CPU                     CACHE                     MAIN MEMORY
┌─────┐              ┌──────────────┐             ┌──────────────┐
│     │── Address ──►│              │             │              │
│ CPU │              │  Tag Array   │── Miss ────►│     RAM      │
│     │◄── Data ─────│  Data Array  │◄── Block ───│    (DRAM)    │
│     │              │  Valid Bits  │             │              │
└─────┘              │  Dirty Bits  │             └──────────────┘
                     └──────────────┘
Cache Line Structure:
┌───────┬────────┬────────────────────────────────────┐
│ Valid │  Tag   │        Data Block (B bytes)        │
│ (1b)  │(t bits)│ Word₀ │ Word₁ │ Word₂ │ ... │ Wₙ   │
└───────┴────────┴────────────────────────────────────┘
Key terms:
• Cache Line (Block): The smallest unit of data transferred between cache and RAM. Typical: 32 or 64 bytes.
• Tag: Identifies which main memory block is currently stored in this cache line.
• Valid bit: 1 = line has valid data, 0 = empty/invalid.
• Dirty bit: (Write-back only) 1 = line modified, needs to be written back to RAM.
• Hit: Requested data found in cache. Miss: Not found → fetch from RAM.
• Hit Ratio (h): h = (Number of hits) / (Total accesses). Typical: 0.90–0.99.
3. Direct Mapping: [Tag | Line | Word]
Analogy: Think of a hostel with 8 rooms. Each student is assigned a fixed room based on their roll number: Room = Roll % 8. Students 0, 8, 16, and 24 all map to Room 0. If Student 0 is in Room 0 and Student 8 arrives, Student 0 gets kicked out. No choice: it's direct mapping.
Direct Mapped Cache: Address Breakdown
Given: Main Memory = 2ⁿ bytes, Cache Lines = 2ˡ, Block Size = 2ʷ bytes
CPU Address (n bits):
┌──────────────┬──────────────┬──────────────┐
│     TAG      │  LINE/INDEX  │ WORD OFFSET  │
│ (n-l-w) bits │   (l bits)   │   (w bits)   │
└──────────────┴──────────────┴──────────────┘
Mapping Formula:
Cache Line Number = (Main Memory Block Number) mod (Number of Cache Lines)
Line Number = Block Address mod 2ˡ
Example: 32-bit address, 512 lines, 4 words/block (16 bytes)
┌──────────────────┬───────────┬────────┐
│     TAG (19)     │ LINE (9)  │  W(4)  │
│     19 bits      │  9 bits   │ 4 bits │
└──────────────────┴───────────┴────────┘
Total = 19 + 9 + 4 = 32 bits ✓
Direct Mapped Cache Layout:
┌───────┬─────┬────────┬────────┬────────┬────────┐
│ Valid │ Tag │ Word₀  │ Word₁  │ Word₂  │ Word₃  │ ← Line 0
├───────┼─────┼────────┼────────┼────────┼────────┤
│ Valid │ Tag │ Word₀  │ Word₁  │ Word₂  │ Word₃  │ ← Line 1
├───────┼─────┼────────┼────────┼────────┼────────┤
│  ...  │ ... │  ...   │  ...   │  ...   │  ...   │
├───────┼─────┼────────┼────────┼────────┼────────┤
│ Valid │ Tag │ Word₀  │ Word₁  │ Word₂  │ Word₃  │ ← Line 511
└───────┴─────┴────────┴────────┴────────┴────────┘
How a lookup works:
- Extract the LINE bits from the CPU address → go to that cache line
- Compare the TAG field of that line with the TAG bits from the address
- If TAG matches AND Valid=1 → HIT! Use the WORD OFFSET to pick the right word
- If TAG doesn't match or Valid=0 → MISS! Fetch block from RAM, replace this line
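The lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real cache: the `Line` class and the `split_address` and `access` helpers are hypothetical names, and the configuration (512 lines, 16-byte blocks) is the one from the example above. Only tags and valid bits are tracked, not the data itself.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False   # valid bit: does this line hold a block?
    tag: int = 0          # tag of the block currently stored

def split_address(addr, line_bits, word_bits):
    """Split a byte address into (tag, line index, word offset)."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = addr >> (word_bits + line_bits)
    return tag, line, word

def access(cache, addr, line_bits, word_bits):
    """Return True on a hit; on a miss, install the new tag in the line."""
    tag, line, _ = split_address(addr, line_bits, word_bits)
    entry = cache[line]
    if entry.valid and entry.tag == tag:
        return True
    # Miss: the block would be fetched from RAM and replace this line.
    entry.valid, entry.tag = True, tag
    return False

# 512 lines (9 index bits), 16-byte blocks (4 offset bits)
cache = [Line() for _ in range(512)]
results = [access(cache, a, 9, 4) for a in (0x0000, 0x0004, 0x2000, 0x0000)]
print(results)  # [False, True, False, False]
```

Addresses 0x0000 and 0x0004 share a block, so the second access hits. Address 0x2000 maps to the same line with a different tag, so it evicts the first block and the final access to 0x0000 misses again.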
4. Fully Associative Mapping: [Tag | Word]
Analogy: Unlike the hostel (direct mapping), think of a parking lot with 8 spots. Any car can park in any spot. When a new car arrives and the lot is full, you use a replacement policy (kick out the oldest = FIFO, kick out least recently used = LRU). Maximum flexibility, but you need to check all spots simultaneously.
Fully Associative Cache: Address Breakdown
CPU Address (n bits):
┌──────────────────────────┬──────────────┐
│           TAG            │ WORD OFFSET  │
│       (n - w) bits       │   (w bits)   │
└──────────────────────────┴──────────────┘
NO LINE/INDEX field! Any block can go in ANY cache line.
Example: 32-bit address, Block = 16 bytes (w = 4)
┌────────────────────────────┬────────┐
│          TAG (28)          │  W(4)  │
└────────────────────────────┴────────┘
Lookup: CPU sends tag → ALL lines compare simultaneously (parallel comparators)
┌───────┬─────────────┬─────────────────────────┐
│ Valid │  Tag (28)   │  Data Block (16 bytes)  │ ← Line 0 ─┐
├───────┼─────────────┼─────────────────────────┤           │
│ Valid │  Tag (28)   │  Data Block (16 bytes)  │ ← Line 1 ─┤ All compared
├───────┼─────────────┼─────────────────────────┤           │ in PARALLEL
│  ...  │     ...     │           ...           │           │
├───────┼─────────────┼─────────────────────────┤           │
│ Valid │  Tag (28)   │  Data Block (16 bytes)  │ ← Line N ─┘
└───────┴─────────────┴─────────────────────────┘
               ▲ Compare with incoming tag
Advantage: No conflict misses, since any block can go anywhere.
Disadvantage: Expensive! Needs a comparator for every cache line. Hardware cost scales with cache size.
Used for: TLBs (small, needs high hit rate), small L1 caches in some designs.
5. Set-Associative Mapping: The Best of Both Worlds
Analogy: Compromise! Instead of one fixed room (direct) or any room (associative), we have hostels (sets), each with a few rooms (ways). A student must go to their assigned hostel but can pick any room inside it. This gives flexibility within a set while keeping hardware cost reasonable.
K-Way Set-Associative Cache (2-Way Example)
CPU Address (n bits):
┌──────────────┬──────────────┬──────────────┐
│     TAG      │  SET INDEX   │ WORD OFFSET  │
│(n - s - w) b │   (s bits)   │   (w bits)   │
└──────────────┴──────────────┴──────────────┘
Number of Sets = Total Lines / K (where K = associativity)
s = log₂(Number of Sets)
2-Way Set-Associative Cache Layout (4 sets, 8 total lines):
        Way 0                    Way 1
┌───┬─────┬──────┐       ┌───┬─────┬──────┐
│ V │ Tag │ Data │       │ V │ Tag │ Data │ ← Set 0
├───┼─────┼──────┤       ├───┼─────┼──────┤
│ V │ Tag │ Data │       │ V │ Tag │ Data │ ← Set 1
├───┼─────┼──────┤       ├───┼─────┼──────┤
│ V │ Tag │ Data │       │ V │ Tag │ Data │ ← Set 2
├───┼─────┼──────┤       ├───┼─────┼──────┤
│ V │ Tag │ Data │       │ V │ Tag │ Data │ ← Set 3
└───┴─────┴──────┘       └───┴─────┴──────┘
Lookup Process:
1. Use SET INDEX → go to that set
2. Compare TAG with BOTH Way 0 and Way 1 simultaneously
3. If either matches (and Valid=1) → HIT
4. Both miss → MISS → replace one way (FIFO/LRU)
Special Cases:
• K = 1 (1-way) → Direct Mapped
• K = N (N-way, N = total number of lines) → Fully Associative
• K = 2 or 4 → Most common in modern CPUs
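The special cases above fall out of a single formula, which a short Python sketch can make concrete. The function name `field_bits` is hypothetical, and the sketch assumes all sizes are powers of two, as in the examples in this unit.

```python
def field_bits(address_bits, total_lines, block_bytes, ways):
    """Return (tag_bits, set_bits, word_bits) for a K-way cache.

    Assumes total_lines, block_bytes and ways are powers of two.
    """
    num_sets = total_lines // ways
    word_bits = block_bytes.bit_length() - 1   # log2(block size)
    set_bits = num_sets.bit_length() - 1       # log2(number of sets)
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits

# 32-bit address, 512 lines, 16-byte blocks
print(field_bits(32, 512, 16, ways=1))    # direct mapped:     (19, 9, 4)
print(field_bits(32, 512, 16, ways=2))    # 2-way:             (20, 8, 4)
print(field_bits(32, 512, 16, ways=512))  # fully associative: (28, 0, 4)
```

Each doubling of the associativity halves the number of sets, so one bit moves from the set index into the tag.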
6. Cache Hit/Miss Trace: Reference String with FIFO
Let's trace how a cache handles a sequence of memory references. This is a classic GATE question type.
Worked Example: 4-Line Direct-Mapped Cache
Setup: 4 cache lines (lines 0–3), direct-mapped, block size = 1 word. A direct-mapped cache needs no replacement policy, because each block has exactly one possible line; FIFO only matters in the 2-way version of this example.
Reference String (block numbers): 0, 8, 0, 6, 8, 2, 0, 6
Mapping: Cache Line = Block Number mod 4
Block → Cache Line:
- Block 0 → Line 0 (0 mod 4 = 0)
- Block 8 → Line 0 (8 mod 4 = 0) ← CONFLICT with Block 0!
- Block 6 → Line 2 (6 mod 4 = 2)
- Block 2 → Line 2 (2 mod 4 = 2) ← CONFLICT with Block 6!
Trace Table:
| Step | Request | Line 0 | Line 1 | Line 2 | Line 3 | Hit/Miss |
|---|---|---|---|---|---|---|
| 1 | Blk 0 | [0] | - | - | - | MISS |
| 2 | Blk 8 | [8] | - | - | - | MISS |
| 3 | Blk 0 | [0] | - | - | - | MISS |
| 4 | Blk 6 | [0] | - | [6] | - | MISS |
| 5 | Blk 8 | [8] | - | [6] | - | MISS |
| 6 | Blk 2 | [8] | - | [2] | - | MISS |
| 7 | Blk 0 | [0] | - | [2] | - | MISS |
| 8 | Blk 6 | [0] | - | [6] | - | MISS |
Hits = 0, Misses = 8
Hit Rate = 0/8 = 0% (Terrible! 4 compulsory misses on the first touch of blocks 0, 8, 6, and 2, then 4 conflict misses)
This is a worst case for direct mapping: all references map to just 2 lines, so blocks keep evicting each other (thrashing). A 2-way set-associative cache does better on the same string.
Now with 2-Way Set-Associative (2 sets, 2 ways each):
Set = Block mod 2
Block 0 → Set 0 | Block 8 → Set 0 | Block 6 → Set 0 | Block 2 → Set 0
| Step | Request | Set 0 (W0, W1) | Set 1 (W0, W1) | Hit/Miss |
|---|---|---|---|---|
| 1 | Blk 0 | [0, -] | [-, -] | MISS |
| 2 | Blk 8 | [0, 8] | [-, -] | MISS |
| 3 | Blk 0 | [0, 8] | [-, -] | HIT ✓ |
| 4 | Blk 6 | [6, 8] FIFO | [-, -] | MISS |
| 5 | Blk 8 | [6, 8] | [-, -] | HIT ✓ |
| 6 | Blk 2 | [6, 2] FIFO | [-, -] | MISS |
| 7 | Blk 0 | [0, 2] FIFO | [-, -] | MISS |
| 8 | Blk 6 | [0, 6] FIFO | [-, -] | MISS |
Hits = 2, Misses = 6
Hit Rate = 2/8 = 25% (Better than 0% with direct mapping!)
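Both traces can be cross-checked with a small simulation. The sketch below is one possible way to model FIFO replacement, using a `deque` per set; the function name `simulate_fifo` is hypothetical. A direct-mapped cache is treated as the special case of 1 way per set.

```python
from collections import deque

def simulate_fifo(refs, num_sets, ways):
    """Return a list of 'HIT'/'MISS' outcomes, one per block reference."""
    sets = [deque() for _ in range(num_sets)]  # oldest block at the left
    outcomes = []
    for block in refs:
        current = sets[block % num_sets]
        if block in current:
            outcomes.append("HIT")   # FIFO: a hit does not change the order
        else:
            if len(current) == ways:
                current.popleft()    # evict the block that arrived first
            current.append(block)
            outcomes.append("MISS")
    return outcomes

refs = [0, 8, 0, 6, 8, 2, 0, 6]
direct = simulate_fifo(refs, num_sets=4, ways=1)   # 4-line direct-mapped
two_way = simulate_fifo(refs, num_sets=2, ways=2)  # 2 sets x 2 ways
print("Direct-mapped hits:", direct.count("HIT"))  # 0
print("2-way FIFO hits:   ", two_way.count("HIT")) # 2
```

The deque records arrival order only, so it does not show which physical way a block occupies, but the hit/miss sequence matches the trace tables.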
7. Write-Through vs Write-Back: Comparison
| Feature | Write-Through | Write-Back |
|---|---|---|
| Mechanism | Every write updates both cache AND main memory simultaneously | Write only to cache; update main memory when line is evicted |
| Speed | Slower (every write goes to RAM) | Faster (writes are buffered in cache) |
| Consistency | Cache and RAM always consistent | Can be inconsistent; needs dirty bit tracking |
| Dirty Bit | Not needed | Required (1 = modified, needs writeback) |
| Write Buffer | Often uses a write buffer to avoid CPU stalls | Not needed on every write; a write-back buffer can still hide the cost of evicting dirty lines |
| Complexity | Simpler hardware | More complex (needs dirty bit logic + writeback) |
| Best For | Multiprocessor systems (coherency), I/O devices | Single-processor, performance-critical systems |
| Used In | L1 D-cache (some ARM designs) | L2/L3 caches, most modern CPUs |
| Miss Policy | Write-allocate or Write-no-allocate | Usually write-allocate (fetch block then write) |
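The "Speed" row can be illustrated by counting how many writes actually reach main memory under each policy. The sketch below is a simplified model under stated assumptions: a direct-mapped cache, write-allocate on a miss, a trace that contains only writes, and a final flush of dirty lines for write-back. The function name `memory_writes` and the trace are made up for illustration.

```python
def memory_writes(write_blocks, num_lines, policy):
    """Count writes that reach main memory for a trace of block writes."""
    lines = [None] * num_lines    # block number held by each line
    dirty = [False] * num_lines   # dirty bits (used by write-back only)
    writes = 0
    for block in write_blocks:
        idx = block % num_lines
        if lines[idx] != block:                 # miss: allocate the line
            if policy == "write-back" and dirty[idx]:
                writes += 1                     # write back the evicted line
            lines[idx] = block
            dirty[idx] = False
        if policy == "write-through":
            writes += 1                         # every write goes to RAM
        else:
            dirty[idx] = True                   # defer the RAM update
    if policy == "write-back":
        writes += sum(dirty)                    # flush remaining dirty lines
    return writes

trace = [0, 0, 0, 4, 4, 1, 1, 1]   # hypothetical sequence of written blocks
print(memory_writes(trace, 4, "write-through"))  # 8
print(memory_writes(trace, 4, "write-back"))     # 3
```

Repeated writes to the same block are where write-back saves traffic: eight writes to three distinct blocks cost only three memory writes.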
8. Virtual Memory: Page Table, TLB, Demand Paging
Analogy: Imagine you're a teacher with 60 students but only 30 chairs. You give each student a "virtual seat number" (1–60). When a student comes to class, you assign them a real chair. If all chairs are full, the least active student goes to the "waiting room" (disk). That's virtual memory: every process gets its own full address space, but physical RAM is shared.
Virtual Memory Address Translation
Virtual Address (from CPU):
┌───────────────────┬──────────────┐
│ Virtual Page No.  │ Page Offset  │
│       (VPN)       │   (d bits)   │
└───────────────────┴──────────────┘
          │
          ▼
┌────────────────────────────┐
│         PAGE TABLE         │
│ ┌─────┬───────┬──────────┐ │
│ │Valid│ Dirty │Frame No. │ │
│ │  1  │   0   │   0x3A   │ │ ← VPN 0
│ │  1  │   1   │   0x2F   │ │ ← VPN 1
│ │  0  │   0   │    -     │ │ ← VPN 2 (PAGE FAULT!)
│ │  1  │   0   │   0x71   │ │ ← VPN 3
│ │ ... │  ...  │   ...    │ │
│ └─────┴───────┴──────────┘ │
└────────────────────────────┘
          │
          ▼
Physical Address:
┌───────────────────┬──────────────┐
│  Physical Frame   │ Page Offset  │
│   Number (PFN)    │   (d bits)   │
└───────────────────┴──────────────┘
Page Fault: Valid=0 → page not in RAM → OS fetches from disk (VERY slow: ~10 ms)
Translation Lookaside Buffer (TLB):
CPU ──VPN──► ┌─────┐ Hit ──PFN──► Physical Address
             │ TLB │ (fast: ~1 ns, fully associative)
             └─────┘
                │ Miss
                ▼
          ┌────────────┐
          │ Page Table │ (in RAM: ~100 ns)
          │    Walk    │
          └────────────┘
                │ Page Fault
                ▼
          ┌────────────┐
          │    Disk    │ (10 ms: catastrophic!)
          └────────────┘
TLB is a small, fast cache (typically 32–128 entries, fully associative) that stores recent VPN→PFN translations. TLB hit rate is typically 99%+ in well-designed systems.
Demand Paging: Pages are loaded into RAM only when they are first accessed (not pre-loaded). This saves RAM, because many of a process's pages are never touched during a run.
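The translation path (TLB first, then page table, then page fault) can be sketched with two dictionaries. This is a toy model under stated assumptions: a 4 KB page size (12 offset bits) is assumed, the frame numbers are the sample values from the page table diagram, and `translate` is a hypothetical helper. A real TLB has limited capacity and a replacement policy, which this sketch ignores.

```python
PAGE_OFFSET_BITS = 12  # assumed 4 KB pages

# VPN -> frame number; VPN 2 is absent, i.e. not resident in RAM
page_table = {0: 0x3A, 1: 0x2F, 3: 0x71}
tlb = {}  # starts empty; caches recent VPN -> frame translations

def translate(vaddr):
    """Return (physical address or None, where the translation came from)."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:                       # fast path
        return (tlb[vpn] << PAGE_OFFSET_BITS) | offset, "TLB hit"
    if vpn in page_table:                # page table walk, then fill the TLB
        tlb[vpn] = page_table[vpn]
        return (tlb[vpn] << PAGE_OFFSET_BITS) | offset, "TLB miss"
    return None, "page fault"            # the OS would load the page from disk

first = translate(0x1ABC)   # VPN 1: walk the page table
second = translate(0x1ABC)  # same page again: served from the TLB
fault = translate(0x2000)   # VPN 2: not resident
print(hex(first[0]), first[1])    # 0x2fabc TLB miss
print(hex(second[0]), second[1])  # 0x2fabc TLB hit
print(fault)                      # (None, 'page fault')
```

Note that the page offset passes through unchanged; only the page number is translated.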
9. Content Addressable Memory (CAM)
Normal memory (RAM): you give an address, it returns data. CAM is the reverse: you give data (a search key), it returns the address/location where that data is stored, in a single clock cycle.
CAM vs RAM: Fundamental Difference
RAM (Address → Data):           CAM (Data → Address):
┌─────────┬──────────┐          ┌──────────┬──────────────┐
│ Address │   Data   │          │  Search  │    Match?    │
│    0    │   0xAB   │          │ Key:0xCD │              │
│    1    │   0xCD   │◄── Read  │          │ Line 0: No   │
│    2    │   0xEF   │          │          │ Line 1: YES  │◄── Found!
│    3    │   0x12   │          │          │ Line 2: No   │
└─────────┴──────────┘          │          │ Line 3: No   │
Input:  Address                 └──────────┴──────────────┘
Output: Data                    Input:  Data (search key)
                                Output: Location (address)
Where is CAM used?
- TLB: Search by VPN, get PFN in one cycle
- Network routers: Search by IP address for routing table lookup
- Fully associative caches: All tags compared simultaneously = CAM behaviour
TCAM (Ternary CAM): Each bit can be 0, 1, or X (don't care). Used in firewalls and routers for wildcard matching.
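Ternary matching is easy to model in software with a value and a mask per entry. The sketch below is only a behavioural illustration: real TCAM hardware compares every entry in parallel, while this hypothetical `tcam_lookup` function loops over them and returns the first match. The 8-bit entries are made-up examples.

```python
def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None.

    Each entry is (value, mask): mask bit 1 = must match, 0 = don't care (X).
    """
    for index, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return index
    return None

entries = [
    (0b1010_0000, 0b1111_0000),  # pattern 1010XXXX
    (0b1100_0000, 0b1100_0000),  # pattern 11XXXXXX
]
print(tcam_lookup(entries, 0b1010_0110))  # 0
print(tcam_lookup(entries, 0b1101_0001))  # 1
print(tcam_lookup(entries, 0b0000_0001))  # None
```

Setting every mask bit to 1 turns an entry into an ordinary (binary) CAM entry that requires an exact match.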
10. DRAM, SSD & HDD: Main & Secondary Storage
DRAM (Dynamic RAM)
DRAM stores each bit as a charge on a tiny capacitor. The charge leaks, so DRAM needs periodic refresh (every ~64 ms). It's cheaper and denser than SRAM (1 transistor + 1 capacitor per bit vs 6 transistors for SRAM), which is why we use DRAM for main memory.
| Feature | SRAM (Cache) | DRAM (RAM) |
|---|---|---|
| Storage Element | 6 transistors (flip-flop) | 1 transistor + 1 capacitor |
| Speed | ~1–20 ns | ~100 ns |
| Refresh Needed? | No | Yes (every ~64 ms) |
| Density | Low (6T per bit) | High (1T1C per bit) |
| Cost/bit | High | Low |
| Used For | L1/L2/L3 cache, registers | Main memory (DDR4/DDR5) |
SSD (Solid State Drive)
Uses NAND flash memory. No moving parts, so it's shock-resistant and faster than HDD. Data is stored in floating-gate transistors that trap electrons. Typical read latency: ~50 μs. Limited write endurance (cells wear out after ~3,000–100,000 write cycles).
HDD (Hard Disk Drive)
Magnetic storage on spinning platters. A mechanical arm moves to the right track (seek time ~5 ms) and waits for the right sector to rotate under it (rotational latency ~4 ms at 7200 RPM). Total access time: ~10 ms. Cheapest ₹/GB but slowest.
HDD Access Time Breakdown:
| Component | Typical Value | What Happens |
|---|---|---|
| Seek Time | ~5 ms | Arm moves to the track |
| Rotational Latency | ~4.2 ms @ 7200 RPM | Platter spins to the sector |
| Transfer Time | ~0.01 ms | Data is read |
Total ≈ 9–10 ms per random access
Rotational Latency = (1/2) × (60/RPM) seconds
For 7200 RPM: (1/2) × (60/7200) s = 4.17 ms
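The rotational latency formula and the three-part access time can be checked with a few lines of Python. The function names below are hypothetical; the inputs (5 ms seek, 7200 RPM, 0.01 ms transfer) are the typical figures quoted in this section.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * (60.0 / rpm) * 1000.0

def hdd_access_ms(seek_ms, rpm, transfer_ms):
    """Average access time = seek + rotational latency + transfer."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms

latency = rotational_latency_ms(7200)
total = hdd_access_ms(seek_ms=5.0, rpm=7200, transfer_ms=0.01)
print(f"Rotational latency: {latency:.2f} ms")  # 4.17 ms
print(f"Total access time:  {total:.2f} ms")    # 9.18 ms
```

Seek and rotation dominate; the transfer itself is a rounding error for small reads, which is why random access on an HDD is so much slower than sequential access.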
Worked Numerical: Complete GATE-Style Problem
🧮 Problem: 512-Line Direct-Mapped Cache, 32-bit Address
Given:
- Cache: 512 lines, direct-mapped
- Block size: 4 words (1 word = 4 bytes → block = 16 bytes)
- Address: 32-bit, byte-addressable
- Cache hit time = 1 ns, miss penalty = 100 ns, hit rate = 0.95
Find: (a) Tag, Line, Word bits (b) Cache data size (c) Total cache size (d) AMAT
Solution:
(a) Address Field Breakdown:
Block size = 16 bytes → Word Offset = log₂(16) = 4 bits
Lines = 512 = 2⁹ → Line Index = 9 bits
Tag = 32 - 9 - 4 = 19 bits
┌────────────┬──────────┬──────────┐
│  Tag (19)  │ Line (9) │ Word (4) │
└────────────┴──────────┴──────────┘
(b) Cache Data Size:
Data = Number of Lines ร Block Size
= 512 ร 16 bytes
= 8,192 bytes = 8 KB
(c) Total Cache Size (including overhead):
Each line stores: 1 valid bit + 19 tag bits + 128 data bits (16 bytes)
= 1 + 19 + 128 = 148 bits per line
Total = 512 ร 148 = 75,776 bits = 9,472 bytes โ 9.25 KB
Overhead = Total - Data = 9.25 KB - 8 KB = 1.25 KB (for tags + valid bits)
(d) Average Memory Access Time (AMAT):
AMAT = Hit Time + Miss Rate ร Miss Penalty
= 1 + (1 - 0.95) ร 100
= 1 + 0.05 ร 100
= 1 + 5
= 6 ns
Without cache: 100 ns. With cache: 6 ns → 16.7× speedup!
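The four parts of the solution can be re-derived in a few lines of Python as a sanity check. This is only a verification sketch of the arithmetic above; the variable names are illustrative.

```python
lines, block_bytes, addr_bits = 512, 16, 32
hit_time_ns, miss_penalty_ns, hit_rate = 1, 100, 0.95

# (a) address fields (sizes are powers of two, so bit_length gives log2)
word_bits = block_bytes.bit_length() - 1
line_bits = lines.bit_length() - 1
tag_bits = addr_bits - line_bits - word_bits

# (b) data storage only
data_bytes = lines * block_bytes

# (c) data + tag + valid bit per line
bits_per_line = 1 + tag_bits + 8 * block_bytes
total_bits = lines * bits_per_line

# (d) average memory access time
amat_ns = hit_time_ns + (1 - hit_rate) * miss_penalty_ns

print(f"Tag={tag_bits}, Line={line_bits}, Word={word_bits}")
print(f"Data size:  {data_bytes / 1024:.2f} KB")
print(f"Total size: {total_bits / 8 / 1024:.2f} KB")
print(f"AMAT: {amat_ns:.1f} ns, speedup: {miss_penalty_ns / amat_ns:.1f}x")
```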
• Registers = The chai cup already in your hand (instant access)
• L1 Cache = The chai shop right at the college gate (10 seconds walk)
• L2 Cache = The canteen inside campus (2 minutes walk)
• L3 Cache = The CCD/Starbucks on the main road (10 minutes)
• RAM = Going home to make chai (30 minutes travel)
• HDD = Ordering chai leaves from Amazon and waiting 2 days
You always try the nearest shop first. If it has your favorite Cutting Chai → HIT! If not → MISS, go to the next level. That's exactly how CPU cache works!
Learn by Doing: 3-Tier Lab Structure
🟢 Tier 1 (GUIDED): Cache Address Decoder (Python)
Objective:
Write a Python program that takes a memory address and a cache configuration, and outputs the Tag, Line/Set, and Word offset fields.
Step 1: Get User Inputs

    # Cache Address Decoder
    address_bits = int(input("Enter address width (bits): "))    # e.g., 32
    num_lines = int(input("Enter number of cache lines: "))      # e.g., 512
    block_size = int(input("Enter block size (bytes): "))        # e.g., 16
    address_hex = input("Enter memory address (hex, e.g. 0x1A3F): ")

Step 2: Calculate Bit Fields

    import math

    word_bits = int(math.log2(block_size))
    line_bits = int(math.log2(num_lines))
    tag_bits = address_bits - line_bits - word_bits
    print(f"Tag: {tag_bits} bits | Line: {line_bits} bits | Word: {word_bits} bits")

Step 3: Decode the Address

    address = int(address_hex, 16)
    word_offset = address & ((1 << word_bits) - 1)
    line_index = (address >> word_bits) & ((1 << line_bits) - 1)
    tag_value = address >> (word_bits + line_bits)
    print(f"Address: {address_hex} → Tag={tag_value} | Line={line_index} | Word={word_offset}")

🟡 Tier 2 (SEMI-GUIDED): Cache Simulator with Hit/Miss Tracking
Mission:
Build a Python cache simulator that takes a reference string and reports hits, misses, and hit rate for direct-mapped and set-associative caches.
Hints:
- Create a list of `None` values to represent cache lines: `cache = [None] * num_lines`
- For each reference: compute `line = ref % num_lines`
- Check if `cache[line] == ref` → HIT, else → MISS and replace
- For set-associative: use a list of lists. Each set is a list with K slots
- Track hits and misses in counters. Print hit rate at the end

    # Skeleton: fill in the blanks
    def simulate_direct(refs, num_lines):
        cache = [None] * num_lines
        hits = 0
        for ref in refs:
            line = ref % num_lines
            if cache[line] == ref:
                hits += 1
                print(f"Ref {ref} → Line {line} → HIT")
            else:
                cache[line] = ref
                print(f"Ref {ref} → Line {line} → MISS")
        print(f"Hit Rate: {hits}/{len(refs)} = {hits/len(refs)*100:.1f}%")

    refs = [0, 8, 0, 6, 8, 2, 0, 6]
    simulate_direct(refs, 4)

🔴 Tier 3 (OPEN CHALLENGE): Full Cache Hierarchy Analyzer
The Brief:
Build a complete cache hierarchy simulator that models L1 → L2 → RAM access with:
- L1 Cache: Direct-mapped, 64 lines, 4-word blocks
- L2 Cache: 4-way set-associative, 256 lines, 8-word blocks, LRU replacement
- Input: Read a reference string from a file (at least 100 addresses)
- Output: L1 hit rate, L2 hit rate, overall AMAT, total access time
- Bonus: Generate a visual trace table showing L1/L2 hits/misses per access
AMAT Formula for 2-level cache:
AMAT = Hit_Time_L1 + Miss_Rate_L1 × (Hit_Time_L2 + Miss_Rate_L2 × Miss_Penalty_RAM)
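As a starting point for the analyzer, the 2-level formula translates directly into a function. The name `amat_two_level` and the sample numbers are hypothetical, chosen only to illustrate the calculation; miss rates are local (L2 miss rate applies to accesses that already missed L1).

```python
def amat_two_level(l1_hit_ns, l1_miss_rate, l2_hit_ns, l2_miss_rate, ram_ns):
    """AMAT for L1 -> L2 -> RAM, with local miss rates as fractions."""
    l1_miss_penalty = l2_hit_ns + l2_miss_rate * ram_ns
    return l1_hit_ns + l1_miss_rate * l1_miss_penalty

# Made-up sample values: L1 = 1 ns (10% miss), L2 = 8 ns (25% miss), RAM = 120 ns
sample = amat_two_level(1, 0.10, 8, 0.25, 120)
print(f"AMAT = {sample:.1f} ns")  # 1 + 0.10 * (8 + 0.25 * 120) = 4.8 ns
```

In the simulator, the two miss rates would come from the measured L1 and L2 hit counts rather than being supplied by hand.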
Practice Problems: Diagrams, Numericals, Industry & GATE
Diagram-Based Questions (3)
Draw the complete memory hierarchy pyramid for a modern smartphone (Snapdragon 8 Gen 3). Label each level with: technology, size, access time, and one real-world example of data stored at that level.
Draw a detailed block diagram of a 2-way set-associative cache with 8 sets. Show the address field breakdown for a 32-bit address with 64-byte blocks. Label all comparators, MUX, valid bits, tag arrays, and data arrays.
Draw the virtual memory address translation flow diagram showing: CPU → TLB → Page Table → Physical Memory, with the page fault handler path to disk. Include all timing labels.
🧮 Numerical Problems (6)
A direct-mapped cache has 1024 lines, block size = 8 words (1 word = 4 bytes), address = 32 bits. Find: (a) Tag, Line, and Byte Offset bits (b) Total cache data storage in KB (c) Total cache size including tag and valid bits.
A 4-way set-associative cache has 256 total lines, block size = 64 bytes, address = 32 bits. Find: (a) Number of sets (b) Tag, Set, Offset bits (c) Number of tag comparators needed.
A system has: L1 hit time = 1 ns, L1 miss rate = 5%, L2 hit time = 10 ns, L2 miss rate = 20%, RAM access time = 100 ns. Calculate the Average Memory Access Time (AMAT).
A virtual memory system has: virtual address = 32 bits, physical address = 28 bits, page size = 4 KB. Find: (a) Number of virtual pages (b) Number of physical frames (c) Page table entries (d) Size of page table if each entry is 4 bytes.
An HDD spins at 10,000 RPM. Average seek time = 4 ms. Sector size = 512 bytes, transfer rate = 200 MB/s. Calculate average access time for one sector.
A CPU generates 64-bit addresses. The cache is fully associative with 128 lines, block size = 32 bytes. (a) How many tag bits per line? (b) If hit rate = 0.92, hit time = 2 ns, miss penalty = 80 ns, find AMAT. (c) How many comparators are needed?
🏭 Industry Application Questions (3)
Qualcomm's Snapdragon 8 Gen 3 has a 12 MB L3 cache shared across 8 cores. If each core generates 2 billion memory accesses per second and the L3 hit rate is 70% (for accesses that miss L1+L2), calculate how many RAM accesses per second the L3 cache prevents.
Samsung's DDR5-7200 has a peak bandwidth of 57.6 GB/s per channel. A server motherboard has 8 channels. If a database workload requires 400 GB/s bandwidth, is this configuration sufficient? What would you recommend?
ISRO's NavIC satellite navigation system needs to store ephemeris data for 7 satellites with 1 ms update rate. Each update is 256 bytes. Design the cache requirements if data must be accessed within 10 ns with 99.9% hit rate.
🎯 GATE Previous Year Style Questions (5)
A direct-mapped cache has 2¹⁴ bytes of data and 2⁶ byte blocks. The address is 32 bits. What is the tag field size in bits?
- 12
- 14
- 18
- 20
Consider a 2-way set-associative cache with 256 cache lines and block size of 4 words (word = 4 bytes). The address length is 32 bits. The size of the tag field is:
- 18 bits
- 19 bits
- 20 bits
- 21 bits
The effective access time of a memory system with cache hit rate h, cache access time t₁, and main memory access time t₂ (using simultaneous access) is:
- h × t₁ + (1-h) × t₂
- t₁ + (1-h) × t₂
- h × t₁ + (1-h) × (t₁ + t₂)
- h × (t₁ + t₂) + (1-h) × t₂
In a virtual memory system with page size of 4 KB, a process has a virtual address space of 2³² bytes. The physical memory is 2²⁸ bytes. How many entries does the page table have?
- 2¹⁶
- 2²⁰
- 2²⁴
- 2²⁸
A CPU generates 20-bit addresses. The main memory access time is 100 ns. The cache access time is 10 ns with a hit ratio of 0.9. Using hierarchical access, the effective memory access time is:
- 20 ns
- 19 ns
- 110 ns
- 91 ns
MCQ Assessment Bank: 30 Questions (Bloom's Mapped)
Remember / Identify (Q1–Q5)
Which memory is fastest in the memory hierarchy?
- DRAM
- Cache (SRAM)
- CPU Registers
- SSD
SRAM is used in cache memory because:
- It is cheaper than DRAM
- It is faster and doesn't need refresh
- It has higher density
- It uses capacitors for storage
In a direct-mapped cache, the address is divided into:
- Tag and Offset
- Tag, Line, and Word Offset
- Tag and Set
- Page Number and Offset
Which memory needs periodic refresh?
- SRAM
- DRAM
- ROM
- Flash
TLB stands for:
- Translation Lookaside Buffer
- Table Lookup Block
- Transfer Line Buffer
- Tag Line Base
Understand / Explain (Q6–Q10)
Why does increasing cache associativity reduce conflict misses?
- It increases cache size
- It allows a block to be placed in multiple locations
- It makes the cache faster
- It reduces the block size
What is the principle of temporal locality?
- If a memory location is accessed, nearby locations will also be accessed
- If a memory location is accessed, it will likely be accessed again soon
- Memory should be accessed in sequential order
- Frequently accessed data should be stored on disk
In write-back policy, when is data written to main memory?
- On every write operation
- Only when the cache line is evicted (replaced)
- At fixed time intervals
- When the CPU is idle
What happens during a page fault?
- Cache line is replaced
- TLB is flushed
- Required page is loaded from disk to RAM by the OS
- CPU clock speed is reduced
Why is fully associative mapping expensive to implement?
- It needs more cache lines
- It requires a comparator for every cache line
- It needs larger block sizes
- It requires more address bits
Apply / Calculate (Q11–Q20)
A direct-mapped cache has 256 lines, block size = 32 bytes, address = 32 bits. The number of tag bits is:
- 17
- 19
- 21
- 23
A cache has hit rate = 0.96, hit time = 2 ns, miss penalty = 50 ns. The AMAT is:
- 4 ns
- 3 ns
- 5 ns
- 4 ns
In a 4-way set-associative cache with 512 total lines, the number of sets is:
- 64
- 128
- 256
- 512
A virtual memory system has 20-bit virtual addresses, page size = 1 KB. The number of page table entries is:
- 512
- 1024
- 2048
- 4096
A fully associative cache has 64 lines, block size = 16 bytes, address = 32 bits. The tag size is:
- 24 bits
- 26 bits
- 28 bits
- 30 bits
A 2-level cache system has: L1 access = 1 ns (miss rate 10%), L2 access = 10 ns (miss rate 5%), RAM access = 200 ns. What is the AMAT?
- 2 ns
- 3 ns
- 2 ns
- 12 ns
An HDD rotates at 7200 RPM. The average rotational latency is approximately:
- 2.08 ms
- 4.17 ms
- 8.33 ms
- 16.67 ms
Cache size = 64 KB, block size = 64 bytes. The number of cache lines is:
- 512
- 1024
- 2048
- 4096
The effective memory access time with hit ratio h=0.9, cache time=10ns, memory time=100ns (hierarchical access) is:
- 19 ns
- 20 ns
- 28 ns
- 100 ns
A page table has 2²⁰ entries, each entry is 4 bytes. The total page table size is:
- 1 MB
- 2 MB
- 4 MB
- 8 MB
Analyze / Compare (Q21–Q25)
Which cache mapping has the highest conflict miss rate for a given cache size?
- Direct mapped
- 2-way set-associative
- 4-way set-associative
- Fully associative
Increasing block size in a cache initially reduces miss rate but then increases it. This increase is due to:
- Increased hit time
- Increased conflict misses and reduced number of lines
- Decreased tag bits
- Increased write-back overhead
In a multiprocessor system, which write policy simplifies cache coherence?
- Write-back
- Write-through
- Write-allocate
- Write-no-allocate
Which replacement policy can suffer from Bélády's anomaly?
- LRU
- FIFO
- Optimal
- Random
Why is the TLB typically fully associative despite the high hardware cost?
- It needs to store large pages
- It has very few entries and must maximize hit rate
- It operates at disk speed
- It replaces the page table entirely
Evaluate / Create (Q26–Q30)
A system architect must choose between a 16 KB direct-mapped cache and an 8 KB 2-way set-associative cache. Assuming the workload has significant conflict misses, which is likely better?
- 16 KB direct-mapped (bigger is always better)
- 8 KB 2-way set-associative (less conflicts)
- Both perform identically
- Cannot determine without the workload
If a system's page fault rate increases from 0.001% to 0.01%, and each page fault costs 10 ms, the impact on effective access time is:
- Negligible (< 1% change)
- Significant (~10ร increase in fault overhead)
- System crashes
- Only affects disk performance
To design a cache that eliminates all conflict misses, you would choose:
- Direct-mapped with large blocks
- Fully associative mapping
- Set-associative with 2 ways
- Write-back policy
Which approach would most effectively reduce TLB misses for a workload with a 2 GB working set?
- Increase TLB entries from 64 to 128
- Use larger page sizes (2 MB instead of 4 KB)
- Add another TLB level
- Both (B) and (C)
A chip designer has a transistor budget of 500K transistors for cache. SRAM uses 6 transistors/bit. What is the maximum data capacity of the cache?
- ~10 KB
- ~64 KB
- ~128 KB
- ~83 KB
Short Answer Questions (8)
Define the hit ratio and explain why a hit ratio of 0.95 vs 0.90 can make a significant difference in AMAT. Provide a numerical example.
Explain the difference between compulsory, capacity, and conflict misses (the 3 C's of cache misses).
Distinguish between SRAM and DRAM with respect to storage cell structure, speed, cost, refresh requirement, and usage.
What is a TLB and why is it typically fully associative? What happens on a TLB miss?
Write the AMAT formula for a 2-level cache system and calculate AMAT for: L1 hit time = 1 ns (miss rate 8%), L2 hit time = 10 ns (local miss rate 20%), RAM access time = 200 ns.
Explain demand paging and how it differs from pre-paging. What are the advantages of demand paging?
Compare write-through and write-back cache policies with respect to: speed, consistency, dirty bit usage, and suitability for multiprocessor systems.
What is Content Addressable Memory (CAM)? How does it differ from conventional RAM? Where is it used?
Long Answer Questions (3)
LA1: Compare all three cache mapping techniques with diagrams, formulas, advantages, disadvantages, and real-world usage (15 marks)
Model Answer Structure:
1. Direct Mapping
Address: [Tag | Line Index | Word Offset]
Formula: Cache Line = Block Number mod (Number of Lines)
Each block has exactly ONE possible cache line.
Advantages: Simple hardware (1 comparator), fast lookup, cheap.
Disadvantages: High conflict miss rate; two blocks mapping to the same line cause thrashing.
Used in: Simple embedded systems, L1 cache in some older designs.
2. Fully Associative Mapping
Address: [Tag | Word Offset] (no line index field).
Any block can go in ANY cache line.
Advantages: Zero conflict misses, highest flexibility.
Disadvantages: Needs N comparators (one per line), expensive hardware, slower for large caches.
Used in: TLBs (small, need high hit rate), small special-purpose caches.
3. Set-Associative Mapping
Address: [Tag | Set Index | Word Offset]
Formula: Set = Block Number mod (Number of Sets). Within a set, block can go in any way.
K-way: Each set has K lines. Needs K comparators (manageable).
Advantages: Balances conflict reduction against hardware cost. A good trade-off for most workloads.
Disadvantages: More complex than direct mapping and slightly slower (K tag comparisons plus way selection).
Used in: L1/L2/L3 caches in nearly all modern CPUs (typically 2-way to 16-way).
Comparison Summary:
| Feature | Direct | Fully Assoc. | K-Way Set-Assoc. |
|---|---|---|---|
| Placement | Fixed | Anywhere | Within a set |
| Comparators | 1 | N | K (per set) |
| Conflict Miss | High | None | Low |
| Hardware Cost | Low | Very High | Medium |
| Flexibility | Low | Very High | High |
| Hit Rate | Lower | Highest | Near-highest |
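The three address layouts above can be sketched with one small Python helper. This is an illustrative sketch, not part of the model answer: the function name `split_address` and its parameters are assumptions, and all sizes are assumed to be powers of two. Setting `ways=1` gives direct mapping, `ways=num_lines` gives fully associative mapping (index width 0), and anything in between is K-way set-associative.

```python
def split_address(addr, block_size, num_lines, ways=1, addr_bits=32):
    """Split a byte address into tag / index / offset fields.

    Hypothetical helper for illustration; sizes must be powers of two.
    """
    offset_bits = block_size.bit_length() - 1   # log2(block size)
    num_sets = num_lines // ways                # sets = lines / associativity
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits

    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)  # always 0 when fully associative
    tag = addr >> (offset_bits + index_bits)
    return {"tag": tag, "index": index, "offset": offset,
            "widths": (tag_bits, index_bits, offset_bits)}


if __name__ == "__main__":
    # Direct-mapped cache: 128 lines of 32 bytes, 32-bit address
    fields = split_address(0x12345678, block_size=32, num_lines=128)
    print(hex(fields["tag"]), fields["index"], fields["offset"], fields["widths"])
    # → 0x12345 51 24 (20, 7, 5)
```

Calling the same function with `ways=128` on this cache returns widths `(27, 0, 5)`, which matches the [Tag | Word Offset] layout of fully associative mapping.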
LA2: Explain virtual memory organisation with page table, TLB, and demand paging. Include a complete address translation diagram (15 marks)
Model Answer should cover:
- Virtual Memory Concept: Each process gets its own virtual address space (e.g., 4 GB for 32-bit). Physical RAM is shared. The OS + hardware translate virtual → physical addresses transparently.
- Page Table: A data structure (one per process) stored in RAM. Maps virtual page numbers (VPN) to physical frame numbers (PFN). Each entry has: valid bit, dirty bit, frame number, permission bits.
- TLB: A fast hardware cache (fully associative, ~32–128 entries) that stores recent VPN→PFN translations. Hit rate typically 99%+. Prevents expensive page table walks for most accesses.
- Demand Paging: Pages are loaded only when accessed. Page fault → OS interrupt → load from disk → update page table → resume. Under pure demand paging, a process starts with zero pages in RAM.
- Address Translation Flow: CPU sends virtual address → TLB check (1 ns). Hit → PFN directly. Miss → Page table walk (100 ns). Valid → get PFN, update TLB. Invalid → Page fault → Disk (10 ms) → load page → update page table → update TLB → retry.
- Page Replacement: When RAM is full: LRU, FIFO, or Clock algorithm selects a victim page. If dirty → write back to disk first.
Include the full address translation diagram from Section C.8.
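The translation flow in the model answer (TLB check, then page table walk, then page fault) can be traced with a toy Python model. This is a minimal sketch under stated assumptions: the class name `Translator`, the 4 KB page size, the LRU-managed TLB, and the unlimited supply of free frames (so no page replacement is modelled) are all illustrative choices, not part of the original answer.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KB pages, i.e. 12 offset bits


class Translator:
    """Toy model of TLB -> page table -> page fault (hypothetical, for illustration)."""

    def __init__(self, tlb_entries=4):
        self.tlb = OrderedDict()    # VPN -> PFN, oldest entry first (LRU order)
        self.tlb_entries = tlb_entries
        self.page_table = {}        # VPN -> PFN for resident pages only
        self.next_free_frame = 0    # assumption: free frames never run out
        self.events = []

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:                       # TLB hit: PFN available directly
            self.tlb.move_to_end(vpn)
            self.events.append("tlb_hit")
        else:
            if vpn in self.page_table:            # page table walk finds a valid entry
                self.events.append("tlb_miss")
            else:                                 # invalid entry: load page on demand
                self.page_table[vpn] = self.next_free_frame
                self.next_free_frame += 1
                self.events.append("page_fault")
            self._tlb_insert(vpn, self.page_table[vpn])
        return self.tlb[vpn] * PAGE_SIZE + offset

    def _tlb_insert(self, vpn, pfn):
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)          # evict the least recently used entry
        self.tlb[vpn] = pfn


if __name__ == "__main__":
    t = Translator(tlb_entries=2)
    for va in (0x1234, 0x1FFF, 0x2000, 0x3000, 0x1000):
        print(hex(va), "->", hex(t.translate(va)))
    print(t.events)
    # → ['page_fault', 'tlb_hit', 'page_fault', 'page_fault', 'tlb_miss']
```

The last access shows the middle path of the flow: the page is resident, but its translation was evicted from the two-entry TLB, so a page table walk is needed without a disk access.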
LA3: Solve a comprehensive cache design problem with AMAT calculation for a 2-level cache system (15 marks)
Problem: A processor has a 2-level cache system:
- L1: Direct-mapped, 128 lines, 32-byte blocks, hit time = 1 ns, miss rate = 10%
- L2: 4-way set-associative, 1024 lines, 64-byte blocks, hit time = 8 ns, miss rate = 5% (local)
- RAM access time: 100 ns. Address: 32-bit, byte-addressable.
Find: (a) L1 address breakdown (tag/line/offset) (b) L2 address breakdown (c) AMAT (d) Speedup over no cache (e) If L1 miss rate improves to 5%, new AMAT and % improvement.
Solution:
(a) L1: Block = 32 bytes → offset = 5 bits
Lines = 128 = 2⁷ → index = 7 bits
Tag = 32 - 7 - 5 = 20 bits → [20|7|5]
(b) L2: Block = 64 bytes → offset = 6 bits
Sets = 1024/4 = 256 = 2⁸ → set index = 8 bits
Tag = 32 - 8 - 6 = 18 bits → [18|8|6]
(c) AMAT = T_L1 + MR_L1 × (T_L2 + MR_L2 × T_RAM)
= 1 + 0.10 × (8 + 0.05 × 100)
= 1 + 0.10 × (8 + 5)
= 1 + 0.10 × 13
= 1 + 1.3 = 2.3 ns
(d) Speedup = RAM_time / AMAT = 100 / 2.3 ≈ 43.5×
(e) New AMAT = 1 + 0.05 × 13 = 1 + 0.65 = 1.65 ns
Improvement = (2.3 - 1.65) / 2.3 × 100 ≈ 28.3%
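The arithmetic in this solution can be cross-checked with a few lines of Python. The helper names `field_widths` and `amat_two_level` are assumptions introduced for this sketch; the numbers are the ones given in the problem statement.

```python
def field_widths(addr_bits, block_bytes, lines, ways=1):
    """Return (tag, index, offset) widths; sizes assumed to be powers of two."""
    offset = block_bytes.bit_length() - 1        # log2(block size)
    index = (lines // ways).bit_length() - 1     # log2(lines or sets)
    return addr_bits - index - offset, index, offset


def amat_two_level(t_l1, mr_l1, t_l2, mr_l2, t_ram):
    """AMAT = T_L1 + MR_L1 x (T_L2 + MR_L2 x T_RAM), with a local L2 miss rate."""
    return t_l1 + mr_l1 * (t_l2 + mr_l2 * t_ram)


l1 = field_widths(32, 32, 128)                   # (a) direct-mapped L1
l2 = field_widths(32, 64, 1024, ways=4)          # (b) 4-way set-associative L2
amat = amat_two_level(1, 0.10, 8, 0.05, 100)     # (c)
speedup = 100 / amat                             # (d) versus RAM-only access
new_amat = amat_two_level(1, 0.05, 8, 0.05, 100) # (e) L1 miss rate halved
improvement = (amat - new_amat) / amat * 100

print(l1, l2)
print(f"{amat:.2f} ns, {speedup:.1f}x, {new_amat:.2f} ns, {improvement:.1f}%")
# → (20, 7, 5) (18, 8, 6)
# → 2.30 ns, 43.5x, 1.65 ns, 28.3%
```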
Industry Spotlight: A Day in the Life
Vikram Sahu, 32, Cache Design Engineer at Samsung Semiconductor, Bangalore
Background: B.Tech (ECE) from NIT Bhopal. M.Tech from IIT Madras (VLSI). Joined Samsung Semiconductor India (SSIR) as a campus hire. Now leads a team of 6 engineers designing L2 cache controllers for Exynos mobile processors.
A Typical Day:
8:30 AM: Morning sync with the Seoul (Korea) team. Review overnight simulation results for the new Exynos 2500 L2 cache design. A corner-case coherence bug was found; discuss fix approaches.
9:30 AM: Write RTL (Register Transfer Level) code in SystemVerilog for a new cache replacement algorithm. Samsung is exploring RRIP (Re-Reference Interval Prediction) to replace LRU.
11:00 AM: Run synthesis and timing analysis using Synopsys Design Compiler. Target: L2 hit time ≤ 4 ns at 3.5 GHz. Current design meets timing with 200 ps slack.
1:00 PM: Lunch at Samsung's Bangalore campus. Discuss power-performance trade-offs with the power management team. Every picojoule per cache access matters for phone battery life.
2:00 PM: Run cache trace simulations using SPEC CPU2017 benchmarks. Compare 4-way vs 8-way L2 on workload mix: Chrome, WhatsApp, games, camera app. 8-way gives 2% higher hit rate but 15% more power.
4:30 PM: Code review for a junior engineer's TLB prefetcher design. Suggest optimisations for reducing TLB miss penalty from 20 ns to 14 ns.
6:00 PM: Write a technical report comparing the Exynos cache hierarchy with Snapdragon 8 Gen 3. Present findings to the architecture team in Seoul next week.
| Detail | Info |
|---|---|
| Tools Used Daily | SystemVerilog, Synopsys VCS, Design Compiler, GEM5 simulator, Python (scripting), Perforce (version control) |
| Entry Salary (India) | ₹10–15 LPA (M.Tech) / ₹6–8 LPA (B.Tech) |
| Mid-Level (5–8 yrs) | ₹20–35 LPA |
| Senior (10+ yrs) | ₹40–80 LPA + RSUs |
| Companies Hiring (India) | Samsung SSIR, Qualcomm Hyderabad, Intel Bangalore, AMD Hyderabad, ARM Bangalore, Texas Instruments, MediaTek Noida, NVIDIA |
Earn With It: Memory Optimization Skills
Your Earning Path After This Chapter
Portfolio Piece: A working cache simulator (Python) with trace output + a technical blog post explaining cache mapping with diagrams, hosted on GitHub.
Skill Paths Unlocked:
- Embedded Systems (Immediate): Optimise memory usage in Arduino/ESP32 projects. Freelance IoT gigs: ₹3,000–₹10,000/project
- VLSI/SoC Design (After GATE/M.Tech): Cache controller design at Samsung, Qualcomm, Intel. Entry: ₹10–15 LPA
- Systems Programming: Write cache-friendly C/C++ code. Performance optimisation gigs: ₹5,000–₹20,000/project
- Technical Content Writing: Write COA tutorials for GeeksforGeeks, Naukri, or Unstop. ₹500–₹2,000/article
| Opportunity | Skills Needed | Platform | Earning Potential |
|---|---|---|---|
| COA Tutorial Writer | Cache concepts + writing | GeeksforGeeks, Medium | ₹500–₹2,000/article |
| Embedded IoT Projects | C/C++, memory optimization | Freelancer, Internshala | ₹3,000–₹10,000/project |
| GATE Coaching Assistant | COA + numerical solving | Unacademy, Physics Wallah | ₹5,000–₹15,000/month |
| Performance Tuning | Cache-aware coding | Upwork, Toptal | $25–$75/hour |
Chapter Summary: Memory Unit at a Glance
Key Takeaways
- Memory Hierarchy: Registers → L1 → L2 → L3 → RAM → SSD → HDD. Faster = smaller = costlier.
- Locality of Reference: Temporal (reuse recently accessed) + Spatial (access nearby addresses). The foundation of caching.
- Cache Mapping: Direct (simple, conflict-prone), Fully Associative (flexible, expensive), Set-Associative (practical balance).
- Address Fields: Direct: [Tag|Line|Offset]. Associative: [Tag|Offset]. Set-Assoc: [Tag|Set|Offset].
- AMAT = Hit Time + Miss Rate × Miss Penalty. For multi-level caches, the miss penalty of one level is the AMAT of the next level.
- Write Policies: Write-through (consistent, slow) vs Write-back (fast, needs dirty bit).
- Virtual Memory: Gives each process its own address space. Page table maps VPN→PFN. TLB caches translations.
- Page Fault: Referenced page not in RAM → OS loads from disk (~10 ms). Must be extremely rare (<0.001%).
- SRAM vs DRAM: SRAM = fast/expensive (cache). DRAM = slow/cheap/needs refresh (main memory).
- CAM: Searches by content, not address. Used in TLBs and fully associative caches.
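The cache mapping takeaway (direct mapping is conflict-prone, associativity reduces conflicts) can be seen in a very small trace simulation. This is a sketch under assumptions: the function `simulate`, the LRU replacement within each set, and the eight-access trace of block numbers are illustrative choices made here, not taken from the chapter's lab simulator.

```python
from collections import OrderedDict


def simulate(block_numbers, num_lines, ways):
    """Return (hits, misses) for a trace of block numbers.

    ways=1 is direct-mapped, ways=num_lines is fully associative.
    Each set is an OrderedDict kept in LRU order (oldest first).
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in block_numbers:
        lines = sets[block % num_sets]      # set = block number mod number of sets
        if block in lines:
            hits += 1
            lines.move_to_end(block)        # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
    return hits, len(block_numbers) - hits


# Blocks 0 and 8 collide in an 8-line direct-mapped cache (0 mod 8 == 8 mod 8).
trace = [0, 8, 0, 8, 0, 8, 1, 1]
print(simulate(trace, num_lines=8, ways=1))  # → (1, 7)
print(simulate(trace, num_lines=8, ways=2))  # → (5, 3)
print(simulate(trace, num_lines=8, ways=8))  # → (5, 3)
```

With the same total capacity, going from 1-way to 2-way removes the thrashing between blocks 0 and 8; the three remaining misses are compulsory (first touch of each block).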
Essential Formulas
- AMAT = Hit_Time + Miss_Rate × Miss_Penalty
- 2-Level AMAT = T₁ + MR₁ × (T₂ + MR₂ × T_RAM)
- Tag bits (Direct) = n - log₂(Lines) - log₂(Block)
- Tag bits (Assoc) = n - log₂(Block)
- Tag bits (Set-Assoc) = n - log₂(Sets) - log₂(Block)
- Sets = Total_Lines / Associativity
- Cache_Data = Lines × Block_Size
- Virtual Pages = 2^(VA_bits - Offset_bits)
- Physical Frames = 2^(PA_bits - Offset_bits)
- Page Table Size = Num_Virtual_Pages × Entry_Size
- Rotational Latency = (1/2) × (60/RPM) seconds
- Disk Access = Seek + Rotational_Latency + Transfer_Time
- Effective Access Time (Hierarchical) = t_c + (1-h) × t_m
- Effective Access Time (Simultaneous) = h × t_c + (1-h) × t_m
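The virtual memory, disk, and effective access time formulas above translate directly into Python. The function names and every sample value below (32-bit addresses, 4 KB pages, 4-byte entries, 7200 RPM, 8 ms seek, h = 0.9) are assumptions chosen only to exercise the formulas; they are not figures from this chapter.

```python
def page_table_size_bytes(va_bits, page_bytes, entry_bytes):
    """Page Table Size = Num_Virtual_Pages x Entry_Size (single-level table)."""
    offset_bits = page_bytes.bit_length() - 1       # log2(page size)
    return (1 << (va_bits - offset_bits)) * entry_bytes


def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * (60 / rpm) * 1000


def disk_access_ms(seek_ms, rpm, transfer_ms):
    """Disk Access = Seek + Rotational_Latency + Transfer_Time."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms


def eat_hierarchical(h, t_c, t_m):
    """Memory is accessed only after the cache lookup misses."""
    return t_c + (1 - h) * t_m


def eat_simultaneous(h, t_c, t_m):
    """Cache and memory are accessed in parallel."""
    return h * t_c + (1 - h) * t_m


if __name__ == "__main__":
    print(page_table_size_bytes(32, 4096, 4) // 2**20, "MiB")  # → 4 MiB
    print(round(rotational_latency_ms(7200), 2), "ms")         # → 4.17 ms
    print(round(disk_access_ms(8, 7200, 1), 2), "ms")          # → 13.17 ms
    print(round(eat_hierarchical(0.9, 10, 100), 2), "ns")      # → 20.0 ns
    print(round(eat_simultaneous(0.9, 10, 100), 2), "ns")      # → 19.0 ns
```

The two effective access time results differ by exactly (1-h) × t_c, which is the cache lookup time that the hierarchical organisation also pays on a miss.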
Earning Checkpoint: Self-Assessment
| Skill / Concept | Tool / Method | Deliverable | Earning Ready? |
|---|---|---|---|
| Memory Hierarchy | Conceptual | - | Yes: interview ready |
| Cache Address Decoding | Python script | Address Decoder tool on GitHub | Yes: useful for GATE coaching content |
| Cache Mapping (3 types) | Diagrams + calculations | Blog post with ASCII diagrams | Yes: technical writing gigs |
| Hit/Miss Trace Simulation | Python simulator | Cache Simulator on GitHub | Yes: portfolio piece |
| AMAT Calculation | Formula application | Solved numericals set | Yes: GATE coaching assistance |
| Virtual Memory Concepts | Conceptual + calculations | - | Yes: interview ready |
| Cache Hierarchy Design | Full simulator (Tier 3) | L1+L2 Hierarchy Simulator | Yes: resume-worthy project |
| VLSI/SoC Cache Design | SystemVerilog (beyond chapter) | - | Not yet: needs M.Tech/advanced courses |
Unit 6 complete. You've mastered the Memory Unit, from registers to virtual memory!
[QR: Link to EduArtha video tutorial for COA Unit 6: Memory Unit]