Computer Organization & Architecture

Unit 9: Capstone — CPU Simulator Portfolio & Career Launchpad

Synthesize all 8 units into portfolio-ready projects, master GATE preparation, and launch your hardware engineering career.

⏱ Time to Complete: 10–12 hours | 💰 Salary Range: ₹6–80 LPA (hardware pays more!) | 📝 30 MCQs (Bloom's Mapped)

💼 Jobs this unlocks: VLSI Engineer (₹6–25 LPA) | Embedded Systems (₹5–18 LPA) | FPGA Developer (₹7–22 LPA) | Firmware Engineer (₹6–20 LPA)

Section A

Opening Hook — The BCA Student Who Built a CPU

🖥️ How Rohan Went from ₹0 to ₹8 LPA by Building a CPU Simulator on GitHub

Rohan Mehta was a 3rd-year BCA student at a tier-3 college in Indore. No IIT, no NIT, no connections in the tech industry. His COA professor gave an assignment: "Build something that demonstrates CPU concepts." Most students submitted a 2-page Word document. Rohan built a complete CPU simulator in Python.

His simulator had 8 registers, a 10-instruction ISA, a fetch-decode-execute pipeline, and cache simulation. He spent 3 weeks on it — debugging late into the night, learning Git, writing proper documentation. He pushed the entire project to GitHub with a polished README, screenshots, and usage instructions.

Six months later, a recruiter at Tata Elxsi found Rohan's GitHub profile while searching for "CPU simulator Python." The recruiter was impressed — not by fancy credentials, but by the fact that this student actually built something. Rohan cleared a technical interview on computer architecture, got an offer for ₹8 LPA as an Embedded Systems Engineer, and started working on automotive ECU software for a German car manufacturer.

What if YOU built this? This chapter gives you everything you need: 8 portfolio projects (one per unit), GATE preparation, career roadmaps, and interview prep for top hardware companies.

🇮🇳 Tata Elxsi | 🇮🇳 Intel India | 🇮🇳 Samsung R&D | 🇮🇳 Qualcomm Hyderabad | 🇮🇳 ISRO | 🇮🇳 Texas Instruments

India's semiconductor market is projected to reach $63 billion by 2026 (India Semiconductor Mission, MeitY). The Indian government has approved ₹76,000 crore ($10B) in incentives for semiconductor manufacturing. Companies like Micron (₹22,500 crore assembly and test facility in Gujarat), Tata Electronics (OSAT facility in Assam), and CG Power are setting up chip facilities. This means thousands of new hardware engineering jobs that require exactly the COA knowledge you've built in Units 1–8.

Section B

Learning Outcomes — Bloom's Taxonomy Mapped (12 Outcomes)

Bloom's Level	Learning Outcome
🔵 Remember	LO1: Recall all 7 basic logic gates, 4 flip-flop types (SR, JK, D, T), and their truth tables from Unit 1
🔵 Remember	LO2: List the stages of the fetch-decode-execute cycle and name the 12 Mano machine register-reference instructions from Units 3–4
🔵 Understand	LO3: Explain how the fetch-decode-execute cycle works, including the role of PC, MAR, MBR, IR, and the control unit from Unit 3
🔵 Understand	LO4: Describe the memory hierarchy (registers → cache → RAM → disk) and explain why each level trades speed for capacity from Unit 6
🟢 Apply	LO5: Build a working CPU simulator in Python with 8 registers, a basic ISA, and execute a 10-instruction program from Unit 4
🟢 Apply	LO6: Implement Booth's multiplication algorithm for signed binary numbers and verify results step-by-step from Unit 7
🟢 Analyze	LO7: Compare direct-mapped, fully-associative, and set-associative cache mapping techniques using hit-ratio analysis from Unit 6
🟢 Analyze	LO8: Analyze data, control, and structural hazards in a 5-stage pipeline and determine appropriate forwarding/stalling solutions from Unit 8
🟠 Evaluate	LO9: Evaluate ISA design choices (RISC vs CISC, fixed vs variable length) and justify which is optimal for given application scenarios from Unit 4
🟠 Evaluate	LO10: Assess interrupt handling strategies (polling, vectored, daisy-chain, priority) and recommend the best approach for real-time systems from Unit 5
🟠 Create	LO11: Design a complete 8-bit CPU architecture with ALU, registers, control unit, and memory interface — documented with block diagrams
🟠 Create	LO12: Create a GitHub portfolio of 8 COA projects, deploy to GitHub Pages, and write LinkedIn-ready project descriptions for job applications

Section C

Concepts — Portfolio Projects, GATE Prep & Career Guidance

This capstone synthesizes all 8 units of Computer Organization into portfolio-ready projects. Each project comes with a description and complete Python code, and Projects 1–5 also include a README template you can adapt for the rest. Build all 8 and you'll have a GitHub portfolio that shows recruiters working code for every unit.

🔧 Portfolio Project 1: Truth Table Generator (Unit 1 — Digital Logic)

📋 Project Description

What it does: Takes any Boolean expression (e.g., A AND B OR NOT C) and generates the complete truth table with all possible input combinations. Supports AND, OR, NOT, NAND, NOR, XOR, XNOR gates.

What it demonstrates: Understanding of Boolean algebra, logic gates, and combinational logic from Unit 1. Shows you can convert theory into working software.

Difficulty: Beginner | Time: 2–3 hours

Python
# truth_table_generator.py — Portfolio Project 1 (Unit 1: Digital Logic)
import itertools

def gate_and(a, b):  return a & b
def gate_or(a, b):   return a | b
def gate_not(a):     return 1 - a
def gate_nand(a, b): return 1 - (a & b)
def gate_nor(a, b):  return 1 - (a | b)
def gate_xor(a, b):  return a ^ b
def gate_xnor(a, b): return 1 - (a ^ b)

GATES = {'AND': gate_and, 'OR': gate_or, 'NOT': gate_not,
         'NAND': gate_nand, 'NOR': gate_nor,
         'XOR': gate_xor, 'XNOR': gate_xnor}

def evaluate_expression(expr, variables, values):
    """Evaluate a flat Boolean expression strictly left to right.

    NOT applies to the operand that follows it. There is no operator
    precedence and no parentheses, so "A OR B AND C" means (A OR B) AND C.
    """
    env = dict(zip(variables, values))
    tokens = expr.upper().split()
    pos = 0

    def next_operand():
        """Read one operand, applying any NOTs in front of it."""
        nonlocal pos
        if tokens[pos] == 'NOT':
            pos += 1
            return gate_not(next_operand())
        value = env[tokens[pos]]  # KeyError on an unknown variable name
        pos += 1
        return value

    result = next_operand()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        result = GATES[op](result, next_operand())
    return result

def generate_truth_table(expression, variables):
    """Generate and print full truth table."""
    header = ' | '.join(variables) + ' | OUTPUT'
    print(header)
    print('-' * len(header))
    for combo in itertools.product([0, 1], repeat=len(variables)):
        result = evaluate_expression(expression, variables, combo)
        row = ' | '.join(str(v) for v in combo)
        print(f"{row} |   {result}")

# Example usage
print("=== Truth Table: A AND B OR NOT C ===")
generate_truth_table("A AND B OR NOT C", ['A', 'B', 'C'])

README.md
# 🔢 Truth Table Generator
**Portfolio Project 1 — COA Unit 1: Digital Logic**

## Overview
A Python tool that generates complete truth tables for any Boolean expression
using standard logic gates (AND, OR, NOT, NAND, NOR, XOR, XNOR).

## Features
- Supports all 7 basic logic gates
- Generates truth tables for any number of variables
- Clean formatted output suitable for documentation

## Usage
```bash
python truth_table_generator.py
```

## Concepts Demonstrated
- Boolean Algebra fundamentals
- Combinational logic gate operations
- Truth table construction

## Author
[Your Name] | BCA Student | COA Portfolio Project

Make it interactive! Add a command-line interface using Python's argparse module so users can type expressions directly. This small touch makes your GitHub project look professional and usable — exactly what recruiters want to see.
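
The argparse tip can be sketched as follows. This is a minimal, self-contained example rather than the finished tool: the `--vars` flag, `build_parser`, and `main` are hypothetical names chosen for illustration, and the loop only prints the input columns so the block runs on its own.

```python
# truth_table_cli.py: illustrative CLI wrapper (flag and function names are assumptions)
import argparse
import itertools

def build_parser():
    parser = argparse.ArgumentParser(description="Print a truth table skeleton.")
    parser.add_argument("expression", help='Boolean expression, e.g. "A AND B"')
    parser.add_argument("--vars", nargs="+", default=["A", "B"],
                        help="variable names, in column order")
    return parser

def main(argv=None):
    # argv=None makes argparse read sys.argv; a list is handy for testing
    args = build_parser().parse_args(argv)
    print(f"Expression: {args.expression}")
    print(" | ".join(args.vars))
    for combo in itertools.product([0, 1], repeat=len(args.vars)):
        print(" | ".join(str(v) for v in combo))
    return args

# Equivalent to running: python truth_table_cli.py "A AND B" --vars A B
main(["A AND B", "--vars", "A", "B"])
```

In the real project, the print loop would be replaced by a call to `generate_truth_table(args.expression, args.vars)`.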

🔧 Portfolio Project 2: 8-bit Shift Register Simulator (Unit 2 — Sequential Circuits)

📋 Project Description

What it does: Simulates an 8-bit shift register supporting SISO (Serial-In Serial-Out), SIPO (Serial-In Parallel-Out), PISO (Parallel-In Serial-Out), and PIPO (Parallel-In Parallel-Out) operations. Shows bit movement clock-by-clock.

What it demonstrates: Understanding of sequential circuits, flip-flops, clock-driven operations, and register behavior from Unit 2.

Difficulty: Beginner | Time: 2–3 hours

Python
# shift_register_simulator.py — Portfolio Project 2 (Unit 2: Sequential Circuits)

class ShiftRegister:
    def __init__(self, size=8):
        self.size = size
        self.register = [0] * size
        self.clock = 0

    def display(self):
        bits = ''.join(str(b) for b in self.register)
        print(f"  Clock {self.clock:2d}: [{bits}]  (Decimal: {int(bits, 2)})")

    def shift_left(self, input_bit=0):
        """Shift all bits left; new bit enters from right."""
        self.clock += 1
        output = self.register[0]
        self.register = self.register[1:] + [input_bit]
        return output

    def shift_right(self, input_bit=0):
        """Shift all bits right; new bit enters from left."""
        self.clock += 1
        output = self.register[-1]
        self.register = [input_bit] + self.register[:-1]
        return output

    def parallel_load(self, data):
        """Load all bits simultaneously (PIPO); extra high bits are dropped."""
        self.clock += 1
        data &= (1 << self.size) - 1
        self.register = [int(b) for b in f"{data:0{self.size}b}"]

    def serial_in(self, bit_stream):
        """SISO: Feed bits one at a time, shift right."""
        print("  SISO — Serial In, Serial Out:")
        self.display()
        for bit in bit_stream:
            self.shift_right(bit)
            self.display()

# Demo
sr = ShiftRegister(8)
print("=== 8-Bit Shift Register Simulator ===")
print("\n--- Parallel Load: 10110010 ---")
sr.parallel_load(0b10110010)
sr.display()
print("\n--- Shift Left x3 ---")
for _ in range(3):
    sr.shift_left(1)
    sr.display()
print("\n--- Serial In: [1,0,1,1] ---")
sr2 = ShiftRegister(8)
sr2.serial_in([1, 0, 1, 1])

README.md
# 📟 8-Bit Shift Register Simulator
**Portfolio Project 2 — COA Unit 2: Sequential Circuits**

## Overview
Simulates all 4 types of shift registers: SISO, SIPO, PISO, PIPO.
Visualizes bit movement clock-by-clock.

## Features
- Left shift, Right shift operations
- Parallel load (PIPO)
- Serial input with visual output
- Clock cycle tracking

## Concepts Demonstrated
- Sequential circuit behavior
- Flip-flop based storage
- Clock-driven data movement

Shift registers are everywhere! Keyboard and button matrices are often read through a parallel-in serial-out shift register such as the 74HC165. UART serial communication uses shift registers to convert parallel data to serial. Even the SPI protocol that connects sensors to an Arduino is built around a pair of shift registers exchanging bits.
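
The parallel-to-serial conversion mentioned above can be sketched with two small functions. This is an illustrative model, not part of the simulator class: `piso_transmit` and `sipo_receive` are hypothetical names, and sending the most significant bit first is an assumption (a real UART sends the least significant bit first).

```python
def piso_transmit(byte, size=8):
    """Parallel-in, serial-out: load a byte, then shift out one bit per clock (MSB first)."""
    register = [(byte >> (size - 1 - i)) & 1 for i in range(size)]  # parallel load
    sent = []
    for _ in range(size):
        sent.append(register[0])        # the bit leaving the register this clock
        register = register[1:] + [0]   # shift left, fill the vacated slot with 0
    return sent

def sipo_receive(bits):
    """Serial-in, parallel-out: shift bits in one per clock, then read them all at once."""
    register = [0] * len(bits)
    for bit in bits:
        register = register[1:] + [bit]
    return int(''.join(str(b) for b in register), 2)

line = piso_transmit(0b10110010)
print(line)                          # [1, 0, 1, 1, 0, 0, 1, 0]
print(f"{sipo_receive(line):08b}")   # 10110010
```

Chaining the two functions mimics a sender and receiver sharing one data wire and a common clock.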

🔧 Portfolio Project 3: Fetch-Decode-Execute Simulator (Unit 3 — Basic Computer Organization)

📋 Project Description

What it does: Simulates the complete instruction cycle of a basic computer. Shows how the Program Counter (PC), Memory Address Register (MAR), Memory Buffer Register (MBR), and Instruction Register (IR) work together to fetch, decode, and execute instructions.

What it demonstrates: The fundamental operation of any computer — the fetch-decode-execute cycle — from Unit 3.

Difficulty: Intermediate | Time: 3–4 hours

Python
# fetch_decode_execute.py — Portfolio Project 3 (Unit 3: Basic Computer Org)

class BasicComputer:
    def __init__(self, memory_size=256):
        self.memory = [0] * memory_size
        self.PC = 0      # Program Counter
        self.MAR = 0     # Memory Address Register
        self.MBR = 0     # Memory Buffer Register
        self.IR = 0      # Instruction Register
        self.AC = 0      # Accumulator
        self.halted = False
        # Opcodes: 0=HLT, 1=LDA, 2=STA, 3=ADD, 4=SUB, 5=JMP
        self.opcodes = {0:'HLT', 1:'LDA', 2:'STA', 3:'ADD', 4:'SUB', 5:'JMP'}

    def load_program(self, program, start=0):
        for i, instr in enumerate(program):
            self.memory[start + i] = instr

    def fetch(self):
        self.MAR = self.PC
        self.MBR = self.memory[self.MAR]
        self.IR = self.MBR
        self.PC += 1
        print(f"  FETCH:  PC={self.PC-1} → MAR={self.MAR} → MBR={self.MBR:#06x} → IR={self.IR:#06x}")

    def decode_execute(self):
        opcode = (self.IR >> 8) & 0xFF
        operand = self.IR & 0xFF
        name = self.opcodes.get(opcode, '???')
        print(f"  DECODE: opcode={name}({opcode}), operand={operand}")
        if opcode == 0:  # HLT
            self.halted = True
        elif opcode == 1:  # LDA addr
            self.AC = self.memory[operand]
        elif opcode == 2:  # STA addr
            self.memory[operand] = self.AC
        elif opcode == 3:  # ADD addr
            self.AC += self.memory[operand]
        elif opcode == 4:  # SUB addr
            self.AC -= self.memory[operand]
        elif opcode == 5:  # JMP addr
            self.PC = operand
        print(f"  EXEC:   AC={self.AC}, PC={self.PC}")

    def run(self):
        cycle = 0
        while not self.halted and cycle < 100:
            print(f"\n--- Cycle {cycle} ---")
            self.fetch()
            self.decode_execute()
            cycle += 1

# Program: Load 5, Add 3, Store result, Halt
# Memory[50]=5, Memory[51]=3
comp = BasicComputer()
comp.memory[50] = 5
comp.memory[51] = 3
program = [0x0132, 0x0333, 0x0234, 0x0000]  # LDA 50, ADD 51, STA 52, HLT
comp.load_program(program)
comp.run()
print(f"\nResult at Memory[52] = {comp.memory[52]}")

README.md
# 🔄 Fetch-Decode-Execute Simulator
**Portfolio Project 3 — COA Unit 3: Basic Computer Organization**

## Overview
Simulates the fundamental instruction cycle of a von Neumann computer.
Visualizes PC, MAR, MBR, IR, and AC register changes at each step.

## Supported Instructions
| Opcode | Mnemonic | Description          |
|--------|----------|----------------------|
| 0      | HLT      | Halt execution       |
| 1      | LDA addr | Load AC from memory  |
| 2      | STA addr | Store AC to memory   |
| 3      | ADD addr | Add to AC            |
| 4      | SUB addr | Subtract from AC     |
| 5      | JMP addr | Jump to address      |

## Concepts Demonstrated
- Von Neumann architecture
- Instruction cycle (fetch → decode → execute)
- Register transfer operations
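
The demo program stores each instruction as a 16-bit word with the opcode in the high byte and the operand address in the low byte. A small helper pair makes that encoding explicit. `encode` and `decode` are hypothetical names, not part of the simulator above; the opcode numbers are copied from its opcode table.

```python
OPCODES = {'HLT': 0, 'LDA': 1, 'STA': 2, 'ADD': 3, 'SUB': 4, 'JMP': 5}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def encode(mnemonic, operand=0):
    """Pack opcode (high byte) and operand (low byte) into one 16-bit word."""
    if not 0 <= operand <= 0xFF:
        raise ValueError("operand must fit in 8 bits")
    return (OPCODES[mnemonic] << 8) | operand

def decode(word):
    """Split a 16-bit word back into (mnemonic, operand)."""
    return MNEMONICS[(word >> 8) & 0xFF], word & 0xFF

program = [encode('LDA', 50), encode('ADD', 51), encode('STA', 52), encode('HLT')]
print([f"{w:#06x}" for w in program])   # ['0x0132', '0x0333', '0x0234', '0x0000']
print(decode(0x0333))                   # ('ADD', 51)
```

Writing programs with `encode` avoids hand-converting decimal addresses such as 50 into hex bytes such as 0x32.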

🔧 Portfolio Project 4: Simple CPU Simulator (Unit 4 — Central Processing Unit)

📋 Project Description — The Star of Your Portfolio

What it does: A complete CPU simulator with 8 general-purpose registers (R0–R7), a 10-instruction ISA (LOAD, STORE, ADD, SUB, MUL, AND, OR, NOT, JMP, HLT), program memory, data memory, and step-by-step execution trace. This is the project that gets you noticed by recruiters.

What it demonstrates: Complete understanding of CPU architecture, instruction set design, register file management, and ALU operations from Unit 4.

Difficulty: Advanced | Time: 5–8 hours

Python
# cpu_simulator.py — Portfolio Project 4 (Unit 4: CPU Architecture)
# 8 registers (R0-R7), 10-instruction ISA, step-by-step execution

class CPUSimulator:
    def __init__(self):
        self.registers = [0] * 8   # R0-R7
        self.memory = [0] * 256    # 256 bytes data memory
        self.program = []            # Instruction list
        self.PC = 0
        self.halted = False
        self.cycle = 0
        self.ISA = {
            'LOAD': self._load, 'STORE': self._store,
            'ADD':  self._add,  'SUB':   self._sub,
            'MUL':  self._mul,  'AND':   self._and,
            'OR':   self._or,   'NOT':   self._not,
            'JMP':  self._jmp,  'HLT':   self._hlt,
        }

    def _load(self, rd, val, _):
        self.registers[rd] = val

    def _store(self, rs, addr, _):
        self.memory[addr] = self.registers[rs]

    def _add(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] + self.registers[rs2]

    def _sub(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] - self.registers[rs2]

    def _mul(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] * self.registers[rs2]

    def _and(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] & self.registers[rs2]

    def _or(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] | self.registers[rs2]

    def _not(self, rd, rs, _):
        self.registers[rd] = ~self.registers[rs] & 0xFF

    def _jmp(self, addr, _, __):
        self.PC = addr - 1  # -1 because PC increments after

    def _hlt(self, *args):
        self.halted = True

    def load_program(self, instructions):
        self.program = instructions

    def dump_registers(self):
        regs = ' '.join(f"R{i}={v}" for i, v in enumerate(self.registers))
        print(f"  Registers: {regs}")

    def execute(self):
        print("=== CPU Simulator — Execution Trace ===")
        while not self.halted and self.PC < len(self.program):
            instr = self.program[self.PC]
            op = instr[0]
            args = instr[1:] + [0] * (3 - len(instr[1:]))
            print(f"\n  Cycle {self.cycle} | PC={self.PC} | {op} {args}")
            self.ISA[op](*args)
            self.dump_registers()
            self.PC += 1
            self.cycle += 1
        print(f"\n=== Halted after {self.cycle} cycles ===")

# Sample 10-instruction program: Compute (5+3)*2 and store
cpu = CPUSimulator()
cpu.load_program([
    ['LOAD', 0, 5],       # R0 = 5
    ['LOAD', 1, 3],       # R1 = 3
    ['ADD',  2, 0, 1],    # R2 = R0 + R1 = 8
    ['LOAD', 3, 2],       # R3 = 2
    ['MUL',  4, 2, 3],    # R4 = R2 * R3 = 16
    ['STORE',4, 100],     # Memory[100] = R4
    ['LOAD', 5, 255],     # R5 = 255
    ['AND',  6, 4, 5],    # R6 = R4 AND R5
    ['NOT',  7, 6],       # R7 = NOT R6
    ['HLT'],               # Halt
])
cpu.execute()

README.md
# 🖥️ Simple CPU Simulator
**Portfolio Project 4 — COA Unit 4: Central Processing Unit**

## Overview
A complete CPU simulator with 8 general-purpose registers, a 10-instruction
ISA, and step-by-step execution trace. The flagship project of this portfolio.

## Instruction Set Architecture (ISA)
| Instruction | Format              | Description              |
|-------------|---------------------|--------------------------|
| LOAD        | LOAD Rd, imm        | Load immediate to Rd     |
| STORE       | STORE Rs, addr      | Store Rs to memory       |
| ADD         | ADD Rd, Rs1, Rs2    | Rd = Rs1 + Rs2           |
| SUB         | SUB Rd, Rs1, Rs2    | Rd = Rs1 - Rs2           |
| MUL         | MUL Rd, Rs1, Rs2    | Rd = Rs1 * Rs2           |
| AND         | AND Rd, Rs1, Rs2    | Rd = Rs1 AND Rs2         |
| OR          | OR  Rd, Rs1, Rs2    | Rd = Rs1 OR Rs2          |
| NOT         | NOT Rd, Rs          | Rd = NOT Rs              |
| JMP         | JMP addr            | Jump to address          |
| HLT         | HLT                 | Halt execution           |

## Concepts Demonstrated
- CPU architecture (registers, ALU, control unit)
- Instruction Set Architecture design
- Program execution and instruction cycle

This is exactly the kind of project that got Rohan hired at Tata Elxsi. Indian hardware companies like Tata Elxsi, KPIT Technologies, and Sasken actively search GitHub for students who've built CPU simulators. It proves you understand hardware at a level beyond textbook theory.
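
One natural extension is a tiny assembler, so programs can be written as text instead of Python lists. The sketch below is an assumption about how that could look: `assemble` is a hypothetical helper, and the `R<n>` register syntax and `#` comments are illustrative choices. Its output uses the same list format that `load_program` accepts.

```python
def assemble(source):
    """Turn lines like 'ADD R2, R0, R1' into ['ADD', 2, 0, 1]."""
    program = []
    for line in source.strip().splitlines():
        line = line.split('#')[0].strip()   # drop trailing comments
        if not line:
            continue
        mnemonic, _, rest = line.partition(' ')
        operands = []
        for token in rest.replace(',', ' ').split():
            # 'R3' means register 3; anything else is read as a plain integer
            if token.upper().startswith('R'):
                operands.append(int(token[1:]))
            else:
                operands.append(int(token))
        program.append([mnemonic.upper()] + operands)
    return program

source = """
LOAD R0, 5
LOAD R1, 3      # second operand
ADD  R2, R0, R1
HLT
"""
print(assemble(source))
# [['LOAD', 0, 5], ['LOAD', 1, 3], ['ADD', 2, 0, 1], ['HLT']]
```

The assembler does no validation beyond integer parsing, so a misspelled mnemonic only fails later, when the simulator looks it up in its ISA table.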

🔧 Portfolio Project 5: Priority Interrupt Handler (Unit 5 — I/O Organization)

📋 Project Description

What it does: Simulates a priority-based interrupt handling system with multiple devices, priority levels, interrupt queue, and ISR (Interrupt Service Routine) dispatch. Demonstrates daisy-chain priority and vectored interrupts.

Difficulty: Intermediate | Time: 3–4 hours

Python
# priority_interrupt_handler.py — Project 5 (Unit 5: I/O Organization)
import heapq

class Interrupt:
    def __init__(self, device, priority, arrival):
        self.device = device
        self.priority = priority  # Lower number = higher priority
        self.arrival = arrival
    def __lt__(self, other):
        return self.priority < other.priority

class InterruptController:
    def __init__(self):
        self.queue = []  # Min-heap (priority queue)
        self.current = None
        self.stack = []   # For nested interrupts
        self.log = []

    def raise_interrupt(self, device, priority, time):
        intr = Interrupt(device, priority, time)
        heapq.heappush(self.queue, intr)
        self.log.append(f"  T={time}: IRQ raised by {device} (priority {priority})")

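    def handle_interrupts(self):
        print("=== Priority Interrupt Handler ===")
        for entry in self.log:  # show the requests in arrival order first
            print(entry)
        # Every request is already queued when this runs, so the heap returns
        # them in priority order and, in this demo, the preemption branch
        # below never fires. It is kept as a hook for a time-driven version.
        while self.queue:
            intr = heapq.heappop(self.queue)
            if self.current and intr.priority < self.current.priority:
                self.stack.append(self.current)
                print(f"  Preempting {self.current.device} for {intr.device}")
            self.current = intr
            print(f"  Servicing: {intr.device} | Priority: {intr.priority} | ISR executing...")
            if self.stack:
                resumed = self.stack.pop()
                print(f"  Resuming: {resumed.device}")
                self.current = resumed
        print("  All interrupts serviced.\n")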

# Demo: Multiple devices with different priorities
ic = InterruptController()
ic.raise_interrupt("Keyboard",  3, 1)
ic.raise_interrupt("Disk",      2, 2)
ic.raise_interrupt("Timer",     0, 3)   # Highest priority
ic.raise_interrupt("Network",   1, 4)
ic.raise_interrupt("Printer",   4, 5)
ic.handle_interrupts()

README.md
# ⚡ Priority Interrupt Handler
**Portfolio Project 5 — COA Unit 5: I/O Organization**

## Overview
Simulates priority-based interrupt handling with preemption, nested
interrupts, and ISR dispatch using a min-heap priority queue.

## Concepts Demonstrated
- Priority interrupt system (hardware-level)
- Interrupt service routines (ISR)
- Preemption and nested interrupt handling
- Daisy-chain priority encoding
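
Preemption only shows up when requests arrive while another ISR is still running. A time-stepped sketch of that behaviour follows. It is a simplified model under stated assumptions: `simulate` is a hypothetical function, each ISR has a made-up length in clock ticks, and a preempted ISR goes back into the ready heap rather than onto a hardware stack.

```python
import heapq

def simulate(requests):
    """requests: list of (arrival_time, priority, device, isr_length).
    Lower priority number = more urgent. Returns the event log."""
    pending = sorted(requests)          # ordered by arrival time
    ready = []                          # min-heap of [priority, device, remaining ticks]
    running, log, t = None, [], 0
    while pending or ready or running:
        # Move every request that has arrived by time t into the ready heap
        while pending and pending[0][0] <= t:
            _, prio, dev, length = pending.pop(0)
            heapq.heappush(ready, [prio, dev, length])
        # Preempt if a more urgent request is waiting
        if running and ready and ready[0][0] < running[0]:
            log.append(f"T={t}: {ready[0][1]} preempts {running[1]}")
            heapq.heappush(ready, running)
            running = None
        if running is None and ready:
            running = heapq.heappop(ready)
            log.append(f"T={t}: start/resume {running[1]}")
        if running:
            running[2] -= 1             # run the current ISR for one tick
            if running[2] == 0:
                log.append(f"T={t + 1}: {running[1]} done")
                running = None
        t += 1
    return log

log = simulate([(0, 3, "Keyboard", 3), (1, 0, "Timer", 1), (2, 2, "Disk", 2)])
print("\n".join(log))
```

In this run the Timer request at T=1 interrupts the Keyboard ISR, the Disk is serviced next, and the Keyboard ISR resumes last and finishes at T=6.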

🔧 Portfolio Project 6: Cache Simulator (Unit 6 — Memory Organization)

📋 Project Description

What it does: Simulates three cache mapping techniques: Direct-Mapped, Fully-Associative, and Set-Associative. Processes a sequence of memory addresses and computes hit ratio, miss ratio, and replacement decisions (LRU for associative caches).

Difficulty: Advanced | Time: 4–6 hours

Python
# cache_simulator.py — Project 6 (Unit 6: Memory Organization)

class CacheSimulator:
    def __init__(self, cache_size, block_size, assoc, mapping):
        self.cache_size = cache_size
        self.block_size = block_size
        self.assoc = assoc  # 1=direct, cache_size//block_size=fully, else set
        self.mapping = mapping
        self.num_blocks = cache_size // block_size
        self.num_sets = self.num_blocks // assoc
        self.cache = [[None] * assoc for _ in range(self.num_sets)]
        self.lru = [[0] * assoc for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0
        self.time = 0

    def access(self, address):
        self.time += 1
        block_num = address // self.block_size
        set_index = block_num % self.num_sets
        tag = block_num // self.num_sets
        # Check for hit
        for i in range(self.assoc):
            if self.cache[set_index][i] == tag:
                self.hits += 1
                self.lru[set_index][i] = self.time
                return "HIT"
        # Miss — find LRU slot to replace
        self.misses += 1
        lru_idx = self.lru[set_index].index(min(self.lru[set_index]))
        self.cache[set_index][lru_idx] = tag
        self.lru[set_index][lru_idx] = self.time
        return "MISS"

    def simulate(self, addresses):
        print(f"=== Cache Simulator ({self.mapping}) ===")
        print(f"  Size: {self.cache_size}B | Block: {self.block_size}B | Sets: {self.num_sets} | Assoc: {self.assoc}-way")
        for addr in addresses:
            result = self.access(addr)
            print(f"  Addr {addr:3d} → Block {addr//self.block_size:2d} → Set {(addr//self.block_size)%self.num_sets} → {result}")
        total = self.hits + self.misses
        print(f"  Hit Ratio: {self.hits}/{total} = {self.hits/total*100:.1f}%")

# Demo: Compare all three mapping types
addrs = [0, 4, 8, 12, 16, 0, 4, 20, 0, 8, 24, 12, 0, 4]
print()
CacheSimulator(16, 4, 1, "Direct-Mapped").simulate(addrs)
print()
CacheSimulator(16, 4, 4, "Fully-Associative").simulate(addrs)
print()
CacheSimulator(16, 4, 2, "2-Way Set-Associative").simulate(addrs)

Cache simulator projects are a GATE favourite. GATE CSE asks 2–3 questions on cache mapping every year. By building this simulator, you don't just memorize formulas — you see why direct-mapped has more conflict misses and why set-associative is the sweet spot.
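
Most cache-mapping problems start by splitting an address into tag, set index, and block offset. The sketch below mirrors the arithmetic used in the simulator's `access` method. `address_fields` and `field_widths` are hypothetical helper names, the sizes are assumed to be powers of two, and the 16-bit address width in the example is an assumption.

```python
def address_fields(address, block_size, num_sets):
    """Split a byte address into (tag, set index, block offset)."""
    offset = address % block_size
    block_num = address // block_size
    return block_num // num_sets, block_num % num_sets, offset

def field_widths(address_bits, cache_size, block_size, assoc):
    """Bit widths of (tag, index, offset) for power-of-two sizes."""
    num_sets = cache_size // (block_size * assoc)
    offset_bits = (block_size - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    return address_bits - index_bits - offset_bits, index_bits, offset_bits

# 16-byte cache, 4-byte blocks, 16-bit addresses (illustrative sizes)
print(field_widths(16, 16, 4, 1))   # direct-mapped:      (12, 2, 2)
print(field_widths(16, 16, 4, 2))   # 2-way set-assoc:    (13, 1, 2)
print(field_widths(16, 16, 4, 4))   # fully associative:  (14, 0, 2)
print(address_fields(20, 4, 4))     # (1, 1, 0)
```

As associativity goes up, index bits move into the tag: a fully associative cache has no index field at all.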

🔧 Portfolio Project 7: Booth's Multiplication Calculator (Unit 7 — Computer Arithmetic)

📋 Project Description

What it does: Implements Booth's algorithm for signed binary multiplication. Shows step-by-step: accumulator (A), multiplier (Q), Q_-1 bit, and arithmetic shift right at each iteration.

Difficulty: Intermediate | Time: 3–4 hours

Python
# booths_multiplication.py — Project 7 (Unit 7: Computer Arithmetic)

def to_binary(num, bits):
    if num >= 0: return format(num, f'0{bits}b')
    return format((1 << bits) + num, f'0{bits}b')  # 2's complement

def arithmetic_shift_right(A, Q, Qn1, bits):
    """Arithmetic shift right: preserve sign of A."""
    new_Qn1 = Q & 1
    Q = (Q >> 1) | ((A & 1) << (bits - 1))
    sign = A & (1 << (bits - 1))
    A = (A >> 1) | sign
    return A, Q, new_Qn1

def booths_multiply(M, Q_val, bits=5):
    print(f"=== Booth's Multiplication: {M} × {Q_val} ===")
    mask = (1 << bits) - 1
    A = 0
    Q = Q_val & mask if Q_val >= 0 else ((1 << bits) + Q_val) & mask
    M_val = M & mask if M >= 0 else ((1 << bits) + M) & mask
    Qn1 = 0
    print(f"  Init: A={to_binary(A,bits)} Q={to_binary(Q,bits)} Q-1={Qn1}")
    for i in range(bits):
        Q0 = Q & 1
        if Q0 == 1 and Qn1 == 0:
            A = (A - M_val) & mask
            print(f"  Step {i+1}: Q0Qn1=10 → A = A - M")
        elif Q0 == 0 and Qn1 == 1:
            A = (A + M_val) & mask
            print(f"  Step {i+1}: Q0Qn1=01 → A = A + M")
        else:
            print(f"  Step {i+1}: Q0Qn1={Q0}{Qn1} → No operation")
        A, Q, Qn1 = arithmetic_shift_right(A, Q, Qn1, bits)
        print(f"          ASR: A={to_binary(A,bits)} Q={to_binary(Q,bits)} Q-1={Qn1}")
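    result_bin = to_binary(A, bits) + to_binary(Q, bits)
    product = (A << bits) | Q               # A:Q is the 2*bits-wide product
    if product >> (2 * bits - 1):           # sign bit set: decode two's complement
        product -= 1 << (2 * bits)
    print(f"  Result: {result_bin} = {product} (expected {M * Q_val})")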

booths_multiply(-3, 5)
print()
booths_multiply(7, -4)

Students often confuse Q₀ with Q_-1. Remember: Q₀ is the LSB of the multiplier Q. Q_-1 is an extra bit initialized to 0 that tracks the previous Q₀. The decision (add/subtract/nothing) depends on the pair [Q₀, Q_-1].
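
The [Q₀, Q_-1] rule can also be seen as recoding the multiplier into digits of -1, 0, and +1. The sketch below is a side illustration, not part of the calculator above, and `booth_recode` is a hypothetical name. Each digit says what to do with M at that bit position: -1 subtracts, +1 adds, 0 does nothing.

```python
def booth_recode(q, bits):
    """Return Booth digits (LSB first), each in {-1, 0, +1}, for a signed multiplier."""
    q &= (1 << bits) - 1            # two's-complement bit pattern of q
    digits, prev = [], 0            # prev plays the role of Q_-1, initially 0
    for i in range(bits):
        q0 = (q >> i) & 1
        digits.append(prev - q0)    # pair 10 gives -1, pair 01 gives +1, 00 and 11 give 0
        prev = q0
    return digits

print(booth_recode(5, 5))    # [-1, 1, -1, 1, 0]
print(booth_recode(-4, 5))   # [0, 0, -1, 0, 0]

# The weighted digits always add back up to the original multiplier
assert all(sum(d << i for i, d in enumerate(booth_recode(q, 5))) == q
           for q in range(-16, 16))
```

For the multiplier 5 the digits read subtract, add, subtract, add, nothing, which is the same sequence of steps the calculator prints for -3 × 5.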

🔧 Portfolio Project 8: Pipeline Speedup Calculator (Unit 8 — Pipelining)

📋 Project Description

What it does: Calculates pipeline speedup, throughput, and efficiency for a k-stage pipeline. Also flags read-after-write (RAW) data hazards between adjacent instructions in an instruction sequence.

Difficulty: Intermediate | Time: 3–4 hours

Python
# pipeline_calculator.py — Project 8 (Unit 8: Pipelining)

class PipelineCalculator:
    def __init__(self, stages, clock_period_ns):
        self.k = stages
        self.tp = clock_period_ns

    def speedup(self, n):
        """Calculate speedup for n instructions."""
        non_pipeline = n * self.k * self.tp
        pipeline = (self.k + n - 1) * self.tp
        s = non_pipeline / pipeline
        return s, non_pipeline, pipeline

    def throughput(self, n):
        """Instructions completed per nanosecond."""
        pipeline_time = (self.k + n - 1) * self.tp
        return n / pipeline_time

    def efficiency(self, n):
        return self.speedup(n)[0] / self.k * 100

    def detect_hazards(self, instructions):
        """Detect RAW data hazards between adjacent instructions."""
        hazards = []
        for i in range(len(instructions) - 1):
            curr_dest = instructions[i].get('dest')
            next_srcs = instructions[i+1].get('src', [])
            if curr_dest and curr_dest in next_srcs:
                hazards.append(f"  RAW Hazard: I{i+1}→I{i+2} on {curr_dest}")
        return hazards

    def report(self, n):
        s, t_np, t_p = self.speedup(n)
        print(f"=== Pipeline Performance Report ===")
        print(f"  Stages: {self.k} | Clock: {self.tp}ns | Instructions: {n}")
        print(f"  Non-pipeline time: {t_np}ns")
        print(f"  Pipeline time:     {t_p}ns")
        print(f"  Speedup:           {s:.2f}x")
        print(f"  Throughput:        {self.throughput(n):.4f} instr/ns")
        print(f"  Efficiency:        {self.efficiency(n):.1f}%")
        print(f"  Max Speedup (n→∞): {self.k:.1f}x")

# Demo
pipe = PipelineCalculator(stages=5, clock_period_ns=2)
pipe.report(100)

# Hazard Detection
instrs = [
    {'op':'ADD', 'dest':'R1', 'src':['R2','R3']},
    {'op':'SUB', 'dest':'R4', 'src':['R1','R5']},  # RAW on R1!
    {'op':'MUL', 'dest':'R6', 'src':['R4','R7']},  # RAW on R4!
]
print("\n=== Hazard Detection ===")
for h in pipe.detect_hazards(instrs):
    print(h)
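
The same dictionary format can be used to classify all three kinds of data dependence. The sketch below is an illustrative extension: `classify_hazards` and its `window` parameter are hypothetical, and the instruction list is made up to trigger each case. In a simple in-order 5-stage pipeline only RAW dependences cause stalls; WAR and WAW matter once instructions can complete out of order.

```python
def classify_hazards(instructions, window=2):
    """Report RAW, WAR and WAW dependences between instructions at most `window` apart."""
    found = []
    for i, earlier in enumerate(instructions):
        for j in range(i + 1, min(i + 1 + window, len(instructions))):
            later = instructions[j]
            if earlier['dest'] in later['src']:      # later reads what earlier writes
                found.append(('RAW', i + 1, j + 1, earlier['dest']))
            if later['dest'] in earlier['src']:      # later writes what earlier reads
                found.append(('WAR', i + 1, j + 1, later['dest']))
            if later['dest'] == earlier['dest']:     # both write the same register
                found.append(('WAW', i + 1, j + 1, later['dest']))
    return found

demo = [
    {'op': 'ADD', 'dest': 'R1', 'src': ['R2', 'R3']},
    {'op': 'SUB', 'dest': 'R2', 'src': ['R1', 'R5']},
    {'op': 'MUL', 'dest': 'R1', 'src': ['R2', 'R7']},
]
for kind, first, second, reg in classify_hazards(demo):
    print(f"  {kind}: I{first} -> I{second} on {reg}")
```

With this list the function reports RAW and WAR between I1 and I2, WAW between I1 and I3, and RAW and WAR between I2 and I3.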

📚 GATE Preparation — Top 40 COA Questions (5 per Unit)

The 40 questions are organized by unit, five per unit, and cover all 8 units of Computer Organization. Each is GATE-level with a full solution. The samples below come from Units 1 and 4; see Appendix E for the complete Q&A set.

Unit 1: Digital Logic — Sample GATE Questions

Q1: The minimum number of 2-input NAND gates required to implement the function F = AB + CD is:
(A) 3 (B) 4 (C) 5 (D) 6
Answer: (A) 3. One NAND gives (AB)', a second gives (CD)', and a third combines them: ((AB)' · (CD)')' = AB + CD by De Morgan's law. Fewer than three 2-input gates cannot take in all four inputs.
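
Gate-count answers like this one are easy to check by brute force. A minimal sketch, assuming the two-level NAND-NAND structure (the function names are illustrative), compares a three-gate network against AB + CD on all 16 input combinations:

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def f_with_three_nands(a, b, c, d):
    # Gate 1: NAND(A, B), gate 2: NAND(C, D), gate 3: NAND of their outputs
    return nand(nand(a, b), nand(c, d))

matches = all(f_with_three_nands(a, b, c, d) == ((a & b) | (c & d))
              for a, b, c, d in product([0, 1], repeat=4))
print(matches)   # True
```

The same pattern works for any sum-of-products expression: replace both the AND level and the OR level with NAND gates.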

Q2: A JK flip-flop with J=K=1 acts as:
(A) SR flip-flop (B) D flip-flop (C) T flip-flop (D) Latch
Answer: (C) T flip-flop — When J=K=1, the output toggles on every clock edge.

Unit 4: CPU — Sample GATE Questions

Q1: In a microprogrammed control unit, the control memory stores:
(A) Data values (B) Microinstructions (C) Cache tags (D) Interrupt vectors
Answer: (B) Microinstructions — Control memory holds the microprogram that generates control signals for each machine instruction.

Q2: RISC architecture favors:
(A) Complex instructions with variable length (B) Simple, fixed-length instructions with register-to-register operations (C) Memory-to-memory operations (D) Microprogrammed control
Answer: (B) — RISC uses simple fixed-length instructions, primarily register-to-register, with hardwired control for fast execution.

Full 40 questions with detailed solutions available in Appendix E.

🎯 Career Guidance — 4 Hardware Engineering Paths

Career Path 1: VLSI Design Engineer

🔬 VLSI Design — The High-Paying Hardware Path

What you do: Design integrated circuits (chips) at transistor level. You work with logic synthesis, physical design, timing analysis, and verification. You literally design the chips that go into phones, cars, and satellites.

Key Tools: Cadence Virtuoso (analog design), Synopsys Design Compiler (logic synthesis), Mentor Graphics Calibre (DRC/LVS), Xilinx Vivado (FPGA prototyping)

Salary: Fresher ₹6–12 LPA → Mid ₹15–25 LPA → Senior ₹30–80 LPA

Companies in India: Intel Bangalore, Qualcomm Hyderabad, Samsung Noida, Texas Instruments Bangalore, MediaTek Noida, Synopsys Bangalore, Cadence Noida, ARM Bangalore

Career Path 2: Embedded Systems Engineer

🤖 Embedded Systems — Where Software Meets Hardware

What you do: Program microcontrollers and microprocessors that run inside devices — washing machines, car ECUs, medical devices, drones. You write C/C++ code that runs directly on hardware with real-time constraints.

Platforms: Arduino, STM32, Raspberry Pi, ESP32, TI MSP430. RTOS: FreeRTOS, Zephyr, VxWorks.

Salary: Fresher ₹5–8 LPA → Mid ₹10–18 LPA → Senior ₹20–40 LPA

Companies: Bosch India, Continental, Tata Elxsi, KPIT Technologies, Sasken, L&T Technology Services

Career Path 3: Firmware Engineer

⚙️ Firmware — The Bridge Between Hardware and Software

What you do: Write low-level code that initializes hardware. Boot loaders, device drivers, BIOS/UEFI, peripheral controllers. Your code is the first thing that runs when a device powers on.

Skills: C, Assembly (ARM/x86), hardware debugging (JTAG, oscilloscope), Linux kernel, device tree.

Salary: Fresher ₹6–10 LPA → Mid ₹12–20 LPA → Senior ₹25–50 LPA

Companies: Intel, AMD, Qualcomm, Western Digital, Seagate, Samsung, HP, Dell

Career Path 4: FPGA Developer

🧩 FPGA — Programmable Hardware

What you do: Design digital circuits using Verilog/VHDL that get programmed onto FPGA chips. Used in telecom (5G base stations), defense (radar systems), finance (high-frequency trading), and AI accelerators.

Tools: Verilog, VHDL, Xilinx Vivado, Intel Quartus Prime, ModelSim

Salary: Fresher ₹7–12 LPA → Mid ₹15–22 LPA → Senior ₹25–60 LPA

Companies: Xilinx (AMD), Intel, Qualcomm, Analog Devices, National Instruments, ISRO

💰 Hardware vs Software Salary Comparison (Indian Market)

Role	Fresher (0–2 yrs)	Mid (3–5 yrs)	Senior (6–10 yrs)	Lead (10+ yrs)
VLSI Design Engineer	₹6–12 LPA	₹15–25 LPA	₹30–50 LPA	₹50–80 LPA
Embedded Systems Engineer	₹5–8 LPA	₹10–18 LPA	₹20–35 LPA	₹35–55 LPA
Firmware Engineer	₹6–10 LPA	₹12–20 LPA	₹25–40 LPA	₹40–65 LPA
FPGA Developer	₹7–12 LPA	₹15–22 LPA	₹25–45 LPA	₹45–60 LPA
Software Developer	₹4–8 LPA	₹10–20 LPA	₹20–40 LPA	₹40–70 LPA
Full-Stack Developer	₹4–7 LPA	₹8–18 LPA	₹18–35 LPA	₹35–55 LPA
Data Engineer	₹5–10 LPA	₹12–22 LPA	₹22–40 LPA	₹40–65 LPA
DevOps Engineer	₹5–9 LPA	₹10–20 LPA	₹20–38 LPA	₹38–55 LPA

Hardware pays more at senior levels. In the table above, the fresher ranges for hardware roles already match or exceed the software ranges, and VLSI engineers with 10+ years of experience reach ₹50–80 LPA in India, the highest range in the table. The supply of hardware talent is much lower, making it a less crowded but higher-paying career.

📝 LinkedIn Profile Template for Hardware Engineers

LinkedIn Template
HEADLINE:
"VLSI/Embedded Systems Engineer | CPU Architecture | Verilog | ARM | GATE Qualified | Open to Opportunities"

ABOUT:
Computer Organization enthusiast with hands-on experience in CPU design, cache simulation, and pipeline optimization. Built 8 portfolio projects including a complete CPU simulator with 8 registers and a 10-instruction ISA. Proficient in Python, C, Verilog, and embedded C. GATE qualified with strong fundamentals in digital logic, computer arithmetic, and memory systems.

FEATURED PROJECTS:
• CPU Simulator (Python) — 8 registers, fetch-decode-execute, deployed on GitHub
• Cache Simulator — Direct/Associative/Set-Associative with hit-ratio analysis
• Booth's Multiplication Calculator — Step-by-step signed binary multiplication
• Pipeline Speedup Calculator — Hazard detection and performance analysis

SKILLS:
Digital Logic | Computer Architecture | Verilog/VHDL | ARM Assembly
Cache Design | Pipelining | Booth's Algorithm | Python | C/C++
GATE Preparation | Cadence | Synopsys | Xilinx Vivado

🏢 Interview Prep Roadmap

Company	Rounds	Key Topics	Prep Tips
TCS (Digital)	Aptitude → Technical → HR	Digital logic, flip-flops, basic CPU, GATE questions	Focus on TCS NQT pattern; practice Mano machine questions
Infosys	Online Test → Technical → HR	Boolean algebra, memory hierarchy, basic pipelining	InfyTQ certification helps; practice coding + COA MCQs
Samsung R&D	Coding Test → 2 Technical → HR	Cache design, ARM architecture, OS+COA combined	Practice 3-hour coding tests; deep dive into cache problems
Intel India	Online → 3 Technical → Hiring Manager	VLSI, Verilog, pipeline hazards, cache coherence, ISA design	Expect whiteboard design questions; know x86 pipeline stages
Qualcomm	Online → 4 Technical → Bar Raiser	ARM architecture, cache protocols (MOESI), power optimization	Deep knowledge of ARM Cortex; practice design questions

Section D

Learn by Doing — Deploy CPU Simulator to GitHub

🟢 Tier 1 — GUIDED: Push CPU Simulator to GitHub

⏱️ 45–60 minutes | Beginner | Step-by-step instructions

Step 1: Create a GitHub Account

Go to github.com → Sign up (free) → Verify email.

Step 2: Create a New Repository

Click "+" → "New repository"
Name: coa-cpu-simulator
Description: "Complete CPU simulator with 8 registers, 10-instruction ISA — COA Portfolio"
Set to Public, check "Add README"
Click "Create repository"

Step 3: Upload Your Python Files

Click "Add file" → "Upload files"
Upload all 8 project .py files
Write commit message: "Add all 8 COA portfolio projects"
Click "Commit changes"

Step 4: Write a Professional README

Edit README.md to include: project overview, screenshots, ISA table, how to run, and concepts demonstrated. Use the README templates from each project above.

🎉 You now have a live portfolio on GitHub! Share the URL on your LinkedIn profile.

🟡 Tier 2 — SEMI-GUIDED: Add CI, Tests & Badges

⏱️ 90 minutes | Intermediate | Hints provided

Your Mission:

Write unit tests using Python's unittest module for the CPU simulator
Create a GitHub Actions workflow (.github/workflows/test.yml) to run tests automatically on every push
Add badges to your README: build status, Python version, license

Hints:

YAML
# .github/workflows/test.yml
name: Run Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      # Install pytest before calling it; pytest also collects unittest-style tests
      - run: python -m pip install pytest
      - run: python -m pytest tests/ -v

Stretch Goal: Add code coverage reporting using pytest-cov and display the coverage badge in your README. Target 80%+ coverage.
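A minimal sketch of what a tests/test_cpu.py file could look like for the first mission item. The TinyCPU class and its load/add methods are hypothetical stand-ins, not the API of the simulator you built earlier; swap in your own class and method names.

```python
import unittest


class TinyCPU:
    """Hypothetical stand-in for your simulator: 8 registers, two operations."""

    def __init__(self):
        self.regs = [0] * 8

    def load(self, rd, value):
        self.regs[rd] = value

    def add(self, rd, rs1, rs2):
        self.regs[rd] = self.regs[rs1] + self.regs[rs2]


class TestTinyCPU(unittest.TestCase):
    def setUp(self):
        # A fresh CPU for every test keeps the tests independent
        self.cpu = TinyCPU()

    def test_registers_start_at_zero(self):
        self.assertEqual(self.cpu.regs, [0] * 8)

    def test_add_writes_destination(self):
        self.cpu.load(0, 10)
        self.cpu.load(1, 3)
        self.cpu.add(2, 0, 1)
        self.assertEqual(self.cpu.regs[2], 13)
```

Run it locally with python -m unittest discover -s tests -v, or with pytest, which also picks up unittest.TestCase classes.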

🔴 Tier 3 — OPEN CHALLENGE: Interactive CPU Simulator on GitHub Pages

⏱️ 3–5 hours | Advanced | No instructions, build it yourself

The Brief:

Convert your CPU simulator into an interactive web application using HTML/CSS/JavaScript. Users should be able to:

Enter assembly instructions in a text area
Click "Execute" to run them step-by-step
See register values update in real-time
Visualize the fetch-decode-execute cycle

Deploy to GitHub Pages so anyone can try it without installing anything.

This single project can replace 100 resume bullet points. A live, interactive CPU simulator on GitHub Pages tells a recruiter: "This person doesn't just study computer architecture — they build it." It's a portfolio piece that separates you from 10,000 other applicants.

Section E

Cross-Unit Synthesis Problems

🔗 Problem 1: End-to-End CPU Design (Units 1, 3, 4, 8)

Problem: Design a 4-stage pipelined CPU that uses NAND gates for its ALU (Unit 1), implements the Mano machine instruction format (Unit 3), supports 5 instructions from Unit 4's ISA, and handles data hazards using forwarding (Unit 8).

Deliverables: Block diagram, instruction format, pipeline timing diagram showing a 5-instruction program with at least one hazard and its resolution.

Synthesis Required: You must connect gate-level design (Unit 1) with instruction format (Unit 3), CPU registers (Unit 4), and pipeline mechanics (Unit 8).

🔗 Problem 2: Memory System Design (Units 2, 6, 7)

Problem: Design a memory system for a CPU that has: a 4-way set-associative L1 cache with LRU replacement (Unit 6), uses shift registers for serial data transfer from cache to CPU (Unit 2), and performs Booth's multiplication on cache addresses for tag comparison (Unit 7).

Given: Cache size = 64KB, block size = 64B, memory address = 32 bits. Calculate tag, index, and offset bits. Show hit/miss sequence for 10 given addresses. Demonstrate Booth's multiplication for one address computation.

🔗 Problem 3: I/O-Driven Pipeline Stall Analysis (Units 5, 6, 8)

Problem: A 5-stage pipeline is executing instructions when a priority interrupt arrives at cycle 7 (Unit 5). The ISR needs data from memory, causing a cache miss (Unit 6) that takes 10 cycles. Analyze: (a) How many pipeline stages are stalled? (b) What is the effective CPI? (c) What is the total penalty in clock cycles? (d) How would a write-back cache vs write-through cache affect the penalty?

🔗 Problem 4: Complete Instruction Execution Trace (Units 1–4, 7)

Problem: Trace the execution of MUL R3, R1, R2 where R1=7 (binary: 0111) and R2=-3 (binary: 1101 in 4-bit 2's complement). Show: (a) Gate-level operations in the ALU (Unit 1), (b) Register transfers during fetch-decode-execute (Unit 3), (c) Booth's algorithm step-by-step for the multiplication (Unit 7), (d) Final result stored in R3 (Unit 4).

🔗 Problem 5: System Performance Optimization (Units 4, 5, 6, 8)

Problem: A computer system has: a 5-stage pipeline (Unit 8), 2-way set-associative L1 cache with 95% hit rate (Unit 6), vectored interrupt handling (Unit 5), and a RISC CPU (Unit 4). An interrupt occurs every 1000 instructions. Each interrupt takes 5 cycles. Cache miss penalty is 20 cycles. Calculate: (a) Effective CPI, (b) Pipeline efficiency, (c) Total execution time for 10,000 instructions at 2 GHz clock, (d) Suggest one improvement for each bottleneck.

Section F

MCQ Assessment Bank — 30 Cross-Unit GATE-Style Questions

Remember / Identify (Q1–Q5)

The output of XOR gate for inputs A=1, B=1 is:

0
1
Undefined
Depends on clock

Remember | Unit 1

✅ Answer: (A) 0 — XOR outputs 1 only when inputs differ. 1⊕1 = 0.

In Mano's basic computer, the instruction STA performs:

Load accumulator from memory
Store accumulator to memory
Skip on accumulator zero
Store and add

Remember | Unit 3

✅ Answer: (B) — STA (Store Accumulator) writes the AC value to the memory address specified in the instruction.

The number of flip-flops needed for a mod-16 counter is:

Remember | Unit 2

✅ Answer: (C) 4 — A mod-N counter needs ⌈log₂N⌉ flip-flops. log₂(16) = 4.

Which addressing mode uses the operand value directly in the instruction?

Direct addressing
Indirect addressing
Immediate addressing
Register addressing

Remember | Unit 4

✅ Answer: (C) Immediate addressing — The operand value itself is part of the instruction, e.g., LOAD R1, #5.

In Booth's algorithm, when Q₀Q₋₁ = "10", the operation performed is:

A = A + M
A = A - M
No operation
Shift left

Remember | Unit 7

✅ Answer: (B) A = A - M — When Q₀=1 and Q₋₁=0, we subtract the multiplicand from the accumulator before shifting.

Understand / Explain (Q6–Q10)

Why does a set-associative cache have lower miss rate than a direct-mapped cache of the same size?

It has more total blocks
It allows multiple blocks to map to the same set, reducing conflict misses
It uses faster SRAM technology
It has a larger block size

Understand | Unit 6

✅ Answer: (B) — Set-associative caches reduce conflict misses because each memory block can go into any slot within its set, rather than being forced into exactly one slot.

In pipelining, a data hazard occurs when:

Two instructions need the same functional unit
A branch instruction changes the program flow
An instruction depends on the result of a previous instruction still in the pipeline
The pipeline clock speed is too fast

Understand | Unit 8

✅ Answer: (C) — Data hazard (RAW) occurs when an instruction needs a value that hasn't been computed yet by a prior instruction in the pipeline.

Why is DMA preferred over programmed I/O for bulk data transfers?

DMA is cheaper to implement
DMA transfers data without CPU intervention, freeing the CPU for other tasks
DMA uses less memory bandwidth
DMA doesn't need interrupt handling

Understand | Unit 5

✅ Answer: (B) — DMA (Direct Memory Access) handles data transfer between I/O and memory independently, allowing the CPU to execute other instructions simultaneously.

Explain why RISC processors typically have more registers than CISC processors:

RISC chips are physically larger
RISC uses register-to-register operations, so more registers reduce memory accesses
CISC doesn't support registers
RISC registers are smaller in bit-width

Understand | Unit 4

✅ Answer: (B) — RISC architectures perform most operations between registers (load-store architecture), so having more registers reduces the need for slow memory accesses.

Q10

In the memory hierarchy, why is cache memory faster but smaller than main memory?

Cache uses DRAM while main memory uses SRAM
Cache is built with SRAM which is faster but more expensive per bit than DRAM
Cache is located outside the CPU
Main memory has lower latency

Understand | Unit 6

✅ Answer: (B) — Cache uses SRAM (6 transistors/bit, fast, expensive). Main memory uses DRAM (1 transistor + 1 capacitor/bit, slower, cheaper). Cost-performance tradeoff dictates cache is small but fast.

Apply (Q11–Q15)

Q11

A direct-mapped cache has 64 blocks, block size = 16 bytes, and the memory address is 16 bits. The number of tag bits is:

Apply | Unit 6

✅ Answer: (B) 6 — Offset = log₂(16) = 4 bits. Index = log₂(64) = 6 bits. Tag = 16 - 4 - 6 = 6 bits.

Q12

Using Booth's algorithm, multiply -5 × 3 (in 5-bit representation). The final result in decimal is:

Apply | Unit 7

✅ Answer: (A) -15 — Booth's algorithm correctly handles signed multiplication: -5 × 3 = -15.

Q13

A 5-stage pipeline processes 100 instructions. The speedup over non-pipelined execution is approximately:

5.0×
4.81×
3.5×
2.0×

Apply | Unit 8

✅ Answer: (B) 4.81×. Speedup = (n×k)/(k+n−1) = (100×5)/(5+100−1) = 500/104 ≈ 4.81.
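The speedup formula in this answer can be checked with a few lines of Python. This is a small sketch of the ideal case only (no stalls, equal stage delays); the function name pipeline_speedup is chosen here for illustration.

```python
def pipeline_speedup(n_instructions: int, k_stages: int) -> float:
    """Ideal speedup of a k-stage pipeline over non-pipelined execution."""
    non_pipelined_cycles = n_instructions * k_stages
    # The first instruction takes k cycles, each later one finishes 1 cycle apart
    pipelined_cycles = k_stages + n_instructions - 1
    return non_pipelined_cycles / pipelined_cycles


print(round(pipeline_speedup(100, 5), 2))  # 4.81
print(round(pipeline_speedup(5, 5), 2))    # 2.78
```

As n grows, the result approaches the stage count k, which is why option (A) 5.0× is the upper bound rather than the answer for 100 instructions.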

Q14

Simplify the Boolean expression: F = A'B + AB' + AB using a K-map.

A + B
A'B'
AB
A ⊕ B

Apply | Unit 1

✅ Answer: (A) A + B — Grouping minterms 01, 10, 11 in the K-map gives F = A + B.
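One way to double-check a K-map simplification is to compare truth tables exhaustively. The sketch below does that for this question; the helper name equivalent is an illustrative choice, not something defined earlier in the course.

```python
from itertools import product


def equivalent(f, g, n_vars: int) -> bool:
    """Return True if f and g agree on every row of the truth table."""
    return all(
        bool(f(*row)) == bool(g(*row))
        for row in product([0, 1], repeat=n_vars)
    )


original = lambda a, b: ((not a) and b) or (a and (not b)) or (a and b)
simplified = lambda a, b: a or b
xor = lambda a, b: a ^ b

print(equivalent(original, simplified, 2))  # True
print(equivalent(original, xor, 2))         # False
```

The second check shows why option (D) is wrong: XOR differs from F on the row A=1, B=1.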

Q15

In an 8-bit shift register initially loaded with 11001010, after 3 right shifts with serial input 0, the register contains:

00011001
01011001
00110010
10100000

Apply | Unit 2

✅ Answer: (A) 00011001 — Right shift 3 times with 0 input: 11001010 → 01100101 → 00110010 → 00011001.
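The three shifts in this answer can be reproduced with a short sketch that treats the register as a bit string. It assumes the serial input enters at the most significant bit, as in the worked sequence above.

```python
def shift_right(bits: str, serial_in: str = "0") -> str:
    """One right shift: serial input enters at the MSB, the LSB is dropped."""
    return serial_in + bits[:-1]


state = "11001010"
for _ in range(3):
    state = shift_right(state)
    print(state)
# prints 01100101, 00110010, 00011001
```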

Analyze (Q16–Q20)

Q16

Compare a 4-way set-associative cache with a direct-mapped cache of the same size. Which statement is TRUE?

Direct-mapped always has higher hit rate
Set-associative has higher hit rate but needs more comparison hardware
Both have identical hit rates for all access patterns
Set-associative is always slower

Analyze | Unit 6

✅ Answer: (B) — Set-associative reduces conflict misses (higher hit rate) but requires n comparators running in parallel (more hardware complexity).

Q17

In a 5-stage pipeline, instruction I3 reads R1 which was written by I1 (still in MEM stage). This is an example of:

Structural hazard
Control hazard
RAW data hazard
WAR data hazard

Analyze | Unit 8

✅ Answer: (C) RAW (Read After Write) — I3 needs to read R1 before I1 has written it back. This is the most common pipeline hazard, resolved by forwarding or stalling.

Q18

Analyze the instruction ADD R1, [R2]. Which addressing modes are used?

Register and Immediate
Register and Register Indirect
Direct and Immediate
Register and Direct

Analyze | Unit 4

✅ Answer: (B) — R1 uses Register addressing. [R2] uses Register Indirect addressing — R2 contains the memory address of the operand.

Q19

A system uses polling to check 5 I/O devices. Each poll takes 2μs. Analyze the maximum latency for the highest-priority device vs the lowest-priority device.

2μs and 10μs
2μs and 8μs
10μs and 10μs
2μs and 2μs

Analyze | Unit 5

✅ Answer: (A) — Highest priority device is checked first (2μs). Lowest priority (5th) is checked after all others: 5 × 2μs = 10μs maximum latency.

Q20

De Morgan's theorem states that (A + B)' equals:

A' + B'
A'B'
AB'
(AB)'

Analyze | Unit 1

✅ Answer: (B) A'B' — De Morgan's: complement of sum = product of complements. (A+B)' = A'·B'.

Evaluate (Q21–Q25)

Q21

For a real-time embedded system controlling a car's ABS, which interrupt scheme is most appropriate?

Software polling
Vectored priority interrupts
Daisy-chain without priority
DMA with no interrupts

Evaluate | Unit 5

✅ Answer: (B) Vectored priority interrupts — ABS needs guaranteed response time for the wheel sensors (highest priority). Vectored interrupts provide direct ISR dispatch without polling overhead.

Q22

Evaluate: For a mobile phone processor, RISC is preferred over CISC primarily because:

RISC instructions are more powerful
RISC has lower power consumption due to simpler decode logic and fixed-length instructions
RISC can run x86 software
RISC doesn't need cache

Evaluate | Unit 4

✅ Answer: (B) — ARM (RISC) dominates mobile because simpler decode hardware = fewer transistors = lower power consumption. This is why 99% of smartphones use ARM, not x86.

Q23

A cache with high associativity (e.g., 16-way) vs a cache with low associativity (2-way) of the same size. Evaluate the tradeoff:

Higher associativity always improves performance
Higher associativity reduces conflict misses but increases access time and hardware cost
Lower associativity is always better
Associativity doesn't affect performance

Evaluate | Unit 6

✅ Answer: (B) — More ways = fewer conflict misses, but each access requires more comparators running in parallel, increasing latency and silicon area. The diminishing returns typically make 4–8 way optimal.

Q24

Evaluate which pipeline hazard resolution technique has zero performance penalty:

Stalling (bubbles)
Operand forwarding (data bypassing)
Branch prediction (when correct)
Both (B) and (C)

Evaluate | Unit 8

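✅ Answer: (D). Forwarding removes the stalls for most register-to-register data hazards at zero CPI penalty (a load followed immediately by a dependent instruction still costs one stall cycle in the classic 5-stage pipeline). Correct branch predictions have zero penalty because the pipeline continues without flushing.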

Q25

Evaluate: Is Booth's algorithm more efficient than simple binary multiplication for negative numbers?

No, they perform identically
Yes, Booth's handles signed numbers without conversion to unsigned first
No, Booth's only works for positive numbers
Yes, but only for powers of 2

Evaluate | Unit 7

✅ Answer: (B) — Booth's algorithm works directly on 2's complement representation, handling both positive and negative numbers without needing separate sign handling logic.

Create (Q26–Q30)

Q26

Design a 4-bit ALU that supports ADD, SUB, AND, and OR. The minimum number of 4:1 multiplexers needed at the output stage is:

Create | Unit 1, 4

✅ Answer: (B) 4. One 4:1 MUX per output bit selects among the four operation results, so 4 bits need 4 multiplexers in total. If only 2:1 MUXes are available, each 4:1 MUX takes three of them, giving 12.

Q27

Design a cache memory with the following specs: 256KB total, 64B block size, 4-way set-associative. How many sets does this cache have?

256
512
1024
4096

Create | Unit 6

✅ Answer: (C) 1024 — Total blocks = 256KB/64B = 4096. Sets = 4096/4 = 1024.

Q28

Create a 5-instruction program using the ISA {LOAD, ADD, SUB, MUL, HLT} that computes (A+B)×(A-B) where A=10, B=3:

LOAD R0,10; LOAD R1,3; ADD R2,R0,R1; SUB R3,R0,R1; MUL R4,R2,R3
LOAD R0,10; ADD R1,R0,3; SUB R2,R0,3; MUL R3,R1,R2; HLT
MUL R0,10,3; ADD R1,R0,13; SUB R2,R0,7; HLT; HLT
LOAD R0,3; LOAD R1,10; MUL R2,R0,R1; HLT; HLT

Create | Unit 4

✅ Answer: (A) — R0=10, R1=3, R2=R0+R1=13, R3=R0-R1=7, R4=R2×R3=91. This correctly computes (10+3)×(10-3)=91.
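To confirm the trace in this answer, here is a minimal interpreter sketch for the five-instruction ISA. It is an illustration only: the run function, the semicolon-separated program format, and the restriction to register operands for ADD/SUB/MUL are assumptions made for this example, not the design of the CPU simulator project.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run(program: str, num_regs: int = 8) -> list[int]:
    """Execute a ';'-separated program and return the register contents."""
    regs = [0] * num_regs
    for line in program.split(";"):
        line = line.strip()
        if not line:
            continue
        opcode, _, operand_text = line.partition(" ")
        if opcode == "HLT":
            break
        operands = [tok.strip() for tok in operand_text.split(",")]
        rd = int(operands[0][1:])  # "R4" -> 4
        if opcode == "LOAD":
            regs[rd] = int(operands[1])  # LOAD takes an immediate value
        else:
            rs1, rs2 = (int(tok[1:]) for tok in operands[1:3])
            regs[rd] = OPS[opcode](regs[rs1], regs[rs2])
    return regs


option_a = "LOAD R0,10; LOAD R1,3; ADD R2,R0,R1; SUB R3,R0,R1; MUL R4,R2,R3"
print(run(option_a)[:5])  # [10, 3, 13, 7, 91]
```

Because this sketch accepts only register operands for arithmetic, a program written like option (B) with immediates would need an extended decoder.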

Q29

Create a priority interrupt system for 4 devices with priorities: Timer(0), Disk(1), Keyboard(2), Printer(3). If all 4 raise interrupts simultaneously, the correct servicing order is:

Printer → Keyboard → Disk → Timer
Timer → Disk → Keyboard → Printer
All serviced simultaneously
Random order

Create | Unit 5

✅ Answer: (B) — Priority 0 (Timer) is highest. Servicing order: Timer → Disk → Keyboard → Printer (lowest to highest priority number).
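The servicing order can be modelled with a priority queue. This sketch uses Python's heapq module and assumes, as in the question, that a lower priority number means higher priority.

```python
import heapq

# (priority number, device); a lower number means higher priority
pending = [(3, "Printer"), (1, "Disk"), (0, "Timer"), (2, "Keyboard")]
heapq.heapify(pending)

order = []
while pending:
    _, device = heapq.heappop(pending)
    order.append(device)

print(" → ".join(order))  # Timer → Disk → Keyboard → Printer
```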

Q30

Create a pipeline diagram for these 3 instructions in a 5-stage pipeline, showing the stall required: I1: ADD R1, R2, R3 / I2: SUB R4, R1, R5 / I3: MUL R6, R4, R7. Without forwarding, how many stall cycles are needed?

Create | Unit 8

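✅ Answer: (C) 4. I2 depends on I1 (RAW on R1): 2 stalls. I3 depends on I2 (RAW on R4): 2 stalls. Total = 4 stall cycles without forwarding, assuming the register file is written in the first half of a cycle and read in the second half, so a dependent instruction can be in ID during the producer's WB.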
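The stall count can be reproduced with a small scheduling sketch. It models an in-order 5-stage pipeline without forwarding and assumes a register written in WB can be read by ID in the same cycle; the function name count_stalls and the tuple format are illustrative choices.

```python
def count_stalls(program, id_to_wb=3):
    """Count stall cycles in an in-order 5-stage pipeline without forwarding.

    program: list of (dest, src1, src2) register names.
    Assumes a register written in WB can be read by ID in the same cycle.
    """
    ready = {}      # register -> cycle in which its producer reaches WB
    prev_id = 1     # so the first instruction decodes in cycle 2
    total_stalls = 0
    for dest, *sources in program:
        earliest_id = prev_id + 1
        needed = max((ready.get(src, 0) for src in sources), default=0)
        id_cycle = max(earliest_id, needed)
        total_stalls += id_cycle - earliest_id
        ready[dest] = id_cycle + id_to_wb  # ID -> EX -> MEM -> WB
        prev_id = id_cycle
    return total_stalls


program = [("R1", "R2", "R3"), ("R4", "R1", "R5"), ("R6", "R4", "R7")]
print(count_stalls(program))  # 4
```

If the register file cannot be written and read in the same cycle, each dependency costs one more stall, so the assumption should be stated in an exam answer.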

Section G

Short Answer Questions (8 Questions)

SA1: Explain the fetch-decode-execute cycle with a diagram. (5 marks)

Model Answer: The instruction cycle consists of three phases:

1. Fetch: PC → MAR → Memory → MBR → IR. The program counter provides the address, which goes to MAR. Memory contents at that address are fetched into MBR, then transferred to IR. PC is incremented.

2. Decode: The control unit examines the opcode portion of IR. It determines which instruction to execute and generates appropriate control signals.

3. Execute: The operation specified by the opcode is performed. This may involve ALU operations, register transfers, or memory access. For memory-reference instructions, the operand address is sent to MAR for a second memory access.

This cycle repeats until a HLT instruction is encountered.

SA2: Compare RISC and CISC architectures with 5 differences. (5 marks)

Feature	RISC	CISC
Instructions	Simple, fixed-length	Complex, variable-length
Addressing modes	Few (3–5)	Many (12–20)
Registers	Many (32–128)	Few (8–16)
Control unit	Hardwired	Microprogrammed
Execution	1 cycle per instruction (ideal)	Multiple cycles per instruction

Examples: RISC — ARM, MIPS, RISC-V. CISC — Intel x86, Motorola 68000.

SA3: What is a pipeline hazard? List all three types with examples. (5 marks)

1. Data Hazard (RAW): ADD R1,R2,R3 followed by SUB R4,R1,R5 — R1 is needed before it's written back.

2. Control Hazard: BEQ R1,R2,LABEL — The pipeline doesn't know which instruction to fetch next until the branch is resolved.

3. Structural Hazard: Two instructions need the same hardware unit (e.g., memory) in the same cycle.

Solutions: Forwarding (data), branch prediction (control), resource duplication (structural).

SA4: Explain Booth's algorithm steps for multiplying -3 × 5. (7 marks)

M = -3 = 11101 (5-bit 2's complement), Q = 5 = 00101

Init: A=00000, Q=00101, Q₋₁=0

Step 1: Q₀Q₋₁ = 10 → A = A-M = 00011, ASR → A=00001, Q=10010, Q₋₁=1

Step 2: Q₀Q₋₁ = 01 → A = A+M = 11110, ASR → A=11111, Q=01001, Q₋₁=0

Step 3: Q₀Q₋₁ = 10 → A = A-M = 00010, ASR → A=00001, Q=00100, Q₋₁=1

Step 4: Q₀Q₋₁ = 01 → A = A+M = 11110, ASR → A=11111, Q=00010, Q₋₁=0

Step 5: Q₀Q₋₁ = 00 → No op, ASR → A=11111, Q=10001, Q₋₁=0

Result: AQ = 1111110001 = -15 ✓
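The five steps above can be reproduced programmatically. The following is a sketch of Booth's algorithm on fixed-width two's complement registers; the function name booth_multiply and the trace format are choices made for this example, and it does not handle the most negative multiplicand (for example -16 in 5 bits), where A - M overflows the register.

```python
def booth_multiply(m: int, q: int, bits: int = 5):
    """Booth's algorithm on `bits`-bit two's complement operands.

    Returns (product, trace); each trace row is (step, A, Q, Q_minus_1)
    with A and Q as bit strings after the arithmetic shift right.
    """
    mask = (1 << bits) - 1
    a, q_reg, q_minus_1 = 0, q & mask, 0
    m_reg = m & mask
    trace = []
    for step in range(1, bits + 1):
        pair = (q_reg & 1, q_minus_1)
        if pair == (1, 0):
            a = (a - m_reg) & mask      # A = A - M
        elif pair == (0, 1):
            a = (a + m_reg) & mask      # A = A + M
        # Arithmetic shift right across A, Q, Q-1 (sign bit of A is kept)
        q_minus_1 = q_reg & 1
        q_reg = ((a & 1) << (bits - 1)) | (q_reg >> 1)
        a = (a >> 1) | (a & (1 << (bits - 1)))
        trace.append((step, f"{a:0{bits}b}", f"{q_reg:0{bits}b}", q_minus_1))
    product = (a << bits) | q_reg
    if product & (1 << (2 * bits - 1)):  # interpret AQ as a signed value
        product -= 1 << (2 * bits)
    return product, trace


product, trace = booth_multiply(-3, 5)
for row in trace:
    print(row)
print(product)  # -15
```

The printed rows match the A, Q, and Q₋₁ values listed after each ASR in Steps 1 to 5.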

SA5: Differentiate between direct-mapped, fully-associative, and set-associative cache. (5 marks)

Direct-Mapped: Each memory block maps to exactly one cache line. Line = Block mod NumLines. Simple but high conflict misses.

Fully-Associative: A memory block can go in any cache line. Fewest misses but needs parallel comparators for all lines — expensive.

Set-Associative: Cache divided into sets; a block maps to one set but can go in any line within that set. Set = Block mod NumSets. Best balance of performance and cost. Most CPUs use 4–8 way set-associative L1 cache.

SA6: List and explain 4 types of flip-flops. (5 marks)

SR Flip-Flop: Set (S=1,R=0) makes Q=1; Reset (S=0,R=1) makes Q=0. S=R=1 is invalid.

JK Flip-Flop: Like SR but J=K=1 toggles output. No invalid state. Most versatile.

D Flip-Flop: Q follows D on clock edge. Used for data storage and registers.

T Flip-Flop: T=1 toggles Q; T=0 holds Q. Used in counters.
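The behaviour of the four types can be summarised by their characteristic equations. Here is a small sketch that evaluates them for single-bit inputs; the next_state function and its keyword arguments are illustrative names.

```python
def next_state(kind: str, q: int, **inputs: int) -> int:
    """Next state Q(t+1) from the flip-flop characteristic equations."""
    if kind == "SR":
        s, r = inputs["s"], inputs["r"]
        if s and r:
            raise ValueError("S=R=1 is invalid for an SR flip-flop")
        return s | ((1 - r) & q)              # Q+ = S + R'Q
    if kind == "JK":
        j, k = inputs["j"], inputs["k"]
        return (j & (1 - q)) | ((1 - k) & q)  # Q+ = JQ' + K'Q
    if kind == "D":
        return inputs["d"]                    # Q+ = D
    if kind == "T":
        return inputs["t"] ^ q                # Q+ = T xor Q
    raise ValueError(f"unknown flip-flop type: {kind}")


print(next_state("JK", 1, j=1, k=1))  # 0 (toggle)
print(next_state("T", 0, t=1))        # 1 (toggle)
print(next_state("SR", 0, s=1, r=0))  # 1 (set)
print(next_state("D", 1, d=0))        # 0
```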

SA7: What is DMA? How does it differ from programmed I/O? (5 marks)

DMA (Direct Memory Access): A DMA controller handles data transfer between I/O devices and memory without CPU involvement. The CPU initiates the transfer and is interrupted when it's complete.

Programmed I/O: The CPU manually transfers each byte between I/O and memory, wasting CPU cycles.

Key differences: DMA frees the CPU (higher throughput), handles bulk transfers efficiently, but requires dedicated DMA controller hardware. Programmed I/O is simpler but ties up the CPU completely.

SA8: Explain the concept of memory-mapped I/O vs isolated I/O. (5 marks)

Memory-Mapped I/O: I/O devices share the same address space as memory. CPU uses regular LOAD/STORE instructions to access devices. Example: ARM processors use memory-mapped I/O exclusively.

Isolated I/O (Port-Mapped): I/O devices have a separate address space. CPU uses special IN/OUT instructions. Example: x86 has separate I/O ports (e.g., port 0x60 for keyboard).

Advantage of Memory-Mapped: No special instructions needed; any instruction that accesses memory can access I/O. Disadvantage: Reduces available memory address space.

Section H

Long Answer Questions (3 Questions, 15 marks each)

LA1: Design a complete 8-bit CPU architecture. Include registers, ALU, control unit, and memory interface. Draw a block diagram and explain each component. (15 marks)

Model Answer:

1. Register File (3 marks): 8 general-purpose registers (R0–R7), each 8 bits. Special registers: PC (8-bit), IR (16-bit: 4-bit opcode + 3-bit Rd + 3-bit Rs + 6-bit immediate/address), SP (8-bit stack pointer), Flags register (Zero, Carry, Sign, Overflow).

2. ALU (3 marks): 8-bit arithmetic logic unit supporting: ADD, SUB (with carry), AND, OR, XOR, NOT, SHL, SHR. Takes two 8-bit inputs (from register file), produces 8-bit result + 4 flag bits. Implemented using ripple-carry adder for arithmetic, gate arrays for logic.

3. Control Unit (3 marks): Hardwired control (RISC-style) or microprogrammed (CISC-style). Decodes IR opcode, generates control signals: RegWrite, MemRead, MemWrite, ALUOp, Branch, ALUSrc. Timing: single-cycle or multi-cycle design.

4. Memory Interface (3 marks): MAR (8-bit) holds address, MBR (8-bit) holds data. Address bus: 8-bit → 256 bytes addressable. Data bus: 8-bit. Control signals: Read/Write, Memory Enable. Supports aligned byte access.

5. Data Path (3 marks): Instruction flow: PC → Memory → IR → Control Unit → Control Signals. Data flow: Register File → ALU → Register File/Memory. Multiplexers select ALU inputs (register vs immediate) and write-back source (ALU result vs memory data).

LA2: Compare all three cache mapping techniques with numerical examples. Given: Memory = 64KB, Cache = 4KB, Block size = 64B. Calculate tag, index, offset for each technique. (15 marks)

Given: Memory = 64KB = 2¹⁶ bytes → 16-bit address. Cache = 4KB. Block size = 64B = 2⁶ bytes → Offset = 6 bits. Total cache blocks = 4KB/64B = 64. Total memory blocks = 64KB/64B = 1024.

1. Direct-Mapped (5 marks):

Cache lines = 64 → Index = log₂(64) = 6 bits. Tag = 16 - 6 - 6 = 4 bits.

Address format: [Tag: 4 bits | Index: 6 bits | Offset: 6 bits]

Example: Address 0x1A40 = 0001 101001 000000 → Tag=0001, Index=101001(=41), Offset=000000

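Address 0x1A40 lies in memory block 105 (0x1A40 ÷ 64 = 105), which maps to cache line 105 mod 64 = 41. Conflict: memory block 41 maps to the same line (41 mod 64 = 41), so the two blocks evict each other.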

2. Fully-Associative (5 marks):

No index bits. Tag = 16 - 6 = 10 bits.

Address format: [Tag: 10 bits | Offset: 6 bits]

Any memory block can go in any cache line. Needs 64 parallel comparators.

Replacement policy (LRU) needed. Best hit rate but most expensive hardware.

3. Set-Associative (5 marks):

4-way: Sets = 64/4 = 16 → Index = log₂(16) = 4 bits. Tag = 16 - 4 - 6 = 6 bits.

Address format: [Tag: 6 bits | Set Index: 4 bits | Offset: 6 bits]

Each set has 4 lines. Needs 4 parallel comparators per access.

Best balance: fewer conflict misses than direct-mapped, less hardware than fully-associative.
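The three address breakdowns above follow one pattern, which can be sketched as a single function. The name address_fields and its parameters are assumptions made for this illustration; sizes are assumed to be powers of two.

```python
from math import log2


def address_fields(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag, index, offset) bit widths for a `ways`-way cache.

    ways=1 is direct-mapped; ways=number of blocks is fully associative.
    Sizes are assumed to be powers of two.
    """
    offset = int(log2(block_bytes))
    sets = cache_bytes // block_bytes // ways
    index = int(log2(sets))
    return addr_bits - index - offset, index, offset


KB = 1024
print(address_fields(16, 4 * KB, 64, ways=1))   # (4, 6, 6)   direct-mapped
print(address_fields(16, 4 * KB, 64, ways=64))  # (10, 0, 6)  fully associative
print(address_fields(16, 4 * KB, 64, ways=4))   # (6, 4, 6)   4-way set-associative

# Decode the direct-mapped example address 0x1A40 into tag, index, offset
addr = 0x1A40
print(addr >> 12, (addr >> 6) & 0x3F, addr & 0x3F)  # 1 41 0
```

The same function can be used to check the other cache questions in this unit by changing the address width, cache size, block size, and associativity.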

LA3: Explain pipelining with all hazard types and resolution techniques. Include a 5-stage pipeline timing diagram for 5 instructions with at least one hazard. (15 marks)

5-Stage Pipeline (3 marks): IF (Instruction Fetch) → ID (Instruction Decode) → EX (Execute) → MEM (Memory Access) → WB (Write Back).

Pipeline Timing (without hazards):

        C1   C2   C3   C4   C5   C6   C7   C8   C9
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   EX   MEM  WB
I3:               IF   ID   EX   MEM  WB
I4:                    IF   ID   EX   MEM  WB
I5:                         IF   ID   EX   MEM  WB

5 instructions complete in 9 cycles (vs 25 cycles non-pipelined). Speedup ≈ 2.78×.

Data Hazards (4 marks):

RAW (Read After Write): I1 writes R1, I2 reads R1. Solution: Operand forwarding, which routes the result from the EX/MEM (or MEM/WB) pipeline register straight to the ALU input of the dependent instruction. Alternatively, insert NOPs/stalls.

WAR (Write After Read): Rare in in-order pipelines. Occurs in out-of-order execution.

WAW (Write After Write): Two instructions write to the same register. Relevant for superscalar processors.

Control Hazards (4 marks):

Branch instructions don't resolve until EX stage. Pipeline has already fetched wrong instructions.

Solutions: Branch prediction (static: always-taken/not-taken; dynamic: branch history buffer). Branch delay slot (MIPS). Early branch resolution in ID stage.

Structural Hazards (4 marks):

Two pipeline stages need the same hardware. Example: IF and MEM both need memory access.

Solution: Separate instruction memory and data memory (Harvard architecture), or use a multi-ported memory so both stages can access it in the same cycle.

Section I

Industry Spotlight — Amit Joshi, Freelancer Turned Hardware Startup Founder

🚀 Amit Joshi, 32 — Founder, NexaChip Technologies, Pune

Background: B.Tech in Electronics from a tier-2 college in Nagpur (not IIT, not NIT). Struggled to get placed on campus. Started freelancing on Upwork doing PCB design for ₹5,000/project in 2016.

The Journey:

2016: Started freelancing — PCB layouts, schematic designs for hobbyist clients. Monthly income: ₹15,000–₹20,000.

2017: Learned FPGA development (Verilog + Xilinx). Got a ₹2 lakh project from a Bangalore IoT startup to design a sensor data acquisition board.

2018: Started posting open-source embedded projects on GitHub. One FPGA-based signal processing project got 400+ stars. A German company noticed and offered a $50/hr remote contract.

2019: Annual freelance income crossed ₹25 LPA. Hired his first employee — a classmate from college.

2020: Founded NexaChip Technologies. Secured ₹50 lakh seed funding from a Pune-based angel investor. Focus: IoT sensor modules for agriculture (soil moisture, temperature, humidity).

2023: NexaChip has 12 employees, ₹1.5 crore annual revenue. Their IoT sensors are used by 200+ farmers in Maharashtra for smart irrigation.

Amit's Advice: "Don't wait for placements. Build something. Post it on GitHub. Write about it on LinkedIn. The hardware industry in India is starving for talent. If you can design a PCB or write Verilog, someone will pay you."

Detail	Info
Education	B.Tech ECE, tier-2 college Nagpur
First Earning	₹5,000/project (PCB design on Upwork)
Current Revenue	₹1.5 crore/year (NexaChip Technologies)
Team Size	12 employees
Key Skills	PCB Design, FPGA (Verilog), Embedded C, IoT, Business Development

Section J

Earn With It — Multiple Paths

💰 Your Earning Paths After This Course

Path 1: Freelance Hardware Consulting — Design PCBs, write embedded firmware, FPGA prototyping for startups. Platforms: Upwork, Freelancer, Toptal. Rate: ₹500–₹3,000/hr.

Path 2: Open-Source Contributions — Contribute to RISC-V, OpenROAD, LibreCores. Build reputation → get hired by open-source hardware companies.

Path 3: Technical Writing/Blogging — Write COA tutorials on Medium, Dev.to, or your own blog. Monetize with ads, affiliate links, or paid courses. Income: ₹5,000–₹50,000/month.

Path 4: YouTube Tutorials — Create "COA for GATE" or "CPU Design in Python" video series. Monetize through ads + sponsorships. Top Indian tech YouTubers earn ₹1–5 LPA from videos alone.

Path 5: GATE Coaching — Score well in GATE, then tutor juniors. Online tutoring for GATE COA: ₹500–₹2,000/hr on platforms like Unacademy, Chegg, or private tutoring.

Path 6: Hardware Startup — Like Amit Joshi, identify a problem (smart agriculture, home automation, wearables) and build an IoT product. India Semiconductor Mission provides grants for hardware startups.

Earning Path	Time to First ₹	Monthly Potential	Difficulty
GATE Tutoring	1–2 weeks	₹10,000–₹50,000	Easy
Technical Writing	2–4 weeks	₹5,000–₹30,000	Easy
Freelance Hardware	1–2 months	₹20,000–₹2,00,000	Medium
YouTube Tutorials	3–6 months	₹5,000–₹1,00,000	Medium
Hardware Startup	6–12 months	Variable (high ceiling)	Hard

Section K

Chapter Summary

🎯 Key Takeaways from Unit 9 — Capstone

1. Portfolio Projects (8): Truth Table Generator, Shift Register Simulator, Fetch-Decode-Execute, CPU Simulator, Interrupt Handler, Cache Simulator, Booth's Calculator, Pipeline Calculator — all deployed on GitHub.

2. GATE Preparation: 40 GATE-level questions covering all 8 units with full solutions. Focus areas: cache mapping, pipeline hazards, Booth's algorithm, Boolean algebra.

3. Career Paths: VLSI Design (₹6–80 LPA), Embedded Systems (₹5–40 LPA), Firmware (₹6–50 LPA), FPGA (₹7–60 LPA). Hardware pays more at senior levels.

4. Interview Prep: TCS, Infosys, Samsung, Intel, Qualcomm — each has unique interview patterns but all test COA fundamentals deeply.

5. Earning Paths: Freelance consulting, technical writing, YouTube, GATE coaching, hardware startups — multiple ways to monetize COA knowledge while still in college.

6. Units Synthesized: Unit 1 (Digital Logic) → Unit 2 (Sequential) → Unit 3 (Basic Computer) → Unit 4 (CPU) → Unit 5 (I/O) → Unit 6 (Memory) → Unit 7 (Arithmetic) → Unit 8 (Pipelining) → Unit 9 (Capstone).

Section L

Earning Checkpoint

Skill / Topic	Tool Used	Portfolio Piece	Earning Ready?
Truth Table Generator	Python	GitHub repo + README	✅ Yes — demonstrate in interviews
Shift Register Simulator	Python	GitHub repo + README	✅ Yes — shows sequential circuit understanding
Fetch-Decode-Execute	Python	GitHub repo + README	✅ Yes — core CPU knowledge
CPU Simulator	Python	GitHub repo + demo	✅ Yes — flagship portfolio project
Interrupt Handler	Python	GitHub repo + README	✅ Yes — I/O systems expertise
Cache Simulator	Python	GitHub repo + hit-ratio analysis	✅ Yes — GATE + interviews
Booth's Calculator	Python	GitHub repo + step trace	✅ Yes — arithmetic expertise
Pipeline Calculator	Python	GitHub repo + hazard detection	✅ Yes — performance analysis
GATE Preparation	Study + Practice	40 solved questions	✅ Yes — GATE-ready
Career Roadmap	LinkedIn + GitHub	Profile + Portfolio	✅ Yes — industry-ready

Minimum Viable Portfolio after this chapter: 8 GitHub repos with professional READMEs + a LinkedIn profile with hardware engineering headline + solved GATE questions = You are ready to apply to Tata Elxsi, Samsung R&D, Intel India, KPIT, Bosch, and 100+ other companies hiring hardware engineers in India.

— APPENDICES —

Appendix A

Digital Logic Quick Reference

Logic Gates — Truth Tables

A	B	AND	OR	NAND	NOR	XOR	XNOR
0	0	0	0	1	1	0	1
0	1	0	1	1	0	1	0
1	0	0	1	1	0	1	0
1	1	1	1	0	0	0	1

NOT Gate: NOT 0 = 1, NOT 1 = 0.
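
The gate table above can be regenerated programmatically, which is also the core of the Truth Table Generator project. The following is a minimal sketch; the names GATES and truth_table are illustrative assumptions, not part of any earlier unit's code.

```python
from itertools import product

# Two-input gates expressed with bitwise operators on 0/1 values.
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}

def truth_table():
    """Return rows of (A, B, AND, OR, NAND, NOR, XOR, XNOR)."""
    return [(a, b, *(gate(a, b) for gate in GATES.values()))
            for a, b in product((0, 1), repeat=2)]

if __name__ == "__main__":
    print("A B " + " ".join(GATES))
    for row in truth_table():
        print(" ".join(str(bit) for bit in row))
```

Each returned row lists the inputs followed by the six gate outputs, in the same column order as the table.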

Flip-Flops — Characteristic Tables

Type	Inputs → Next State Q(t+1)	Characteristic Equation	Key Property
SR	S=0,R=0 → Q(t); S=0,R=1 → 0; S=1,R=0 → 1; S=1,R=1 → Invalid	Q(t+1) = S + R'Q(t), with S·R = 0	Set-Reset; S=R=1 undefined
JK	J=0,K=0 → Q(t); J=0,K=1 → 0; J=1,K=0 → 1; J=1,K=1 → Q'(t)	Q(t+1) = JQ'(t) + K'Q(t)	Toggle when J=K=1; no invalid state
D	D=0 → 0; D=1 → 1	Q(t+1) = D	Data/Delay; output follows input
T	T=0 → Q(t); T=1 → Q'(t)	Q(t+1) = T ⊕ Q(t)	Toggles when T=1; used in counters
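
The flip-flop behaviour tabulated above can be written as next-state functions using the standard characteristic equations. This is a sketch only; the function names are hypothetical, and each call models one active clock edge.

```python
def sr_next(q, s, r):
    """SR flip-flop: Q(t+1) = S + R'Q. S=R=1 is rejected as invalid."""
    if s and r:
        raise ValueError("S=R=1 is invalid for an SR flip-flop")
    return s | ((1 - r) & q)

def jk_next(q, j, k):
    """JK flip-flop: Q(t+1) = JQ' + K'Q (toggles when J=K=1)."""
    return (j & (1 - q)) | ((1 - k) & q)

def d_next(q, d):
    """D flip-flop: the next state simply follows D."""
    return d

def t_next(q, t):
    """T flip-flop: Q(t+1) = T xor Q."""
    return t ^ q
```

Calling jk_next repeatedly with J=K=1 alternates the state between 0 and 1, which is the toggle behaviour used when building counters.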

Boolean Algebra Laws

Law	AND Form	OR Form
Identity	A·1 = A	A+0 = A
Null	A·0 = 0	A+1 = 1
Idempotent	A·A = A	A+A = A
Inverse	A·A' = 0	A+A' = 1
Commutative	A·B = B·A	A+B = B+A
Associative	(A·B)·C = A·(B·C)	(A+B)+C = A+(B+C)
Distributive	A·(B+C) = A·B+A·C	A+(B·C) = (A+B)·(A+C)
Absorption	A·(A+B) = A	A+A·B = A
De Morgan's	(A·B)' = A'+B'	(A+B)' = A'·B'
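
Any of the laws above can be checked by brute force, because a Boolean identity only has to hold for finitely many input combinations. The helper below is a small sketch; equivalent and NOT are names chosen here for illustration.

```python
from itertools import product

def equivalent(f, g, n_vars):
    """True if f and g agree on every 0/1 assignment of n_vars inputs."""
    return all(f(*bits) == g(*bits)
               for bits in product((0, 1), repeat=n_vars))

def NOT(x):
    return 1 - x

# De Morgan's laws
de_morgan_and = equivalent(lambda a, b: NOT(a & b),
                           lambda a, b: NOT(a) | NOT(b), 2)
de_morgan_or = equivalent(lambda a, b: NOT(a | b),
                          lambda a, b: NOT(a) & NOT(b), 2)
# Absorption: A·(A+B) = A
absorption = equivalent(lambda a, b: a & (a | b), lambda a, b: a, 2)
# Distributive (OR form): A+(B·C) = (A+B)·(A+C)
distributive_or = equivalent(lambda a, b, c: a | (b & c),
                             lambda a, b, c: (a | b) & (a | c), 3)
```

The same helper can be used to test a K-map simplification against its original minterm list before writing it into an answer.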

Appendix B

Mano Machine Instruction Set Reference

Instruction Format (16 bits)

[I(1 bit) | Opcode(3 bits) | Address(12 bits)]

I=0: Direct addressing. I=1: Indirect addressing.
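
Splitting a 16-bit word into the three fields shown above takes only shifts and masks. A minimal sketch, assuming the I bit is the most significant bit (bit 15) as in the format diagram; the function name decode is an assumption.

```python
def decode(word):
    """Split a 16-bit instruction word into (I, opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    i_bit = (word >> 15) & 0x1     # bit 15: addressing mode
    opcode = (word >> 12) & 0x7    # bits 14-12: operation code
    address = word & 0xFFF         # bits 11-0: address field
    return i_bit, opcode, address
```

For example, the word 0x2123 decodes to I=0, opcode 010 and address 0x123, which is a direct LDA according to the table that follows.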

Memory-Reference Instructions (Opcode 000–110)

Opcode	Mnemonic	Operation	Micro-operations
000	AND	AC ← AC ∧ M[addr]	DR←M[AR], AC←AC∧DR
001	ADD	AC ← AC + M[addr]	DR←M[AR], AC←AC+DR, E←carry
010	LDA	AC ← M[addr]	DR←M[AR], AC←DR
011	STA	M[addr] ← AC	M[AR]←AC
100	BUN	PC ← addr	PC←AR
101	BSA	M[addr]←PC, PC←addr+1	M[AR]←PC, AR←AR+1, PC←AR
110	ISZ	M[addr]++, skip if zero	DR←M[AR], DR←DR+1, M[AR]←DR, if DR=0: PC←PC+1
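
The Operation column above translates almost line for line into a simulator's execute step. The sketch below handles direct addressing only and collapses the micro-operations into their net effect; the state dictionary layout and the function name execute are assumptions made for illustration.

```python
MASK16, MASK12 = 0xFFFF, 0xFFF

def execute(opcode, addr, state, memory):
    """Apply one direct-addressed memory-reference instruction.

    state is a dict with keys "AC", "E", "PC"; memory is a list of
    16-bit words indexed by a 12-bit address.
    """
    if opcode == 0b000:      # AND
        state["AC"] &= memory[addr]
    elif opcode == 0b001:    # ADD, carry-out goes to E
        total = state["AC"] + memory[addr]
        state["AC"], state["E"] = total & MASK16, total >> 16
    elif opcode == 0b010:    # LDA
        state["AC"] = memory[addr]
    elif opcode == 0b011:    # STA
        memory[addr] = state["AC"]
    elif opcode == 0b100:    # BUN
        state["PC"] = addr
    elif opcode == 0b101:    # BSA: save return address, resume at addr + 1
        memory[addr] = state["PC"]
        state["PC"] = (addr + 1) & MASK12
    elif opcode == 0b110:    # ISZ: increment, skip next instruction if zero
        memory[addr] = (memory[addr] + 1) & MASK16
        if memory[addr] == 0:
            state["PC"] = (state["PC"] + 1) & MASK12
    else:
        raise ValueError("opcode 111 is not a memory-reference instruction")
```

Indirect addressing (I=1) would need one extra memory read to fetch the effective address before calling this function.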

Register-Reference Instructions (Opcode 111, I=0)

Bit	Mnemonic	Operation
B11	CLA	AC ← 0
B10	CLE	E ← 0
B9	CMA	AC ← AC'
B8	CME	E ← E'
B7	CIR	Circular right shift (E, AC)
B6	CIL	Circular left shift (E, AC)
B5	INC	AC ← AC + 1
B4	SPA	Skip if AC positive (AC[15]=0)
B3	SNA	Skip if AC negative (AC[15]=1)
B2	SZA	Skip if AC = 0
B1	SZE	Skip if E = 0
B0	HLT	Halt computer
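
Because each register-reference instruction is selected by a single bit in the address field, decoding reduces to a table lookup by bit position. A small sketch based on the bit assignments above; REGISTER_REF and register_ref_mnemonics are illustrative names.

```python
# Index i holds the mnemonic selected by bit Bi of the instruction word.
REGISTER_REF = ["HLT", "SZE", "SZA", "SNA", "SPA", "INC",
                "CIL", "CIR", "CME", "CMA", "CLE", "CLA"]

def register_ref_mnemonics(word):
    """Return the register-reference mnemonics selected by a 16-bit word."""
    if (word >> 12) != 0b0111:   # requires I=0 and opcode 111
        raise ValueError("not a register-reference instruction")
    return [name for bit, name in enumerate(REGISTER_REF)
            if (word >> bit) & 1]
```

With this layout, the word 0x7800 selects CLA and 0x7001 selects HLT.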

Input-Output Instructions (Opcode 111, I=1)

Bit	Mnemonic	Operation
B11	INP	AC(0-7) ← INPR, FGI ← 0
B10	OUT	OUTR ← AC(0-7), FGO ← 0
B9	SKI	Skip if FGI = 1
B8	SKO	Skip if FGO = 1
B7	ION	IEN ← 1 (enable interrupts)
B6	IOF	IEN ← 0 (disable interrupts)

Appendix C

Cache Formulae Cheat Sheet

Core Formulae

Formula	Description
`Number of blocks = Cache Size / Block Size`	Total cache blocks
`Offset bits = log₂(Block Size)`	Bits to address bytes within a block
`Index bits = log₂(Number of Sets)`	Bits to select cache set
`Tag bits = Address bits - Index bits - Offset bits`	Remaining bits for tag comparison
`Number of Sets = Number of Blocks / Associativity`	Sets in set-associative cache
`Hit Ratio = Hits / Total Accesses`	Cache performance metric
`Miss Ratio = 1 - Hit Ratio`	Cache miss frequency
`AMAT = Hit Time + (Miss Rate × Miss Penalty)`	Average Memory Access Time
`Effective CPI = Base CPI + (Memory accesses per instruction × Miss Rate × Miss Penalty)`	CPI with memory stalls
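
The last two formulae in the table are easy to evaluate in code when checking numerical answers. The sketch below uses hypothetical function names, and the sample numbers in the usage note are made up for illustration rather than taken from any real processor.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty

def effective_cpi(base_cpi, accesses_per_instr, miss_rate, miss_penalty):
    """CPI including memory stall cycles (miss_penalty in cycles)."""
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty
```

For instance, with a hit time of 1 cycle, a 5% miss rate and a 100-cycle miss penalty, amat(1, 0.05, 100) gives about 6 cycles.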

Mapping-Specific Formulae

Mapping	Placement Rule	Index Bits	Comparators
Direct-Mapped	Line = Block mod N	log₂(N)	1
Fully-Associative	Any line	0 (no index)	N (all lines)
k-way Set-Associative	Set = Block mod (N/k)	log₂(N/k)	k

Here N is the number of cache lines and Block is the main-memory block number.

Numerical Example

Given: Memory = 4GB (32-bit address), Cache = 64KB, Block = 64B, 4-way set-associative.

Blocks = 64KB/64B = 1024. Sets = 1024/4 = 256.

Offset = log₂(64) = 6 bits. Index = log₂(256) = 8 bits. Tag = 32 - 8 - 6 = 18 bits.

Address: [Tag: 18 | Index: 8 | Offset: 6] = 32 bits ✓
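
The numerical example above can be reproduced with a short script, which is also a useful starting point for the Cache Simulator project. This sketch assumes all sizes are powers of two; cache_fields and split_address are illustrative names.

```python
def cache_fields(address_bits, cache_size, block_size, associativity):
    """Return block/set counts and the tag/index/offset bit widths."""
    blocks = cache_size // block_size
    sets = blocks // associativity
    offset_bits = block_size.bit_length() - 1   # log2 for a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return {"blocks": blocks, "sets": sets, "tag": tag_bits,
            "index": index_bits, "offset": offset_bits}

def split_address(address, index_bits, offset_bits):
    """Break an address into its (tag, index, offset) fields."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset

fields = cache_fields(32, 64 * 1024, 64, 4)
```

Here fields holds 1024 blocks, 256 sets, and an 18/8/6 split of the 32-bit address, matching the worked example.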

Appendix D

Booth's & Division Algorithm Quick Reference

Booth's Algorithm — Step by Step

Example: -3 × 5 (using 5-bit representation)

M = -3 = 11101, Q = 5 = 00101, -M = 00011

Step	A	Q	Q₋₁	Action
Init	00000	00101	0	—
1	00011	00101	0	Q₀Q₋₁=10 → A=A-M=A+(-M)
ASR	00001	10010	1	Arithmetic Shift Right
2	11110	10010	1	Q₀Q₋₁=01 → A=A+M
ASR	11111	01001	0	Arithmetic Shift Right
3	00010	01001	0	Q₀Q₋₁=10 → A=A-M
ASR	00001	00100	1	Arithmetic Shift Right
4	11110	00100	1	Q₀Q₋₁=01 → A=A+M
ASR	11111	00010	0	Arithmetic Shift Right
5	11111	00010	0	Q₀Q₋₁=00 → No op
ASR	11111	10001	0	Arithmetic Shift Right

Result: AQ = 11111 10001 = -15 in 10-bit 2's complement ✓ (-3 × 5 = -15)
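
The trace above can be reproduced with a short Python sketch of Booth's algorithm. The function name booth_multiply and the shape of its return value are assumptions made for illustration; the sketch also assumes the multiplicand is not the most negative n-bit value, whose negation does not fit in n bits.

```python
def booth_multiply(multiplicand, multiplier, bits):
    """Multiply two signed integers with Booth's algorithm.

    Returns (product, trace); trace holds (A, Q, Q_minus_1) after each
    arithmetic shift right, with A and Q as bit strings.
    """
    mask = (1 << bits) - 1
    m = multiplicand & mask
    a, q, q_1 = 0, multiplier & mask, 0
    trace = []
    for _ in range(bits):
        pair = (q & 1, q_1)
        if pair == (1, 0):          # Q0 Q-1 = 10: A = A - M
            a = (a - m) & mask
        elif pair == (0, 1):        # Q0 Q-1 = 01: A = A + M
            a = (a + m) & mask
        # Arithmetic shift right of the combined A, Q, Q-1 register.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        sign = a >> (bits - 1)
        a = (a >> 1) | (sign << (bits - 1))   # sign bit of A is preserved
        trace.append((format(a, f"0{bits}b"), format(q, f"0{bits}b"), q_1))
    raw = (a << bits) | q
    if raw >> (2 * bits - 1):       # interpret AQ as 2n-bit two's complement
        raw -= 1 << (2 * bits)
    return raw, trace

product, trace = booth_multiply(-3, 5, 5)
```

After the call, product is -15 and the last trace entry is ("11111", "10001", 0), the same final AQ as in the table.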

Restoring Division Algorithm

Steps: 1) Shift left AQ. 2) A = A - M. 3) If A < 0: restore A = A + M, Q₀ = 0. If A ≥ 0: Q₀ = 1. 4) Repeat n times.
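
The four steps above map directly onto a loop. The following sketch handles unsigned operands only, and restoring_divide is a hypothetical name; Python's unbounded integers stand in for the sign bit of A.

```python
def restoring_divide(dividend, divisor, bits):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(bits):
        # 1) Shift the combined AQ register left by one bit.
        a = (a << 1) | ((q >> (bits - 1)) & 1)
        q = (q << 1) & mask
        # 2) Trial subtraction.
        a -= m
        if a < 0:
            a += m      # 3) restore A; Q0 stays 0
        else:
            q |= 1      # 3) subtraction succeeded; Q0 = 1
    return q, a
```

For example, restoring_divide(11, 3, 4) returns (3, 2), that is, quotient 3 and remainder 2.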

Non-Restoring Division Algorithm

Steps: 1) Shift left AQ. 2) If A < 0: A = A + M; otherwise: A = A - M. 3) Set Q₀ = 1 if the updated A ≥ 0, else Q₀ = 0. 4) Repeat n times. 5) If A < 0 at the end: A = A + M (final restore, so the remainder is non-negative).
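
A sketch of the non-restoring steps in the same style, for unsigned operands. The name non_restoring_divide is an assumption; A is kept as a signed Python integer so that its sign can be tested directly.

```python
def non_restoring_divide(dividend, divisor, bits):
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(bits):
        # 1) Shift AQ left: A takes the old MSB of Q.
        a = 2 * a + ((q >> (bits - 1)) & 1)
        q = (q << 1) & mask
        # 2) Add or subtract M depending on the sign of A.
        a = a + m if a < 0 else a - m
        # 3) The new quotient bit reflects the sign of the updated A.
        if a >= 0:
            q |= 1
    # 5) Final correction so the remainder is non-negative.
    if a < 0:
        a += m
    return q, a
```

Unlike the restoring version, a negative partial remainder is carried into the next iteration and only corrected once at the end.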

Common Pitfall in Booth's: Forgetting that the arithmetic shift right preserves the sign bit of A. If A is negative (MSB=1), after ASR the MSB stays 1. This is NOT a logical shift!

Appendix E

Top 40 GATE COA Q&A

Unit 1: Digital Logic (Q1–Q5)

Q1: Minimum NAND gates for F = AB + CD?
(A) 3 (B) 4 (C) 5 (D) 6
✅ (A) 3. By De Morgan's law, AB + CD = ((AB)'·(CD)')'. One NAND produces (AB)', a second produces (CD)', and a third NAND of those two outputs gives F. No extra inverters are needed because the NAND-NAND form absorbs them.

Q2: JK flip-flop with J=K=1 acts as? ✅ T flip-flop (toggles on every clock).

Q3: Number of possible minterms for a 4-variable Boolean function? ✅ 16 (2⁴ = 16).

Q4: Universal gates are? ✅ NAND and NOR — can implement any Boolean function.

Q5: Karnaugh map simplification of F(A,B,C) = Σm(0,1,2,4,5)? ✅ B' + A'C' (m0, m1, m4, m5 form the group B'; m0 and m2 pair into A'C').

Unit 2: Sequential Circuits (Q6–Q10)

Q6: Flip-flops needed for mod-12 counter? ✅ 4 (⌈log₂12⌉ = 4).

Q7: D flip-flop with D tied to Q': behaves as? ✅ T flip-flop with T=1 (toggles every clock).

Q8: Ring counter with 8 flip-flops can count up to? ✅ 8 states (one-hot encoding).

Q9: Johnson counter with 4 flip-flops counts? ✅ 8 states (2n states for n flip-flops).

Q10: A shift register can be used for? ✅ Serial-to-parallel conversion, delay, sequence generation.
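
The state counts in Q8 and Q9 can be confirmed by simulating the two counters. A minimal sketch, assuming the ring counter starts from a one-hot state and the Johnson counter starts from all zeros; count_states is an illustrative name.

```python
def count_states(n, johnson):
    """Count distinct states of an n-bit ring or Johnson counter."""
    if johnson:
        state = (0,) * n                 # Johnson: start from all zeros
    else:
        state = (1,) + (0,) * (n - 1)    # ring: start from a one-hot state
    seen = []
    while state not in seen:
        seen.append(state)
        # Ring feeds the last bit back; Johnson feeds back its complement.
        feedback = 1 - state[-1] if johnson else state[-1]
        state = (feedback,) + state[:-1]
    return len(seen)
```

An n-bit ring counter visits n states while a Johnson counter visits 2n, which is why the Johnson counter is also called a twisted ring counter.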

Unit 3: Basic Computer Organization (Q11–Q15)

Q11: In Mano's basic computer, the instruction AND has opcode? ✅ 000.

Q12: The BSA instruction is used for? ✅ Subroutine call — saves return address and jumps.

Q13: Register AR in Mano's computer holds? ✅ Memory address (12 bits, connects to address bus).

Q14: ISZ instruction does? ✅ Increments memory word, skips next instruction if result is zero.

Q15: In the interrupt cycle, IEN is set to? ✅ 0 (interrupts disabled during ISR to prevent nesting).

Unit 4: CPU (Q16–Q20)

Q16: Hardwired control is faster than microprogrammed because? ✅ Direct combinational logic generation of control signals vs. sequential memory reads.

Q17: Register indirect addressing: effective address is? ✅ Content of the register specified in the instruction.

Q18: RISC processors typically execute most instructions in? ✅ One clock cycle.

Q19: Condition code register stores? ✅ Flags: Zero, Carry, Sign, Overflow from ALU operations.

Q20: Stack-based addressing is used in? ✅ Zero-address instructions (operands implicit on stack).

Unit 5: I/O Organization (Q21–Q25)

Q21: DMA transfers data between? ✅ I/O devices and memory without CPU intervention.

Q22: In daisy-chain priority, the device closest to CPU has? ✅ Highest priority.

Q23: Cycle stealing in DMA means? ✅ DMA borrows one bus cycle from CPU when needed.

Q24: Vectored interrupt provides? ✅ Direct address of ISR, avoiding polling.

Q25: Asynchronous data transfer uses? ✅ Handshaking protocol (request-acknowledge signals).

Unit 6: Memory Organization (Q26–Q30)

Q26: Spatial locality means? ✅ If address X is accessed, addresses near X are likely to be accessed soon.

Q27: LRU replacement policy replaces? ✅ The block that hasn't been used for the longest time.

Q28: Write-back cache policy writes to main memory only when? ✅ The modified block is replaced (evicted) from cache.

Q29: Virtual memory uses? ✅ Page table to translate virtual addresses to physical addresses.

Q30: TLB (Translation Lookaside Buffer) is a cache for? ✅ Page table entries to speed up address translation.
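
The LRU policy from Q27 is short enough to simulate directly, and the same loop is the heart of a hit-ratio analysis. The sketch below models a single fully-associative set using the standard library's OrderedDict; lru_hits is a hypothetical name.

```python
from collections import OrderedDict

def lru_hits(references, capacity):
    """Count hits for a block reference sequence under LRU replacement."""
    cache = OrderedDict()       # oldest (least recently used) entry first
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[block] = True
    return hits
```

Dividing the returned hit count by len(references) gives the hit ratio defined in Appendix C.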

Unit 7: Computer Arithmetic (Q31–Q35)

Q31: In Booth's algorithm, Q₀Q₋₁ = "01" means? ✅ Add multiplicand to accumulator (A = A + M).

Q32: Overflow in 2's complement addition occurs when? ✅ Two positive numbers give negative result or two negative give positive.

Q33: IEEE 754 single precision uses how many bits for exponent? ✅ 8 bits (with bias of 127).

Q34: Carry look-ahead adder advantage over ripple carry? ✅ O(log n) delay vs O(n) delay for n-bit addition.

Q35: Non-restoring division differs from restoring division in that? ✅ It doesn't restore the partial remainder; instead adds or subtracts based on sign of A.
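
Two of the answers above, the overflow rule in Q32 and the IEEE 754 field layout in Q33, can be checked in a few lines. This is a sketch with hypothetical function names; it uses only the standard struct module to obtain the 32-bit pattern of a float.

```python
import struct

def add_overflows(x, y, bits):
    """Sign rule: overflow if both operands share a sign the sum lacks."""
    mask = (1 << bits) - 1

    def sign(v):
        return ((v & mask) >> (bits - 1)) & 1

    total = (x + y) & mask
    return sign(x) == sign(y) and sign(total) != sign(x)

def float32_fields(value):
    """Return (sign, biased exponent, fraction) of a single-precision float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF
```

For 1.0 the biased exponent comes out as 127, the bias itself, because the true exponent is 0.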

Unit 8: Pipelining (Q36–Q40)

Q36: Maximum speedup of a k-stage pipeline (n→∞) is? ✅ k (approaches number of stages).

Q37: Forwarding (bypassing) solves which hazard? ✅ RAW data hazard — by forwarding result from EX/MEM stage to dependent instruction.

Q38: Branch prediction accuracy of 90% means? ✅ 10% of branches cause pipeline flush — penalty depends on pipeline depth.

Q39: Superscalar processor can issue? ✅ Multiple instructions per clock cycle using multiple functional units.

Q40: Pipeline bubble (NOP insertion) is used when? ✅ A hazard cannot be resolved by forwarding; the pipeline must stall for one or more cycles.
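
The limit in Q36 follows from the usual textbook timing model, assumed here: a k-stage pipeline with no stalls finishes n instructions in k + n - 1 cycles, against n × k cycles without pipelining. A short sketch with an illustrative function name:

```python
def pipeline_speedup(stages, instructions):
    """Ideal speedup of a pipeline with equal stage delays and no stalls."""
    unpipelined_cycles = stages * instructions
    pipelined_cycles = stages + instructions - 1
    return unpipelined_cycles / pipelined_cycles
```

With 5 stages, 100 instructions give a speedup of about 4.81, and the value approaches 5 as the instruction count grows.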

Appendix F

CPU Simulator Portfolio Checklist

Project	Code Done	GitHub Repo	README	Screenshots	Unit Tests	Live Demo	LinkedIn Post
1. Truth Table Generator	☐	☐	☐	☐	☐	☐	☐
2. Shift Register Simulator	☐	☐	☐	☐	☐	☐	☐
3. Fetch-Decode-Execute	☐	☐	☐	☐	☐	☐	☐
4. CPU Simulator ⭐	☐	☐	☐	☐	☐	☐	☐
5. Interrupt Handler	☐	☐	☐	☐	☐	☐	☐
6. Cache Simulator	☐	☐	☐	☐	☐	☐	☐
7. Booth's Calculator	☐	☐	☐	☐	☐	☐	☐
8. Pipeline Calculator	☐	☐	☐	☐	☐	☐	☐

Pro tip: Complete the ⭐ CPU Simulator first, since it is the highest-impact project, then work outward. Each project should take 2–5 hours, so the full portfolio is roughly 16–40 hours of work: a focused weekend at the low end, two or three weekends more realistically.

Appendix G

EduArtha COA Learning Path Map

Unit	Topic	Est. Hours	Key Projects	GATE Weightage
Unit 1	Digital Logic & Boolean Algebra	8–10 hrs	Truth Table Generator	★★★★☆ (8–10 marks)
Unit 2	Combinational & Sequential Circuits	8–10 hrs	Shift Register Simulator	★★★☆☆ (5–7 marks)
Unit 3	Basic Computer Organization	10–12 hrs	Fetch-Decode-Execute Simulator	★★★★☆ (8–10 marks)
Unit 4	Central Processing Unit	10–12 hrs	CPU Simulator ⭐	★★★★★ (10–12 marks)
Unit 5	I/O Organization	6–8 hrs	Priority Interrupt Handler	★★★☆☆ (5–7 marks)
Unit 6	Memory Organization	10–12 hrs	Cache Simulator	★★★★★ (10–15 marks)
Unit 7	Computer Arithmetic	6–8 hrs	Booth's Calculator	★★★☆☆ (5–8 marks)
Unit 8	Pipelining & Advanced Topics	8–10 hrs	Pipeline Calculator	★★★★☆ (8–10 marks)
Unit 9	Capstone & Career Launchpad	10–12 hrs	Complete Portfolio + GATE Prep	Synthesizes all units

Total Estimated Time: 76–94 hours | Total GATE Weightage: ~60–80 marks (out of 100 for COA section)

Final Goal: Complete all 9 units → Build 8 portfolio projects on GitHub → Solve 40 GATE questions → Write LinkedIn profile → Apply to hardware companies. You are now GATE-Ready + Industry-Ready + Portfolio-Ready.

✅ Computer Organization: COMPLETE! You are GATE-Ready & Industry-Ready.
