Computer Organization & Architecture

Unit 9: Capstone โ€” CPU Simulator Portfolio & Career Launchpad

Synthesize all 8 units into portfolio-ready projects, master GATE preparation, and launch your hardware engineering career.

โฑ Time to Complete: 10โ€“12 hours  |  ๐Ÿ’ฐ Salary Range: โ‚น6โ€“80 LPA (hardware pays more!)  |  ๐Ÿ“ 30 MCQs (Bloom's Mapped)

๐Ÿ’ผ Jobs this unlocks: VLSI Engineer (โ‚น6โ€“25 LPA)  |  Embedded Systems (โ‚น5โ€“18 LPA)  |  FPGA Developer (โ‚น7โ€“22 LPA)  |  Firmware Engineer (โ‚น6โ€“20 LPA)

Section A

Opening Hook โ€” The BCA Student Who Built a CPU

๐Ÿ–ฅ๏ธ How Rohan Went from โ‚น0 to โ‚น8 LPA by Building a CPU Simulator on GitHub

Rohan Mehta was a 3rd-year BCA student at a tier-3 college in Indore. No IIT, no NIT, no connections in the tech industry. His COA professor gave an assignment: "Build something that demonstrates CPU concepts." Most students submitted a 2-page Word document. Rohan built a complete CPU simulator in Python.

His simulator had 8 registers, a 10-instruction ISA, a fetch-decode-execute pipeline, and cache simulation. He spent 3 weeks on it โ€” debugging late into the night, learning Git, writing proper documentation. He pushed the entire project to GitHub with a polished README, screenshots, and usage instructions.

Six months later, a recruiter at Tata Elxsi found Rohan's GitHub profile while searching for "CPU simulator Python." The recruiter was impressed โ€” not by fancy credentials, but by the fact that this student actually built something. Rohan cleared a technical interview on computer architecture, got an offer for โ‚น8 LPA as an Embedded Systems Engineer, and started working on automotive ECU software for a German car manufacturer.

What if YOU built this? This chapter gives you everything you need: 8 portfolio projects (one per unit), GATE preparation, career roadmaps, and interview prep for top hardware companies.

๐Ÿ‡ฎ๐Ÿ‡ณ Tata Elxsi๐Ÿ‡ฎ๐Ÿ‡ณ Intel India๐Ÿ‡ฎ๐Ÿ‡ณ Samsung R&D๐Ÿ‡ฎ๐Ÿ‡ณ Qualcomm Hyderabad๐Ÿ‡ฎ๐Ÿ‡ณ ISRO๐Ÿ‡ฎ๐Ÿ‡ณ Texas Instruments
India's semiconductor industry is projected to reach $63 billion by 2026 (India Semiconductor Mission, MeitY). The Indian government has approved โ‚น76,000 crore ($10B) in incentives for semiconductor manufacturing. Companies like Micron (โ‚น22,500 crore fab in Gujarat), Tata Electronics (OSAT facility in Assam), and CG Power are setting up chip facilities. This means thousands of new hardware engineering jobs that require exactly the COA knowledge you've built in Units 1โ€“8.
Section B

Learning Outcomes โ€” Bloom's Taxonomy Mapped (12 Outcomes)

Bloom's LevelLearning Outcome
๐Ÿ”ต RememberLO1: Recall all 7 basic logic gates, 4 flip-flop types (SR, JK, D, T), and their truth tables from Unit 1
๐Ÿ”ต RememberLO2: List the stages of the fetch-decode-execute cycle and name the 12 Mano machine register-reference instructions from Units 3โ€“4
๐Ÿ”ต UnderstandLO3: Explain how the fetch-decode-execute cycle works, including the role of PC, MAR, MBR, IR, and the control unit from Unit 3
๐Ÿ”ต UnderstandLO4: Describe the memory hierarchy (registers โ†’ cache โ†’ RAM โ†’ disk) and explain why each level trades speed for capacity from Unit 6
๐ŸŸข ApplyLO5: Build a working CPU simulator in Python with 8 registers, a basic ISA, and execute a 10-instruction program from Unit 4
๐ŸŸข ApplyLO6: Implement Booth's multiplication algorithm for signed binary numbers and verify results step-by-step from Unit 7
๐ŸŸข AnalyzeLO7: Compare direct-mapped, fully-associative, and set-associative cache mapping techniques using hit-ratio analysis from Unit 6
๐ŸŸข AnalyzeLO8: Analyze data, control, and structural hazards in a 5-stage pipeline and determine appropriate forwarding/stalling solutions from Unit 8
๐ŸŸ  EvaluateLO9: Evaluate ISA design choices (RISC vs CISC, fixed vs variable length) and justify which is optimal for given application scenarios from Unit 4
๐ŸŸ  EvaluateLO10: Assess interrupt handling strategies (polling, vectored, daisy-chain, priority) and recommend the best approach for real-time systems from Unit 5
๐ŸŸ  CreateLO11: Design a complete 8-bit CPU architecture with ALU, registers, control unit, and memory interface โ€” documented with block diagrams
๐ŸŸ  CreateLO12: Create a GitHub portfolio of 8 COA projects, deploy to GitHub Pages, and write LinkedIn-ready project descriptions for job applications
Section C

Concepts โ€” Portfolio Projects, GATE Prep & Career Guidance

This capstone synthesizes all 8 units of Computer Organization into portfolio-ready projects. For each project, you get: a description, complete Python code, and a README template. Build all 8 and you'll have a GitHub portfolio that gets you hired.

๐Ÿ”ง Portfolio Project 1: Truth Table Generator (Unit 1 โ€” Digital Logic)

๐Ÿ“‹ Project Description

What it does: Takes any Boolean expression (e.g., A AND B OR NOT C) and generates the complete truth table with all possible input combinations. Supports AND, OR, NOT, NAND, NOR, XOR, XNOR gates.

What it demonstrates: Understanding of Boolean algebra, logic gates, and combinational logic from Unit 1. Shows you can convert theory into working software.

Difficulty: Beginner | Time: 2โ€“3 hours

Python
# truth_table_generator.py โ€” Portfolio Project 1 (Unit 1: Digital Logic)
import itertools

def gate_and(a, b):  return a & b
def gate_or(a, b):   return a | b
def gate_not(a):     return 1 - a
def gate_nand(a, b): return 1 - (a & b)
def gate_nor(a, b):  return 1 - (a | b)
def gate_xor(a, b):  return a ^ b
def gate_xnor(a, b): return 1 - (a ^ b)

GATES = {'AND': gate_and, 'OR': gate_or, 'NOT': gate_not,
         'NAND': gate_nand, 'NOR': gate_nor,
         'XOR': gate_xor, 'XNOR': gate_xnor}

def evaluate_expression(expr, variables, values):
    """Evaluate a Boolean expression with given variable assignments."""
    env = dict(zip(variables, values))
    tokens = expr.upper().split()
    # Simple recursive-descent parser for: NOT A, A AND B, A OR B, etc.
    result = env.get(tokens[0], 0)
    if tokens[0] == 'NOT':
        result = gate_not(env.get(tokens[1], 0))
        tokens = tokens[2:]
    else:
        tokens = tokens[1:]
    i = 0
    while i < len(tokens) - 1:
        op, operand = tokens[i], tokens[i+1]
        val = env.get(operand, 0)
        if op in GATES:
            result = GATES[op](result, val)
        i += 2
    return result

def generate_truth_table(expression, variables):
    """Generate and print full truth table."""
    header = ' | '.join(variables) + ' | OUTPUT'
    print(header)
    print('-' * len(header))
    for combo in itertools.product([0, 1], repeat=len(variables)):
        result = evaluate_expression(expression, variables, combo)
        row = ' | '.join(str(v) for v in combo)
        print(f"{row} |   {result}")

# Example usage
print("=== Truth Table: A AND B OR NOT C ===")
generate_truth_table("A AND B", ['A', 'B'])
README.md
# ๐Ÿ”ข Truth Table Generator
**Portfolio Project 1 โ€” COA Unit 1: Digital Logic**

## Overview
A Python tool that generates complete truth tables for any Boolean expression
using standard logic gates (AND, OR, NOT, NAND, NOR, XOR, XNOR).

## Features
- Supports all 7 basic logic gates
- Generates truth tables for any number of variables
- Clean formatted output suitable for documentation

## Usage
```bash
python truth_table_generator.py
```

## Concepts Demonstrated
- Boolean Algebra fundamentals
- Combinational logic gate operations
- Truth table construction

## Author
[Your Name] | BCA Student | COA Portfolio Project
Make it interactive! Add a command-line interface using Python's argparse module so users can type expressions directly. This small touch makes your GitHub project look professional and usable โ€” exactly what recruiters want to see.

๐Ÿ”ง Portfolio Project 2: 8-bit Shift Register Simulator (Unit 2 โ€” Sequential Circuits)

๐Ÿ“‹ Project Description

What it does: Simulates an 8-bit shift register supporting SISO (Serial-In Serial-Out), SIPO (Serial-In Parallel-Out), PISO (Parallel-In Serial-Out), and PIPO (Parallel-In Parallel-Out) operations. Shows bit movement clock-by-clock.

What it demonstrates: Understanding of sequential circuits, flip-flops, clock-driven operations, and register behavior from Unit 2.

Difficulty: Beginner | Time: 2โ€“3 hours

Python
# shift_register_simulator.py โ€” Portfolio Project 2 (Unit 2: Sequential Circuits)

class ShiftRegister:
    def __init__(self, size=8):
        self.size = size
        self.register = [0] * size
        self.clock = 0

    def display(self):
        bits = ''.join(str(b) for b in self.register)
        print(f"  Clock {self.clock:2d}: [{bits}]  (Decimal: {int(bits, 2)})")

    def shift_left(self, input_bit=0):
        """Shift all bits left; new bit enters from right."""
        self.clock += 1
        output = self.register[0]
        self.register = self.register[1:] + [input_bit]
        return output

    def shift_right(self, input_bit=0):
        """Shift all bits right; new bit enters from left."""
        self.clock += 1
        output = self.register[-1]
        self.register = [input_bit] + self.register[:-1]
        return output

    def parallel_load(self, data):
        """Load 8 bits simultaneously (PIPO)."""
        self.clock += 1
        self.register = [int(b) for b in f"{data:08b}"]

    def serial_in(self, bit_stream):
        """SISO: Feed bits one at a time, shift right."""
        print("  SISO โ€” Serial In, Serial Out:")
        self.display()
        for bit in bit_stream:
            self.shift_right(bit)
            self.display()

# Demo
sr = ShiftRegister(8)
print("=== 8-Bit Shift Register Simulator ===")
print("\n--- Parallel Load: 10110010 ---")
sr.parallel_load(0b10110010)
sr.display()
print("\n--- Shift Left x3 ---")
for _ in range(3):
    sr.shift_left(1)
    sr.display()
print("\n--- Serial In: [1,0,1,1] ---")
sr2 = ShiftRegister(8)
sr2.serial_in([1, 0, 1, 1])
README.md
# ๐Ÿ“Ÿ 8-Bit Shift Register Simulator
**Portfolio Project 2 โ€” COA Unit 2: Sequential Circuits**

## Overview
Simulates all 4 types of shift registers: SISO, SIPO, PISO, PIPO.
Visualizes bit movement clock-by-clock.

## Features
- Left shift, Right shift operations
- Parallel load (PIPO)
- Serial input with visual output
- Clock cycle tracking

## Concepts Demonstrated
- Sequential circuit behavior
- Flip-flop based storage
- Clock-driven data movement
Shift registers are everywhere! Your keyboard uses a shift register (74HC165) to read key presses. UART serial communication uses shift registers to convert parallel data to serial. Even the SPI protocol that connects sensors to Arduino is a shift register protocol.

๐Ÿ”ง Portfolio Project 3: Fetch-Decode-Execute Simulator (Unit 3 โ€” Basic Computer Organization)

๐Ÿ“‹ Project Description

What it does: Simulates the complete instruction cycle of a basic computer. Shows how the Program Counter (PC), Memory Address Register (MAR), Memory Buffer Register (MBR), and Instruction Register (IR) work together to fetch, decode, and execute instructions.

What it demonstrates: The fundamental operation of any computer โ€” the fetch-decode-execute cycle โ€” from Unit 3.

Difficulty: Intermediate | Time: 3โ€“4 hours

Python
# fetch_decode_execute.py โ€” Portfolio Project 3 (Unit 3: Basic Computer Org)

class BasicComputer:
    def __init__(self, memory_size=256):
        self.memory = [0] * memory_size
        self.PC = 0      # Program Counter
        self.MAR = 0     # Memory Address Register
        self.MBR = 0     # Memory Buffer Register
        self.IR = 0      # Instruction Register
        self.AC = 0      # Accumulator
        self.halted = False
        # Opcodes: 0=HLT, 1=LDA, 2=STA, 3=ADD, 4=SUB, 5=JMP
        self.opcodes = {0:'HLT', 1:'LDA', 2:'STA', 3:'ADD', 4:'SUB', 5:'JMP'}

    def load_program(self, program, start=0):
        for i, instr in enumerate(program):
            self.memory[start + i] = instr

    def fetch(self):
        self.MAR = self.PC
        self.MBR = self.memory[self.MAR]
        self.IR = self.MBR
        self.PC += 1
        print(f"  FETCH:  PC={self.PC-1} โ†’ MAR={self.MAR} โ†’ MBR={self.MBR:#06x} โ†’ IR={self.IR:#06x}")

    def decode_execute(self):
        opcode = (self.IR >> 8) & 0xFF
        operand = self.IR & 0xFF
        name = self.opcodes.get(opcode, '???')
        print(f"  DECODE: opcode={name}({opcode}), operand={operand}")
        if opcode == 0:  # HLT
            self.halted = True
        elif opcode == 1:  # LDA addr
            self.AC = self.memory[operand]
        elif opcode == 2:  # STA addr
            self.memory[operand] = self.AC
        elif opcode == 3:  # ADD addr
            self.AC += self.memory[operand]
        elif opcode == 4:  # SUB addr
            self.AC -= self.memory[operand]
        elif opcode == 5:  # JMP addr
            self.PC = operand
        print(f"  EXEC:   AC={self.AC}, PC={self.PC}")

    def run(self):
        cycle = 0
        while not self.halted and cycle < 100:
            print(f"\n--- Cycle {cycle} ---")
            self.fetch()
            self.decode_execute()
            cycle += 1

# Program: Load 5, Add 3, Store result, Halt
# Memory[50]=5, Memory[51]=3
comp = BasicComputer()
comp.memory[50] = 5
comp.memory[51] = 3
program = [0x0132, 0x0333, 0x0234, 0x0000]  # LDA 50, ADD 51, STA 52, HLT
comp.load_program(program)
comp.run()
print(f"\nResult at Memory[52] = {comp.memory[52]}")
README.md
# ๐Ÿ”„ Fetch-Decode-Execute Simulator
**Portfolio Project 3 โ€” COA Unit 3: Basic Computer Organization**

## Overview
Simulates the fundamental instruction cycle of a von Neumann computer.
Visualizes PC, MAR, MBR, IR, and AC register changes at each step.

## Supported Instructions
| Opcode | Mnemonic | Description          |
|--------|----------|----------------------|
| 0      | HLT      | Halt execution       |
| 1      | LDA addr | Load AC from memory  |
| 2      | STA addr | Store AC to memory   |
| 3      | ADD addr | Add to AC            |
| 4      | SUB addr | Subtract from AC     |
| 5      | JMP addr | Jump to address      |

## Concepts Demonstrated
- Von Neumann architecture
- Instruction cycle (fetch โ†’ decode โ†’ execute)
- Register transfer operations

๐Ÿ”ง Portfolio Project 4: Simple CPU Simulator (Unit 4 โ€” Central Processing Unit)

๐Ÿ“‹ Project Description โ€” The Star of Your Portfolio

What it does: A complete CPU simulator with 8 general-purpose registers (R0โ€“R7), a 10-instruction ISA (LOAD, STORE, ADD, SUB, MUL, AND, OR, NOT, JMP, HLT), program memory, data memory, and step-by-step execution trace. This is the project that gets you noticed by recruiters.

What it demonstrates: Complete understanding of CPU architecture, instruction set design, register file management, and ALU operations from Unit 4.

Difficulty: Advanced | Time: 5โ€“8 hours

Python
# cpu_simulator.py โ€” Portfolio Project 4 (Unit 4: CPU Architecture)
# 8 registers (R0-R7), 10-instruction ISA, step-by-step execution

class CPUSimulator:
    def __init__(self):
        self.registers = [0] * 8   # R0-R7
        self.memory = [0] * 256    # 256 bytes data memory
        self.program = []            # Instruction list
        self.PC = 0
        self.halted = False
        self.cycle = 0
        self.ISA = {
            'LOAD': self._load, 'STORE': self._store,
            'ADD':  self._add,  'SUB':   self._sub,
            'MUL':  self._mul,  'AND':   self._and,
            'OR':   self._or,   'NOT':   self._not,
            'JMP':  self._jmp,  'HLT':   self._hlt,
        }

    def _load(self, rd, val, _):
        self.registers[rd] = val

    def _store(self, rs, addr, _):
        self.memory[addr] = self.registers[rs]

    def _add(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] + self.registers[rs2]

    def _sub(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] - self.registers[rs2]

    def _mul(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] * self.registers[rs2]

    def _and(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] & self.registers[rs2]

    def _or(self, rd, rs1, rs2):
        self.registers[rd] = self.registers[rs1] | self.registers[rs2]

    def _not(self, rd, rs, _):
        self.registers[rd] = ~self.registers[rs] & 0xFF

    def _jmp(self, addr, _, __):
        self.PC = addr - 1  # -1 because PC increments after

    def _hlt(self, *args):
        self.halted = True

    def load_program(self, instructions):
        self.program = instructions

    def dump_registers(self):
        regs = ' '.join(f"R{i}={v}" for i, v in enumerate(self.registers))
        print(f"  Registers: {regs}")

    def execute(self):
        print("=== CPU Simulator โ€” Execution Trace ===")
        while not self.halted and self.PC < len(self.program):
            instr = self.program[self.PC]
            op = instr[0]
            args = instr[1:] + [0] * (3 - len(instr[1:]))
            print(f"\n  Cycle {self.cycle} | PC={self.PC} | {op} {args}")
            self.ISA[op](*args)
            self.dump_registers()
            self.PC += 1
            self.cycle += 1
        print(f"\n=== Halted after {self.cycle} cycles ===")

# Sample 10-instruction program: Compute (5+3)*2 and store
cpu = CPUSimulator()
cpu.load_program([
    ['LOAD', 0, 5],       # R0 = 5
    ['LOAD', 1, 3],       # R1 = 3
    ['ADD',  2, 0, 1],    # R2 = R0 + R1 = 8
    ['LOAD', 3, 2],       # R3 = 2
    ['MUL',  4, 2, 3],    # R4 = R2 * R3 = 16
    ['STORE',4, 100],     # Memory[100] = R4
    ['LOAD', 5, 255],     # R5 = 255
    ['AND',  6, 4, 5],    # R6 = R4 AND R5
    ['NOT',  7, 6],       # R7 = NOT R6
    ['HLT'],               # Halt
])
cpu.execute()
README.md
# ๐Ÿ–ฅ๏ธ Simple CPU Simulator
**Portfolio Project 4 โ€” COA Unit 4: Central Processing Unit**

## Overview
A complete CPU simulator with 8 general-purpose registers, a 10-instruction
ISA, and step-by-step execution trace. The flagship project of this portfolio.

## Instruction Set Architecture (ISA)
| Instruction | Format              | Description              |
|-------------|---------------------|--------------------------|
| LOAD        | LOAD Rd, imm        | Load immediate to Rd     |
| STORE       | STORE Rs, addr      | Store Rs to memory       |
| ADD         | ADD Rd, Rs1, Rs2    | Rd = Rs1 + Rs2           |
| SUB         | SUB Rd, Rs1, Rs2    | Rd = Rs1 - Rs2           |
| MUL         | MUL Rd, Rs1, Rs2    | Rd = Rs1 * Rs2           |
| AND         | AND Rd, Rs1, Rs2    | Rd = Rs1 AND Rs2         |
| OR          | OR  Rd, Rs1, Rs2    | Rd = Rs1 OR Rs2          |
| NOT         | NOT Rd, Rs          | Rd = NOT Rs              |
| JMP         | JMP addr            | Jump to address          |
| HLT         | HLT                 | Halt execution           |

## Concepts Demonstrated
- CPU architecture (registers, ALU, control unit)
- Instruction Set Architecture design
- Program execution and instruction cycle
This is exactly the kind of project that got Rohan hired at Tata Elxsi. Indian hardware companies like Tata Elxsi, KPIT Technologies, and Sasken actively search GitHub for students who've built CPU simulators. It proves you understand hardware at a level beyond textbook theory.

๐Ÿ”ง Portfolio Project 5: Priority Interrupt Handler (Unit 5 โ€” I/O Organization)

๐Ÿ“‹ Project Description

What it does: Simulates a priority-based interrupt handling system with multiple devices, priority levels, interrupt queue, and ISR (Interrupt Service Routine) dispatch. Demonstrates daisy-chain priority and vectored interrupts.

Difficulty: Intermediate | Time: 3โ€“4 hours

Python
# priority_interrupt_handler.py โ€” Project 5 (Unit 5: I/O Organization)
import heapq

class Interrupt:
    def __init__(self, device, priority, arrival):
        self.device = device
        self.priority = priority  # Lower number = higher priority
        self.arrival = arrival
    def __lt__(self, other):
        return self.priority < other.priority

class InterruptController:
    def __init__(self):
        self.queue = []  # Min-heap (priority queue)
        self.current = None
        self.stack = []   # For nested interrupts
        self.log = []

    def raise_interrupt(self, device, priority, time):
        intr = Interrupt(device, priority, time)
        heapq.heappush(self.queue, intr)
        self.log.append(f"  T={time}: IRQ raised by {device} (priority {priority})")

    def handle_interrupts(self):
        print("=== Priority Interrupt Handler ===")
        while self.queue:
            intr = heapq.heappop(self.queue)
            if self.current and intr.priority < self.current.priority:
                self.stack.append(self.current)
                print(f"  Preempting {self.current.device} for {intr.device}")
            self.current = intr
            print(f"  Servicing: {intr.device} | Priority: {intr.priority} | ISR executing...")
            if self.stack:
                resumed = self.stack.pop()
                print(f"  Resuming: {resumed.device}")
                self.current = resumed
        print("  All interrupts serviced.\n")

# Demo: Multiple devices with different priorities
ic = InterruptController()
ic.raise_interrupt("Keyboard",  3, 1)
ic.raise_interrupt("Disk",      2, 2)
ic.raise_interrupt("Timer",     0, 3)   # Highest priority
ic.raise_interrupt("Network",   1, 4)
ic.raise_interrupt("Printer",   4, 5)
ic.handle_interrupts()
README.md
# โšก Priority Interrupt Handler
**Portfolio Project 5 โ€” COA Unit 5: I/O Organization**

## Overview
Simulates priority-based interrupt handling with preemption, nested
interrupts, and ISR dispatch using a min-heap priority queue.

## Concepts Demonstrated
- Priority interrupt system (hardware-level)
- Interrupt service routines (ISR)
- Preemption and nested interrupt handling
- Daisy-chain priority encoding

๐Ÿ”ง Portfolio Project 6: Cache Simulator (Unit 6 โ€” Memory Organization)

๐Ÿ“‹ Project Description

What it does: Simulates three cache mapping techniques: Direct-Mapped, Fully-Associative, and Set-Associative. Processes a sequence of memory addresses and computes hit ratio, miss ratio, and replacement decisions (LRU for associative caches).

Difficulty: Advanced | Time: 4โ€“6 hours

Python
# cache_simulator.py โ€” Project 6 (Unit 6: Memory Organization)

class CacheSimulator:
    def __init__(self, cache_size, block_size, assoc, mapping):
        self.cache_size = cache_size
        self.block_size = block_size
        self.assoc = assoc  # 1=direct, cache_size//block_size=fully, else set
        self.mapping = mapping
        self.num_blocks = cache_size // block_size
        self.num_sets = self.num_blocks // assoc
        self.cache = [[None] * assoc for _ in range(self.num_sets)]
        self.lru = [[0] * assoc for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0
        self.time = 0

    def access(self, address):
        self.time += 1
        block_num = address // self.block_size
        set_index = block_num % self.num_sets
        tag = block_num // self.num_sets
        # Check for hit
        for i in range(self.assoc):
            if self.cache[set_index][i] == tag:
                self.hits += 1
                self.lru[set_index][i] = self.time
                return "HIT"
        # Miss โ€” find LRU slot to replace
        self.misses += 1
        lru_idx = self.lru[set_index].index(min(self.lru[set_index]))
        self.cache[set_index][lru_idx] = tag
        self.lru[set_index][lru_idx] = self.time
        return "MISS"

    def simulate(self, addresses):
        print(f"=== Cache Simulator ({self.mapping}) ===")
        print(f"  Size: {self.cache_size}B | Block: {self.block_size}B | Sets: {self.num_sets} | Assoc: {self.assoc}-way")
        for addr in addresses:
            result = self.access(addr)
            print(f"  Addr {addr:3d} โ†’ Block {addr//self.block_size:2d} โ†’ Set {(addr//self.block_size)%self.num_sets} โ†’ {result}")
        total = self.hits + self.misses
        print(f"  Hit Ratio: {self.hits}/{total} = {self.hits/total*100:.1f}%")

# Demo: Compare all three mapping types
addrs = [0, 4, 8, 12, 16, 0, 4, 20, 0, 8, 24, 12, 0, 4]
print()
CacheSimulator(16, 4, 1, "Direct-Mapped").simulate(addrs)
print()
CacheSimulator(16, 4, 4, "Fully-Associative").simulate(addrs)
print()
CacheSimulator(16, 4, 2, "2-Way Set-Associative").simulate(addrs)
Cache simulator projects are a GATE favourite. GATE CSE asks 2โ€“3 questions on cache mapping every year. By building this simulator, you don't just memorize formulas โ€” you see why direct-mapped has more conflict misses and why set-associative is the sweet spot.

๐Ÿ”ง Portfolio Project 7: Booth's Multiplication Calculator (Unit 7 โ€” Computer Arithmetic)

๐Ÿ“‹ Project Description

What it does: Implements Booth's algorithm for signed binary multiplication. Shows step-by-step: accumulator (A), multiplier (Q), Q-1 bit, and arithmetic shift right at each iteration.

Difficulty: Intermediate | Time: 3โ€“4 hours

Python
# booths_multiplication.py โ€” Project 7 (Unit 7: Computer Arithmetic)

def to_binary(num, bits):
    if num >= 0: return format(num, f'0{bits}b')
    return format((1 << bits) + num, f'0{bits}b')  # 2's complement

def arithmetic_shift_right(A, Q, Qn1, bits):
    """Arithmetic shift right: preserve sign of A."""
    new_Qn1 = Q & 1
    Q = (Q >> 1) | ((A & 1) << (bits - 1))
    sign = A & (1 << (bits - 1))
    A = (A >> 1) | sign
    return A, Q, new_Qn1

def booths_multiply(M, Q_val, bits=5):
    print(f"=== Booth's Multiplication: {M} ร— {Q_val} ===")
    mask = (1 << bits) - 1
    A = 0
    Q = Q_val & mask if Q_val >= 0 else ((1 << bits) + Q_val) & mask
    M_val = M & mask if M >= 0 else ((1 << bits) + M) & mask
    Qn1 = 0
    print(f"  Init: A={to_binary(A,bits)} Q={to_binary(Q,bits)} Q-1={Qn1}")
    for i in range(bits):
        Q0 = Q & 1
        if Q0 == 1 and Qn1 == 0:
            A = (A - M_val) & mask
            print(f"  Step {i+1}: Q0Qn1=10 โ†’ A = A - M")
        elif Q0 == 0 and Qn1 == 1:
            A = (A + M_val) & mask
            print(f"  Step {i+1}: Q0Qn1=01 โ†’ A = A + M")
        else:
            print(f"  Step {i+1}: Q0Qn1={Q0}{Qn1} โ†’ No operation")
        A, Q, Qn1 = arithmetic_shift_right(A, Q, Qn1, bits)
        print(f"          ASR: A={to_binary(A,bits)} Q={to_binary(Q,bits)} Q-1={Qn1}")
    result_bin = to_binary(A, bits) + to_binary(Q, bits)
    print(f"  Result: {result_bin} = {M * Q_val}")

booths_multiply(-3, 5)
print()
booths_multiply(7, -4)
Students often confuse Q0 with Q-1. Remember: Q0 is the LSB of the multiplier Q. Q-1 is an extra bit initialized to 0 that tracks the previous Q0. The decision (add/subtract/nothing) depends on the pair [Q0, Q-1].

๐Ÿ”ง Portfolio Project 8: Pipeline Speedup Calculator (Unit 8 โ€” Pipelining)

๐Ÿ“‹ Project Description

What it does: Calculates pipeline speedup, throughput, and efficiency for a k-stage pipeline. Also detects and reports data hazards, control hazards, and structural hazards in instruction sequences.

Difficulty: Intermediate | Time: 3โ€“4 hours

Python
# pipeline_calculator.py โ€” Project 8 (Unit 8: Pipelining)

class PipelineCalculator:
    def __init__(self, stages, clock_period_ns):
        self.k = stages
        self.tp = clock_period_ns

    def speedup(self, n):
        """Calculate speedup for n instructions."""
        non_pipeline = n * self.k * self.tp
        pipeline = (self.k + n - 1) * self.tp
        s = non_pipeline / pipeline
        return s, non_pipeline, pipeline

    def throughput(self, n):
        """Instructions completed per nanosecond."""
        pipeline_time = (self.k + n - 1) * self.tp
        return n / pipeline_time

    def efficiency(self, n):
        return self.speedup(n)[0] / self.k * 100

    def detect_hazards(self, instructions):
        """Detect data hazards (RAW, WAR, WAW) in instruction list."""
        hazards = []
        for i in range(len(instructions) - 1):
            curr_dest = instructions[i].get('dest')
            next_srcs = instructions[i+1].get('src', [])
            if curr_dest and curr_dest in next_srcs:
                hazards.append(f"  RAW Hazard: I{i+1}โ†’I{i+2} on {curr_dest}")
        return hazards

    def report(self, n):
        s, t_np, t_p = self.speedup(n)
        print(f"=== Pipeline Performance Report ===")
        print(f"  Stages: {self.k} | Clock: {self.tp}ns | Instructions: {n}")
        print(f"  Non-pipeline time: {t_np}ns")
        print(f"  Pipeline time:     {t_p}ns")
        print(f"  Speedup:           {s:.2f}x")
        print(f"  Throughput:        {self.throughput(n):.4f} instr/ns")
        print(f"  Efficiency:        {self.efficiency(n):.1f}%")
        print(f"  Max Speedup (nโ†’โˆž): {self.k:.1f}x")

# Demo
pipe = PipelineCalculator(stages=5, clock_period_ns=2)
pipe.report(100)

# Hazard Detection
instrs = [
    {'op':'ADD', 'dest':'R1', 'src':['R2','R3']},
    {'op':'SUB', 'dest':'R4', 'src':['R1','R5']},  # RAW on R1!
    {'op':'MUL', 'dest':'R6', 'src':['R4','R7']},  # RAW on R4!
]
print("\n=== Hazard Detection ===")
for h in pipe.detect_hazards(instrs):
    print(h)

๐Ÿ“š GATE Preparation โ€” Top 40 COA Questions (5 per Unit)

The following 40 questions are organized by unit, covering all 8 units of Computer Organization. Each question is GATE-level with full solutions. See Appendix E for the complete Q&A set.

Unit 1: Digital Logic โ€” Sample GATE Questions

Q1: The minimum number of 2-input NAND gates required to implement the function F = AB + CD is:
(A) 3   (B) 4   (C) 5   (D) 6
Answer: (C) 5 โ€” Two NANDs for (AB)', two NANDs for (CD)', one NAND for ((AB)' ยท (CD)')' = AB + CD.

Q2: A JK flip-flop with J=K=1 acts as:
(A) SR flip-flop   (B) D flip-flop   (C) T flip-flop   (D) Latch
Answer: (C) T flip-flop โ€” When J=K=1, the output toggles on every clock edge.

Unit 4: CPU โ€” Sample GATE Questions

Q1: In a microprogrammed control unit, the control memory stores:
(A) Data values   (B) Microinstructions   (C) Cache tags   (D) Interrupt vectors
Answer: (B) Microinstructions โ€” Control memory holds the microprogram that generates control signals for each machine instruction.

Q2: RISC architecture favors:
(A) Complex instructions with variable length   (B) Simple, fixed-length instructions with register-to-register operations   (C) Memory-to-memory operations   (D) Microprogrammed control
Answer: (B) โ€” RISC uses simple fixed-length instructions, primarily register-to-register, with hardwired control for fast execution.

Full 40 questions with detailed solutions available in Appendix E.

๐ŸŽฏ Career Guidance โ€” 4 Hardware Engineering Paths

Career Path 1: VLSI Design Engineer

๐Ÿ”ฌ VLSI Design โ€” The High-Paying Hardware Path

What you do: Design integrated circuits (chips) at transistor level. You work with logic synthesis, physical design, timing analysis, and verification. You literally design the chips that go into phones, cars, and satellites.

Key Tools: Cadence Virtuoso (analog design), Synopsys Design Compiler (logic synthesis), Mentor Graphics Calibre (DRC/LVS), Xilinx Vivado (FPGA prototyping)

Salary: Fresher โ‚น6โ€“12 LPA โ†’ Mid โ‚น15โ€“25 LPA โ†’ Senior โ‚น30โ€“80 LPA

Companies in India: Intel Bangalore, Qualcomm Hyderabad, Samsung Noida, Texas Instruments Bangalore, MediaTek Noida, Synopsys Bangalore, Cadence Noida, ARM Bangalore

Career Path 2: Embedded Systems Engineer

๐Ÿค– Embedded Systems โ€” Where Software Meets Hardware

What you do: Program microcontrollers and microprocessors that run inside devices โ€” washing machines, car ECUs, medical devices, drones. You write C/C++ code that runs directly on hardware with real-time constraints.

Platforms: Arduino, STM32, Raspberry Pi, ESP32, TI MSP430. RTOS: FreeRTOS, Zephyr, VxWorks.

Salary: Fresher โ‚น5โ€“8 LPA โ†’ Mid โ‚น10โ€“18 LPA โ†’ Senior โ‚น20โ€“40 LPA

Companies: Bosch India, Continental, Tata Elxsi, KPIT Technologies, Sasken, L&T Technology Services

Career Path 3: Firmware Engineer

โš™๏ธ Firmware โ€” The Bridge Between Hardware and Software

What you do: Write low-level code that initializes hardware. Boot loaders, device drivers, BIOS/UEFI, peripheral controllers. Your code is the first thing that runs when a device powers on.

Skills: C, Assembly (ARM/x86), hardware debugging (JTAG, oscilloscope), Linux kernel, device tree.

Salary: Fresher โ‚น6โ€“10 LPA โ†’ Mid โ‚น12โ€“20 LPA โ†’ Senior โ‚น25โ€“50 LPA

Companies: Intel, AMD, Qualcomm, Western Digital, Seagate, Samsung, HP, Dell

Career Path 4: FPGA Developer

๐Ÿงฉ FPGA โ€” Programmable Hardware

What you do: Design digital circuits using Verilog/VHDL that get programmed onto FPGA chips. Used in telecom (5G base stations), defense (radar systems), finance (high-frequency trading), and AI accelerators.

Tools: Verilog, VHDL, Xilinx Vivado, Intel Quartus Prime, ModelSim

Salary: Fresher โ‚น7โ€“12 LPA โ†’ Mid โ‚น15โ€“22 LPA โ†’ Senior โ‚น25โ€“60 LPA

Companies: Xilinx (AMD), Intel, Qualcomm, Analog Devices, National Instruments, ISRO

๐Ÿ’ฐ Hardware vs Software Salary Comparison (Indian Market)

RoleFresher (0โ€“2 yrs)Mid (3โ€“5 yrs)Senior (6โ€“10 yrs)Lead (10+ yrs)
VLSI Design Engineerโ‚น6โ€“12 LPAโ‚น15โ€“25 LPAโ‚น30โ€“50 LPAโ‚น50โ€“80 LPA
Embedded Systems Engineerโ‚น5โ€“8 LPAโ‚น10โ€“18 LPAโ‚น20โ€“35 LPAโ‚น35โ€“55 LPA
Firmware Engineerโ‚น6โ€“10 LPAโ‚น12โ€“20 LPAโ‚น25โ€“40 LPAโ‚น40โ€“65 LPA
FPGA Developerโ‚น7โ€“12 LPAโ‚น15โ€“22 LPAโ‚น25โ€“45 LPAโ‚น45โ€“60 LPA
Software Developerโ‚น4โ€“8 LPAโ‚น10โ€“20 LPAโ‚น20โ€“40 LPAโ‚น40โ€“70 LPA
Full-Stack Developerโ‚น4โ€“7 LPAโ‚น8โ€“18 LPAโ‚น18โ€“35 LPAโ‚น35โ€“55 LPA
Data Engineerโ‚น5โ€“10 LPAโ‚น12โ€“22 LPAโ‚น22โ€“40 LPAโ‚น40โ€“65 LPA
DevOps Engineerโ‚น5โ€“9 LPAโ‚น10โ€“20 LPAโ‚น20โ€“38 LPAโ‚น38โ€“55 LPA
Hardware pays more at senior levels. While software freshers may earn slightly more, VLSI and FPGA engineers at 10+ years experience consistently earn โ‚น50โ€“80 LPA in India โ€” matching or exceeding software engineering. The supply of hardware talent is much lower, making it a less competitive but higher-paying career.

๐Ÿ“ LinkedIn Profile Template for Hardware Engineers

LinkedIn Template
HEADLINE:
"VLSI/Embedded Systems Engineer | CPU Architecture | Verilog | ARM | GATE Qualified | Open to Opportunities"

ABOUT:
Computer Organization enthusiast with hands-on experience in CPU design, cache simulation, and pipeline optimization. Built 8 portfolio projects including a complete CPU simulator with 8 registers and a 10-instruction ISA. Proficient in Python, C, Verilog, and embedded C. GATE qualified with strong fundamentals in digital logic, computer arithmetic, and memory systems.

FEATURED PROJECTS:
โ€ข CPU Simulator (Python) โ€” 8 registers, fetch-decode-execute, deployed on GitHub
โ€ข Cache Simulator โ€” Direct/Associative/Set-Associative with hit-ratio analysis
โ€ข Booth's Multiplication Calculator โ€” Step-by-step signed binary multiplication
โ€ข Pipeline Speedup Calculator โ€” Hazard detection and performance analysis

SKILLS:
Digital Logic | Computer Architecture | Verilog/VHDL | ARM Assembly
Cache Design | Pipelining | Booth's Algorithm | Python | C/C++
GATE Preparation | Cadence | Synopsys | Xilinx Vivado

๐Ÿข Interview Prep Roadmap

CompanyRoundsKey TopicsPrep Tips
TCS (Digital)Aptitude โ†’ Technical โ†’ HRDigital logic, flip-flops, basic CPU, GATE questionsFocus on TCS NQT pattern; practice Mano machine questions
InfosysOnline Test โ†’ Technical โ†’ HRBoolean algebra, memory hierarchy, basic pipeliningInfyTQ certification helps; practice coding + COA MCQs
Samsung R&DCoding Test โ†’ 2 Technical โ†’ HRCache design, ARM architecture, OS+COA combinedPractice 3-hour coding tests; deep dive into cache problems
Intel IndiaOnline โ†’ 3 Technical โ†’ Hiring ManagerVLSI, Verilog, pipeline hazards, cache coherence, ISA designExpect whiteboard design questions; know x86 pipeline stages
QualcommOnline โ†’ 4 Technical โ†’ Bar RaiserARM architecture, cache protocols (MOESI), power optimizationDeep knowledge of ARM Cortex; practice design questions
Section D

Learn by Doing โ€” Deploy CPU Simulator to GitHub

๐ŸŸข Tier 1 โ€” GUIDED: Push CPU Simulator to GitHub

โฑ๏ธ 45โ€“60 minutesBeginnerStep-by-step instructions

Step 1: Create a GitHub Account

Go to github.com โ†’ Sign up (free) โ†’ Verify email.

Step 2: Create a New Repository

  1. Click "+" โ†’ "New repository"
  2. Name: coa-cpu-simulator
  3. Description: "Complete CPU simulator with 8 registers, 10-instruction ISA โ€” COA Portfolio"
  4. Set to Public, check "Add README"
  5. Click "Create repository"

Step 3: Upload Your Python Files

  1. Click "Add file" โ†’ "Upload files"
  2. Upload all 8 project .py files
  3. Write commit message: "Add all 8 COA portfolio projects"
  4. Click "Commit changes"

Step 4: Write a Professional README

Edit README.md to include: project overview, screenshots, ISA table, how to run, and concepts demonstrated. Use the README templates from each project above.

๐ŸŽ‰ You now have a live portfolio on GitHub! Share the URL on your LinkedIn profile.

๐ŸŸก Tier 2 โ€” SEMI-GUIDED: Add CI, Tests & Badges

โฑ๏ธ 90 minutesIntermediateHints provided

Your Mission:

  1. Write unit tests using Python's unittest module for the CPU simulator
  2. Create a GitHub Actions workflow (.github/workflows/test.yml) to run tests automatically on every push
  3. Add badges to your README: build status, Python version, license

Hints:

YAML
# .github/workflows/test.yml
name: Run Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with: { python-version: '3.11' }
      - run: python -m pytest tests/ -v
Stretch Goal: Add code coverage reporting using pytest-cov and display the coverage badge in your README. Target 80%+ coverage.

๐Ÿ”ด Tier 3 โ€” OPEN CHALLENGE: Interactive CPU Simulator on GitHub Pages

โฑ๏ธ 3โ€“5 hoursAdvancedNo instructions โ€” build it yourself

The Brief:

Convert your CPU simulator into an interactive web application using HTML/CSS/JavaScript. Users should be able to:

  1. Enter assembly instructions in a text area
  2. Click "Execute" to run them step-by-step
  3. See register values update in real-time
  4. Visualize the fetch-decode-execute cycle

Deploy to GitHub Pages so anyone can try it without installing anything.

This single project can replace 100 resume bullet points. A live, interactive CPU simulator on GitHub Pages tells a recruiter: "This person doesn't just study computer architecture โ€” they build it." It's a portfolio piece that separates you from 10,000 other applicants.
Section E

Cross-Unit Synthesis Problems

๐Ÿ”— Problem 1: End-to-End CPU Design (Units 1, 3, 4, 8)

Problem: Design a 4-stage pipelined CPU that uses NAND gates for its ALU (Unit 1), implements the Mano machine instruction format (Unit 3), supports 5 instructions from Unit 4's ISA, and handles data hazards using forwarding (Unit 8).

Deliverables: Block diagram, instruction format, pipeline timing diagram showing a 5-instruction program with at least one hazard and its resolution.

Synthesis Required: You must connect gate-level design (Unit 1) with instruction format (Unit 3), CPU registers (Unit 4), and pipeline mechanics (Unit 8).

๐Ÿ”— Problem 2: Memory System Design (Units 2, 6, 7)

Problem: Design a memory system for a CPU that has: a 4-way set-associative L1 cache with LRU replacement (Unit 6), uses shift registers for serial data transfer from cache to CPU (Unit 2), and performs Booth's multiplication on cache addresses for tag comparison (Unit 7).

Given: Cache size = 64KB, block size = 64B, memory address = 32 bits. Calculate tag, index, and offset bits. Show hit/miss sequence for 10 given addresses. Demonstrate Booth's multiplication for one address computation.

๐Ÿ”— Problem 3: I/O-Driven Pipeline Stall Analysis (Units 5, 6, 8)

Problem: A 5-stage pipeline is executing instructions when a priority interrupt arrives at cycle 7 (Unit 5). The ISR needs data from memory, causing a cache miss (Unit 6) that takes 10 cycles. Analyze: (a) How many pipeline stages are stalled? (b) What is the effective CPI? (c) What is the total penalty in clock cycles? (d) How would a write-back cache vs write-through cache affect the penalty?

๐Ÿ”— Problem 4: Complete Instruction Execution Trace (Units 1โ€“4, 7)

Problem: Trace the execution of MUL R3, R1, R2 where R1=7 (binary: 0111) and R2=-3 (binary: 1101 in 4-bit 2's complement). Show: (a) Gate-level operations in the ALU (Unit 1), (b) Register transfers during fetch-decode-execute (Unit 3), (c) Booth's algorithm step-by-step for the multiplication (Unit 7), (d) Final result stored in R3 (Unit 4).

๐Ÿ”— Problem 5: System Performance Optimization (Units 4, 5, 6, 8)

Problem: A computer system has: a 5-stage pipeline (Unit 8), 2-way set-associative L1 cache with 95% hit rate (Unit 6), vectored interrupt handling (Unit 5), and a RISC CPU (Unit 4). An interrupt occurs every 1000 instructions. Each interrupt takes 5 cycles. Cache miss penalty is 20 cycles. Calculate: (a) Effective CPI, (b) Pipeline efficiency, (c) Total execution time for 10,000 instructions at 2 GHz clock, (d) Suggest one improvement for each bottleneck.

Section F

MCQ Assessment Bank โ€” 30 Cross-Unit GATE-Style Questions

Remember / Identify (Q1โ€“Q5)

Q1

The output of XOR gate for inputs A=1, B=1 is:

  1. 0
  2. 1
  3. Undefined
  4. Depends on clock
RememberUnit 1
โœ… Answer: (A) 0 โ€” XOR outputs 1 only when inputs differ. 1โŠ•1 = 0.
Q2

In Mano's basic computer, the instruction STA performs:

  1. Load accumulator from memory
  2. Store accumulator to memory
  3. Skip on accumulator zero
  4. Store and add
RememberUnit 3
โœ… Answer: (B) โ€” STA (Store Accumulator) writes the AC value to the memory address specified in the instruction.
Q3

The number of flip-flops needed for a mod-16 counter is:

  1. 2
  2. 3
  3. 4
  4. 16
RememberUnit 2
โœ… Answer: (C) 4 โ€” A mod-N counter needs โŒˆlogโ‚‚NโŒ‰ flip-flops. logโ‚‚(16) = 4.
Q4

Which addressing mode uses the operand value directly in the instruction?

  1. Direct addressing
  2. Indirect addressing
  3. Immediate addressing
  4. Register addressing
RememberUnit 4
โœ… Answer: (C) Immediate addressing โ€” The operand value itself is part of the instruction, e.g., LOAD R1, #5.
Q5

In Booth's algorithm, when Qโ‚€Qโ‚‹โ‚ = "10", the operation performed is:

  1. A = A + M
  2. A = A - M
  3. No operation
  4. Shift left
RememberUnit 7
โœ… Answer: (B) A = A - M โ€” When Qโ‚€=1 and Qโ‚‹โ‚=0, we subtract the multiplicand from the accumulator before shifting.

Understand / Explain (Q6โ€“Q10)

Q6

Why does a set-associative cache have lower miss rate than a direct-mapped cache of the same size?

  1. It has more total blocks
  2. It allows multiple blocks to map to the same set, reducing conflict misses
  3. It uses faster SRAM technology
  4. It has a larger block size
UnderstandUnit 6
โœ… Answer: (B) โ€” Set-associative caches reduce conflict misses because each memory block can go into any slot within its set, rather than being forced into exactly one slot.
Q7

In pipelining, a data hazard occurs when:

  1. Two instructions need the same functional unit
  2. A branch instruction changes the program flow
  3. An instruction depends on the result of a previous instruction still in the pipeline
  4. The pipeline clock speed is too fast
UnderstandUnit 8
โœ… Answer: (C) โ€” Data hazard (RAW) occurs when an instruction needs a value that hasn't been computed yet by a prior instruction in the pipeline.
Q8

Why is DMA preferred over programmed I/O for bulk data transfers?

  1. DMA is cheaper to implement
  2. DMA transfers data without CPU intervention, freeing the CPU for other tasks
  3. DMA uses less memory bandwidth
  4. DMA doesn't need interrupt handling
UnderstandUnit 5
โœ… Answer: (B) โ€” DMA (Direct Memory Access) handles data transfer between I/O and memory independently, allowing the CPU to execute other instructions simultaneously.
Q9

Explain why RISC processors typically have more registers than CISC processors:

  1. RISC chips are physically larger
  2. RISC uses register-to-register operations, so more registers reduce memory accesses
  3. CISC doesn't support registers
  4. RISC registers are smaller in bit-width
UnderstandUnit 4
โœ… Answer: (B) โ€” RISC architectures perform most operations between registers (load-store architecture), so having more registers reduces the need for slow memory accesses.
Q10

In the memory hierarchy, why is cache memory faster but smaller than main memory?

  1. Cache uses DRAM while main memory uses SRAM
  2. Cache is built with SRAM which is faster but more expensive per bit than DRAM
  3. Cache is located outside the CPU
  4. Main memory has lower latency
UnderstandUnit 6
โœ… Answer: (B) โ€” Cache uses SRAM (6 transistors/bit, fast, expensive). Main memory uses DRAM (1 transistor + 1 capacitor/bit, slower, cheaper). Cost-performance tradeoff dictates cache is small but fast.

Apply (Q11โ€“Q15)

Q11

A direct-mapped cache has 64 blocks, block size = 16 bytes, and the memory address is 16 bits. The number of tag bits is:

  1. 4
  2. 6
  3. 8
  4. 10
ApplyUnit 6
โœ… Answer: (B) 6 โ€” Offset = logโ‚‚(16) = 4 bits. Index = logโ‚‚(64) = 6 bits. Tag = 16 - 4 - 6 = 6 bits.
Q12

Using Booth's algorithm, multiply -5 ร— 3 (in 5-bit representation). The final result in decimal is:

  1. -15
  2. 15
  3. -12
  4. 12
ApplyUnit 7
โœ… Answer: (A) -15 โ€” Booth's algorithm correctly handles signed multiplication: -5 ร— 3 = -15.
Q13

A 5-stage pipeline processes 100 instructions. The speedup over non-pipelined execution is approximately:

  1. 5.0ร—
  2. 4.81ร—
  3. 3.5ร—
  4. 2.0ร—
ApplyUnit 8
โœ… Answer: (B) 4.81ร— โ€” Speedup = (nร—k)/((k+n-1)) = (100ร—5)/(5+100-1) = 500/104 โ‰ˆ 4.81.
Q14

Simplify the Boolean expression: F = A'B + AB' + AB using a K-map.

  1. A + B
  2. A'B'
  3. AB
  4. A โŠ• B
ApplyUnit 1
โœ… Answer: (A) A + B โ€” Grouping minterms 01, 10, 11 in the K-map gives F = A + B.
Q15

In an 8-bit shift register initially loaded with 11001010, after 3 right shifts with serial input 0, the register contains:

  1. 00011001
  2. 01011001
  3. 00110010
  4. 10100000
ApplyUnit 2
โœ… Answer: (A) 00011001 โ€” Right shift 3 times with 0 input: 11001010 โ†’ 01100101 โ†’ 00110010 โ†’ 00011001.

Analyze (Q16โ€“Q20)

Q16

Compare a 4-way set-associative cache with a direct-mapped cache of the same size. Which statement is TRUE?

  1. Direct-mapped always has higher hit rate
  2. Set-associative has higher hit rate but needs more comparison hardware
  3. Both have identical hit rates for all access patterns
  4. Set-associative is always slower
AnalyzeUnit 6
โœ… Answer: (B) โ€” Set-associative reduces conflict misses (higher hit rate) but requires n comparators running in parallel (more hardware complexity).
Q17

In a 5-stage pipeline, instruction I3 reads R1 which was written by I1 (still in MEM stage). This is an example of:

  1. Structural hazard
  2. Control hazard
  3. RAW data hazard
  4. WAR data hazard
AnalyzeUnit 8
โœ… Answer: (C) RAW (Read After Write) โ€” I3 needs to read R1 before I1 has written it back. This is the most common pipeline hazard, resolved by forwarding or stalling.
Q18

Analyze the instruction ADD R1, [R2]. Which addressing modes are used?

  1. Register and Immediate
  2. Register and Register Indirect
  3. Direct and Immediate
  4. Register and Direct
AnalyzeUnit 4
โœ… Answer: (B) โ€” R1 uses Register addressing. [R2] uses Register Indirect addressing โ€” R2 contains the memory address of the operand.
Q19

A system uses polling to check 5 I/O devices. Each poll takes 2ฮผs. Analyze the maximum latency for the highest-priority device vs the lowest-priority device.

  1. 2ฮผs and 10ฮผs
  2. 2ฮผs and 8ฮผs
  3. 10ฮผs and 10ฮผs
  4. 2ฮผs and 2ฮผs
AnalyzeUnit 5
โœ… Answer: (A) โ€” Highest priority device is checked first (2ฮผs). Lowest priority (5th) is checked after all others: 5 ร— 2ฮผs = 10ฮผs maximum latency.
Q20

De Morgan's theorem states that (A + B)' equals:

  1. A' + B'
  2. A'B'
  3. AB'
  4. (AB)'
AnalyzeUnit 1
โœ… Answer: (B) A'B' โ€” De Morgan's: complement of sum = product of complements. (A+B)' = A'ยทB'.

Evaluate (Q21โ€“Q25)

Q21

For a real-time embedded system controlling a car's ABS, which interrupt scheme is most appropriate?

  1. Software polling
  2. Vectored priority interrupts
  3. Daisy-chain without priority
  4. DMA with no interrupts
EvaluateUnit 5
โœ… Answer: (B) Vectored priority interrupts โ€” ABS needs guaranteed response time for the wheel sensors (highest priority). Vectored interrupts provide direct ISR dispatch without polling overhead.
Q22

Evaluate: For a mobile phone processor, RISC is preferred over CISC primarily because:

  1. RISC instructions are more powerful
  2. RISC has lower power consumption due to simpler decode logic and fixed-length instructions
  3. RISC can run x86 software
  4. RISC doesn't need cache
EvaluateUnit 4
โœ… Answer: (B) โ€” ARM (RISC) dominates mobile because simpler decode hardware = fewer transistors = lower power consumption. This is why 99% of smartphones use ARM, not x86.
Q23

A cache with high associativity (e.g., 16-way) vs a cache with low associativity (2-way) of the same size. Evaluate the tradeoff:

  1. Higher associativity always improves performance
  2. Higher associativity reduces conflict misses but increases access time and hardware cost
  3. Lower associativity is always better
  4. Associativity doesn't affect performance
EvaluateUnit 6
โœ… Answer: (B) โ€” More ways = fewer conflict misses, but each access requires more comparators running in parallel, increasing latency and silicon area. The diminishing returns typically make 4โ€“8 way optimal.
Q24

Evaluate which pipeline hazard resolution technique has zero performance penalty:

  1. Stalling (bubbles)
  2. Operand forwarding (data bypassing)
  3. Branch prediction (when correct)
  4. Both (B) and (C)
EvaluateUnit 8
โœ… Answer: (D) โ€” Forwarding eliminates stalls for data hazards at zero CPI penalty. Correct branch predictions have zero penalty as the pipeline continues without flushing.
Q25

Evaluate: Is Booth's algorithm more efficient than simple binary multiplication for negative numbers?

  1. No, they perform identically
  2. Yes, Booth's handles signed numbers without conversion to unsigned first
  3. No, Booth's only works for positive numbers
  4. Yes, but only for powers of 2
EvaluateUnit 7
โœ… Answer: (B) โ€” Booth's algorithm works directly on 2's complement representation, handling both positive and negative numbers without needing separate sign handling logic.

Create (Q26โ€“Q30)

Q26

Design a 4-bit ALU that supports ADD, SUB, AND, and OR. The minimum number of 2:1 multiplexers needed at the output stage is:

  1. 2
  2. 4
  3. 8
  4. 16
CreateUnit 1, 4
โœ… Answer: (B) 4 โ€” One 4:1 MUX (= two 2:1 MUXes) per output bit ร— 4 bits, but using 4:1 MUXes directly = 4 multiplexers total to select among the 4 operations for each bit.
Q27

Design a cache memory with the following specs: 256KB total, 64B block size, 4-way set-associative. How many sets does this cache have?

  1. 256
  2. 512
  3. 1024
  4. 4096
CreateUnit 6
โœ… Answer: (C) 1024 โ€” Total blocks = 256KB/64B = 4096. Sets = 4096/4 = 1024.
Q28

Create a 5-instruction program using the ISA {LOAD, ADD, SUB, MUL, HLT} that computes (A+B)ร—(A-B) where A=10, B=3:

  1. LOAD R0,10; LOAD R1,3; ADD R2,R0,R1; SUB R3,R0,R1; MUL R4,R2,R3
  2. LOAD R0,10; ADD R1,R0,3; SUB R2,R0,3; MUL R3,R1,R2; HLT
  3. MUL R0,10,3; ADD R1,R0,13; SUB R2,R0,7; HLT; HLT
  4. LOAD R0,3; LOAD R1,10; MUL R2,R0,R1; HLT; HLT
CreateUnit 4
โœ… Answer: (A) โ€” R0=10, R1=3, R2=R0+R1=13, R3=R0-R1=7, R4=R2ร—R3=91. This correctly computes (10+3)ร—(10-3)=91.
Q29

Create a priority interrupt system for 4 devices with priorities: Timer(0), Disk(1), Keyboard(2), Printer(3). If all 4 raise interrupts simultaneously, the correct servicing order is:

  1. Printer โ†’ Keyboard โ†’ Disk โ†’ Timer
  2. Timer โ†’ Disk โ†’ Keyboard โ†’ Printer
  3. All serviced simultaneously
  4. Random order
CreateUnit 5
โœ… Answer: (B) โ€” Priority 0 (Timer) is highest. Servicing order: Timer โ†’ Disk โ†’ Keyboard โ†’ Printer (lowest to highest priority number).
Q30

Create a pipeline diagram for these 3 instructions in a 5-stage pipeline, showing the stall required: I1: ADD R1, R2, R3 / I2: SUB R4, R1, R5 / I3: MUL R6, R4, R7. Without forwarding, how many stall cycles are needed?

  1. 0
  2. 2
  3. 4
  4. 6
CreateUnit 8
โœ… Answer: (C) 4 โ€” I2 depends on I1 (RAW on R1): 2 stalls. I3 depends on I2 (RAW on R4): 2 stalls. Total = 4 stall cycles without forwarding.
Section G

Short Answer Questions (8 Questions)

SA1: Explain the fetch-decode-execute cycle with a diagram. (5 marks)

Model Answer: The instruction cycle consists of three phases:

1. Fetch: PC โ†’ MAR โ†’ Memory โ†’ MBR โ†’ IR. The program counter provides the address, which goes to MAR. Memory contents at that address are fetched into MBR, then transferred to IR. PC is incremented.

2. Decode: The control unit examines the opcode portion of IR. It determines which instruction to execute and generates appropriate control signals.

3. Execute: The operation specified by the opcode is performed. This may involve ALU operations, register transfers, or memory access. For memory-reference instructions, the operand address is sent to MAR for a second memory access.

This cycle repeats until a HLT instruction is encountered.

SA2: Compare RISC and CISC architectures with 5 differences. (5 marks)

FeatureRISCCISC
InstructionsSimple, fixed-lengthComplex, variable-length
Addressing modesFew (3โ€“5)Many (12โ€“20)
RegistersMany (32โ€“128)Few (8โ€“16)
Control unitHardwiredMicroprogrammed
Execution1 cycle per instruction (ideal)Multiple cycles per instruction

Examples: RISC โ€” ARM, MIPS, RISC-V. CISC โ€” Intel x86, Motorola 68000.

SA3: What is a pipeline hazard? List all three types with examples. (5 marks)

1. Data Hazard (RAW): ADD R1,R2,R3 followed by SUB R4,R1,R5 โ€” R1 is needed before it's written back.

2. Control Hazard: BEQ R1,R2,LABEL โ€” The pipeline doesn't know which instruction to fetch next until the branch is resolved.

3. Structural Hazard: Two instructions need the same hardware unit (e.g., memory) in the same cycle.

Solutions: Forwarding (data), branch prediction (control), resource duplication (structural).

SA4: Explain Booth's algorithm steps for multiplying -3 ร— 5. (7 marks)

M = -3 = 11101 (5-bit 2's complement), Q = 5 = 00101

Init: A=00000, Q=00101, Qโ‚‹โ‚=0

Step 1: Qโ‚€Qโ‚‹โ‚ = 10 โ†’ A = A-M = 00011, ASR โ†’ A=00001, Q=10010, Qโ‚‹โ‚=1

Step 2: Qโ‚€Qโ‚‹โ‚ = 01 โ†’ A = A+M = 11110, ASR โ†’ A=11111, Q=01001, Qโ‚‹โ‚=0

Step 3: Qโ‚€Qโ‚‹โ‚ = 10 โ†’ A = A-M = 00010, ASR โ†’ A=00001, Q=00100, Qโ‚‹โ‚=1

Step 4: Qโ‚€Qโ‚‹โ‚ = 01 โ†’ A = A+M = 11110, ASR โ†’ A=11111, Q=00010, Qโ‚‹โ‚=0

Step 5: Qโ‚€Qโ‚‹โ‚ = 00 โ†’ No op, ASR โ†’ A=11111, Q=10001, Qโ‚‹โ‚=0

Result: AQ = 1111110001 = -15 โœ“

SA5: Differentiate between direct-mapped, fully-associative, and set-associative cache. (5 marks)

Direct-Mapped: Each memory block maps to exactly one cache line. Line = Block mod NumLines. Simple but high conflict misses.

Fully-Associative: A memory block can go in any cache line. Fewest misses but needs parallel comparators for all lines โ€” expensive.

Set-Associative: Cache divided into sets; a block maps to one set but can go in any line within that set. Set = Block mod NumSets. Best balance of performance and cost. Most CPUs use 4โ€“8 way set-associative L1 cache.

SA6: List and explain 4 types of flip-flops. (5 marks)

SR Flip-Flop: Set (S=1,R=0) makes Q=1; Reset (S=0,R=1) makes Q=0. S=R=1 is invalid.

JK Flip-Flop: Like SR but J=K=1 toggles output. No invalid state. Most versatile.

D Flip-Flop: Q follows D on clock edge. Used for data storage and registers.

T Flip-Flop: T=1 toggles Q; T=0 holds Q. Used in counters.

SA7: What is DMA? How does it differ from programmed I/O? (5 marks)

DMA (Direct Memory Access): A DMA controller handles data transfer between I/O devices and memory without CPU involvement. The CPU initiates the transfer and is interrupted when it's complete.

Programmed I/O: The CPU manually transfers each byte between I/O and memory, wasting CPU cycles.

Key differences: DMA frees the CPU (higher throughput), handles bulk transfers efficiently, but requires dedicated DMA controller hardware. Programmed I/O is simpler but ties up the CPU completely.

SA8: Explain the concept of memory-mapped I/O vs isolated I/O. (5 marks)

Memory-Mapped I/O: I/O devices share the same address space as memory. CPU uses regular LOAD/STORE instructions to access devices. Example: ARM processors use memory-mapped I/O exclusively.

Isolated I/O (Port-Mapped): I/O devices have a separate address space. CPU uses special IN/OUT instructions. Example: x86 has separate I/O ports (e.g., port 0x60 for keyboard).

Advantage of Memory-Mapped: No special instructions needed; any instruction that accesses memory can access I/O. Disadvantage: Reduces available memory address space.

Section H

Long Answer Questions (3 Questions, 15 marks each)

LA1: Design a complete 8-bit CPU architecture. Include registers, ALU, control unit, and memory interface. Draw a block diagram and explain each component. (15 marks)

Model Answer:

1. Register File (3 marks): 8 general-purpose registers (R0โ€“R7), each 8 bits. Special registers: PC (8-bit), IR (16-bit: 4-bit opcode + 3-bit Rd + 3-bit Rs + 6-bit immediate/address), SP (8-bit stack pointer), Flags register (Zero, Carry, Sign, Overflow).

2. ALU (3 marks): 8-bit arithmetic logic unit supporting: ADD, SUB (with carry), AND, OR, XOR, NOT, SHL, SHR. Takes two 8-bit inputs (from register file), produces 8-bit result + 4 flag bits. Implemented using ripple-carry adder for arithmetic, gate arrays for logic.

3. Control Unit (3 marks): Hardwired control (RISC-style) or microprogrammed (CISC-style). Decodes IR opcode, generates control signals: RegWrite, MemRead, MemWrite, ALUOp, Branch, ALUSrc. Timing: single-cycle or multi-cycle design.

4. Memory Interface (3 marks): MAR (8-bit) holds address, MBR (8-bit) holds data. Address bus: 8-bit โ†’ 256 bytes addressable. Data bus: 8-bit. Control signals: Read/Write, Memory Enable. Supports aligned byte access.

5. Data Path (3 marks): Instruction flow: PC โ†’ Memory โ†’ IR โ†’ Control Unit โ†’ Control Signals. Data flow: Register File โ†’ ALU โ†’ Register File/Memory. Multiplexers select ALU inputs (register vs immediate) and write-back source (ALU result vs memory data).

LA2: Compare all three cache mapping techniques with numerical examples. Given: Memory = 64KB, Cache = 4KB, Block size = 64B. Calculate tag, index, offset for each technique. (15 marks)

Given: Memory = 64KB = 2ยนโถ bytes โ†’ 16-bit address. Cache = 4KB. Block size = 64B = 2โถ bytes โ†’ Offset = 6 bits. Total cache blocks = 4KB/64B = 64. Total memory blocks = 64KB/64B = 1024.

1. Direct-Mapped (5 marks):

Cache lines = 64 โ†’ Index = logโ‚‚(64) = 6 bits. Tag = 16 - 6 - 6 = 4 bits.

Address format: [Tag: 4 bits | Index: 6 bits | Offset: 6 bits]

Example: Address 0x1A40 = 0001 101001 000000 โ†’ Tag=0001, Index=101001(=41), Offset=000000

Block 41 of memory maps to cache line 41. Conflict: Block 105 also maps to line 41 (105 mod 64 = 41).

2. Fully-Associative (5 marks):

No index bits. Tag = 16 - 6 = 10 bits.

Address format: [Tag: 10 bits | Offset: 6 bits]

Any memory block can go in any cache line. Needs 64 parallel comparators.

Replacement policy (LRU) needed. Best hit rate but most expensive hardware.

3. Set-Associative (5 marks):

4-way: Sets = 64/4 = 16 โ†’ Index = logโ‚‚(16) = 4 bits. Tag = 16 - 4 - 6 = 6 bits.

Address format: [Tag: 6 bits | Set Index: 4 bits | Offset: 6 bits]

Each set has 4 lines. Needs 4 parallel comparators per access.

Best balance: fewer conflict misses than direct-mapped, less hardware than fully-associative.

LA3: Explain pipelining with all hazard types and resolution techniques. Include a 5-stage pipeline timing diagram for 5 instructions with at least one hazard. (15 marks)

5-Stage Pipeline (3 marks): IF (Instruction Fetch) โ†’ ID (Instruction Decode) โ†’ EX (Execute) โ†’ MEM (Memory Access) โ†’ WB (Write Back).

Pipeline Timing (without hazards):

        C1   C2   C3   C4   C5   C6   C7   C8   C9
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   EX   MEM  WB
I3:               IF   ID   EX   MEM  WB
I4:                    IF   ID   EX   MEM  WB
I5:                         IF   ID   EX   MEM  WB

5 instructions complete in 9 cycles (vs 25 cycles non-pipelined). Speedup โ‰ˆ 2.78ร—.

Data Hazards (4 marks):

RAW (Read After Write): I1 writes R1, I2 reads R1. Solution: Operand forwarding โ€” forward result from EX/MEM pipeline register to ID/EX input. Or insert NOPs/stalls.

WAR (Write After Read): Rare in in-order pipelines. Occurs in out-of-order execution.

WAW (Write After Write): Two instructions write to the same register. Relevant for superscalar processors.

Control Hazards (4 marks):

Branch instructions don't resolve until EX stage. Pipeline has already fetched wrong instructions.

Solutions: Branch prediction (static: always-taken/not-taken; dynamic: branch history buffer). Branch delay slot (MIPS). Early branch resolution in ID stage.

Structural Hazards (4 marks):

Two pipeline stages need the same hardware. Example: IF and MEM both need memory access.

Solution: Separate instruction memory and data memory (Harvard architecture). Or add pipeline registers and use separate ports.

Section I

Industry Spotlight โ€” Amit Joshi, Freelancer Turned Hardware Startup Founder

๐Ÿš€ Amit Joshi, 32 โ€” Founder, NexaChip Technologies, Pune

Background: B.Tech in Electronics from a tier-2 college in Nagpur (not IIT, not NIT). Struggled to get placed on campus. Started freelancing on Upwork doing PCB design for โ‚น5,000/project in 2016.

The Journey:

2016: Started freelancing โ€” PCB layouts, schematic designs for hobbyist clients. Monthly income: โ‚น15,000โ€“โ‚น20,000.

2017: Learned FPGA development (Verilog + Xilinx). Got a โ‚น2 lakh project from a Bangalore IoT startup to design a sensor data acquisition board.

2018: Started posting open-source embedded projects on GitHub. One FPGA-based signal processing project got 400+ stars. A German company noticed and offered a $50/hr remote contract.

2019: Annual freelance income crossed โ‚น25 LPA. Hired his first employee โ€” a classmate from college.

2020: Founded NexaChip Technologies. Secured โ‚น50 lakh seed funding from a Pune-based angel investor. Focus: IoT sensor modules for agriculture (soil moisture, temperature, humidity).

2023: NexaChip has 12 employees, โ‚น1.5 crore annual revenue. Their IoT sensors are used by 200+ farmers in Maharashtra for smart irrigation.

Amit's Advice: "Don't wait for placements. Build something. Post it on GitHub. Write about it on LinkedIn. The hardware industry in India is starving for talent. If you can design a PCB or write Verilog, someone will pay you."

DetailInfo
EducationB.Tech ECE, tier-2 college Nagpur
First Earningโ‚น5,000/project (PCB design on Upwork)
Current Revenueโ‚น1.5 crore/year (NexaChip Technologies)
Team Size12 employees
Key SkillsPCB Design, FPGA (Verilog), Embedded C, IoT, Business Development
Section J

Earn With It โ€” Multiple Paths

๐Ÿ’ฐ Your Earning Paths After This Course

Path 1: Freelance Hardware Consulting โ€” Design PCBs, write embedded firmware, FPGA prototyping for startups. Platforms: Upwork, Freelancer, Toptal. Rate: โ‚น500โ€“โ‚น3,000/hr.

Path 2: Open-Source Contributions โ€” Contribute to RISC-V, OpenROAD, LibreCores. Build reputation โ†’ get hired by open-source hardware companies.

Path 3: Technical Writing/Blogging โ€” Write COA tutorials on Medium, Dev.to, or your own blog. Monetize with ads, affiliate links, or paid courses. Income: โ‚น5,000โ€“โ‚น50,000/month.

Path 4: YouTube Tutorials โ€” Create "COA for GATE" or "CPU Design in Python" video series. Monetize through ads + sponsorships. Top Indian tech YouTubers earn โ‚น1โ€“5 LPA from videos alone.

Path 5: GATE Coaching โ€” Score well in GATE, then tutor juniors. Online tutoring for GATE COA: โ‚น500โ€“โ‚น2,000/hr on platforms like Unacademy, Chegg, or private tutoring.

Path 6: Hardware Startup โ€” Like Amit Joshi, identify a problem (smart agriculture, home automation, wearables) and build an IoT product. India Semiconductor Mission provides grants for hardware startups.

Earning PathTime to First โ‚นMonthly PotentialDifficulty
GATE Tutoring1โ€“2 weeksโ‚น10,000โ€“โ‚น50,000Easy
Technical Writing2โ€“4 weeksโ‚น5,000โ€“โ‚น30,000Easy
Freelance Hardware1โ€“2 monthsโ‚น20,000โ€“โ‚น2,00,000Medium
YouTube Tutorials3โ€“6 monthsโ‚น5,000โ€“โ‚น1,00,000Medium
Hardware Startup6โ€“12 monthsVariable (high ceiling)Hard
Section K

Chapter Summary

๐ŸŽฏ Key Takeaways from Unit 9 โ€” Capstone

1. Portfolio Projects (8): Truth Table Generator, Shift Register Simulator, Fetch-Decode-Execute, CPU Simulator, Interrupt Handler, Cache Simulator, Booth's Calculator, Pipeline Calculator โ€” all deployed on GitHub.

2. GATE Preparation: 40 GATE-level questions covering all 8 units with full solutions. Focus areas: cache mapping, pipeline hazards, Booth's algorithm, Boolean algebra.

3. Career Paths: VLSI Design (โ‚น6โ€“80 LPA), Embedded Systems (โ‚น5โ€“40 LPA), Firmware (โ‚น6โ€“50 LPA), FPGA (โ‚น7โ€“60 LPA). Hardware pays more at senior levels.

4. Interview Prep: TCS, Infosys, Samsung, Intel, Qualcomm โ€” each has unique interview patterns but all test COA fundamentals deeply.

5. Earning Paths: Freelance consulting, technical writing, YouTube, GATE coaching, hardware startups โ€” multiple ways to monetize COA knowledge while still in college.

6. Units Synthesized: Unit 1 (Digital Logic) โ†’ Unit 2 (Sequential) โ†’ Unit 3 (Basic Computer) โ†’ Unit 4 (CPU) โ†’ Unit 5 (I/O) โ†’ Unit 6 (Memory) โ†’ Unit 7 (Arithmetic) โ†’ Unit 8 (Pipelining) โ†’ Unit 9 (Capstone).

Section L

Earning Checkpoint

Skill / TopicTool UsedPortfolio PieceEarning Ready?
Truth Table GeneratorPythonGitHub repo + READMEโœ… Yes โ€” demonstrate in interviews
Shift Register SimulatorPythonGitHub repo + READMEโœ… Yes โ€” shows sequential circuit understanding
Fetch-Decode-ExecutePythonGitHub repo + READMEโœ… Yes โ€” core CPU knowledge
CPU SimulatorPythonGitHub repo + demoโœ… Yes โ€” flagship portfolio project
Interrupt HandlerPythonGitHub repo + READMEโœ… Yes โ€” I/O systems expertise
Cache SimulatorPythonGitHub repo + hit-ratio analysisโœ… Yes โ€” GATE + interviews
Booth's CalculatorPythonGitHub repo + step traceโœ… Yes โ€” arithmetic expertise
Pipeline CalculatorPythonGitHub repo + hazard detectionโœ… Yes โ€” performance analysis
GATE PreparationStudy + Practice40 solved questionsโœ… Yes โ€” GATE-ready
Career RoadmapLinkedIn + GitHubProfile + Portfolioโœ… Yes โ€” industry-ready
Minimum Viable Portfolio after this chapter: 8 GitHub repos with professional READMEs + a LinkedIn profile with hardware engineering headline + solved GATE questions = You are ready to apply to Tata Elxsi, Samsung R&D, Intel India, KPIT, Bosch, and 100+ other companies hiring hardware engineers in India.

โ€” APPENDICES โ€”

Appendix A

Digital Logic Quick Reference

Logic Gates โ€” Truth Tables

ABANDORNANDNORXORXNOR
00001101
01011010
10011010
11110001

NOT Gate: NOT 0 = 1, NOT 1 = 0.

Flip-Flops โ€” Characteristic Tables

TypeInputsNext State Q(t+1)Key Property
SRS=0,R=0 โ†’ Q(t); S=0,R=1 โ†’ 0; S=1,R=0 โ†’ 1; S=1,R=1 โ†’ Invalidโ€”Set-Reset; S=R=1 undefined
JKJ=0,K=0 โ†’ Q(t); J=0,K=1 โ†’ 0; J=1,K=0 โ†’ 1; J=1,K=1 โ†’ Q'(t)โ€”Toggle when J=K=1; no invalid state
DD=0 โ†’ 0; D=1 โ†’ 1Q(t+1) = DData/Delay; output follows input
TT=0 โ†’ Q(t); T=1 โ†’ Q'(t)Toggle on T=1Used in counters

Boolean Algebra Laws

LawAND FormOR Form
IdentityAยท1 = AA+0 = A
NullAยท0 = 0A+1 = 1
IdempotentAยทA = AA+A = A
InverseAยทA' = 0A+A' = 1
CommutativeAยทB = BยทAA+B = B+A
Associative(AยทB)ยทC = Aยท(BยทC)(A+B)+C = A+(B+C)
DistributiveAยท(B+C) = AยทB+AยทCA+(BยทC) = (A+B)ยท(A+C)
AbsorptionAยท(A+B) = AA+AยทB = A
De Morgan's(AยทB)' = A'+B'(A+B)' = A'ยทB'
Appendix B

Mano Machine Instruction Set Reference

Instruction Format (16 bits)

[I(1 bit) | Opcode(3 bits) | Address(12 bits)]

I=0: Direct addressing. I=1: Indirect addressing.

Memory-Reference Instructions (Opcode 000โ€“110)

OpcodeMnemonicOperationMicro-operations
000ANDAC โ† AC โˆง M[addr]DRโ†M[AR], ACโ†ACโˆงDR
001ADDAC โ† AC + M[addr]DRโ†M[AR], ACโ†AC+DR, Eโ†carry
010LDAAC โ† M[addr]DRโ†M[AR], ACโ†DR
011STAM[addr] โ† ACM[AR]โ†AC
100BUNPC โ† addrPCโ†AR
101BSAM[addr]โ†PC, PCโ†addr+1M[AR]โ†PC, ARโ†AR+1, PCโ†AR
110ISZM[addr]++, skip if zeroDRโ†M[AR], DRโ†DR+1, M[AR]โ†DR, if DR=0: PCโ†PC+1

Register-Reference Instructions (Opcode 111, I=0)

BitMnemonicOperation
B11CLAAC โ† 0
B10CLEE โ† 0
B9CMAAC โ† AC'
B8CMEE โ† E'
B7CIRCircular right shift (E, AC)
B6CILCircular left shift (E, AC)
B5INCAC โ† AC + 1
B4SPASkip if AC positive (AC[15]=0)
B3SNASkip if AC negative (AC[15]=1)
B2SZASkip if AC = 0
B1SZESkip if E = 0
B0HLTHalt computer

Input-Output Instructions (Opcode 111, I=1)

BitMnemonicOperation
B11INPAC(0-7) โ† INPR, FGI โ† 0
B10OUTOUTR โ† AC(0-7), FGO โ† 0
B9SKISkip if FGI = 1
B8SKOSkip if FGO = 1
B7IONIEN โ† 1 (enable interrupts)
B6IOFIEN โ† 0 (disable interrupts)
Appendix C

Cache Formulae Cheat Sheet

Core Formulae

FormulaDescription
Number of blocks = Cache Size / Block SizeTotal cache blocks
Offset bits = logโ‚‚(Block Size)Bits to address bytes within a block
Index bits = logโ‚‚(Number of Sets)Bits to select cache set
Tag bits = Address bits - Index bits - Offset bitsRemaining bits for tag comparison
Number of Sets = Number of Blocks / AssociativitySets in set-associative cache
Hit Ratio = Hits / Total AccessesCache performance metric
Miss Ratio = 1 - Hit RatioCache miss frequency
AMAT = Hit Time + (Miss Rate ร— Miss Penalty)Average Memory Access Time
Effective CPI = Base CPI + (Memory accesses per instruction ร— Miss Rate ร— Miss Penalty)CPI with memory stalls

Mapping-Specific Formulae

MappingSet FormulaIndex BitsComparators
Direct-MappedLine = Block mod Nlogโ‚‚(N)1
Fully-AssociativeAny line0 (no index)N (all lines)
k-way Set-AssociativeSet = Block mod (N/k)logโ‚‚(N/k)k

Numerical Example

Given: Memory = 4GB (32-bit address), Cache = 64KB, Block = 64B, 4-way set-associative.

Blocks = 64KB/64B = 1024. Sets = 1024/4 = 256.

Offset = logโ‚‚(64) = 6 bits. Index = logโ‚‚(256) = 8 bits. Tag = 32 - 8 - 6 = 18 bits.

Address: [Tag: 18 | Index: 8 | Offset: 6] = 32 bits โœ“

Appendix D

Booth's & Division Algorithm Quick Reference

Booth's Algorithm โ€” Step by Step

Example: -3 ร— 5 (using 5-bit representation)

M = -3 = 11101, Q = 5 = 00101, -M = 00011

StepAQQโ‚‹โ‚Action
Init00000001010โ€”
100011001010Qโ‚€Qโ‚‹โ‚=10 โ†’ A=A-M=A+(-M)
ASR00001100101Arithmetic Shift Right
211110100101Qโ‚€Qโ‚‹โ‚=01 โ†’ A=A+M
ASR11111010010Arithmetic Shift Right
300010010010Qโ‚€Qโ‚‹โ‚=10 โ†’ A=A-M
ASR00001001001Arithmetic Shift Right
411110001001Qโ‚€Qโ‚‹โ‚=01 โ†’ A=A+M
ASR11111000100Arithmetic Shift Right
511111000100Qโ‚€Qโ‚‹โ‚=00 โ†’ No op
ASR11111100010Arithmetic Shift Right

Result: AQ = 11111 10001 = -15 in 10-bit 2's complement โœ“ (-3 ร— 5 = -15)

Restoring Division Algorithm

Steps: 1) Shift left AQ. 2) A = A - M. 3) If A < 0: restore A = A + M, Qโ‚€ = 0. If A โ‰ฅ 0: Qโ‚€ = 1. 4) Repeat n times.

Non-Restoring Division Algorithm

Steps: 1) Shift left AQ. 2) If A < 0: A = A + M. If A โ‰ฅ 0: A = A - M. 3) If A โ‰ฅ 0: Qโ‚€ = 1. Else: Qโ‚€ = 0. 4) Repeat n times. 5) If A < 0 at end: A = A + M (final restore).

Common Pitfall in Booth's: Forgetting that the arithmetic shift right preserves the sign bit of A. If A is negative (MSB=1), after ASR the MSB stays 1. This is NOT a logical shift!
Appendix E

Top 40 GATE COA Q&A

Unit 1: Digital Logic (Q1โ€“Q5)

Q1: Minimum NAND gates for F = AB + CD?
(A) 3 (B) 4 (C) 5 (D) 6
โœ… (C) 5 โ€” (AB)' needs 1 NAND, (CD)' needs 1 NAND, final output needs 1 NAND on these two outputs. But AB needs 1 NAND inverted by another = 2+2+1 = 5.

Q2: JK flip-flop with J=K=1 acts as? โœ… T flip-flop (toggles on every clock).

Q3: Number of minterms in a 4-variable Boolean function? โœ… 16 (2โด = 16).

Q4: Universal gates are? โœ… NAND and NOR โ€” can implement any Boolean function.

Q5: Karnaugh map simplification of F(A,B,C) = ฮฃm(0,1,2,4,5)? โœ… A' + B'C'

Unit 2: Sequential Circuits (Q6โ€“Q10)

Q6: Flip-flops needed for mod-12 counter? โœ… 4 (โŒˆlogโ‚‚12โŒ‰ = 4).

Q7: D flip-flop with D tied to Q': behaves as? โœ… T flip-flop with T=1 (toggles every clock).

Q8: Ring counter with 8 flip-flops can count up to? โœ… 8 states (one-hot encoding).

Q9: Johnson counter with 4 flip-flops counts? โœ… 8 states (2n states for n flip-flops).

Q10: A shift register can be used for? โœ… Serial-to-parallel conversion, delay, sequence generation.

Unit 3: Basic Computer Organization (Q11โ€“Q15)

Q11: In Mano's basic computer, the instruction AND has opcode? โœ… 000.

Q12: The BSA instruction is used for? โœ… Subroutine call โ€” saves return address and jumps.

Q13: Register AR in Mano's computer holds? โœ… Memory address (12 bits, connects to address bus).

Q14: ISZ instruction does? โœ… Increments memory word, skips next instruction if result is zero.

Q15: In the interrupt cycle, IEN is set to? โœ… 0 (interrupts disabled during ISR to prevent nesting).

Unit 4: CPU (Q16โ€“Q20)

Q16: Hardwired control is faster than microprogrammed because? โœ… Direct combinational logic generation of control signals vs. sequential memory reads.

Q17: Register indirect addressing: effective address is? โœ… Content of the register specified in the instruction.

Q18: RISC processors typically execute most instructions in? โœ… One clock cycle.

Q19: Condition code register stores? โœ… Flags: Zero, Carry, Sign, Overflow from ALU operations.

Q20: Stack-based addressing is used in? โœ… Zero-address instructions (operands implicit on stack).

Unit 5: I/O Organization (Q21โ€“Q25)

Q21: DMA transfers data between? โœ… I/O devices and memory without CPU intervention.

Q22: In daisy-chain priority, the device closest to CPU has? โœ… Highest priority.

Q23: Cycle stealing in DMA means? โœ… DMA borrows one bus cycle from CPU when needed.

Q24: Vectored interrupt provides? โœ… Direct address of ISR, avoiding polling.

Q25: Asynchronous data transfer uses? โœ… Handshaking protocol (request-acknowledge signals).

Unit 6: Memory Organization (Q26โ€“Q30)

Q26: Spatial locality means? โœ… If address X is accessed, addresses near X are likely to be accessed soon.

Q27: LRU replacement policy replaces? โœ… The block that hasn't been used for the longest time.

Q28: Write-back cache policy writes to main memory only when? โœ… The modified block is replaced (evicted) from cache.

Q29: Virtual memory uses? โœ… Page table to translate virtual addresses to physical addresses.

Q30: TLB (Translation Lookaside Buffer) is a cache for? โœ… Page table entries to speed up address translation.

Unit 7: Computer Arithmetic (Q31โ€“Q35)

Q31: In Booth's algorithm, Qโ‚€Qโ‚‹โ‚ = "01" means? โœ… Add multiplicand to accumulator (A = A + M).

Q32: Overflow in 2's complement addition occurs when? โœ… Two positive numbers give negative result or two negative give positive.

Q33: IEEE 754 single precision uses how many bits for exponent? โœ… 8 bits (with bias of 127).

Q34: Carry look-ahead adder advantage over ripple carry? โœ… O(log n) delay vs O(n) delay for n-bit addition.

Q35: Non-restoring division differs from restoring division in that? โœ… It doesn't restore the partial remainder; instead adds or subtracts based on sign of A.

Unit 8: Pipelining (Q36โ€“Q40)

Q36: Maximum speedup of a k-stage pipeline (nโ†’โˆž) is? โœ… k (approaches number of stages).

Q37: Forwarding (bypassing) solves which hazard? โœ… RAW data hazard โ€” by forwarding result from EX/MEM stage to dependent instruction.

Q38: Branch prediction accuracy of 90% means? โœ… 10% of branches cause pipeline flush โ€” penalty depends on pipeline depth.

Q39: Superscalar processor can issue? โœ… Multiple instructions per clock cycle using multiple functional units.

Q40: Pipeline bubble (NOP insertion) is used when? โœ… A hazard cannot be resolved by forwarding; the pipeline must stall for one or more cycles.

Appendix F

CPU Simulator Portfolio Checklist

ProjectCode DoneGitHub RepoREADMEScreenshotsUnit TestsLive DemoLinkedIn Post
1. Truth Table Generatorโ˜โ˜โ˜โ˜โ˜โ˜โ˜
2. Shift Register Simulatorโ˜โ˜โ˜โ˜โ˜โ˜โ˜
3. Fetch-Decode-Executeโ˜โ˜โ˜โ˜โ˜โ˜โ˜
4. CPU Simulator โญโ˜โ˜โ˜โ˜โ˜โ˜โ˜
5. Interrupt Handlerโ˜โ˜โ˜โ˜โ˜โ˜โ˜
6. Cache Simulatorโ˜โ˜โ˜โ˜โ˜โ˜โ˜
7. Booth's Calculatorโ˜โ˜โ˜โ˜โ˜โ˜โ˜
8. Pipeline Calculatorโ˜โ˜โ˜โ˜โ˜โ˜โ˜
Pro tip: Complete the โญ CPU Simulator first โ€” it's the highest-impact project. Then work outward. Each project should take 2โ€“5 hours. You can complete the entire portfolio in one focused weekend.
Appendix G

EduArtha COA Learning Path Map

UnitTopicEst. HoursKey ProjectsGATE Weightage
Unit 1Digital Logic & Boolean Algebra8โ€“10 hrsTruth Table Generatorโ˜…โ˜…โ˜…โ˜…โ˜† (8โ€“10 marks)
Unit 2Combinational & Sequential Circuits8โ€“10 hrsShift Register Simulatorโ˜…โ˜…โ˜…โ˜†โ˜† (5โ€“7 marks)
Unit 3Basic Computer Organization10โ€“12 hrsFetch-Decode-Execute Simulatorโ˜…โ˜…โ˜…โ˜…โ˜† (8โ€“10 marks)
Unit 4Central Processing Unit10โ€“12 hrsCPU Simulator โญโ˜…โ˜…โ˜…โ˜…โ˜… (10โ€“12 marks)
Unit 5I/O Organization6โ€“8 hrsPriority Interrupt Handlerโ˜…โ˜…โ˜…โ˜†โ˜† (5โ€“7 marks)
Unit 6Memory Organization10โ€“12 hrsCache Simulatorโ˜…โ˜…โ˜…โ˜…โ˜… (10โ€“15 marks)
Unit 7Computer Arithmetic6โ€“8 hrsBooth's Calculatorโ˜…โ˜…โ˜…โ˜†โ˜† (5โ€“8 marks)
Unit 8Pipelining & Advanced Topics8โ€“10 hrsPipeline Calculatorโ˜…โ˜…โ˜…โ˜…โ˜† (8โ€“10 marks)
Unit 9Capstone & Career Launchpad10โ€“12 hrsComplete Portfolio + GATE PrepSynthesizes all units

Total Estimated Time: 76โ€“94 hours  |  Total GATE Weightage: ~60โ€“80 marks (out of 100 for COA section)

Final Goal: Complete all 9 units โ†’ Build 8 portfolio projects on GitHub โ†’ Solve 40 GATE questions โ†’ Write LinkedIn profile โ†’ Apply to hardware companies. You are now GATE-Ready + Industry-Ready + Portfolio-Ready.

โœ… Computer Organization: COMPLETE! You are GATE-Ready & Industry-Ready.

[QR: Link to EduArtha video tutorial โ€” COA Capstone & CPU Simulator]