Build a RISC-V Pipeline Simulator: A Step-by-Step Guide for CDA 4102/5155
Learn to build a cycle-accurate RISC-V pipeline simulator for CDA 4102/5155. Covers fetch, decode, issue, execute, and writeback with scoreboarding and hazard handling.
Understanding the RISC-V Pipeline Simulator Project
If you're taking CDA 4102 or CDA 5155 in Fall 2025, you already know that Project 2 is all about creating a cycle-by-cycle simulator for a pipelined RISC-V processor. This is no small task—it's like building a miniature CPU in software. But don't worry: this guide will walk you through the key concepts, using timely analogies to make the abstract concrete. Think of it as the playbook for your team's championship run, where every cycle counts.
Pipeline Basics: The Assembly Line Analogy
A pipelined processor works like a modern car assembly line. At a Tesla Gigafactory, each station performs a specific task: installing the battery pack, attaching doors, mounting wheels. Similarly, your RISC-V pipeline has stages: Fetch, Decode, Issue, Execute, and Writeback. Instructions flow through these stages, and at the end of each clock cycle, they move to the next stage. Your simulator must track this flow cycle by cycle, just as a factory manager tracks each car's progress.
The key is that multiple instructions are in the pipeline at once. If one stage stalls (say, because a needed part isn't ready), the whole line might pause. In your simulator, you'll model this with queues and registers.
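One way to model a fixed-size pipeline queue is a small wrapper that refuses new entries when it is full, so the producing stage knows it has to stall. This is only a sketch: the `BoundedQueue` class and its method names are hypothetical helpers, not something the project spec defines.

```python
from collections import deque


class BoundedQueue:
    """FIFO queue with a fixed number of entries (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def free_slots(self):
        return self.capacity - len(self.items)

    def push(self, item):
        # Refuse the entry instead of growing; the caller must stall.
        if self.free_slots() == 0:
            return False
        self.items.append(item)
        return True

    def pop(self):
        # Oldest entry first; None when the queue is empty.
        return self.items.popleft() if self.items else None


# Example: a 4-entry queue holding two decoded instructions.
pre_issue = BoundedQueue(capacity=4)
for text in ["add x1, x2, x3", "lw x4, 0(x1)"]:
    pre_issue.push(text)
```

Returning `False` from `push` rather than raising an exception keeps the stall decision in the calling stage, which is where the spec's timing rules apply.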
Fetch and Decode: The First Two Stages
The Instruction Fetch/Decode (IF) unit can fetch up to two instructions per cycle, in program order. Before fetching, it checks two conditions: the fetch unit must not be stalled on an unresolved branch, and the Pre-Issue queue must have an empty slot. If either condition fails, no new instructions enter that cycle. This is like a ticket booth at a concert: if the entry line is full, no new tickets are sold until someone moves inside.
When a branch instruction (like beq or jal) is fetched, the unit tries to compute the target address right away. If the registers are ready, the PC updates immediately—zero penalty. Otherwise, the fetch unit stalls until the registers are available. This is a classic control hazard.
If a branch is fetched together with another instruction, the outcome depends on their order. When the branch comes first, the instruction after it is discarded. When the branch comes second, both are decoded normally. Your simulator must handle these scenarios exactly as described in the project spec.
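The fetch rules above can be sketched as a single function that decides what one cycle of fetch produces. Several things here are assumptions for illustration: the `fetch_step` name, the tuple layout of an instruction, the 4-byte instruction spacing, the starting address 256, the `BRANCH_OPS` set, and the choice to fetch at most as many instructions as there are free Pre-Issue slots. The function does not compute the branch target or update the PC; check the spec for those details.

```python
BRANCH_OPS = {"beq", "bne", "blt", "jal"}  # assumed subset; check the spec


def fetch_step(program, pc, free_slots, stalled, branch_ready):
    """Return (instructions to enqueue, fetched branch or None, stalled flag).

    program:      dict mapping address -> instruction tuple (assumed layout)
    free_slots:   empty Pre-Issue entries at the end of the previous cycle
    stalled:      True while an earlier branch is still waiting on registers
                  (the caller re-checks that waiting branch each cycle)
    branch_ready: callable telling whether a branch's registers are ready
    """
    if stalled or free_slots == 0:
        return [], None, stalled
    to_queue = []
    for slot in range(min(2, free_slots)):
        instr = program.get(pc + 4 * slot)
        if instr is None:
            break
        if instr[0] in BRANCH_OPS:
            # Branches never enter Pre-Issue. Returning here also drops
            # any instruction that would have been fetched after it.
            return to_queue, instr, not branch_ready(instr)
        to_queue.append(instr)
    return to_queue, None, False


program = {
    256: ("add", "x1", "x2", "x3"),
    260: ("beq", "x1", "x0", 8),
    264: ("sub", "x5", "x6", "x7"),
}
# Branch is second and its registers are not ready yet: the add is
# queued, the beq is held, and fetch stalls.
queued, branch, stalled = fetch_step(
    program, 256, free_slots=4, stalled=False, branch_ready=lambda instr: False
)
```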
The Pre-Issue Queue and Scoreboarding
After decoding, instructions go into the Pre-Issue queue (4 entries). The Issue unit then uses a scoreboard algorithm to issue instructions out-of-order, up to three per cycle. It checks for structural hazards: only one load/store can go to ALU1, one arithmetic to ALU2, one logical to ALU3 per cycle. The issue unit scans from entry 0 to 3, issuing instructions whose source operands are ready and whose target functional unit queue has space.
Think of this like a ride-sharing dispatch system: you have three types of vehicles (ALU1, ALU2, ALU3), each with a limited queue. The dispatcher (Issue unit) assigns riders (instructions) to available vehicles as soon as all passengers (operands) are present.
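As a rough sketch of the scan described above, the function below walks the Pre-Issue queue from entry 0 and picks at most one instruction per functional unit. The `select_issue` name, the tuple layout, and the `FU_FOR_OP` mapping are assumptions made for this example. The hazard checks against earlier queue entries follow the usual scoreboard rules (RAW, WAW, WAR) and should be verified against the project spec; ordering rules between loads and stores are left out entirely.

```python
FU_FOR_OP = {  # assumed mapping, based on the description above
    "lw": "ALU1", "sw": "ALU1",
    "add": "ALU2", "sub": "ALU2", "addi": "ALU2",
    "and": "ALU3", "or": "ALU3",
}


def select_issue(pre_issue, reg_ready, fu_free_slots):
    """Return the Pre-Issue indices to issue this cycle.

    pre_issue:     list of (op, dest, sources) tuples in queue order;
                   dest is None for instructions that write no register
    reg_ready:     dict register -> bool as of the end of the previous cycle
                   (registers missing from the dict count as ready)
    fu_free_slots: dict unit name -> free entries in that unit's input queue
    """
    issued = []
    used_units = set()
    earlier_writes = set()  # destinations of every earlier queue entry
    earlier_reads = set()   # sources of earlier entries that did not issue
    for index, (op, dest, sources) in enumerate(pre_issue):
        unit = FU_FOR_OP[op]
        blocked = (
            unit in used_units                                   # one per unit per cycle
            or fu_free_slots.get(unit, 0) == 0                   # unit queue is full
            or not all(reg_ready.get(r, True) for r in sources)  # RAW, in flight
            or any(r in earlier_writes for r in sources)         # RAW, in queue
            or not reg_ready.get(dest, True)                     # WAW, in flight
            or dest in earlier_writes                            # WAW, in queue
            or dest in earlier_reads                             # WAR, in queue
        )
        if blocked:
            earlier_reads.update(sources)
        else:
            issued.append(index)
            used_units.add(unit)
        if dest is not None:
            earlier_writes.add(dest)
    return issued


queue = [
    ("add", "x1", ("x2", "x3")),
    ("lw", "x4", ("x1",)),         # waits: reads x1 written by entry 0
    ("and", "x5", ("x6", "x7")),
    ("sub", "x8", ("x9", "x10")),  # waits: ALU2 already taken by entry 0
]
chosen = select_issue(queue, reg_ready={}, fu_free_slots={"ALU1": 2, "ALU2": 2, "ALU3": 2})
```

Here `chosen` is `[0, 2]`: the `add` and the `and` issue out of order around the waiting `lw`.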
Data Hazards and Forwarding
One of the trickiest parts is handling data hazards. For example, if instruction A writes to register x1, and instruction B reads x1, B must wait until A's result is ready. In a real pipeline, this is often solved with forwarding (bypassing). However, your simulator's issue unit only issues instructions when operands are ready at the end of the previous cycle. This means you must track register writes and reads carefully.
Your simulator should model the register file so that a write in the WB stage becomes visible only at the end of that cycle. If an instruction waiting to issue needs a value that WB is writing in the same cycle, it cannot issue until the following cycle. This is a common pitfall, so make sure your scoreboard logic checks the status of each register.
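A simple way to get this timing right is to buffer writes during the cycle and commit them in one step at the end. The sketch below assumes a 32-entry register file; the `RegisterFile` class and its `end_cycle` method are hypothetical names chosen for illustration.

```python
class RegisterFile:
    """Register file whose writes become visible only at end of cycle."""

    def __init__(self):
        self.values = [0] * 32
        self.pending = {}  # register index -> value written this cycle

    def read(self, index):
        # Always returns the value as of the end of the previous cycle.
        return self.values[index]

    def write(self, index, value):
        if index != 0:  # x0 is hardwired to zero in RISC-V
            self.pending[index] = value

    def end_cycle(self):
        # Commit every buffered write in one step.
        for index, value in self.pending.items():
            self.values[index] = value
        self.pending.clear()


regs = RegisterFile()
regs.write(1, 42)             # WB writes x1 during this cycle
same_cycle = regs.read(1)     # still the old value, 0
regs.end_cycle()
next_cycle = regs.read(1)     # now 42
```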
Memory and Writeback
The Memory (MEM) stage handles load and store instructions. For loads, it reads from data memory; for stores, it writes. The Writeback (WB) stage writes results back to the register file. Both stages have their own queues (Pre-MEM, Post-MEM, etc.). Your simulator must print the contents of all registers, queues, and memory at each cycle, as shown in the sample output.
This is where the cycle-by-cycle simulation becomes critical. Every cycle, you update the state of the pipeline: move instructions from one stage to the next, update the PC, check for hazards, and output the trace.
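The per-cycle update can be organized as a loop that freezes a copy of the previous state, lets every stage read from that copy, and writes results into the live state. The following is a minimal sketch under assumptions: the `run` function, the dictionary-based state, the `done` flag, and the `toy_step` stand-in are all invented for this example and do not reflect the required output format.

```python
import copy


def run(state, step, max_cycles=1000):
    """Drive the simulation one cycle at a time and collect a trace.

    state: dict holding the PC, queues, registers, memory, and a 'done' flag
    step:  function(prev, nxt) that reads prev and writes nxt for one cycle
    """
    trace = []
    cycle = 0
    while not state["done"] and cycle < max_cycles:
        cycle += 1
        prev = copy.deepcopy(state)  # frozen view of the end of last cycle
        step(prev, state)            # stages read prev, write state
        trace.append((cycle, copy.deepcopy(state)))
    return trace


def toy_step(prev, nxt):
    # Stand-in for the real stages: advance the PC by one instruction.
    nxt["pc"] = prev["pc"] + 4
    nxt["done"] = nxt["pc"] >= 264


# The starting address 256 is arbitrary here.
trace = run({"pc": 256, "done": False}, toy_step)
```

Because every stage reads only from `prev`, the order in which stages run inside `step` cannot leak a same-cycle update from one stage into another.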
Trend Connection: AI Inference Pipelines
Interestingly, the concept of pipelining isn't limited to CPUs. In modern AI inference (like running a large language model), requests go through a pipeline: tokenization, embedding, transformer blocks, and output generation. Each stage can be parallelized, and bottlenecks are managed with queues. Understanding your RISC-V pipeline gives you insight into how hardware accelerators for AI work—like NVIDIA's Tensor Cores or Google's TPU.
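For example, a server that batches chatbot requests has to respect data dependencies (each generated token depends on the tokens before it) and resource limits (a fixed number of accelerator slots), which loosely mirror the data and structural hazards in your simulator. The scoreboard you implement is a simplified relative of the dependency tracking used in out-of-order CPUs and in GPU instruction schedulers.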
Implementation Tips
- Start with a single source file in C, C++, Java, or Python. Keep it modular but in one file to avoid linking issues.
- Define data structures for instructions, queues, and pipeline stages. Use arrays or lists for queues with fixed sizes as per the spec.
- Model the clock cycle as a loop. At the start of each cycle, check conditions based on the end of the previous cycle. Update state at the end of the cycle.
- Handle branches carefully. Remember that branch instructions are not written to the Pre-Issue queue—they are resolved in fetch. If the branch target is computed, update PC immediately; otherwise, stall fetch.
- Test with the sample file first. Your output should match the sample simulation exactly. Then create your own test cases to cover edge cases like multiple branches, data hazards, and structural hazards.
Common Mistakes to Avoid
- Ignoring the end-of-cycle timing. Many students mistakenly update registers in the middle of a cycle. Remember: reads see values from the end of the previous cycle, writes take effect at the end of the current cycle.
- Not handling the Pre-Issue queue correctly. The fetch unit decides whether it can fetch based on the number of empty Pre-Issue slots at the end of the previous cycle. If the queue was full at that point, no new instructions are fetched.
- Forgetting to discard instructions after a branch. When a branch is fetched with its next instruction, the next instruction is discarded immediately. Make sure you don't decode it.
- Misunderstanding the issue unit's search order. It scans from entry 0 to 3, issuing up to three instructions per cycle, but only one per functional unit type.
Sample Input and Output
Your simulator, named Vsim, takes an input file and produces simulation.txt. The sample input might have instructions like:
add x1, x2, x3
lw x4, 0(x1)
beq x4, x0, loop
sub x5, x6, x7
Your output should show, for each cycle, the PC, fetched instructions, contents of the Pre-Issue queue, register file, and memory. The sample output from the assignment is your best friend: match it exactly.
Conclusion
Building a RISC-V pipeline simulator is challenging but rewarding. It gives you a deep understanding of how modern processors work, from the fetch unit to the writeback stage. Use the analogies here—factory assembly lines, ride-sharing dispatch, AI inference pipelines—to keep the big picture in mind. And remember: the key is to follow the spec precisely, especially the timing of reads and writes. Good luck!