Computer Organization and Design
I gained a solid understanding of the underlying design and organization of a CPU. Topics covered include instruction set architectures, cache memory design, pipelining, superscalar processors, and multiprocessor designs.
Pipelined Microprocessor
The main project of this course was designing a single-core pipelined microprocessor that implements the 32-bit RISC-V instruction set. In addition to the microprocessor, we designed two levels of cache memory to speed up instruction fetches and data accesses. The project was written in SystemVerilog and required working proficiency with the RISC-V instruction set.
Pipelining
The main objective of the project was designing a basic five-stage pipelined microprocessor. At its core, pipelining exploits instruction-level parallelism by overlapping the execution of several instructions, each occupying a different hardware stage in the same cycle, which greatly increases CPU throughput. However, pipelining can increase the latency of an individual instruction because of the overhead added by the pipeline registers between stages.
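To make the throughput-versus-latency trade-off concrete, here is a small Python calculation. The per-stage delays and the pipeline-register overhead are made-up numbers chosen for illustration, not figures from the project; the point is only that the clock period shrinks to the slowest stage plus overhead, so a long run of instructions finishes sooner even though each instruction takes longer end to end.

```python
# Hypothetical per-stage logic delays in picoseconds (illustrative only,
# not measurements from the project).
STAGE_DELAYS_PS = {"IF": 200, "DE": 150, "EX": 200, "ME": 200, "WB": 100}
REG_OVERHEAD_PS = 20  # assumed pipeline-register cost added to every cycle


def single_cycle_time(n_instructions: int) -> tuple[int, int]:
    """Unpipelined: one long cycle covers all stages. Returns (latency, total)."""
    period = sum(STAGE_DELAYS_PS.values())
    return period, n_instructions * period


def pipelined_time(n_instructions: int) -> tuple[int, int]:
    """Pipelined: the clock is set by the slowest stage plus register overhead."""
    period = max(STAGE_DELAYS_PS.values()) + REG_OVERHEAD_PS
    stages = len(STAGE_DELAYS_PS)
    latency = stages * period
    # The first instruction needs `stages` cycles; each later one finishes
    # one cycle after its predecessor.
    total = (n_instructions + stages - 1) * period
    return latency, total


if __name__ == "__main__":
    n = 1000
    lat_1, tot_1 = single_cycle_time(n)
    lat_p, tot_p = pipelined_time(n)
    print(f"single-cycle: latency {lat_1} ps, {n} instructions in {tot_1} ps")
    print(f"pipelined:    latency {lat_p} ps, {n} instructions in {tot_p} ps")
    print(f"speedup: {tot_1 / tot_p:.2f}x")
```

With these assumed numbers, one instruction takes longer through the pipeline (1100 ps versus 850 ps), but 1000 instructions complete roughly 3.8 times faster.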
The five stages of the pipeline are as follows:
- Instruction Fetch (IF)
- Instruction Decode (DE)
- Execute (EX)
- Memory (ME)
- Writeback (WB)
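The overlap between the five stages can be visualized with a classic pipeline diagram. The sketch below assumes an ideal pipeline with no hazards or stalls, and the instruction strings are arbitrary independent examples; it only shows each instruction entering a stage one cycle after its predecessor.

```python
STAGES = ["IF", "DE", "EX", "ME", "WB"]


def pipeline_diagram(instrs: list[str]) -> list[list[str]]:
    """Return rows[i][c] = stage of instruction i in cycle c ('' if not in flight).

    Assumes an ideal pipeline: no hazards, no stalls, one new instruction per cycle.
    """
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i in range(len(instrs)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i occupies stage s in cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    program = ["addi x1, x0, 5", "addi x2, x0, 7", "xor x3, x4, x5", "or x6, x7, x8"]
    for name, row in zip(program, pipeline_diagram(program)):
        print(f"{name:<16}" + " ".join(f"{cell:>2}" for cell in row))
```

Once the first four cycles have filled the pipeline, one instruction completes per cycle, so n instructions take n + 4 cycles.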
As the name implies, the IF stage fetches 32-bit instructions from memory. To do this, a program counter register is required to provide the memory address of the instruction to be fetched.
Sequential instructions occupy adjacent memory addresses (the next PC is PC + 4), which makes fetching convenient when they are already in the instruction cache. However, if an instruction is not in the cache, this stage stalls until it has been fetched into the instruction cache.
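A rough sketch of sequential fetching through an instruction cache follows. The block size, the miss penalty and the "initially empty, unlimited capacity" cache are assumptions made for illustration; the sketch only shows why adjacent instruction addresses help: once a block is loaded, the following instructions in that block hit without stalling.

```python
BLOCK_BYTES = 16   # hypothetical cache block size (4 instructions per block)
MISS_PENALTY = 10  # hypothetical stall cycles to bring a block into the cache


def fetch_sequence(start_pc: int, n_instrs: int) -> tuple[list[int], int]:
    """Fetch n sequential instructions, returning (PCs fetched, stall cycles).

    Models only first-touch misses in an initially empty cache with unlimited
    capacity; branches are ignored, so the PC simply advances by 4.
    """
    resident_blocks: set[int] = set()
    pcs, stalls = [], 0
    pc = start_pc
    for _ in range(n_instrs):
        block = pc // BLOCK_BYTES
        if block not in resident_blocks:
            stalls += MISS_PENALTY  # IF stalls until the block arrives
            resident_blocks.add(block)
        pcs.append(pc)
        pc += 4  # 32-bit instructions: next PC = PC + 4
    return pcs, stalls
```

For example, fetching eight instructions from address 0 touches only two 16-byte blocks, so only the first instruction in each block misses.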
The instruction word encodes the operation and the operand(s) of the instruction. The DE stage "decodes" the instruction word into the separate fields needed by later stages, such as register numbers and the immediate value, and sets the control bits those stages use.
Since RISC-V is a load-store ISA, arithmetic and logical instructions take their operands from registers rather than directly from memory. The register file holding all general-purpose registers is therefore located in this stage, so register values can be read and passed on as operands.
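The bit positions of RV32I instruction fields are fixed by the RISC-V specification, which keeps decoding simple: the register specifiers always sit in the same place. The Python sketch below extracts those fields plus the sign-extended I-type immediate; it is an illustrative software model, not the project's SystemVerilog decoder, and the dictionary keys are names chosen here.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word: int) -> dict[str, int]:
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd":     (word >> 7) & 0x1F,   # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1":    (word >> 15) & 0x1F,  # bits 19:15
        "rs2":    (word >> 20) & 0x1F,  # bits 24:20
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
        # I-type immediate (e.g. addi, lw): bits 31:20, sign-extended
        "imm_i":  sign_extend(word >> 20, 12),
    }
```

For instance, `decode(0x00500093)` (the encoding of `addi x1, x0, 5`) yields opcode 0x13, rd 1, rs1 0 and imm_i 5. Which fields are meaningful depends on the opcode: for an R-type `add`, imm_i is ignored, while for `addi`, rs2 and funct7 overlap the immediate. A hardware decoder typically extracts all fields in parallel and lets the opcode-driven control bits select which ones are used.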
The EX stage is where the computation happens. The ALU resides in this stage, so all arithmetic and logical operations are completed here. Branch conditions are also resolved in this stage.
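A behavioral model of the ALU helps show what completing an arithmetic or logical operation involves on a 32-bit machine: results wrap modulo 2^32, shift amounts use only the low five bits, and signed and unsigned comparisons differ. The operation names follow RV32I mnemonics, but the string-based interface is an illustrative assumption, not the project's ALU control encoding.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit unsigned value as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: str, a: int, b: int) -> int:
    """Compute an RV32I-style ALU result; operands and result are 32-bit unsigned."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # RV32I shifts use only the low 5 bits of the amount
    if op == "add":
        r = a + b
    elif op == "sub":
        r = a - b
    elif op == "and":
        r = a & b
    elif op == "or":
        r = a | b
    elif op == "xor":
        r = a ^ b
    elif op == "sll":
        r = a << shamt
    elif op == "srl":
        r = a >> shamt                 # logical: zeros shifted in
    elif op == "sra":
        r = to_signed(a) >> shamt      # Python's >> on negatives is arithmetic
    elif op == "slt":
        r = int(to_signed(a) < to_signed(b))
    elif op == "sltu":
        r = int(a < b)
    else:
        raise ValueError(f"unsupported ALU op: {op}")
    return r & MASK32  # wrap to 32 bits, as the hardware does
```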
The ME (memory) stage is needed by instructions that read from or write to memory, namely loads such as lw and stores such as sw.
Like the IF stage, this stage interacts with an L1 cache, in this case the data cache. If a read or write targets a memory address that is not currently in the cache, this stage must stall the pipeline until the cache has retrieved the data.
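The following sketch models what the ME stage does for word loads and stores. The mem_read and mem_write flags stand in for control bits set in DE (hypothetical names), memory is a flat byte array in RISC-V's default little-endian byte order, and the cache and its miss stalls are deliberately left out to keep the example small.

```python
from typing import Optional


def mem_stage(mem_read: bool, mem_write: bool, addr: int,
              store_data: int, memory: bytearray) -> Optional[int]:
    """Perform a word load (lw) or store (sw) for the instruction in ME.

    Returns the loaded word for a load, otherwise None.
    """
    if (mem_read or mem_write) and addr % 4 != 0:
        raise ValueError("this sketch handles only word-aligned accesses")
    if mem_write:
        # sw: write the low 32 bits of the register value, least significant byte first
        memory[addr:addr + 4] = (store_data & 0xFFFFFFFF).to_bytes(4, "little")
        return None
    if mem_read:
        # lw: assemble four bytes into a 32-bit word
        return int.from_bytes(memory[addr:addr + 4], "little")
    return None  # ALU instructions pass through ME without touching memory
```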
In the WB stage, the result of any instruction that has a destination register is written back to that register. Examples include all arithmetic and logical instructions as well as load instructions.
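Writeback can be sketched as a multiplexer choosing between the ALU result and the loaded value, followed by a register-file write. The reg_write and mem_to_reg flags are hypothetical control-signal names; the one RISC-V detail the sketch relies on is that register x0 is hardwired to zero, so writes to it are discarded.

```python
class RegisterFile:
    """32 general-purpose registers; x0 always reads as zero."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, idx: int) -> int:
        return self.regs[idx]

    def write(self, idx: int, value: int) -> None:
        if idx != 0:  # writes to x0 are discarded
            self.regs[idx] = value & 0xFFFFFFFF


def writeback(rf: RegisterFile, reg_write: bool, mem_to_reg: bool,
              rd: int, alu_result: int, load_data: int) -> None:
    """WB stage: select the result source and write it to rd if enabled."""
    if reg_write:
        rf.write(rd, load_data if mem_to_reg else alu_result)
```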
Hazards
Several data hazards can arise in a pipelined microprocessor when instructions in different stages read or write the same register or memory address. They occur because a register's value is not updated until the writing instruction reaches the writeback stage. The three types of data hazard are:
- Read after Write (RAW)
- Write after Read (WAR)
- Write after Write (WAW)
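The three hazard types can be told apart purely by comparing the registers each instruction reads and writes. The helper below is a hypothetical software model (an instruction is reduced to its destination and source register sets) that classifies the hazards between an earlier and a later instruction; x0 is ignored because it always reads as zero and so cannot carry a dependence.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Register usage of one instruction (hypothetical minimal model)."""
    dest: set[int] = field(default_factory=set)  # registers written
    srcs: set[int] = field(default_factory=set)  # registers read


def hazards(earlier: Instr, later: Instr) -> set[str]:
    """Classify data hazards between two instructions in program order."""
    w1, r1 = earlier.dest - {0}, earlier.srcs - {0}  # x0 carries no dependence
    w2, r2 = later.dest - {0}, later.srcs - {0}
    found = set()
    if w1 & r2:
        found.add("RAW")  # later reads a register the earlier one writes
    if r1 & w2:
        found.add("WAR")  # later writes a register the earlier one reads
    if w1 & w2:
        found.add("WAW")  # both write the same register
    return found
```

For example, `add x3, x1, x2` followed by `sub x5, x3, x4` gives a RAW hazard on x3.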
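A RAW hazard occurs when an instruction reads a register that an earlier instruction, still in the pipeline, has not yet written. Because the earlier result has not been written back, the read would return a stale value.
A WAW hazard occurs when two instructions write to the same register and the later write could complete before the earlier one, leaving the wrong final value. This can only happen when instructions execute concurrently and may complete out of order; in a simple in-order pipeline, every instruction writes back in WB in program order.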
One technique for resolving such conflicts is to stall the pipeline. Later instructions then wait until the earlier instruction has written the register before reading its value.
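How long a stall lasts depends on how far apart the dependent instructions are. The function below computes it for the five stages used here, under two assumptions worth stating plainly: there is no forwarding, and the register file is written in the first half of a cycle and read in the second half, so a value written in WB can be read by DE in that same cycle.

```python
def raw_stalls(distance: int) -> int:
    """Stall cycles for a RAW dependence without forwarding.

    distance = how many instructions after the producer the consumer is
    (1 = immediately next). Assumes the producer writes the register file in
    WB, the consumer reads it in DE, and write-before-read within a cycle.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    producer_wb_cycle = 4             # IF=0, DE=1, EX=2, ME=3, WB=4
    consumer_de_cycle = distance + 1  # consumer's IF is `distance` cycles later
    return max(0, producer_wb_cycle - consumer_de_cycle)
```

For example, `add x3, x1, x2` immediately followed by `sub x5, x3, x4` (distance 1) costs two stall cycles, while placing two independent instructions between them (distance 3) removes the stall entirely.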
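Another technique is forwarding (bypassing): intermediate results are passed directly from later pipeline stages back to the EX stage instead of waiting for writeback. This increases throughput by avoiding most stalls, although in the classic five-stage pipeline a value loaded from memory is not available until the end of the ME stage, so an instruction that uses it immediately after the load still has to stall for one cycle.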
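A forwarding unit only has to compare register numbers across pipeline registers. The sketch below follows the well-known textbook approach (check the EX/ME pipeline register first, since it holds the most recent result, then ME/WB) and adds the load-use check that forces a one-cycle stall. The PipeReg fields and the string return values are hypothetical; the project's actual forwarding logic may differ.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Fields of a pipeline register relevant to forwarding (hypothetical names)."""
    reg_write: bool = False  # instruction will write a destination register
    mem_read: bool = False   # instruction is a load
    rd: int = 0              # destination register number


def forward_source(rs: int, ex_me: PipeReg, me_wb: PipeReg) -> str:
    """Pick where the EX stage should take source register rs from."""
    if rs == 0:
        return "REGFILE"  # x0 is always zero; never forward
    if ex_me.reg_write and ex_me.rd == rs:
        return "EX/ME"    # most recent producer takes priority
    if me_wb.reg_write and me_wb.rd == rs:
        return "ME/WB"
    return "REGFILE"


def load_use_stall(de_ex: PipeReg, rs1: int, rs2: int) -> bool:
    """True if the instruction in DE needs a load result that is not ready in time."""
    return de_ex.mem_read and de_ex.rd != 0 and de_ex.rd in (rs1, rs2)
```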
Branch Prediction
Branching is essential in most assembly languages: it implements conditionals and loops, and therefore the control flow of a program.
However, a branch's direction cannot be determined until the EX stage, by which time two more instructions have already been fetched into the IF and DE stages. If the branch goes the opposite way from what the pipeline assumed, those two instructions must be discarded, and essentially two cycles are wasted.
Branch predictors try to mitigate this problem by guessing the direction of a branch before it is resolved.
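One common predictor design is a table of 2-bit saturating counters indexed by the branch address, where a branch must be mispredicted twice in a row before the prediction flips. The course's own predictor is not described here, so treat this as a generic example; the 64-entry table size and the weakly-not-taken initial state are assumptions, and the two-cycle misprediction penalty comes from the EX-stage resolution described above.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits.

    Counter values 0-1 predict not taken; 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 6) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # low 2 bits are zero for 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def wasted_cycles(pred: TwoBitPredictor, pc: int, outcomes: list[bool],
                  penalty: int = 2) -> int:
    """Cycles lost to mispredictions, assuming each costs `penalty` cycles."""
    lost = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            lost += penalty
        pred.update(pc, taken)
    return lost
```

For a loop branch that is taken three times and then falls through, repeated twice, this predictor mispredicts once while warming up and once at each loop exit: three mispredictions, or six wasted cycles.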
More on branch predictors here.
Memory Structure
Alongside our microprocessor is a memory hierarchy. It includes two 2-way set-associative L1 caches (one for instructions and one for data), a 4-way set-associative L2 cache, and an eviction write buffer between the L2 cache and main memory.
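The set-associative lookup used by both cache levels can be sketched as follows: the address is split into a block offset, a set index and a tag, and within the selected set the least recently used way is replaced on a miss. LRU replacement and the parameter values in the usage note are assumptions for illustration, since the actual sizes and replacement policy of the project's caches are not stated here; only tags are tracked, with no data or dirty bits.

```python
class SetAssocCache:
    """Tag store of an N-way set-associative cache with LRU replacement."""

    def __init__(self, n_sets: int, ways: int, block_bytes: int) -> None:
        self.n_sets, self.ways, self.block_bytes = n_sets, ways, block_bytes
        # Each set holds resident tags ordered from least to most recently used.
        self.sets: list[list[int]] = [[] for _ in range(n_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the block (evicting LRU if full)."""
        block = addr // self.block_bytes  # drop the block offset
        index, tag = block % self.n_sets, block // self.n_sets
        lru = self.sets[index]
        if tag in lru:
            lru.remove(tag)
            lru.append(tag)  # mark as most recently used
            return True
        if len(lru) == self.ways:
            lru.pop(0)       # evict the least recently used way
        lru.append(tag)
        return False
```

With `ways=2` the class mirrors the L1 organization and with `ways=4` the L2. For example, `SetAssocCache(n_sets=2, ways=2, block_bytes=16)` maps addresses 0x00, 0x20 and 0x40 to the same set, so touching all three forces an eviction.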
More on the memory systems here.