  1. ILP
  2. Pipelining
  3. Low levels of cache
  4. SIMD and GPUs

VIPT (Virtually Indexed, Physically Tagged) vs PIPT (Physically Indexed, Physically Tagged): a VIPT cache is indexed with virtual-address bits while the TLB translates in parallel, then tag-compared with the physical address; see the sketch below
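
A minimal sketch of the VIPT constraint (parameters are illustrative, not BOOM's actual geometry): indexing with the virtual address is safe only when the set-index and line-offset bits fit inside the page offset, because those low bits are identical in the virtual and physical address.

```python
# Hypothetical geometry: 4 KiB pages, 64-byte lines, 64 sets.
PAGE_OFFSET_BITS = 12
LINE_OFFSET_BITS = 6
INDEX_BITS = 6

def vipt_index(vaddr: int) -> int:
    """Set index taken from the virtual address; no translation needed."""
    return (vaddr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def phys_tag(paddr: int) -> int:
    """Tag taken from the physical address, compared after the TLB lookup."""
    return paddr >> (LINE_OFFSET_BITS + INDEX_BITS)

# The whole index must sit inside the page offset, or two virtual aliases
# of one physical line could land in different sets.
assert LINE_OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS
```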

Fetch -> Pre-Decode (branch prediction based on PC via NLP and FTQ) -> Fetch Buffer -> Decode -> Register Rename -> ROB + Issue Queue (Dispatch) -> Issue (polled from the Issue Queue) -> Register Read -> Bypass -> Execute (send to a functional unit)

  1. How could BOOM support hit-under-miss in the IL1 cache?
    • Add an "I am looking for that thing" (MSHR-style) register/queue to the cache that stalls further misses but keeps serving hits; see the sketch below
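
A behavioral sketch of that idea (class and field names are hypothetical, not BOOM code): a single MSHR-like register remembers the line being fetched, later hits are served normally, and a second miss stalls until the first refill returns.

```python
class HitUnderMissCache:
    """Toy model: one outstanding miss, hits served underneath it."""

    def __init__(self):
        self.lines = {}            # resident lines: addr -> data
        self.pending_miss = None   # the "I am looking for that thing" register

    def read(self, addr):
        if addr in self.lines:
            return ("hit", self.lines[addr])   # hit served under the miss
        if self.pending_miss is None:
            self.pending_miss = addr           # record and issue the refill
            return ("miss-issued", None)
        return ("stall", None)                 # a second miss must wait

    def refill(self, addr, data):
        """Called when the outer memory returns the missing line."""
        if addr == self.pending_miss:
            self.lines[addr] = data
            self.pending_miss = None
```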

  2. Terminology:

  1. RVC - RISC-V Compressed
  2. Fetch Buffer - FIFO queue that instructions are fetched into from the IL1 cache
  3. Fetch Packet - instructions + metadata (such as a valid mask and some branch prediction info)
  4. Fetch Target Queue (FTQ) - holds the predictions coming from the BPD for each in-flight fetch packet:
    • Acts like a ROB for branch predictions; sketch of terms 2-4 below
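
A small sketch tying terms 2-4 together (field names are illustrative, not BOOM's exact layout):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class FetchPacket:
    pc: int                   # PC of the packet head
    insts: list               # instruction words fetched from the IL1 cache
    valid_mask: int           # which slots hold valid instructions
    bp_info: dict = field(default_factory=dict)   # branch prediction metadata

fetch_buffer = deque()        # FIFO between fetch and decode
ftq = deque()                 # per-packet prediction state, freed at resolve

pkt = FetchPacket(pc=0x8000_0000, insts=[0x13, 0x6F], valid_mask=0b11)
fetch_buffer.append(pkt)                 # decode pops from the other end
ftq.append((pkt.pc, pkt.bp_info))        # kept until the branches resolve
```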
  5. Load/Store Unit (LSU):
    • LAQ/LDQ (Load Address Queue) - allocated in Decode; carries a "store mask" recording which older S*Q entries the load may depend on
    • SAQ (Store Address Queue) & SDQ (Store Data Queue) - stores are fired into memory in program order; see the sketch below
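
A sketch of the store-mask check (the encoding is assumed; BOOM's actual memory disambiguation is more involved): a load may go to memory only once every older store in its mask has a known, non-overlapping address.

```python
def may_issue_load(store_mask: int, saq_valid: int, saq_addr: list, load_addr: int) -> bool:
    """store_mask: bit i set -> SAQ entry i is older than this load.
    saq_valid:  bit i set -> SAQ entry i's address has resolved.
    Addresses are assumed to be at line granularity for simplicity."""
    if store_mask & ~saq_valid:
        return False                       # an older store address is unknown
    for i in range(store_mask.bit_length()):
        if store_mask & (1 << i) and saq_addr[i] == load_addr:
            return False                   # overlap: wait or forward from SDQ
    return True                            # safe to send the load to memory
```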
  6. ROB (Reorder Buffer); sketch below:
    • Tracks the status of every instruction in the pipeline
    • Holds each instruction's speculative state (e.g. pending register updates) until it is safe to commit
    • Populated in program order at dispatch (after decode/rename)
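
A minimal circular-buffer ROB sketch (field names are assumptions): entries are allocated in order at dispatch, marked done at writeback, and retired in order from the head at commit.

```python
from collections import deque

class ROB:
    def __init__(self, size=8):
        self.size = size
        self.entries = deque()       # [uop, done?] pairs in program order

    def dispatch(self, uop):
        assert len(self.entries) < self.size, "ROB full: stall dispatch"
        self.entries.append([uop, False])

    def writeback(self, uop):
        for entry in self.entries:
            if entry[0] is uop:
                entry[1] = True      # result is ready, still speculative

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        while self.entries and self.entries[0][1]:
            yield self.entries.popleft()[0]
```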
  7. BTB (Branch Target Buffer) - table of branch targets and whether each branch was taken; sketch below:
    • BIM (Bimodal Table) --> the taken/not-taken counters
    • RAS (Return Address Stack) --> predicts return addresses (pushed on calls, popped on returns)
    • Tags --> first, a tag match locates the BTB entry
    • If a taken branch/jump has no entry, one is allocated for it
    • Only updated when the NLP made a branch misprediction
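
A simplified, direct-mapped sketch of an NLP-style lookup (sizes and hashing are assumptions): a tag match finds the entry, the BIM's 2-bit counter decides the direction, and the RAS supplies targets for returns.

```python
class BTB:
    def __init__(self, n_sets=64):
        self.n_sets = n_sets
        self.tags = [None] * n_sets
        self.targets = [0] * n_sets
        self.bim = [2] * n_sets           # 2-bit counters, init weakly-taken
        self.ras = []                     # return address stack

    def predict(self, pc):
        idx, tag = (pc >> 2) % self.n_sets, pc >> 2
        if self.tags[idx] != tag:
            return None                   # no entry: fall through (not taken)
        return self.targets[idx] if self.bim[idx] >= 2 else None

    def update(self, pc, taken, target):  # called on an NLP misprediction
        idx = (pc >> 2) % self.n_sets
        self.tags[idx], self.targets[idx] = pc >> 2, target
        c = self.bim[idx]
        self.bim[idx] = min(3, c + 1) if taken else max(0, c - 1)

    def call(self, pc):                   # push the return address on a call
        self.ras.append(pc + 4)

    def ret(self):                        # pop the predicted return target
        return self.ras.pop() if self.ras else None
```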
  8. NLP (Next-Line Predictor):
    • Is the BTB (together with the BIM and RAS)
    • Small capacity, but expensive in terms of area and power
    • Very simple -> cannot learn complex or long-history patterns
  9. BPD (Backing Predictor); gshare-style sketch below:
    • Its tables are updated mostly during commit, to avoid pollution from wrong-path instructions
    • Its global history, however, is updated at execute, since it must be repaired immediately on a misspeculation
    • Once all instructions in an info packet have committed, the packet is sent to the BPD so it can be updated with the eventual taken/not-taken outcomes
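
A gshare-style sketch of a commit-trained BPD (BOOM's actual predictor is configurable and more elaborate): the table index hashes the PC with global history, and the 2-bit counters are only trained at commit.

```python
class GShare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.counters = [1] * (1 << bits)    # 2-bit counters, weakly not-taken

    def _index(self, pc, ghist):
        return ((pc >> 2) ^ ghist) & self.mask

    def predict(self, pc, ghist):
        return self.counters[self._index(pc, ghist)] >= 2    # True = taken

    def commit_update(self, pc, ghist, taken):
        """Train only at commit, so wrong-path branches never pollute."""
        i = self._index(pc, ghist)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
```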
  10. GHR (Global History Register); sketch below:
    • Part of the BPD
    • Contains the outcomes of the previous n branches (n is the size of the GHR)
    • Updated speculatively as soon as a branch is fetched and predicted
    • Each Fetch Packet snapshots the GHR so it can be restored on a misprediction
    • The BPD also maintains a commit copy of the GHR in case of exceptions
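
A sketch of speculative GHR management (widths and interfaces are assumed):

```python
class GHR:
    def __init__(self, n=16):
        self.n = n
        self.spec = 0                        # speculative history
        self.commit = 0                      # commit copy, exception-safe
        self.snapshots = {}                  # fetch-packet id -> old history

    def _shift(self, hist, taken):
        return ((hist << 1) | int(taken)) & ((1 << self.n) - 1)

    def on_predict(self, pkt_id, taken):
        self.snapshots[pkt_id] = self.spec   # snapshot before the update
        self.spec = self._shift(self.spec, taken)

    def on_mispredict(self, pkt_id, actual_taken):
        old = self.snapshots[pkt_id]         # rewind to the snapshot...
        self.spec = self._shift(old, actual_taken)   # ...and redo with truth

    def on_commit(self, taken):
        self.commit = self._shift(self.commit, taken)

    def on_exception(self):
        self.spec = self.commit              # restore from the commit copy
```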
  11. Branch Rename Snapshots:
    • Metadata and prediction snapshots taken at each branch, used to restore the renamer and branch predictor after a misprediction
  12. Rename Map Table - holds speculative mappings from ISA registers to physical registers
  13. Committed Map Table - rename map for the committed state (allows a single-cycle reset on a flush)
  14. The Busy Table - tracks the readiness of each physical register -> fire an instruction once all of its operands are ready
  15. The Free List - tracks which physical registers are free (a bit-vector); sketch of terms 12-15 below
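
A sketch tying terms 12-15 together (sizes are arbitrary, and a plain list stands in for the bit-vector):

```python
class Rename:
    def __init__(self, n_isa=32, n_phys=64):
        self.map = list(range(n_isa))             # speculative map table
        self.committed = list(self.map)           # committed map table
        self.free = list(range(n_isa, n_phys))    # free list
        self.busy = [False] * n_phys              # busy table

    def rename(self, rd):
        """Allocate a fresh physical register for destination rd."""
        p = self.free.pop(0)
        self.map[rd] = p
        self.busy[p] = True                       # value not produced yet
        return p

    def writeback(self, p):
        self.busy[p] = False                      # operand ready: wake-up

    def commit(self, rd, new_p):
        old = self.committed[rd]
        self.committed[rd] = new_p
        self.free.append(old)                     # old mapping is now dead

    def flush(self):
        self.map = list(self.committed)           # single-cycle reset
```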
  16. Issue Queue - holds dispatched uops until they can be executed on the functional units; sketch below:
    • Unordered - select any ready uop
    • Age-ordered - select the oldest ready uop
    • Wake-up - a result broadcast marks matching waiting operands as ready
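
A sketch of wake-up and select (the entry layout is assumed, and ages are taken to be unique):

```python
class IssueQueue:
    def __init__(self):
        self.slots = []                   # (age, uop, {not-yet-ready operands})

    def dispatch(self, age, uop, pending_srcs):
        self.slots.append((age, uop, set(pending_srcs)))

    def wakeup(self, preg):
        """Result broadcast: clear this operand everywhere it is awaited."""
        for _, _, pending in self.slots:
            pending.discard(preg)

    def select(self, age_ordered=True):
        ready = [s for s in self.slots if not s[2]]
        if not ready:
            return None
        pick = min(ready, key=lambda s: s[0]) if age_ordered else ready[0]
        self.slots.remove(pick)
        return pick[1]                    # issue this uop to a functional unit
```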
  17. Bypass network - forwards results from the end of Execute straight to dependent uops' inputs instead of waiting for register writeback; similar in spirit to the operand forwarding of Tomasulo's common data bus
  18. CSR (Control and Status Register)
  19. RVWMO - RISC-V Weak Memory Ordering; litmus-test sketch below:
    • Newer loads may execute before older stores
    • Loads to the same address appear in program order
    • A hart can read its own write early (store-buffer forwarding)
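
A litmus-test sketch of the first bullet (a toy enumerator, not an RVWMO model): in the classic store-buffering test, the outcome r0 == r1 == 0 is impossible under sequential consistency but becomes reachable once each hart's load may hoist above its older store.

```python
# hart 0: x = 1; r0 = y          hart 1: y = 1; r1 = x
def run(seq):
    """Execute one merged sequence of ('var','st') / ('var','ld','reg') ops."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op in seq:
        if op[1] == "st":
            mem[op[0]] = 1
        else:
            regs[op[2]] = mem[op[0]]
    return regs["r0"], regs["r1"]

def merges(a, b):
    """All interleavings that preserve each hart's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

po0 = [("x", "st"), ("y", "ld", "r0")]
po1 = [("y", "st"), ("x", "ld", "r1")]

sc = {run(s) for s in merges(po0, po1)}           # program order kept
weak = set(sc)
for h0 in (po0, po0[::-1]):                       # RVWMO: the load may
    for h1 in (po1, po1[::-1]):                   # pass the older store
        weak |= {run(s) for s in merges(h0, h1)}

print((0, 0) in sc)    # False: forbidden under sequential consistency
print((0, 0) in weak)  # True: allowed under RVWMO
```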
  20. Cache (the data cache is kept coherent); sketch below:
    • S0: Send request address
    • S1: Access SRAM
    • S2: Perform way-select and format response data
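
A sketch of the three-stage access (geometry and SRAM layout are assumptions): S0 presents the address, S1 reads the tag and data arrays for every way, S2 compares tags (way-select) and formats the response.

```python
def cache_access(addr, tag_sram, data_sram, n_ways=4):
    """tag_sram[w][set] -> tag, data_sram[w][set] -> line bytes."""
    # S0: send the request address, split into offset / index / tag
    offset = addr & 0x3F                  # 64-byte lines
    index = (addr >> 6) & 0x3F            # 64 sets
    tag = addr >> 12
    # S1: access the SRAMs, reading all ways in parallel
    way_tags = [tag_sram[w][index] for w in range(n_ways)]
    way_data = [data_sram[w][index] for w in range(n_ways)]
    # S2: way-select (tag compare) and response formatting
    for w in range(n_ways):
        if way_tags[w] == tag:
            return way_data[w][offset:offset + 8]   # e.g. one doubleword
    return None                                     # miss
```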
  21. Every branch passed down the pipeline remembers its own PC and its Fetch Packet's head PC (in the ROB)