- ILP 3, H
- Pipelining C
- Low levels of cache 2
- SIMDs and GPUs 4
- VIPT vs PIPT
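The key VIPT constraint can be checked with a one-line calculation (a sketch; the function name is mine, and 4 KiB is the usual RISC-V base page size): the set index must come entirely from page-offset bits, which translation does not change, so each way must fit within a page.

```python
def vipt_index_fits_in_page(cache_bytes, ways, page_bytes=4096):
    """For a VIPT cache the set index must come entirely from the page
    offset (bits untouched by translation), i.e. bytes-per-way <= page size.
    Otherwise the cache behaves like PIPT only after translation, or risks
    aliasing."""
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes
```

E.g. a 32 KiB 8-way cache has 4 KiB per way and can be indexed virtually; a 32 KiB 4-way cache cannot.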
- Fetch -> Pre-Decode (branch prediction based on PC via NLP and FTQ) -> Fetch Buffer ->
  -> Decode -> Register Rename -> ROB + Issue Queue (Dispatch) -> Issue (polled from the Issue Queue) ->
  -> Register Read -> Bypass -> Execute (send to a functional unit)
- How could BOOM support hit-under-miss in the IL1 cache?
  - Add an "I am looking for that thing" register/queue (essentially an MSHR) to the cache: it stalls further misses but keeps serving read hits
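A toy sketch of that idea (the class and method names are mine, and this is far simpler than a real IL1): one miss-status register lets the cache keep serving hits while a single miss is outstanding; a second concurrent miss stalls.

```python
class HitUnderMissCache:
    """Toy IL1 model: one MSHR allows hits to be served under a single
    outstanding miss; a second concurrent miss stalls the requester."""

    def __init__(self):
        self.lines = {}          # addr -> data (resident cache lines)
        self.mshr = None         # address of the one outstanding miss

    def read(self, addr):
        if addr in self.lines:
            return ("hit", self.lines[addr])   # served even under a miss
        if self.mshr is None:
            self.mshr = addr                   # "I am looking for that thing"
            return ("miss-issued", None)
        return ("stall", None)                 # MSHR busy -> extra miss stalls

    def refill(self, data):
        """Memory response arrives for the outstanding miss."""
        self.lines[self.mshr] = data
        self.mshr = None
```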
- RVC - RISC-V Compressed instructions
- Fetch Buffer - FIFO queue that instructions are fetched into from the L1 I-cache
- Fetch Packet - instructions + metadata (such as a valid mask and some branch-prediction info)
- Fetch Target Queue (FTQ) - holds predictions from the BPD:
  - Acts as a ROB for branch predictions
- Load/Store Unit (LSU):
  - LAQ/LDQ (Load Address Queue) - allocated in Decode; has a "store mask" for S*Q dependencies
  - SAQ (Store Address Queue) & SDQ (Store Data Queue) - fired into memory in program order
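The store-mask check can be sketched as a simple disambiguation rule (my own simplified formulation, not BOOM's actual logic): a load may go to memory only once every older store in its mask has a resolved, non-aliasing address; an aliasing resolved store means the data should come from the SDQ instead.

```python
def can_load_fire(store_mask, store_addr_valid, store_addrs, load_addr):
    """Toy memory disambiguation: decide what a load can do given the
    older stores named in its store mask."""
    for idx in store_mask:
        if not store_addr_valid[idx]:
            return "wait"          # an older store's address is unresolved
        if store_addrs[idx] == load_addr:
            return "forward"       # take the data from the SDQ instead
    return "fire"                  # safe to access memory
```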
- ROB (Reorder Buffer):
  - Tracks the status of every instruction in the pipeline
  - Holds speculative and committed data (for instructions, i.e. register states)
  - Populated at dispatch (from the fetch buffer)
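The ROB's core behavior can be sketched as a FIFO that allocates in program order and retires only completed instructions from the head (a toy model; names are mine, and a real ROB is a fixed circular buffer with many more fields):

```python
from collections import deque

class ROB:
    """Toy reorder buffer: entries are allocated in program order at
    dispatch and retired in order only once marked complete."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()        # each entry: [name, done?]

    def dispatch(self, name):
        if len(self.entries) == self.size:
            return False              # ROB full -> stall dispatch
        self.entries.append([name, False])
        return True

    def complete(self, name):         # completion may be out of order
        for e in self.entries:
            if e[0] == name:
                e[1] = True

    def commit(self):                 # retirement is strictly in order
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired
```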
- BTB (Branch Target Buffer) - table of branch destinations and whether each branch was taken:
  - BIM (Bimodal Table) --> the taken/not-taken table itself
  - RAS (Return Address Stack) --> stack of return addresses, used to predict function returns
  - Tags --> first, tag-match to find a BTB entry
  - If there is no entry for a taken branch/jump, one is allocated
  - Only updated when the NLP made a branch misprediction
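The lookup/allocate flow above can be sketched as a tiny direct-mapped BTB (a toy; field layout and names are mine, and real BTBs are set-associative with counters rather than a single taken bit):

```python
class BTB:
    """Toy direct-mapped BTB: tag-match on the PC; allocate/update an
    entry only after a misprediction, per the notes above."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries   # each: (tag, target, taken_bit)

    def _index_tag(self, pc):
        return pc % self.entries, pc // self.entries

    def lookup(self, pc):
        idx, tag = self._index_tag(pc)
        e = self.table[idx]
        if e is not None and e[0] == tag:
            return e[1], e[2]           # predicted target, taken?
        return None                     # no entry -> predict fall-through

    def update(self, pc, target, taken):
        idx, tag = self._index_tag(pc)
        self.table[idx] = (tag, target, taken)
```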
- NLP (Next-Line Predictor):
  - Is the BTB (together with the BIM and RAS)
  - Small capacity, but expensive in terms of area and power
  - Very simple -> cannot learn complex or long history patterns
- BPD (Backing Predictor):
  - Updated mostly during commit, to avoid pollution from wrong-path instructions
  - Some state is updated at execute, since it relies on global history and must be repaired on misspeculation
  - Once all instructions in an info packet have committed, the packet is sent to the BPD so it can be updated with the final taken/not-taken outcomes
- GHR (Global History Register):
  - Part of the BPD
  - Contains the outcomes of the previous n branches (n is the size of the GHR)
  - Updated speculatively once a branch is fetched and predicted
  - Each Fetch Packet snapshots the GHR in case of misprediction
  - A commit copy of the GHR is maintained by the BPD in case of exceptions
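The speculative-update-plus-snapshot scheme can be sketched like this (a toy model; method names are mine, and the history is kept as a bit-vector of the last n outcomes):

```python
class GHR:
    """Toy global history register: updated speculatively at predict time,
    snapshotted per fetch packet, restored on misprediction."""

    def __init__(self, n):
        self.n = n
        self.hist = 0                   # last n outcomes as a bit-vector

    def predict_and_update(self, predicted_taken):
        snapshot = self.hist            # saved alongside the fetch packet
        self.hist = ((self.hist << 1) | int(predicted_taken)) & ((1 << self.n) - 1)
        return snapshot

    def restore(self, snapshot, actual_taken):
        """On mispredict: roll back to the snapshot, then re-apply the
        branch's real outcome."""
        self.hist = ((snapshot << 1) | int(actual_taken)) & ((1 << self.n) - 1)
```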
- Branch Rename Snapshots:
  - Metadata and prediction snapshots that are used to fix the branch predictor after mispredictions
- Rename Map Table - holds speculative mappings from ISA registers to physical registers
- Committed Map Table - rename map for the committed state (single-cycle reset)
- The Busy Table - tracks the readiness of each physical register -> an instruction fires when all of its operands are ready
- The Free List - tracks which physical registers are free (a bit-vector)
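The interplay of these three structures can be sketched in a few lines (a toy model; names and the list-based free list are mine — hardware uses a bit-vector and also maintains the committed map, which is omitted here):

```python
class RenameStage:
    """Toy register rename: map ISA regs to physical regs drawn from a
    free list; destinations stay busy until their value is produced."""

    def __init__(self, num_isa, num_phys):
        self.map_table = list(range(num_isa))            # speculative mappings
        self.free_list = list(range(num_isa, num_phys))  # unused phys regs
        self.busy = [False] * num_phys

    def rename(self, dst, srcs):
        psrcs = [self.map_table[s] for s in srcs]  # read current mappings
        pdst = self.free_list.pop(0)               # grab a free phys reg
        self.map_table[dst] = pdst
        self.busy[pdst] = True                     # value not produced yet
        return pdst, psrcs

    def writeback(self, pdst):
        self.busy[pdst] = False                    # operand now ready

    def ready(self, psrcs):
        return all(not self.busy[p] for p in psrcs)
```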
- Issue Queue - stores uops waiting to be executed on the functional units:
  - Unordered (selects any ready uop)
  - Age-ordered (selects the oldest ready uop)
  - Wake-up (a completing uop broadcasts its destination tag to mark dependent operands ready)
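Wake-up and age-ordered select can be sketched together (a toy model; names are mine, and real hardware does the broadcast and selection in parallel CAM/priority logic, not loops):

```python
class IssueQueue:
    """Toy age-ordered issue queue: wake-up broadcasts a completed tag to
    all waiting uops; select picks the oldest uop with no pending operands."""

    def __init__(self):
        self.slots = []              # (age, name, set-of-pending-tags)
        self.age = 0

    def dispatch(self, name, pending_tags):
        self.slots.append((self.age, name, set(pending_tags)))
        self.age += 1

    def wakeup(self, tag):
        for _, _, pending in self.slots:
            pending.discard(tag)     # this operand became ready

    def select(self):
        ready = [s for s in self.slots if not s[2]]
        if not ready:
            return None
        oldest = min(ready)          # age-ordered selection
        self.slots.remove(oldest)
        return oldest[1]
```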
- Bypass network - forwards results between functional units before writeback, similar in spirit to the common data bus in Tomasulo's algorithm
- CSR (Control and Status Register)
- RVWMO - RISC-V Weak Memory Ordering:
  - Younger loads may execute before older stores (to different addresses)
  - Loads to the same address appear in order
  - Writes can be read early (a hart may see its own store before it is globally visible)
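The first and last points can be illustrated with a toy store-buffer model (my own simplification, not the formal RVWMO model): stores sit in a buffer before draining to memory, so a younger load can complete before an older store is visible, while the core itself forwards its own buffered write.

```python
class StoreBufferCore:
    """Toy core with a store buffer: stores are delayed (drained later),
    so a younger load can read memory before an older store is visible --
    the load/store reordering RVWMO permits."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = []           # pending (addr, value) stores

    def store(self, addr, value):
        self.store_buffer.append((addr, value))  # not yet globally visible

    def load(self, addr):
        # forward from own store buffer first ("writes can be read early")
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()
```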
- Cache (the data cache is cache-coherent):
  - S0: send the request address
  - S1: access the SRAM
  - S2: perform way-select and format the response data
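The S2 way-select step can be sketched as follows (a toy model; names are mine — all ways are read in S1, then the tag comparison picks at most one in S2):

```python
def way_select(tags_read, data_read, req_tag):
    """Toy S2 stage: compare the request tag against the tag read from
    each way in S1 and return the matching way's data, or None on miss."""
    for way, tag in enumerate(tags_read):
        if tag == req_tag:
            return data_read[way]   # at most one way can match
    return None                     # miss in every way
```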
- Every branch passed down the pipeline remembers its own PC and its Fetch Packet's head PC (in the ROB)