  1. ILP
  2. Pipelining
  3. Low levels of cache
  4. SIMD and GPUs

VIPT (Virtually Indexed, Physically Tagged) vs PIPT (Physically Indexed, Physically Tagged): a VIPT cache is indexed with virtual-address bits while the TLB translates in parallel, then tag-compared with the physical address; see the sketch below
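
A minimal sketch of the VIPT constraint (parameters are illustrative, not BOOM's actual geometry): indexing with the virtual address is safe only when the set-index and line-offset bits fit inside the page offset, because those low bits are identical in the virtual and physical address.

```python
# Hypothetical geometry: 4 KiB pages, 64-byte lines, 64 sets.
PAGE_OFFSET_BITS = 12
LINE_OFFSET_BITS = 6
INDEX_BITS = 6

def vipt_index(vaddr: int) -> int:
    """Set index taken from the virtual address; no translation needed."""
    return (vaddr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def phys_tag(paddr: int) -> int:
    """Tag taken from the physical address, compared after the TLB lookup."""
    return paddr >> (LINE_OFFSET_BITS + INDEX_BITS)

# The whole index must sit inside the page offset, or two virtual aliases
# of one physical line could land in different sets.
assert LINE_OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS
```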

Fetch -> Pre-Decode (branch prediction based on PC via NLP and FTQ) -> Fetch Buffer -> Decode -> Register Rename -> ROB + Issue Queue (Dispatch) -> Issue (polled from the Issue Queue) -> Register Read -> Bypass -> Execute (send to a functional unit)

  1. How could BOOM support hit-under-miss in the IL1 cache?
    • Add an "I am looking for that thing" (MSHR-style) register/queue to the cache that stalls further misses but keeps serving hits; see the sketch below
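
A behavioral sketch of that idea (class and field names are hypothetical, not BOOM code): a single MSHR-like register remembers the line being fetched, later hits are served normally, and a second miss stalls until the first refill returns.

```python
class HitUnderMissCache:
    """Toy model: one outstanding miss, hits served underneath it."""

    def __init__(self):
        self.lines = {}            # resident lines: addr -> data
        self.pending_miss = None   # the "I am looking for that thing" register

    def read(self, addr):
        if addr in self.lines:
            return ("hit", self.lines[addr])   # hit served under the miss
        if self.pending_miss is None:
            self.pending_miss = addr           # record and issue the refill
            return ("miss-issued", None)
        return ("stall", None)                 # a second miss must wait

    def refill(self, addr, data):
        """Called when the outer memory returns the missing line."""
        if addr == self.pending_miss:
            self.lines[addr] = data
            self.pending_miss = None
```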

  2. Terminology:

  1. RVC - RISC-V Compressed
  2. Fetch Buffer - FIFO queue that instructions are fetched into from the IL1 cache
  3. Fetch Packet - instructions + metadata (such as a valid mask and some branch prediction info)
  4. Fetch Target Queue (FTQ) - holds the predictions coming from the BPD for each in-flight fetch packet:
    • Acts like a ROB for branch predictions; sketch of terms 2-4 below
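
A small sketch tying terms 2-4 together (field names are illustrative, not BOOM's exact layout):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class FetchPacket:
    pc: int                   # PC of the packet head
    insts: list               # instruction words fetched from the IL1 cache
    valid_mask: int           # which slots hold valid instructions
    bp_info: dict = field(default_factory=dict)   # branch prediction metadata

fetch_buffer = deque()        # FIFO between fetch and decode
ftq = deque()                 # per-packet prediction state, freed at resolve

pkt = FetchPacket(pc=0x8000_0000, insts=[0x13, 0x6F], valid_mask=0b11)
fetch_buffer.append(pkt)                 # decode pops from the other end
ftq.append((pkt.pc, pkt.bp_info))        # kept until the branches resolve
```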
  5. Load/Store Unit (LSU):
    • LAQ/LDQ (Load Address Queue) - allocated in Decode; carries a "store mask" recording which older S*Q entries the load may depend on
    • SAQ (Store Address Queue) & SDQ (Store Data Queue) - stores are fired into memory in program order; see the sketch below
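
A sketch of the store-mask check (the encoding is assumed; BOOM's actual memory disambiguation is more involved): a load may go to memory only once every older store in its mask has a known, non-overlapping address.

```python
def may_issue_load(store_mask: int, saq_valid: int, saq_addr: list, load_addr: int) -> bool:
    """store_mask: bit i set -> SAQ entry i is older than this load.
    saq_valid:  bit i set -> SAQ entry i's address has resolved.
    Addresses are assumed to be at line granularity for simplicity."""
    if store_mask & ~saq_valid:
        return False                       # an older store address is unknown
    for i in range(store_mask.bit_length()):
        if store_mask & (1 << i) and saq_addr[i] == load_addr:
            return False                   # overlap: wait or forward from SDQ
    return True                            # safe to send the load to memory
```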
  6. ROB (Reorder Buffer); sketch below:
    • Tracks the status of every instruction in the pipeline
    • Holds each instruction's speculative state (e.g. pending register updates) until it is safe to commit
    • Populated in program order at dispatch (after decode/rename)
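
A minimal circular-buffer ROB sketch (field names are assumptions): entries are allocated in order at dispatch, marked done at writeback, and retired in order from the head at commit.

```python
from collections import deque

class ROB:
    def __init__(self, size=8):
        self.size = size
        self.entries = deque()       # [uop, done?] pairs in program order

    def dispatch(self, uop):
        assert len(self.entries) < self.size, "ROB full: stall dispatch"
        self.entries.append([uop, False])

    def writeback(self, uop):
        for entry in self.entries:
            if entry[0] is uop:
                entry[1] = True      # result is ready, still speculative

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        while self.entries and self.entries[0][1]:
            yield self.entries.popleft()[0]
```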
  7. BTB (Branch Target Buffer) - table of branch targets and whether each branch was taken; sketch below:
    • BIM (Bimodal Table) --> the taken/not-taken counters
    • RAS (Return Address Stack) --> predicts return addresses (pushed on calls, popped on returns)
    • Tags --> first, a tag match locates the BTB entry
    • If a taken branch/jump has no entry, one is allocated for it
    • Only updated when the NLP made a branch misprediction
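
A simplified, direct-mapped sketch of an NLP-style lookup (sizes and hashing are assumptions): a tag match finds the entry, the BIM's 2-bit counter decides the direction, and the RAS supplies targets for returns.

```python
class BTB:
    def __init__(self, n_sets=64):
        self.n_sets = n_sets
        self.tags = [None] * n_sets
        self.targets = [0] * n_sets
        self.bim = [2] * n_sets           # 2-bit counters, init weakly-taken
        self.ras = []                     # return address stack

    def predict(self, pc):
        idx, tag = (pc >> 2) % self.n_sets, pc >> 2
        if self.tags[idx] != tag:
            return None                   # no entry: fall through (not taken)
        return self.targets[idx] if self.bim[idx] >= 2 else None

    def update(self, pc, taken, target):  # called on an NLP misprediction
        idx = (pc >> 2) % self.n_sets
        self.tags[idx], self.targets[idx] = pc >> 2, target
        c = self.bim[idx]
        self.bim[idx] = min(3, c + 1) if taken else max(0, c - 1)

    def call(self, pc):                   # push the return address on a call
        self.ras.append(pc + 4)

    def ret(self):                        # pop the predicted return target
        return self.ras.pop() if self.ras else None
```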
  8. NLP (Next-Line Predictor):
    • Is the BTB (together with the BIM and RAS)
    • Small capacity, but expensive in terms of area and power
    • Very simple -> cannot learn complex or long-history patterns
  9. BPD (Backing Predictor); gshare-style sketch below:
    • Its tables are updated mostly during commit, to avoid pollution from wrong-path instructions
    • Its global history, however, is updated at execute, since it must be repaired immediately on a misspeculation
    • Once all instructions in an info packet have committed, the packet is sent to the BPD so it can be updated with the eventual taken/not-taken outcomes
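
A gshare-style sketch of a commit-trained BPD (BOOM's actual predictor is configurable and more elaborate): the table index hashes the PC with global history, and the 2-bit counters are only trained at commit.

```python
class GShare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.counters = [1] * (1 << bits)    # 2-bit counters, weakly not-taken

    def _index(self, pc, ghist):
        return ((pc >> 2) ^ ghist) & self.mask

    def predict(self, pc, ghist):
        return self.counters[self._index(pc, ghist)] >= 2    # True = taken

    def commit_update(self, pc, ghist, taken):
        """Train only at commit, so wrong-path branches never pollute."""
        i = self._index(pc, ghist)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
```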
  10. GHR (Global History Register); sketch below:
    • Part of the BPD
    • Contains the outcomes of the previous n branches (n is the size of the GHR)
    • Updated speculatively as soon as a branch is fetched and predicted
    • Each Fetch Packet snapshots the GHR so it can be restored on a misprediction
    • The BPD also maintains a commit copy of the GHR in case of exceptions
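
A sketch of speculative GHR management (widths and interfaces are assumed):

```python
class GHR:
    def __init__(self, n=16):
        self.n = n
        self.spec = 0                        # speculative history
        self.commit = 0                      # commit copy, exception-safe
        self.snapshots = {}                  # fetch-packet id -> old history

    def _shift(self, hist, taken):
        return ((hist << 1) | int(taken)) & ((1 << self.n) - 1)

    def on_predict(self, pkt_id, taken):
        self.snapshots[pkt_id] = self.spec   # snapshot before the update
        self.spec = self._shift(self.spec, taken)

    def on_mispredict(self, pkt_id, actual_taken):
        old = self.snapshots[pkt_id]         # rewind to the snapshot...
        self.spec = self._shift(old, actual_taken)   # ...and redo with truth

    def on_commit(self, taken):
        self.commit = self._shift(self.commit, taken)

    def on_exception(self):
        self.spec = self.commit              # restore from the commit copy
```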
  11. Branch Rename Snapshots:
    • Metadata and prediction snapshots taken at each branch, used to restore the renamer and branch predictor after a misprediction
  12. Rename Map Table - holds speculative mappings from ISA registers to physical registers
  13. Committed Map Table - rename map for the committed state (allows a single-cycle reset on a flush)
  14. The Busy Table - tracks the readiness of each physical register -> fire an instruction once all of its operands are ready
  15. The Free List - tracks which physical registers are free (a bit-vector); sketch of terms 12-15 below
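
A sketch tying terms 12-15 together (sizes are arbitrary, and a plain list stands in for the bit-vector):

```python
class Rename:
    def __init__(self, n_isa=32, n_phys=64):
        self.map = list(range(n_isa))             # speculative map table
        self.committed = list(self.map)           # committed map table
        self.free = list(range(n_isa, n_phys))    # free list
        self.busy = [False] * n_phys              # busy table

    def rename(self, rd):
        """Allocate a fresh physical register for destination rd."""
        p = self.free.pop(0)
        self.map[rd] = p
        self.busy[p] = True                       # value not produced yet
        return p

    def writeback(self, p):
        self.busy[p] = False                      # operand ready: wake-up

    def commit(self, rd, new_p):
        old = self.committed[rd]
        self.committed[rd] = new_p
        self.free.append(old)                     # old mapping is now dead

    def flush(self):
        self.map = list(self.committed)           # single-cycle reset
```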
  16. Issue Queue - holds dispatched uops until they can be executed on the functional units; sketch below:
    • Unordered - select any ready uop
    • Age-ordered - select the oldest ready uop
    • Wake-up - a result broadcast marks matching waiting operands as ready
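
A sketch of wake-up and select (the entry layout is assumed, and ages are taken to be unique):

```python
class IssueQueue:
    def __init__(self):
        self.slots = []                   # (age, uop, {not-yet-ready operands})

    def dispatch(self, age, uop, pending_srcs):
        self.slots.append((age, uop, set(pending_srcs)))

    def wakeup(self, preg):
        """Result broadcast: clear this operand everywhere it is awaited."""
        for _, _, pending in self.slots:
            pending.discard(preg)

    def select(self, age_ordered=True):
        ready = [s for s in self.slots if not s[2]]
        if not ready:
            return None
        pick = min(ready, key=lambda s: s[0]) if age_ordered else ready[0]
        self.slots.remove(pick)
        return pick[1]                    # issue this uop to a functional unit
```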
  17. Bypass network - forwards results from the end of Execute straight to dependent uops' inputs instead of waiting for register writeback; similar in spirit to the operand forwarding of Tomasulo's common data bus
  18. CSR (Control and Status Register)
  19. RVWMO - RISC-V Weak Memory Ordering; litmus-test sketch below:
    • Newer loads may execute before older stores
    • Loads to the same address appear in program order
    • A hart can read its own write early (store-buffer forwarding)
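
A litmus-test sketch of the first bullet (a toy enumerator, not an RVWMO model): in the classic store-buffering test, the outcome r0 == r1 == 0 is impossible under sequential consistency but becomes reachable once each hart's load may hoist above its older store.

```python
# hart 0: x = 1; r0 = y          hart 1: y = 1; r1 = x
def run(seq):
    """Execute one merged sequence of ('var','st') / ('var','ld','reg') ops."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op in seq:
        if op[1] == "st":
            mem[op[0]] = 1
        else:
            regs[op[2]] = mem[op[0]]
    return regs["r0"], regs["r1"]

def merges(a, b):
    """All interleavings that preserve each hart's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

po0 = [("x", "st"), ("y", "ld", "r0")]
po1 = [("y", "st"), ("x", "ld", "r1")]

sc = {run(s) for s in merges(po0, po1)}           # program order kept
weak = set(sc)
for h0 in (po0, po0[::-1]):                       # RVWMO: the load may
    for h1 in (po1, po1[::-1]):                   # pass the older store
        weak |= {run(s) for s in merges(h0, h1)}

print((0, 0) in sc)    # False: forbidden under sequential consistency
print((0, 0) in weak)  # True: allowed under RVWMO
```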
  20. Cache (the data cache is kept coherent); sketch below:
    • S0: Send request address
    • S1: Access SRAM
    • S2: Perform way-select and format response data
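
A sketch of the three-stage access (geometry and SRAM layout are assumptions): S0 presents the address, S1 reads the tag and data arrays for every way, S2 compares tags (way-select) and formats the response.

```python
def cache_access(addr, tag_sram, data_sram, n_ways=4):
    """tag_sram[w][set] -> tag, data_sram[w][set] -> line bytes."""
    # S0: send the request address, split into offset / index / tag
    offset = addr & 0x3F                  # 64-byte lines
    index = (addr >> 6) & 0x3F            # 64 sets
    tag = addr >> 12
    # S1: access the SRAMs, reading all ways in parallel
    way_tags = [tag_sram[w][index] for w in range(n_ways)]
    way_data = [data_sram[w][index] for w in range(n_ways)]
    # S2: way-select (tag compare) and response formatting
    for w in range(n_ways):
        if way_tags[w] == tag:
            return way_data[w][offset:offset + 8]   # e.g. one doubleword
    return None                                     # miss
```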
  21. Every branch passed down the pipeline remembers its own PC and its Fetch Packet's head PC (in the ROB)