Computer Architecture
Course code: 0521292B
Lecture 12: Prefetching
Jianhua Li
College of Computer and Information, Hefei University of Technology
Slides are adapted from the CA courses of Wisconsin, Princeton, MIT, Berkeley, etc.
The uses of the slides of this course are for educational purposes only and should be
used only in conjunction with the textbook. Derivatives of the slides must
acknowledge the copyright notices of this and the originals.
Outline
Why prefetch? Why could/does it work?
The four questions
What (to prefetch), when, where, how
Software prefetching
Hardware prefetching
Execution-based prefetching
Prefetching performance
Coverage, accuracy, timeliness
Bandwidth consumption, cache pollution
Prefetch throttling
Prefetching
Idea: Fetch data before it is needed by the program (i.e., pre-fetch it)
Why?
Memory latency is high. If we can prefetch accurately and
early enough we can reduce that latency.
Can eliminate compulsory cache misses
Can it eliminate all cache misses? Capacity, conflict?
Involves predicting which address will be needed in the
future
Works if programs have predictable miss address patterns
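As an illustrative sketch (not from the original slides), the loop below has a miss address stream a prefetcher can easily predict: assuming 64-byte cache blocks and 8-byte doubles, every eighth element starts a new block, so misses occur at a fixed stride.

/* Illustrative only: a streaming loop with a trivially predictable
 * miss address stream.  Assuming 64-byte blocks and 8-byte doubles,
 * the misses occur at &a[0], &a[8], &a[16], ... -- one block apart.  */
#include <stdio.h>

#define N 1024

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        sum += a[i];          /* misses arrive at a regular stride */

    printf("%f\n", sum);
    return 0;
}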
Prefetching and Correctness
Does a misprediction in prefetching affect correctness?
No, prefetched data at a mispredicted address is simply not used
There is no need for state recovery
In contrast to branch misprediction or value misprediction
Basics
In modern systems, prefetching is usually done at cache-block granularity
Prefetching is a technique that can reduce both miss rate and miss latency
Prefetching can be done by hardware
compiler
programmer
Prefetching: The Four Questions
What
What addresses to prefetch
When
When to initiate a prefetch request
Where
Where to place the prefetched data
How
Software, hardware, execution-based, cooperative
Challenges in Prefetching: What
What addresses to prefetch
Prefetching useless data wastes resources
Memory bandwidth
Cache or prefetch buffer space
Energy consumption
These could all be utilized by demand requests or more accurate prefetch requests
Accurate prediction of addresses to prefetch is important
Prefetch accuracy = used prefetches / sent prefetches
How do we know what to prefetch?
Predict based on past access patterns
Use the compiler's knowledge of data structures
Prefetching algorithm determines what to prefetch
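As a hedged worked example (the counter values and the coverage formula below are assumptions, not from the slides), accuracy and coverage can be computed from simple event counts:

/* Illustrative computation of two prefetcher metrics from hypothetical
 * event counts:
 *   accuracy = used prefetches / sent prefetches
 *   coverage = misses eliminated by prefetching / misses without prefetching
 * The numbers are made up for the example.                              */
#include <stdio.h>

int main(void) {
    double sent_prefetches  = 1000;
    double used_prefetches  = 640;   /* referenced before eviction          */
    double remaining_misses = 360;   /* demand misses despite prefetching   */

    double accuracy = used_prefetches / sent_prefetches;
    double coverage = used_prefetches / (used_prefetches + remaining_misses);

    printf("accuracy = %.2f, coverage = %.2f\n", accuracy, coverage);
    return 0;
}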
Challenges in Prefetching: When
When to initiate a prefetch request
Prefetching too early
Prefetched data might not be used before it is evicted
Prefetching too late
Might not hide the whole memory latency
When a data item is prefetched affects the timeliness of the prefetcher
Prefetcher can be made more timely by
Making it more aggressive: try to stay far ahead of the processor's access stream (hardware)
Moving the prefetch instructions earlier in the code (software)
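As a rough sketch of timeliness (the latency and per-iteration cycle counts below are assumed, not from the slides), the software prefetch distance can be sized so the prefetch arrives just in time:

/* Sketch of choosing a software prefetch distance so a prefetch is
 * neither too late (latency not hidden) nor too early (evicted before
 * use).  A common rule of thumb:
 *     distance >= ceil(memory_latency / cycles_per_iteration)
 * All numbers are illustrative.                                        */
#include <stdio.h>

int main(void) {
    int memory_latency  = 200;   /* cycles to fetch a block (assumed)   */
    int cycles_per_iter = 25;    /* work per loop iteration (assumed)   */

    int distance = (memory_latency + cycles_per_iter - 1) / cycles_per_iter;
    printf("prefetch about %d iterations ahead\n", distance);   /* 8 */
    return 0;
}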
Challenges in Prefetching: Where (I)
Where to place the prefetched data
In cache
+ Simple design, no need for separate buffers
- Can evict useful demand data (cache pollution)
In a separate prefetch buffer
+ Demand data protected from prefetches (no cache pollution)
- More complex memory system design
Where to place the prefetch buffer
When to access the prefetch buffer (parallel vs. serial with cache)
When to move the data from the prefetch buffer to cache
How to size the prefetch buffer
Keeping the prefetch buffer coherent
Many modern systems place prefetched data into the cache
Intel Pentium 4 and Core 2, AMD systems, IBM POWER4, 5, 6, ...
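As one possible organization (all structures and names below are hypothetical, modeled as a toy in software), a demand access can probe the cache and a separate prefetch buffer in parallel and promote the block to the cache on a buffer hit:

/* Sketch: demand lookup with a separate prefetch buffer.  Cache and
 * buffer are probed together; on a buffer hit the block is promoted
 * into the cache.  Single-entry structures keep the toy simple.       */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t addr_t;

static addr_t cache_block  = 0x1000;   /* contents are illustrative */
static addr_t buffer_block = 0x2000;

static bool cache_lookup(addr_t a)           { return a == cache_block;  }
static bool prefetch_buffer_lookup(addr_t a) { return a == buffer_block; }
static void move_to_cache(addr_t a)          { cache_block = a;          }
static void fetch_from_memory(addr_t a)      { cache_block = a;          }

static void demand_access(addr_t a) {
    if (cache_lookup(a))            { puts("cache hit");  return; }
    if (prefetch_buffer_lookup(a)) {           /* probed alongside cache   */
        move_to_cache(a);                      /* promote prefetched block */
        puts("prefetch buffer hit");
        return;
    }
    fetch_from_memory(a);
    puts("demand miss");
}

int main(void) {
    demand_access(0x2000);   /* prefetch buffer hit, block promoted */
    demand_access(0x2000);   /* now a cache hit                     */
    return 0;
}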
Challenges in Prefetching: Where (II)
Which level of cache to prefetch into?
Memory to L2, memory to L1. Advantages/disadvantages?
L2 to L1? (a separate prefetcher between levels)
Where to place the prefetched data in the cache?
Do we treat prefetched blocks the same as demand-fetched
blocks?
Prefetched blocks are not known to be needed
With LRU, a demand block is placed into the MRU position
Do we skew the replacement policy such that it favors
the demand-fetched blocks?
E.g., insert all prefetched blocks at the LRU position of the set?
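A minimal sketch of such a skewed insertion policy (the 4-way set model and block values are assumptions for illustration): demand fills go to the MRU position, prefetch fills to the LRU position, so an unused prefetch is evicted first.

/* Sketch: skewing LRU replacement against prefetches.  A 4-way set is
 * modeled as an array ordered MRU -> LRU.  Demand blocks enter at MRU,
 * prefetched blocks at LRU.                                            */
#include <stdint.h>
#include <stdio.h>

#define WAYS 4
static uint64_t set[WAYS];   /* set[0] = MRU, set[WAYS-1] = LRU */

static void insert_demand(uint64_t blk) {
    for (int i = WAYS - 1; i > 0; i--)   /* evict LRU, shift the rest down */
        set[i] = set[i - 1];
    set[0] = blk;                        /* demand block becomes MRU       */
}

static void insert_prefetch(uint64_t blk) {
    set[WAYS - 1] = blk;                 /* prefetch replaces the LRU      */
}

int main(void) {
    insert_demand(0xA); insert_demand(0xB); insert_demand(0xC);
    insert_prefetch(0xF);                /* sits at LRU until referenced   */
    for (int i = 0; i < WAYS; i++)
        printf("%llx ", (unsigned long long)set[i]);   /* c b a f */
    putchar('\n');
    return 0;
}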
Challenges in Prefetching: Where (III)
Where to place the hardware prefetcher in the memory hierarchy?
In other words, what access patterns does the prefetcher see?
L1 hits and misses
L1 misses only
L2 misses only
Seeing a more complete access pattern:
+ Potentially better accuracy and coverage in prefetching
- Prefetcher needs to examine more requests (bandwidth intensive, more ports into the prefetcher?)
Challenges in Prefetching: How
Software prefetching
ISA provides prefetch instructions
Programmer or compiler inserts prefetch instructions (effort)
Usually works well only for regular access patterns
Hardware prefetching
Hardware monitors processor accesses
Memorizes or finds patterns/strides
Generates prefetch addresses automatically
Execution-based prefetchers
A thread is executed to prefetch data for the main program
Can be generated by either software/programmer or hardware
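To make the hardware approach concrete, here is a minimal software model of stride detection (the confirmation rule, addresses, and table-less design are simplifications, not a real prefetcher): remember the last miss address and stride, and once the same stride repeats, prefetch one stride ahead.

/* Sketch of stride detection as a hardware prefetcher might do it:
 * track the last miss address and last stride; when the same stride is
 * seen twice in a row, issue a prefetch for (addr + stride).           */
#include <stdint.h>
#include <stdio.h>

static uint64_t last_addr   = 0;
static int64_t  last_stride = 0;

static void observe_miss(uint64_t addr) {
    int64_t stride = (int64_t)(addr - last_addr);

    if (stride != 0 && stride == last_stride)          /* stride confirmed */
        printf("prefetch 0x%llx\n",
               (unsigned long long)(addr + (uint64_t)stride));

    last_stride = stride;
    last_addr   = addr;
}

int main(void) {
    /* Miss stream with a fixed 64-byte stride. */
    observe_miss(0x1000);
    observe_miss(0x1040);   /* stride learned                         */
    observe_miss(0x1080);   /* stride confirmed -> prefetch 0x10c0    */
    observe_miss(0x10c0);   /* -> prefetch 0x1100                     */
    return 0;
}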
Software Prefetching (I)
Idea: Compiler/programmer places prefetch instructions into appropriate places in code
Mowry et al., "Design and Evaluation of a Compiler Algorithm for Prefetching," ASPLOS 1992.
Prefetch instructions prefetch data into caches
Compiler or programmer can insert such instructions into the program
Microarchitecture-dependent specification
Different instructions for different cache levels
X86 PREFETCH Instruction
for (i = 0; i < N; i++) {
    __prefetch(a[i+8]);    /* prefetch a and b eight iterations ahead */
    __prefetch(b[i+8]);
    sum += a[i] * b[i];
}
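The __prefetch() calls above are slide pseudo-code. One concrete, compilable rendering (a sketch, with the array size and the prefetch distance of 8 chosen only for illustration) uses the GCC/Clang __builtin_prefetch builtin, which on x86 is typically compiled to PREFETCH-family instructions:

/* Software prefetching with a real compiler builtin.  The prefetch
 * distance (8 iterations) and N are illustrative values.              */
#include <stdio.h>

#define N 4096
#define DIST 8                              /* prefetch distance (assumed) */

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) {
        if (i + DIST < N) {
            __builtin_prefetch(&a[i + DIST]);   /* hint only; may be dropped */
            __builtin_prefetch(&b[i + DIST]);
        }
        sum += a[i] * b[i];
    }
    printf("%f\n", sum);
    return 0;
}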