Computer Architecture
Course code: 0521292B
12. Prefetching
Jianhua Li
College of Computer and Information, Hefei University of Technology
Slides are adapted from the CA courses of wisc, princeton, mit, berkeley, etc.
These slides are for educational purposes only and should be used only in
conjunction with the textbook. Derivatives of the slides must acknowledge the
copyright notices of this and the originals.

Outline
• Why prefetch? Why could/does it work?
• The four questions
– What (to prefetch), when, where, how
• Software prefetching
• Hardware prefetching
• Execution-based prefetching
• Prefetching performance
– Coverage, accuracy, timeliness
– Bandwidth consumption, cache pollution
• Prefetch throttling

Prefetching
• Idea: Fetch the data before it is needed (i.e. pre-fetch) by the program
• Why?
– Memory latency is high. If we can prefetch accurately and
early enough we can reduce that latency.
– Can eliminate compulsory cache misses
– Can it eliminate all cache misses? Capacity, conflict?
• Involves predicting which address will be needed in the
future
– Works if programs have predictable miss address patterns

Prefetching and Correctness
• Does a misprediction in prefetching affect correctness?
• No, prefetched data at a “mispredicted” address is simply not used
• There is no need for state recovery
– In contrast to branch misprediction or value misprediction

Basics
• In modern systems, prefetching is usually done in cache block granularity
• Prefetching is a technique that can reduce both
– Miss rate
– Miss latency
• Prefetching can be done by
– hardware
– compiler
– programmer

Prefetching: The Four Questions
• What
– What addresses to prefetch
• When
– When to initiate a prefetch request
• Where
– Where to place the prefetched data
• How
– Software, hardware, execution-based, cooperative

Challenges in Prefetching: What
• What addresses to prefetch
– Prefetching useless data wastes resources
• Memory bandwidth
• Cache or prefetch buffer space
• Energy consumption
• These could all be utilized by demand requests or more accurate prefetch requests
– Accurate prediction of addresses to prefetch is important
• Prefetch accuracy = used prefetches / sent prefetches (see the sketch below)
• How do we know what to prefetch?
– Predict based on past access patterns
– Use the compiler’s knowledge of data structures
• Prefetching algorithm determines what to prefetch
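
The accuracy metric above is simple enough to state in code. A minimal sketch
in C of how a simulator might track it with two counters; the struct and
function names here are illustrative assumptions, not from the slides:

#include <stdint.h>

/* sent: prefetch requests issued; used: prefetched blocks that were
   later referenced by a demand access before being evicted */
struct prefetch_stats {
    uint64_t sent;
    uint64_t used;
};

/* Prefetch accuracy = used prefetches / sent prefetches */
static double prefetch_accuracy(const struct prefetch_stats *s)
{
    return s->sent ? (double)s->used / (double)s->sent : 0.0;
}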

Challenges in Prefetching: When
• When to initiate a prefetch request
– Prefetching too early
• Prefetched data might not be used before it is evicted
– Prefetching too late
• Might not hide the whole memory latency
• When a data item is prefetched affects the timeliness of the prefetcher
• Prefetcher can be made more timely by
– Making it more aggressive: try to stay far ahead of the processor’s access stream (hardware)
– Moving the prefetch instructions earlier in the code (software)
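
In the software case, “moving the prefetch instructions earlier” in effect
means increasing the prefetch distance D, i.e., how many iterations ahead the
loop prefetches. A sketch of the usual rule of thumb, with assumed,
machine-specific cycle counts (not from the slides):

/* Pick D large enough that a prefetch completes before the demand
   access reaches the same element. Both numbers below are assumptions. */
#define MEM_LATENCY_CYCLES 300   /* assumed DRAM round-trip latency    */
#define CYCLES_PER_ITER     20   /* assumed cost of one loop iteration */

/* D = ceil(memory latency / time per iteration) */
#define D ((MEM_LATENCY_CYCLES + CYCLES_PER_ITER - 1) / CYCLES_PER_ITER)

/* D too small: the prefetch arrives late and hides only part of the
   latency. D too large: the block may be evicted before it is used. */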

Challenges in Prefetching: Where (I)
• Where to place the prefetched data
– In cache
+ Simple design, no need for separate buffers
— Can evict useful demand data → cache pollution
– In a separate prefetch buffer
+ Demand data protected from prefetches → no cache pollution
— More complex memory system design
– Where to place the prefetch buffer
– When to access the prefetch buffer (parallel vs. serial with cache)
– When to move the data from the prefetch buffer to cache
– How to size the prefetch buffer
– Keeping the prefetch buffer coherent
• Many modern systems place prefetched data into the cache
– Intel Pentium 4, Core 2, AMD systems, IBM POWER4, 5, 6, …
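
To make the in-cache vs. separate-buffer trade-off concrete, here is a toy C
model of the buffer option. This is a sketch only: the sizes, the FIFO
replacement, and the direct-mapped cache are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define CACHE_SETS   256
#define PBUF_ENTRIES   8

static uint64_t cache_tag[CACHE_SETS];
static bool     cache_valid[CACHE_SETS];
static uint64_t pbuf[PBUF_ENTRIES];
static bool     pbuf_valid[PBUF_ENTRIES];
static int      pbuf_next;                 /* FIFO replacement pointer */

static void cache_fill(uint64_t blk)
{
    cache_tag[blk % CACHE_SETS]   = blk;
    cache_valid[blk % CACHE_SETS] = true;
}

/* Prefetches go into the buffer, so they cannot evict demand data. */
void prefetch_into_buffer(uint64_t blk)
{
    pbuf[pbuf_next]       = blk;
    pbuf_valid[pbuf_next] = true;
    pbuf_next = (pbuf_next + 1) % PBUF_ENTRIES;
}

/* Demand access: probe cache and buffer (in hardware, in parallel);
   a buffer hit moves the block into the cache on first use. */
bool demand_access(uint64_t blk)
{
    if (cache_valid[blk % CACHE_SETS] && cache_tag[blk % CACHE_SETS] == blk)
        return true;                       /* cache hit */
    for (int i = 0; i < PBUF_ENTRIES; i++) {
        if (pbuf_valid[i] && pbuf[i] == blk) {
            pbuf_valid[i] = false;
            cache_fill(blk);               /* promote to cache */
            return true;
        }
    }
    cache_fill(blk);                       /* ordinary demand miss */
    return false;
}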

Challenges in Prefetching: Where (II)
• Which level of cache to prefetch into?
– Memory to L2, memory to L1. Advantages/disadvantages?
– L2 to L1? (a separate prefetcher between levels)
• Where to place the prefetched data in the cache?
– Do we treat prefetched blocks the same as demand-fetched
blocks?
– Prefetched blocks are not known to be needed
• With LRU, a demand block is placed into the MRU position
• Do we skew the replacement policy so that it favors the demand-fetched blocks?
– E.g., place all prefetched blocks into the LRU position of the set?
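
A sketch of this skewed-insertion idea in C, modeling one set’s recency stack
as an array; the 8-way geometry and the array model are illustrative
assumptions, not a description of any particular machine:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WAYS 8

/* recency stack for one set: ways[0] is MRU, ways[WAYS-1] is LRU */
typedef struct { uint64_t ways[WAYS]; } cache_set_t;

static void insert_block(cache_set_t *set, uint64_t blk, bool is_prefetch)
{
    if (is_prefetch) {
        /* prefetched block goes to the LRU position: if it is never
           demanded, it is the next block evicted (limits pollution) */
        set->ways[WAYS - 1] = blk;
    } else {
        /* demand block goes to the MRU position, dropping the old LRU */
        memmove(&set->ways[1], &set->ways[0],
                (WAYS - 1) * sizeof set->ways[0]);
        set->ways[0] = blk;
    }
}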

Challenges in Prefetching: Where (III)
• Where to place the hardware prefetcher in the memory hierarchy?
– In other words, what access patterns does the prefetcher see?
– L1 hits and misses
– L1 misses only
– L2 misses only
• Seeing a more complete access pattern:
+ Potentially better accuracy and coverage in prefetching
— Prefetcher needs to examine more requests (bandwidth intensive, more ports into the prefetcher?)

Challenges in Prefetching: How
• Software prefetching
– ISA provides prefetch instructions
– Programmer or compiler inserts prefetch instructions (effort)
– Usually works well only for “regular access patterns”
• Hardware prefetching (sketch below)
– Hardware monitors processor accesses
– Memorizes or finds patterns/strides
– Generates prefetch addresses automatically
• Execution-based prefetchers
– A “thread” is executed to prefetch data for the main program
– Can be generated by either software/programmer or hardware
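
As an example of the hardware flavor, a sketch in C of a PC-indexed stride
prefetcher: remember each load’s last address and stride, and once the same
stride repeats, prefetch ahead. The table size and confidence thresholds are
illustrative assumptions:

#include <stdint.h>

#define TABLE_SIZE 256

struct stride_entry {
    uint64_t last_addr;    /* last address this PC accessed */
    int64_t  stride;       /* last observed stride          */
    int      confidence;   /* saturating repeat counter     */
};

static struct stride_entry table[TABLE_SIZE];

/* Called on every load; returns an address to prefetch, or 0 for none. */
uint64_t stride_prefetch(uint64_t pc, uint64_t addr)
{
    struct stride_entry *e = &table[pc % TABLE_SIZE];
    int64_t stride = (int64_t)(addr - e->last_addr);

    if (stride != 0 && stride == e->stride) {
        if (e->confidence < 3)
            e->confidence++;           /* same stride seen again */
    } else {
        e->stride     = stride;        /* retrain on a new stride */
        e->confidence = 0;
    }
    e->last_addr = addr;

    /* issue a prefetch only after the stride has repeated */
    return (e->confidence >= 2) ? addr + (uint64_t)e->stride : 0;
}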

Software Prefetching (I)
• Idea: Compiler/programmer places prefetch instructions into appropriate places in code
Mowry et al., “Design and Evaluation of a Compiler Algorithm for Prefetching,” ASPLOS 1992.
• Prefetch instructions prefetch data into caches
• Compiler or programmer can insert such instructions into the program

X86 PREFETCH Instruction
• microarchitecture-dependent specification
• different instructions for different cache levels

for (i = 0; i < N; i++) { prefetch(&a[i + D]); sum += a[i]; } /* prefetch D iterations ahead */
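
A compilable version of the loop above, a sketch assuming GCC/Clang’s
__builtin_prefetch (which lowers to a PREFETCH-family instruction on x86);
the lookahead of 16 elements is an illustrative choice, not from the slides:

#include <stddef.h>

/* Sum an array, prefetching 16 elements ahead. __builtin_prefetch’s
   second argument selects read (0) vs. write (1); the third is a
   locality hint (3 = keep in all cache levels). */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 3);
        sum += a[i];
    }
    return sum;
}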
