[SOLVED] C algorithm html parallel concurrency computer architecture security Computer Architecture


Computer Architecture
Course code: 0521292B    13. Main Memory
Jianhua Li
College of Computer and Information, Hefei University of Technology
Slides are adapted from computer architecture courses at Wisconsin, Princeton, MIT, Berkeley, etc.
The slides of this course are for educational purposes only and should be used only in conjunction with the textbook. Derivatives of the slides must acknowledge the copyright notices of this and the originals.

The Main Memory System

[Figure: the memory hierarchy, from the processor and caches, through main memory, down to storage (SSD/HDD).]

Main memory is a critical component of all computing systems: server, mobile, embedded, desktop, sensor.
The main memory system must scale (in size, technology, efficiency, cost, and management algorithms) to match the growing demands for bandwidth.

Memory System: A Shared Resource View

[Figure: multiple cores sharing the memory system, which in turn sits in front of storage.]

State of the Main Memory System

Recent technology, architecture, and application trends
  lead to new requirements
  exacerbate old requirements

DRAM and memory controllers, as we know them today, are (and will be) unlikely to satisfy all requirements.

Some emerging non-volatile memory technologies (e.g., PCM) enable new opportunities: memory/storage merging.

We need to rethink/reinvent the main memory system
  to fix DRAM issues and enable emerging technologies
  to satisfy all requirements

Major Trends Affecting Main Memory

Need for main memory capacity, bandwidth, QoS increasing

Main memory energy/power is a key system design concern

DRAM technology scaling is ending

Demand for Memory Capacity

More cores, more concurrency, larger working sets
  AMD Barcelona: 4 cores; IBM Power7: 8 cores; Intel SCC: 48 cores

Modern applications are increasingly data-intensive

Many applications/virtual machines will share main memory
  Cloud computing/servers: consolidation to improve efficiency
  GPGPUs: many threads from multiple parallel applications
  Mobile: interactive/non-interactive consolidation

Example: The Memory Capacity Gap

Core count doubling every 2 years
DRAM DIMM capacity doubling every 3 years
Memory capacity per core expected to drop by 30% every two years
Trends are even worse for memory bandwidth per core!


Major Trends Affecting Main Memory

Need for main memory capacity, bandwidth, QoS increasing
  Multi-core: increasing number of cores
  Data-intensive applications: increasing demand for data
  Consolidation: cloud computing, GPUs, mobile, heterogeneity

Main memory energy/power is a key system design concern
  IBM servers: ~50% of energy spent in the off-chip memory hierarchy [Lefurgy, IEEE Computer 2003]
  DRAM consumes power when idle and needs periodic refresh (a rough cost estimate is sketched below)

DRAM technology scaling is ending
  ITRS projects DRAM will not scale easily below X nm
  Scaling has provided many benefits: higher capacity, higher density, lower cost, lower energy
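To make the refresh cost concrete, here is a minimal back-of-the-envelope sketch in C. The timing values (a 64 ms retention window, 8192 refresh commands per window, i.e. tREFI of roughly 7.8 us, and tRFC of roughly 350 ns for a high-density DDR4-class device) are typical datasheet numbers assumed for illustration; they are not taken from these slides.

    #include <stdio.h>

    /* Back-of-the-envelope refresh overhead for one DRAM rank.
     * All timing parameters below are assumed, typical DDR4-class values. */
    int main(void) {
        const double retention_ms = 64.0;    /* every row must be refreshed within 64 ms */
        const double refresh_cmds = 8192.0;  /* auto-refresh commands issued per window  */
        const double tRFC_ns      = 350.0;   /* time the rank is busy per refresh        */

        double tREFI_ns = retention_ms * 1e6 / refresh_cmds; /* ~7812 ns between refreshes   */
        double overhead = tRFC_ns / tREFI_ns;                /* fraction of time unavailable */

        printf("tREFI = %.0f ns, refresh overhead = %.1f%% of rank time\n",
               tREFI_ns, overhead * 100.0);
        return 0;
    }

With these assumed numbers the rank is unavailable for roughly 4-5% of the time just to perform refresh, before counting the energy the refresh operations themselves consume.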

The DRAM Scaling Problem

DRAM stores charge in a capacitor (charge-based memory)
  Capacitor must be large enough for reliable sensing
  Access transistor should be large enough for low leakage and high retention time
  Scaling beyond 40-35 nm (2013) is challenging [ITRS, 2009]

DRAM capacity, cost, and energy/power are hard to scale

Evidence of the DRAM Scaling Problem

[Figure: a DRAM array in which an aggressor row is repeatedly opened and closed (its wordline toggled between V_LOW and V_HIGH) while the adjacent victim rows sit idle.]

Repeatedly opening and closing a row enough times within a refresh interval induces disturbance errors in adjacent rows in most real DRAM chips you can buy today.

Kim et al., "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014.

Most DRAM Modules Are At Risk

Manufacturer   Vulnerable modules   Worst-case error count
A company      86% (37/43)          up to 1.0 x 10^7 errors
B company      83% (45/54)          up to 2.7 x 10^6 errors
C company      88% (28/32)          up to 3.3 x 10^5 errors

Kim et al., "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014.

x86 CPU / DRAM Module

[Figure: a CPU runs the loop below against a DRAM module in which addresses X and Y map to different rows of the same bank.]

    loop:
      mov (X), %eax    # read X (activates X's row)
      mov (Y), %ebx    # read Y (activates a different row of the same bank)
      clflush (X)      # flush X from the caches so the next read goes to DRAM
      clflush (Y)      # flush Y from the caches
      mfence           # order the flushes before the next iteration
      jmp loop
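For reference, a minimal C sketch of the same access pattern using compiler intrinsics; the function name, the addresses x and y, and the iteration count are placeholders, and whether bit flips actually occur depends on the specific DRAM module and on x and y mapping to different rows of the same bank:

    #include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */
    #include <stdint.h>

    /* Hammer two addresses: read both, flush both from the cache, and fence,
     * so that every iteration reaches DRAM rather than being served by a cache. */
    static void hammer(volatile uint64_t *x, volatile uint64_t *y, long iters) {
        for (long i = 0; i < iters; i++) {
            (void)*x;                      /* read X: activates X's row          */
            (void)*y;                      /* read Y: activates a second row     */
            _mm_clflush((const void *)x);  /* evict X so the next read hits DRAM */
            _mm_clflush((const void *)y);  /* evict Y likewise                   */
            _mm_mfence();                  /* order the flushes before next loads */
        }
    }

A harness would pick two addresses that the memory controller maps to different rows of one bank (for example by timing probes) and call hammer(x, y, N) with a large N within one refresh interval.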


Observed Errors in Real Systems

CPU architecture            Errors   Access rate
Intel Haswell (2013)        22.9K    12.3 M/sec
Intel Ivy Bridge (2012)     20.7K    11.7 M/sec
Intel Sandy Bridge (2011)   16.1K    11.6 M/sec
AMD Piledriver (2012)       59       6.1 M/sec

A real reliability and security issue.
In a more controlled environment, we can induce as many as ten million disturbance errors.

Kim et al., "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014.

Security Implications

http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

Main Memory in the System

[Figure: a four-core chip (CORE 0-3), each core with its own L2 cache, all sharing an L3 cache, connected through the DRAM memory controller and the DRAM interface to the off-chip DRAM banks.]

Memory Bank Organization

Read access sequence:
1. Decode row address and drive wordlines
2. Selected bits drive bitlines (entire row read)
3. Amplify row data
4. Decode column address and select a subset of the row; send it to the output
5. Precharge bitlines for the next access

DRAM vs. SRAM

DRAM
  Slower access (capacitor)
  Higher density (1T-1C cell)
  Lower cost
  Requires refresh (power, performance, circuitry overheads)
  Manufacturing requires putting capacitor and logic together

SRAM
  Faster access (no capacitor)
  Lower density (6T cell)
  Higher cost
  No need for refresh
  Manufacturing compatible with logic process (no capacitor)

DRAM Subsystem Organization

Channel → DIMM → Rank → Chip → Bank → Row/Column → Cell
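To keep the hierarchy straight, here is an illustrative sketch of it as nested C types. The counts (2 channels, 1 DIMM per channel, 2 ranks per DIMM, 8 chips per rank, 8 banks per chip, 16K rows of 2KB per bank) are typical values assumed for the example, not values specified by these slides.

    #include <stdint.h>

    /* Illustrative sizes only; real systems vary widely. */
    #define ROWS_PER_BANK   (16 * 1024)
    #define ROW_BYTES       2048
    #define BANKS_PER_CHIP  8
    #define CHIPS_PER_RANK  8        /* 8 x 8-bit chips = 64-bit rank interface */
    #define RANKS_PER_DIMM  2
    #define DIMMS_PER_CHAN  1
    #define CHANNELS        2

    typedef struct { uint8_t   bytes[ROW_BYTES]; }      dram_row;   /* row (DRAM page) */
    typedef struct { dram_row  rows[ROWS_PER_BANK]; }   dram_bank;  /* 2D cell array   */
    typedef struct { dram_bank banks[BANKS_PER_CHIP]; } dram_chip;
    typedef struct { dram_chip chips[CHIPS_PER_RANK]; } dram_rank;
    typedef struct { dram_rank ranks[RANKS_PER_DIMM]; } dimm;
    typedef struct { dimm      dimms[DIMMS_PER_CHAN]; } dram_channel;
    typedef struct { dram_channel channels[CHANNELS]; } main_memory;

These are types, not something one would instantiate; they only mirror the containment relationship named above.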

Page Mode DRAM

A DRAM bank is a 2D array of cells: rows x columns
A DRAM row is also called a DRAM page
Sense amplifiers are also called the row buffer

Each address is a (row, column) pair

Access to a closed row
  Activate command opens the row (places it into the row buffer)
  Read/write command reads/writes a column in the row buffer
  Precharge command closes the row and prepares the bank for the next access

Access to an open row
  No need for an activate command
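As an illustration of the command protocol above, here is a minimal sketch of one bank's row-buffer behavior in C. The latency values (tRCD, tRP, tCAS, in arbitrary cycles) and the names bank_state and access are invented for this sketch, not taken from the slides.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative latencies, in DRAM clock cycles (assumed values). */
    enum { tRCD = 15, tCAS = 15, tRP = 15 };

    typedef struct {
        bool     row_open;   /* is some row currently held in the row buffer? */
        uint32_t open_row;   /* which row, if row_open                        */
    } bank_state;

    /* Latency of a column access to `row`, updating the row-buffer state,
     * mimicking the Activate / Read-Write / Precharge protocol. */
    static int access(bank_state *b, uint32_t row) {
        if (b->row_open && b->open_row == row) {
            return tCAS;                 /* row-buffer hit: just a column access       */
        } else if (!b->row_open) {
            b->row_open = true;          /* bank closed: Activate, then column access  */
            b->open_row = row;
            return tRCD + tCAS;
        } else {
            b->open_row = row;           /* conflict: Precharge, Activate, then access */
            return tRP + tRCD + tCAS;
        }
    }

In this model an open-row hit costs tCAS, a closed bank costs tRCD + tCAS, and a row-buffer conflict costs tRP + tRCD + tCAS, which is the ordering the next slide's access sequence illustrates.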

DRAM Bank Operation

Access address sequence: (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 85), (Row 1, Column 0)

[Figure: a bank with a row decoder, the array of rows, a row buffer, and a column mux. The first access finds the row buffer empty and must activate Row 0; the following accesses to Row 0 are row-buffer hits; the access to Row 1 is a row-buffer conflict, requiring a precharge and a new activate.]

The DRAM Chip

Consists of multiple banks (8 is a common number today)
Banks share the command/address/data buses
The chip itself has a narrow interface (4-16 bits per read)

Changing the number of banks, the size of the interface (pins), and whether or not the command/address/data buses are shared has a significant impact on DRAM system cost

128M x 8-bit DRAM Chip

[Figure: internal block diagram of a 128M x 8-bit DRAM chip.]

DRAM Rank and Module

Rank: multiple chips operated together to form a wide interface
  All chips comprising a rank are controlled at the same time
    Respond to a single command
    Share the address and command buses, but provide different data

A DRAM module consists of one or more ranks
  E.g., DIMM (dual inline memory module)
  This is what you plug into your motherboard

If we have chips with an 8-bit interface, to read 8 bytes in a single access, use 8 chips in a DIMM

A 64-bit Wide DIMM (One Rank)

[Figure: eight DRAM chips on one DIMM share the command bus, and their 8-bit data interfaces together form the 64-bit data bus.]

Advantages
  Acts like a high-capacity DRAM chip with a wide interface
  Flexibility: the memory controller does not need to deal with individual chips

Disadvantages
  Granularity: accesses cannot be smaller than the interface width

DRAM Channels

[Figure: a processor with two memory controllers, each driving its own DRAM channel.]

2 independent channels: 2 memory controllers (as in the figure)
2 dependent/lockstep channels: 1 memory controller with a wide interface (not shown)

Generalized Memory Structure

[Figure: the generalized memory structure.]

The DRAM subsystem

[Figure: a processor with two memory channels, each channel populated with DIMMs (dual inline memory modules).]

Breaking down a DIMM

[Figure: front, back, and side views of a DIMM (dual inline memory module). The 8 chips on the front form Rank 0; the 8 chips on the back form Rank 1.]

Rank

[Figure: Rank 0 (front of the DIMM) and Rank 1 (back) sit on the same memory channel. They share the address/command bus and the 64-bit data bus Data [0:63]; chip-select signals CS [0:1] determine which rank responds.]

Breaking down a Rank

[Figure: Rank 0 drives the full 64-bit data bus Data [0:63], but each of its 8 chips drives only an 8-bit slice: Chip 0 drives bits [0:7], Chip 1 drives bits [8:15], ..., Chip 7 drives bits [56:63].]

Breaking down a Chip

[Figure: inside Chip 0, multiple banks (Bank 0, ...) share the chip's 8-bit data interface [0:7].]

Breaking down a Bank

[Figure: Bank 0 is a 2D array of rows, row 0 through row 16K-1, each row 2KB wide. An accessed row is read into the 2KB row buffer, from which 1B columns are selected and sent out over the chip's 8-bit interface [0:7].]
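A minimal sketch of the geometry above: with 2KB rows, the low 11 bits of a bank-local byte address select the column (the byte within the row) and the next 14 bits select one of the 16K rows. The field widths follow the figure; the helper functions themselves are illustrative.

    #include <stdint.h>

    /* Bank geometry from the figure: 16K rows x 2KB per row. */
    #define ROW_BYTES  2048u            /* 2KB row  => 11 column bits */
    #define NUM_ROWS   (16u * 1024u)    /* 16K rows => 14 row bits    */

    /* Split a bank-local byte address into (row, column). */
    static inline uint32_t bank_row(uint32_t addr) { return (addr / ROW_BYTES) % NUM_ROWS; }
    static inline uint32_t bank_col(uint32_t addr) { return addr % ROW_BYTES; }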

DRAM Subsystem Organization

Channel → DIMM → Rank → Chip → Bank → Row/Column → Cell

Example: Transferring a cache block

[Figure sequence: a 64B cache block at physical address 0x40 in the physical memory space maps to Channel 0, DIMM 0, Rank 0.]

Within Rank 0, the 8 chips (Chip 0 through Chip 7) together drive the 64-bit data bus Data [0:63]: Chip 0 supplies bits [0:7], Chip 1 supplies bits [8:15], ..., Chip 7 supplies bits [56:63].

To transfer the block, the rank first reads Row 0, Col 0: each chip supplies one byte, so 8B cross the data bus. It then reads Row 0, Col 1 for the next 8B, and so on.

A 64B cache block takes 8 I/O cycles to transfer. During the process, 8 columns are read sequentially.
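To make the mapping concrete, here is a C sketch that decodes a physical address into (channel, rank, bank, row, column) and walks the 8 beats of a 64B block. The particular bit assignment is an assumption chosen for illustration; real memory controllers use many different interleavings, and the slides' simplified picture places this block at Col 0.

    #include <stdio.h>
    #include <stdint.h>

    /* One possible (illustrative) physical-address mapping, low bits first:
     *   [2:0]   byte within the 64-bit (8B) data-bus word -> which chip supplies it
     *   [10:3]  column within the open row (256 x 8B = 2KB row)
     *   [13:11] bank (8 banks per rank)
     *   [14]    rank
     *   [15]    channel
     *   [...]   row
     * Real controllers choose different interleavings; this is only a sketch. */
    typedef struct { uint64_t channel, rank, bank, row, col, byte; } dram_loc;

    static dram_loc decode(uint64_t pa) {
        dram_loc l;
        l.byte    =  pa        & 0x7;
        l.col     = (pa >> 3)  & 0xFF;
        l.bank    = (pa >> 11) & 0x7;
        l.rank    = (pa >> 14) & 0x1;
        l.channel = (pa >> 15) & 0x1;
        l.row     =  pa >> 16;
        return l;
    }

    int main(void) {
        uint64_t block_pa = 0x40;   /* the 64B cache block from the example above */
        /* A 64B block moves as 8 beats of 8B; each beat is one column access,
         * and each of the 8 chips supplies one byte of every beat. */
        for (int beat = 0; beat < 8; beat++) {
            dram_loc l = decode(block_pa + 8 * beat);
            printf("beat %d: channel %llu rank %llu bank %llu row %llu col %llu\n",
                   beat,
                   (unsigned long long)l.channel, (unsigned long long)l.rank,
                   (unsigned long long)l.bank, (unsigned long long)l.row,
                   (unsigned long long)l.col);
        }
        return 0;
    }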

Next Topic: Virtual Memory
