1 Programming
The programming part is for practice only; you do not need to hand in this part of the homework.
If you are interested in this part, feel free to email the TAs to discuss it.
In this homework, we are going to examine cache effects. The tool we'll use is rocket-chip. You can either build rocket-chip yourself or use the provided Docker image:
docker pull ntuca2020/hw5 # size ~ 8.28G
docker run --name=test -it ntuca2020/hw5
cd /root
ls
Folder structure for this homework:
emulator/                        // link to rocket-chip emulator
|-- benchmarks/                  // link to riscv-tests benchmark
|   |-- Makefile                 // compile all benchmarks
|   |-- qsort/                   // qsort benchmark folder
|   |-- qsort.riscv              // riscv executable
|   |-- qsort.riscv.dump         // objdump riscv executable
|   |-- mt-matmul/               // mt-matmul benchmark
|   |-- mt-matmul.riscv          // riscv executable
|   |-- mt-matmul.riscv.dump     // objdump riscv executable
|   |-- mt-matmul_4/             // for part2
|   |   |-- matmul.c             <- needs to be handed in
|   |-- mt-matmul_4.riscv        // riscv executable
|   |-- mt-matmul_4.riscv.dump   // objdump riscv executable
|   |-- ...                      // other benchmarks
|   |-- common/
|       |-- crt.S                // specify number of cores available
|-- system/                      // link to rocket-chip system
|   |-- test.scala               // first part SoC settings
|   |-- HW5.scala                <- used for matrix multiplication and needs to be handed in
|   |-- *.scala                  // other default scala settings
|-- build.sh                     // build all settings
|-- test.sh                      // test all settings
|-- spike_test.sh                // can test on spike first
|-- Config1                      // Configuration1
|-- generated-src_Config1        // Layout, RTL, mappings, dts, etc., for Config1
|-- Makefile                     // Build the configuration
Part 1: Observing cache behavior
Run test.sh and fill in the cycle count for each benchmark under each configuration in the table below.
Answer the following questions. Your answers should be based on your observations of the cache configurations and the program behavior. The labels (1) to (5) refer to the marked cells in the table.
- Why are (1) the same or different?
- Why are (2) the same or different?
- Why are (3) the same or different?
- Why are (4) the same or different?
- Why are (5) the same or different?
- Read pmp.c in /root/emulator/benchmarks/pmp. What is this program trying to do, and how does it achieve it?
- Change the number of cores available in the crt.S file (line 125) in /root/emulator/benchmarks/common and recompile the mt-matmul program (for this question, the matrix size is 32×32).
  - Report the cycle count of Configuration 17 on 1 core, Configuration 19 on 2 cores, and Configuration 20 on 4 cores (1%)
  - Does the cycle count decrease linearly with the number of cores? Explain why or why not.
dhrystone | median | multiply | qsort | rsort | towers | vvadd | |
Configuration 1 | (4) | (3) | (1) | ||||
Configuration 2 | (1) | ||||||
Configuration 3 | (2),(3) | ||||||
Configuration 4 | (2) | ||||||
Configuration 5 | |||||||
Configuration 6 | (4) | ||||||
Configuration 7 | (4) | ||||||
Configuration 8 | |||||||
Configuration 9 | |||||||
Configuration 10 | |||||||
Configuration 11 | |||||||
Configuration 12 | (5) | ||||||
Configuration 13 | (5) |
Table 1: Benchmarks on different configurations
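For the multi-core question above, it can help to turn raw cycle counts into speedup and parallel efficiency before arguing about linearity. The following is a minimal sketch, not part of the provided homework files: the function names are hypothetical, and Amdahl's law is included only as one common model for why scaling falls short of linear.

```c
#include <assert.h>

/* Speedup of an n-core run relative to the 1-core run: T1 / Tn. */
double speedup(unsigned long cycles_1core, unsigned long cycles_ncore) {
    assert(cycles_ncore > 0);
    return (double)cycles_1core / (double)cycles_ncore;
}

/* Parallel efficiency: speedup divided by the core count.
 * A value of 1.0 corresponds to perfectly linear scaling. */
double efficiency(unsigned long cycles_1core, unsigned long cycles_ncore,
                  unsigned ncores) {
    assert(ncores > 0);
    return speedup(cycles_1core, cycles_ncore) / (double)ncores;
}

/* Amdahl's law: predicted speedup when a fraction p (0..1) of the
 * work can be spread over ncores and the rest stays serial. */
double amdahl_speedup(double p, unsigned ncores) {
    assert(ncores > 0);
    return 1.0 / ((1.0 - p) + p / (double)ncores);
}
```

With made-up numbers, efficiency(1000, 500, 4) evaluates to 0.5, meaning the 4-core run is only twice as fast as the 1-core run. Plug in your own measured cycle counts; the values here are not results from the homework.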
Part 2: Cache and matrix multiplication
In this part, we revisit matrix multiplication. You are asked to implement 64×64 matrix multiplication on 4 cores with a 128-B L1-D$ and a 128-B L1-I$ (no L2). The cache size is fixed, so the only thing you can change in L1 is the way/set configuration.
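To see why the way/set split matters even when the capacity is fixed, here is a toy LRU set-associative cache model. It is a sketch under stated assumptions and is unrelated to rocket-chip's actual implementation: the names (toy_cache, count_misses_pingpong), the 16-byte line size, and the LRU replacement policy are all hypothetical choices made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 64  /* upper bound on sets * ways in this toy model */

/* Toy cache: each set holds `ways` tags, ordered most recently used first. */
typedef struct {
    size_t sets, ways, line_bytes;
    uint64_t tag[MAX_LINES];
    int valid[MAX_LINES];
} toy_cache;

void toy_cache_init(toy_cache *c, size_t sets, size_t ways, size_t line_bytes) {
    assert(sets > 0 && ways > 0 && line_bytes > 0 && sets * ways <= MAX_LINES);
    memset(c, 0, sizeof *c);
    c->sets = sets;
    c->ways = ways;
    c->line_bytes = line_bytes;
}

/* Returns 1 on a miss and 0 on a hit, then moves the line to the MRU slot. */
int toy_cache_access(toy_cache *c, uint64_t addr) {
    uint64_t line = addr / c->line_bytes;
    size_t set = (size_t)(line % c->sets);
    uint64_t tag = line / c->sets;
    uint64_t *t = &c->tag[set * c->ways];
    int *v = &c->valid[set * c->ways];

    size_t hit = c->ways;  /* way index of the hit, or `ways` on a miss */
    for (size_t w = 0; w < c->ways; w++) {
        if (v[w] && t[w] == tag) { hit = w; break; }
    }
    int miss = (hit == c->ways);

    /* Shift younger entries down by one; on a miss the LRU entry falls out. */
    size_t last = miss ? c->ways - 1 : hit;
    for (size_t w = last; w > 0; w--) {
        t[w] = t[w - 1];
        v[w] = v[w - 1];
    }
    t[0] = tag;
    v[0] = 1;
    return miss;
}

/* Count misses when addresses a and b are accessed alternately. */
unsigned count_misses_pingpong(size_t sets, size_t ways, size_t line_bytes,
                               uint64_t a, uint64_t b, unsigned rounds) {
    toy_cache c;
    toy_cache_init(&c, sets, ways, line_bytes);
    unsigned misses = 0;
    for (unsigned r = 0; r < rounds; r++) {
        misses += (unsigned)toy_cache_access(&c, a);
        misses += (unsigned)toy_cache_access(&c, b);
    }
    return misses;
}
```

In this model, a 128-byte cache with 16-byte lines can be organized as 8 sets × 1 way, 4 sets × 2 ways, or 1 set × 8 ways. Alternating between addresses 0 and 128 for 10 rounds misses 20 times in the direct-mapped layout (both addresses map to set 0 and evict each other) but only 2 times in the other two layouts. Real access patterns in matrix multiplication are more complex, so treat this only as an illustration of conflict misses.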
Change the dataset in /root/emulator/benchmarks/mt-matmul/mt-matmul.c to the one with 64×64 matrices (dataset2.h). The cache setting is specified in /root/emulator/system/HW5.scala, and you can build the simulator using
make -j8 CONFIG=freechips.rocketchip.system.HW5Config
in /root/emulator.
The matrix multiplication program is located at /root/emulator/benchmarks/mt-matmul/matmul.c. Each thread enters this function with its thread ID and local storage (128KB) and exits once its task is finished. You may want to look at the files under mt-matmul/ and common/.
Consider both the distribution of the workload across threads and the cache behavior when you implement matrix multiplication. Scoring is based on the cycle count obtained with your HW5.scala and matmul.c.
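One possible way to split the work is to give each core a contiguous band of output rows and to order the loops so that the inner loop walks memory with unit stride. The sketch below illustrates that idea only. The function name matmul_rows, its parameter list, and the data_t typedef are assumptions made for this example; check the actual signature and element type in matmul.c and the dataset header before adapting it.

```c
#include <assert.h>
#include <stddef.h>

typedef int data_t;  /* assumption: the benchmark's headers define the real type */

/*
 * Hypothetical helper: core `coreid` out of `ncores` computes its band of
 * rows of C = A * B, where A, B, C are n x n matrices stored row-major.
 */
void matmul_rows(size_t coreid, size_t ncores, size_t n,
                 const data_t A[], const data_t B[], data_t C[]) {
    assert(ncores > 0 && coreid < ncores);

    /* Ceiling division so that every row is covered when n % ncores != 0. */
    size_t rows_per_core = (n + ncores - 1) / ncores;
    size_t row_begin = coreid * rows_per_core;
    size_t row_end = row_begin + rows_per_core;
    if (row_begin > n) row_begin = n;
    if (row_end > n) row_end = n;

    for (size_t i = row_begin; i < row_end; i++) {
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0;
        /* i-k-j order: the inner loop reads B and updates C along a row. */
        for (size_t k = 0; k < n; k++) {
            data_t a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

Because each core writes a disjoint set of rows of C, the cores do not need to synchronize during the multiplication itself. Whether this loop order or a blocked (tiled) variant gives fewer cycles depends on the cache configuration you choose, so measure rather than assume.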
Grading:
- Correctness
- Based on cycle count
  - Ranking: Top 5
  - Ranking: 6-20
  - Ranking: 21-40
  - Ranking: 41-80
  - Ranking: > 80
- A report on how you implemented your matrix multiplication, optionally with cache miss rate statistics collected using spike
Architecture and Security (0%)
Although it is important to design a high-performance architecture, it is also crucial to design a secure one. Read Spectre Attacks: Exploiting Speculative Execution (you may also want to consult the original paper) and answer the following questions.
- How is an attack that exploits conditional branch misprediction performed?
- How is an attack that poisons indirect branches performed?
- How can Spectre attacks be mitigated? (Give at least 3 methods.)