Section 1. Handwritten
- 28 (12%)
- 29 (12%)
Section 2. Programming
- Pipelined CPU (74%)
- Report (12%)
Section 5. Supplementary
- Homework introduction video
Section 6. Practice
- Some handwritten practice questions on Chapter 4 are provided to help everyone prepare for the midterm exam.
1 Handwritten
Each 2%.
2 Programming (86%, including: Pipelined CPU and report)
Pipelined CPU
In this section, you will implement a pipelined CPU.
The provided instruction memory is as follows:
Signal | I/O | Width | Functionality |
i_clk | Input | 1 | Clock signal |
i_rst_n | Input | 1 | Active-low asynchronous reset |
i_valid | Input | 1 | Indicates that the PC address from the CPU is ready |
i_addr | Input | 64 | 64-bit address from the CPU |
o_valid | Output | 1 | Asserted when the instruction is ready |
o_inst | Output | 32 | 32-bit instruction to the CPU |
And the provided data memory is as follows:
Signal | I/O | Width | Functionality |
i_clk | Input | 1 | Clock signal |
i_rst_n | Input | 1 | Active-low asynchronous reset |
i_data | Input | 64 | 64-bit data to be stored |
i_w_addr | Input | 64 | 64-bit target address for writes |
i_r_addr | Input | 64 | 64-bit target address for reads |
i_MemRead | Input | 1 | One-cycle pulse that sets the current mode to reading |
i_MemWrite | Input | 1 | One-cycle pulse that sets the current mode to writing |
o_valid | Output | 1 | One-cycle pulse indicating that data is ready (used for ld) |
o_data | Output | 64 | 64-bit data from the data memory (used for ld) |
The test environment is as follows:
We will only test the instructions highlighted in the red boxes in the figures below.
One additional instruction must also be implemented:
i_inst | Function | Description |
32'b11111111111111111111111111111111 | Stop | Stop and set o_finish to 1 |
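The Stop encoding above is simply a 32-bit word with every bit set. As a minimal sketch in Python (not part of the provided files), the check a test generator or reference model might perform could look like this. The names `STOP_INST`, `is_stop`, and `count_until_stop` are hypothetical, and the other instruction words in the example are arbitrary placeholders.

```python
# Hypothetical reference-model helper; not part of the homework files.
STOP_INST = 0xFFFF_FFFF  # 32 bits, all ones


def is_stop(inst: int) -> bool:
    """Return True when the low 32 bits of inst are all ones."""
    return (inst & 0xFFFF_FFFF) == STOP_INST


def count_until_stop(instructions):
    """Return how many instructions precede the first Stop, or None if there is none."""
    for n, inst in enumerate(instructions):
        if is_stop(inst):
            return n
    return None


# Two placeholder words followed by Stop: two instructions run before the finish flag is set.
program = [0x00000013, 0x00000013, 0xFFFFFFFF]
print(count_until_stop(program))
```

In hardware the same comparison is a single 32-bit equality check on the fetched instruction.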
All environment settings are the same as in HW3, except for the rules for accessing data_memory.v and instruction_memory.v; the module interfaces have also changed this time. See supplementary.pdf for more information.
You may want to refer to the pipelined CPU diagram in the textbook.
To verify that your design is actually pipelined, we will use Yosys, an open-source synthesis tool, to check the timing of the critical path in your design. We'll also use the FreePDK 45 nm process standard cell library provided here.
You can either build Yosys yourself or use the provided Docker image:
docker pull ntuca2020/hw4 # size ~ 1.28G
docker run --name=test -it ntuca2020/hw4
cd /root
ls
Folder structure for this homework:
HW4/
|-- testcases/
|   |-- generate.s
|   `-- generate.cpp
|-- codes/
|   |-- cpu.v
|   |-- data_memory.v // provided data memory
|   `-- instruction_memory.v // provided instruction memory
|-- testbench.v
|-- Makefile
|-- cpu.ys // synthesis command
`-- stdcells.lib // FreePDK 45 nm standard cell library
Specify all the used modules in the cpu.ys file, then run
make // Compile
make test // Test all test cases
make time // Show the timing and area used in your design
Information about your design is shown when running make time:
ABC: WireLoad = none Gates = 13123 ( 14.8 %) Cap = 3.2 ff ( 1.9 %)
Area = 17519.56 ( 87.9 %) Delay = 1091.13 ps ( 5.1 %)
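Because the grading treats frequency as the inverse of the reported delay, it can help to convert the Delay figure into MHz before comparing it with the thresholds. The following is a minimal sketch, assuming the Delay value is the critical-path delay in picoseconds as printed above; the function name `max_frequency_mhz` is hypothetical.

```python
def max_frequency_mhz(delay_ps: float) -> float:
    """Convert a critical-path delay in picoseconds to a clock frequency in MHz.

    A period of 1 ps corresponds to 1e12 Hz, i.e. 1e6 MHz, so f_MHz = 1e6 / delay_ps.
    """
    if delay_ps <= 0:
        raise ValueError("delay must be positive")
    return 1e6 / delay_ps


# Delay value taken from the sample report above.
print(f"{max_frequency_mhz(1091.13):.2f} MHz")
```

For the sample report, a delay of 1091.13 ps corresponds to roughly 916 MHz.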
You may optimize the CPU for the 3 workloads (code address range, data address range, etc.), but these optimizations must not break the other test cases.
Grading:
- Correctness check (10%)
- 10 testcases, each 2% for correctness check
- Required area and frequency (inverse of delay) (32%)
- Area < 25,000 µm² and frequency > 10 MHz (5%)
- Area < 25,000 µm² and frequency > 100 MHz (5%)
- Area < 25,000 µm² and frequency > 200 MHz (5%)
- Area < 25,000 µm² and frequency > 500 MHz (5%)
- Area < 25,000 µm² and frequency > 800 MHz (4%)
- Area < 25,000 µm² and frequency > 1000 MHz (3%)
- Area < 25,000 µm² and frequency > 1200 MHz (3%)
- Area < 25,000 µm² and frequency > 1500 MHz (2%)
- Required time (number of clock cycles × clock period, i.e. cycles divided by operating frequency) to finish the workloads from the last 3 testcases. (32%)
- Workload1 < 100,000 ns (5%)
- Workload2 < 150,000 ns (5%)
- Workload3 < 200,000 ns (5%)
- Workload1 < 10,000 ns (5%)
- Workload2 < 15,000 ns (4%)
- Workload3 < 20,000 ns (3%)
- Workload1 < 5,000 ns, and Workload2 < 20,000 ns, and Workload3 < 15,000 ns (3%)
- Workload1 < 3,500 ns, and Workload2 < 9,000 ns, and Workload3 < 10,000 ns (2%)
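To estimate where a design lands in the tiers above, the arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the tiers are read as cumulative (each satisfied line adds its points), the clock period is taken to equal the reported critical-path delay, and all names (`area_frequency_points`, `workload_time_ns`, and so on) are hypothetical. The cycle count in the example is made up.

```python
# Hypothetical helpers; the cumulative reading of the tiers is an assumption.
AREA_LIMIT = 25_000.0  # same unit as the Area value in the synthesis report

# (minimum frequency in MHz, points), copied from the area/frequency tiers above.
FREQ_TIERS = [(10, 5), (100, 5), (200, 5), (500, 5),
              (800, 4), (1000, 3), (1200, 3), (1500, 2)]


def frequency_mhz(delay_ps: float) -> float:
    """Frequency is the inverse of the delay: 1 ps period equals 1e6 MHz."""
    return 1e6 / delay_ps


def area_frequency_points(area: float, delay_ps: float) -> int:
    """Sum the points of every frequency tier met, provided the area limit holds."""
    if area >= AREA_LIMIT:
        return 0
    freq = frequency_mhz(delay_ps)
    return sum(points for threshold, points in FREQ_TIERS if freq > threshold)


def workload_time_ns(cycles: int, delay_ps: float) -> float:
    """Run time = number of cycles x clock period, converted from ps to ns."""
    return cycles * delay_ps / 1000.0


# Area and delay from the sample report; 8000 cycles is a made-up cycle count.
print(area_frequency_points(17519.56, 1091.13))
print(workload_time_ns(8000, 1091.13))
```

With the sample report values (area 17519.56, delay 1091.13 ps), this reading gives 24 of the 32 area/frequency points, since roughly 916 MHz clears the 800 MHz tier but not the 1000 MHz one.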