1 Matrix Multiplication
In the lecture and discussion, we covered two approaches to computing the matrix product C = AB using CUDA: (1) an unoptimized implementation that uses only global memory, and (2) block matrix multiplication using shared memory.
In this assignment, your task is to implement 1024 × 1024 matrix multiplication using these two approaches.
- Approach 1 (unoptimized implementation using global memory only):
- Name this program p1.cu
- The value of each element of A is 1
- The value of each element of B is 2
- Thread block configuration: 16 × 16
- Grid configuration: 64 × 64
- After computation, print the value of C[451][451]
- Approach 2 (block matrix multiplication using shared memory):
- Name this program p2.cu
- The value of each element of A is 1
- The value of each element of B is 2
- Thread block configuration: 32 × 32
- Grid configuration: 32 × 32
- More details of this algorithm can be found in the paper "Matrix Multiplication with CUDA", under the Readings category on Blackboard.
- After computation, print the value of C[451][451]
- Report: measure the kernel execution time of Approach 1 and of Approach 2, and briefly discuss your observations.
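
For Approach 1, one possible starting point is sketched below. It is not a reference solution. It assumes `float` elements and row-major storage, neither of which the assignment specifies, and the names `matmul_naive` and `run_naive` are hypothetical. A 64 × 64 grid of 16 × 16 blocks gives 1024 × 1024 threads, so each thread can compute one element of C. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N 1024  // matrix dimension from the assignment

// Each thread computes one element of C, reading a full row of A and a
// full column of B directly from global memory.
__global__ void matmul_naive(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Hypothetical host helper: fills A with 1 and B with 2, runs the kernel,
// and returns C[r][c].
float run_naive(int r, int c) {
    const size_t count = (size_t)N * N;
    const size_t bytes = count * sizeof(float);
    std::vector<float> hA(count, 1.0f), hB(count, 2.0f), hC(count);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // thread block configuration from the assignment
    dim3 grid(64, 64);   // grid configuration from the assignment
    matmul_naive<<<grid, block>>>(dA, dB, dC, N);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return hC[(size_t)r * N + c];
}

int main() {
    printf("C[451][451] = %f\n", run_naive(451, 451));
    return 0;
}
```

Since every element of A is 1 and every element of B is 2, each element of C is a sum of 1024 products equal to 2, so the printed value should be 2048. That makes a convenient correctness check.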
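
For Approach 2, a minimal sketch of a tiled kernel follows. It is one common way to write block matrix multiplication with shared memory and is not necessarily the exact formulation in the assigned reading. It assumes `float` elements, row-major storage, and a matrix dimension that is a multiple of the tile width (1024 is a multiple of 32). The names `TILE`, `matmul_tiled`, and `run_tiled` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N 1024   // matrix dimension from the assignment
#define TILE 32  // tile width, equal to the 32 x 32 thread block

// Each block computes one TILE x TILE tile of C. For every step t, the block
// stages one tile of A and one tile of B in shared memory, then each thread
// accumulates a partial dot product from those staged tiles.
// Assumes n is a multiple of TILE, so no bounds checks are needed.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next load
    }
    C[row * n + col] = sum;
}

// Hypothetical host helper: fills A with 1 and B with 2, runs the kernel,
// and returns C[r][c].
float run_tiled(int r, int c) {
    const size_t count = (size_t)N * N;
    const size_t bytes = count * sizeof(float);
    std::vector<float> hA(count, 1.0f), hB(count, 2.0f), hC(count);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);  // 32 x 32 threads per block
    dim3 grid(32, 32);       // grid configuration from the assignment
    matmul_tiled<<<grid, block>>>(dA, dB, dC, N);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return hC[(size_t)r * N + c];
}

int main() {
    printf("C[451][451] = %f\n", run_tiled(451, 451));
    return 0;
}
```

In this sketch each element of A and B is read from global memory once per tile step rather than once per multiply, which cuts global memory traffic by roughly a factor of the tile width. The two `__syncthreads()` calls are both needed: the first guards the reads, the second guards the next overwrite.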
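
For the report, one way to time only the kernel (excluding allocation and host-device copies) is with CUDA events, sketched below. The event calls are part of the CUDA runtime API. `dummy_kernel` and `time_kernel_ms` are hypothetical placeholders, and the launch inside the helper would be replaced by the Approach 1 or Approach 2 kernel launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the Approach 1 or Approach 2 kernel.
__global__ void dummy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * data[i];
}

// Hypothetical helper: returns the elapsed GPU time of one kernel launch
// in milliseconds, measured between two events on the default stream.
float time_kernel_ms(float *d_data, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and stop event finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    time_kernel_ms(d_data, n);  // warm-up launch, result discarded
    printf("kernel time: %.3f ms\n", time_kernel_ms(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

The first launch of a kernel often includes one-time setup cost, so the sketch discards a warm-up run before reporting a time. Averaging several timed runs would likely give a more stable number for the comparison.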