[Solved] Lab 6 CMPUT 398


Objective

The lab's objective is to implement a simple image convolution and a tiled image convolution using both shared and constant memory. We will have a constant 5×5 convolution mask, but will have an arbitrarily sized image (assume the image dimensions are greater than 5×5 for this lab).

For the simple image convolution, you shouldn't use any constant or shared memory. Keep it simple. It should look very similar to the pseudo code shown below. For the tiled image convolution, however, you need to use both shared and constant memory.

To use the constant memory for the convolution mask, you can first transfer the mask data to the device. Consider the case where the pointer to the device array for the mask is named M. You can declare const float * __restrict__ M as one of the kernel's parameters. This informs the compiler that the contents of the mask array are constant and will only be accessed through the pointer variable M. This enables the compiler to treat the mask as read-only constant data and allows the SM hardware to aggressively cache it at runtime.

Convolution is used in many fields, such as image processing, where it is used for image filtering. A standard image convolution formula for a 5×5 convolution filter M with an image I is:

$$P_{i,j,c} = \sum_{x=-2}^{2} \sum_{y=-2}^{2} I_{i+x,\,j+y,\,c} \, M_{x,y}$$

where $P_{i,j,c}$ is the output pixel at position i,j in channel c, $I_{i,j,c}$ is the input pixel at i,j in channel c (the number of channels will always be 3 for this lab, corresponding to the RGB values), and $M_{x,y}$ is the mask at position x,y.

This lab will be submitted as one zipped file through eclass. Details for submission are at the end of the lab.

Input Data

The input is an interleaved image of height × width × channels. By interleaved, we mean that the element I[y][x] contains three values representing the RGB channels. This means that to index a particular element's value, you will have to do something like:

index = (yIndex*width + xIndex)*channels + channelIndex;

For this assignment, the channel index is 0 for R, 1 for G, and 2 for B. So, to access the G value of I[y][x], you should use the linearized expression I[(yIndex*width+xIndex)*channels + 1].

For simplicity, you can assume that channels is always set to 3.
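As a minimal sketch of the indexing rule above, the helper below computes the linear offset of one channel value. The name pixelIndex and the default of 3 channels are assumptions made for illustration; they are not part of the lab skeleton.

```cpp
#include <cstddef>

// Hypothetical helper: linear offset of channel `channelIndex` of the pixel at
// row `yIndex`, column `xIndex` in an interleaved image of the given width.
inline std::size_t pixelIndex(int yIndex, int xIndex, int channelIndex,
                              int width, int channels = 3) {
    return (static_cast<std::size_t>(yIndex) * width + xIndex) * channels
           + channelIndex;
}
```

For example, in an image that is 10 pixels wide, the G value of I[2][3] would sit at offset (2*10 + 3)*3 + 1 = 70.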

Instructions

Edit the code where the TODOs are specified and perform the following:

  • Allocate device memory
  • Copy host memory to device
  • Initialize thread block and kernel grid dimensions
  • Invoke CUDA kernel
  • Copy results from device to host
  • Deallocate device memory
  • Implement the simple 2D convolution kernel with adjustments for channels
  • Implement the tiled 2D convolution kernel with adjustments for channels
  • Use shared memory to reduce the number of global accesses, and handle the boundary conditions when loading input elements into the shared memory
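The grid sizing and the boundary handling in the steps above can be illustrated on the CPU. The sketch below is not a CUDA kernel: it only shows, under assumed values, how many tiles cover an image and how a tile plus its halo could be filled, with out-of-image elements set to 0. TILE_WIDTH = 16 and the names tilesFor, loadTile and PADDED_WIDTH are hypothetical choices; the lab does not prescribe them. The returned buffer stands in for what a thread block would hold in shared memory.

```cpp
#include <vector>

// Assumed values for illustration only.
constexpr int MASK_WIDTH = 5;
constexpr int MASK_RADIUS = MASK_WIDTH / 2;
constexpr int TILE_WIDTH = 16;                             // output pixels per tile side
constexpr int PADDED_WIDTH = TILE_WIDTH + MASK_WIDTH - 1;  // tile plus halo on both sides

// Number of tiles needed to cover n pixels (ceiling division).
inline int tilesFor(int n) { return (n + TILE_WIDTH - 1) / TILE_WIDTH; }

// Copies one channel of the tile at (tileRow, tileCol), including its halo,
// into a PADDED_WIDTH x PADDED_WIDTH buffer. Elements that fall outside the
// image are left at 0, which matches skipping them in the convolution sum.
std::vector<float> loadTile(const std::vector<float>& image,
                            int width, int height, int channels,
                            int tileRow, int tileCol, int channel) {
    std::vector<float> tile(PADDED_WIDTH * PADDED_WIDTH, 0.0f);
    for (int ty = 0; ty < PADDED_WIDTH; ++ty) {
        for (int tx = 0; tx < PADDED_WIDTH; ++tx) {
            const int row = tileRow * TILE_WIDTH + ty - MASK_RADIUS;
            const int col = tileCol * TILE_WIDTH + tx - MASK_RADIUS;
            if (row >= 0 && row < height && col >= 0 && col < width) {
                tile[ty * PADDED_WIDTH + tx] =
                    image[(row * width + col) * channels + channel];
            }
        }
    }
    return tile;
}
```

In a kernel, the two loops would be replaced by the threads of a block cooperating on the load, followed by a barrier before any thread reads the tile.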

Pseudo Code

A sequential pseudo code would look something like this:

maskWidth := 5
maskRadius := maskWidth/2 # this is integer division, so the result is 2
for i from 0 to height do
    for j from 0 to width do
        for k from 0 to channels do
            accum := 0
            for y from -maskRadius to maskRadius do
                for x from -maskRadius to maskRadius do
                    xOffset := j + x
                    yOffset := i + y
                    if xOffset >= 0 && xOffset < width && yOffset >= 0 && yOffset < height then
                        imagePixel := I[(yOffset * width + xOffset) * channels + k]
                        maskValue := K[(y+maskRadius)*maskWidth+x+maskRadius]
                        accum += imagePixel * maskValue
                    end
                end
            end
            # pixels are in the range of 0 to 1
            P[(i * width + j)*channels + k] = clamp(accum, 0, 1)
        end
    end
end

where clamp is defined as

def clamp(x, lower, upper)
    return min(max(x, lower), upper)
end
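For checking kernel output on the host, the pseudo code can be transcribed almost line for line into C++. The following is a sketch of such a CPU reference, not the lab's provided code; the names convolveReference and clampValue are assumptions, and the loop bounds are read as exclusive of height, width and channels.

```cpp
#include <algorithm>
#include <vector>

// Clamp x into [lower, upper], as defined in the pseudo code.
inline float clampValue(float x, float lower, float upper) {
    return std::min(std::max(x, lower), upper);
}

// Sequential reference convolution over an interleaved image.
// I holds width*height*channels pixels, K holds the 5x5 mask, both row-major.
std::vector<float> convolveReference(const std::vector<float>& I,
                                     const std::vector<float>& K,
                                     int width, int height, int channels) {
    const int maskWidth = 5;
    const int maskRadius = maskWidth / 2;  // integer division: 2
    std::vector<float> P(I.size(), 0.0f);
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            for (int k = 0; k < channels; ++k) {
                float accum = 0.0f;
                for (int y = -maskRadius; y <= maskRadius; ++y) {
                    for (int x = -maskRadius; x <= maskRadius; ++x) {
                        const int xOffset = j + x;
                        const int yOffset = i + y;
                        // Pixels outside the image contribute nothing.
                        if (xOffset >= 0 && xOffset < width &&
                            yOffset >= 0 && yOffset < height) {
                            const float imagePixel =
                                I[(yOffset * width + xOffset) * channels + k];
                            const float maskValue =
                                K[(y + maskRadius) * maskWidth + x + maskRadius];
                            accum += imagePixel * maskValue;
                        }
                    }
                }
                // Pixels are in the range of 0 to 1.
                P[(i * width + j) * channels + k] = clampValue(accum, 0.0f, 1.0f);
            }
        }
    }
    return P;
}
```

A quick sanity check is a mask that is 0 everywhere except for a 1 at its center: the output should then equal the input for any image whose values already lie in the range 0 to 1.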

Local Setup Instructions

Steps:

  1. Download Lab6.zip.
  2. Unzip the file.
  3. Open the Visual Studio solution in Visual Studio 2013.
  4. Build the project. Note that the project has two configurations:
    1. Debug
    2. Submission

Make sure you have the Submission configuration selected when you finally submit.

  5. Run the program from Visual Studio.

Make sure the Debug configuration is selected. Running the program in Visual Studio will run one of the tests located in Dataset/Test.

Testing

To run all tests located in Dataset/Test, first build the project with the Submission configuration selected. Make sure you see the Submission folder and the folder contains the executable.

To run the tests, click on Testing_Script.bat. This will take a couple of seconds to run and the terminal should close when finished. The output is saved in Marks.js, but to view the calculated grade open Grade.html in a browser. If you make changes and rerun the tests, then make sure you reload Grade.html. You can double check with the timestamp at the top of the page.

You will be given two executable files for reference, one for simple 2D convolution and one for tiled convolution (in the References directory). Do NOT submit these files. For each version, the run time of your code on Test 7, built with the Submission configuration, must be at most 3 times the run time of the reference file for that version to receive marks.

Write a report with screenshots that show the run time of your code compared to the reference file for each version. If you don't submit this report, you will lose 20% of your mark.
