[Solved] Lab Assignment L4: Code Optimization

$25

File Name: Lab_Assignment_L4:_Code_Optimization.zip
File Size: 339.12 KB

SKU: [Solved] Lab Assignment L4: Code Optimization Category: Tag:
5/5 - (1 vote)

1 IntroductionThis assignment deals with optimizing memory intensive code. Image processing offers many examples offunctions that can benefit from optimization. In this lab, we will consider two image processing operations:rotate, which rotates an image counter-clockwise by 90, and smooth, which smooths or blurs animage.For this lab, we will consider an image to be represented as a two-dimensional matrix M, where Mi,jdenotes the value of (i, j)th pixel of M. Pixel values are triples of red, green, and blue (RGB) values. Wewill only consider square images. Let N denote the number of rows (or columns) of an image. Rows andcolumns are numbered, in C-style, from 0 to N 1.Given this representation, the rotate operation can be implemented quite simply as the combination ofthe following two matrix operations: Transpose: For each (i, j) pair, Mi,j and Mj,i are interchanged. Exchange rows: Row i is exchanged with row N 1 i.This combination is illustrated in Figure 1.The smooth operation is implemented by replacing every pixel value with the average of all the pixelsaround it (in a maximum of 3 3 window centered at that pixel). Consider Figure 2. The values of pixelsM2[1][1] and M2[N-1][N-1] are given below:M2[1][1] =P2i=0P2j=0 M1[i][j]9M2[N 1][N 1] =PN1i=N2PN1j=N2 M1[i][j]41Rotate by 90(counterclockwise)TransposeExchangeRowsjiijij(0,0)(0,0)(0,0)Figure 1: Rotation of an image by 90 counterclockwisesmoothM1[1][1]M1[N1][N1]M2[1][1]M2[N1][N1]Figure 2: Smoothing an image

2 LogisticsYou may work in a group of up to two people in solving the problems for this assignment. The only handinwill be electronic. Any clarifications and revisions to the assignment will be posted on the course Webpage.3 Hand Out InstructionsSITE-SPECIFIC: Insert a paragraph here that explains how the instructor will hand outthe perflab-handout.tar file to the students.Start by copying perflab-handout.tar to a protected directory in which you plan to do your work.Then give the command: tar xvf perflab-handout.tar. This will cause a number of files to beunpacked into the directory. The only file you will be modifying and handing in is kernels.c. Thedriver.c program is a driver program that allows you to evaluate the performance of your solutions. Usethe command make driver to generate the driver code and run it with the command ./driver.Looking at the file kernels.c youll notice a C structure team into which you should insert the requestedidentifying information about the one or two individuals comprising your programming team. Do this rightaway so you dont forget.4 Implementation OverviewData StructuresThe core data structure deals with image representation. A pixel is a struct as shown below:typedef struct {unsigned short red; /* R value */unsigned short green; /* G value */unsigned short blue; /* B value */} pixel;As can be seen, RGB values have 16-bit representations (16-bit color). An image I is represented as a onedimensionalarray of pixels, where the (i, j)th pixel is I[RIDX(i,j,n)]. Here n is the dimension of the imagematrix, and RIDX is a macro defined as follows:#define RIDX(i,j,n) ((i)*(n)+(j))See the file defs.h for this code.RotateThe following C function computes the result of rotating the source image src by 90 and stores the result in destinationimage dst. dim is the dimension of the image.3void naive_rotate(int dim, pixel *src, pixel *dst) {int i, j;for(i=0; i < dim; i++)for(j=0; j < dim; j++)dst[RIDX(dim-1-j,i,dim)] = src[RIDX(i,j,dim)];return;}The above code scans the rows of the source image matrix, copying to the columns of the destination image matrix.Your task is to rewrite this code to make it run as fast as possible using techniques like code motion, loop unrollingand blocking.See the file kernels.c for this code.SmoothThe smoothing function takes as input a source image src and returns the smoothed result in the destination imagedst. Here is part of an implementation:void naive_smooth(int dim, pixel *src, pixel *dst) {int i, j;for(i=0; i < dim; i++)for(j=0; j < dim; j++)dst[RIDX(i,j,dim)] = avg(dim, i, j, src); /* Smooth the (i,j)th pixel */return;}The function avg returns the average of all the pixels around the (i,j)th pixel. Your task is to optimize smooth(and avg) to run as fast as possible. (Note: The function avg is a local function and you can get rid of it altogether toimplement smooth in some other way.)This code (and an implementation of avg) is in the file kernels.c.Performance measuresOur main performancemeasure is CPE or Cycles per Element. If a function takes C cycles to run for an image of sizeN N, the CPE value is C/N2. Table 1 summarizes the performance of the naive implementations shown aboveand compares it against an optimized implementation. Performance is shown for for 5 different values of N. Allmeasurements were made on the Pentium III Xeon Fish machines.The ratios (speedups) of the optimized implementation over the naive one will constitute a score of your implementation.To summarize the overall effect over different values of N, we will compute the geometric mean of the resultsfor these 5 values. That is, if the measured speedups for N = {32, 64, 128, 256, 512} are R32, R64, R128, R256, andR512 then we compute the overall performance asR = 5pR32 R64 R128 R256 R5124Test case 1 2 3 4 5Method N 64 128 256 512 1024 Geom. MeanNaive rotate (CPE) 14.7 40.1 46.4 65.9 94.5Optimized rotate (CPE) 8.0 8.6 14.8 22.1 25.3Speedup (naive/opt) 1.8 4.7 3.1 3.0 3.7 3.1Method N 32 64 128 256 512 Geom. MeanNaive smooth (CPE) 695 698 702 717 722Optimized smooth (CPE) 41.5 41.6 41.2 53.5 56.4Speedup (naive/opt) 16.8 16.8 17.0 13.4 12.8 15.2Table 1: CPEs and Ratios for Optimized vs. Naive ImplementationsAssumptionsTo make life easier, you can assume that N is a multiple of 32. Your code must run correctly for all such values of N,but we will measure its performance only for the 5 values shown in Table 1.5 InfrastructureWe have provided support code to help you test the correctness of your implementations and measure their performance.This section describes how to use this infrastructure. The exact details of each part of the assignment isdescribed in the following section.Note: The only source file you will be modifying is kernels.c.VersioningYou will be writing many versions of the rotate and smooth routines. To help you compare the performance ofall the different versions youve written, we provide a way of registering functions.For example, the file kernels.c that we have provided you contains the following function:void register_rotate_functions() {add_rotate_function(&rotate, rotate_descr);}This function contains one or more calls to add rotate function. In the above example,add rotate function registers the function rotate along with a string rotate descr which is an ASCIIdescription of what the function does. See the file kernels.c to see how to create the string descriptions. Thisstring can be at most 256 characters long.A similar function for your smooth kernels is provided in the file kernels.c.5DriverThe source code you will write will be linked with object code that we supply into a driver binary. To create thisbinary, you will need to execute the commandunix> make driverYou will need to re-make driver each time you change the code in kernels.c. To test your implementations, youcan then run the command:unix> ./driverThe driver can be run in four different modes: Default mode, in which all versions of your implementation are run. Autograder mode, in which only the rotate() and smooth() functions are run. This is the mode we willrun in when we use the driver to grade your handin. File mode, in which only versions that are mentioned in an input file are run. Dump mode, in which a one-line description of each version is dumped to a text file. You can then edit this textfile to keep only those versions that youd like to test using the file mode. You can specify whether to quit afterdumping the file or if your implementations are to be run.If run without any arguments, driver will run all of your versions (default mode). Other modes and options can bespecified by command-line arguments to driver, as listed below:-g : Run only rotate() and smooth() functions (autograder mode).-f <funcfile> : Execute only those versions specified in <funcfile> (file mode).-d <dumpfile> : Dump the names of all versions to a dump file called <dumpfile>, one line to a version(dump mode).-q : Quit after dumping version names to a dump file. To be used in tandem with -d. For example, to quitimmediately after printing the dump file, type ./driver -qd dumpfile.-h : Print the command line usage.Team InformationImportant: Before you start, you should fill in the struct in kernels.c with information about your team (groupname, team member names and email addresses). This information is just like the one for the Data Lab.6 Assignment DetailsOptimizing Rotate (50 points)In this part, you will optimize rotate to achieve as low a CPE as possible. You should compile driver and thenrun it with the appropriate arguments to test your implementations.For example, running driver with the supplied naive version (for rotate) generates the output shown below:6unix> ./driverTeamname: bovikMember 1: Harry Q. BovikEmail 1: [email protected]Rotate: Version = naive_rotate: Naive baseline implementation:Dim 64 128 256 512 1024 MeanYour CPEs 14.6 40.9 46.8 63.5 90.9Baseline CPEs 14.7 40.1 46.4 65.9 94.5Speedup 1.0 1.0 1.0 1.0 1.0 1.0Optimizing Smooth (50 points)In this part, you will optimize smooth to achieve as low a CPE as possible.For example, running driver with the supplied naive version (for smooth) generates the output shown below:unix> ./driverSmooth: Version = naive_smooth: Naive baseline implementation:Dim 32 64 128 256 512 MeanYour CPEs 695.8 698.5 703.8 720.3 722.7Baseline CPEs 695.0 698.0 702.0 717.0 722.0Speedup 1.0 1.0 1.0 1.0 1.0 1.0Some advice. Look at the assembly code generated for the rotate and smooth. Focus on optimizing the innerloop (the code that gets repeatedly executed in a loop) using the optimization tricks covered in class. The smooth ismore compute-intensive and less memory-sensitive than the rotate function, so the optimizations are of somewhatdifferent flavors.Coding RulesYou may write any code you want, as long as it satisfies the following: It must be in ANSI C. You may not use any embedded assembly language statements. It must not interfere with the time measurement mechanism. You will also be penalized if your code prints anyextraneous information.You can only modify code in kernels.c. You are allowed to define macros, additional global variables, and otherprocedures in these files.EvaluationYour solutions for rotate and smooth will each count for 50% of your grade. The score for each will be based onthe following: Correctness: You will get NO CREDIT for buggy code that causes the driver to complain! This includes codethat correctly operates on the test sizes, but incorrectly on image matrices of other sizes. As mentioned earlier,you may assume that the image dimension is a multiple of 32.7 CPE: You will get full credit for your implementations of rotate and smooth if they are correct and achievemean CPEs above thresholds Sr and Ss respectively. You will get partial credit for a correct implementationthat does better than the supplied naive one.SITE-SPECIFIC: As the instructor, you will need to decide on your full credit threshholds Sr andSs and your rules for partial credits. We typically use a linear scale, with about a 40% minimumif students actually tried to solve the lab.7 Hand In InstructionsSITE-SPECIFIC: Insert a paragraph here that tells each team how to hand in their kernels.cfile. For example, here are the handin instructions we use at CMU.When you have completed the lab, you will hand in one file, kernels.c, that contains your solution. Here is howto hand in your solution: Make sure you have included your identifying information in the team struct in kernels.c. Make sure that the rotate() and smooth() functions correspond to your fastest implemnentations, as theseare the only functions that will be tested when we use the driver to grade your assignement. Remove any extraneous print statements. Create a team name of the form: ID where ID is your Andrew ID, if you are working alone, or ID1+ID2 where ID1 is the Andrew ID of the first team member and ID2 is the Andrew ID of thesecond team member.This should be the same as the team name you entered in the structure in kernels.c. To handin your kernels.c file, type:make handin TEAM=teamnamewhere teamname is the team name described above. After the handin, if you discover a mistake and want to submit a revised copy, typemake handin TEAM=teamname VERSION=2Keep incrementing the version number with each submission. You can verify your handin by looking in/afs/cs.cmu.edu/academic/class/15213-f01/L1/handinYou have list and insert permissions in this directory, but no read or write permissions.Good luck!8

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] Lab Assignment L4: Code Optimization
$25