- Examine the code given below, which computes the average of an array:
  total = 0;
  for(j=0; j < k; j++) {
    sub_total = 0;
    /* Nested loops to avoid overflow */
    for(i=0; i < N; i++)
      sub_total += A[j*N + i];
    total += sub_total/N;
  }
  average = total/k;
When designing a cache to run this application, given a constant cache capacity and associativity, will you want a larger or smaller block size? Why?
- Examine the MODIFIED code given below:
  total = 0;
  for(i=0; i < N; i++) {
    sub_total = 0;
    /* Nested loops to avoid overflow */
    for(j=0; j < k; j++)
      sub_total += A[j*N + i];
    total += sub_total/k;
  }
  average = total/N;
Generally, how will the size of the array and the cache capacity impact the choice of block size for good performance? Why?
- Translate the following line of code into MIPS. Assume i is in $s0, j is in $s1, the base address of A is in $a0, N is in $a1, and sub_total is in $s2:
  sub_total += A[j*N + i];
- Now consider that we are executing one of these programs for the very first time. Assume a memory system with a TLB, an L1 I-cache, an L1 D-cache, an L2 cache, and a virtual memory system with a 2-level page table. List all the steps that will or may happen as we load instructions or data from memory. Also list the steps taken when the target instructions or data are not in the caches or page tables, i.e., the steps to handle misses.
Load instructions:
- Read instruction: look up the instruction's virtual page number in the TLB. A hit means the page is in memory and gives its physical page number.
- If the virtual page number is not present in the TLB, this is a TLB miss. Check the first-level (L1) page table to see if the virtual page number is present.
Load data:
- List and number all the ADDITIONAL steps that will happen when we are executing your translated code for sub_total += A[j*N + i].
- Suppose the whole program fits into the L1 caches and we are executing it many times. Which steps from (d) will be skipped? (Just write down the step numbers.)