Computer Graphics
OpenMP Reduction Case Study: Trapezoid Integration Example
Mike Bailey
trapezoid.pptx


mjb – March 22, 2021
Find the area under the curve y = sin(x) for 0 ≤ x ≤ π using the Trapezoid Rule
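The code on the following slides depends on the composite trapezoid rule, so it helps to state it once. With N = numSubdivisions sample points, dx = (B − A)/(N − 1), and xᵢ = A + i·dx:

\[
\int_A^B f(x)\,dx \;\approx\; dx \left( \frac{f(A) + f(B)}{2} \;+\; \sum_{i=1}^{N-2} f(x_i) \right)
\]

The two endpoints each count half, which is why the code below seeds sum with ( Function(A) + Function(B) ) / 2. and then loops over only the interior points.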
Exact answer: \( \int_0^{\pi} \sin x \, dx = \big[ -\cos x \big]_0^{\pi} = 2.0 \)
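As a baseline before any threading, here is a minimal single-threaded version assembled from the fragments on these slides. The main( ), the sin-based Function( ), and the value of NUMSUBDIVISIONS are my additions to make it a complete program, not part of the deck:

    #include <cstdio>
    #include <cmath>

    const int NUMSUBDIVISIONS = 1000000;    // assumed; the slides leave this as a variable

    double Function( double x )
    {
        return sin( x );                    // the integrand implied by the slides
    }

    int main( )
    {
        const double A = 0.;
        const double B = M_PI;
        double dx = ( B - A ) / (float)( NUMSUBDIVISIONS - 1 );

        // trapezoid rule: endpoints count half, interior points count fully
        double sum = ( Function( A ) + Function( B ) ) / 2.;
        for( int i = 1; i < NUMSUBDIVISIONS - 1; i++ )
        {
            double x = A + dx * (float)i;
            sum += Function( x );
        }
        sum *= dx;

        printf( "Area = %f\n", sum );       // prints ~2.000000
        return 0;
    }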
Don't do it this way!

• There is no guarantee of when each thread will execute this line.
• There is not even a guarantee that each thread will finish this line before some other thread interrupts it.
    const double A = 0.;
    const double B = M_PI;
    double dx = ( B - A ) / (float)( numSubdivisions - 1 );
    double sum = ( Function( A ) + Function( B ) ) / 2.;
    omp_set_num_threads( numThreads );

    #pragma omp parallel for default(none),shared(dx,sum)
    for( int i = 1; i < numSubdivisions - 1; i++ )
    {
        double x = A + dx * (float)i;
        double f = Function( x );
        sum += f;           // <-- the race is here
    }
    sum *= dx;

The statement sum += f; compiles to three separate instructions:

    Load  sum
    Add   f
    Store sum

What if the scheduler decides to switch threads right in the middle of that sequence? Another thread's Load/Add/Store can interleave with this one's, and one of the two additions is lost.

The answer should be 2.0 exactly, but in 30 trials, it's not even close. And the answers aren't even consistent. How do we fix this?

    0.469635  0.517984  0.438868  0.437553  0.398761  0.506564
    0.489211  0.584810  0.476670  0.530668  0.500062  0.672593
    0.411158  0.408718  0.523448  0.398893  0.446419  0.431204
    0.501783  0.334996  0.484124  0.506362  0.448226  0.434737
    0.444919  0.442432  0.548837  0.363092  0.544778  0.356299
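To watch the race happen yourself, here is a self-contained harness around the slide's loop. The scaffolding (main( ), Function( ), the subdivision count, the thread count) is my assumption, not the deck's; compile with g++ -fopenmp. Each of the 30 trials prints a different wrong answer:

    #include <cstdio>
    #include <cmath>
    #include <omp.h>

    const int    NUMSUBDIVISIONS = 1000000;  // assumed
    const int    NUMTHREADS      = 8;        // assumed
    const int    NUMTRIALS       = 30;
    const double A = 0.;
    const double B = M_PI;

    double Function( double x ) { return sin( x ); }

    int main( )
    {
        double dx = ( B - A ) / (float)( NUMSUBDIVISIONS - 1 );
        omp_set_num_threads( NUMTHREADS );

        for( int t = 0; t < NUMTRIALS; t++ )
        {
            double sum = ( Function( A ) + Function( B ) ) / 2.;

            // deliberately racy: every thread does an unprotected
            // read-modify-write of the shared variable sum
            #pragma omp parallel for default(none),shared(dx,sum)
            for( int i = 1; i < NUMSUBDIVISIONS - 1; i++ )
            {
                double x = A + dx * (float)i;
                sum += Function( x );
            }
            sum *= dx;
            printf( "%f\n", sum );   // should be 2.0; varies wildly instead
        }
        return 0;
    }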
There are Three Ways to Make the Summing Work Correctly

#1: Atomic

    #pragma omp parallel for shared(dx)
    for( int i = 0; i < numSubdivisions; i++ )
    {
        double x = A + dx * (float)i;
        double f = Function( x );
        #pragma omp atomic
        sum += f;
    }

• More lightweight than critical (#2)
• Uses a hardware instruction, CMPXCHG (compare-and-exchange)
• Can only handle these operations:
      x op= expr,  x = x op expr,  x = expr op x
      where op is one of: +, -, *, /, &, |, ^, <<, >>
      plus: x++, ++x, x--, --x

#2: Critical

    #pragma omp parallel for shared(dx)
    for( int i = 0; i < numSubdivisions; i++ )
    {
        double x = A + dx * (float)i;
        double f = Function( x );
        #pragma omp critical
        sum += f;
    }

• More heavyweight than atomic (#1)
• Allows only one thread at a time to enter this block of code (similar to a mutex)
• Can have any operations you want in this block of code

#3: Reduction

    #pragma omp parallel for shared(dx),reduction(+:sum)
    for( int i = 0; i < numSubdivisions; i++ )
    {
        double x = A + dx * (float)i;
        double f = Function( x );
        sum += f;
    }

• OpenMP creates code to make this as fast as possible
• Reduction operators can be: +, -, *, &, |, ^, &&, ||, max, min

[Figure: Speed of Reduction vs. Atomic vs. Critical (up = faster)]

So, do it this way!

    const double A = 0.;
    const double B = M_PI;
    double dx = ( B - A ) / (float)( numSubdivisions - 1 );
    omp_set_num_threads( numThreads );
    double sum = ( Function( A ) + Function( B ) ) / 2.;

    #pragma omp parallel for default(none),shared(dx),reduction(+:sum)
    for( int i = 1; i < numSubdivisions - 1; i++ )
    {
        double x = A + dx * (float)i;
        double f = Function( x );
        sum += f;
    }
    sum *= dx;

Two Reasons Why Reduction is so Much Better in this Case

1. Reduction secretly creates a temporary private variable for each thread's running sum. Each thread adding into its own running sum doesn't interfere with any other thread adding into its own running sum, so threads don't need to slow down to get out of each other's way.

2. Reduction automatically creates a binary tree structure to add the N per-thread running sums in log₂N time instead of N time.

O(N) vs. O(log₂N):
• Serial addition: adding 8 numbers requires 7 steps; adding 1,048,576 (1M) numbers requires 1,048,575 steps.
• Parallel addition: adding 8 numbers requires 3 steps; adding 1,048,576 (1M) numbers requires 20 steps.

Performance: If You Understand NCAA Basketball Brackets, You Understand Power-of-Two Reduction

[Figure: NCAA basketball tournament bracket. Source: ESPN. A bracket pairs teams two at a time each round, just as a binary-tree reduction pairs partial sums.]

Why Not Do Reduction by Creating Your Own sums Array, One for Each Thread, Like This?

    float *sums = new float[ omp_get_num_threads( ) ];
    for( int i = 0; i < omp_get_num_threads( ); i++ )
        sums[ i ] = 0.;

    #pragma omp parallel for private(myPartialSum),shared(sums)
    for( int i = 0; i < N; i++ )
    {
        myPartialSum = …
        sums[ omp_get_thread_num( ) ] += myPartialSum;
    }

    float sum = 0.;
    for( int i = 0; i < omp_get_num_threads( ); i++ )
        sum += sums[ i ];
    delete [ ] sums;

(One caution about this snippet: outside a parallel region, omp_get_num_threads( ) returns 1; omp_get_max_threads( ) is usually what is wanted for sizing the array.)

• This seems perfectly reasonable, it works, and it gets rid of the problem of multiple threads trying to write into the same reduction variable.
• The reason we don't do this is that this method provokes a problem called False Sharing. We will get to that when we discuss caching.
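As a preview of that caching discussion: the usual repair for the per-thread sums array is to pad each partial sum out to its own cache line so that no two threads ever write to the same line. The sketch below is my illustration, not from the slides; the 64-byte line size is a typical x86 value and the helper name is hypothetical. It needs C++17 (so new returns suitably aligned storage) and -fopenmp:

    #include <omp.h>

    const int CACHE_LINE_SIZE = 64;     // assumed typical cache-line size

    struct alignas(CACHE_LINE_SIZE) PaddedSum
    {
        float sum;                      // each partial sum gets a whole line to itself
    };

    float PaddedArraySum( int N )       // hypothetical helper name
    {
        int numThreads = omp_get_max_threads( );   // valid outside a parallel region
        PaddedSum *sums = new PaddedSum[ numThreads ];
        for( int i = 0; i < numThreads; i++ )
            sums[ i ].sum = 0.f;

        #pragma omp parallel for shared(sums)
        for( int i = 0; i < N; i++ )
        {
            float myPartialSum = 1.f;   // placeholder for the real per-iteration work
            sums[ omp_get_thread_num( ) ].sum += myPartialSum;
        }

        float sum = 0.f;
        for( int i = 0; i < numThreads; i++ )
            sum += sums[ i ].sum;
        delete [ ] sums;
        return sum;
    }

Even with the padding, the slides' advice stands: reduction remains the cleaner choice when all you need is one combined sum.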

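Finally, the speed comparison in the figure above can be reproduced with a timing sketch like the following (my scaffolding, not the deck's benchmark; absolute numbers will vary with machine and thread count). Compile with g++ -fopenmp:

    #include <cstdio>
    #include <cmath>
    #include <omp.h>

    const int    NUMSUBDIVISIONS = 1000000;  // assumed
    const int    NUMTHREADS      = 8;        // assumed
    const double A = 0.;
    const double B = M_PI;

    double Function( double x ) { return sin( x ); }

    int main( )
    {
        omp_set_num_threads( NUMTHREADS );
        double dx = ( B - A ) / (float)( NUMSUBDIVISIONS - 1 );
        double sum, t0;

        // #1: atomic
        sum = 0.;
        t0 = omp_get_wtime( );
        #pragma omp parallel for shared(dx,sum)
        for( int i = 0; i < NUMSUBDIVISIONS; i++ )
        {
            double f = Function( A + dx * (float)i );
            #pragma omp atomic
            sum += f;
        }
        printf( "atomic:    %8.2f MegaAdds/sec\n",
                (double)NUMSUBDIVISIONS / ( omp_get_wtime( ) - t0 ) / 1.e6 );

        // #2: critical
        sum = 0.;
        t0 = omp_get_wtime( );
        #pragma omp parallel for shared(dx,sum)
        for( int i = 0; i < NUMSUBDIVISIONS; i++ )
        {
            double f = Function( A + dx * (float)i );
            #pragma omp critical
            sum += f;
        }
        printf( "critical:  %8.2f MegaAdds/sec\n",
                (double)NUMSUBDIVISIONS / ( omp_get_wtime( ) - t0 ) / 1.e6 );

        // #3: reduction
        sum = 0.;
        t0 = omp_get_wtime( );
        #pragma omp parallel for shared(dx) reduction(+:sum)
        for( int i = 0; i < NUMSUBDIVISIONS; i++ )
            sum += Function( A + dx * (float)i );
        printf( "reduction: %8.2f MegaAdds/sec\n",
                (double)NUMSUBDIVISIONS / ( omp_get_wtime( ) - t0 ) / 1.e6 );

        return 0;
    }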
