OpenMP: An Implementation of Thread Level Parallelism
aka Real-world Multithreading!
CS402/922 High Performance Computing, 18/01/2022
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/
Previously, on the HPC module
Thread: A small section of code that is run as multiple copies within a single process.
Threads often share code, global data and other resources.
Multiprocessing vs. Multithreading
Allocate 1 thread (or more, if hyperthreading) to each processor core
Parallelism made easy!
OpenMP (shortened to OMP) is a pragma-based multithreading library
The compiler manages the threads
Programmers write specialised directives (pragmas in C/C++, structured comments in Fortran)
Supports FORTRAN, C and C++
Version 1.0 came out for FORTRAN in 1997, with C/C++ following the next year
Version 3.0 (most widely used) came out in 2008
Fork-Join Model
Not to be confused with fork and spoons
Different ways threads can be managed
Thread Pool: A collection of persistent threads that work can be allocated to
Fork-Join: Threads are created (forked) and destroyed (joined) when required
OMP uses the fork-join model
(Diagram: the master thread forks into a team of threads for each parallel region, then joins back to a single thread between regions)
Building programs with OpenMP
OMP has been built into many compilers
Careful! Different compilers require different flags to enable OpenMP
Need to include the OpenMP header file (omp.h)
On Apple Clang, libomp needs to be installed separately (e.g. through Homebrew)
Compiler        OpenMP Flag(s)                   OpenMP Support
GCC             -fopenmp                         GCC 6 onwards supports OMP 4.5; GCC 9 has initial support for OMP 5.0
Clang (LLVM)    -fopenmp                         Fully supports OMP 4.5; working on OMP 5.0 and 5.1
Clang (Apple)   -Xpreprocessor -fopenmp -lomp    See Clang (LLVM)
Intel           -qopenmp                         Intel 17.0 onwards supports OMP 4.5; Intel oneAPI supports part of OMP 5.1
Parallelising Loops
Finally, let's parallelise!
OMP is most often utilised through pragmas
#pragma omp parallel: Creates OMP threads and executes the following region in parallel
#pragma omp parallel for: Specifies a for loop to be run in parallel over all OMP threads

Other pragma commands:
#pragma omp parallel do: The Fortran equivalent of parallel for (for Fortran do loops)
#pragma omp parallel loop: Asserts that the iterations of a loop can be run concurrently, in any order
#pragma omp simd: Indicates a loop can be transformed into a SIMD (vectorised) loop

    int driver1(int N, int* a, int* b, int* c) {
        #pragma omp parallel
        {
            kernel1(N, a, b, c);
        }
    }

    int kernel2(int N, int* a, int* b, int* c) {
        int i;
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
    }

Private variables
Specifies a list of variables that are local to each thread
The variables can be set and reset in different ways:
private: The thread-local copy is not given an initial value
firstprivate: The thread-local copy is initialised to the value the variable had before the region
lastprivate: After the region, the variable takes the value from the logically last iteration
(Diagram: traces of a variable a under private, firstprivate and lastprivate)

So who's taking what thing again?
We can specify how the work is split up between threads (the schedule clause)
The most commonly used ones are:
static: workload is split evenly between threads before compute
dynamic: workload is split into equally sized chunks; threads request chunks when required
guided: same as dynamic, but successive chunks get smaller
Great for load balancing and/or reducing overhead

Syncing OpenMP Threads
Even threads need to coordinate sometimes!
Synchronisation is sometimes required as well
#pragma omp critical: Runs the following code in one thread at a time
#pragma omp atomic: Ensures a memory location is accessed without conflict
Difference between these operations:
atomic has a lower overhead
critical allows for multi-line statements

(Note: despite their names, these examples compute a sum, not a factorial)

    int factorialCritical(int N) {
        int i;
        int resultShared = 0;
        #pragma omp parallel
        {
            int resultLocal = 0;
            #pragma omp for
            for (i = 0; i < N; i++) {
                resultLocal += i;
            }
            #pragma omp critical
            resultShared += resultLocal;
        }
        return resultShared;
    }

    int factorialAtomic(int N) {
        int i;
        int resultShared = 0;
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            #pragma omp atomic
            resultShared += i;
        }
        return resultShared;
    }

Reductions
Making dependencies easier one step at a time!
Allows for the same operation to be applied to the same variable over multiple threads
Often faster than atomic or critical
Limited to a certain number of operations:
Identifier: +, -, *, &, |, ^, &&, ||, or a function/expression such as min, max

    int factorialReduction(int N) {
        int i;
        int resultShared = 0;
        #pragma omp parallel for reduction(+:resultShared)
        for (i = 0; i < N; i++) {
            resultShared += i;
        }
        return resultShared;
    }

OpenMP functions
What's a library without functions!
Some aspects of the OMP environment can be set or retrieved within the program
Key examples include:
omp_get_num_threads(): Gets the number of threads in the current team
omp_get_thread_num(): Gets the ID of the calling thread
omp_set_num_threads(int): Sets the number of threads that can be utilised
omp_get_wtime(): Gets the wall clock time (thread safe)

Environment Variables
Why recompile when we can alter the environment?
Allows us to change key elements without changes to the code
Often used examples:
OMP_NUM_THREADS: The number of threads to be utilised in the program
OMP_SCHEDULE: The ordering with which the threads should iterate through a loop
OMP_PROC_BIND: Controls if and how threads can move between cores

What's next for OpenMP?
Onwards and upwards!
OMP 4.5, 5.0 and 5.1:
Target offload: Specify where the compute should occur (CPU/GPU/accelerator etc.)
Memory management: Specify where the data should be stored and how
New version (5.2) out soon

Interesting related reads
Some of this might even be fun…
International Workshop on OpenMP (IWOMP): https://www.iwomp.org/
OpenMP Quick Reference Guide (Version 5.2): https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf
OpenMP Examples (Version 5.1): https://www.openmp.org/wp-content/uploads/openmp-examples-5.1.pdf

Next lecture: Intro to Coursework 1