
COMP90025 Parallel and Multicore Computing

OpenMP

Lachlan Andrew

School of Computing and Information Systems The University of Melbourne


2023 Semester II

Outline

1. OpenMP Framework
  • Concepts
2. Directives and Clauses
  • Execution Model
  • Nested Regions
  • Work Sharing
  • Data Sharing
  • Synchronization
3. NUMA
  • Thread Affinity
  • Memory Allocation
4. Task Model
  • Tasks

OpenMP Framework Concepts

OpenMP Specification

https://www.openmp.org/specifications/

  • These slides draw significantly from the OpenMP 5.1 Specification (Nov 2020). You are encouraged to read the specification as well.

  • The OpenMP Specification defines a parallel programming model that abstracts a single process, multi-threaded program execution on a (single) machine:

    • an abstract view of the machine’s resources, including multi-socket, multi-core processors, threads and thread synchronization, memory hierarchy, SIMD, NUMA and devices, e.g., offloading to GPUs;

    • an API providing high-level parallel programming constructs that consistently encapsulate the machine’s resources, e.g., by transparently making use of the OS threading library, processor SIMD vector registers and instructions, and compiling of target code for supported devices.


  • It is not an implementation. Individual hardware and compiler vendors are required to support and implement the OpenMP Specification. An OpenMP program should run consistently across different machines/compilers; however, some aspects of the specification remain implementation dependent, and not all hardware/compiler implementations support every part of the specification. Similarly, the specification may not cover every kind of machine implementation that exists.

    OpenMP Framework Concepts

    Getting started

    OpenMP is commonly supported on commercial OSes and hardware. Linux distributions support it via the gcc compiler. You can therefore readily develop and test OpenMP programs. On HPC facilities, the OpenMP compiler implementation should be vendor specific and support everything that the machine offers; e.g., using an Intel compiler for Intel processors.

  • If you use the gcc compiler, then gcc-11 is required for OpenMP 5 support.

  • A minimal simple compile line for your OpenMP program might be:

    g++-11.2 -fopenmp prog.cpp -o prog
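
    As a concrete starting point, a minimal prog.cpp that the compile line above would build might look like the following sketch (the output format is just for illustration):

    // prog.cpp -- each thread in the team created by the parallel
    // directive executes the structured block.
    #include <cstdio>
    #include <omp.h>

    int main() {
        #pragma omp parallel
        {
            std::printf("hello from thread %d of %d\n",
                        omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }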

  • You could install some other useful stuff:

    • libnuma – NUMA support functions for C/C++ (may need the -lnuma link option); it lets you find out which NUMA node a thread is running on (see the sketch after this list)


    • hwloc-ls and numactl – utilities to inspect and control NUMA on your OS; they give a definitive overview of processor resources such as RAM, cache, cores, hyperthreads and NUMA characteristics
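
    A minimal sketch of the libnuma usage mentioned above, assuming libnuma and its headers are installed (numa_node_of_cpu() is from libnuma, sched_getcpu() from glibc; the file name is hypothetical):

    // query_numa.cpp -- report which NUMA node each OpenMP thread runs on.
    // Build (assumption): g++-11 -fopenmp query_numa.cpp -o query_numa -lnuma
    #include <cstdio>
    #include <omp.h>
    #include <numa.h>    // libnuma API
    #include <sched.h>   // sched_getcpu()

    int main() {
        if (numa_available() < 0) {      // libnuma reports no NUMA support
            std::printf("NUMA is not available on this system\n");
            return 1;
        }
        #pragma omp parallel
        {
            int cpu  = sched_getcpu();         // logical CPU the thread is on now
            int node = numa_node_of_cpu(cpu);  // NUMA node that owns that CPU
            std::printf("thread %d: CPU %d, NUMA node %d\n",
                        omp_get_thread_num(), cpu, node);
        }
        return 0;
    }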

      OpenMP Framework Concepts

      Runtime Environment Variables

      [Diagram: runtime environment variables such as OMP_NUM_THREADS=32, OMP_THREAD_LIMIT=64 and OMP_PLACES="{0:8},{0:8}" set Internal Control Variables (ICVs), which in turn interact with Compiler Directives and Runtime Library Routines.]

  • Runtime Environment Variables allow you to set the value of OpenMP internal control variables (ICVs) before the program starts executing. E.g., in bash-like shells, export OMP_NUM_THREADS=16 before running your program sets the variable for the session, or, if your program is called myprog, OMP_NUM_THREADS=16 ./myprog sets it for that run only. Some ICVs can only be set this way.

  • Compiler Directives allow specifying parallelism without changing the semantics of the base language. In theory, if the directives were removed the program would produce the same output, sequentially. In practice, there can be differences because parallel computation can introduce artefacts.


  • Runtime Library Routines allow you to inspect and modify ICVs, and to interact with the OpenMP implementation, e.g., to allocate memory with specified properties or to control parallel constructs. A short sketch of these routines follows.
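
    A small sketch of these routines (the file name and printed labels are illustrative only); run it, for example, as OMP_NUM_THREADS=8 ./icv_demo to see the environment variable take effect:

    // icv_demo.cpp -- inspect and modify ICVs through runtime library routines.
    #include <cstdio>
    #include <omp.h>

    int main() {
        // nthreads-var, as set by OMP_NUM_THREADS or the implementation default
        std::printf("max threads (nthreads-var):      %d\n", omp_get_max_threads());
        // thread-limit-var, as set by OMP_THREAD_LIMIT
        std::printf("thread limit (thread-limit-var): %d\n", omp_get_thread_limit());

        omp_set_num_threads(4);   // modify nthreads-var at runtime

        #pragma omp parallel
        {
            #pragma omp single    // only one thread prints the team size
            std::printf("team size inside parallel region: %d\n",
                        omp_get_num_threads());
        }
        return 0;
    }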


    OpenMP Framework Concepts


  • At any point in time, an OpenMP program’s context defines traits that

    OpenMP Framework Concepts

    Synchronization, SIMD and NUMA

  • Synchronization between threads, which would ordinarily be done by the programmer using primitive concurrency-control techniques such as locks and signals, is provided via directives.

    • A thread team consists of a number of threads. Some synchronization operations are limited to the team of threads, i.e., they do not affect other teams, while others affect all teams.

    • A barrier directive marks a point of execution of a program encountered by a team of threads, beyond which no thread in the team may execute until all threads in the team have reached the barrier, and all explicit tasks generated by the team have executed to completion.

    • A flush directive tells a thread to enforce consistency between its view and other threads’ view of memory.

    • A critical directive ensures that only one thread can execute a given structured code block at a time.

    • An atomic directive ensures that memory reads, writes, and updates are done by a thread without interference from other threads. It is faster than critical, but only applies to some simple operations (see the sketch after this list).

  • A simd directive instructs the compiler to make use of SIMD processor (vector) instructions for a loop.

  • Synchronization hints can enable processor optimization techniques like speculative execution to be used.

  • Memory management and thread affinity provide for optimization over NUMA architectures, using an abstract model of placement, memory categories and partitioning.
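
    An illustrative sketch (not from the slides) of the barrier, critical and atomic directives working together; the variable names are arbitrary:

    // sync_demo.cpp -- barrier, critical and atomic in one parallel region.
    #include <cstdio>
    #include <omp.h>

    int main() {
        long counter = 0;   // updated with atomic
        long sum = 0;       // updated inside a critical section

        #pragma omp parallel
        {
            int id = omp_get_thread_num();

            // atomic: lightweight protection for a simple update
            #pragma omp atomic
            counter++;

            // barrier: no thread proceeds until every thread has incremented
            #pragma omp barrier

            // critical: only one thread at a time executes this block
            #pragma omp critical
            {
                sum += id;
                std::printf("thread %d sees counter = %ld\n", id, counter);
            }
        }   // implicit barrier at the end of the parallel region

        std::printf("counter = %ld, sum of thread ids = %ld\n", counter, sum);
        return 0;
    }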

Directives and Clauses Execution Model

Program Execution

  • Understanding an OpenMP program execution is about understanding what happens when a thread of execution encounters a directive.

  • When an OpenMP program starts, the main or primary thread of the program starts executing code that is part of its task.

  • Any thread that encounters the parallel directive creates a parallel region and a thread team in that region, with itself as the primary thread of the team; every thread in the team executes the structured block. An implicit barrier exists at the end of the parallel region.

    #pragma omp parallel [clause[[,] clause]…] new-line
        structured-block

  • num_threads(integer-expression)

    • Overrides the nthreads-var ICV. The actual number of threads created is limited by the thread-limit-var ICV and other conditions.

    • The OMP_NUM_THREADS environment variable sets the nthreads-var and optionally also the max-active-levels-var ICVs. Programmers can use omp_set_num_threads(int num_threads) to set nthreads-var at runtime; omp_get_max_threads() returns its current value, while int omp_get_num_threads() returns the number of threads in the current team.

    • Similarly, OMP_THREAD_LIMIT, OMP_DYNAMIC and OMP_MAX_ACTIVE_LEVELS affect the number of threads created and can be controlled at runtime.

    • For details see

      https://www.openmp.org/spec-html/5.1/openmpsu40.html#x60-600002.6.1

  • private(list) – a list of program variables for which each thread in the team has its own memory allocated


  • shared(list) – a list of program variables that will be shared among the threads in the team (both clauses are used in the sketch below)
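
    A sketch (hypothetical example) combining num_threads, private and shared on a single parallel directive:

    // clauses_demo.cpp -- num_threads, private and shared clauses.
    #include <cstdio>
    #include <omp.h>

    int main() {
        int shared_total = 0;   // one copy, shared by the whole team
        int scratch = -1;       // each thread gets its own uninitialised copy

        #pragma omp parallel num_threads(4) private(scratch) shared(shared_total)
        {
            scratch = omp_get_thread_num();   // private: no race here

            #pragma omp atomic
            shared_total += scratch;          // shared: needs synchronization
        }   // implicit barrier at the end of the parallel region

        std::printf("sum of thread ids: %d\n", shared_total);
        return 0;
    }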

    Directives and Clauses Execution Model

    #pragma omp syntax

    In C/C++:

    #pragma omp directive-name [clause[[,] clause]… ] new-line

    In C++11 and higher with C++ attribute specifiers:

    [[ omp :: directive( directive-name [[,] clause[[,] clause]… ] ) ]]

    or

    [[ using omp : directive( directive-name [[,] clause[[,] clause]… ] ) ]]

  • Only one directive-name can be specified per directive, but multiple directives can be applied to one following block.

  • The order in which clauses appear on directives is not significant. Clauses on directives may be repeated as needed, subject to the restrictions listed in the description of each clause or the directives on which they can appear.


  • Some directives are stand-alone, in that they instruct the runtime to do something, and some directives must be followed by a structured block or structured-block sequence.
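
    To make the two syntax forms above concrete, here is a minimal sketch (the attribute form requires a compiler with OpenMP 5.x attribute-syntax support; the file name is illustrative):

    // syntax_demo.cpp -- the same directive written two equivalent ways.
    #include <cstdio>
    #include <omp.h>

    int main() {
        // Classic pragma form: directive-name "parallel" with one clause.
        #pragma omp parallel num_threads(2)
        { std::printf("pragma form: thread %d\n", omp_get_thread_num()); }

        // C++11 attribute form of the same directive.
        [[omp::directive(parallel num_threads(2))]]
        { std::printf("attribute form: thread %d\n", omp_get_thread_num()); }

        return 0;
    }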
