High Performance Computing
Course Notes
Coursework 2
Dr Ligang He
Problem Domain
Coursework taken from the field of Computational Fluid Dynamics (CFD)
Fluid dynamics based on three fundamental principles: (i) mass is conserved; (ii) Newtons second law ;(iii) energy is conserved
Expressed as partial differential equations, showing how velocity and pressure are related, etc. (called governing equations).
Computer Science, University of Warwick
2
Governing Equations
Computer Science, University of Warwick
3
Problem Domain
Coursework taken from the field of Computational Fluid Dynamics (CFD)
Fluid dynamics based on three fundamental principles: (i) mass is conserved; (ii) Newtons second law ;(iii) energy is conserved
Expressed as partial differential equations, showing how velocity and pressure are related, etc. (called governing equations).
The coordinates and time are independent variables while velocity and pressure are dependent variables
Computational Fluid Dynamics is the science of finding the numerical solution to the governing equations of fluid flow, over the discretized space or time
Computer Science, University of Warwick
4
CFD code
The code in the coursework, called Karman, calculates the velocity and pressure of a 2D flow
The code writes the solution values into a binary file Currently the code is sequential
The purpose of the coursework is to parallelize the code
Computer Science, University of Warwick
5
Data and stencil
The area represented as a 2D Grid (discretize) Calculate one point in each cell
Computer Science, University of Warwick
6
CFD code
jmax
There are two types of cell: fluid (C_F) and obstacle (C_B)
Computer Science, University of Warwick
7
Data and stencil
The properties of a cell in the grid
Communication pattern: using a five-point stencil to
calculate a point
Computer Science, University of Warwick
8
Numerical method for solving the governing equations
Initialize each cell value
Check if the solution satisfies the governing equations
If not, generate the new solution based on a stencil of current solutions from neighbouring cells
Iterations are advanced until the termination condition is met
Computer Science, University of Warwick
9
General Steps for using the numerical method to solve the linear equations
Aim: solve A=B
Step 1: Guess a initial solution 0
Step 2: If the convergence is reached by checking the residual B-Ai
Key: when we repeat iterative steps, each step generates a better solution
Computer Science, University of Warwick
10
Successive Over-Relaxation(SOR)
SOR is a method to generate new solutions: it can speed up convergence
For a set of linear equations: A=B
let A=D+U+L, where D, L and U denote the diagonal, strictly lower
triangular, and strictly upper triangular parts of A, respectively
The successive over-relaxation (SOR) iteration is defined by the following
Where values of > 1 are used to speedup convergence of a slow-converging process, while values of < 1 are often help establish convergence of diverging iterative process Computer Science, University of Warwick11SORSuccessive Over-Relaxation (SOR) method is used to speed up convergenceComputer Science, University of Warwick12SOR SOR is originally written so that p_new(i, j) depends on1) a stencil of current solutions from neighbouring cells,2) p_new(i-1, j), 3) p_new(i, j-1).for all i imax, j jmax, do pij(k+1)=f(P(k), pi-1,j(k+1), pi,j-1(k+1))P(k) is the solution at the iteration k (P is a matrix [pij]) This introduces a chain ofdependencies for each cell Fine on sequential execution: just calculate values from left to right, and top to bottomComputer Science, University of Warwick13SOR SOR is originally written so that calculating p_new(i, j) depends on1) a stencil of current solutions from neighbouring cells,2) p_new(i-1, j), 3) p_new(i, j-1).for all i imax, j jmax, do pij(k+1)=f(P(k), pi-1,j(k+1), pi,j-1(k+1))P(k) is the solution at the iteration k (P is a matrix [pij]) This introduces a chain ofdependencies for each cell Fine on sequential machines: just calculate values from left to right, and top to bottomComputer Science, University of Warwick14SOR SOR is originally written so that calculating p_new(i, j) depends on1) a stencil of current solutions from neighbouring cells,2) p_new(i-1, j), 3) p_new(i, j-1).for all i imax, j jmax, do pij(k+1)=f(P(k), pi-1,j(k+1), pi,j-1(k+1))P(k) is the solution at the iteration k (P is a matrix [pij]) This introduces a chain ofdependencies for each cell Fine on sequential machines: just calculate values from left to right, and top to bottom Computer Science, University of Warwick15SOR SOR is originally written so that calculating p_new(i, j) depends on1) a stencil of current solutions from neighbouring cells,2) p_new(i-1, j), 3) p_new(i, j-1).for all i imax, j jmax, do pij(k+1)=f(P(k), pi-1,j(k+1), pi,j-1(k+1))P(k) is the solution at the iteration k (P is a matrix [pij]) This introduces a chain ofdependencies for each cell Fine on sequential machines: just calculate values from left to right, and top to bottom Computer Science, University of Warwick16SOR This approach forms a wave front It does not make for a good parallel solution The fundamental reasons areo A chain of dependencieso The number of cells to be computed varies in each step (unbalanced workload) Computer Science, University of Warwick17Red/Black SORLess dependencyWorkload can be balanced among the processors Another solution that achieves the same result is red/black ordering Rewrite the SOR equations so the cells are divided into two populations. New red values are calculated first (using current black values)Red cells:pij(k+1)=f(pi-1,j(k), pi,j-1(k), pi+1,j(k) , pi,j+1(k)) Then black is calculated using the just updated red values.Black cells:pij(k+1)=f(pi-1,j(k+1), pi,j-1(k+1), pi+1,j(k+1), pi,j+1(k+1)) Note: pi-1,j(k+1), pi,j-1(k+1), … are red cells All red cells can be calculated in parallel, Then all black cells can be calculated inparallel Computer Science, University of Warwick18Decomposition of a grid of cells Make each process responsible for updating a block of cells A process must send the cells at the edge of its domain to its neighbours, and receive a copy of the edge cells from its neighbours You can think whether you implement 1D or 2D decompositiono 1D is OK for this courseworkProcess 1Process 2Computer Science, University of Warwick19The main loop for (t = 0.0; t< t_end; t = t + delt){1.Calculate an approximate time-step size by seeing how much movement occurred in the last time-step. The discrete approximation is only stable when the maximum motion < 1 cell per time-step.2.For each cell, compute a tentative new velocity (f,g) based on the previous (u,v) values. It takes as input the u, v and flag matrices, and updates the f and g matrices.3.For each cell calculate the RHS of the pressure equation. This uses two f and g values. It takes as input the f, g, and flag matrices and updates the RHS matrix.4.For the entire pressure matrix, use Red/Black SOR to solve the Poisson equation. This takes a large number iterations of the Red/Black process as shown in the slides. It takes as input the current pressure matrix, flag matrix, and the RHS matrix and outputs a new pressure matrix Computer Science, University of Warwick20The main loop 5. For each cell, update the real (u,v) values based on the pressure matrix and the tentative velocity values (f,g). It takes as input the pressure, f, g and flag matrices and updates the u and v matrices.6. For each cell that is adjacent to an edge cell, the (u,v) values of the boundary cells are updated (by taking values from their neighbours). It takes as input the u, v and flag matrices and updates the u and v matrices} Note that it is not going to be worth parallelizing the whole program, just parts of it. Computer Science, University of Warwick21Coursework 2 What to Do in General- You will be given a serial program, called Karman- You need to parallelize the Karman code and write a report- 80% of the full mark comes from the parallelization using MPI- 20% of the mark comes from the parallelization using both MPI and OpenMP (a hybrid approach), i.e., parallelizing the computations in one machine using OpenMP, while parallelizing the computations across machines using MPIComputer Science, University of Warwick22Coursework 2 Detailed Requirements for Programming Parallelize the Karman code, using a pure MPI approach (80% of marks) and a hybrid MPI-OpenMP approach (20% marks) Profiling which Karman functions are more time consuming Design your decomposition strategy, date exchange strategies between neighboring partitions, the MPI functions (e.g., collective operations) For a hybrid approach, add OpenMP directives Computer Science, University of Warwick23Coursework 2 Detailed Requirements for Programming Benchmarking the execution time (e.g., using MPI_Wtime function) of your parallel code as youincrease the number of processesand/or change the problem size if it helps you observe the trendIf you develop a hybrid MPI-OpenMP implementation,benchmark the runtime of the hybrid parallel versionCompare the performance between the hybrid approach and the pure MPI approachEnsure the simulation results are the same after the parallelizationContain adequate comments as good programmingpractice. Computer Science, University of Warwick24Assignment 2 Requirements for Report Profiling the functions to determine which functions are more time consuming; discuss the profiling resultsDiscuss your MPI implementation, for example,your decomposition strategy and load balancing strategy;your strategy of exchanging the boundary data between neighboring partitions;MPI functions such as collective operations you used;Benchmark the change in execution times of your parallel program as you increase the number of processes and/or problem size; present the results in graph or tableAnalyze and discuss the performance of your parallelprogram Computer Science, University of Warwick25Assignment 2 Requirements for Report If you develop the hybrid MPI-OpenMP implementation,Discuss OpenMP implementationBenchmark the runtime of the hybrid parallel code and present the resultsDiscuss the performance comparison between the hybrid approach and the pure MPI approachWriting skills: pay attention to spelling, punctuation and grammarUp to 6 A4 pages (not a strict up limit) Computer Science, University of Warwick26Assignment 2 Submission -Submit a compressed folder (.zip) to Tabula; the folder should containyour parallel code (the directory structure used in the project should not change)your report (pdf format and place your report in the top level directory in the folder)Deadline: 12 noon, March 18th, 2021– – Computer Science, University of Warwick27
Reviews
There are no reviews yet.