High Performance Computing
Dr Ligang He
Guest lecture
Design of a Low-Level Interconnection Network
Dean Chester
Principle and benefits of parallel I/O
Implementation of parallel I/O (in MPI)
High Performance Parallel I/O
Why are we looking at parallel I/O?
p I/O is a major bottleneck in some parallel applications
Processing sensor data in earth science
Biological sequence analysis in computational biology
Parallel I/O version 1.0
Assume 4 processes compute the elements in a matrix in parallel and the results need to be written into the disk.
Early solutions:
All processes send data to process 0, which then writes to a file
Parallel I/O
Bad things about version 1.0
1. Single node bottleneck
2. Single point of failure
3. Poor performance
4. Poor scalability
Good things about version 1.0
1. The I/O system only needs to deal with I/O from one process
2. Does not need a specialized I/O library
3. Results in a single file, which is easy to manage
Parallel I/O
Each process writes to a separate file
version 2.0
Good things about version 2.0
1. Now we are doing things in parallel
2. High performance
Bad things about version 2.0
1. We now have lots of small files to manage
2. How do we read the data back when the number of processes changes?
3. Does not interoperate well with other applications
Parallel I/O
Multiple processes of a parallel program access (read/write) data from a common file
version 3.0
Parallel I/O
Good things about version 3.0
p Simultaneous I/O from any number of processes
p Excellent performance and scalability
p Results in a single file which is easy to manage and interoperates well with other applications
p Maps well onto collective operations
Bad things about version 3.0
p Requires more complex I/O library support
p Traditionally, when one process is accessing a file, it locks the file and other processes cannot access it
p Needs the support of simultaneous access by multiple processes
What is Parallel I/O?
Multiple processes of a parallel program accessing (reading or writing) different parts of a common file at the same time
Parallel I/O has been an integral part of MPI since MPI-2
I/O optimization: Data Sieving Reads
Data sieving is used to combine lots of small accesses into a single larger one
p Reducing the number of I/O operations
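A plain-C sketch of the read side of data sieving — the request list, buffer sizes and helper names are assumptions for illustration, and an in-memory array stands in for the file:

```c
#include <assert.h>
#include <string.h>

typedef struct { int offset, len; } req_t;   /* one small read request */

/* Data sieving for reads: instead of one I/O operation per small request,
 * read the single region covering all requests once, then copy ("sieve")
 * the wanted pieces out of the staging buffer.  Returns the number of
 * underlying "I/O operations" performed -- always 1 here. */
int sieve_read(const int *file, const req_t *reqs, int nreqs, int *out)
{
    int lo = reqs[0].offset, hi = reqs[0].offset + reqs[0].len;
    for (int i = 1; i < nreqs; i++) {        /* find the covering region */
        if (reqs[i].offset < lo) lo = reqs[i].offset;
        if (reqs[i].offset + reqs[i].len > hi) hi = reqs[i].offset + reqs[i].len;
    }
    int stage[256];                          /* staging buffer (assumed big enough) */
    memcpy(stage, &file[lo], (hi - lo) * sizeof(int));   /* the one large read */
    int n = 0;
    for (int i = 0; i < nreqs; i++) {        /* sieve: keep only requested data */
        memcpy(&out[n], &stage[reqs[i].offset - lo], reqs[i].len * sizeof(int));
        n += reqs[i].len;
    }
    return 1;
}
```

The trade-off is also visible here: the staging buffer reads data between the requests that nobody asked for, spending memory and bandwidth to save I/O operations.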
I/O optimization: Data Sieving Writes
Using data sieving for writes is more complicated
p Read the entire region first
p Then make the changes
p Then write the block back
Requires locking in the file system
p Can result in false sharing
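The read-modify-write cycle above can be sketched in plain C; this is a simulation with assumed names and an in-memory "file" — a real file system would have to lock the covering region between the read and the write-back:

```c
#include <assert.h>
#include <string.h>

typedef struct { int offset, len; const int *data; } wreq_t;  /* one small write */

/* Write-side data sieving (read-modify-write): read the region covering
 * all small writes, apply the changes in memory, then write the whole
 * block back.  Locking this region while it is held is what can lead to
 * false sharing between processes. */
void sieve_write(int *file, const wreq_t *reqs, int nreqs)
{
    int lo = reqs[0].offset, hi = reqs[0].offset + reqs[0].len;
    for (int i = 1; i < nreqs; i++) {        /* find the covering region */
        if (reqs[i].offset < lo) lo = reqs[i].offset;
        if (reqs[i].offset + reqs[i].len > hi) hi = reqs[i].offset + reqs[i].len;
    }
    int stage[256];                                      /* staging buffer */
    memcpy(stage, &file[lo], (hi - lo) * sizeof(int));   /* 1: read the region */
    for (int i = 0; i < nreqs; i++)                      /* 2: make the changes */
        memcpy(&stage[reqs[i].offset - lo], reqs[i].data,
               reqs[i].len * sizeof(int));
    memcpy(&file[lo], stage, (hi - lo) * sizeof(int));   /* 3: write it back */
}
```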
I/O optimization: Collective I/O
Problems with independent, noncontiguous access
p Lots of small accesses (9 separate accesses in this example)
p Collective operations
p Underlying I/O layers know what data are being requested by each process
p First phase reads the entire block
p Second phase moves data to final destinations
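The two phases can be sketched in plain C. This is a simulation, not ROMIO's implementation: arrays stand in for messages, the sizes are illustrative assumptions, and the access pattern is the simple case where each process wants every NPROCS-th element of the file:

```c
#include <assert.h>

#define NPROCS 4   /* assumed: 4 processes                       */
#define N      16  /* assumed: each process wants every 4th item */

/* Two-phase collective read sketch.  Phase 1: each process reads one
 * large contiguous block (a single big I/O operation each).  Phase 2:
 * data is shuffled in memory so each process ends up with its own
 * non-contiguous, interleaved elements. */
void two_phase_read(const int *file, int local[NPROCS][N / NPROCS])
{
    int blocks[NPROCS][N / NPROCS];
    for (int p = 0; p < NPROCS; p++)              /* phase 1: contiguous reads */
        for (int j = 0; j < N / NPROCS; j++)
            blocks[p][j] = file[p * (N / NPROCS) + j];
    for (int p = 0; p < NPROCS; p++)              /* phase 2: redistribute */
        for (int j = 0; j < N / NPROCS; j++) {
            int g = p + j * NPROCS;               /* global index wanted by p */
            local[p][j] = blocks[g / (N / NPROCS)][g % (N / NPROCS)];
        }
}
```

The point of the design is that phase 1 issues only NPROCS large contiguous accesses instead of N small scattered ones; the scatter is handled by cheap in-memory communication in phase 2.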
I/O optimization: Collective I/O
Collective I/O is coordinated access to storage by a group of processes
p Collective I/O functions must be called by all processes participating in I/O at the same time
p Allows I/O layers to know more as a whole about the data to be accessed
Parallel I/O example
p Consider a 16×16 array stored on disk in row-major order
p Each of 16 processes accesses a 4×4 subarray
Access pattern 1:
MPI_File_seek
Updates the individual file pointer
int MPI_File_seek( MPI_File mpi_fh, MPI_Offset offset, int whence );
mpi_fh : [in] file handle (handle)
offset : [in] file offset (integer)
whence : [in] update mode (state)
MPI_File_seek updates the individual file pointer according to offset and whence, which has the following possible values:
MPI_SEEK_SET: the pointer is set to offset
MPI_SEEK_CUR: the pointer is set to the current pointer position plus offset
MPI_SEEK_END: the pointer is set to the end of file plus offset
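The three update modes amount to a small piece of pointer arithmetic; this plain-C sketch mirrors the semantics with stand-in names (WH_SET etc. are hypothetical substitutes for the MPI_SEEK_* constants, and offsets are in units of etypes):

```c
#include <assert.h>

typedef enum { WH_SET, WH_CUR, WH_END } whence_t;  /* stand-ins for MPI_SEEK_* */

/* How MPI_File_seek updates the individual file pointer: cur is the
 * current pointer position and end is the end-of-file position. */
long seek_update(long cur, long end, long offset, whence_t whence)
{
    switch (whence) {
    case WH_SET: return offset;         /* MPI_SEEK_SET: set to offset        */
    case WH_CUR: return cur + offset;   /* MPI_SEEK_CUR: current plus offset  */
    case WH_END: return end + offset;   /* MPI_SEEK_END: end of file + offset */
    }
    return -1;                          /* unreachable for valid whence       */
}
```

Note that a negative offset is legal with MPI_SEEK_CUR and MPI_SEEK_END, which is how one seeks backwards from the current position or the end of the file.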
Access pattern 1: MPI_File_read
Read using individual file pointer
int MPI_File_read( MPI_File mpi_fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status );
mpi_fh: [in] file handle (handle)
buf : [out] initial address of buffer (where the data is to be put)
count : [in] number of elements in buffer (nonnegative integer)
datatype : [in] datatype of each buffer element (handle)
status: [out] status object (Status)
Access pattern 1
p One independent read request is done for each row in the local array
MPI_File_open(… , filename, … , &fh)
for (i = 0; i < n_local_rows; i++) {
   … set offset …
   MPI_File_seek(fh, offset, …)
   MPI_File_read(fh, row[i], 4, …)
}
MPI_File_close(&fh)
64 independent I/O operations
p Individual file pointers per process per file handle
p Each process sets the file pointer with some suitable offset
p The data is then read into the local array
p This is not a collective operation
Access pattern 2: MPI_File_read_all
MPI_File_read_all is a collective version of MPI_File_read
int MPI_File_read_all( MPI_File mpi_fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status );
mpi_fh : [in] file handle (handle)
buf : [out] initial address of buffer (choice)
count : [in] number of elements in buffer (nonnegative integer)
datatype : [in] datatype of each buffer element (handle)
status : [out] status object (Status)
Access pattern 2
p Similar to access pattern 1 but using collectives
p All processes that opened the file will read data together (each with its own access information)
MPI_File_open(… , filename, … , &fh)
for (i = 0; i < n_local_rows; i++) {
   … set offset …
   MPI_File_seek(fh, offset, …)
   MPI_File_read_all(fh, row[i], …)
}
MPI_File_close(&fh)
16 I/O operations
p read_all is a collective version of the read operation
p This is a blocking read
p Each process accesses the file at the same time
p This may be useful, as independent I/O operations do not convey what other processes are doing at the same time
Access pattern 3: Definitions
p File view
p A view is the set of data visible to a process in a file, defined by a displacement, an etype and a filetype
p Displacement
p Defines the location where a view begins
p Position relative to the beginning of the file, expressed as a multiple of etypes
p etype (elementary datatype)
p Unit of data access and positioning
p Can be a predefined or derived datatype
p Filetype
p Defines a template/pattern in a file accessible by a process
p A view is a repeated pattern defined by the filetype (in units of etypes), beginning at the displacement
p Can construct a derived datatype for the filetype; in this case, use MPI_Type_vector(count, blocklen, stride, oldtype, newtype)
Access pattern 3: complementary views of multiple processes
p A group of processes uses complementary views (proc. 0, proc. 1 and proc. 2 filetypes, repeated from the displacement) to achieve a global data distribution
p Different processes can have different views
p This partitions a file among the parallel processes
MPI_File_set_view
Describes the part of the file accessed by an MPI process.
int MPI_File_set_view( MPI_File mpi_fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info );
mpi_fh : [in] file handle (handle)
disp : [in] displacement (nonnegative integer)
etype : [in] elementary datatype (handle)
filetype : [in] filetype (handle)
datarep : [in] data representation (string)
info : [in] info object (handle)
MPI_Type_create_subarray
Create a datatype for a subarray of a regular, multidimensional array
int MPI_Type_create_subarray( int ndims, int array_of_sizes[], int array_of_subsizes[], int array_of_starts[], int order, MPI_Datatype oldtype, MPI_Datatype *newtype );
Parameters
ndims : [in] number of array dimensions (positive integer)
array_of_sizes : [in] number of elements of type oldtype in each dimension of the full array (array of positive integers)
array_of_subsizes : [in] number of elements of type oldtype in each dimension of the subarray (array of positive integers)
array_of_starts : [in] starting coordinates of the subarray in each dimension (array of nonnegative integers)
order : [in] array storage order flag (state)
oldtype : [in] array element datatype (handle)
newtype : [out] new datatype (handle)
Example of using the Subarray Datatype
gsizes[0] = 16;  /* no. of rows in global array */
gsizes[1] = 16;  /* no. of columns in global array */
psizes[0] = 4;   /* no. of procs. in vertical dimension */
psizes[1] = 4;   /* no. of procs. in horizontal dimension */
lsizes[0] = 16 / psizes[0];  /* no. of rows in local array */
lsizes[1] = 16 / psizes[1];  /* no. of columns in local array */
dims[0] = 4; dims[1] = 4;
periods[0] = periods[1] = 1;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm);
MPI_Comm_rank(comm, &rank);
MPI_Cart_coords(comm, rank, 2, coords);
/* global indices of first element of local array */
start_indices[0] = coords[0] * lsizes[0];
start_indices[1] = coords[1] * lsizes[1];
MPI_Type_create_subarray(2, gsizes, lsizes, start_indices, MPI_ORDER_C, MPI_FLOAT, &filetype);
MPI_Type_commit(&filetype);
Cartesian Topology
Naming the processes in a communicator using Cartesian coordinates
int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
Access pattern 3
q Each process creates a derived datatype to describe its non-contiguous access pattern
q We thus have a file view and independent access
q Creates a datatype describing a subarray of a multi-dimensional array
q Commits the datatype
q Opens the file as before
MPI_Type_create_subarray(… , &subarray, …)
MPI_Type_commit(&subarray)
MPI_File_open(… , filename, … , &fh)
MPI_File_set_view(fh, disp, MPI_INT, subarray, …)
MPI_File_read(fh, local_array, 1, subarray, …)
MPI_File_close(&fh)
16 independent requests; each request contains 4 non-contiguous accesses
q Now changes the process's view of the data in the file using set_view
q set_view is collective, although the reads are still independent
Access pattern 4
q Each process creates and commits a derived datatype as before, giving a file view, then reads collectively
MPI_Type_create_subarray(… , &subarray, …)
MPI_Type_commit(&subarray)
MPI_File_open(… , filename, … , &fh)
MPI_File_set_view(fh, … , subarray, …)
MPI_File_read_all(fh, local_array, 1, subarray, …)
MPI_File_close(&fh)
A single collective read
q Now changes the processes' view of the data in the file using set_view
q set_view is collective
q Reads are now collective
Access patterns
q We discussed four different styles of parallel I/O
q You should choose your access pattern depending on the application
q Combine multiple small I/O requests into a bigger request
q Collectives are going to do better than individual reads
q Pattern 4 offers (potentially) the best performance
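The 16×16 example lends itself to a quick sanity check in plain C, independent of MPI. This sketch (helper names are assumptions) computes where each row of a process's 4×4 block sits in the row-major file, and verifies that the 16 complementary views tile the file exactly once:

```c
#include <assert.h>
#include <string.h>

/* 16x16 global array, 4x4 process grid, each process owns a 4x4 block.
 * Offset (in etypes) of row i of the local array of the process at grid
 * coordinates (c0, c1) -- the offset pattern 1 computes before each seek. */
int row_offset(int c0, int c1, int i)
{
    return (c0 * 4 + i) * 16 + c1 * 4;
}

/* Check that the 16 complementary views cover the 256-element file
 * exactly once, i.e. the views really partition the file. */
int views_tile_file(void)
{
    int seen[256];
    memset(seen, 0, sizeof seen);
    for (int c0 = 0; c0 < 4; c0++)
        for (int c1 = 0; c1 < 4; c1++)
            for (int i = 0; i < 4; i++)          /* each local row   */
                for (int j = 0; j < 4; j++)      /* each element     */
                    seen[row_offset(c0, c1, i) + j]++;
    for (int k = 0; k < 256; k++)
        if (seen[k] != 1) return 0;              /* gap or overlap   */
    return 1;
}
```

This also makes the 64-operation count of pattern 1 concrete: 16 processes × 4 rows, one seek-and-read per row, versus one collective call per process in pattern 4.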