[SOLVED] R C algorithm math operating system computer architecture graph software network cuda GPU OpenCL C N 4 3 -1 2 5 8 / T P 4 1 4 2 0 1 9 4 ISSN 1007-130X Computer Enginering & Science Vol.41No.4Apr.2019

Whatsapp Us

[SOLVED] R C algorithm math operating system computer architecture graph software network cuda GPU OpenCL C N 4 3 -1 2 5 8 / T P 4 1 4 2 0 1 9 4 ISSN 1007-130X Computer Enginering & Science Vol.41No.4Apr.2019

Name: [SOLVED] R C algorithm math operating system computer architecture graph software network cuda GPU OpenCL C N 4 3 -1 2 5 8 / T P 4 1 4 2 0 1 9 4 ISSN 1007-130X Computer Enginering & Science Vol.41No.4Apr.2019
Brand: Assignment Chef
SKU: 8618712291
Price: 25 USD
Availability: InStock
Rating: 5 (1 reviews)

$25

File Name: R_C_algorithm_math_operating_system_computer_architecture_graph_software_network_cuda_GPU_OpenCL_C_N__4_3__1_2_5_8___T_P__________________4_1___4__2_0_1_9__4____ISSN_1007_130X_Computer_Enginering___Science_Vol_41No_4Apr_2019.zip
File Size: 2110.08 KB

SKU: 8618712291 Category: Programming Tags: AI, algorithm, Android, ARM, C, case study, compiler, Computer Architecture, concurrency, data mining, data science, data structure, database, decision tree, deep learning, distributed system, ER, file system, finance, GPU, GUI, Haskell, interpreter, Java, Javascript, kernel, Matlab, MIPS, PROLOG, Python, Scheme, SQL, x86

Description
Reviews (0)

5/5 - (1 vote)

C N4 3 -1 2 5 8 / T P 4 1 4 2 0 1 9 4 ISSN 1007-130X Computer Enginering & Science Vol.41No.4Apr.2019
:1007-130X(2019)04-0590-08
SpMV *
Akrem Benatia ( 100081)
:SpMV SpMV CPU GPU CPU GPU SpMV
CPU-GPU SpMV 2:Quasi-di- agonal Tetris SVR 2 66% GPU
Quasi-diagonal Tetris1.742.15 : ; ; ;S V R ;S p M V
:O246 :A
doi:10.3969/j.isn.1007-130X.2019.04.003 An SpMV partitioning and optimization algorithm
on heterogeneous computing platforms TAN Zhao-nianJI Wei-xingAKREM BenatiaGAO Jian-huaLI An-minWANG Yi-zhuo
(School of Computer Science and TechnologyBeijing Institute of TechnologyBeijing 100081China)
Abstract:Sparse matrix vector multiplication SpMV is widely used in solving scientific computing and enginering problems.The distribution of non-zero elements of the sparse matrix can greatly afect the computational eficiency of SpMVand significant performance improvements can be achieved by u- sing specific algorithms for diferent data distribution paterns.CPU has strong controlability which en- ables its application in general-purpose computing while GPU has high degre of paralelism and many coresthus suitable for data-intensive computing.Taking advantage of the two can gain greater perform- ance for SpMV.We study the task partitioning and optimization methods of SpMV on CPU-GPU hybrid architecture and propose a task-based rescheduling algorithm based on SVR for two main data distribu- tion paterns:Quasi-diagonal and Tetris.The two representative sparse matrix paterns acount for 66% of the data in practical scientific and enginering applications.Experimental results show that our approach can achieve an average spedup of 1.74xand 2.15xfor Quasi-diagonal and Tetris on heteroge- neous platforms over on GPU respectively.
Key words:heterogeneous computation;matrix partitioning;colaborative optimization;SVR;SpMV
* :2018-10-17; :2018-12-20 :100081
Addres:School of Computer Science and TechnologyBeijing Institute of TechnologyBeijing 100081P.R.China

: SpMV
591
1
SpM V (Sparse Matrix Vec- tor multiplication) SpMV[1]
CPU-FPGACPU- GPU CPU-GPU [2]
CPU ;
GPU

SpMV [3]
Sp M V Maringanti [4] CSR(Compresed Sparse
Row) ELL(ELLpack-itpack) :CSR 32 CSRELL
Su [5 ] Cocktail M o n a k o v [6 ] BCOO(Blocked COOrdinate) BCSR(Blocked Compres sed Sparse Row ) M uralidharan [7 ] GPU [8][9] 2:(1) ;(2) CPUGPU[10]

GPU CPU
SVR (Support Vector Regres sion)[11]
SpMV SVR
2
CSRELL DIA (DIAgonal) 1
Figure 1 CSRELL and DIA format 1 C S R E L L D I A
CSR 3 1 ELL ELL DIA DIA
COO (COOrdi- nate)CSRELL DIA
CSR[12]Merge-based CSR[13]Jagged-DIA[14] BSR (Blocket Sparse R o w ) [1 5 ]
SpMV y = Ax A MN xN y
M

592 Computer Engine ering & Science 201941(4) 3 SpMV
3 .1
SpMV
SuiteSparse ( )[16 ]
2:Quasi-diagonal Tet- ris 2 SuiteSparse 23.0% 43.1%
SpMV GPU
GPU CPU Cache
2: 2 GPU CPU GPU CPU (S V R ) S p M V CPU
GPU
2 SpMV SVR
3 .2 S p M V
(1 )Q uasi-diagonal
Quasi-diagonal 2diagGT01R lock1074 Quasi-diagonal
Figure 2 Examples of Quasi-diagonal matrix 2 Q uasi-diagonal
Quasi-diagonal 3 WIDTH 2:Q1 Q2 Q1 DIA Q2 CSRSpMVQ1
G P U Q2
C P U [8 ]
Figure 3 An example of Tetris matrix 3 T etris
(2)Tetris
4 bratu 3dbig geom Tetris
Figure 4 Schematic diagram of Quasi-diagonal partitioning 4 Q uasi-diagonal
Tetris5 K 2:T1 T2 CPU GPU K 80%
GPU20%CPUT1 ELLT2 CSR SpMVT1 GPUT2
CPU
3.3 SpMV
(1)
SpMV [17]

: SpMV
593
Figure 5 Schematic diagram of Tetris partitioning
5 T etris A B
6
S p M V G P U C P U2 t0 GPU 5C1 CPU 3C2t2 GPU CPU C2
SpMVt3-t0 GPU CPU
GPU
Figure 6 Task execution without coperative optimization 6
Figure 7 Task execution after coperative optimization 7
SpMV1 :
(1) SpMV Q1 Q2 (T1 T2 );
(2)SVR:MGPU MCPU ; (3 )
Q1 Q2 (
T1 T1 )
1 SpMV
:Q 1 Q 2 M G P U M C P U :Q1 Q2
01:TGPU TGPU =0;
02:do
0 3 : T G P U = M G P U (Q 1 ) ;
0 4 : T C P U = M C P U (Q 2 ) ;
05: if|TGPU -TCPU|<06: break;07: else if TGPU >TCPU
08:RQ1 nnz(Q1); 09: Q2 Q2 R;
10: Q1 Q1 R;
11: else
12:RQ2 nnz(Q2); 13: Q1 Q1 R;
14: Q2 Q2 R;
15: endif
16:while(1)
17:Q1 Q1 Q2 Q2 ;
7
GPUCPU
CPU
C2C2 (01)
GPU C1 C2t2 GPUCPU SpMV
t2 -t0 t3 -t2
GPU CPU GPU CPU SpMV
1nz(Q1)Q1

594 Computer Engine ering & Science 201941(4)
(2)SVRSpMV[18]
SVR GPU CPU SpMV :
1 Q T ;
2(1) 80 % 20 %
3 SpMV SVR ;
4 SVR 4;
Intel MKL(Math Kernel Library)[20] CPU SpMV
5 Table 1 Data set information
1
Quasi-diagonalQ629TetrisT1 165

QDIA QCSR TELLTCSR
4 .2
GFLOPS10
SpMV GFLOPS
Sp M V Sp M V SpMV SpMV SpMV
34

/GB 1112
/M Hz 1 582 2.13
Table 2 Experimental equipment 2
GPUNVIDIA GTX 1080TiCPUIntel Xeon E5606
Quasi-diagonal Q QDIA QCSR (1)
Q = {A 1 A 2 A N }

Q N Ai 2ADIAi ACSRi Tetris

Quasi-diagonal
2 Tetris2 SpMV
4 4 .1
GPU CPU 2 Intel Xeon48 Win- dows 7-64 GPU CUSP
[19] SpMV CUSP Thrust CPU
Table 3 Information of Quasi-diagonal matrics 3 Quasi-diagonal
QDIA = {ADIA1 ADIA2 ADIAN }
{ } (1)

1
2
3
4
5
6
7
8
9
10

bbmat
conf5_4-88-05epb2ex40lock1074nasa4704rg_n_2_15_s0rimted_Bxenon1

38744*38744
49152*49152
25228*25228
7740*7740
1074*1074
4704*4704
32768*32768
22560*22560
10605*10605
48600*48600
NNZ 1 771 722
1 916 928
175 227
456 188
51 588
104 756
320 480
1 014 951
144 579
1 181 120
QCSR = ACSR1 ACSR2 ACSRN Ai=ADIA +ACSR i[1N]
ii

: SpMV
595
Table 4 Information of Tetris matrices 4 Tetris
CPU-GPU GFLOPS 1 000 CSR [21] GPU CSR SpMV Quasi-diagonal GFLOPS GPU 1.43 Tetris GFLOPS
GPU 1.82 Quasi-di- agonal 1 .74 Tetris 2.15 Tetris ELLCSR C S R [1 7 ] 2 GPU
CPU SpMV
8 bbmatrg g_n_2_15_ s0xenon13 GFLOPS
DIA() GPU
Quasi-diagonal : QDIA QCSR GPU DIA CSR QDIA SpMV QDIA 2:(1) DIA ;(2) DIA DIA Tetris GPU ELL CSR
TELL SpMV10GPUDIA

11
12
13
14
15
16
17
18
19
20

apach2
ar2010
ASIC_680ks
mn2010
msc10848
m_t1
ne2010
sc2010
t3dhvsp_finan51
2_scagr7- 2c_rlfddd

715176*715176
186211*186211
682712*682712
259777*259777
10848*10848
97578*97578
193352*193352
181908*181908
79171*79171
139752*139752
NNZ 4 817 870
904 310
1 693 767
1 227 102
1 229 776
9 753 570
913 854
893 160
4 352 105
1 104 040
Tetris Quasi- diagonal 2 2
4.3
8 9 Quasi-diagonal Tetris GPU
Figure 8 Experimental results of Quasi-diagonal patern 8 Q uasi-diagonal

596 Computer Engine ering & Science 201941(4)
( ELL) CSR
101
bbmatrg g_n_2_15_s0 xenon1 8 DIA
5
2 :Quasi-diagonal Tetris CPU -GPU
2 SpMV SVR Sp M V
:
[1] Beck ATeboule M.A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J].Siam Journal on Im- aging Sciences20092(1):183-202.
[2] Shen JVarbanescu A LLu Yet al.Workload partitioning for acelerating applications on heterogeneous platforms[J].
IEEE Transactions on Paralel and Distributed Systems
201627(9):2766-2780. [3] Filippone SCardelini VBarbieri Det al.Sparse matrix-vec-
tor multiplication on GPGPUs[J].ACM Transactions on Mathematical Software201743(4):1-49.
[4]Maringanti AAthavale VPatkar S Bet al.Aceleration of conjugate gradient method for circuit simulation using CUDA [C]Proc of IEEE International Conference on High Per-
formance ComputingDataand Analytics2009:438-444. [5] Su BKeutzer K.clSpMV:A cros-platform OpenCL SpMV framework on GPUs[C]Proc of International Conference
on Supercomputing2012:353-364.
[6]Monakov AAvetisyan A.Implementing blocked sparse ma-
trix-vector multiplication on NVIDIA GPUs[C]Proc of In- ternational Workshop on Embedded Computer Systems 2009:289-297.
[7]Muralidharan SRoy A Hal l MW et al.Architecture-adap- tive code variant tuning[J].Architectural Support for Pro- gramming Languages and Operating Systems201651(4): 325-338.
[8] Yang WLi KMo Zet al.Performance optimization using partitioned SpMV on GPUs and multicore CPUs[J].IEEE Transactions on Computers201564(9):2623-2636.
[9] Yang W DLi K LLi K Qet al.A hybrid computing meth- od of SpMV on CPU-GPU heterogeneous computing systems [J].Journal of Paralel and Distributed Computing2017
104:49-60.
Figure 9 Experimental results of Tetris patern 9 T etris
igure 10 Spedup of DIA (or ELL)relative to CSR on GPU platform 10 GPU DIA( ELL) CSR

: SpMV
[10] Chen LVila OKrishnamoorthy Set al.Dynamic load bal- ancing on single-and multi-GPU systems[C]Proc of Inter- national Paralel and Distributed Procesing Symposium 2010:1-12.
[11] Drucker HBurges C J CKaufman Let al.Support vector regresion machines[C]Proc of Neural Information Pro- cesing Systems Conference1997:155-161.
[12] Sad Y.SPARSKIT:A basic tool kit for sparse matrix com- putations[R].Ilinios University of Ilinios1994.
[13]Meril DGarland M.Merge-based paralel sparse matrix- vector multiplication[C]Prco of the International Confer- ence for High Performance ComputingNetworkingStror- age and Analytics2016:Article No.58.
[14] Dazevedo E FFahey M RMils R Tet al.Vectorized sparse matrix multiply for compresed row storage format [C] Proc of International Conference on Computational
Science2005:99-106. [15] Remington KPozo R.NIST sparse BLAS:Users guide:
Technical Report NISTIR 6744[R].Caithersbury:National Institute of Standards and Technology1996.
[16] Davis T AHu Y.The University of Florida sparse matrix colection[J].ACM Transactions on Mathematical Soft-
ware200938(1):1-25. [17] Bel NGarland M.Implementing sparse matrix-vector mul-
tiplication on throughput-oriented procesors[C]Proc of IEEE International Conference on High Performance Com- puting Data and Analytics2009:1-11.
[18] Benatia AJi WWang Yet al.Machine learning approach for the predicting performance of SpMV on GPU[C]Proc of International Conference on Paralel and Distributed Sys- tems2016:894-901.
[19] CUSP.The NVIDIA library of generic paralel algorithms for sparse linear algebra and graph computations on CUDA architecture GPUs[EB/OL].[2018-05-15].htps://devel- oper.nvidia.com/cusp.
[20] Intel.Intel math kernel library[EB/OL].[2012-12-10].ht- tp://software.intel.com/en-us/intel-mkl.
[21] Bel NGarland M.Eficient sparse matrix-vector multiplica- tion on CUDA:NVIDIA Technical Report NVR-2008-004
[R].Santa Clara:NVIDIA Corporation2008.
597 didateCCF member(80204G)his research interests in-
clude high performance computingand machine learning.
(1980-) CCF (16954M ) E-mail:
jwx@bit.edu.cn JI Wei-xingborn in 1980PhDaso-
ciate profesorCCF member(16954M)his research inter- ests include paralel computingand code analysis & opti-
mization.
Akrem Benatia(1990-) E-mail:Akrem.benatia@
yahoo.com AKREM Benatiaborn in 1990PhD
:
candidatehis research interests include high performance computingand computer architecture.
(1995-) CCF (79759G ) E-mail:gjh@bit.edu.cn
GAO Jian-huaborn in 1995MS can- didateher research interest includes high
performance computing.
(1994-)
CCF (80264G ) E-mail:momentum @
bit.edu.cn LI An-minborn in 1994MS candi-
dateCCF member(80264G)his research interests include computer architectureand high performance computing.
(1979-) CCF (E200022292M ) E-mail:
frankwyz@bit.edu.cn WANG Yi-zhuoborn in 1979PhD
candidatelecturerCCF member(E200022292M )his re- search interests include computer architectureand paralel programming.
(1994-) CCF (80204G ) E -mail:tan _zn @
163.com TAN Zhao-nianborn in 1994MS can-

Whatsapp Us

[SOLVED] R C algorithm math operating system computer architecture graph software network cuda GPU OpenCL C N 4 3 -1 2 5 8 / T P 4 1 4 2 0 1 9 4 ISSN 1007-130X Computer Enginering & Science Vol.41No.4Apr.2019

Reviews

Related products

[SOLVED] COP 3223 Program #2: P2 Lottery

[SOLVED] Programming Project for TCP Socket Programming

[SOLVED] pakudex

[Solved] Problem 3: Who are the Winners

[SOLVED] Basic Calculator Client and Server