Files in this item

File: ALMASRI-THESIS-2018.pdf (2MB)
Access: Restricted to U of Illinois
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: On implementing sparse matrix-vector multiplication on Intel platform
Author(s): AlMasri, Mohammad
Advisor(s): Hwu, Wen-Mei W; Abu-Sufah, Walid
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): SpMV, SIMD, CCF, CSR, I-e, MKL, OpenMP, Skylake, KNL
Abstract: Sparse matrix-vector multiplication (SpMV) can be a performance bottleneck in iterative solvers and algebraic eigenvalue problems. In this thesis, we present our sparse matrix compressed chunk storage format (CCF) and an SpMV CCF kernel that achieves high performance on Intel Xeon multicore and Xeon Phi processors for unstructured matrices. The CCF kernel exploits the properties of CCF to improve load balancing and SIMD efficiency. We also present the CCF auto-tuner, which selects the most effective parameters and SpMV kernel to achieve the highest performance that CCF can attain on a target architecture. Using 151 unstructured matrices from 38 application areas, we compare the performance of the CCF kernel to that of MKL 2018u1 SpMV CSR, MKL 2018u2 Inspector-executor SpMV CSR, and Compressed Vectorization-oriented sparse Row (CVR) SpMV. We execute the kernels on a dual 24-core Skylake Xeon Platinum 8160 and a 68-core KNL Xeon Phi 7250. On the dual 24-core Skylake Xeon Platinum 8160, our kernel achieves higher execution throughput than MKL SpMV CSR for 135 matrices (89%), with an average speedup of 2.3x and a maximum speedup of 27.5x. Our kernel outperforms MKL Inspector-executor SpMV CSR for 109 matrices (73%), with an average speedup of 1.5x and a maximum speedup of 3.0x. SpMV CCF also outperforms SpMV CVR for 81% of the matrices, with an average speedup of 1.8x and a maximum speedup of 4.2x. On the 68-core KNL Xeon Phi 7250, CCF achieves high average and maximum speedups over the other three kernels, but for slightly smaller percentages of matrices. Lastly, we show that auto-tuning the CCF parameters improves performance over the default CCF for more than 50 matrices on Skylake and KNL, with an average speedup of 1.2x.
Issue Date: 2018-07-19
Type: Text
URI: http://hdl.handle.net/2142/101729
Rights Information: 2018 Mohammad Almasri
Date Available in IDEALS: 2018-09-27
Date Deposited: 2018-08
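
Note on the CSR baseline: the abstract compares CCF against SpMV kernels that use the compressed sparse row (CSR) format. As a reading aid, the following is a minimal Python sketch of a textbook CSR SpMV, y = A x. It is not the thesis's CCF kernel, whose layout the abstract does not describe, and it is not MKL's implementation. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    All names here are illustrative, not taken from the thesis.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The trip count of this inner loop differs from row to row.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example:
# [[4, 0, 1],
#  [0, 0, 0],
#  [2, 3, 0]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch the inner loop length varies per row (2, 0, and 2 nonzeros in the example), and the reads from `x` are indirect. These are the usual sources of load imbalance and poor SIMD utilization for unstructured matrices, which the abstract says CCF is designed to address.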

