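Summary: In recent years, processor speed has grown much faster than memory access speed, so the gap between the two keeps widening. Modern computer architectures widely adopt cache memory (Cache) to mitigate this gap.
Based on six different implementations of matrix multiplication, this paper builds a test program that measures the running time of each of the six versions. By analyzing the spatial and temporal locality of the six implementations, it concludes that, because of the cache and differences in the locality of memory accesses, the running times of different implementations of the same algorithm differ greatly. To make full use of the cache and improve program efficiency, programmers need to consider the spatial and temporal locality of both code and data.
To make further use of the cache, the paper also presents a blocked matrix multiplication program, which further improves the efficiency of matrix multiplication.
Keywords: cache memory; matrix multiplication; block matrix; principle of locality; temporal locality; spatial locality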
Abstract: In recent years, processor speed has grown much faster than memory access speed, and the gap between the two keeps widening. Modern computer architectures widely use cache memory (Cache) to alleviate this speed gap.
Based on six different implementations of matrix multiplication, this paper constructs a timing test program and obtains the running time of each of the six versions. By analyzing the spatial locality and temporal locality of the six implementations, it concludes that, because of the cache and the differing locality of memory accesses, different implementations of the same algorithm show very different running times. To make full use of the cache and improve program efficiency, programmers need to consider the spatial and temporal locality of both code and data.
To make further use of the cache, the paper also gives a blocked (partitioned) matrix multiplication program, which further improves the efficiency of matrix multiplication.
Key words: Cache; matrix multiplication; block matrix; principle of locality; temporal locality; spatial locality
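The loop-order and blocking ideas summarized above can be illustrated with a minimal C sketch. It is not the paper's test program: the function names (`matmul_ijk`, `matmul_kij`, `matmul_blocked`), the flat row-major storage, and the block-size parameter `bs` are assumptions made for illustration. Only two of the six loop orders are shown, one whose inner loop strides through memory and one whose inner loop is sequential, followed by a blocked variant.

```c
#include <stddef.h>

/* All matrices are n x n, row-major, stored in flat arrays.
 * Names and signatures are hypothetical, chosen for this sketch. */

/* ijk order: the inner loop reads B down a column (stride n),
 * so B has poor spatial locality. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* kij order: the inner loop walks a row of B and a row of C
 * with stride 1, so both have good spatial locality. */
void matmul_kij(size_t n, const double *a, const double *b, double *c)
{
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;                      /* C accumulates, so clear it first */
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++) {
            double r = a[i * n + k];     /* reused across the whole inner loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];
        }
}

/* Blocked multiplication: work on bs x bs tiles so that the tiles of
 * A, B and C in use can stay in the cache while they are reused.
 * bs is a tuning parameter; a value of 0 is treated as "one block". */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    if (bs == 0)
        bs = n;
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile bounds so n need not be a multiple of bs. */
                size_t i_end = (ii + bs < n) ? ii + bs : n;
                size_t k_end = (kk + bs < n) ? kk + bs : n;
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double r = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += r * b[k * n + j];
                    }
            }
}
```

All three functions compute the same product; they differ only in the order in which memory is touched. To observe the effect, one could time each call on large matrices, for example with `clock()` from `<time.h>`, and try several values of `bs`. The best block size depends on the cache sizes of the machine being measured.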