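Abstract: With the development and adoption of e-commerce, more and more product information is displayed on the Internet, yet users care about only a tiny fraction of it. Filtering information accurately and efficiently for users has therefore become a major problem in e-commerce, which is why research on recommender systems has attracted wide attention. Collaborative filtering plays an important role in recommender systems, and its performance directly determines how well a recommender system works. As recommender systems grow, their data become both larger and sparser, so reducing the impact of data sparsity has become a key direction of collaborative filtering research in recent years. Current solutions mainly rely on dimensionality reduction of large matrices, and SVD-based matrix factorization is the most widely used and most successful of these. However, the nature of traditional SVD makes the collaborative computation very expensive: it is complex, consumes a lot of memory, and takes a long time to run, which severely limits its practical use. This paper therefore explores a new SVD variant, RSVD, which combines SVD with random indexing (RI), a technique that has performed well in semantic analysis systems, and with its two-pass variant, reflective random indexing (RRI). RI is used to preprocess the data and to optimize the vector space on which the singular value decomposition operates. Experiments on the MovieLens movie dataset show that RSVD improves recommendation accuracy, reduces running time, and makes the algorithm more computationally tractable.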
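Keywords: recommender system, collaborative filtering, dimensionality reduction, SVD (singular value decomposition), RI (random indexing), RRI (reflective random indexing), RSVD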
Abstract: With the development of e-commerce, the amount of product information of all kinds has grown rapidly, but only a small part of it is useful to any given user. How to select useful information efficiently and accurately for users has therefore become a hot topic in e-commerce, and this is why recommender systems attract so much attention. Collaborative filtering algorithms are widely applied in recommender systems, and their performance directly affects the performance of those systems. A main focus of collaborative filtering research is how to reduce the negative effect of data sparsity. The prevailing solution is matrix dimensionality reduction, most commonly SVD. However, the practical use of SVD is limited by its high computational complexity, heavy memory consumption, and long running time. In this paper, we discuss a new SVD algorithm, RSVD, which combines traditional SVD with the well-performing RI or RRI algorithm; RI or RRI is used for data preprocessing and for optimizing the SVD vector space. RSVD experiments on the MovieLens dataset indicate that RSVD improves recommendation accuracy and reduces both running time and computational complexity.
Keywords: Recommender system, Collaborative filtering, Dimensionality reduction, SVD, Random Indexing, Reflective Random Indexing, RSVD
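In this paper, to mirror the way data are processed in real applications, we use a sparse matrix for simulation, implement several algorithms in code, record the experimental results, and analyze them to evaluate each algorithm's performance. The basic idea of RSVD is to preprocess the matrix with RI (which can also be viewed as a form of sampling) to obtain a basis for a vector space, and finally to produce a complete set of approximate SVD factor matrices.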
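The basic idea above (project the matrix onto RI vectors, use the result as a basis, then compute an approximate SVD) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation: it treats RSVD as a randomized SVD whose random test matrix consists of sparse ternary RI index vectors, and the function names `random_index_vectors` and `rsvd_sketch` and the parameters `dim`, `nnz`, and `reflective` are hypothetical. The `reflective` option applies one extra pass through the data, which is only our assumed matrix analogue of RRI's second indexing pass.

```python
import numpy as np

def random_index_vectors(n, dim, nnz=4, rng=None):
    """Hypothetical RI step: one sparse ternary index vector per column of the
    data matrix, with nnz/2 entries set to +1 and nnz/2 set to -1."""
    if nnz > dim:
        raise ValueError("nnz must not exceed dim")
    rng = np.random.default_rng(rng)
    R = np.zeros((n, dim))
    for i in range(n):
        pos = rng.choice(dim, size=nnz, replace=False)  # distinct random slots
        R[i, pos[: nnz // 2]] = 1.0
        R[i, pos[nnz // 2:]] = -1.0
    return R

def rsvd_sketch(A, k, dim=None, reflective=False, rng=0):
    """Approximate rank-k SVD of A (m x n), assuming RSVD behaves like a
    randomized SVD driven by RI vectors. Returns U (m x k), s (k,), Vt (k x n)."""
    m, n = A.shape
    dim = dim or min(n, 2 * k + 10)
    if dim < k:
        raise ValueError("dim must be at least k")
    R = random_index_vectors(n, dim, rng=rng)   # n x dim index vectors
    Y = A @ R                                   # m x dim: accumulated RI row vectors
    if reflective:
        # Assumed RRI analogue: map back to columns, then to rows again.
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis of the sampled space
    B = Q.T @ A                                 # small dim x n projection of A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                                  # lift left vectors back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]
```

For a user-item rating matrix `A`, the rows of `U * s` would serve as low-dimensional user vectors and the columns of `Vt` as item vectors. The expensive work is reduced to products with `A` and a QR factorization of an `m x dim` matrix, instead of a full decomposition of `A`; when `A` has rank at most `dim`, the sketch recovers the exact leading singular values.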
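By applying RSVD, we expect to improve accuracy and reduce resource consumption and, while improving recommendation quality, to make the algorithm run more efficiently. The implementation is described in detail in later sections.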
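One way to make the accuracy and efficiency goals measurable is to hide some observed ratings, predict them from a rank-k model, and report mean absolute error (MAE) together with wall-clock time. The evaluation metric used in the paper is not stated in this section, so MAE is an assumption here. The NumPy sketch below applies this setup to a plain truncated-SVD baseline on simulated sparse ratings, in the spirit of the sparse-matrix simulation described above; item-mean imputation of missing cells, the synthetic data generator, and the function names `simulate_ratings`, `svd_predict`, `mae`, and `evaluate` are all hypothetical choices for illustration.

```python
import time
import numpy as np

def simulate_ratings(m, n, rank=3, density=0.1, seed=0):
    """Hypothetical generator: dense low-rank 'true' ratings in [1, 5] plus a
    random mask marking which ratings are observed."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(m, rank))
    Q = rng.normal(size=(n, rank))
    truth = np.clip(3 + P @ Q.T / np.sqrt(rank), 1, 5)
    observed = rng.random((m, n)) < density
    return truth, observed

def svd_predict(R, mask, k):
    """Baseline: fill unobserved cells with the item's mean observed rating
    (global mean for items with no ratings), then keep the top-k SVD factors."""
    counts = mask.sum(axis=0)
    sums = np.where(mask, R, 0.0).sum(axis=0)
    global_mean = R[mask].mean()
    item_mean = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    filled = np.where(mask, R, item_mean)       # item_mean broadcasts over rows
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def mae(pred, truth, mask):
    """Mean absolute error over the cells selected by mask."""
    return float(np.abs(pred[mask] - truth[mask]).mean())

def evaluate(m=200, n=150, k=3, seed=0):
    """Hold out about 20% of observed ratings; return (test MAE, seconds)."""
    truth, observed = simulate_ratings(m, n, seed=seed)
    rng = np.random.default_rng(seed + 1)
    test = observed & (rng.random(observed.shape) < 0.2)
    train = observed & ~test
    start = time.perf_counter()
    pred = svd_predict(truth, train, k)         # only train cells are read
    elapsed = time.perf_counter() - start
    return mae(pred, truth, test), elapsed
```

An RSVD variant would be compared against this baseline by replacing the full `np.linalg.svd` call with an approximate factorization and reporting both numbers returned by `evaluate`.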