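Keywords: multi-label classification, k nearest neighbor, k/2 method, discrete Bayesian method, linear threshold function, multi-output linear regression, logistic regression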
Abstract: In multi-label learning, each training instance is associated with a set of labels, and the task is to predict the label set of each unseen instance. The k nearest neighbor (kNN) method is a classic single-label classification method. To determine the category of an unseen instance, it computes the distance between that instance and every training instance, selects the k closest training instances as its k nearest neighbors, and then votes for each label according to the neighbors' label information. The kNN method can be extended to multi-label classification, but the post-processing step that turns the neighbors' votes into a predicted label set is critical. In this paper, five post-processing methods (the k/2 method, the discrete Bayesian method, the linear threshold function method, multi-output linear regression, and logistic regression) are implemented and tested on three datasets (Yeast, Image, and Scene). Experiments show that all five methods perform well, with the discrete Bayesian method, multi-output linear regression, and logistic regression performing best. In addition, the choice of distance measure has some impact on the performance of the algorithm.
Key words: k nearest neighbor method; multi-label classification problem; k/2 method; discrete Bayesian method; linear threshold function method; multi-output linear regression; logistic regression
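
As an illustration of the voting step and the simplest of the five post-processing rules, the following is a minimal sketch rather than the implementation used in this paper. It assumes Euclidean distance, binary label vectors, and a strict-majority reading of the k/2 method (a label is assigned when more than k/2 neighbors carry it). The function names `knn_label_votes` and `predict_k_half` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_label_votes(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Sequence[int]],
    query: Sequence[float],
    k: int,
) -> List[int]:
    """For each label, count how many of the k nearest neighbors carry it."""
    # Rank training instances by distance to the query and keep the k closest.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    neighbors = order[:k]
    n_labels = len(train_y[0])
    return [sum(train_y[i][j] for i in neighbors) for j in range(n_labels)]


def predict_k_half(votes: Sequence[int], k: int) -> List[int]:
    """k/2 rule (assumed strict): keep a label if more than k/2 neighbors vote for it."""
    return [1 if v > k / 2 else 0 for v in votes]


# Toy data: six 2-D instances, three binary labels each (made up for illustration).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 1, 0]]

k = 3
votes = knn_label_votes(train_x, train_y, (0.2, 0.1), k)
prediction = predict_k_half(votes, k)
print(votes, prediction)
```

The other four post-processing methods replace only `predict_k_half`: they map the same vote counts to a label set through a learned rule instead of a fixed threshold.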