KNN question classification method based on Apriori algorithm
Caixian Chen1, 2, Huijian Han2, Zheng Liu1, 2
1School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, Shandong, China
2Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan 250014, Shandong, China
The K-Nearest Neighbours (KNN) algorithm is a classification algorithm that can be applied to question classification. However, its time complexity increases linearly with the size of the training set, which limits its practical usefulness. In this paper, after discussing the disadvantages of the traditional KNN method, we propose an improved KNN algorithm based on the Apriori algorithm. The method first extracts the frequent feature sets of the training samples in each category, together with their associated samples. Then, based on a correlation analysis of the samples in each category, it determines a suitable number of nearest neighbours k for a question of unknown category. The k nearest neighbours are selected from the training samples of known categories, and the category of the unknown question is identified from the categories of these neighbours. Compared with traditional KNN question classification, the improved method determines the value of k efficiently and reduces time complexity. Our experimental results show that the improved KNN question classification method increases both the efficiency and the accuracy of question classification.
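The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: questions are represented as sets of word features, similarity is the Jaccard coefficient, "associated samples" are taken to be training samples that contain a frequent feature set of their own category that also occurs in the question, and k is set to the largest per-category count of associated samples, capped at a hypothetical `k_max`. The names `apriori`, `jaccard`, `classify`, `min_support` and `k_max` are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support, max_size=2):
    """Return frequent itemsets (frozensets) via a level-wise Apriori search."""
    n = len(transactions)
    # Level 1: frequent single features.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c / n >= min_support}
    frequent = set(current)
    size = 2
    while current and size <= max_size:
        # Join step: merge frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, size - 1))}
        # Count step: keep candidates that meet the support threshold.
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
        frequent |= current
        size += 1
    return frequent


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(question, training, min_support=0.5, k_max=5):
    """Predict a label for `question`; `training` is a non-empty list of
    (feature_set, label) pairs."""
    q = set(question)
    by_label = {}
    for tokens, label in training:
        by_label.setdefault(label, []).append(set(tokens))

    # Assumption: a training sample is "associated" with the question if it
    # contains a frequent feature set of its category that the question shares.
    candidates, per_label = [], Counter()
    for label, samples in by_label.items():
        matched = [s for s in apriori(samples, min_support) if s <= q]
        for tokens in samples:
            if any(s <= tokens for s in matched):
                candidates.append((tokens, label))
                per_label[label] += 1

    if candidates:
        # Assumption: k follows the best-supported category, capped at k_max.
        k = min(k_max, max(per_label.values()))
    else:
        # No frequent feature set matches: fall back to plain KNN.
        candidates = [(set(t), lab) for t, lab in training]
        k = min(k_max, len(candidates))

    neighbours = sorted(candidates, key=lambda c: jaccard(q, c[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Toy training set with made-up questions and labels, for illustration only.
training = [
    ({"who", "wrote", "hamlet"}, "HUMAN"),
    ({"who", "invented", "radio"}, "HUMAN"),
    ({"who", "is", "president"}, "HUMAN"),
    ({"where", "is", "paris"}, "LOCATION"),
    ({"where", "is", "nile"}, "LOCATION"),
    ({"where", "was", "napoleon", "born"}, "LOCATION"),
]
print(classify({"who", "discovered", "penicillin"}, training))
```

In this sketch the similarity search runs only over the associated samples rather than the whole training set, which is one plausible reading of how the method reduces the cost of the nearest-neighbour step. The support threshold and the rule for choosing k would need to follow the paper's own definitions in any faithful implementation.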