A hybrid multi-class text categorization based on SVM-DT

Ying Fang1, 2, Heyan Huang1

1College of Computer, Beijing Institute of Technology, Beijing, China, 100081

2School of computer & technology, ShangQiu Normal College, ShangQiu HeNan, China, 476000

Improving text categorization performance while maintaining high speed remains an open research problem. Several factors affect the construction of the decision tree, including the degree of the tree, its degree of balance, the construction method, the number of groups, and the degree of separation between groups. Taking the interplay among these factors into account, a comprehensive algorithm for constructing the SVM-DT (Support Vector Machine - Decision Tree) is proposed. The method considers three conditions separately. Text categorization experiments on a large corpus demonstrate that the algorithm improves categorization performance to some degree while substantially reducing both training and testing time. The proposed algorithm for constructing the SVM-DT is feasible and adaptable.
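To make the general idea of an SVM decision tree concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes a binary tree in which each internal node separates one group of classes from another, and it groups classes greedily by the distance between class centroids as a rough proxy for the separation between groups. All function names are hypothetical, and the nearest-centroid rule at each node is only a placeholder for the binary SVM that would be trained there.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_classes(centroids):
    """Split class labels into two groups seeded by the two farthest centroids."""
    labels = sorted(centroids)
    pairs = [(p, q) for i, p in enumerate(labels) for q in labels[i + 1:]]
    seed_a, seed_b = max(
        pairs, key=lambda pq: distance(centroids[pq[0]], centroids[pq[1]])
    )
    left, right = [seed_a], [seed_b]
    for label in labels:
        if label in (seed_a, seed_b):
            continue
        d_a = distance(centroids[label], centroids[seed_a])
        d_b = distance(centroids[label], centroids[seed_b])
        (left if d_a <= d_b else right).append(label)
    return left, right


def train_centroid_stand_in(left_vectors, right_vectors):
    """Placeholder for a binary SVM (assumption): True means 'left group'."""
    c_left, c_right = centroid(left_vectors), centroid(right_vectors)
    return lambda x: distance(x, c_left) <= distance(x, c_right)


def build_tree(samples, train_binary=train_centroid_stand_in):
    """Build the tree from a dict mapping class label -> list of vectors.

    Returns a class label for a leaf, or a dict for an internal node.
    """
    if len(samples) == 1:
        return next(iter(samples))
    centroids = {label: centroid(vecs) for label, vecs in samples.items()}
    left, right = split_classes(centroids)
    left_vecs = [v for label in left for v in samples[label]]
    right_vecs = [v for label in right for v in samples[label]]
    return {
        "goes_left": train_binary(left_vecs, right_vecs),
        "left": build_tree({l: samples[l] for l in left}, train_binary),
        "right": build_tree({l: samples[l] for l in right}, train_binary),
    }


def predict(tree, x):
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, dict):
        tree = tree["left"] if tree["goes_left"](x) else tree["right"]
    return tree
```

With k classes, a tree of this kind trains k - 1 binary classifiers and evaluates at most as many as the tree is deep at prediction time, which is one reason the shape and balance of the tree matter for speed. This greedy split ignores balance and the number of groups, two of the factors listed above. A real binary SVM could be supplied through the `train_binary` argument in place of the stand-in.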