The application of PLSA features in the automatic assessment system for English oral test

Ding Ming 1, Dong Bin 1, Yan Yonghong 1, Ding Yousheng 2


1 The Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, 100190 Beijing, China

2 College of Light Industry, Wuhan Polytechnic, No. 463 Guan Shan Road, Hongshan District, 430000 Wuhan, China

Probabilistic Latent Semantic Analysis (PLSA) is an effective statistical tool for analyzing co-occurrence data and is usually applied to information retrieval. Because its theoretical foundation lies in document data mining, PLSA should also be usable as a content understanding tool. In this paper, we explore its potential as a feature extraction tool for content assessment in an automatic rating system for English oral tests, which requires more precise and comprehensive content assessment. In the contrast group, word frequency extracted from the test data is used as a data mining feature to assess content correlation in question-and-answer items, and it has proved successful. However, the word frequency feature has a significant weakness: when the system lacks test data, its performance drops sharply. In contrast, building a PLSA model of word frequency from data prepared before the exam, and then extracting a probabilistic feature from the examinee's speech, avoids this problem. The results show that, as a single-dimension feature, the PLSA feature outperforms the simple word frequency feature, and that assessment performance also improves, provided the PLSA model parameters are chosen appropriately.
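The idea of training a PLSA model on data prepared before the exam and then scoring an examinee's response against it can be illustrated with a minimal sketch. It is not the system described in this paper: the toy count matrix, the number of latent topics, the function names (`fit_plsa`, `fold_in`, `response_feature`), and the choice of per-word log-likelihood as the "probabilistic feature" are all assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (documents x words) count matrix.
    Returns P(w|z) with shape (topics, words) and P(z|d) with shape (docs, topics)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + EPS
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + EPS
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_w_z, p_z_d

def fold_in(word_counts, p_w_z, n_iter=30):
    """Estimate P(z|response) for an unseen response, keeping P(w|z) fixed."""
    n_topics = p_w_z.shape[0]
    p_z = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iter):
        post = p_z[:, None] * p_w_z                       # (topics, words)
        post /= post.sum(axis=0, keepdims=True) + EPS
        new = (word_counts[None, :] * post).sum(axis=1)
        if new.sum() == 0:   # no word in the response is known to the model
            break
        p_z = new / new.sum()
    return p_z

def response_feature(word_counts, p_w_z):
    """Hypothetical single-dimension feature: mean log-likelihood per word."""
    p_z = fold_in(word_counts, p_w_z)
    p_w = p_z @ p_w_z
    return float((word_counts * np.log(p_w + EPS)).sum() / max(word_counts.sum(), 1))

# Toy reference data prepared "before the exam": 4 documents, 6-word vocabulary.
# Words 4 and 5 never occur in the reference data.
reference = np.array([[3, 2, 1, 0, 0, 0],
                      [2, 3, 0, 1, 0, 0],
                      [0, 1, 3, 2, 0, 0],
                      [1, 0, 2, 3, 0, 0]], dtype=float)
p_w_z, p_z_d = fit_plsa(reference, n_topics=2)

on_topic_score = response_feature(np.array([2, 2, 1, 1, 0, 0], dtype=float), p_w_z)
off_topic_score = response_feature(np.array([0, 0, 0, 0, 3, 3], dtype=float), p_w_z)
print(on_topic_score, off_topic_score)
```

In this toy setting the response that reuses the reference vocabulary receives a higher score than the one built only from unseen words. Because the model is fitted entirely on the reference data, no test-time data is needed to compute the feature, which is the property the abstract attributes to the PLSA approach.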