K-Medoids algorithm used for English sentiment classification in a distributed system

Vo Ngoc Phu1, Vo Thi Ngoc Tran2
COMPUTER MODELLING & NEW TECHNOLOGIES 2018 22(1) 20-39
1Nguyen Tat Thanh University, 300A Nguyen Tat Thanh Street, Ward 13, District 4, Ho Chi Minh City, 702000, Vietnam
2School of Industrial Management (SIM), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam

Sentiment classification matters in many areas of everyday life, including political activities, commodity production, and commercial activities. Finding a fast and highly accurate way to classify sentiment remains a challenge for scientists. In this research, we propose a new model for Big Data sentiment classification in a parallel network environment: a Cloudera system with Hadoop Map (M) and Hadoop Reduce (R). The model uses the K-Medoids algorithm (PAM) with multi-dimensional vectors and an English training data set of 2,000,000 documents to perform document-level sentiment classification. It can classify the sentiment of millions of English documents against this training data in the parallel network environment. We tested the model on our testing data set of 1,000,000 English reviews (500,000 positive and 500,000 negative) and achieved 85.98% accuracy.
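
To make the clustering step concrete, the following is a minimal single-machine sketch of a greedy PAM-style swap search with two clusters, which could stand for a positive and a negative group. It is an illustration under stated assumptions, not the model described above: the function names `total_cost` and `pam`, the naive initialisation, the Euclidean distance, and the toy 2-D vectors are all hypothetical, and the Hadoop Map/Reduce distribution is not reproduced.

```python
from itertools import product
from math import dist


def total_cost(points, medoids):
    """Sum of Euclidean distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k=2, max_iter=100):
    """Greedy PAM-style swap search; returns (medoid indices, cluster labels)."""
    # Assumption: naive initialisation with the first k points.
    medoids = list(range(k))
    best = total_cost(points, medoids)
    for _ in range(max_iter):
        improved = False
        # Try replacing each medoid slot with each non-medoid point.
        for slot, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < best:  # keep the swap only if it lowers the total cost
                medoids, best, improved = trial, cost, True
        if not improved:
            break
    # Label each point with the position of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical 2-D document vectors (toy data, not taken from the paper).
docs = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0), (0.1, 0.9), (0.2, 0.8), (0.0, 1.0)]
medoids, labels = pam(docs, k=2)
print(medoids, labels)
```

On this toy input the first three vectors end up in one cluster and the last three in the other. Which cluster corresponds to positive or negative sentiment would still have to be decided from labelled training documents, a step this sketch leaves out.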