Design and implementation of a parallel algorithm based on the Hadoop platform
Qingnian Zhang, Zhao Chen, Zihui Wang
Wuhan University of Technology, China
An existing clustering algorithm is ported to the Hadoop cloud computing platform, which dynamically distributes processing tasks over massive data sets across a cluster of low-cost computer nodes. This addresses enterprises' need to store large volumes of data and to obtain analysis results in real time. The MapReduce programming model helps developers implement parallel clustering quickly, without requiring detailed knowledge of the underlying communication mechanisms. This paper improves the clustering algorithm and ports it to the MapReduce programming model to obtain a parallel design. It then verifies the reliability of the parallel algorithm using criterion functions such as the sum of squared errors (SSE). Clustering experiments on data samples of different sizes, run on a Hadoop cluster of four machines, show that the parallel algorithm achieves good speedup and scalability for large-data applications on the Hadoop platform.
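The abstract does not name the clustering algorithm or describe the MapReduce job structure. As an illustration only, the sketch below assumes k-means, which is a common choice for MapReduce clustering but is not confirmed by the text. It simulates the map, shuffle, and reduce phases in plain Python rather than through the Hadoop API. The mapper assigns each point of its input split to the nearest centroid. The reducer sums the points and counts per cluster and averages them into new centroids. `sse` computes the sum-of-squared-errors criterion. `speedup` uses the conventional definition S_p = T_1 / T_p. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def map_phase(split: List[Point], centroids: Sequence[Point]) -> List[Tuple[int, Tuple[Point, int]]]:
    """Hypothetical mapper: emit (cluster_id, (point, 1)) for each point in its split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def reduce_phase(pairs: List[Tuple[int, Tuple[Point, int]]]) -> Dict[int, Point]:
    """Hypothetical reducer: sum coordinates and counts per cluster, then average."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (p, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(p)
        for j, x in enumerate(p):
            sums[cid][j] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans_mapreduce(splits: List[List[Point]], centroids: Sequence[Point],
                     max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Iterate map -> shuffle -> reduce until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Shuffle: collect mapper output from every split (done by the framework on Hadoop).
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs)
        # A cluster that received no points keeps its previous centroid.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


def sse(points: List[Point], centroids: Sequence[Point]) -> float:
    """Sum of squared errors: squared distance of each point to its nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)


def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup S_p = T_1 / T_p."""
    return t_serial / t_parallel


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    result = kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)])
    print(result)  # [(0.0, 0.5), (10.0, 10.5)]
    print(sse([p for s in splits for p in s], result))  # 1.0
```

In this toy run, each inner list stands in for one HDFS input split handled by a separate mapper. A lower SSE indicates tighter clusters, which is the kind of criterion the abstract uses to check that the parallel version behaves like the serial one.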