Implementation of neotype simple Bayesian algorithm by MapReduce and the application of discreteness and continuity in data mining
COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(12B) 454-459
Zhengzhou University of Industrial Technology, Zhengzhou, China
MapReduce is a programming model that can run in heterogeneous environments. It is simple to program and is used for parallel computation over large-scale data sets, so developers do not need to worry about the underlying implementation details. In this paper, MapReduce is applied to three data mining algorithms: the simple Bayesian algorithm, the K-modes clustering algorithm, and the ECLAT frequent item set mining algorithm. Building on the MapReduce programming model and existing research, this paper puts forward an improved simple Bayesian algorithm, implemented with MapReduce, that can handle data mining applications whose data have both discrete and continuous attributes. In addition, by combining the ideas of each algorithm with the running mechanism of MapReduce, this paper puts forward MapReduce implementations of the K-modes clustering algorithm and the ECLAT frequent item set mining algorithm. These implementations extend the application range of the two algorithms from stand-alone machines to cloud computing platforms and can effectively improve the efficiency of the algorithms when they face huge amounts of data.
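As an illustration of how a simple Bayesian classifier over mixed discrete and continuous attributes might be expressed in map and reduce steps, the following minimal sketch simulates both phases in a single Python process. It is not the implementation described in this paper. The toy records, the function names, the use of a Gaussian density for continuous attributes, and the add-one smoothing for discrete attributes are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (class label, discrete attributes, continuous attributes).
RECORDS = [
    ("yes", ["sunny"], [20.0]),
    ("yes", ["rainy"], [22.0]),
    ("no", ["sunny"], [30.0]),
    ("no", ["sunny"], [34.0]),
]

def map_record(record):
    """Map phase: emit (key, (count, sum, sum of squares)) pairs for one record."""
    label, discrete, continuous = record
    yield ("class", label), (1, 0.0, 0.0)
    for i, value in enumerate(discrete):
        yield ("disc", label, i, value), (1, 0.0, 0.0)
    for i, x in enumerate(continuous):
        yield ("cont", label, i), (1, x, x * x)

def reduce_pairs(pairs):
    """Shuffle and reduce phase: sum the value tuples that share a key."""
    totals = defaultdict(lambda: (0, 0.0, 0.0))
    for key, (n, s, sq) in pairs:
        a, b, c = totals[key]
        totals[key] = (a + n, b + s, c + sq)
    return dict(totals)

def train(records):
    """Run the simulated map and reduce phases over all records."""
    return reduce_pairs(kv for r in records for kv in map_record(r))

def gaussian_params(stats, label, i):
    """Mean and variance of continuous attribute i within a class (assumed Gaussian)."""
    n, s, sq = stats[("cont", label, i)]
    mean = s / n
    var = max(sq / n - mean * mean, 1e-9)  # floor avoids division by zero
    return mean, var

def log_score(stats, label, discrete, continuous):
    """Log of prior times likelihood for one class, up to a shared constant."""
    n_label = stats[("class", label)][0]
    total = sum(v[0] for k, v in stats.items() if k[0] == "class")
    score = math.log(n_label / total)
    for i, value in enumerate(discrete):
        count = stats.get(("disc", label, i, value), (0, 0.0, 0.0))[0]
        n_values = len({k[3] for k in stats if k[0] == "disc" and k[2] == i})
        score += math.log((count + 1) / (n_label + n_values))  # add-one smoothing
    for i, x in enumerate(continuous):
        mean, var = gaussian_params(stats, label, i)
        score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return score

def predict(stats, discrete, continuous):
    """Return the class label with the highest log score."""
    labels = [k[1] for k in stats if k[0] == "class"]
    return max(labels, key=lambda c: log_score(stats, c, discrete, continuous))

if __name__ == "__main__":
    model = train(RECORDS)
    print(predict(model, ["sunny"], [21.0]))
```

Because every emitted value is a tuple of sums, the reduce step is associative and commutative, so the same function could also serve as a combiner on a real cluster. The Gaussian parameters are derived only after the reduce step, from the per-class count, sum, and sum of squares.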