DPQR: an improved parallel DBSCAN algorithm based on data partition and QR*-tree

DPQR: an improved parallel DBSCAN algorithm based on data partition and QR*-tree

HongboXu, NianminYao, QilongHan, HaiweiPan

COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(12A) 209-214

College of Computer Science and Technology, HarbinEngineeringUniversity, Harbin 150001, Heilongjiang, China

As a typical clustering algorithm based on the density, DBSCAN shows good performance in spatial data clustering. When clustering large-scale database, DBSCAN requires the overhead of memory and I/O. With the development of high-performance computers and the appearance of cluster computers in particular, this gives us a way to solve the defects of DBSCAN. The paper presents an improved parallel DBSCAN algorithm DPQR based on data partition and QR*-tree. According to the distribution of data on one or more dimensions, the entire data space is divided into a number of local regions. Each local region is transmitted to a processing unit. The processing unit calculates local k-dist graph for local region to getthe local value Eps, and builds QR*-tree.DPQR executes the partial clustering on QR*-tree. Finally, the clustering resultsare merged in accordance with the merging rules. Experimental results show that DPQR is better than DBSCAN.