Algorithm design and the application for cluster validity based on geometric probability

Jian-Wei Li1, Xiao-Wen Li2

1College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108
2College of Mathematics and Computer Science, Longyan University, Longyan 364012

Determining the optimal number of clusters is a key research topic in cluster validity, a fundamental and still unsolved problem in cluster analysis. To determine the optimal number of clusters, this article proposes a new cluster validity function for two-dimensional datasets, theoretically grounded in geometric probability. The function uses the correspondence between a two-dimensional dataset and its associated two-dimensional discrete point set to measure the cluster structure of the dataset from the distribution of the point set in feature space. It is designed from an intuitive perspective and is therefore easy to understand. Using TM remote sensing image classification examples, the proposed method is compared with the supervised and unsupervised classification in ERDAS and with the cluster analysis method based on geometric probability in a two-dimensional square proposed in reference 2. The results show that the proposed method can significantly improve classification accuracy.