Bipartite graph model for RDF data cleansing
Huang Li1, 2
COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(2) 99-106
1 College of Computer Science and Technology, Wuhan University of Science and Technology
2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System
Many systems use RDF to describe information resources and the semantic associations between them, and RDF data play an important role in advanced information retrieval. Because resources are diverse and often imprecisely described, RDF data contain duplicates. The querying and retrieval of RDF data have been studied by many researchers, but RDF data cleansing has received little attention. In this paper, we focus on RDF data cleansing. Based on the features of RDF data, we propose a new approach that combines similarity with the connections among resources. First, we introduce an intermediate model, named the RDF-Bipartite Graph model, to represent RDF data; this model improves on the Bipartite Statement-Value Graphs model. Then, on the proposed model, we design a Subgraph-Extend method to find the path connecting two nodes. This method detects the minimum subgraph containing the two nodes and searches for the connecting path within it, which avoids the setting of connection weights required by traditional methods. Experiments on publication datasets show that the proposed method is efficient in detecting duplicates in RDF data and achieves high performance and accuracy.
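To make the idea of a bipartite representation and unweighted path finding concrete, the following is a minimal sketch. It is not the RDF-Bipartite Graph model or the Subgraph-Extend method themselves, whose details are not given here. It assumes that each RDF triple becomes one statement node linked to the value nodes of its subject and object, and it substitutes a plain breadth-first search for the path search. The function names `build_bipartite` and `connecting_path` and the sample triples are hypothetical and for illustration only.

```python
from collections import defaultdict, deque


def build_bipartite(triples):
    """Build an adjacency map with two node kinds (assumed layout):
    ("stmt", index, predicate) for each triple and ("val", x) for each
    subject or object. Edges only join a statement node to a value node."""
    adj = defaultdict(set)
    for i, (s, p, o) in enumerate(triples):
        stmt = ("stmt", i, p)
        for value in (s, o):
            node = ("val", value)
            adj[stmt].add(node)
            adj[node].add(stmt)
    return adj


def connecting_path(adj, a, b):
    """Return a shortest path between value nodes a and b, or None.
    Plain BFS: no edge weights are needed (stand-in for the paper's search)."""
    start, goal = ("val", a), ("val", b)
    if start not in adj or goal not in adj:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Made-up sample data: two publication records that share a venue.
triples = [
    ("paper1", "creator", "H. Li"),
    ("paper2", "creator", "Huang Li"),
    ("paper1", "venue", "CMNT"),
    ("paper2", "venue", "CMNT"),
]
adj = build_bipartite(triples)
path = connecting_path(adj, "paper1", "paper2")
```

In this sample the two records are connected only through the shared venue value, so the path alternates between value and statement nodes. A duplicate detector could combine such connection evidence with a string similarity score between the two creator values.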