An unbiased crawling strategy for directed social networks

Xuehua Yang1,2, Hongbin Li2


1School of Software, Shenyang Normal University, Shenyang 110034, Liaoning, China

2Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, Liaoning, China

Online Social Networks (OSNs) are a hot research topic, and data crawling (collection) is an important, basic task for OSN analysis and mining. Because of the large volume of data, restricted access to it, and other factors, collecting social network data differs from ordinary web crawling. Since the quality of the data determines the effectiveness of most social network mining and analysis, data crawling technology is essential. Microblogs differ from social networks such as Facebook, and there is a strong need for better crawling strategies to obtain their data sets. By improving the Random Walk (RW) algorithm, we propose an unbiased crawling strategy for directed social networks. Comparison with the uniform sampling method shows that the strategy collects data similar to the uniformly sampled data while ensuring that the sampled data are unbiased.