An unbiased crawling strategy for directed social networks
Xuehua Yang1,2, Hongbin Li2
COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(12B) 585-589
1School of Software, Shenyang Normal University, Shenyang 110034, Liaoning, China
2Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, Liaoning, China
Online Social Networks (OSNs) are an active research topic, and data crawling, or collection, is a basic and important task for OSN analysis and mining. Because of the large volume of data, restricted access to it, and other factors, acquiring social network data differs from ordinary web crawling. Data quality determines the effectiveness of most social network mining and analysis, so the crawling technique is essential. Micro-blogs differ from social networks such as Facebook, and there is a strong need for better crawling strategies to obtain their data sets. By improving the Random Walk (RW) algorithm, this paper proposes an unbiased crawling strategy for directed social networks. In comparison with the uniform sampling method, the strategy is shown to crawl data with similar properties while ensuring that the sampled data are unbiased.
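The abstract does not state how the random walk is modified, so the sketch below is only an illustration of the general idea of an unbiased walk, not the authors' strategy. It uses the well-known Metropolis-Hastings random walk, which corrects the degree bias of a plain random walk so that every node is sampled with equal probability in the long run. Treating each directed follow edge as bidirectional, the function names `symmetrize` and `mh_random_walk`, and the toy edge list are all assumptions made for this example.

```python
import random
from collections import Counter


def symmetrize(directed_edges):
    """Build an undirected adjacency map from directed (follower, followee) pairs.

    Assumption for this sketch: every directed edge can be traversed in
    both directions, so the walk cannot get stuck at a node with no out-links.
    """
    adj = {}
    for u, v in directed_edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Sorted lists make rng.choice reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def mh_random_walk(adj, start, steps, rng):
    """Metropolis-Hastings random walk targeting a uniform node distribution."""
    current = start
    samples = []
    for _ in range(steps):
        candidate = rng.choice(adj[current])
        # Accept the move with probability min(1, deg(current) / deg(candidate)).
        # Moves toward high-degree nodes are rejected more often, which
        # cancels the degree bias of a plain random walk.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        samples.append(current)  # a rejected move re-samples the current node
    return samples


if __name__ == "__main__":
    # Hypothetical toy network: node 0 is a hub with many connections.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]
    adj = symmetrize(edges)
    samples = mh_random_walk(adj, start=0, steps=100_000, rng=random.Random(1))
    counts = Counter(samples)
    for node in sorted(adj):
        print(node, len(adj[node]), round(counts[node] / len(samples), 3))
```

On a connected graph the visit frequencies printed in the last column should be close to each other regardless of degree, whereas a plain random walk would visit the hub in proportion to its degree. A real crawler would replace the in-memory adjacency map with calls to the platform's follower and followee listings.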