An algorithm for mining frequent itemsets on uncertain dataset
Si Tian1, Shui Wang1, Yang Liu2, Le Wang1
1School of Information Engineering, Ningbo Dahongying University, Ningbo, Zhejiang, China, 315175
2School of Innovation Experiment, Dalian University of Technology, Liaoning, China, 116024
Mining frequent itemsets from uncertain transaction dataset is a research topic in data mining. Some algorithms are based on FP-Growth, but they construct the tree structure in a manner that cannot be as compact as the original FP-Tree, so the tree is easily developed to huge size and this hinders their performance. In this paper, we propose a new tree structure called IT-Tree (Itemset Tail-node Tree) to efficiently maintain probability information of itemsets in tail-nodes; we also propose a corresponding algorithm IT-Mine to mine frequent itemsets from IT-Tree without additional dataset scans. We evaluate our approach on real sparse and dense datasets with different minimum support numbers that can produce non-null frequent k-itemsets (k≥2); the results show that IT-Mine outperforms other algorithms in terms of execution time, especially for large dataset or small minimum expected support number.