TY - JOUR

T1 - Mining blackhole and volcano patterns in directed graphs

T2 - A general approach

AU - Li, Zhongmou

AU - Xiong, Hui

AU - Liu, Yanchi

N1 - Funding Information:
Acknowledgements This research was supported in part by National Science Foundation (CCF-1018151) and National Natural Science Foundation of China (NSFC) via project numbers 70890082 and 71028002. The authors would also like to thank the editors and the anonymous reviewers for their valuable and constructive comments on this article.

PY - 2012/11

Y1 - 2012/11

N2 - Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, in a way such that the average in-weight of this group is significantly larger than the average out-weight of the same group. The problem of finding volcano patterns is a dual problem of mining blackhole patterns. Therefore, we focus on discovering the blackhole patterns. Indeed, in this article, we develop a generalized blackhole mining framework. Specifically, we first design two pruning schemes for reducing the computational cost by reducing both the number of candidate patterns and the average computation cost for each candidate pattern. The first pruning scheme is to exploit the concept of combination dominance to reduce the exponential growth search space. Based on this pruning approach, we develop the gBlackhole algorithm. Instead, the second pruning scheme is an approximate approach, named approxBlackhole, which can strike a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that the performance of approxBlackhole can be several orders of magnitude faster than gBlackhole, and both of them have huge computational advantages over the brute-force approach. Also, we show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.

AB - Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, in a way such that the average in-weight of this group is significantly larger than the average out-weight of the same group. The problem of finding volcano patterns is a dual problem of mining blackhole patterns. Therefore, we focus on discovering the blackhole patterns. Indeed, in this article, we develop a generalized blackhole mining framework. Specifically, we first design two pruning schemes for reducing the computational cost by reducing both the number of candidate patterns and the average computation cost for each candidate pattern. The first pruning scheme is to exploit the concept of combination dominance to reduce the exponential growth search space. Based on this pruning approach, we develop the gBlackhole algorithm. Instead, the second pruning scheme is an approximate approach, named approxBlackhole, which can strike a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that the performance of approxBlackhole can be several orders of magnitude faster than gBlackhole, and both of them have huge computational advantages over the brute-force approach. Also, we show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.

KW - Blackhole pattern

KW - Financial fraud detection

KW - Graph mining

KW - Network structure analysis

KW - Volcano pattern

UR - http://www.scopus.com/inward/record.url?scp=84865575119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865575119&partnerID=8YFLogxK

U2 - 10.1007/s10618-012-0255-0

DO - 10.1007/s10618-012-0255-0

M3 - Article

AN - SCOPUS:84865575119

VL - 25

SP - 577

EP - 602

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 3

ER -