TY - GEN
T1 - A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling
AU - Assadi, Sepehr
AU - Kapralov, Michael
AU - Khanna, Sanjeev
N1 - Funding Information:
We are thankful to the anonymous reviewers of ITCS 2019 for many valuable comments. Supported in part by the National Science Foundation grant CCF-1617851. Supported in part by ERC Starting Grant 759471. Supported in part by the National Science Foundation grants CCF-1617851 and CCF-1763514.
Publisher Copyright:
© Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing the number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers, which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known in the literature about approximate counting of arbitrary graphs H. This is in sharp contrast to the closely related subgraph enumeration problem, which has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{ρ(H)}), where ρ(H) is the fractional edge-cover number of H, and enumeration algorithms with matching runtime are known for any H. We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 ± ε)-approximation with high probability in O(m^{ρ(H)} / #H) · poly(log n, 1/ε) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of the algorithms of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques, and extends them to all choices of subgraph H under the additional assumption of edge-sample queries.
AB - In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing the number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers, which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known in the literature about approximate counting of arbitrary graphs H. This is in sharp contrast to the closely related subgraph enumeration problem, which has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{ρ(H)}), where ρ(H) is the fractional edge-cover number of H, and enumeration algorithms with matching runtime are known for any H. We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 ± ε)-approximation with high probability in O(m^{ρ(H)} / #H) · poly(log n, 1/ε) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of the algorithms of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques, and extends them to all choices of subgraph H under the additional assumption of edge-sample queries.
KW - AGM bound
KW - Subgraph counting
KW - Sublinear-time algorithms
UR - http://www.scopus.com/inward/record.url?scp=85069190653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069190653&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ITCS.2019.6
DO - 10.4230/LIPIcs.ITCS.2019.6
M3 - Conference contribution
AN - SCOPUS:85069190653
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 10th Innovations in Theoretical Computer Science, ITCS 2019
A2 - Blum, Avrim
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 10th Innovations in Theoretical Computer Science, ITCS 2019
Y2 - 10 January 2019 through 12 January 2019
ER -