Scalable Heterogeneous Graph Neural Networks for Predicting High-potential Early-stage Startups

Shengming Zhang, Hao Zhong, Zixuan Yuan, Hui Xiong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

It is critical for venture investors to find high-potential startups at their early stages. Indeed, many efforts have been made to study the key factors for the success of startups through the topological analysis of the heterogeneous information network of people, startups, and venture firms or representation learning of latent startup profile features. However, the existing topological analysis lacks an in-depth understanding of heterogeneous information. Also, the approach based on representation learning relies heavily on domain-specific knowledge for feature selection. Instead, in this paper, we propose a Scalable Heterogeneous Graph Markov Neural Network (SHGMNN) for identifying high-potential startups. The general idea is to use graph neural networks (GNN) to learn effective startup representations through efficient end-to-end training and to model the label dependency among startups through Maximum A Posteriori (MAP) inference. Specifically, we first define different metapaths to capture various semantics over the heterogeneous information network (HIN) and aggregate all semantic information into a summated graph structure. To predict the high-potential early-stage startups, we introduce a GNN to diffuse the information over the summated graph. We then adopt MAP inference over Hinge-Loss Markov Random Fields to enforce label dependency. Here, a pseudolikelihood variational expectation-maximization (EM) framework is incorporated to optimize both MAP inference and the GNN iteratively: the E-step calculates the inference, and the M-step updates the GNN. For efficiency, we develop a GNN with a lightweight linear diffusion architecture to perform graph propagation over web-scale heterogeneous information networks. Finally, extensive experiments and case studies on real-world datasets demonstrate the superiority of SHGMNN.
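
The metapath aggregation and linear diffusion steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how metapath graphs are built, combined, or normalized, so the code below assumes binarized metapath adjacencies, a plain element-wise sum, symmetric normalization with self-loops, and parameter-free k-step propagation. The function names and the toy startup/person/firm data are hypothetical.

```python
import numpy as np

def metapath_adjacency(*relations):
    """Compose relation matrices along a metapath and binarize the result.

    Assumption: a metapath graph links two startups if at least one
    path instance connects them; self-connections are dropped.
    """
    A = relations[0]
    for R in relations[1:]:
        A = A @ R
    A = (A > 0).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def normalize(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def linear_diffusion(A_norm, X, k):
    """Propagate features k times with no nonlinearity between steps."""
    H = X
    for _ in range(k):
        H = A_norm @ H
    return H

# Hypothetical toy HIN: 3 startups, 2 people, 2 venture firms.
startup_person = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
startup_firm = np.array([[1, 0], [0, 0], [1, 0]], dtype=float)

# Two metapaths: startup-person-startup and startup-firm-startup.
A_sps = metapath_adjacency(startup_person, startup_person.T)
A_sfs = metapath_adjacency(startup_firm, startup_firm.T)

# Assumed aggregation: sum the metapath graphs into one structure.
A_sum = A_sps + A_sfs

X = np.eye(3)  # placeholder one-hot startup features
H = linear_diffusion(normalize(A_sum), X, k=2)
print(H)
```

Because the propagation is linear, the diffused features could be precomputed once and passed to any downstream classifier, which is one common reason such architectures scale to large graphs. Whether SHGMNN does this is not stated in the abstract.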

Original language: English (US)
Title of host publication: KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2202-2211
Number of pages: 10
ISBN (Electronic): 9781450383325
State: Published - Aug 14 2021
Event: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021 - Virtual, Online, Singapore
Duration: Aug 14 2021 to Aug 18 2021

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Country/Territory: Singapore
City: Virtual, Online
Period: 8/14/21 to 8/18/21

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • business intelligence
  • graph embedding
  • graph mining
  • graph neural networks
  • heterogeneous information networks
  • markov random fields
  • representation learning
  • startup success prediction
