The Case for Learned Provenance Graph Storage Systems

Hailun Ding, Juan Zhai, Dong Deng, Shiqing Ma

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Scopus citations

Abstract

Cyberattacks are becoming more frequent and sophisticated, and investigating them becomes more challenging. Provenance graphs are the primary data source to support forensics analysis. Because of system complexity and long attack duration, provenance graphs can be huge, and efficiently storing them remains a challenging problem. Existing works typically use relational or graph databases to store provenance graphs. These solutions suffer from high storage overhead and low query efficiency. Recently, researchers leveraged Deep Neural Networks (DNNs) in storage system design and achieved promising results. We observe that DNNs can embed given inputs as context-aware numerical vector representations, which are compact and support parallel query operations. In this paper, we propose to learn a DNN as the storage system for provenance graphs to achieve storage and query efficiency. We also present novel designs that leverage domain knowledge to reduce provenance data redundancy and build fast-query processing with indexes. We built a prototype LEONARD and evaluated it on 12 datasets. Compared with the relational database Quickstep and the graph database Neo4j, LEONARD reduced the space overhead by up to 25.90x and boosted up to 99.6% query executions.

Original languageEnglish (US)
Title of host publication32nd USENIX Security Symposium, USENIX Security 2023
PublisherUSENIX Association
Pages3277-3294
Number of pages18
ISBN (Electronic)9781713879497
StatePublished - 2023
Event32nd USENIX Security Symposium, USENIX Security 2023 - Anaheim, United States
Duration: Aug 9 2023Aug 11 2023

Publication series

Name32nd USENIX Security Symposium, USENIX Security 2023
Volume5

Conference

Conference32nd USENIX Security Symposium, USENIX Security 2023
Country/TerritoryUnited States
CityAnaheim
Period8/9/238/11/23

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Safety, Risk, Reliability and Quality

Fingerprint

Dive into the research topics of 'The Case for Learned Provenance Graph Storage Systems'. Together they form a unique fingerprint.

Cite this