TY - GEN
T1 - The Case for Learned Provenance Graph Storage Systems
AU - Ding, Hailun
AU - Zhai, Juan
AU - Deng, Dong
AU - Ma, Shiqing
N1 - Publisher Copyright:
© USENIX Security 2023. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Cyberattacks are becoming more frequent and sophisticated, and investigating them becomes more challenging. Provenance graphs are the primary data source to support forensics analysis. Because of system complexity and long attack duration, provenance graphs can be huge, and efficiently storing them remains a challenging problem. Existing works typically use relational or graph databases to store provenance graphs. These solutions suffer from high storage overhead and low query efficiency. Recently, researchers leveraged Deep Neural Networks (DNNs) in storage system design and achieved promising results. We observe that DNNs can embed given inputs as context-aware numerical vector representations, which are compact and support parallel query operations. In this paper, we propose to learn a DNN as the storage system for provenance graphs to achieve storage and query efficiency. We also present novel designs that leverage domain knowledge to reduce provenance data redundancy and build fast-query processing with indexes. We built a prototype LEONARD and evaluated it on 12 datasets. Compared with the relational database Quickstep and the graph database Neo4j, LEONARD reduced the space overhead by up to 25.90x and boosted up to 99.6% query executions.
AB - Cyberattacks are becoming more frequent and sophisticated, and investigating them becomes more challenging. Provenance graphs are the primary data source to support forensics analysis. Because of system complexity and long attack duration, provenance graphs can be huge, and efficiently storing them remains a challenging problem. Existing works typically use relational or graph databases to store provenance graphs. These solutions suffer from high storage overhead and low query efficiency. Recently, researchers leveraged Deep Neural Networks (DNNs) in storage system design and achieved promising results. We observe that DNNs can embed given inputs as context-aware numerical vector representations, which are compact and support parallel query operations. In this paper, we propose to learn a DNN as the storage system for provenance graphs to achieve storage and query efficiency. We also present novel designs that leverage domain knowledge to reduce provenance data redundancy and build fast-query processing with indexes. We built a prototype LEONARD and evaluated it on 12 datasets. Compared with the relational database Quickstep and the graph database Neo4j, LEONARD reduced the space overhead by up to 25.90x and boosted up to 99.6% query executions.
UR - http://www.scopus.com/inward/record.url?scp=85176152264&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85176152264&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85176152264
T3 - 32nd USENIX Security Symposium, USENIX Security 2023
SP - 3277
EP - 3294
BT - 32nd USENIX Security Symposium, USENIX Security 2023
PB - USENIX Association
T2 - 32nd USENIX Security Symposium, USENIX Security 2023
Y2 - 9 August 2023 through 11 August 2023
ER -