C2S: Translating natural language comments to formal program specifications

Juan Zhai, Yu Shi, Minxue Pan, Guian Zhou, Yongxiang Liu, Chunrong Fang, Shiqing Ma, Lin Tan, Xiangyu Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Scopus citations

Abstract

Formal program specifications are essential for various software engineering tasks, such as program verification, program synthesis, code debugging and software testing. However, manually inferring formal program specifications is not only time-consuming but also error-prone. In addition, it requires substantial expertise. Natural language comments contain rich semantics about behaviors of code, making it feasible to infer program specifications from comments. Inspired by this, we develop a tool, named C2S, to automate the specification synthesis task by translating natural language comments into formal program specifications. Our approach firstly constructs alignments between natural language word and specification tokens from existing comments and their corresponding specifications. Then for a given method comment, our approach assembles tokens that are associated with words in the comment from the alignments into specifications guided by specification syntax and the context of the target method. Our tool successfully synthesizes 1,145 specifications for 511 methods of 64 classes in 5 different projects, substantially outperforming the state-of-the-art. The generated specifications are also used to improve a number of software engineering tasks like static taint analysis, which demonstrates the high quality of the specifications.

Original languageEnglish (US)
Title of host publicationESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsPrem Devanbu, Myra Cohen, Thomas Zimmermann
PublisherAssociation for Computing Machinery, Inc
Pages25-37
Number of pages13
ISBN (Electronic)9781450370431
DOIs
StatePublished - Nov 8 2020
Event28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 - Virtual, Online, United States
Duration: Nov 8 2020Nov 13 2020

Publication series

NameESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Country/TerritoryUnited States
CityVirtual, Online
Period11/8/2011/13/20

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • Comment
  • Formal Specification
  • Natural Language Processing

Fingerprint

Dive into the research topics of 'C2S: Translating natural language comments to formal program specifications'. Together they form a unique fingerprint.

Cite this