CPC: Automatically classifying and propagating natural language comments via program analysis

Juan Zhai, Xiangzhe Xu, Yu Shi, Guanhong Tao, Minxue Pan, Shiqing Ma, Lei Xu, Weifeng Zhang, Lin Tan, Xiangyu Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

32 Scopus citations


Code comments provide abundant information that have been leveraged to help perform various software engineering tasks, such as bug detection, speciication inference, and code synthesis. However, developers are less motivated to write and update comments, making it infeasible and error-prone to leverage comments to facilitate software engineering tasks. In this paper, we propose to leverage program analysis to systematically derive, reine, and propagate comments. For example, by propagation via program analysis, comments can be passed on to code entities that are not commented such that code bugs can be detected leveraging the propagated comments. Developers usually comment on diferent aspects of code elements like methods, and use comments to describe various contents, such as functionalities and properties. To more efectively utilize comments, a ine-grained and elaborated taxonomy of comments and a reliable classiier to automatically categorize a comment are needed. In this paper, we build a comprehensive taxonomy and propose using program analysis to propagate comments. We develop a prototype CPC, and evaluate it on 5 projects. The evaluation results demonstrate 41573 new comments can be derived by propagation from other code locations with 88% accuracy. Among them, we can derive precise functional comments for 87 native methods that have neither existing comments nor source code. Leveraging the propagated comments, we detect 37 new bugs in open source large projects, 30 of which have been conirmed and ixed by developers, and 304 defects in existing comments (by looking at inconsistencies between existing and propagated comments), including 12 incomplete comments and 292 wrong comments. This demonstrates the efectiveness of our approach. Our user study conirms propagated comments align well with existing comments in terms of quality.

Original languageEnglish (US)
Title of host publicationProceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering, ICSE 2020
PublisherIEEE Computer Society
Number of pages13
ISBN (Electronic)9781450371216
StatePublished - Jun 27 2020
Event42nd ACM/IEEE International Conference on Software Engineering, ICSE 2020 - Virtual, Online, Korea, Republic of
Duration: Jun 27 2020Jul 19 2020

Publication series

NameProceedings - International Conference on Software Engineering
ISSN (Print)0270-5257


Conference42nd ACM/IEEE International Conference on Software Engineering, ICSE 2020
Country/TerritoryKorea, Republic of
CityVirtual, Online

All Science Journal Classification (ASJC) codes

  • Software


Dive into the research topics of 'CPC: Automatically classifying and propagating natural language comments via program analysis'. Together they form a unique fingerprint.

Cite this