Spine: Scaling up Programming-by-Negative-Example for String Filtering and Transformation

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations


Program synthesis (a.k.a. programming-by-example, PBE) has been deployed in several widely-used commercial products, such as Microsoft Excel, Power BI, and Google Spreadsheet, due to its effectiveness and user-friendliness. It takes a few user-provided positive and negative examples as input and produces a program that is consistent with all the examples, which helps end-users wrangle messy texts without writing any code. In this paper, we focus on two text wrangling tasks, string filtering and transformation. Existing PBE systems for string filtering do not scale well with negative examples. This is because they first explicitly synthesize all the consistent programs and then greedily search a good one in them. However, when there are negative examples, it could take an exponential time and space to synthesize all the exponential number of consistent programs. In contrast, we propose to synthesize all the programs consistent with the positive examples first and then lazily determine whether a program is also consistent with all the negative examples on demand in the search step. For this purpose, we develop a dynamic programming algorithm to search the optimal consistent program. Many programs are never explored during dynamic programming as they are dominated by other better consistent programs. As for string transformation, existing PBE systems do not even support negative examples. Our approach naturally extends to string transformation. Experimental results show that our methods significantly outperformed the state-of-the-art string filtering and transformation approaches and achieved better scalability.

Original languageEnglish (US)
Title of host publicationSIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
PublisherAssociation for Computing Machinery
Number of pages10
ISBN (Electronic)9781450392495
StatePublished - Jun 10 2022
Event2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022 - Virtual, Online, United States
Duration: Jun 12 2022Jun 17 2022

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078


Conference2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Country/TerritoryUnited States
CityVirtual, Online

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems


  • data wrangling
  • filtering
  • program synthesis
  • string transformation


Dive into the research topics of 'Spine: Scaling up Programming-by-Negative-Example for String Filtering and Transformation'. Together they form a unique fingerprint.

Cite this