Data-Driven Learning of Invariants and Specifications

Saswat Padhi

⟨ PhD Dissertation 2020 ⟩

  PDF
Abstract

Although the program verification community has developed several techniques for analyzing software and formally proving their correctness, these techniques are too sophisticated for end users and require significant investment in terms of time and effort. In this dissertation,I present techniques that help programmers easily formalize the initial requirements for verifying their programs – specifications and inductive invariants. The proposed techniques leverage ideas from program synthesis and statistical learning to automatically generate these formal requirements from readily available program-related data, such as test cases, execution traces etc. I detail three of these data-driven learning techniques – FlashProfile and PIE for specification learning, and LoopInvGen for invariant learning.

I conclude with some principles for building robust synthesis engines, which I learned while refining the aforementioned techniques. Since program synthesis is a form of function learning, it is perhaps unsurprising that some of the fundamental issues in program synthesis have also been explored in the machine learning (ML) community. I study one particular phenomenon – overfitting. I present a formalization of overfitting in program synthesis, and discuss two mitigation strategies, inspired by existing ML techniques.

BibTeX Citation
@phdthesis{ucla20/padhi/data,
  title   = {Data-Driven Learning of Invariants and Specifications},
  author  = {Saswat Padhi},
  year    = {2020},
  school  = {University of California, Los Angeles}
}