
Data-Driven Learning of Invariants and Specifications



Saswat Padhi

University of California, Los Angeles, USA

DISPLAY$ \newcommand{\argmin}{\mathop{\text{arg}\,\text{min}}} $DISPLAY

Software Reliability

Research Artifacts

  • Verified OS kernel (G. Klein et al. SOSP '09)
  • CompCert verified C compiler (X. Leroy POPL '06)
  • Verified file system (H. Chen et al. SOSP '15)
  • Quark verified web browser (D. Jang et al. USENIX Security '12)

CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task.

— X. Yang et al. “Finding and Understanding Bugs in C Compilers.” PLDI’11

Industrial Products

Shellshock (2014) · Heartbleed (2014) · Badlock (2016) · glibc GHOST (2015)

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

— Disclaimer from the GNU General Public License (GPLv3)

Code reviews, extensive testing, continuous integration/deployment (CI/CD), reproducible builds, quality assurance (QA) … Not enough!

  • What limits the adoption of software verification?
  • How can we bridge this gap between research and practice?
  • The research community has produced fully verified prototypes of large systems
  • However, despite extensive testing, code reviews, etc., major bugs are regularly discovered even in mature industrial products
  • My research was motivated by these two questions …

Verification

(diagram: a program and its specification are given to a verifier, which uses inductive invariants to either certify the program correct or report a bug)

Stating these formal properties accurately
is extremely onerous, even for experts

crafting a correct specification (especially one using an obscure formal system) is often much more difficult than writing the program to be proved (even one written in an obscure programming language)

— B. Hailpern et al. “Software Debugging, Testing, and Verification.” IBM Systems Journal ' 02

Existing tools provide little or no support in generating correct specifications and invariants

Research Focus

(verification workflow diagram, as above)

Drastically reduce the cost of building verified software by automated synthesis of formal properties

(diagram: program-related data such as execution traces and test cases, together with domain knowledge and constraints, are fed to a synthesizer, which proposes likely properties; a verifier either accepts them or returns counterexamples to the synthesizer; the loop is inspired by syntax-guided synthesis and by counterexample-guided inductive synthesis)

$^\star$ inspired by two emerging techniques: CEGIS (A. Solar-Lezama. STTT '13) and SyGuS (R. Alur et al. FMCAD '13)

Overview of Projects

Specification Synthesis

PIE

PLDI 2016$\,^\star$

Feature learning for black-box precondition inference

Adopted by CVC4, Alive-Infer (used by LLVM)

FlashProfile

OOPSLA 2018

Input specifications for
data-processing workflows

Released as part of Microsoft PROSE SDK 

Invariant Synthesis

LoopInvGen

PLDI 2016$\,^\star$

Feature learning for white-box loop invariant inference

Winning solver at SyGuS-Comp 2017 and 2018

Hanoi

PLDI 2020

Representation invariants for data-type implementations

Received an ACM SIGPLAN Distinguished Paper Award

Overfitting in Synthesis

CAV 2019

Mitigating a fundamental trade-off in data-driven synthesis

$^\star$ presented in the same PLDI 2016 paper

FlashProfile

S. Padhi, P. Jain, D. Perelman, O. Polozov, S. Gulwani, T. Millstein · OOPSLA 2018 · UCLA / Microsoft

Real-life datasets are often incomplete and inconsistent

(diagram: a dataset is given to the profiler, which produces candidate data profiles; the user selects a profile based on their intents and requirements)

Reference ID
PMC9035311
doi:␣10.1016/S1387-
7003(03)00113-8
ISBN:␣2-287-34069-6
⋮       ⋮       ⋮
ISBN:␣0-006-08903-1
PMC9473786
⋮       ⋮       ⋮
doi:␣␣10.13039/100005795
ISBN:␣1-158-23466-X
not_available
PMC9035311
  • data format$\,_1$
  • data format$\,_2$
  • data format$\,_3$
  • data format$\,_4$

To process a large dataset, a user may either write a script manually, which requires discovering and handling every data format by hand, or use programming by examples, which requires providing a representative set of examples.

FlashProfile

Data Profiles

User-defined custom patterns
Reference ID
PMC9035311
doi:␣10.1016/S1387-
7003(03)00113-8
ISBN:␣2-287-34069-6
⋮     ⋮     ⋮
ISBN:␣0-006-08903-1
PMC9473786
⋮     ⋮     ⋮
doi:␣␣10.13039/100005795
ISBN:␣1-158-23466-X
not_available
PMC9035311

Microsoft SSDT

  • 1024   PMC D+
  •  204   ISBN: ␣ 0- D D D - D D D D D - D
  •  113   .*
  •  110   doi: ␣+ 10. D D D D D / D+

Ataccama One

  • 1024   W N
  •  267   W : ␣ D - N - N - D
  •  110   W : ␣ N . N / N
  •   34   W : ␣ D - N - N - L
  •   11   W : ␣ N . N / L N - N ( N ) N - D
  •    5   W _ W

FlashProfile

  • 1024   PMC D7
  •  267   ISBN: ␣ D - D3 - D5 - D
  •  110   doi: ␣+ 10.13039/ D+
  •   34   ISBN: ␣ D - D3 - D5 -X
  •   11   doi: ␣+ 10.1016/ U D4 - D4 ( D2 ) D5 - D
  •    5   not_available 
User-controlled refinement
  • 1024   PMC D7
  •  301   ISBN: ␣ [ISBN] 
  •  121   doi: ␣+ [DOI]
  •    5   not_available 

FlashProfile

Key Challenges

The space of profiles is inherently ambiguous …

Is “1824” more similar to “1879” or “1924”?

  • Character-wise similarity measures have a fixed bias
    • e.g. “1824” is more similar to “1924”
  • A fixed set of base patterns also results in an a priori bias
    • e.g. inseparable given only D+
  • Must allow custom patterns for domain-specific bias
    • e.g. “1824” is more similar to “1879” given  [1800s] = 18.*

… and is extremely large.

Exponentially many ways of partitioning a dataset

Exponentially many ways of generalizing strings to patterns

FlashProfile

Formalization

  • Profiling = Clustering + Synthesis

DISPLAY$ \mathsf{Synth} : \mathtt{String[\,]} \to \mathtt{Pattern[\,]} \quad;\quad \mathsf{Cost} : \mathtt{Pattern} \times \mathtt{String[\,]} \to \mathtt{Real} $DISPLAY

DISPLAY$ \mathsf{Profile}(S,k) \triangleq \big\{ \argmin\limits_{P \,\raise{0.125ex}{\in}\, \mathsf{Synth}(S_i)} \! \mathsf{Cost}(P,S_i) : S_i \in \mathsf{Clusters}(S,k) \big\} $DISPLAY

$ \mathsf{Clusters}(S,k) \triangleq \argmin\limits_{\substack{ \small \{ S_1, \ldots, S_k \} \\ \text{s.t.} \: S = \bigsqcup\limits_{i=1}^{k} S_i }} \: \sum\limits^{k}_{i=1} \Omega(S_i) $     … $\Omega(S_i):$ objective function

  • $\Omega$ should capture pattern-based similarity within $S_i$

DISPLAY$ \Omega_\text{ideal}(S_i) \triangleq \hspace{-0.5em}\min\limits_{\,P \,\raise{0.125ex}{\in}\, \mathsf{Synth}(S_i)} \mathsf{Cost}(P,S_i) $DISPLAY

    • $\mathsf{Cost}$ of the best pattern describing the entire cluster
  • First we formalize profiling as a problem of clustering followed by pattern learning
  • Second, we observe that the cluster itself must be pattern-aware. For example, one cannot cluster based on edit distance and then learn patterns for each cluster.

FlashProfile

Tractable Approximation

Clustering with $\Omega_\text{ideal}$ requires $O(2^{|S|})$ synthesis calls

  • A tractable objective that works well in practice:

DISPLAY$ \Omega_\text{CL}(S_i) \triangleq \max\limits_{\,x,y \,\in\, S_i} \,\min\limits_{\,P \,\in\, \mathsf{Synth}(\{x,y\})} \mathsf{Cost}(P,\{x,y\}) $DISPLAY

    • Worst-case $\mathsf{Cost}$ of the best pattern describing a pair of strings
    • Inspired by the widely used$^\star$ complete-linkage clustering method
    • Only need to evaluate all pairs of strings; $ O(|S|^2) $ synthesis calls (see the sketch below)
  • Still prohibitively expensive for real-life datasets …
    1. Profile a small random subset, add to the cumulative profile
    2. Recursively profile strings that do not match the cumulative profile
    3. Repeatedly merge two most similar partitions if profile size $> k$

$^\star$ See survey by A. K. Jain et al. CSUR ' 99
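
A minimal OCaml sketch of the pairwise objective $\Omega_\text{CL}$ (our illustration, not FlashProfile's implementation), assuming hypothetical oracles synth_pair, which proposes candidate patterns for a pair of strings, and cost, which scores a pattern against a set of strings:

(* Ω_CL sketch: the worst case, over all pairs in a cluster, of the
   best (lowest-cost) pattern synthesized for that pair. *)
let omega_cl ~synth_pair ~cost (cluster : string array) : float =
  let best_pair_cost x y =
    List.fold_left (fun acc p -> min acc (cost p [x; y]))
      infinity (synth_pair x y)
  in
  let worst = ref neg_infinity in
  Array.iteri (fun i x ->
      Array.iteri (fun j y ->
          if i < j then worst := max !worst (best_pair_cost x y))
        cluster)
    cluster;
  !worst

For a cluster of $n$ strings this makes one synthesis call per pair, i.e. $O(n^2)$ calls, matching the bound noted above.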

FlashProfile

Architecture

(diagram: FlashProfile takes a dataset, a profile-size bound, custom patterns, and thresholds; a clustering module proposes candidate clusters, a synthesis module returns patterns and their costs, and together they produce the final profile)
  • Agglomerative Hierarchical Clustering (AHC)
    • An internal node corresponds to a subset of the dataset (annotated with a pattern)
    • Split hierarchy at depth $k$ for $k$ partitions

(an example hierarchy)

  • A small regex-like grammar for patterns
    • Top-down recursive synthesis (see the sketch below)
      • Patterns match the prefix of all strings
      • Recursively learn patterns for unmatched suffixes
      • Accept if no leftover suffixes, else reject
    • Implemented using Microsoft PROSE

DISPLAY$ \begin{array}{rl} \text{Constant strings} & \;\text{Character classes} \\[0.25em] \text{Regular exprs.} & \;\text{String predicates} \end{array} $DISPLAY

(kinds of base patterns in FlashProfile)
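
A minimal OCaml sketch of this top-down recursive strategy (our illustration, not the PROSE-based implementation), assuming a hypothetical list base_patterns of named matchers, each returning the length of the prefix it matches, if any:

(* Top-down synthesis sketch: a base pattern must match a prefix of every
   string; recurse on the unmatched suffixes, and accept a candidate only
   if nothing is left over. Candidates are sequences of base-pattern names. *)
let rec synth ~(base_patterns : (string * (string -> int option)) list)
              (strings : string list) : string list list =
  if List.for_all (fun s -> s = "") strings then [ [] ]
  else
    List.concat_map (fun (name, matches) ->
        let suffixes =
          List.map (fun s ->
              match matches s with
              | Some n when n > 0 -> Some (String.sub s n (String.length s - n))
              | _ -> None)
            strings
        in
        if List.exists Option.is_none suffixes then []
        else
          List.filter_map Fun.id suffixes
          |> synth ~base_patterns
          |> List.map (fun rest -> name :: rest))
      base_patterns

User-defined custom patterns would simply extend base_patterns with new matchers.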

FlashProfile

Results & Impact

Profiling time: little correlation with dataset size; increases with string length

  • Soundness and relative completeness on any user-defined patterns
  • Can significantly improve programming-by-example (PBE) workflows

    • Reduces the number of examples required for synthesis
    • FlashFill achieves minimal examples for 84% of benchmarks
  • Publicly released as Matching.Text C# library within the Microsoft PROSE framework

Automated Feature Learning

S. Padhi, R. Sharma, T. Millstein · PLDI 2016 · UCLA / Microsoft

(diagram: prior data-driven techniques (P. Garg et al. CAV '14, R. Sharma et al. CAV '14, T. Gehr et al. CAV '15, P. Garg et al. POPL '16, …) take data together with a fixed set of features $\{ f_1, \ldots, f_n \}$ over it, and synthesize properties as Boolean combinations such as $ (f_1 \vee \neg f_3) \wedge f_2 $)
  • Requires prior knowledge from users
  • Limits the expressiveness of the technique

Solution: Learn features automatically on demand

PIE

Black-box precondition inference

LoopInvGen

White-box loop invariant inference

PIE

Precondition Inference

assume Pre Code assert Post

Given Code and Post,
find Pre s.t. {Pre} Code {Post}

Limitations of Static Analyses

  Source code unavailable
  Source code not analyzable
  Postcondition not analyzable

Guess precondition from test executions

$t_1 = \mathtt{("pie", 1, -1)}$ $t_2 = \mathtt{("xy", 2, 3)}$ $t_3 = \mathtt{("a", -1, 3)}$ $t_4 = \mathtt{("siva", 1, 2)}$ substring(s, i, l) no exception $t_4$ $t_1, t_2, t_3$ Boolean-Function Learner Feature Set (l != 0), (l ≥ 0), (i ≥ 0), (s != ""), (i + l ≤ s.length) … $p(t_i) = \mathtt{true}$ $p(t_i) = \mathtt{false}$ $p = $ (i ≥ 0) ∧ (l ≥ 0) ∧ (i + l ≤ s.length)

PIE

Issue with Fixed Features

Test case (s, i, l)    Post    i ≥ 0    l ≥ 0
("pie", 1, -1)         ✗       ✓        ✗
("xy", 2, 3)           ✗       ✓        ✓
("a", -1, 3)           ✗       ✗        ✓
("siva", 1, 2)         ✓       ✓        ✓
  • Code: substring(s, i, l)
    Extract a substring of length l from index i of string s

  • Post: no exceptions
(diagram: the substring of length l starting at index i of the string s)
  • Consider feature set:

{ i ≥ 0, l ≥ 0 }

No separator exists
over the given features!

Conflict: Same feature vector for a positive and a negative test

PIE

Conflict Resolution

Test case (s, i, l)    Post    i ≥ 0    l ≥ 0    f
("pie", 1, -1)         ✗       ✓        ✗        ✓
("xy", 2, 3)           ✗       ✓        ✓        ✗
("a", -1, 3)           ✗       ✗        ✓        ✗
("siva", 1, 2)         ✓       ✓        ✓        ✓

A new feature must be added to resolve a conflict

A new feature never creates new conflicts

Feature Synthesis Spec
Input Output
("xy", 2, 3) false
("siva", 1, 2) true

Grammar-based exhaustive enumeration in increasing order of expression size

Learned f : i + l ≤ s.length

Learned Pre : i ≥ 0 ∧ l ≥ 0 ∧ f
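
A minimal OCaml sketch of conflict resolution (our illustration, not PIE's implementation), assuming a hypothetical enumerate_features oracle that enumerates candidate features from the grammar in increasing order of expression size:

(* Detect a positive and a negative test with identical feature vectors,
   and synthesize a new feature that separates them. *)
let find_conflict features pos neg =
  let vector t = List.map (fun f -> f t) features in
  List.find_map
    (fun p ->
       Option.map (fun n -> (p, n))
         (List.find_opt (fun n -> vector n = vector p) neg))
    pos

let resolve_conflict ~enumerate_features features pos neg =
  match find_conflict features pos neg with
  | None -> features                (* no conflict: the learner can proceed *)
  | Some (p, n) ->
    (* the new feature must hold on the positive test but not the negative one;
       this fails only if the grammar contains no separating feature *)
    let f = List.find (fun f -> f p && not (f n)) (enumerate_features ()) in
    f :: features

Since a new feature only extends every feature vector, it can never introduce new conflicts, as noted above.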

PIE

Architecture

(diagram: PIE architecture. Random test generation runs the executable against the assertion to produce positive and negative tests; a Boolean-function learner (PAC k-CNF learning) works over the current feature set; whenever a conflict arises, a feature synthesizer (exhaustive enumeration) adds a new feature; the learned formula over the features is returned as the precondition.)
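
A minimal sketch of how these pieces fit together (our illustration), assuming resolve_conflict is the function from the previous sketch with its feature enumerator already supplied, and a hypothetical learn_cnf oracle that performs PAC k-CNF learning over feature vectors:

(* Repeatedly resolve conflicts by adding features, then learn a Boolean
   combination of the features that separates positive from negative tests. *)
let rec pie ~learn_cnf ~resolve_conflict ~features ~pos ~neg =
  let features' = resolve_conflict features pos neg in
  if List.length features' <> List.length features then
    pie ~learn_cnf ~resolve_conflict ~features:features' ~pos ~neg
  else
    learn_cnf ~features ~pos ~neg   (* conflict-free: learn the precondition *)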

PIE

Properties

Black-box use of synthesizer and Boolean function learner

Sufficiency and Necessity

Inferred preconditions are guaranteed to be both sufficient and necessary with respect to the given test cases

Strong Convergence

If a sufficient and necessary precondition exists in the synthesizer’s grammar, then PIE will find it in finite time

PIE

PIE → Verified PIE

(diagram: Verified PIE. PIE infers a candidate precondition from tests against the assertion; a verifier checks the precondition against the source code; any counterexample is added as a new test, and the loop repeats until a verified precondition is produced.)

CEGIS

Counterexample-Guided Inductive Synthesis
(A. Solar-Lezama. STTT ' 13)

Preserves strong convergence

Source code must be loop-free
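
A minimal OCaml sketch of this CEGIS loop (our illustration), assuming hypothetical oracles: pie infers a precondition from labeled tests, verify checks {Pre} Code {Post} and returns a counterexample input on failure, and label executes the code to classify a test as positive or negative:

(* Verified PIE: propose a precondition from tests, verify it, and feed any
   counterexample back as a new labeled test. *)
let rec verified_pie ~pie ~verify ~label tests =
  let pre = pie tests in
  match verify pre with
  | None -> pre                                   (* verified precondition *)
  | Some cex -> verified_pie ~pie ~verify ~label ((cex, label cex) :: tests)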

LoopInvGen

Verifying Loopy Code

assume Pre Code assert Post

Given Pre, Code and Post,
validate {Pre} Code {Post}

Using Dijkstra’s weakest precondition,

DISPLAY$ \large\mathsf{Pre} \Rightarrow \mathbf{WP}(\mathsf{Code}, \mathsf{Post}) $DISPLAY

In presence of unbounded control flow, computing $\mathbf{WP}$ requires inductive invariants

  int square(int n) {
      assume (n > 0);
      int s = 1, k = 0, r = 0;
      while (k != n) {
          r += s;
          s += 2;
          k++;
      }
      assert (r == n*n);
      return r;
  }

A sufficient loop invariant: DISPLAY$ r = k^2 \wedge s = 2k+1 $DISPLAY

Require a loop invariant $\mathcal{I}$ such that

  • $\mathcal{I}$ holds at the entry to the loop
  • $\mathcal{I}$ is preserved after each iteration
  • $\mathcal{I}$, together with the negation of the loop guard, implies $\mathsf{Post}$ just after exiting the loop
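
For the square example above, a quick check (our addition) that $ r = k^2 \wedge s = 2k+1 $ meets all three conditions:

DISPLAY$ \begin{array}{ll} \text{Entry:} & s = 1 \wedge k = 0 \wedge r = 0 \;\Rightarrow\; r = k^2 \wedge s = 2k+1 \\[0.25em] \text{Preservation:} & r = k^2 \wedge s = 2k+1 \wedge k \neq n \;\Rightarrow\; r + s = (k+1)^2 \;\wedge\; s + 2 = 2(k+1)+1 \\[0.25em] \text{Exit:} & r = k^2 \wedge s = 2k+1 \wedge k = n \;\Rightarrow\; r = n \cdot n \end{array} $DISPLAY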

Prior data-driven approaches required a fixed set of features to be provided a priori

LoopInvGen

Architecture

LoopInvGen: Loop invariant inference using precondition inference

  • inspired by HOLA (Dillig et al. OOPSLA ' 13),
    which used logical abduction instead

assume $P$
while $G$ do $S$ done
assert $Q$

$ \mathcal{I} \wedge \neg G \Rightarrow Q $

$ \{ \, \mathcal{I} \wedge G \, \} \;\;S\;\; \{ \, \mathcal{I} \, \} $

$ P \Rightarrow \mathcal{I} $

(diagram: the LoopInvGen loop over loop-head states. Initialize $ \mathcal{I} := (\neg G \Rightarrow Q) $; while $ \{ \, \mathcal{I} \wedge G \, \} \;\;S\;\; \{ \, \mathcal{I} \, \} $ fails, strengthen $ \mathcal{I} := \mathcal{I} \;\wedge\; \mathsf{VPIE}([\mathtt{assume (}\mathcal{I} \wedge G\mathtt{) ;} S],\; \mathcal{I}) $; once inductive, check $ P \Rightarrow \mathcal{I} $; on a counterexample, weaken by recording the state and restarting; otherwise return $ \mathcal{I} $.)
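
A minimal OCaml sketch of this loop (our illustration, not LoopInvGen's implementation), assuming hypothetical oracles: is_preserved checks $ \{ \, \mathcal{I} \wedge G \, \} \; S \; \{ \, \mathcal{I} \, \} $, holds_on_entry checks $ P \Rightarrow \mathcal{I} $ and returns a counterexample state on failure, vpie invokes Verified PIE to find a precondition under which the invariant is preserved (respecting the recorded loop-head states), conj conjoins two formulas, and init is the initial candidate $ (\neg G \Rightarrow Q) $:

(* Strengthen until inductive; on an entry counterexample, record the state
   and restart from the weakest candidate. *)
let rec infer ~is_preserved ~holds_on_entry ~vpie ~conj ~init ~positives inv =
  if not (is_preserved inv) then
    infer ~is_preserved ~holds_on_entry ~vpie ~conj ~init ~positives
      (conj inv (vpie ~positives inv))              (* strengthening *)
  else
    match holds_on_entry inv with
    | None -> inv                                   (* inductive invariant *)
    | Some cex ->                                   (* weaken: remember cex, restart *)
      infer ~is_preserved ~holds_on_entry ~vpie ~conj ~init
        ~positives:(cex :: positives) init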

PIE + LoopInvGen

Results & Impact

  • PIE: Learning specifications for OCaml libraries$^\star$
    • Postconditions: emptiness, no exceptions
    • Found incorrect documentation for 2 functions in AVLTree
    • Generated sufficient and necessary preconditions for first-order functions in AVLTree, List, String
  • PIE: Adopted by several state-of-the-art tools
    • Alive-Infer (D. Menendez et al. PLDI ' 17)
      • Used by LLVM developers to verify peephole optimizations
    • CVC4 (H. Barbosa et al. FMCAD ' 19)
      • Used within the invariant-synthesis engine
  • LoopInvGen: Won Inv track of SyGuS-Comp 2017 and 2018
    • Won in all three categories: (a) most tasks solved, (b) fastest on most tasks, (c) smallest solution for most tasks
    • Runner up in SyGuS-Comp 2019

$^\star$ batteries, a widely used alternative to OCaml’s standard library

Overfitting in Synthesis

S. Padhi, T. Millstein, A. Nori, R. Sharma · CAV 2019 · UCLA / Microsoft

(diagram: the SyGuS framework (R. Alur et al. FMCAD '13). A specification, a grammar, and domain knowledge or constraints are given to a synthesizer, which proposes likely expressions; a verifier either accepts an expression or returns a counterexample to the synthesizer.)

How does the choice of the grammar
affect the performance of synthesizers?

Theoretical results & experiments
that demonstrate overfitting

Practical mitigation strategies
inspired by ML techniques

Overfitting

State of the Art

  • Grammars : $ 6 $ commonly used arithmetic grammars
  • Benchmarks : $ 180 $ invariant-synthesis tasks over integers
  • Tools : $ 5 $ participants from SyGuS-Comp'18
    LoopInvGen (PLDI '16) · SketchAC (Jeon et al. CAV '15) · EUSolver (Alur et al. TACAS '17) · CVC4 (Reynolds et al. CAV '15) · Stoch (Alur et al. FMCAD '13)

DISPLAY$ \text{Equalities}\; \boldsymbol{(x = y)} \;\subset\; \text{Inequalities}\; \boldsymbol{(x \leq y)} \;\subset\; \text{Octagons}\; \boldsymbol{(x > y + 2)} \;\subset\; \text{Polyhedra}\; \boldsymbol{(x \geq 3 * y - 5)} \;\subset\; \text{Polynomials}\; \boldsymbol{(x \leq 7 * y * z + 11)} \;\subset\; \text{Peano}\; \boldsymbol{(x = y \,\%\, z\, - 13)} $DISPLAY

(Timeout = $ 30 $ mins per benchmark per tool per grammar)

With more expressive power, every tool fails on many benchmarks it could previously solve!

But is this performance degradation
simply because of larger search spaces?

Overfitting

Overfitting

On increasing expressiveness for LoopInvGen:                                Increase    No Change
  Increase in Examples Required $ \ \Rightarrow\ $ ________ in Synth. Time    $ 79 \,\% $    $ 6 \,\% $
  Increase in Synth. Time $ \ \Rightarrow\ $ ________ in Examples Required    $ 27 \,\% $    $ 67 \,\% $

 Synthesizers not only spend more time

  • searching for a function within a large space,
  • but also collecting more examples from the verifier

Machine Learning: Learned function closely fits the training data, but does not correctly generalize beyond it

Program Synthesis: Learned expression is consistent with the given examples, but does not satisfy the specification

  • How do we formalize overfitting in synthesis?
  • What can we do to mitigate it?

Overfitting

Theoretical Contributions

  • Formal definition of learnability for example-guided synthesis:

A specification $ \phi $ is $ m $-learnable by a synthesizer $ \mathcal{S} $ if there exists some sequence of $ m $ examples with which $ \mathcal{S} $ learns an expression $e$ s.t. $e \models \phi $

(weaker than ML, which requires learning from any $m$ i.i.d. examples)

  • Perfect learning is impossible even with this weaker notion:

No Free Lunch: Let $X$ and $Y$ be arbitrary domains, $\mathcal{E}$ be an arbitrary grammar, and $m \in \mathbb{Z}$ be an arbitrary number of examples s.t. $0 \leq m < |X|$. Then, either:

  • $\mathcal{E}$ captures only a small number$^\star$ of semantically distinct $X \to Y$ expressions
  • or, for every synthesizer $\mathcal{S}$, there exists a specification $\phi$ for an $X \to Y$ function that admits a solution in $\mathcal{E}$, but is not $m$-learnable by $\mathcal{S}$
  • The underlying cause of the non-$m$-learnability is overfitting:

Given a grammar $\mathcal{E}$, a specification $\phi$, and a sequence $M$ of examples,

degree of overfitting $\Omega_\mathcal{E}(\phi, M)$ = $|\{ e \in \mathcal{E} : e \text{ is consistent with } M \;\wedge\; e \not\models \phi \}|$

$^\star$ The exact bounds for finite and infinite $X$ and $Y$ cases can be found in our CAV ' 19 paper

Overfitting

PLearn

A mitigation technique inspired by ensemble methods$^\star$

  • runs several synthesizer instances in parallel, each with a different grammar (see the sketch below)
  • provably lowers the overall degree of overfitting
(plot: number of failures per grammar for LoopInvGen, CVC4, Stoch, SketchAC, and EUSolver; ✖ marks the original failure counts, which follow a U-shaped trend, and ● marks the failure counts with PLearn, which follow a decreasing trend)
  • Easy to implement thin wrapper
  • Agnostic to underlying synthesizer
  • Extremely resource intensive
  • Many redundant computations

$^\star$ T. G. Dietterich “Ensemble Methods in Machine Learning” MCS 2000
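
A minimal OCaml sketch of the PLearn wrapper (our illustration), assuming a hypothetical synthesize procedure; the actual wrapper runs the per-grammar instances in parallel, whereas this sketch simply tries them one after another:

(* PLearn as a thin wrapper: one synthesizer instance per grammar; succeed
   if any instance succeeds. *)
let plearn ~synthesize grammars spec =
  List.find_map (fun grammar -> synthesize grammar spec) grammars

Since PLearn succeeds whenever any single instance succeeds, adding grammars can only enlarge the set of solvable benchmarks, which matches the decreasing failure trend above.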

Overfitting

Hybrid Enumeration

(diagram: a grid with the grammars $ \mathcal{E}_1 \subseteq \mathcal{E}_2 \subseteq \mathcal{E}_3 \subseteq \mathcal{E}_4 \subseteq \mathcal{E}_5 \subseteq \mathcal{E}_6 $ on one axis and expression sizes $ 1, 2, 3, 4 $ on the other)
  • A path through all the points on this grid
  • Expressions must be enumerated only once
  • An expression must be enumerated after all its subexpressions

(one cell of the grid: expressions of size $ 4 $ that appear only in $ \mathcal{E}_6 $, but not in $ \mathcal{E}_1, \ldots, \mathcal{E}_5 $)

A relation $ \lhd $ on $ \mathcal{E}_{1,\ldots,n} \times \mathbb{N} $ is said to be a well order
if $ \forall \mathcal{E}_i, \mathcal{E}_j, k_i, k_j : [\mathcal{E}_i \subseteq \mathcal{E}_j \ \wedge \ k_i < k_j] \Rightarrow (\mathcal{E}_i,k_i) \lhd (\mathcal{E}_j,k_j) $

Hybrid enumeration — an efficient algorithm for this 2D search over component-based$^\star$ grammars $\mathcal{E}_1, \ldots, \mathcal{E}_n$ using an arbitrary well order $ \lhd $

$^\star$ S. Jha et al. “Oracle-Guided Component-Based Program Synthesis” ICSE 2010
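
A minimal OCaml sketch of this 2D search (our illustration, not the paper's algorithm), fixing one particular well order (by size first, then by grammar index) and assuming a hypothetical enumerate_new oracle that yields only the expressions of a given size whose components first appear in the given grammar, so that nothing is enumerated twice:

(* Visit the (grammar, size) cells size-first, so every expression is
   enumerated only once and after all its (smaller) subexpressions. *)
let hybrid_enumerate ~enumerate_new ~satisfies ~num_grammars ~max_size =
  let rec visit size idx =
    if size > max_size then None
    else if idx > num_grammars then visit (size + 1) 1
    else
      match List.find_opt satisfies (enumerate_new ~grammar:idx ~size) with
      | Some e -> Some e
      | None -> visit size (idx + 1)
  in
  visit 1 1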

Overfitting

Key Results

(plot: failure counts on the Inequalities, Octagons, Polyhedra, Polynomials, and Peano grammars for [L] LoopInvGen, [H] LoopInvGen with hybrid enumeration, and [P] PLearn)
Grammar $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{P})}{\tau(\textbf{H})}\Big] $ $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{H})}{\tau(\textbf{L})}\Big] $
Equalities $ 1.00 $ $ 1.00 $
Inequalities $ 1.91 $ $ 1.04 $
Octagons $ 2.84 $ $ 1.03 $
Polyhedra $ 3.72 $ $ 1.01 $
Polynomials $ 4.62 $ $ 1.00 $
Peano $ 5.49 $ $ 0.97 $

(On $ 180 $ benchmarks with $ 30 $ mins timeout per benchmark per tool per grammar)

  • Hybrid enumeration is resilient to increasing expressiveness
    • Approaches the ideal failure rate with PLearn
  • Negligible impact on total time $ \tau = $ wall-clock time $ \times $ # threads
    • Approaches the original running time of LoopInvGen
  • LoopInvGen+HE solves $2$ benchmarks that no other synthesizer could in SyGuS-Comp 2018

Recent Work

Hanoi

PLDI 2020

Inferring representation invariants for verifying data-type implementations

Jura

(under review)

Automatically identifying data-frames and formulas within spreadsheets

Recent Work

Hanoi

A. Miltner, S. Padhi, T. Millstein, D. Walker · PLDI 2020 · Princeton / UCLA

(* A signature for Set ADT *)

module type SET = sig
  type t
  val empty : t
  val insert : t -> int -> t
  val delete : t -> int -> t
  val lookup : t -> int -> bool
end

A specification for the Set ADT:

DISPLAY$ \begin{array}{rcl} (\phi\ s) & \triangleq & \forall i : \text{int} \\[0.125em] & & \neg (\mathtt{lookup}\ \mathtt{empty}\ i) \\[0.125em] & \wedge & \neg (\mathtt{lookup}\ (\mathtt{delete}\ s\ i)\ i) \\[0.125em] & \wedge & (\mathtt{lookup}\ (\mathtt{insert}\ s\ i)\ i) \end{array} $DISPLAY

(* An implementation for Set using List *)

module ListSet : SET = struct
  type t = int list

  let empty = []

  let rec lookup l x =
    match l with
    | [] -> false
    | hd :: tl -> (hd = x) || (lookup tl x)

  let insert l x =
    if (lookup l x) then l else (x :: l)

  let rec delete l x =
    match l with
    | [] -> []
    | hd :: tl -> if (hd = x) then tl
                  else (hd :: (delete tl x))
end

How do we automatically verify that the ListSet implementation satisfies $\phi$?

(* A representation invariant for ListSet *)

let rec rep_inv : int list -> bool = function
  | [] -> true
  | hd :: tl -> (not (lookup tl hd)) && (rep_inv tl)
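
With a representation invariant $R$ (here rep_inv) in hand, verifying $\phi$ against ListSet decomposes in the standard way (our paraphrase of the proof obligations that the inferred invariant must discharge): $R$ holds of empty, $R$ is preserved by insert and delete, and $R$ suffices to establish $\phi$:

DISPLAY$ R(\mathtt{empty}) \;\;\wedge\;\; \big(\, R(s) \Rightarrow R(\mathtt{insert}\ s\ i) \wedge R(\mathtt{delete}\ s\ i) \,\big) \;\;\wedge\;\; \big(\, R(s) \Rightarrow \phi\ s \,\big) $DISPLAY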
  • Hanoi can handle arbitrary first-order modules
  • Work on higher-order modules is ongoing

The first data-driven approach that guarantees soundness and completeness

Recent Work

Jura

S. Padhi, D. Rouhana, O. Polozov, B. Zorn, et al. · Microsoft / UCLA

How do we automatically
recover missing formulas?

  Multiple tables, unclear bounds for formulas
  Headers, subheaders, metadata …
  Inconsistent formatting

(1) ML pipeline for identifying and extracting tables


(2) Example-based synthesis
within individual tables

Conclusion

  • Specifications and invariants are difficult to state accurately
    • Learn from readily available program-related data
    • Automatic learning and user-driven refinement
  • Require strong guarantees and learning at large scale
    • Program synthesis techniques for formal reasoning
    • Data-driven insights from machine learning
Program Synthesis/Repair
  • Recovering spreadsheet formulas
  • Automatic synthesis of small bug fixes
Proof Synthesis/Repair
  • Representation invariants for ADTs
  • Synthesis within proof assistants
  • Automatic proof maintenance
Connections with ML + HCI
  • Symbolic optimizations + enumeration
  • Better regularization in synthesis
  • Interaction models for guiding synthesis

Links

  • Papers: saswatpadhi.github.io/publications/
  • FlashProfile: microsoft.github.io/prose/documentation/matching-text/intro/
  • PIE + LoopInvGen: github.com/SaswatPadhi/LoopInvGen

Contact: padhi@cs.ucla.edu