Saswat Padhi
University of California, Los Angeles, USA
DISPLAY$ \newcommand{\argmin}{\mathop{\text{arg}\,\text{min}}} $DISPLAY
CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task.
— X. Yang et al. “Finding and Understanding Bugs in C Compilers.” PLDI’11
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
— Disclaimer from the GNU Public License (GPLv3)
Code reviews, extensive testing, continuous integration/deployment (CI/CD), reproducible builds, quality assurance (QA) … Not enough!
Stating these formal properties accurately
is extremely onerous, even for experts
crafting a correct specification (especially one using an obscure formal system) is often much more difficult than writing the program to be proved (even one written in an obscure programming language)
— B. Hailpern et al. “Software Debugging, Testing, and Verification.” IBM Systems Journal ' 02
Existing tools provide little or no support in generating correct specifications and invariants
Drastically reduce the cost of building verified software by automated synthesis of formal properties
$^\star$ inspired by two emergent techniques — CEGIS (A. Solar-Lezama. STTT ' 13) and SyGuS (R. Alur et al. FMCAD ' 13)
PLDI 2016$\,^\star$
Feature learning for black-box precondition inference
Adopted by CVC4, Alive-Infer (used by LLVM)
Input specifications for
data-processing workflows
$^\star$ presented in the same PLDI 2016 paper
S. Padhi, P. Jain, D. Perelman, O. Polozov, S. Gulwani, T. Millstein · OOPSLA 2018 ·
Real-life datasets are often incomplete and inconsistent
Reference ID |
---|
PMC9035311 |
doi:␣10.1016/S1387- 7003(03)00113-8 |
ISBN:␣2-287-34069-6 |
⋮ ⋮ ⋮ |
ISBN:␣0-006-08903-1 |
PMC9473786 |
⋮ ⋮ ⋮ |
doi:␣␣10.13039/100005795 |
ISBN:␣1-158-23466-X |
not_available |
PMC9035311 |
Process a large dataset
Write a script manually
Programming by examples
Manually discover and handle all data formats
Provide a representative
set of examples
FlashProfile
Reference ID |
---|
PMC9035311 |
doi:␣10.1016/S1387- 7003(03)00113-8 |
ISBN:␣2-287-34069-6 |
⋮ ⋮ ⋮ |
ISBN:␣0-006-08903-1 |
PMC9473786 |
⋮ ⋮ ⋮ |
doi:␣␣10.13039/100005795 |
ISBN:␣1-158-23466-X |
not_available |
PMC9035311 |
Microsoft SSDT
1024
PMC D+ 204
ISBN: ␣ 0- D D D - D D D D D - D 113
.* 110
doi: ␣+ 10. D D D D D / D+Ataccama One
1024
W N 267
W : ␣ D - N - N - D 110
W : ␣ N . N / N 34
W : ␣ D - N - N - L 11
W : ␣ N . N / L N - N ( N ) N - D 5
W _ WFlashProfile
1024
PMC D7 267
ISBN: ␣ D - D3 - D5 - D 110
doi: ␣+ 10.13039/ D+ 34
ISBN: ␣ D - D3 - D5 -X 11
doi: ␣+ 10.1016/ U D4 - D4 ( D2 ) D5 - D 5
not_available 1024
PMC D7 301
ISBN: ␣ [ISBN] 121
doi: ␣+ [DOI] 5
not_available FlashProfile
The space of profiles is inherently ambiguous …
Is “1824” more similar to “1879” or “1924”?
… and is extremely large.
Exponentially many ways of partitioning a dataset
Exponentially many ways of generalizing strings to patterns
FlashProfile
DISPLAY$ \mathsf{Synth} : \mathtt{String[\,]} \to \mathtt{Pattern[\,]} \quad;\quad \mathsf{Cost} : \mathtt{Pattern} \times \mathtt{String[\,]} \to \mathtt{Real} $DISPLAY
DISPLAY$ \mathsf{Profile}(S,k) \triangleq \big\{ \argmin\limits_{P \,\raise{0.125ex}{\in}\, \mathsf{Synth}(S_i)} \! \mathsf{Cost}(P,S_i) : S_i \in \mathsf{Clusters}(S,k) \big\} $DISPLAY
$ \mathsf{Clusters}(S,k) \triangleq \argmin\limits_{\substack{ \small \{ S_1, \ldots, S_k \} \\ \text{s.t.} \: S = \bigsqcup\limits_{i=1}^{k} S_i }} \: \sum\limits^{k}_{i=1} \Omega(S_i) $ … $\Omega(S_i):$ objective function
DISPLAY$ \Omega_\text{ideal}(S_i) \triangleq \hspace{-0.5em}\min\limits_{\,P \,\raise{0.125ex}{\in}\, \mathsf{Synth}(S_i)} \mathsf{Cost}(P,S_i) $DISPLAY
FlashProfile
Clustering with $\Omega_\text{ideal}$ requires $O(2^{|S|})$ synthesis calls
DISPLAY$ \Omega_\text{CL}(S_i) \triangleq\!\! \max_\limits{\,x,y\,\raise{0.125ex}{\in}\,S_i} \,\min_\limits{\,P \,\raise{0.125ex}{\in}\, \mathsf{Synth}({x,y})} \mathsf{Cost}(P,{x,y}) $DISPLAY
$^\star$ See survey by A. K. Jain et al. CSUR ' 99
FlashProfile
(an example hierarchy)
DISPLAY$ \begin{array}{rl} \text{Constant strings} & \;\text{Character classes} \\[0.25em] \text{Regular exprs.} & \;\text{String predicates} \end{array} $DISPLAY
(kinds of base patterns in FlashProfile)
FlashProfile
Profiling time: little correlation with dataset size; increases with string length
Can significantly improve programming-by-example (PBE) workflows
Matching.Text
C# library within the Microsoft PROSE frameworkS. Padhi, R. Sharma, T. Millstein · PLDI 2016 ·
Solution: Learn features automatically on demand
Black-box precondition inference
White-box loop invariant inference
PIE
Given Code and Post,
find Pre s.t.
{Pre} Code {Post}
Source code unavailable
Source code not analyzable
Postcondition not analyzable
Guess precondition from test executions
PIE
Test case | Feature Set | |
---|---|---|
(s, i, l) | i ≥ 0 | l ≥ 0 |
("pie", 1, -1) | ||
("xy", 2, 3) | ||
("a", -1, 3) | ||
("siva", 1, 2) |
{ i ≥ 0, l ≥ 0 }
No separator exists
over the given features!
Conflict: Same feature vector for a positive and a negative test
PIE
Test case | Feature Set | ||
---|---|---|---|
(s, i, l) | i ≥ 0 | l ≥ 0 | f |
("pie", 1, -1) | |||
("xy", 2, 3) | |||
("a", -1, 3) | |||
("siva", 1, 2) |
A new feature must be added to resolve a conflict
A new feature never creates new conflicts
Feature Synthesis Spec | |
---|---|
Input | Output |
("xy", 2, 3) | false |
("siva", 1, 2) | true |
Grammar-based exhaustive enumeration in increasing order of expression size
Learned f : i + l ≤ s.length
Learned Pre : i ≥ 0 ∧ l ≥ 0 ∧ f
PIE
PIE
Black-box use of synthesizer and Boolean function learner
Inferred preconditions are guaranteed to be both sufficient and necessary at least up to given test cases
If a sufficient and necessary precondition exists in the synthesizer’s grammar, then PIE will find it in finite time
PIE
Counterexample-Guided Inductive Synthesis
(A. Solar-Lezama. STTT ' 13)
Preserves strong convergence
Source code must be loop-free
LoopInvGen
Given Pre, Code and Post,
validate {Pre} Code {Post}
Using Dijkstra’s weakest precondition,
DISPLAY$ \large\mathsf{Pre} \Rightarrow \mathbf{WP}(\mathsf{Code}, \mathsf{Post}) $DISPLAY
In presence of unbounded control flow, computing $\mathbf{WP}$ requires inductive invariants
int square(n) {
assume (n > 0);
s = 1;
k = r = 0;
while(k != n) {
r += s;
s += 2;
k++;
}
assert (r == n*n);
}
A sufficient loop invariant: DISPLAY$ r = k^2 \wedge s = 2k+1 $DISPLAY
Require a loop invariant $\mathcal{I}$ such that
Prior data-driven approaches required a fixed set of features to be provided a priori
LoopInvGen
LoopInvGen: Loop invariant inference using precondition inference
assume $P$
while $G$ do $S$ done
assert $Q$
$ \mathcal{I} \wedge \neg G \Rightarrow Q $
$ \{ \, \mathcal{I} \wedge G \, \} \;\;S\;\; \{ \, \mathcal{I} \, \} $
$ P \Rightarrow \mathcal{I} $
PIE + LoopInvGen
$^\star$
batteries
,
a widely used alternative to OCaml’s standard library
S. Padhi, T. Millstein, A. Nori, R. Sharma · CAV 2019 ·
How does the choice of the grammar
affect the performance of synthesizers?
Theoretical results & experiments
that demonstrate overfitting
Practical mitigation strategies
inspired by ML techniques
Overfitting
(Timeout = $ 30 $ mins per benchmark per tool per grammar)
With more expressive power, every tool fails on many benchmarks it could previously solve!
But is this performance degradation
simply because of larger search spaces?
Overfitting
On increasing expressiveness for LoopInvGen: | Increase | No Change |
---|---|---|
Increase in Examples Required $ \ \Rightarrow\ $ ________ in Synth. Time | $ 79 \,\% $ | $ 6 \,\% $ |
Increase in Synth. Time $ \ \Rightarrow\ $ ________ in Examples Required | $ 27 \,\% $ | $ 67 \,\% $ |
Synthesizers not only spend more time
Machine Learning: Learned function closely fits the training data, but does not correctly generalize beyond it
Program Synthesis: Learned expression is consistent with the given examples, but does not satisfy the specification
Overfitting
A specification $ \phi $ is $ m $-learnable by a synthesizer $ \mathcal{S} $ if there exists some sequence of $ m $ examples with which $ \mathcal{S} $ learns an expression $e$ s.t. $e \models \phi $
(weaker than ML, which requires learning from any $m$ i.i.d. examples)
No Free Lunch: Let $X$ and $Y$ be arbitrary domains, $\mathcal{E}$ be an arbitrary grammar, and $m \in \mathbb{Z}$ be an arbitrary number of examples s.t. $0 \leq m < |X|$. Then, either:
Given a grammar $\mathcal{E}$, a specification $\phi$, and a sequence $M$ of examples,
$^\star$ The exact bounds for finite and infinite $X$ and $Y$ cases can be found in our CAV ' 19 paper
Overfitting
A mitigation technique inspired by ensemble methods$^\star$
✖ Original failure count
(U-shaped trend)
● Failure count with PLearn
(decreasing trend)
$^\star$ T. G. Dietterich “Ensemble Methods in Machine Learning” MCS 2000
Overfitting
Expressions of size $ 4 $ that appear only in $ \mathcal{E}_6 $, but not in $ \mathcal{E}_1, \ldots, \mathcal{E}_5$
A relation $ \lhd $ on $ \mathcal{E}_{1,\ldots,n} \times \mathbb{N} $
is said to be a well order
if
$ \forall \mathcal{E}_1, \mathcal{E}_2, k_1, k_2 : [\mathcal{E}_1 \subseteq \mathcal{E}_2 \ \wedge \ k_1 < k_2] \Rightarrow (\mathcal{E}_1,k_1) \lhd (\mathcal{E}_2,k_2) $
Hybrid enumeration — an efficient algorithm for this 2D search over component-based$^\star$ grammars $\mathcal{E}_1, \ldots, \mathcal{E}_n$ using an arbitrary well order $ \lhd $
$^\star$ S. Jha et al. “Oracle-Guided Component-Based Program Synthesis” ICSE 2010
Overfitting
Grammar | $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{P})}{\tau(\textbf{H})}\Big] $ | $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{H})}{\tau(\textbf{L})}\Big] $ |
---|---|---|
Equalities | $ 1.00 $ | $ 1.00 $ |
Inequalities | $ 1.91 $ | $ 1.04 $ |
Octagons | $ 2.84 $ | $ 1.03 $ |
Polyhedra | $ 3.72 $ | $ 1.01 $ |
Polynomials | $ 4.62 $ | $ 1.00 $ |
Peano | $ 5.49 $ | $ 0.97 $ |
(On $ 180 $ benchmarks with $ 30 $ mins timeout per benchmark per tool per grammar)
(under review)
Automatically identifying data-frames and formulas within spreadsheets
Recent Work
A. Miltner, S. Padhi, T. Millstein, D. Walker · PLDI 2020 ·
(* A signature for Set ADT *)
module type SET = sig
type t
val empty : t
val insert : t -> int -> t
val delete : t -> int -> t
val lookup : t -> int -> bool
end
A specification for the Set ADT:
DISPLAY$
\begin{array}{rcl}
(\phi\ s) & \triangleq & \forall i : \text{int} \\[0.125em]
& & \neg (\mathtt{lookup}\ \mathtt{empty}\ i) \\[0.125em]
& \wedge & \neg (\mathtt{lookup}\ (\mathtt{delete}\ s\ i)\ i) \\[0.125em]
& \wedge & (\mathtt{lookup}\ (\mathtt{insert}\ s\ i)\ i)
\end{array}
$DISPLAY
(* An implementation for Set using List *)
module ListSet : SET = struct
type t = int list
let empty = []
let rec lookup l x =
match l with
| [] -> false
| hd :: tl -> (hd = x) || (lookup tl x)
let insert l x =
if (lookup l x) then l else (x :: l)
let rec delete l x =
match l with
| [] -> []
| hd :: tl -> if (hd = x) then tl
else (hd :: (delete tl x))
end
How do we automatically verify that the ListSet implementation satisfies $\phi$?
(* A representation invariant for ListSet *)
let rec rep_inv : int list -> bool = function
| [] -> true
| hd :: tl -> (not (lookup tl hd)) && (rep_inv tl)
The first data-driven approach that guarantees soundness and completeness
Recent Work
S. Padhi, D. Rouhana, O. Polozov, B. Zorn, et al. ·
How do we automatically
recover missing formulas?
Multiple tables, unclear bounds for fomulas
Header, subheaders, metadata …
Inconsistent formatting
(2) Example-based synthesis
within individual tables