### Overfitting in Synthesis: Theory and Practice

Proceedings of the 31 st International Conference on Computer-Aided Verification, 2019
⟨ CAV 2019 ⟩

###### Abstract

In syntax-guided synthesis (SyGuS), a synthesizer’s goal is to automatically generate a program belonging to a grammar of possible implementations that meets a logical specification. We investigate a common limitation across state-of-the-art SyGuS tools that perform counterexample-guided inductive synthesis (CEGIS). We empirically observe that as the expressiveness of the provided grammar increases, the performance of these tools degrades significantly.

We claim that this degradation is not only due to a larger search space, but also due to overfitting. We formally define this phenomenon and prove no-free-lunch theorems for SyGuS, which reveal a fundamental tradeoff between synthesizer performance and grammars expressiveness.

A standard approach to mitigate overfitting in machine learning is to run multiple learners with varying expressiveness in parallel. We demonstrate that this insight can immediately benefit existing SyGuS tools. We also propose a novel single-threaded technique called hybrid enumeration, which interleaves different grammars, and outperforms the winner of the 2018 SyGuS competition (Inv track) by solving more problems and achieving a $5 \times$ mean speed up.

###### BibTeX Citation
@inproceedings{cav19/padhi/overfitting,
author    = {Saswat Padhi and
Todd D. Millstein and
Aditya V. Nori and
Rahul Sharma},
title     = {Overfitting in Synthesis: Theory and Practice},
booktitle = {Computer Aided Verification - 31st International Conference, {CAV}
2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part
{I}},
series    = {Lecture Notes in Computer Science},
volume    = {11561},
pages     = {315--334},
publisher = {Springer},
year      = {2019},
url       = {https://doi.org/10.1007/978-3-030-25540-4\_17},
doi       = {10.1007/978-3-030-25540-4\_17},
}