Code-Level Model Checking in the Software Development Workflow

Nathan Chong, Byron Cook, Kareem Khazem, Konstantinos Kallas, Felipe Monteiro, Daniel Schwartz-Narbonne, Serdar Tasiran, Michael Tautschnig, and Mark R. Tuttle.

This experience report describes a style of applying symbolic model checking developed over the course of four years at Amazon Web Services (AWS). Lessons learned are drawn from proving properties of numerous C-based systems, e.g., custom hypervisors, encryption code, boot loaders, and an IoT operating system. Using our methodology, we find that we can prove the correctness of industrial low-level C-based systems with reasonable effort and predictability.Furthermore, AWS developers are increasingly writing their own formal specifications. All proofs discussed in this paper are publicly available on GitHub.

Model Checking Boot Code from AWS Data Centers

Byron Cook, Kareem Khazem, Daniel Kroening, Serdar Tasiran, Michael Tautschnig, and Mark R. Tuttle.

Extended journal version of our CAV 2018 paper, with more details about linkers and linker scripts.

CBMC Path: A Symbolic Execution Retrofit of the C Bounded Model Checker

Kareem Khazem and Michael Tautschnig.

We gave CBMC the ability to explore and model check single program paths, as opposed to its default whole-program model-checking behaviour. This means that CBMC, when invoked with the --paths flag, symbolically executes one program path at a time—saving unexplored paths for later—and attempts to prove properties for only that path. By doing this repeatedly for each path that CBMC encounters, CBMC can detect property violations in a scalable and incremental way.

Implementing single-path exploration raises the question of which order the paths should be explored in. Our implementation makes it easy for researchers to implement and investigate alternative path exploration strategies. Our competition contribution uses a breadth-first strategy, where diverging paths are each pushed onto a queue at program decision points, and the path to explore next is gotten by dequeueing the oldest path to have been added.

Model Checking Boot Code from AWS Data Centers

Byron Cook, Kareem Khazem, Daniel Kroening, Serdar Tasiran, Michael Tautschnig, and Mark R. Tuttle.

This paper describes our experience with symbolic model checking in an industrial setting. We have proved that the initial boot code running in data centers at Amazon Web Services is memory safe, an essential step in establishing the security of any data center. Standard static analysis tools cannot be easily used on boot code without modification owing to issues not commonly found in higher-level code, including memory-mapped device interfaces, byte-level memory access, and linker scripts. This paper describes automated solutions to these issues and their implementation in the C Bounded Model Checker (CBMC). CBMC is now the first source-level static analysis tool to extract the memory layout described in a linker script for use in its analysis.

Making Data-Driven Porting Decisions with Tuscan

Kareem Khazem, Earl T. Barr, and Petr Hosek.

Software typically outlives the platform that it was originally written for. To smooth the transition to new tools and platforms, programs should depend on the underlying platform as little as possible. In practice, however, software build processes are highly sensitive to their build platform, notably the implementation of the compiler and standard library. This makes it difficult to port existing, mature software to emerging platforms—web based runtimes like WebAssembly, resource-constrained environments for Internet-of-Things devices, or innovative new operating systems like Fuchsia.

We present Tuscan, a framework for conducting automatic, deterministic, reproducible tests on build systems. Tuscan is the first framework to solve the problem of reproducibly testing program builds cross-platform at massive scale. We also wrote a build wrapper, Red, which hijacks builds to automatically fix common failures that arise from platform dependence—allowing the test harness to ignore known errors and discover errors later in the build. On the platform on which the most programs failed to build, using Red rescued 48% of builds from failing, and allowed an additional 34% of builds to proceed further before failing. Authors of innovative new platforms can use Tuscan and Red to test the extent of platform dependence in the software ecosystem, and to quantify the porting effort necessary to build legacy software on new platforms.

We evaluated Tuscan by using it to build an operating system distribution, consisting of 2,699 Red-wrapped programs, on four platforms, yielding a `catalog' of the most common portability errors. This catalog informs data-driven porting decisions and motivates changes to programs, build systems, and language standards; systematically quantifies problems that platform writers have hitherto discovered only on a costly and ad-hoc basis; and forms the basis for a common substrate of portability fixes that developers and distributors can apply to their software.

Kareem Khazem