Model Checking Boot Code from AWS Data Centers
Published in Computer Aided Verification 2018.
This paper describes our experience with symbolic model checking in an industrial setting. We have proved that the initial boot code running in data centers at Amazon Web Services is memory safe, an essential step in establishing the security of any data center. Standard static analysis tools cannot be easily used on boot code without modification owing to issues not commonly found in higher-level code, including memory-mapped device interfaces, byte-level memory access, and linker scripts. This paper describes automated solutions to these issues and their implementation in the C Bounded Model Checker (CBMC). CBMC is now the first source-level static analysis tool to extract the memory layout described in a linker script for use in its analysis.
Making Data-Driven Porting Decisions with Tuscan
Published in the International Symposium on Software Testing and Analysis 2018.
Software typically outlives the platform that it was originally written for. To smooth the transition to new tools and platforms, programs should depend on the underlying platform as little as possible. In practice, however, software build processes are highly sensitive to their build platform, notably the implementation of the compiler and standard library. This makes it difficult to port existing, mature software to emerging platforms---web based runtimes like WebAssembly, resource-constrained environments for Internet-of-Things devices, or innovative new operating systems like Fuchsia.
We present Tuscan, a framework for conducting automatic, deterministic, reproducible tests on build systems. Tuscan is the first framework to solve the problem of reproducibly testing program builds cross-platform at massive scale. We also wrote a build wrapper, Red, which hijacks builds to automatically fix common failures that arise from platform dependence---allowing the test harness to ignore known errors and discover errors later in the build. On the platform on which the most programs failed to build, using Red rescued 48% of builds from failing, and allowed an additional 34% of builds to proceed further before failing. Authors of innovative new platforms can use Tuscan and Red to test the extent of platform dependence in the software ecosystem, and to quantify the porting effort necessary to build legacy software on new platforms.
We evaluated Tuscan by using it to build an operating system distribution, consisting of 2,699 Red-wrapped programs, on four platforms, yielding a `catalog' of the most common portability errors. This catalog informs data-driven porting decisions and motivates changes to programs, build systems, and language standards; systematically quantifies problems that platform writers have hitherto discovered only on a costly and ad-hoc basis; and forms the basis for a common substrate of portability fixes that developers and distributors can apply to their software.