FEAST’19- Proceedings of the 3rd ACM Workshop on Forming an Ecosystem Around Software TransformationFull Citation in the ACM Digital Library
SESSION: Workshop Presentations
Software debloating is an emerging field of study aimed at improving the security and performance of software by removing excess library code and features that are not needed by the end user (called bloat). Software bloat is pervasive, and several debloating techniques have been proposed to address this problem. While these techniques are effective at removing bloat, they are not practical for the average end user, risk creating unsound programs and introducing new vulnerabilities, and are not well suited for debloating complex software such as network protocol implementations.
In this paper, we propose CARVE, a simple yet effective security-focused debloating technique that addresses these shortcomings. CARVE employs static source code annotation to map software features to source code, eliminating the need for advanced software analysis during debloating and reducing the overall level of technical sophistication required by the end user. CARVE also introduces the concept of debloating with replacement, which is capable of removing software features while preserving software interoperability and mitigating the risk of creating an unsound program or introducing a vulnerability.
We evaluate CARVE in 12 debloating scenarios and present our results demonstrating security and performance improvements that meet or exceed those of existing techniques.
Finite state machine (FSM) is a type of computation models widely used in various software programs. Extracting implemented FSMs has many important applications in the networking, software engineering and security domains. In this paper, we first conduct an empirical study to understand how FSMs are implemented in real-world software. Under the guidance of our study results, we then design a static analysis tool, FSMExtractor, to automatically identify and synthesize implemented FSMs. Evaluation using 160 software programs from three sources shows that FSMExtractor can extract all implemented FSMs and report very few false positives.
In recent years, coverage-based greybox fuzzing has become popular forvulnerability detection due to its simplicity and efficiency. However, it is less powerful when applied directly to protocol fuzzing due to the unique challenges involved in fuzzing communication protocols. In particular, the communication among multiple ends contains more than one packet, which are not necessarily dependent upon each other, i.e., fuzzing single (usually the first) packet can only achieve extremely limited code coverage. In this paper, we study such challenges and demonstrate the limitation of current non-stateful greybox fuzzer. In order to achieve higher code coverage, we design stateful protocol fuzzing strategies for communication protocols to explore the code related to different protocol states. Our approach contains a state switching engine, together with a multi-state forkserver to consistently and flexibly fuzz different states of an compiler-instrumented protocol program. Our experimental results on OpenSSL show that our approach achieves an improvement of 73% more code coverage and 2x unique crashes when comparing against fuzzing the first packet during a protocol handshake.
Picinæ is an infrastructure for machine-proving properties of raw native code programs without sources within the Coq program-proof co-development system. This facilitates formal reasoning about binary code that is inexpressible in source languages or for which no source code exists, such as hand-written assembly code or code resulting from binary transformations (e.g., binary hardening or debloating algorithms). Preliminary results validating some highly optimized, low-level subroutines for Intel and ARM architectures using this new framework are presented and discussed.
Code bloating in software has been proven to be pervasive in recent research. However, each study provides a different approach to measure bloat. In this paper, we propose a system of metrics to effectively quantify bloat in binaries called bloat factors. Subsequently, we conducted an extensive study to calculate bloat factors for over 3000 Linux applications and 896 shared libraries. Using these metrics as pointers, we introduce a static approach to perform debloating for closed-source binaries by creating corresponding specialized versions to cater for a specific program requirements. We evaluated our debloating technique on large programs and achieved a maximum code reduction of 19.7%.
We present inuring, an attack-guided repair method for software vulnerabilities in n-variant systems. N-variant systems detect attacks that cause divergence in variant behavior, converting severe vulnerabilities (such as those that enable remote code execution) into less severe denial-of-service vulnerabilities. Inuring is a general technique for n-variant systems that uses information gleaned from an attack to perform a “live” field repair of the underlying vulnerability, thereby obviating the denial-of-service attack. We present a case study of the use of inuring to protect against a powerful class of memory-corruption exploits in the Apache web server. Our demonstration leverages dappling, a new technique for provably secure memory layout in n-variant systems. With inuring and dappling we are able to guarantee strong protection and remediation for a class of write-what-where vulnerabilities in n-variant systems. Our case study illustrates the efficacy and efficiency of these techniques.
Optimizing a program for a specific machine or a specific workload is possible with today’s compilers, but infrequently used, despite significant performance gains. We implement workload specialization, or Profile-Guided Optimization (PGO), at the binary level. Our system CodeMason runs on x86_64 Linux and is based on a binary rewriting platform called Egalito. CodeMason performs static binary rewriting to obtain program profiles, then adjusts function ordering, alignment, and other binary-level details to achieve faster performance (particularly on the given workload). We obtain 1.98% average performance speedup on SPEC CPU 2006, and 11.8% speedup in the best case. These substantial performance improvements suggest that binary-level PGO may be widely useful when compiler-based PGO is impossible because the source code is inaccessible.