On the implementation of bytecode compression for interpreted languages

Ekaterina Stefanov*, Anthony M. Sloane

*Corresponding author for this work

Research output: Contribution to journal (Article)


Abstract

This paper describes LZW-CC, a new method for optimizing code space in interpreted languages. The method adapts LZW, a well-known and widely used compression algorithm, to compress executable program code represented as bytecode: frequently occurring sequences of bytecode instructions are replaced by shorter encodings that denote newly generated bytecode instructions, and the interpreter for the compressed code is modified to recognize and execute these new instructions. In systems where a copy of the interpreter is supplied with each user program, space is saved not only by compressing the program code but also by automatically removing unused implementation code from the interpreter. The implementation of the method within two compiler systems, for Haskell and for Java, is described, and implementation issues of interest are presented, notably the recalculation of jump targets and the automated tailoring of the interpreter to the program code. Applying LZW-CC to nhc98 Haskell reduces bytecode size by up to 15.23% and executable size by up to 11.9%; Java bytecode is reduced by up to 52%. The impact of compression on execution speed is also discussed: the typical speed penalty for Java programs is between 1.8% and 6.6%, while most compressed Haskell executables run faster than the uncompressed originals.
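The core idea summarized above (turning repeated instruction sequences into new instructions, keeping jump targets valid, and trimming the interpreter to what the program uses) can be illustrated with a minimal Python sketch. It is not the authors' LZW-CC implementation: it assumes opcodes without operands, runs plain greedy LZW over the whole opcode list rather than whatever sequence selection the paper uses, and takes a hypothetical `leaders` set of jump-target indices that no new instruction may span. The names `lzw_compress`, `retarget` and `used_instructions` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Seq = Tuple[str, ...]  # a sequence of opcodes that becomes one (super)instruction


def lzw_compress(ops: List[str], leaders: Set[int]) -> Tuple[List[int], Dict[Seq, int], List[int]]:
    """Greedy LZW parse of an opcode list (sketch; operands are ignored).

    Phrases never extend across an index in `leaders` (assumed jump targets),
    so every jump target still begins a compressed instruction.
    Returns the code stream, the phrase dictionary, and `owner`, which maps
    each old instruction index to the new index of the instruction containing it.
    """
    dictionary: Dict[Seq, int] = {}
    for op in ops:  # start with one entry per original opcode
        if (op,) not in dictionary:
            dictionary[(op,)] = len(dictionary)
    codes: List[int] = []
    owner: List[int] = [0] * len(ops)
    i = 0
    while i < len(ops):
        phrase: Seq = (ops[i],)
        j = i + 1
        # extend the phrase while the longer sequence is already known
        while j < len(ops) and j not in leaders and phrase + (ops[j],) in dictionary:
            phrase += (ops[j],)
            j += 1
        for k in range(i, j):
            owner[k] = len(codes)
        codes.append(dictionary[phrase])
        # LZW step: phrase + next opcode becomes a candidate new instruction
        if j < len(ops) and j not in leaders:
            dictionary[phrase + (ops[j],)] = len(dictionary)
        i = j
    return codes, dictionary, owner


def retarget(jumps: Dict[int, int], owner: List[int]) -> List[Tuple[int, int]]:
    """Rewrite jumps given as old source index -> old target index in new indices."""
    return sorted((owner[src], owner[dst]) for src, dst in jumps.items())


def used_instructions(codes: List[int], dictionary: Dict[Seq, int]) -> Set[Seq]:
    """Instructions a tailored interpreter must implement; other entries can be dropped."""
    used = set(codes)
    return {phrase for phrase, code in dictionary.items() if code in used}


if __name__ == "__main__":
    ops = ["load", "add", "load", "add", "store", "load", "add", "jmp"]
    codes, dictionary, owner = lzw_compress(ops, leaders={0})  # jmp at 7 targets 0
    print(codes)                    # [0, 1, 4, 2, 4, 3]
    print(retarget({7: 0}, owner))  # [(5, 0)]
    print(len(used_instructions(codes, dictionary)), "of", len(dictionary))  # 5 of 9
```

In this toy run the eight original instructions become six, the jump from old index 7 to 0 becomes a jump from 5 to 0, and only five of the nine dictionary entries need an implementation in the tailored interpreter. Blocking phrases at `leaders` is what keeps each jump target at the start of a compressed instruction, so `owner` can translate targets directly; a real bytecode compressor would additionally have to rewrite operands and relative branch offsets.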

Original language: English
Pages (from-to): 111-135
Number of pages: 25
Journal: Software: Practice and Experience
Volume: 39
Issue number: 2
Publication status: Published, Feb 2009
