This paper develops a static type system equivalent to static single assignment (SSA) form. In this type system, the type of a variable at a program point represents the control flows from the assignment statements that reach that point. For this type system, we show that a derivable typing of a program corresponds to the program in SSA form. By this result, any SSA transformation can be interpreted as a type inference process in our type system. By adopting a result on efficient SSA transformation, we develop a type inference algorithm that reconstructs type-annotated code from given code. These results provide a static alternative to SSA-based compiler optimization without performing code transformation. Since this process does not change the code, it does not incur the overhead of inserting $\phi$ functions. Another advantage of this type-based approach is that it is not constrained by the naming mechanism of variables and can therefore be combined with other static properties useful for compilation and code optimization, such as liveness information of variables. As an application, we express optimizations as type-directed code transformations.
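As a hedged illustration (not taken from the paper), consider the imperative fragment in the comment below, its SSA form, and the well-known functional reading of SSA as nested let bindings; the type system described above records the same reaching-assignment information in the type of x without rewriting the code.

    (* imperative:  x := 0; if c then x := x + 1; y := x
       SSA form:    x1 := 0; if c then x2 := x1 + 1; x3 := phi(x2, x1); y1 := x3 *)
    let functional_reading c =
      let x1 = 0 in
      let x3 = if c then x1 + 1 else x1 in   (* the merge played by phi(x2, x1) *)
      let y1 = x3 in
      y1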
Most current implementations of functional languages adopt so-called ``tagged data representations'' to support tracing garbage collection. These representations impose the burden of data conversion on both runtime performance and interoperability between ML and other languages. In this paper, we present a type-directed compilation method for ML polymorphism that supports natural representations of integers and other atomic data. This is achieved by compiling ML so that each runtime object (a heap block or a stack frame) carries a ``bitmap'' that describes the pointer positions in the block. Since a polymorphic function may produce runtime objects of different types, the compiler needs to compute appropriate bitmaps for each instantiation of the function. This requires inserting extra lambda abstractions and applications to pass the bits needed for bitmap calculation. This compilation process must be carried out for both stack frames and heap-allocated objects, including function closures and their environment records. We solve these problems by combining the type-directed compilation method with typed closure conversion and type-preserving A-normalization. The resulting compilation process is shown to be sound with respect to an untyped operational semantics with bitmap-inspecting garbage collection. The proposed compilation method has been implemented for the full Standard ML language, demonstrating its practical feasibility.
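A minimal OCaml sketch of the bitmap idea, under simplifying assumptions that are not the paper's actual data layout: each block carries a bitmap word whose i-th bit marks whether slot i holds a pointer, so a collector can trace exactly the pointer slots without per-value tags.

    type block = { bitmap : int; slots : word array }
    and word = Imm of int       (* untagged integer, float word, ...  *)
             | Ptr of block     (* pointer to another heap block      *)

    (* follow exactly the slots whose bitmap bit is set (sharing and cycles ignored) *)
    let rec trace visit (b : block) =
      Array.iteri
        (fun i w ->
           match w with
           | Ptr b' when (b.bitmap lsr i) land 1 = 1 -> visit b'; trace visit b'
           | _ -> ())
        b.slots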
This paper develops a proof theory for low-level code languages. We first define a proof system, which we refer to as the sequential sequent calculus, and show that it enjoys the cut elimination property and that its expressive power is the same as that of the natural deduction proof system. We then establish the Curry-Howard isomorphism between this proof system and a low-level code language by showing the following properties: (1) the set of proofs and the set of typed codes are in one-to-one correspondence, (2) the operational semantics of the code language is directly derived from the cut elimination procedure of the proof system, and (3) compilation and de-compilation algorithms between the code language and the typed lambda calculus are extracted from the proof transformations between the sequential sequent calculus and the natural deduction proof system. This logical framework serves as a basis for the development of type systems for various low-level code languages, type-preserving compilation, and static code analysis.
This paper presents a static type system for Java Virtual Machine (JVM) code that enforces an access control mechanism similar to that found in, for example, a Java implementation. In addition to verifying type consistency of a given JVM code, the type system statically verifies that the code accesses only those resources granted by the prescribed access policy. The type system is proved to be sound with respect to an operational semantics that enforces access control dynamically, similar to Java stack inspection. This result ensures that ``well typed code cannot violate access policy.'' We then develop a type inference algorithm and show that it is sound with respect to the type system and that it always infers a minimal set of access privileges. These results allow us to develop a static system for JVM access control without resorting to costly runtime stack inspection.
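The dynamic discipline that the type system renders static can be caricatured as follows (an illustrative OCaml model under assumptions of this note, not the JVM's actual API): an access is permitted only if every frame on the call stack grants the privilege, up to a frame that asserts it in a privileged manner.

    type privilege = FileRead | FileWrite | NetConnect
    type frame = { granted : privilege list; privileged : bool }

    (* the runtime check that the static type system is designed to make unnecessary *)
    let rec check_permission (stack : frame list) (p : privilege) : bool =
      match stack with
      | [] -> true
      | f :: rest ->
          List.mem p f.granted && (f.privileged || check_permission rest p)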
This paper presents a proof-theoretical framework that accounts for the entire process of register allocation: liveness analysis is proof reconstruction (similar to type inference), and register allocation is proof transformation from a proof system with unrestricted variable access to one with restricted variable access. In our framework, the set of registers acts as the ``working set'' of the live variables at each instruction step, which changes during the execution of the code. This eliminates the ad hoc notion of ``spilling''; the necessary memory-register moves are systematically incorporated into the proof transformation process. The correctness of the allocation is a simple corollary of our construction: the resulting proof is equivalent to the proof of the original code modulo the treatment of structural rules. This yields a simple yet powerful register allocation algorithm. The algorithm has been implemented, demonstrating the feasibility of the framework.
We propose a type system for the Java bytecode language, prove its type soundness, and develop a type inference algorithm. Unlike existing proposals, our type system yields a typed term calculus similar to the type systems of lambda calculi. This enables us to transfer existing techniques and results of type theory to a JVM-style bytecode language. We show that ML-style let polymorphism and recursive types can be used to type JVM subroutines, and that there is an ML-style type inference algorithm. The type inference algorithm has been implemented. The ability to verify type soundness is a simple corollary of the existence of the type inference algorithm. Moreover, our type-theoretical approach opens up various type-safe extensions, including higher-order methods, flexible polymorphic typing through polymorphic type inference, and type-preserving compilation.
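To illustrate in OCaml, with hypothetical names, why ML-style let polymorphism fits JVM subroutines: a subroutine that performs fixed work but leaves some local variable untouched can be read as a let-bound function polymorphic in that local's type, so distinct jsr-like call sites may instantiate it at different types.

    (* a "subroutine" that does fixed work and merely threads an untouched local x through *)
    let subroutine x = (print_string "cleanup"; x)
    (* two call sites instantiating the untouched local at int and at bool *)
    let site1 = subroutine 0
    let site2 = subroutine true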
By extending an ML-style type inference system with record polymorphism, recursive type definitions, and an ordering relation induced by field inclusion, it is possible to achieve seamless and type-safe interoperability with an object-oriented language. Based on this observation, we define an ML-style language that can directly access external objects and methods, and develop a type inference algorithm. The calculus enjoys both higher-order programming with ML polymorphism and class-based object-oriented programming with dynamic method dispatch. To establish type safety, we define a sample object calculus with multiple inheritance as the target calculus for interoperability, define an operational semantics for it, and show that the type system is sound with respect to this operational semantics. These results have been implemented in our prototype interpreted language, which can access Java class files and other external resources.
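As a rough analogue (using OCaml's structural object types in place of the paper's polymorphic records ordered by field inclusion): a single function is applicable to any external object providing at least the methods it uses.

    (* greet : < getName : string; .. > -> string
       any class with at least a getName method, whatever else it declares,
       can be passed here, mirroring the field-inclusion ordering *)
    let greet obj = "Hello, " ^ obj#getName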
In the past couple of years, interest in decompilation has widened from its initial concentration on the reconstruction of control flow to theoretically well-founded methods for reconstructing type information. Mycroft described Type-Based Decompilation and Katsumata and Ohori described Proof-Directed Decompilation. This note summarises the two approaches and identifies their commonality, strengths and weaknesses; it concludes by suggesting how they may be integrated.
We present a proof-theoretical method for de-compiling low-level code to the typed lambda calculus. We first define a proof system for a low-level code language based on the idea of the Curry-Howard isomorphism. This allows us to regard an executable code as a proof in intuitionistic propositional logic. Being a proof of intuitionistic logic, it can be translated to an equivalent proof in the natural deduction proof system. This property yields an algorithm to translate a given code into a lambda term. Moreover, the resulting lambda term is not a trivial encoding of a sequence of primitive instructions, but reflects the behavior of the given program. This process therefore serves as proof-directed de-compilation of a low-level code language to a high-level language. We carry out this development for a subset of Java Virtual Machine instructions including most of its features, such as jumps, object creation, and method invocation. The proof-directed de-compilation algorithm has been implemented, which demonstrates the feasibility of our approach.
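A small, hedged illustration of what "reflects the behavior" means (an example constructed here, not one from the paper): the bytecode-like sequence in the comment de-compiles to a lambda term denoting the computed function rather than to an instruction-by-instruction transliteration.

    (* bytecode-like source:  iload 1; iload 2; iadd; ireturn *)
    (* behaviourally faithful de-compilation result: *)
    let decompiled (x : int) (y : int) : int = x + y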
A ps file. (This version contains an additional note, added after the Fuji Symposium, on related work by Christophe Raffalli on "Machine Deduction". The author is grateful to Yukiyoshi Kameyama for pointing out the existence of Raffalli's paper and its relevance to the present paper soon after the Fuji Symposium, November 1999.)
This paper presents a logical framework for low-level machine code and code generation. We first define a calculus, called the sequential sequent calculus, of intuitionistic propositional logic. A proof of the calculus only contains left rules and has a linear (non-branching) structure, which reflects the properties of sequential machine code. We then establish a Curry-Howard isomorphism between this proof system and machine code, based on the following observation. An ordinary machine instruction corresponds to a polymorphic proof transformer that extends a given proof with one inference step. A return instruction, which turns a sequence of instructions into a program, corresponds to a logical axiom (an initial proof tree). Sequential execution of code corresponds to transforming a proof to a smaller one by successively eliminating the last inference step. This logical correspondence enables us to present and analyze various low-level implementation processes of a functional language within the logical framework. For example, a code generation algorithm for the lambda calculus is extracted from a proof of the equivalence theorem between the natural deduction and the sequential sequent calculus.
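The reading "an instruction is a proof transformer and a return instruction is an axiom" can be caricatured in OCaml as follows (a sketch with names chosen here, not the paper's calculus): code is built by extending, one inference step at a time, the initial proof contributed by return.

    type instr = Load of string | Add | Return
    type code = instr list                        (* a linear, non-branching proof *)

    let axiom : code = [ Return ]                 (* the initial proof tree *)
    let extend (i : instr) (proof : code) : code = i :: proof   (* one inference step *)

    (* a proof / code computing x + y, built back to front *)
    let example = extend (Load "x") (extend (Load "y") (extend Add axiom))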
This paper defines an extended polymorphic type system for an ML-style programming language and develops a sound and complete type inference algorithm. Unlike the conventional ML type discipline, the proposed type system allows full rank 1 polymorphism, where polymorphic types can appear in other types such as product types, disjoint union types, and the range types of function types. Because of this feature, the proposed type system significantly reduces the value-only restriction of polymorphism currently adopted in most ML-style impure languages. It also serves as a basis for efficient type-directed compilation of polymorphism. The extended type system admits a more efficient type inference algorithm and contributes to a more efficient type-passing implementation of polymorphism. We show that conventional ML polymorphism sometimes introduces exponential overhead both in compile-time elaboration and in run-time type-passing execution, and that these problems are eliminated by our type inference system. Compared with more powerful rank 2 type inference systems based on semi-unification, the proposed type inference algorithm infers a most general type for any typable expression using conventional first-order unification, and it is therefore easily adopted in existing implementations of the ML family of languages.
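The compile-time blow-up alluded to above can be seen in the classic folklore example below (included here only as an illustration): under ML-style inference, each successive let doubles the size of the inferred type.

    let pair x = (x, x)          (* 'a -> 'a * 'a                                    *)
    let p2 x = pair (pair x)     (* 'a -> ('a * 'a) * ('a * 'a)                      *)
    let p3 x = p2 (p2 x)         (* type size doubles again, and so on exponentially *)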
This paper establishes a Curry-Howard isomorphism for compilation and program execution by showing the following facts. (1) The set of {\em A-normal forms}, which is often used as an intermediate language for compilation, corresponds to a subsystem of Kleene's contraction-free variant of Gentzen's intuitionistic sequent calculus. (2) Compiling lambda terms to A-normal forms corresponds to proof transformation from natural deduction to the sequent calculus followed by proof normalization. (3) Execution of an A-normal form corresponds to a special proof reduction in the sequent calculus. Unlike cut elimination, this process eliminates left rules by converting them to cuts with proofs corresponding to closed values. Evaluation of an entire program inductively applies this reduction, followed by the construction of data structures.
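For readers unfamiliar with A-normal forms, a small example (constructed here, independent of the proof-theoretic reading): every intermediate result is named by a let.

    (* source term *)
    let source f g x = f (g x) + 1
    (* its A-normal form: one let per application or primitive operation *)
    let anf f g x =
      let t1 = g x in
      let t2 = f t1 in
      let t3 = t2 + 1 in
      t3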
Flexibility of programming and efficiency of program execution are two important features of a programming language. Unfortunately, these two features conflict with each other in the design and implementation of a modern statically typed programming language. Flexibility is achieved by a high degree of polymorphism, which is based on generic primitives in an abstract model of computation, while efficiency requires optimal use of low-level primitives specialized to individual data structures. The motivation of this work is to reconcile these two features by developing a mechanism for specializing polymorphic primitives directed by static type information. We first present a coherent type reconstruction method for transforming an ML-style polymorphic language into an explicitly typed intermediate language. We then analyze the existing methods for compiling a record calculus and an unboxed calculus, extract their common structure, and develop a framework for type-directed specialization of polymorphism.
This paper develops a typed calculus for contexts, i.e., lambda terms with ``holes.'' In addition to ordinary lambda terms, the calculus contains labeled holes, hole abstraction, and context application for manipulating first-class contexts. The primary operation on contexts is hole filling, which captures free variables. This operation conflicts with the capture-avoiding substitution of the lambda calculus, and a straightforward mixture of the two results in an inconsistent system. We solve this problem by defining a type system that precisely specifies the variable-capturing properties of contexts as well as their types, and by systematically performing bound variable renaming. This enables us to define a reduction system that properly integrates full beta-reduction and fill-reduction. For this calculus, we prove the subject reduction property and the Church-Rosser property. This context calculus will serve as a basis for developing programming languages with advanced features that call for manipulation of open terms, such as flexible first-class modules.
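A naive first-order OCaml sketch (far simpler than the calculus in the paper) of why hole filling conflicts with substitution: filling deliberately captures the free variables of the plugged term.

    type term = Var of string | Lam of string * term | App of term * term | Hole

    (* hole filling without renaming: binders above the hole capture t's free variables *)
    let rec fill ctx t =
      match ctx with
      | Hole -> t
      | Var _ as v -> v
      | Lam (x, body) -> Lam (x, fill body t)
      | App (a, b) -> App (fill a t, fill b t)

    (* fill (Lam ("x", Hole)) (Var "x")  =  Lam ("x", Var "x"):
       the plugged x is captured, exactly what capture-avoiding substitution forbids *)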
This paper presents an equational formulation of an object-oriented data model. In this model, a database is represented as a system of equations over a set of oid's, and a database query is an operation that computes a new system of equations from a given one. During query processing, the model maintains an equivalence relation on oid's that relates oid's corresponding to the same ``real-world entity.'' By this mechanism, the model achieves a declarative set-based query language and views for objects with identity. Moreover, the query primitives are designed so that queries involving object traversal can be evaluated in a data-parallel fashion.
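A toy transcription of the model (a rendering constructed here, not the paper's notation): a database is a finite system of equations binding oid's to values, together with an equivalence relation on oid's.

    type oid = int
    type value = Atom of string | Record of (string * oid) list | OidSet of oid list

    type database = {
      equations   : (oid * value) list;   (* the system of equations over oid's *)
      same_entity : (oid * oid) list;     (* oid's identified as the same real-world entity *)
    }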
The motivation of this work is to provide a type theoretical basis for developing a practical polymorphic programming language with labeled records and labeled variants. Our goal in this paper is to establish both a polymorphic type discipline and an efficient compilation method for a calculus with those labeled data structures. We define a second-order polymorphic record calculus as an extension of Girard-Reynolds polymorphic lambda calculus. We then develop an ML style type inference algorithm for a predicative subset of the second-order record calculus, which extends Milner's type inference algorithm and Damas and Milner's account of ML's let polymorphism. The soundness of the type system and the completeness of the type inference algorithm are shown. To establish an efficient compilation method for the polymorphic record calculus, we first define an implementation calculus, where records are represented as vectors whose elements are accessed by direct indexing, and variants are represented as values tagged with a natural number indicating the position of the code in a switch statement. We then develop an algorithm to translate the polymorphic record calculus into the implementation calculus using type information obtained by the type inference algorithm. The correctness of the compilation algorithm is proved, that is, the compilation algorithm is shown to preserve both typing and the operational behavior of a program. Based on these results, Standard ML has been extended with labeled records and its compiler has been implemented.
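An illustrative fragment of the compilation scheme described above (hypothetical names, and fields sharing a type only to keep the OCaml example simple): polymorphic field selection becomes index passing, and records become vectors accessed by direct indexing.

    (* source:  fun r -> #name r   compiles to a function that also receives the field's offset *)
    let select (idx : int) (r : 'a array) : 'a = r.(idx)

    (* a record {age = "40", name = "Alice"} laid out as a vector in field order;
       the compiler supplies index 1 for #name at this instantiation *)
    let r = [| "40"; "Alice" |]
    let name = select 1 r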
We present an unboxed operational semantics for an ML-style polymorphic language. Unlike conventional formalisms, the proposed semantics accounts for the actual representations of run-time objects of various types and supports a refined notion of polymorphism that allows polymorphic functions to be applied directly to values of different representations. In particular, polymorphic functions can receive multi-word constants such as floating-point numbers without requiring them to be ``boxed'' (i.e., heap allocated). This semantics will serve as an alternative basis for implementing polymorphic languages. The development of the semantics is based on the technique of type-inference-based compilation for polymorphic record operations. We first develop a lower-level calculus, called a polymorphic unboxed calculus, that accounts for direct manipulation of unboxed values in a polymorphic language. This unboxed calculus supports efficient value binding through an integer representation of variables. Unlike de Bruijn indexes, our integer representation of a variable corresponds to the actual offset of the value in a run-time environment consisting of objects of various sizes. Polymorphism is supported through an abstraction mechanism over argument sizes. We then develop an algorithm that translates ML into the polymorphic unboxed calculus using type information obtained through type inference. At each polymorphic $let$ binding, the necessary size abstractions are inserted so that a polymorphic function is translated into a function that is polymorphic not only in the type of its argument but also in its size. The ML type system is shown to be sound with respect to the operational semantics realized by the translation.
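A deliberately crude model (assumptions entirely of this note) of the offset-and-size view of variables: a frame is a flat array of words, a variable is the offset of its first word, and a size-abstracted function reads as many words as the instantiation dictates, so one piece of code works for one-word integers and two-word floats alike.

    (* read the value of a variable occupying [size] words starting at [offset] *)
    let read_var (frame : int array) (offset : int) (size : int) : int array =
      Array.sub frame offset size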
In order to find a static type system that adequately supports database languages, we need to express the most general type of a program that involves database operations. This can be achieved through an extension to the type system of ML that captures the polymorphic nature of field selection, together with a technique that generalizes relational operators to arbitrary data structures. The combination provides a statically typed language in which generalized relational databases may be cleanly represented as typed structures. As in ML, types are inferred, which relieves the programmer of making the type assertions that may be required in a complex database environment.
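The polymorphic nature of field selection can be illustrated as follows (a sketch constructed here, with OCaml's structural object types standing in for the ML extension): one projection function applies uniformly to any collection of rows that contains the field.

    (* names : < name : 'a; .. > list -> 'a list
       usable on employees, departments, or any other "relation" with a name field *)
    let names rows = List.map (fun r -> r#name) rows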
These extensions may also be used to provide static polymorphic typechecking in object-oriented languages and databases. A problem that arises with object-oriented databases is the apparent need for dynamic typechecking when dealing with queries on heterogeneous collections of objects. An extension of the type system needed for generalized relational operations can also be used for manipulating collections of dynamically typed values in a statically typed language. A prototype language based on these ideas has been implemented. While it lacks a proper treatment of persistent data, it demonstrates that a wide variety of database structures can be cleanly represented in a polymorphic programming language.
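One way (sketched here with an OCaml GADT; the mechanism in the paper differs in detail) to see how dynamically typed values can be manipulated inside a statically typed language: a dynamic packages a value with a runtime representation of its type, and a query over a heterogeneous collection inspects that representation.

    type _ ty = Int : int ty | Str : string ty
    type dyn = Dyn : 'a ty * 'a -> dyn

    (* recover a value at the expected type when the packaged runtime type matches *)
    let coerce : type a. a ty -> dyn -> a option =
      fun expected d ->
        match d with
        | Dyn (actual, v) ->
            (match expected, actual with
             | Int, Int -> Some v
             | Str, Str -> Some v
             | _ -> None)

    (* a heterogeneous collection manipulated through the dyn interface *)
    let objects = [ Dyn (Int, 3); Dyn (Str, "cat") ]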
Array-based data parallel programming can be generalized in two ways to make it an appropriate paradigm for parallel processing of general recursively defined data. The first is the introduction of a parallel evaluation mechanism for recursively defined data, which applies the same function to all the subterms of a given datum in parallel. The second is a new notion of recursion, which we call parallel recursion, for parallel evaluation of recursively defined data. In contrast to ordinary recursion, which uses only the final results of the recursive calls on its immediate subterms, the new recursion proceeds by simultaneously unfolding the system of equations denoting the set of results of the subterms of a given datum. This mechanism exploits more parallelism and achieves significant speedup compared to the conventional parallel evaluation of recursive functions. Based on these observations, we propose a typed lambda calculus for data parallel programming and give a natural semantics that integrates the parallel evaluation mechanism and the new form of recursion into the semantic core of a typed lambda calculus. We also describe an implementation method for massively parallel multicomputers, which makes it possible to execute parallel recursion with the expected performance.
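A sequential OCaml rendering (constructed here) of the first mechanism: the same function is applied to every subterm of a recursively defined datum; a data-parallel implementation would evaluate the independent recursive calls over the children simultaneously rather than in sequence.

    type tree = Leaf of int | Node of tree * tree

    (* apply f to every subterm, rebuilding the datum bottom-up;
       the two recursive calls are independent and could run in parallel *)
    let rec map_subterms (f : tree -> tree) (t : tree) : tree =
      match t with
      | Leaf _ -> f t
      | Node (l, r) -> f (Node (map_subterms f l, map_subterms f r))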