engine262.
As an example of what code may look like when compiled by JSSAT, consider a program such as this one (the following is pseudo code - consider it a JavaScript dialect):
function add(a, b) {
return a + b;
}
print(add(1, 2));
print(add("a", "b"));
JSSAT will generate the following code:
print(3);
print("ab");
thanks to its complex type analysis.
The remainder of this blog post details how that type analysis is performed.
1. A high-level dynamic IR is emitted from some input JavaScript, as per the ECMAScript specification.
2. An algorithm is used to lift cross-register dependencies in blocks, making the entire program easier to reason about.
3. An abstract interpretation engine executes the entire program using type information, and records the results.
4. The program is walked over using the type information produced by abstract interpretation, and a new program is constructed. It has many more methods than the original, typeless program, but everything now has strong types.
5. Certain optimizations are applied to the program, prior to sending it to LLVM.
6. The program is compiled with LLVM, and specific instructions are translated into method calls to the JSSAT Runtime where applicable.
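As a rough picture of how these phases fit together, here is a hypothetical TypeScript driver that chains them as function calls. None of these names exist in JSSAT; they only mirror the list above, with each phase body left as a placeholder.

// Hypothetical driver mirroring the six phases above; not JSSAT's real API.
type JssatIr = object;
type TypeInfo = object;
type TypedIr = object;

const lowerToJssatIr = (source: string): JssatIr => ({});           // 1. emit JSSAT IR
const liftRegisters = (ir: JssatIr): JssatIr => ir;                 // 2. lift register dependencies
const abstractlyInterpret = (ir: JssatIr): TypeInfo => ({});        // 3. record type information
const specialize = (ir: JssatIr, types: TypeInfo): TypedIr => ({}); // 4. build the strongly typed program
const optimize = (ir: TypedIr): TypedIr => ir;                      // 5. JSSAT-side optimization passes
const emitLlvmIr = (ir: TypedIr): string => "; ModuleID = 'jssat'"; // 6. hand off to LLVM

function compile(source: string): string {
  const ir = liftRegisters(lowerToJssatIr(source));
  const types = abstractlyInterpret(ir);
  return emitLlvmIr(optimize(specialize(ir, types)));
}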
Firstly, JavaScript gets converted into JSSAT IR, which is the high-level IR used in JSSAT. JSSAT IR is much lower level than JavaScript, supporting only a limited subset of JS constructs, and is heavily inspired by LLVM IR.
In order to understand how JavaScript turns into JSSAT IR, it is important to know about the ECMAScript specification. This specification specifies how every statement, expression, and language feature should behave. For example, when a + b
is encountered within the source code, the specification states what to do. These instructions on what to do are modeled in JSSAT IR, which makes the compiler easier to work on without worrying about the original JavaScript.
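As a rough illustration of the behavior the specification prescribes for a + b (string concatenation when either operand is a string, numeric addition otherwise), consider the small TypeScript sketch below. It simplifies the spec's algorithm to primitive operands (objects and ToPrimitive are omitted) and is not the IR JSSAT actually emits.

// Simplified model of the spec's addition behavior for primitive operands.
type Primitive = number | string | boolean | null | undefined;

function addOperator(lval: Primitive, rval: Primitive): number | string {
  // If either operand is a string, both are converted to strings and concatenated.
  if (typeof lval === "string" || typeof rval === "string") {
    return String(lval) + String(rval);
  }
  // Otherwise, both operands are converted to numbers and added.
  return Number(lval) + Number(rval);
}

// addOperator(1, 2) -> 3 and addOperator("a", "b") -> "ab", matching the example at the top.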
JSSAT IR uses SSA form (static single assignment), BBs (basic blocks) and BB parameters. SSA form means that each register (i.e. variable) can only be assigned to once, and there's an infinite number of registers available to use. Mutability is achieved through the use of instructions that operate on records (similar to objects), and through BB parameters. A JSSAT IR function is composed of only basic blocks. A BB is a sequence of instructions that ends with a single control flow instruction (such as Jump or Return).
An example addition function is written below:
fn add(a, b) {
c = a + b
Return c
}
In this example, there are three registers: a
, b
, and c
. The +
operation adds a
and b
together, then stores that into c
. The Return
instruction will return the value stored in c
to the caller. Return
is only valid at the end of the block because it is considered a control flow instruction. The entire function body is contained within one implicitly defined block.
Another example shows passing values to BB parameters:
fn identity(x) {
Jump Value(x)
Value(x):
Return x
}
In this example, it may be easier to conceptualize BB parameters as simply labels, with additional abstract notation to represent passing along the values of registers when the label is jumped to.
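Below is a minimal TypeScript sketch of how blocks with parameters might be modeled as data; the shapes and names are hypothetical and simplified, not JSSAT's actual types.

// Hypothetical, simplified model of a JSSAT IR function made of basic blocks.
type RegisterId = number;

interface Instruction {
  dest: RegisterId;       // the register this instruction assigns (exactly once, per SSA)
  kind: string;           // e.g. "add", "compare", "record-set", ...
  operands: RegisterId[];
}

// Every block ends with exactly one control flow instruction.
type ControlFlow =
  | { kind: "jump"; target: string; args: RegisterId[] }
  | { kind: "jumpIf"; cond: RegisterId; ifTrue: [string, RegisterId[]]; ifFalse: [string, RegisterId[]] }
  | { kind: "return"; value?: RegisterId };

interface BasicBlock {
  name: string;
  params: RegisterId[];   // BB parameters: the registers this block receives when jumped to
  instructions: Instruction[];
  end: ControlFlow;
}

interface JssatFunction {
  name: string;
  params: RegisterId[];
  blocks: BasicBlock[];   // the first block is the entry block
}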
Complex control flow is achieved through the use of BB parameters:
fn print_0_to_10() {
i = 0
Jump Check(i)
Check(i):
cond = i <= 10
if cond {
Jump Loop()
} else {
Jump End()
}
Loop():
print(i)
i2 = i + 1
Jump Check(i2)
End():
Return
}
This example demonstrates the usage of BB parameters for mutability. SSA form requires that every register be assigned exactly once; however, to allow for control flow, a register must be able to take on different values across iterations. BB parameters are used to get around this restriction. We can jump to a block, specifying the registers to use as arguments for its parameters. After jumping to the block, execution continues under the assumption that the values passed to the block are the new values for those parameters wherever they are used.
The astute reader may recognize that phi nodes and BB parameters are very similar in behavior. LLVM uses phi nodes, while JSSAT uses BB parameters. The reasoning for using BB parameters is that they make abstract interpretation easier to perform and cache results for, compared to phi nodes; this will be explained in more depth in the abstract interpretation section.
Unfortunately, one drawback of BBs in their current form is that registers declared in one block may be used in another. This creates an implicit co-dependency between blocks: each block must know about registers declared elsewhere. This phase of the compiler seeks to model these dependencies explicitly, so that the next step in the compiler can reason about the code more easily.
An example of this co-dependency is shown below:
fn sum(max) {
i = 0
total = 0
Jump Check(i, total)
Check(i, total):
cond = i <= max
if cond {
Jump Loop()
} else {
Jump End()
}
Loop():
print(i)
total2 = total + i
i2 = i + 1
Jump Check(i2, total2)
End():
Return total
}
In the above example, i
and total
are declared in Check
(do note that different registers of the same name are also declared in the first block of sum
), and used in Loop
and End
. This makes reasoning about the code difficult for the compiler.
The algorithm that makes the co-dependency of registers explicit (referred to as "lifting") is as follows:
1. Declare the registers that each block uses, but does not declare, as BB parameters:
fn sum(max) {
i = 0
total = 0
Jump Check(i, total)
- Check(i, total):
+ Check(i, total, max):
cond = i <= max
if cond {
Jump Loop()
} else {
Jump End()
}
- Loop():
+ Loop(i, total):
print(i)
total2 = total + i
i2 = i + 1
Jump Check(i2, total2)
- End():
+ End(total):
Return total
}
2. Update every jump to pass along the registers its target block now expects, wherever the jumping block has them available:
fn sum(max) {
i = 0
total = 0
- Jump Check(i, total)
+ Jump Check(i, total, max)
Check(i, total, max):
cond = i <= max
if cond {
- Jump Loop()
+ Jump Loop(i, total)
} else {
- Jump End()
+ Jump End(total)
}
Loop(i, total):
print(i)
total2 = total + i
i2 = i + 1
Jump Check(i2, total2)
End(total):
Return total
}
3. If a jump must pass along a register that the jumping block does not declare (here, Loop must pass max to Check), declare it as a parameter of that block as well:
fn sum(max) {
i = 0
total = 0
Jump Check(i, total, max)
Check(i, total, max):
cond = i <= max
if cond {
Jump Loop(i, total)
} else {
Jump End(total)
}
- Loop(i, total):
+ Loop(i, total, max):
print(i)
total2 = total + i
i2 = i + 1
- Jump Check(i2, total2)
+ Jump Check(i2, total2, max)
End(total):
Return total
}
4. Repeat steps 2 through 3 until all blocks both declare and use their registers:
fn sum(max) {
i = 0
total = 0
Jump Check(i, total, max)
Check(i, total, max):
cond = i <= max
if cond {
- Jump Loop(i, total)
+ Jump Loop(i, total, max)
} else {
Jump End(total)
}
Loop(i, total, max):
print(i)
total2 = total + i
i2 = i + 1
Jump Check(i2, total2, max)
End(total):
Return total
}
Now that no block depends on registers declared in other blocks, we can recognize that blocks are extremely similar to functions. Thus, it is best to think of them as functions conceptually:
fn sum_Entry(max) {
i = 0
total = 0
Jump sum_Check(i, total, max)
}
fn sum_Check(i, total, max) {
cond = i <= max
if cond {
Jump sum_Loop(i, total, max)
} else {
Jump sum_End(total)
}
}
fn sum_Loop(i, total, max) {
print(i)
total2 = total + i
i2 = i + 1
Jump sum_Check(i2, total2, max)
}
fn sum_End(total) {
Return total
}
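To make the lifting step concrete, below is a minimal TypeScript sketch of the loop it amounts to. It is deliberately simplified: registers are matched by name and the positional details of jump arguments are glossed over, so this is not JSSAT's real implementation.

// Simplified view of a block: which registers it declares (parameters plus
// assignments), which it reads, and which jumps it performs.
interface Block {
  params: string[];
  writes: string[];
  reads: string[];
  jumps: { target: string; args: string[] }[];
}

// Keep adding parameters and jump arguments until every block declares
// every register it uses.
function lift(blocks: Map<string, Block>): void {
  let changed = true;
  while (changed) {
    changed = false;
    for (const block of blocks.values()) {
      const declared = new Set([...block.params, ...block.writes]);
      const used = [...block.reads, ...block.jumps.flatMap(j => j.args)];
      // A register used but not declared becomes a block parameter.
      for (const reg of used) {
        if (!declared.has(reg)) {
          block.params.push(reg);
          declared.add(reg);
          changed = true;
        }
      }
      // Every jump must pass along everything its target block expects.
      for (const jump of block.jumps) {
        for (const param of blocks.get(jump.target)!.params) {
          if (!jump.args.includes(param)) {
            jump.args.push(param);
            changed = true;
          }
        }
      }
    }
  }
}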
The ability to reduce a function full of control flow into small, simple blocks is crucial in reducing the complexity for the next phase: abstract interpretation.
Abstract interpretation is the act of executing a program, but abstractly - that is, instead of concrete values such as true
or false
, a method may be executed with the abstract value Boolean
. This abstract value would cause control flow in if statements
to take both paths simultaneously. Of course, sometimes it is unnecessary to take both paths, which is why true
and false
are also valid values according to the abstract interpretation engine.
When performing abstract interpretation, we start at the main function. From there, we abstractly interpret the entire program until we have traversed all possible paths. By doing so, we'll be able to determine what values a function accepts and returns.
Consider the below example:
fn main_Entry() {
a = 1
b = 2
c = add_Entry(a, b)
print(c)
}
fn add_Entry(a, b) {
c = a + b
Return c
}
When performing abstract interpretation, we record the type of every register, what every function is called with, and its return value. The above program would get to add_Entry
, and then need to abstractly interpret that function as well. It knows that a
and b
are constant values (1
and 2
), so it abstractly interprets the function with its parameters assumed to have those values.
fn add_Entry(a = 1, b = 2) {
c = a + b
Return c
}
When we abstractly interpret this function, we will then learn that this function returns 3
. Now, we never have to execute this function with arguments 1
and 2
again - we know that the return type will always be 3
. Thus, we produce an annotated program:
fn main_Entry() {
a: 1 = 1 // a is an Integer with value 1
b: 2 = 2 // b is an Integer with value 2
c: 3 = add_Entry(a, b) // c is of type Integer with value 3
print(c)
}
// we store that we abstractly interpreted this function
// "when we execute `add_Entry` with a `1` and a `2`, we get a `3`"
fn add_Entry(a: 1, b: 2) -> 3 {
c: 3 = a + b
Return c
}
This is the essence of how abstract interpretation allows us to figure out the types of functions in our program. Eventually, we will know the possible values for every register at every state in our program. This information is critical for emitting the code as LLVM IR.
Recording which functions we have called and what their return types are should shed some light on why the previous phases are so important. Control flow becomes much easier to deal with, as each block is just a series of linear instructions ending in a jump. This also makes it easier to reuse results from previous abstract interpretations.
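A minimal TypeScript sketch of what this record-keeping might look like follows: results are cached per function and per set of abstract argument values, so a function never has to be re-interpreted for argument types it has already been interpreted with. The names and shapes are hypothetical.

// Abstract values: either an exact constant (e.g. 1) or a widened type (e.g. Integer).
type AbstractValue =
  | { kind: "exact-int"; value: number }
  | { kind: "int" }
  | { kind: "boolean" }
  | { kind: "string" };

// Cache keyed by "which function, called with which abstract arguments".
const returnTypes = new Map<string, AbstractValue>();

function callKey(fn: string, args: AbstractValue[]): string {
  return `${fn}(${args.map(a => JSON.stringify(a)).join(", ")})`;
}

function interpretCall(fn: string, args: AbstractValue[]): AbstractValue {
  const key = callKey(fn, args);
  const known = returnTypes.get(key);
  if (known) return known;                // e.g. add_Entry(1, 2) -> 3, already recorded
  const result = interpretBody(fn, args); // walk the block's linear instructions
  returnTypes.set(key, result);
  return result;
}

// Placeholder: the real engine interprets the function's instructions here.
function interpretBody(fn: string, args: AbstractValue[]): AbstractValue {
  return { kind: "int" };
}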
Not everything is sunshine and rainbows for abstract interpretation. There are various edge cases we must consider. I encourage you to ponder what these edge cases may be before proceeding.
What is the return type of recurse_Entry
in the following program?:
fn recurse_Entry() {
a = recurse_Entry()
Return a
}
The abstract interpretation engine will assign the special return type Never
to this function. It knows this because the function recurses. When the engine encounters recurse_Entry
, it will check if this function has been executed before, and what the return type will be. The abstract interpretation engine sees that recurse_Entry
has never been executed before, so it will mark it as "in progress". When recurse_Entry
is called again, the engine sees that it is still "in progress", so it determines that the path we're on is impossible to compute. Thus, it assigns the type Never
to recurse_Entry
.
Never
is not a type in the traditional sense of an integer or boolean. It does not even return no value (void
); it simply does not return. Knowing that a function never returns allows us to make optimizations accordingly.
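Below is a minimal TypeScript sketch of the "in progress" bookkeeping described above; the names are hypothetical, and the idea is simply cycle detection on the call that is currently being interpreted.

// A call is either still being interpreted or finished with a known return type.
type CallState =
  | { state: "in-progress" }
  | { state: "done"; returns: string }; // e.g. "Integer", "Never", "3"

const callStates = new Map<string, CallState>();

function interpretOnce(key: string, interpretBody: () => string): string {
  const existing = callStates.get(key);
  // We hit a call that is still being interpreted: this path cannot be
  // computed without first computing itself, so its type is Never.
  if (existing?.state === "in-progress") return "Never";
  if (existing?.state === "done") return existing.returns;

  callStates.set(key, { state: "in-progress" });
  const returns = interpretBody();                 // abstractly interpret the body
  callStates.set(key, { state: "done", returns });
  return returns;
}

// recurse_Entry calls itself, so the inner lookup sees "in-progress" and yields Never.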
What is the return type of next_Entry
when executed with a value of n = 0
in the following program?:
fn next_Entry(n) {
cond = n <= 1000000000000
if cond {
Jump next_Next(n)
} else {
Jump next_End()
}
}
fn next_Next(n) {
i = next_Entry(n + 1)
Return i + 1
}
fn next_End() {
Return 0
}
According to the rules described so far, the abstract interpretation engine would spend a significant amount of time executing this single function. However, the abstract interpretation engine will assign the return type Integer
to this function. When the abstract interpretation engine attempts to execute next_Entry(n = 0)
, it will need to compute next_Entry(n = 1)
, next_Entry(n = 2)
, next_Entry(n = 3)
, and so on. If it had to continue doing this, it would never finish. Thus, there is an arbitrary limit on how deep functions may be executed with concrete values before those values are promoted to generic values.
Upon reaching this threshold, next_Entry
is executed for n = Integer
. Thus, cond
becomes a Boolean
, and we take both paths. One of the return paths requires evaluating next_Entry
, and thus becomes a Never
type. The other is an Integer
, and the unification of Never | Integer
is Integer
. This may be clarified with the annotated code below:
fn next_Entry(n = Integer) {
cond = n <= 1000000000000
// cond = Integer <= 1000000000000
// cond = Boolean
if cond {
// if Boolean {
Jump next_Next(n)
// Jump next_Next(Integer)
// -> `Never`
} else {
Jump next_End()
// -> `0` (=> generalized =>) `Integer`
}
}
fn next_Next(n = Integer) {
i = next_Entry(n + 1)
// i = Never
// because we know that `i` is `Never`, nothing after that will execute,
// we can stop now and annotate that the return type is `Never`
}
fn next_End() {
Return 0
}
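Below is a minimal TypeScript sketch of that depth limit; the threshold and the names are made up for illustration. The point is only that past a certain recursion depth, exact values are promoted to their generic counterparts so that interpretation terminates.

type Abstract =
  | { kind: "exact-int"; value: number } // e.g. n = 0
  | { kind: "int" };                     // n = Integer

const MAX_CONCRETE_DEPTH = 100; // arbitrary limit

function interpretCall(fn: string, args: Abstract[], depth: number): Abstract {
  if (depth > MAX_CONCRETE_DEPTH) {
    // Too deep: promote exact values to generic ones, so e.g. next_Entry is
    // interpreted once for n = Integer rather than for 0, 1, 2, ... forever.
    args = args.map((a): Abstract => (a.kind === "exact-int" ? { kind: "int" } : a));
  }
  return interpretBody(fn, args, depth + 1);
}

// Placeholder for recursively interpreting the callee's blocks.
function interpretBody(fn: string, args: Abstract[], depth: number): Abstract {
  return { kind: "int" };
}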
What is the return type of either_Entry
when executed with a value of n = Integer
in the following program?:
fn either_Entry(n) {
cond = n <= 10
if cond {
Jump either_Recurse(n)
} else {
Jump either_End()
}
}
fn either_Recurse(n) {
x = either_Entry(n + 1)
Return to_string(x)
}
fn either_End() {
Return 7
}
Even though it is possible for this function to return a string, according to our current rules, this function would return only the integer of value 7
. This is because either_Recurse
will recurse on either_Entry
, calling it with n = Integer
(as Integer + 1
produces an Integer
), which will cause the abstract interpretation engine to incorrectly determine that the true branch returns Never
. However, the false path will always produce 7
, and thus the function gets the return type "integer of value 7
".
fn either_Entry(n) {
cond = n <= 10
// cond = Integer <= 10
// cond = Boolean
if cond {
// if Boolean {
Jump either_Recurse(n)
// Jump either_Recurse(Integer)
// => return type is `Never`
} else {
Jump either_End()
// => `7`
}
// unify `Never` with `7` => the function returns `7`
}
fn either_Recurse(n) {
x = either_Entry(n + 1)
// x = either_Entry(Integer)
// => either_Entry(Integer) is "InProgress", thus exit with type `Never`
Return to_string(x)
}
fn either_End() {
Return 7
}
A problem with this is that there exists an integer value which will cause the return type to be a string: 10
. When evaluating this function with a constant value 10
, it is easier to see this:
fn either_Entry(10) {
cond = 10 <= 10
// cond = true
if cond {
// if True {
Jump either_Recurse(10)
// Jump either_Recurse(10)
} else {
Jump either_End()
}
}
fn either_Recurse(10) {
x = either_Entry(10 + 1)
// x = either_Entry(11)
Return to_string(x)
// Return "7"
}
fn either_Entry(11) {
cond = 11 <= 10
// cond = false
if cond {
// if False {
Jump either_Recurse(11)
} else {
Jump either_End()
}
}
fn either_End() {
Return 7
}
The abstract value Integer
must accurately represent all possible paths for all integers, but the algorithm as described so far fails in this scenario.
Thus, a solution is to compute a least fixed point: repeatedly abstractly interpret the function whenever we gain new type information about it. This is only necessary in scenarios where conditional control flow (i.e. if
) and abstract values (e.g. Boolean
, Integer
) are used.
For example, let us continue executing the previous program, beginning with the assumption that it only returns an integer of value 7:
fn either_Entry(n) {
cond = n <= 10
// cond = Integer <= 10
// cond = Boolean
if cond {
// if Boolean {
Jump either_Recurse(n)
// Return `"7"`
} else {
Jump either_End()
// Return `7`
}
}
fn either_Recurse(n) {
x = either_Entry(n + 1)
// x = 7
Return to_string(x)
// Return to_string("7")
// Return "7"
}
fn either_End() {
Return 7
}
After a second abstract interpretation, we then learn that either_Entry(n = Integer)
returns "7" | 7
. This process is repeated until we no longer learn any more information from future abstract interpretations.
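Below is a minimal TypeScript sketch of that least-fixed-point loop; unify and the string-based type representation are illustrative only. The essential idea is to re-interpret the function with the return type learned so far, until another pass teaches us nothing new.

// Types written the way this post displays them, e.g. "Never", "Integer", "7", `"7" | 7`.
type Ty = string;

// Unify two types into a union, treating Never as the absence of a value.
function unify(a: Ty, b: Ty): Ty {
  const parts = new Set([...a.split(" | "), ...b.split(" | ")].filter(p => p !== "Never"));
  return parts.size === 0 ? "Never" : [...parts].join(" | ");
}

// Re-interpret the function, feeding back what has been learned about its
// return type, until another pass changes nothing.
function interpretToFixpoint(interpretAssuming: (assumedReturn: Ty) => Ty): Ty {
  let returns: Ty = "Never";
  for (;;) {
    const next = unify(returns, interpretAssuming(returns));
    if (next === returns) return returns;
    returns = next;
  }
}

// For either_Entry(n = Integer): the first pass yields 7, the second yields the
// union of 7 and "7", and the third learns nothing new, so iteration stops.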
Thanks to abstract interpretation, we know all of the possible types that functions will be executed with and return. In this phase, the type annotations produced by abstract interpretation are used to produce a new program. This new program uses the code from the original blocks, but annotates each register with the types that abstract interpretation assigned to it. Thus, we get a program where we know the type of every register.
Consider the following program which prints 0, 1, and 2:
fn print_0_to_2_Entry() {
i = 0
Jump print_0_to_2_Check(i)
}
fn print_0_to_2_Check(i) {
cond = i <= 2
if cond {
Jump print_0_to_2_Loop(i)
} else {
Jump print_0_to_2_End()
}
}
fn print_0_to_2_Loop(i) {
print(i)
i2 = i + 1
Jump print_0_to_2_Check(i2)
}
fn print_0_to_2_End() {
Return
}
Upon abstract interpretation, we'd end up with the following history of abstract interpretation results:
fn print_0_to_2_Entry() -> Jump print_0_to_2_Check(i: 0)
fn print_0_to_2_Check(i: 0) -> Jump print_0_to_2_Loop (i: 0)
fn print_0_to_2_Loop (i: 0) -> Jump print_0_to_2_Check(i: 1)
fn print_0_to_2_Check(i: 1) -> Jump print_0_to_2_Loop (i: 1)
fn print_0_to_2_Loop (i: 1) -> Jump print_0_to_2_Check(i: 2)
fn print_0_to_2_Check(i: 2) -> Jump print_0_to_2_Loop (i: 2)
fn print_0_to_2_Loop (i: 2) -> Jump print_0_to_2_Check(i: 3)
fn print_0_to_2_Check(i: 3) -> Jump print_0_to_2_End ()
fn print_0_to_2_End () -> Void
This phase of the compiler then uses those results to generate a new program, one where all function calls are specialized. The following program may be generated:
fn print_0_to_2_Entry() {
i: 0 = 0
Jump print_0_to_2_Check_0(i)
}
fn print_0_to_2_Check_0(i: 0) {
cond: true = i <= 2
Jump print_0_to_2_Loop_0(i)
}
fn print_0_to_2_Loop_0(i: 0) {
print(i)
i2: 1 = i + 1
Jump print_0_to_2_Check_1(i2)
}
fn print_0_to_2_Check_1(i: 1) {
cond: true = i <= 2
Jump print_0_to_2_Loop_1(i)
}
fn print_0_to_2_Loop_1(i: 1) {
print(i)
i2: 2 = i + 1
Jump print_0_to_2_Check_2(i2)
}
fn print_0_to_2_Check_2(i: 2) {
cond: true = i <= 2
Jump print_0_to_2_Loop_2(i)
}
fn print_0_to_2_Loop_2(i: 2) {
print(i)
i2: 3 = i + 1
Jump print_0_to_2_Check_3(i2)
}
fn print_0_to_2_Check_3(i: 3) {
cond: false = i <= 2
Jump print_0_to_2_End()
}
fn print_0_to_2_End() -> Void {
Return
}
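Below is a minimal TypeScript sketch of the bookkeeping behind this specialization: each combination of block and abstract argument types that the abstract interpreter observed gets its own copy of the block under a fresh name. The naming scheme is made up for illustration, not JSSAT's actual one.

// One specialized copy of a block per distinct set of abstract argument types.
const specializations = new Map<string, string>(); // "Check(i: 0)" -> "print_0_to_2_Check_0"
const counters = new Map<string, number>();

function specializationFor(block: string, argTypes: string[]): string {
  const key = `${block}(${argTypes.join(", ")})`;
  const existing = specializations.get(key);
  if (existing) return existing;

  const n = counters.get(block) ?? 0;
  counters.set(block, n + 1);
  const name = `${block}_${n}`; // e.g. print_0_to_2_Check_0, _1, _2, ...
  specializations.set(key, name);
  // ...here the block's body would be copied, with every register annotated
  // with the types recorded during abstract interpretation.
  return name;
}

// specializationFor("print_0_to_2_Check", ["i: 0"]) -> "print_0_to_2_Check_0"
// specializationFor("print_0_to_2_Check", ["i: 1"]) -> "print_0_to_2_Check_1"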
Then, if the compiler ever encounters a call
instruction to print_0_to_2_Entry
, it can follow the control flow path to generate a function as follows:
fn print_0_to_2() -> Void {
print_0_to_2_Entry():
i: 0 = 0
Jump print_0_to_2_Check_0(i)
print_0_to_2_Check_0(i: 0):
cond: true = i <= 2
Jump print_0_to_2_Loop_0(i)
print_0_to_2_Loop_0(i: 0):
print(i)
i2: 1 = i + 1
Jump print_0_to_2_Check_1(i2)
print_0_to_2_Check_1(i: 1):
cond: true = i <= 2
Jump print_0_to_2_Loop_1(i)
print_0_to_2_Loop_1(i: 1):
print(i)
i2: 2 = i + 1
Jump print_0_to_2_Check_2(i2)
print_0_to_2_Check_2(i: 2):
cond: true = i <= 2
Jump print_0_to_2_Loop_2(i)
print_0_to_2_Loop_2(i: 2):
print(i)
i2: 3 = i + 1
Jump print_0_to_2_Check_3(i2)
print_0_to_2_Check_3(i: 3):
cond: false = i <= 2
Jump print_0_to_2_End()
print_0_to_2_End():
Return
}
This function is fully type aware. If you asked for the type of any register, you would get back an answer. Here, the first i
is of type integer with the exact value 0
. The first occurrence of cond
is of type boolean with the exact value true
.
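Below is a minimal TypeScript sketch of that "follow the control flow path" step, under the simplifying assumption that every specialized block ends in a single known jump, as in this example; real conditional control flow where both branches survive would need more care.

// A specialized block: its annotated instructions plus the single jump target
// it was observed to take (undefined for the final, returning block).
interface SpecializedBlock {
  name: string;
  body: string[];
  jumpsTo?: string;
}

function stitch(blocks: Map<string, SpecializedBlock>, entry: string): string[] {
  const lines: string[] = [];
  const visited = new Set<string>();
  for (let current: string | undefined = entry; current && !visited.has(current); ) {
    visited.add(current);
    const block = blocks.get(current)!;
    lines.push(`${block.name}:`);                       // the block becomes a label
    lines.push(...block.body.map(line => `  ${line}`)); // inside one single function
    current = block.jumpsTo;                            // follow the known path
  }
  return lines;
}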
The program from the previous section is fairly unoptimized, and something should be done about it. At this stage, we now have full knowledge about the types of all registers, and can apply a variety of optimization passes.
Optimization passes in JSSAT must not optimize too much: LLVM handles a lot of optimizations that JSSAT does not need to duplicate. However, the kinds of optimizations that JSSAT does perform typically revolve around removing the redundancy introduced by specialization, such as registers and block parameters whose values are already known constants, although this is not a comprehensive list.
Presently, there is one optimization pass available: constant elimination.
Because of abstract interpretation, we know the exact type (i.e., the type carries the value) of i
in each block. We also know the exact type of cond
. We already know if any registers have constant values; constant propagation has effectively already happened for "free". This optimization pass is therefore about removing already-known constants where possible.
Constant elimination can occur because we know the exact types of registers and the parameters that blocks accept. This optimization pass removes block parameters and instructions whose results are known constants (except for call instructions, where the function being called is a foreign function), and re-materializes known constants immediately before the instructions that still need them, such as calls to foreign functions.
With the optimizations as described above, the program would look like this after those optimization passes:
fn print_0_to_2() {
print_0_to_2_Entry():
Jump print_0_to_2_Check_0()
print_0_to_2_Check_0():
Jump print_0_to_2_Loop_0()
print_0_to_2_Loop_0():
i = 0
print(i)
Jump print_0_to_2_Check_1()
print_0_to_2_Check_1():
Jump print_0_to_2_Loop_1()
print_0_to_2_Loop_1():
i = 1
print(i)
Jump print_0_to_2_Check_2()
print_0_to_2_Check_2():
Jump print_0_to_2_Loop_2()
print_0_to_2_Loop_2():
i = 2
print(i)
Jump print_0_to_2_Check_3()
print_0_to_2_Check_3():
Jump print_0_to_2_End()
print_0_to_2_End():
Return
}
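Below is a minimal TypeScript sketch of the constant-elimination idea under a toy instruction model: computations whose results are already known exactly are dropped, and a known constant is re-introduced right before a foreign call that still needs it. These shapes are hypothetical, not JSSAT's real IR.

// `exact` is set when the register's type carries its value.
interface Register { name: string; exact?: number | boolean }

type Instr =
  | { kind: "compute"; dest: Register; expr: string }
  | { kind: "const"; dest: Register; value: number | boolean }
  | { kind: "foreign-call"; fn: string; args: Register[] };

function eliminateConstants(instrs: Instr[]): Instr[] {
  const out: Instr[] = [];
  for (const instr of instrs) {
    // A pure computation whose result is already known can be dropped entirely.
    if (instr.kind === "compute" && instr.dest.exact !== undefined) continue;

    // Foreign calls stay (they have side effects, like print); any constant
    // argument is materialized immediately before the call, e.g. `i = 0`.
    if (instr.kind === "foreign-call") {
      for (const arg of instr.args) {
        if (arg.exact !== undefined) {
          out.push({ kind: "const", dest: arg, value: arg.exact });
        }
      }
    }
    out.push(instr);
  }
  return out;
}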
Last but not least, the program is compiled into LLVM IR. LLVM IR differs significantly from (typed/transformed) JSSAT IR. Because LLVM IR is geared towards languages like C, C++, Rust, etc., it has a different scope than JSSAT IR. Each LLVM IR "module" has its own set of type definitions, global constants, external declarations, and function definitions.
When JSSAT IR is converted to LLVM IR, it produces an LLVM IR module that requests functions from the JSSAT Runtime (JSSATRT). The preamble of such a module looks something like the following (much omitted for brevity):
; ModuleID = 'jssat'
source_filename = "jssat"
%struct.Runtime = type opaque
declare %struct.Runtime* @jssatrt_runtime_new()
Here, we can see that our newly created JSSAT module requests a struct.Runtime
import and jssatrt_runtime_new
method. When the code is compiled, the system linker will be told where the generated JSSAT module and the JSSATRT are, and it will be able to call the jssatrt_runtime_new
method located inside of the JSSATRT. The runtime is used for providing operations such as garbage collection, maps (objects, { key ↦ value }
pair mappings), and most likely more in the future.
Let's take a look at some real LLVM output. Below is some pseudo code for the input program:
function print_stub(msg) {
print(msg);
}
function sum(max) {
let total = 0;
for (let i = 0; i < max; i++) {
print_stub(i);
total += i;
}
return total;
}
print_stub("Hello, World!");
sum(3);
And the following is the generated LLVM IR:
; ModuleID = 'jssat'
source_filename = "jssat"
%struct.Runtime = type opaque
%struct.Any = type opaque
%struct.String = type opaque
@0 = private unnamed_addr constant [26 x i8] c"H\00e\00l\00l\00o\00,\00 \00W\00o\00r\00l\00d\00!\00"
declare %struct.Runtime* @jssatrt_runtime_new()
declare void @jssatrt_print_any(%struct.Runtime*, %struct.Any*)
declare %struct.Any* @jssatrt_any_new_int(%struct.Runtime*, i64)
declare %struct.String* @jssatrt_string_new_utf16(%struct.Runtime*, i16*, i64)
declare %struct.Any* @jssatrt_any_new_string(%struct.Runtime*, %struct.String*)
define void @main() {
%1 = call %struct.Runtime* @jssatrt_runtime_new()
call void @2(%struct.Runtime* %1)
%2 = call i64 @5(%struct.Runtime* %1)
ret void
}
define void @2(%struct.Runtime* %0) {
%2 = call %struct.String* @jssatrt_string_new_utf16(%struct.Runtime* %0, i16* bitcast ([26 x i8]* @0 to i16*), i64 13)
%3 = call %struct.Any* @jssatrt_any_new_string(%struct.Runtime* %0, %struct.String* %2)
call void @jssatrt_print_any(%struct.Runtime* %0, %struct.Any* %3)
ret void
}
define i64 @5(%struct.Runtime* %0) {
br label %6
2: ; preds = %8
call void @4(%struct.Runtime* %0)
br label %4
3: ; preds = %9
br label %5
4: ; preds = %2
br label %7
5: ; preds = %3
call void @1(%struct.Runtime* %0)
br label %8
6: ; preds = %1
br label %9
7: ; preds = %4
ret i64 3
8: ; preds = %5
br label %2
9: ; preds = %6
call void @3(%struct.Runtime* %0)
br label %3
}
define void @4(%struct.Runtime* %0) {
%2 = call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %0, i64 2)
call void @jssatrt_print_any(%struct.Runtime* %0, %struct.Any* %2)
ret void
}
define void @1(%struct.Runtime* %0) {
%2 = call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %0, i64 1)
call void @jssatrt_print_any(%struct.Runtime* %0, %struct.Any* %2)
ret void
}
define void @3(%struct.Runtime* %0) {
%2 = call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %0, i64 0)
call void @jssatrt_print_any(%struct.Runtime* %0, %struct.Any* %2)
ret void
}
In the IR, we can see a lot of runtime machinery, such as setting up an instance of the runtime, and allocating strings, integers, and an Any
type. This Any type is necessary because our print function, jssatrt_print_any
, accepts an Any
to print, and values are implicitly boxed into values that match the FFI signature when possible. This machinery may be eliminated at some point in the future, but it is not a huge priority at the moment.
Most notably, we can see the exact structure of our program, and the specialization that the abstract interpretation engine performed in action. The @2
function correlates with our print_stub
function, and it is specialized to only print Hello, World!
. In addition, @5
corresponds to our sum
function, which calls print_stub
on every iteration. We can see the explicit control flow of our sum
function, like in the previous section, and the occasional calls to another method - which correspond to more specialized versions of print_stub
.
One of the things mentioned in the optimization section is that we don't want to perform optimizations that LLVM is capable of. For example, we have a lot of control flow and calls to functions that are only used once. Let us examine the above LLVM IR after optimization:
; ModuleID = 'jssat'
source_filename = "jssat"
%struct.Runtime = type opaque
%struct.Any = type opaque
%struct.String = type opaque
@0 = private unnamed_addr constant [26 x i8] c"H\00e\00l\00l\00o\00,\00 \00W\00o\00r\00l\00d\00!\00"
declare void @jssatrt_print_any(%struct.Runtime*, %struct.Any*) local_unnamed_addr
declare %struct.String* @jssatrt_string_new_utf16(%struct.Runtime*, i16*, i64) local_unnamed_addr
declare %struct.Any* @jssatrt_any_new_int(%struct.Runtime*, i64) local_unnamed_addr
declare %struct.Runtime* @jssatrt_runtime_new() local_unnamed_addr
declare %struct.Any* @jssatrt_any_new_string(%struct.Runtime*, %struct.String*) local_unnamed_addr
define void @main() local_unnamed_addr {
%1 = tail call %struct.Runtime* @jssatrt_runtime_new()
%2 = tail call %struct.String* @jssatrt_string_new_utf16(%struct.Runtime* %1, i16* bitcast ([26 x i8]* @0 to i16*), i64 13)
%3 = tail call %struct.Any* @jssatrt_any_new_string(%struct.Runtime* %1, %struct.String* %2)
tail call void @jssatrt_print_any(%struct.Runtime* %1, %struct.Any* %3)
%4 = tail call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %1, i64 0)
tail call void @jssatrt_print_any(%struct.Runtime* %1, %struct.Any* %4)
%5 = tail call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %1, i64 1)
tail call void @jssatrt_print_any(%struct.Runtime* %1, %struct.Any* %5)
%6 = tail call %struct.Any* @jssatrt_any_new_int(%struct.Runtime* %1, i64 2)
tail call void @jssatrt_print_any(%struct.Runtime* %1, %struct.Any* %6)
ret void
}
As we can see, our code is now significantly better optimized. What is left is runtime machinery, which can be further optimized using Link Time Optimization. Code that originally had conditionals, comparisons, loops, and nested calls is effectively compile-time evaluated.
There is additional work not covered in this conceptual overview that must be done to complete the process of converting JavaScript to LLVM IR. For starters, some of the techniques mentioned in this post are not yet implemented in practice. To get an idea of what needs to be done, check out CONTRIBUTING.md in the JSSAT repository.
Hopefully, this post has shed some light on how JSSAT can compile JavaScript into LLVM IR. This post is only meant to be a conceptual overview of how the JSSAT compiler works at a high level. If this project sounds interesting to you, please be sure to come and contribute, as compiling JS into LLVM IR is a long and arduous task, rife with extremely complicated problems that need to be solved. All sorts of contributions are welcome.