ShapeSecurity's Javascript VM: Part 1
Intro
ShapeSecurity's Javascript Virtual Machine (VM) has a reputation for being extremely hard to bypass and reverse. Their primary clients consist of corporations that require the highest level of security when it comes to protecting their API endpoints.
This clientele spans a wide range of banks, e-commerce, utilities and government sites. The VM is used to obfuscate their client-side Javascript protection and make it harder to reverse. That protection employs browser fingerprinting mechanisms, called signals, to detect and stop bots and automation scripts.
They go to great lengths to make their proprietary VM a pain-in-the-ass to reverse by dynamically changing numerous properties inside their script, which makes it impractical to reverse without the proper tools and knowledge.
Finding the Kernel and the Launcher Script
Many have probably come across ShapeSecurity's protection if they have sniffed some HTTP(S) packets or simply opened Chrome Developer Tools and looked around the Network tab, only to find weird headers like these:
"x-gyjwza5z-a": "eEPPTs8Gv8bDC3W7dfI....",
"x-gyjwza5z-b": "-nta7wv",
"x-gyjwza5z-c": "AACsnsyMAQAANC0eakpVcMOGQ0SG799YTeslDYQQcjuLaZTjyDgDRryIY6nL",
"x-gyjwza5z-d": "ABaAhIDBCKGFgQGAAYIQgISigaIAwBGAzvpizi_33wc4A0a8iGOpy_______VsEcAP-Ku439tb1K_yH3AYXKR_4",
"x-gyjwza5z-f": "A9BJosyMAQAAZpEmnOwCZxjY3txkFz1zX7rKF9WpV6wa-RhIXP4mWu_qazqMAY2uWsmucirJwH8AADQwAAAAAA==",
"x-gyjwza5z-z": "q"
Those headers transmit a set of encrypted signals that are used to fingerprint every device that touches their clients' protected endpoints. The payload data is carried in the -a header(s).
This is done primarily through 2 scripts:
- Kernel - The kernel is a statically pre-compiled VM seed containing the signals used for fingerprinting. Each seed contains dynamic encryption keys, xor bytes used to encode each positional signal, and a randomized signal order. It is important to note that they compile and distribute a new seed every 30 minutes for each of their clients' sites, which makes it very difficult to reverse if you don't possess the skills, and especially the tools, to decompile their seeds.
- Launcher - This is sort of a configuration script that launches the Kernel: it loads the Kernel file into a script tag and passes all of the configuration to it by firing a custom init event. The encryption keys and the custom base64 alphabet are some of the arguments passed into the kernel. This script is dynamically generated on a per-request basis.
Kernel
Something quite refreshing is that the kernel is not obfuscated with any public Javascript obfuscator, whereas most anti-bots rely on open-source public obfuscators to protect their client-side scripts.
This is a sample of their kernel script. At first glance it looks horrifying, like something out of a scary movie. The script has all of its properties mangled, which acts as a strong deterrent to those wishing to reverse the VM; very few even attempt to make sense of it.
If we simply beautify the obfuscated kernel we get something that looks like this. Even though beautifying makes the script easier to read, you will see that it is not enough to help us understand what it does.
Renaming the Kernel Properties
At the beginning of my journey, I wrote a series of AST transformations using Babel to rename every function, variable and property inside the kernel to make further parsing and analysis easier.
This is what my beautified version of ShapeSecurity's kernel looks like. Right away, we can see that I've renamed all the top-level functions, variables and properties inside their kernel script and come up with my own naming scheme.
Internals
ShapeSecurity's VM takes what was originally normal Javascript code and turns it into bytecode, which an emulator reads and executes as a set of operations (ops) that replicate the logic of the original code. In other words, it turns expressions, ifs, loops, try and label statements into bytecode, rendering futile any attempt at following along without a proper way to debug the code. Humans read code from top to bottom, not from left to right across hundreds or even thousands of entries, and, for that matter, we cannot read bytecode.
For that reason alone, working with ShapeSecurity's VM requires building a proper decompiler: one that can trace every single path of the bytecode and output Javascript code equivalent to the traced bytecode. It should be noted that certain things like variable names are next to impossible to retrieve, but structures like if, loop and try statements can be recovered from the bytecode by following the ops it executes.
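To make the idea of "Javascript compiled into bytecode" concrete, here is a toy stack-based interpreter. It is only a sketch: the opcode numbers, the names PUSH/ADD/MUL/RET and the run() function are made up for illustration and have nothing to do with ShapeSecurity's real ops. It shows how an expression such as 2 + 3 * 4 becomes a flat array of numbers that only makes sense once you know what each op does.

```javascript
// Toy stack VM, for illustration only. The opcodes below are hypothetical
// and are NOT ShapeSecurity's real ops.
const PUSH = 0, ADD = 1, MUL = 2, RET = 3;

// Bytecode for the Javascript expression: 2 + 3 * 4
const bytecode = [PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD, RET];

function run(code) {
  const stack = [];
  let pc = 0; // position of the next entry to read from the bytecode
  for (;;) {
    const op = code[pc++];
    if (op === PUSH) {
      stack.push(code[pc++]); // the operand follows the opcode
    } else if (op === ADD) {
      const b = stack.pop(), a = stack.pop();
      stack.push(a + b);
    } else if (op === MUL) {
      const b = stack.pop(), a = stack.pop();
      stack.push(a * b);
    } else if (op === RET) {
      return stack.pop();
    } else {
      throw new Error("unknown opcode " + op);
    }
  }
}

const result = run(bytecode); // 14
```

A decompiler has to do the opposite walk: follow the ops and rebuild the expression tree they came from.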
I'm going to categorize the internals into 2 parts:
- VM Machinery : These functions are the primary components of the VM. They implement all the logic needed to execute the bytecode. They rarely get updated with new code.
- VM Data : These are arrays, lists and objects that the VM requires to operate. They include things such as the bytecode, the ops, the thread config, xor strings, arrays of numbers, etc. The bytecode and the ops do change from seed to seed.
VM Machinery
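The VM Machinery consists primarily of 8 functions that power the VM: thread(), work(), createThread(), vmRunner(), vmContext(), vmMemory(), tryCatchSomething() and errorHandling().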
The unit of execution in ShapeSecurity's VM is what I like to refer to as a thread, which can be regarded as a function. Threads have access to all the VM Data objects, and each thread also has a private registry instance (vmMemory), with each key in the registry either passed from the parent thread or set to void 0 on initiation.
At the end of every VM script there is a starting thread, one I like to call thread 0_0. This is the thread that initializes the VM. From there, one more thread is created inside thread 0_0, and then a major one that contains tons of other threads that are initialized and called throughout.
I need to be frank with y'all. When I first started reversing ShapeSecurity's VM, I had no prior experience with VMs, how they work or even how to reverse them. So the names that I gave these 8 functions were chosen with an initial guess of what they did. Some of the names do not match the descriptive name they should have been given, but I am a great fan of keeping legacy code alive. With that being said, I'm going to talk about these 8 functions in more detail.
1. thread()
function thread(_yIndex, _xIndex, _vmMemory, initializeKeys, transferredKeys, keysFromArgs, argsKey, workResultKey) {
  var S = new vmMemory();
  var c, X, o;
  var w = argsKey !== void 0;
  // Copy only the listed slots from the parent's memory.
  for (c = 0, X = transferredKeys.length; c < X; ++c) {
    S.storage[transferredKeys[c]] = _vmMemory.storage[transferredKeys[c]];
  }
  o = work(_yIndex, _xIndex, S, initializeKeys, keysFromArgs, w, argsKey);
  // Optionally store the callback inside the thread's own memory.
  if (workResultKey !== void 0) {
    S.setKeyAsUndefined(workResultKey);
    S.setKey(workResultKey, o);
  }
  return o;
}
The thread() function creates a new vmMemory instance, transfers the keys listed in the transferredKeys array, and calls work(), which returns a callback that can later be used to run the thread/function. As you can tell, thread() merely sets up a vmMemory instance that is passed into work(); then, if workResultKey is defined, it stores the callback under that key inside the vmMemory. This function acts as a sort of wrapper for work().
The _yIndex represents the exact location in the bytecode where the thread starts. The _xIndex is only used to access the thread configuration inside the OPS_SEQUENCE data object. They go together: _xIndex is the first-dimensional index and the second-dimensional index is derived from _yIndex.
The _vmMemory argument is the vmMemory instance that is passed down from the parent thread. In the case of the initial thread (0_0), a new, empty vmMemory instance is passed. The _vmMemory is only used to set the initial memory keys on creation of a new thread, which are governed by the following arguments:
- initializeKeys: These are all the keys that are going to be set to void 0 when the thread starts running (the reset itself happens inside the work() callback). This argument is an array.
- transferredKeys: These are the keys that are copied from the vmMemory instance being passed onto the newly created vmMemory instance for this thread. This argument is an array.
- workResultKey: This key is rarely set, but when it is set it represents the this property of the calling thread. It is essentially used as a way for the thread to reference itself during its execution.
Note: Only the keys specified inside the transferredKeys array are copied from the passed vmMemory instance onto the storage array of the newly created vmMemory instance. The values inside those transferred keys hold mutable data. Any changes performed on the mutable data will be reflected in other places holding a reference to that mutable data.
The other key arguments are passed down to the work() function because they are set every time the callback returned by work() is called.
- keysFromArgs: This is an array of keys that maps onto the arguments array-like object. In other words, for each item in the array, the index is the index inside arguments and the value is the key that will be used to store that argument in the vmMemory instance.
- argsKey: This is not an array but the key used to store the array-like object arguments inside the vmMemory instance.
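The note about transferred keys holding mutable data can be checked with a small sketch. Memory below is a stand-in with the same shape as vmMemory (covered in section 6), transfer() repeats the copy loop from thread(), and the key numbers 0 and 1 are hypothetical.

```javascript
// Stand-in for the kernel's vmMemory; names here are illustrative only.
function Memory() { this.storage = []; }
Memory.prototype.setKeyAsUndefined = function (k) { this.storage[k] = { v: void 0 }; };
Memory.prototype.setKey = function (k, v) { this.storage[k].v = v; };
Memory.prototype.getKey = function (k) { return this.storage[k].v; };

// Same copy loop as thread(): only the listed slots are carried over.
function transfer(parent, transferredKeys) {
  var child = new Memory();
  for (var i = 0; i < transferredKeys.length; ++i) {
    child.storage[transferredKeys[i]] = parent.storage[transferredKeys[i]];
  }
  return child;
}

var parent = new Memory();
parent.setKeyAsUndefined(0);
parent.setKeyAsUndefined(1);
parent.setKey(0, { count: 1 });
parent.setKey(1, "not transferred");

var child = transfer(parent, [0]);
child.getKey(0).count = 2;                 // mutate the shared object

var seenByParent = parent.getKey(0).count; // 2: the parent sees the change
var slot1Copied = 1 in child.storage;      // false: key 1 was never copied
```

In decompiled terms this behaves like a closure capturing a variable from its enclosing function, which is one plausible reading of what transferredKeys is for.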
2. work()
function work(y, x, memory, initializeKeys, keysFromArgs, argsKeyIsNotUndefined, argsKey) {
  var J = keysFromArgs.length;
  var P = function () {
    "use strict";
    // Every call runs on its own copy of the thread's memory.
    var X = memory.clone();
    var w = new vmContext(y, x, X, this);
    var K,
      s,
      q = Math.min(arguments.length, J);
    if (argsKeyIsNotUndefined) {
      X.setKeyAsUndefined(argsKey);
      X.setKey(argsKey, arguments);
    }
    // Reset the thread's local keys.
    for (K = 0, s = initializeKeys.length; K < s; ++K) {
      X.setKeyAsUndefined(initializeKeys[K]);
    }
    // Store the arguments that were actually passed...
    for (K = 0; K < q; ++K) {
      X.setKey(keysFromArgs[K], arguments[K]);
    }
    // ...and pad the remaining declared parameters with undefined.
    for (K = q; K < J; ++K) {
      X.setKey(keysFromArgs[K], void 0);
    }
    return vmRunner(w);
  };
  return P;
}
The work() function creates a callback that executes the following logic:
- Clones the passed vmMemory instance from memory.
- Creates a new vmContext instance.
- Sets arguments onto the argsKey if argsKey is not undefined.
- Sets all keys defined inside initializeKeys to void 0.
- Takes the minimum of the arguments length and the keysFromArgs length, then sets that many keys inside the cloned memory from the parameters passed every time the callback is called. Any remaining keysFromArgs keys are set to void 0.
- Executes vmRunner(), passing the newly created vmContext instance.
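The argument-mapping step is easier to see in isolation. The sketch below repeats the two argument loops from work() on a plain object instead of a vmMemory instance; mapArguments() and the key numbers 7, 9 and 12 are hypothetical and only there for illustration.

```javascript
// Mirrors the two argument loops in work(): copy what was passed, pad the rest.
function mapArguments(keysFromArgs, args) {
  var slots = {};
  var J = keysFromArgs.length;
  var q = Math.min(args.length, J);
  var K;
  for (K = 0; K < q; ++K) {
    slots[keysFromArgs[K]] = args[K];   // argument K goes into key keysFromArgs[K]
  }
  for (K = q; K < J; ++K) {
    slots[keysFromArgs[K]] = void 0;    // missing arguments become undefined
  }
  return slots;
}

// Fewer arguments than declared parameters: keys 9 and 12 are padded.
var tooFew = mapArguments([7, 9, 12], ["a"]);

// More arguments than declared parameters: the extra "c" is not stored by key.
var tooMany = mapArguments([7, 9], ["a", "b", "c"]);
```

This matches ordinary Javascript call semantics: missing parameters are undefined, and extra ones are only reachable through arguments, which is what argsKey is for.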
3. createThread()
function createThread(key, _yIndex, _xIndex, _vmMemory) {
  "use strict";
  var s = THREAD_CONFIG[key];
  return thread(_yIndex, _xIndex, _vmMemory, s.initializeKeys, s.transferredKeys, s.keysFromArgs, s.argsKey, s.workResultKey);
}
The createThread() function is just a wrapper around the thread() function because, with the exception of the initial thread, the function thread() is never called directly inside the ops. Instead, createThread() is used inside the ops and a thread config object is used to configure the memory keys.
4. vmRunner()
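function vmRunner(_vmContext) {
  var s, w;
  for (;;) {
    // An op has requested a return: hand the value back and reset the sentinel.
    if (returnValue !== explicitReturn) {
      w = returnValue;
      returnValue = explicitReturn;
      return w;
    }
    s = _vmContext.next();
    if (_vmContext.errors.length === 0) {
      // No open try/catch scope: call the op directly.
      OPS_FUNCTIONS[s](_vmContext);
    } else {
      // Inside a try/catch scope: run the op under a real try/catch.
      tryCatchSomething(OPS_FUNCTIONS[s], _vmContext);
    }
  }
}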
The vmRunner() function is basically the engine that runs the thread logic. It runs in an infinite loop until returnValue does not equal explicitReturn. The only argument it takes, _vmContext, is a vmContext instance.
returnValue contains the value that is returned when the thread's execution ends. As long as returnValue is set to explicitReturn, vmRunner() keeps running.
The only time returnValue changes is when an op sets it to one of:
- The last item in the stack
- An empty object ({})
- Last but not least, void 0, aka undefined
returnValue is only set to explicitReturn when the script is initiated and inside vmRunner() right before it returns.
Each iteration calls _vmContext.next(), which returns the opKey. This opKey is used to grab an op from the OPS_FUNCTIONS array, which contains all the ops of ShapeSecurity's VM, and that op is then executed.
Now the last part is extremely important. Due to my limited understanding of ShapeSecurity's VM internals at the time of labelling, the property was erroneously named _vmContext.errors instead of something more like tryCatchScopeArray.
ShapeSecurity's VM has two ways of executing an op. One way is by calling the op function from the OPS_FUNCTIONS array directly; the other is by passing that same op function to the tryCatchSomething() function. The second path is only taken when _vmContext.errors is not empty.
By executing the op function inside the tryCatchSomething(), ShapeSecurity's VM is able to catch any exceptions that might arise through the execution of the op. This is how ShapeSecurity's VM implements TryCatch statements.
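The sentinel-based return can be reduced to a toy loop. This is a sketch under assumptions: the three ops, the ctx shape (stack, pc, program) and the runner() name are invented for the example, and the try/catch path is left out. What it keeps from vmRunner() is the use of a unique object as "no return requested yet", which is what allows undefined to be a legitimate return value.

```javascript
var explicitReturn = {};          // unique sentinel: "keep running"
var returnValue = explicitReturn;

// Hypothetical ops; the real OPS_FUNCTIONS array is far larger.
var OPS = [
  function pushConst(ctx) { ctx.stack.push(ctx.program[ctx.pc++]); },
  function add(ctx) { var b = ctx.stack.pop(); ctx.stack.push(ctx.stack.pop() + b); },
  function ret(ctx) { returnValue = ctx.stack[ctx.stack.length - 1]; }
];

function runner(ctx) {
  var w;
  for (;;) {
    if (returnValue !== explicitReturn) {
      w = returnValue;
      returnValue = explicitReturn; // reset so the next run starts clean
      return w;
    }
    OPS[ctx.program[ctx.pc++]](ctx); // fetch the op key, then execute the op
  }
}

// pushConst 20, pushConst 22, add, ret
var out = runner({ stack: [], pc: 0, program: [0, 20, 0, 22, 1, 2] }); // 42
```

Because returnValue lives outside the loop, any op can end the thread simply by assigning to it; the loop notices on its next iteration.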
5. vmContext()
function vmContext(_yIndex, _xIndex, _vmMemory, _thisValue) {
  this.stack = [];
  this.errorTracker = [];
  this.errors = [];
  this.initiatedAsUndefined = void 0;
  this.xIndex = _xIndex;
  this.yIndex = _yIndex;
  this._vmMemory = _vmMemory;
  this.someGlobalThis = _thisValue == null ? window : Object(_thisValue);
  this.thisValue = _thisValue;
  this.xyIndex = 0;
}
var L = vmContext.prototype;
Object.defineProperty(L, "next", {
  value: function () {
    var w = OPS_SEQUENCE[this.xIndex][HEAP[this.yIndex++]];
    this.xIndex = w[0];
    return w[1];
  }
});
This is by far the most important part of ShapeSecurity's VM, as everything that happens inside each op is done on a vmContext instance.
The vmContext() function takes 4 simple arguments:
- The _xIndex and the _yIndex have been discussed before as being the indexes for accessing the next op function. The way it works is that OPS_SEQUENCE is an array of arrays whose entries are [next xIndex, opKey] pairs. The first-dimensional index is taken from _xIndex and the second-dimensional index is derived from _yIndex (by reading HEAP at that position). This all happens during the next() call, which also sets the next _xIndex value for when the next op is called.
- The _vmMemory argument is a vmMemory instance used as a registry to set and get keys. This is also how keys from other threads are passed through.
- Last but not least, _thisValue represents the window object from the browser when called the first time; otherwise it is just a Javascript object.
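A toy version of the next() lookup shows one consequence of the two-index scheme. The tables below are hypothetical (real seeds ship far larger ones) and Cursor is a stripped-down stand-in for vmContext that keeps only xIndex, yIndex and next(). In this sketch the same bytecode value decodes to a different op depending on the current xIndex.

```javascript
// Hypothetical data, for illustration only.
var HEAP = [1, 1, 0];             // the "bytecode"
var OPS_SEQUENCE = [
  // xIndex 0: bytecode value 0 -> [next xIndex 1, op 5], value 1 -> [1, op 7]
  [[1, 5], [1, 7]],
  // xIndex 1: the same bytecode values select different ops
  [[0, 3], [0, 9]]
];

function Cursor(y, x) { this.yIndex = y; this.xIndex = x; }
Cursor.prototype.next = function () {
  var w = OPS_SEQUENCE[this.xIndex][HEAP[this.yIndex++]];
  this.xIndex = w[0]; // the entry also chooses the table used for the next op
  return w[1];        // the op key to look up in OPS_FUNCTIONS
};

var c = new Cursor(0, 0);
var decoded = [c.next(), c.next(), c.next()]; // [7, 9, 5]
```

The first two bytecode entries are both 1, yet they decode to ops 7 and 9. If the real tables behave like this, a disassembler cannot map bytecode values to ops with a single static table; it has to track xIndex along every path.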
Now, regarding the internal properties of vmContext, we can split them into 5 parts:
- The places where data is stored are the stack and the _vmMemory properties. The stack property is just an array that is used, as the name suggests, as a stack where values are pushed, popped and set. This is because ShapeSecurity's VM is a stack-based VM. The _vmMemory, which holds a vmMemory instance, is sort of like the global scope that the thread has access to.
- The error handling part was erroneously named. It consists of 2 properties, errorTracker and errors. The errors property is an array that contains TryCatch scopes. For simple TryStatements containing only a try and a catch side, the errorTracker property holds the records of scopes once they have been handled. The TryCatch mechanisms that ShapeSecurity's VM uses will be explained in more detail in a later part.
- The properties someGlobalThis and thisValue are used for referencing the this value and the window object. They are not easy to understand at first since you only see them used in a few places, but they essentially act as glue code for classes that were compiled into simple functions.
- The xyIndex property is used to hold a return address for a specific type of op. This kind of op will be explained in more detail in a later post, as it will be obvious once we go over the ops inside ShapeSecurity's VM.
- The last part is the What-the-hell-is-this property. I say What-the-hell because this property, initiatedAsUndefined, is never used in any part of ShapeSecurity's VM. It is almost as if it was left there for legacy reasons or ShapeSecurity's developers forgot to remove it. Honestly, who the hell knows. Just ignore this property, as it is never used.
6. vmMemory()
function vmMemory() {
  this.storage = [];
}
// Prototype alias; its declaration is not part of the excerpt and is assumed
// here, mirroring `var L = vmContext.prototype` above.
var H = vmMemory.prototype;
Object.defineProperty(H, "setKeyAsUndefined", {
  value: function (_a) {
    this.storage[_a] = {
      v: void 0
    };
  }
});
Object.defineProperty(H, "getKey", {
  value: function (_a) {
    return this.storage[_a].v;
  }
});
Object.defineProperty(H, "setKey", {
  value: function (_a, _b) {
    this.storage[_a].v = _b;
  }
});
Object.defineProperty(H, "clone", {
  value: function () {
    var w = new vmMemory();
    w.storage = [].slice !== [].slice ? this.storage.slice(0) : this.storage.slice(0);
    return w;
  }
});
The vmMemory() function acts as the registry component of ShapeSecurity's VM. It is very simple in nature, having only 1 internal property (storage) and 4 methods.
Something very interesting is how ShapeSecurity's VM stores keys inside the vmMemory. It uses the storage property, which is an array, and each key's slot holds an object with a v property that carries the value.
Essentially, when the setKeyAsUndefined() function is called it sets the key to a default value of void 0:
{ "v" : void 0}
Then, when the getKey() function is called, it returns the v value from the storage array at the index given by the key argument. Likewise, when the setKey() function is called, it sets the v value.
Note: When the getKey() function is called on a key whose slot has not been initialized (for example, an index higher than the length of storage), an error occurs, because storage[key] is undefined and reading .v from it throws a TypeError. A key cannot be accessed if it hasn't first been set to void 0 using the setKeyAsUndefined() function.
The last method, clone(), clones the vmMemory instance by using the Array.prototype.slice() method. I'm still unsure why they had to use this line:
w.storage = [].slice !== [].slice ? this.storage.slice(0) : this.storage.slice(0);
when a simple line like this would suffice, given that both branches of the ternary are identical:
w.storage = this.storage.slice(0);
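Two behaviours follow directly from the code above and are worth seeing in action. Registry below is a stand-in with the same four methods as vmMemory, written with plain prototype assignments, and the key numbers are hypothetical.

```javascript
// Stand-in with the same four methods as vmMemory; illustrative only.
function Registry() { this.storage = []; }
Registry.prototype.setKeyAsUndefined = function (k) { this.storage[k] = { v: void 0 }; };
Registry.prototype.getKey = function (k) { return this.storage[k].v; };
Registry.prototype.setKey = function (k, v) { this.storage[k].v = v; };
Registry.prototype.clone = function () {
  var w = new Registry();
  w.storage = this.storage.slice(0); // shallow copy: the { v } boxes are shared
  return w;
};

// 1. Reading a key that was never initialized throws.
var fresh = new Registry();
var threwTypeError = false;
try {
  fresh.getKey(3);
} catch (e) {
  threwTypeError = e instanceof TypeError;
}

// 2. A clone shares boxes until a key is re-initialized on one side.
var original = new Registry();
original.setKeyAsUndefined(0);
original.setKeyAsUndefined(1);
var copy = original.clone();
copy.setKey(0, "shared");       // writes into the box both registries point at
copy.setKeyAsUndefined(1);      // replaces the box in the copy only
copy.setKey(1, "private");

var originalKey0 = original.getKey(0); // "shared"
var originalKey1 = original.getKey(1); // undefined
```

This lines up with work(), which clones the memory and then calls setKeyAsUndefined() on every key in initializeKeys: those keys get fresh boxes on each call, while any other slot in the clone still points at the shared box.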
7. tryCatchSomething() and 8. errorHandling()
function tryCatchSomething(opsFunction, _vmContext) {
  try {
    opsFunction(_vmContext);
  } catch (w) {
    errorHandling(w, _vmContext);
  }
}
function errorHandling(error, _vmContext) {
  var s = _vmContext.errors.pop();
  for (var q = 0; q < s.totalErrors; ++q) {
    _vmContext.errorTracker.pop();
  }
  _vmContext.errorTracker.push({
    wasExceptionHandled: true,
    _errorOryIndex: error
  });
  _vmContext.yIndex = s._yIndex;
  _vmContext.xIndex = s._xIndex;
}
These two functions are used to handle the TryStatements that ShapeSecurity's VM has. They will require their own detailed part and will be explained after parts 2 and 3 have been published, since it is pointless to try to explain what they do now without having the ops fully explained.
Conclusion
We still haven't talked about the VM Data part, but we will in part 2, as I do not wish to make these parts extremely long. There are still at least 3-4 more parts to go, so stay tuned until next time.