Optimizations
NOTE: It is important to understand when to optimize. Only optimize when it makes a quantifiable difference. Avoid wasting time optimizing code that takes a whopping 30 seconds to execute but is only executed once every 30 days. Time is much better spent optimizing hot code paths that are executed at high frequencies and volumes.
Hash maps are (relatively) slow. Use JavaScript classes to speed things up.
DISCLAIMER: Notice I said relatively. Hash maps and hash tables are famous for their high efficiency and blazing fast lookup times. However, looking up a key-value pair in a hash map is relatively slow compared to jumping to a single place in memory or indexing an array. I'll explain more on that later.
Consider this simple JavaScript object:
```javascript
var obj = { awesome: 'possum', 'the-bomb': 'dot-com' };
```
JavaScript stores an object's property names and property values (in this case, `awesome: 'possum'` and `'the-bomb': 'dot-com'`) inside of an internal hash map. The engine typically goes through several steps when storing these object properties:
1. JavaScript converts the property name into a string
2. JavaScript locates a hashing function within the scope chain
3. JavaScript puts the property name through the hashing function
4. The hashing function produces a special number (the hash)
5. JavaScript then uses this number to index into an array and stores the property's value at that index
6. If there are already entries at that index, JavaScript needs to traverse the list of entries and append the value to the end (this process is called chaining)

In order to retrieve a property, JavaScript needs to go through this same process.
The following is a diagram of how a property and its value are stored in the internal hash map:

```
   awesome
      |
      V
  'awesome'
      |
      V                                [ .. ]
[hash function] -----> [index] ----> [list]--['possum']
                                     [list]--[value]--[value]
                                     [list]--[value]
                                     [list]--[value]--[value]--[value]
                                       [ .. ]
```
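The steps above can be sketched as a toy hash map in plain JavaScript (a simplified illustration of the idea, not how any real engine implements property storage):

```javascript
// Toy hash map with chaining: each bucket holds a list of [key, value] pairs.
var SIZE = 8;
var buckets = new Array(SIZE);

// Steps 1-4: turn the property name into a bucket index.
function hash (key) {
  var h = 0;
  for (var i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) % SIZE;
  }
  return h;
}

// Step 5: store the value at that index.
function set (key, value) {
  var index  = hash(String(key));
  var bucket = buckets[index] || (buckets[index] = []);
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) { bucket[i][1] = value; return; }
  }
  bucket.push([key, value]); // Step 6: chaining on collision.
}

// Retrieval goes through the same hashing process.
function get (key) {
  var bucket = buckets[hash(String(key))] || [];
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) return bucket[i][1];
  }
  return undefined;
}

set('awesome', 'possum');
set('the-bomb', 'dot-com');
get('awesome'); // 'possum'
```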
This method of storing properties is actually very efficient, and unless this code lies along extremely hot code paths with tens of thousands of executions, it may not be worth it to optimize. However, if it does happen to reside along a hot code path, then there is a way to make this a bit more efficient.
Here is one thing we do know: array lookups are faster than hash map lookups.
Consider this array object:
```javascript
var obj = [5, 'a', 'two'];
```
Array lookup is extremely fast. To grab a value from an array, JavaScript needs to do a few things:
1. Jump to the memory address of the array
2. Add the index value to its current location in memory
3. Pluck out the value

Although hash map lookup is fast, array lookup is much faster in comparison.
So we just need to store our properties in arrays? Not quite.
Consider the consequences of executing the following code:
```javascript
obj[1000000] = 'six';
```
Here, we are telling JavaScript to store a value at index 1,000,000. Well, now we would need to expand our array to hold 1,000,001 elements, and we would end up using roughly 8 MB of memory just to store 4 primitive values.
The array would look something like:
```
   0    1     2            ...              1000000
[  5,  'a', 'two',  (a lot of empty space),  'six' ]
```
It turns out that JavaScript is smart enough to see this inefficiency and ends up converting the array from one giant, contiguous block of allocated memory space back into a hash map of key-value pairs.
This is good because we can now access arbitrary indices of our array without all that wasted memory. But wait...remember that the whole point was to use the array for quicker lookup times, and we are now back to a hash map, despite our explicit efforts to create an array.
JavaScript's efforts to save space have resulted in a reduction in performance from a simple array lookup to a hash map lookup. There is a clear trade-off between speed and space, and in this case, JavaScript considers space to be more important.
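You can observe the sparseness from within the language itself (the internal switch to a dictionary representation is engine-specific and not directly visible, but the array clearly does not materialize a million slots):

```javascript
var obj = [5, 'a', 'two'];
obj[1000000] = 'six';

obj.length;       // 1000001 -- length reflects the highest index plus one
Object.keys(obj); // ['0', '1', '2', '1000000'] -- only 4 entries actually exist
```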
Is there any way to have both speed and efficient storage? Can we have access to arbitrary properties without the overhead of a hash map?
Here is where JavaScript classes come in.
Consider this `Person` class:

```javascript
function Person (name, age) {
  this.name = name;
  this.age  = age;
}
```
Let us now create an instance of `Person` and access its properties:

```javascript
var bob = new Person('Bobby', 23);

bob.name; // 'Bobby'
bob.age;  // 23
```
Compare this with the following object literal:
```javascript
var sam = {
  name: 'Sammy',
  age : 21
};

sam.name; // 'Sammy'
sam.age;  // 21
```
Believe it or not, accessing `bob`'s properties is faster than accessing `sam`'s properties. Why is this? The reason is that `sam`'s properties are stored within an internal hash map, which is stored with `sam`, as we discussed earlier, but `bob`'s properties are stored with `bob` directly in memory, similar to how an array's elements are stored. We know this because, according to our code, when a `Person` object is instantiated, the `name` and `age` properties are always initialized, and are thus stored directly in memory with that object. JavaScript can then grab the object's properties by going directly to the memory location of the object instead of having to go through the whole hash map lookup process.

But what happens when we add a new property to `bob` after instantiation?

```javascript
bob.gender = 'male';
```
Because JavaScript has already reserved a specific place in memory to put a `Person` object with only two properties, `name` and `age`, there may not be room for the new `gender` property. JavaScript realizes that `bob` is no longer a traditional `Person` object, but rather a variant of the `Person` class, and keeps track of these changes.

So, why does this matter? Let me explain one more concept, and we will come back to this topic.
When making multiple calls to a function, cluster similar types together, or create separate functions for each type. This allows the JavaScript engine to perform optimizations at runtime.
Consider the following:
```javascript
var myArray  = [1, 2, 3, 4, 5];
var myString = '12345';

function getLength (x) { return x.length; }

getLength(myArray );
getLength(myArray );
getLength(myArray ); // [1] Optimization time!
getLength(myArray );
getLength(myArray );
getLength(myString); // [2] Oops...de-opt time :(
```
We have an `Array`, a `String`, and a function `getLength` that reads `length` from whatever is passed in. We then call `getLength` over and over again.

When we call `getLength` the first time, we pass in `myArray`. JavaScript senses that it is an `Array` object and calls the `Array` `length` function on `myArray`. But before it can do that, it needs to go retrieve the `length` function in memory. When it finds the function, JavaScript calls it on `myArray` and unsurprisingly receives the value of `5`. We get to the second call to `getLength`, and JavaScript goes through the exact same process all over again, including having to go retrieve the `length` function in memory.

By the time we reach `[1]`, the JavaScript engine recognizes the pattern that we are only passing in `Array` objects. It then assumes that `x` will always be an `Array` and optimizes by keeping around the `Array` `length` function, rather than having to go fetch it in memory each time `getLength` is called.

However, when we reach `[2]`, JavaScript realizes that `x` is no longer an `Array`, so the engine deoptimizes by discarding the `Array` `length` function and repeating the original process of looking up the `length` function for `x`.

JavaScript's optimization engine is really good at optimizing blocks of code involving large numbers of function calls with similar argument types.
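The caching behavior described above can be mimicked in plain JavaScript (a conceptual sketch of an inline cache; real engines do this inside generated machine code, not in user-visible JavaScript, and `getLengthCached` is a hypothetical name for illustration):

```javascript
// A conceptual "inline cache" for a property lookup: remember the last
// type seen and the lookup result that goes with it.
var cachedCtor   = null;
var cachedGetter = null;

function getLengthCached (x) {
  if (x.constructor !== cachedCtor) {
    // "De-opt": the cached type no longer matches -- do the slow lookup
    // and re-fill the cache for the new type.
    cachedCtor   = x.constructor;
    cachedGetter = function (obj) { return obj.length; };
  }
  return cachedGetter(x); // fast path once the cache is warm
}

getLengthCached([1, 2, 3]); // 3 -- slow path, cache now holds Array
getLengthCached([4, 5]);    // 2 -- fast path, same type as last time
getLengthCached('12345');   // 5 -- cache miss: re-cached for String
```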
There are two ways to help out JavaScript's optimization engine:
First, you can cluster similar types together.
Consider the following:
```javascript
// Arrays
getLength(myArray );
getLength(myArray );
getLength(myArray ); // Opt
getLength(myArray );
.
.
.
getLength(myArray );

// Strings
getLength(myString); // De-opt
getLength(myString);
getLength(myString); // Opt
getLength(myString);
.
.
.
getLength(myString);
```
Compare this with:
```javascript
// Interleave the types
getLength(myArray );
getLength(myString);
getLength(myArray );
getLength(myString);
getLength(myArray );
getLength(myString);
getLength(myArray );
getLength(myString);
getLength(myArray );
getLength(myString);
```
Notice that in the code above, JavaScript never gets a chance to optimize.
NOTE: Most engines are smart enough to recognize that you are passing in one of two values, and will then optimize for two values. However, you have still harmed performance to some degree by interleaving the types. The worst case scenario is when you are passing in a large variety of types in an arbitrary order, making it difficult for the engine to perform optimizations effectively.
Second, you can create separate functions for different types.
Consider the following:
```javascript
var myArray  = [1, 2, 3, 4, 5];
var myString = '12345';

function getArrayLength  (x) { return x.length; }
function getStringLength (x) { return x.length; }

getArrayLength (myArray );
getStringLength(myString);
getArrayLength (myArray );
getStringLength(myString);
getArrayLength (myArray ); // Opt
getStringLength(myString); // Opt
getArrayLength (myArray );
getStringLength(myString);
getArrayLength (myArray );
.
.
.
getStringLength(myString); // No de-opts!
```
Remember, this is only worth it if you are calling these functions at high frequency.
Although the utilization of polymorphism in JavaScript is convenient, it is also expensive. If possible, send the same types into functions. In other words, call functions monomorphically.
Now, back to `bob`.

The only difference between `bob` and a normal `Person` object is that `bob` now has a new `gender` property. Because of this difference, and because JavaScript needs to access these objects in memory differently, JavaScript actually treats `bob` and a traditional `Person` object as two completely different types in terms of optimization. This means that passing `bob` and `sam` into a function is as severe as passing an `Array` and a `String` into a function!

This can be harmful to optimization. Keep this in mind when passing objects into functions. Just because they have the same underlying class does not mean JavaScript will treat them as the same type.
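Here is a sketch of the situation (`amy` is a new illustrative instance introduced here; hidden classes/shapes are engine internals, so the comments describe assumed behavior rather than anything observable from JavaScript):

```javascript
function Person (name, age) {
  this.name = name;
  this.age  = age;
}

function describe (p) {
  return p.name + ' (' + p.age + ')';
}

var bob = new Person('Bobby', 23);
var amy = new Person('Amy', 30);

describe(bob); // 'Bobby (23)' -- one shape seen so far: monomorphic
describe(amy); // 'Amy (30)'   -- same shape, still monomorphic

bob.gender = 'male'; // bob's shape diverges from amy's

describe(bob); // still correct, but the call site has now seen two shapes,
describe(amy); // so it is polymorphic and harder for the engine to optimize
```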
Avoid boxing (or wrapping) primitives.
Remember that in one of the previous chapters, I mentioned that a primitive gets wrapped by an object wrapper when it is treated like an object (e.g. calling one of its methods or adding a property to it). This is also known as boxing.
Also remember that objects are stored on the stack as a pointer to the actual object stored in the heap.
Primitives, on the other hand, are stored in memory directly, so there is much faster access to them.
(Writing in progress...)