Optimizations


NOTE: It is important to understand when to optimize. Only optimize when it makes a quantifiable difference. Avoid spending wasting time optimizing code that takes a whopping 30 seconds to execute but is only executed once every 30 days. Time is much better spent optimizing hot code paths that are executed at high frequencies and volumes.

  1. Hash maps are (relatively) slow. Use JavaScript classes to speed things up.

    DISCLAIMER: Notice I said relatively. Hash maps and hash tables are famous for their high efficiency and blazing fast lookup times. However, looking up a key-value pair in a hash map is relatively slow compared to jumping to a single place in memory or indexing an array. I'll explain more on that later.

    Consider this simple JavaScript object:

     var obj = {
         awesome : 'possum', 
         the-bomb: 'dot-com'
     };
    

    JavaScript stores an object's property-names and property-values (in this case, awesome: 'possum' and the-bomb: 'dot-com') inside of an internal hash map.

    The engine typically goes through several steps when storing these object properties:

    (1) JavaScript converts the property name into a String literal

    (2) JavaScript locates a hashing function within the scope chain

    (3) JavaScript puts the property name through the hashing function

    (4) The hashing function produces a special number

    (5) JavaScript then uses this number to index into an array and stores the property's value into that index

    (6) If there are already objects inside of the index, JavaScript will need to traverse the list of objects and append the value to the end (this process is called chaining)

    In order to retrieve a property, JavaScript would need to go through the same process.

    The following is a diagram of how a property and its value is stored into the internal hash map:
            awesome
               |
               |
               V
           'awesome'
               |                          [ .. ]
               |                   @----> [list]--['possum']
               V                   |      [list]--[value]--[value]
         [hash function] -----> [index]   [list]--[value]
                                          [list]--[value]--[value]--[value]
                                          [ .. ]
    

    This method of storing properties is actually very efficient, and unless this code lies along extremely hot code paths with tens of thousands of executions, it may not be worth it to optimize. However, if it does happen to reside along a hot code path, then there is a way to make this a bit more efficient.

    Here is one thing we do know, and that is array lookups are faster than hash map lookups.

    Consider this array object:
     var obj = [5, 'a', 'two'];
    

    Array lookup is extremely fast. To grab a value from an array, JavaScript needs to do a few things:

    (1) Jump to the memory address of the array

    (2) Add the index value to its current location in memory

    (3) Pluck out the value

    Although hash map lookup is fast, array look up is much faster in comparison.

    So we just need to store our properties in arrays? Not quite.

    Consider the consequences of executing the following code:
     obj[1000000] = 'six';
    

    Here, we are telling JavaScript to store a value at index 1,000,000. Well, now we would need to expand our array to hold 1,000,000 elements, and we would end up using about 8 MB worth of memory just to store 4 primitive values.

    The array would look something like:
      0   1     2                ...            1000000
     [5, 'a', 'two',    alot of empty space    , 'six']
    

    It turns out that JavaScript is smart enough to see this inefficiency and ends up converting the array from one giant, contiguous block of allocated memory space back into a hash map of key-value pairs.

    This is good because we can now access arbitrary indices of our array without all that wasted memory. But wait...remember that the whole point was to use the array for quicker lookup times, and we are now back to a hash map, despite our explicit efforts to create an array.

    JavaScript's efforts to save space have resulted in a reduction in performance from a simple array lookup to a hash map lookup. There is a clear trade-off between speed and space, and in this case, JavaScript considers space to be more important.

    Is there any way to have both speed and efficient storage? Can we have access to arbitrary properties without the overhead of a hash map?

    Here is where JavaScript classes come in.

    Consider this Person class:
     function Person (name, age) {
         this.name = name;
         this.age  = age;
     }
    
    Let us now create an instance of Person and access its properties:
     var bob = new Person('Bobby', 23);
    
     bob.name;   // 'Bobby'
     bob.age;    // 23
    
    Compare this with the following object literal:
     var sam = {
         name: 'Sammy',
         age : 21
     }
    
     sam.name;   // 'Sammy'
     sam.age;    // 21
    

    Believe it or not, accessing bob's properties is faster than accessing sam's properties. Why is this? The reason for this is that sam's properties are stored within an internal hash map, which is stored with sam, as we discussed earlier, but bob's properties are stored with bob directly in memory, similar to how an array's elements are stored. We know this because according to our code, when a Person object is instantiated, the name and age properties are always initialized, and are thus stored directly in memory with that object. JavaScript can then grab the object's properties by going directly to the memory location of the object instead of have to go through the whole hash map lookup process.

    But what happens when we add a new property to bob after instantiation?
     bob.gender = 'male';
    

    Because JavaScript has already reserved a specific place in memory to put a Person object with only two properties, name and age, there may not be room for the new gender property. JavaScript realizes that bob is no longer a traditional Person object, but rather, a variant of the Person class, and keeps track of these changes.

    So, why does this matter? Let me explain one more concept, and we will come back to this topic.



  2. When making multiple calls to a function, cluster similar types together, or create separate functions for each type. This allows the JavaScript engine to perform optimizations at runtime.

    Consider the following:
     var myArray  = [1, 2, 3, 4, 5];
     var myString = '12345';
    
     function getLength (x) {return x.length};
    
     getLength(myArray );
     getLength(myArray );
     getLength(myArray );    // [1] Optimization time!
     getLength(myArray );
     getLength(myArray );
     getLength(myString);    // [2] Oops...de-opt time :(
    

    We have an Array, a String, and a function getLength that calls length on whatever is passed in. We then call getLength over and over again.

    When we call getLength the first time, we pass in myArray. JavaScript senses that it is an Array object and calls the Array length function on myArray. But before it can do that, it needs to go retrieve the length function in memory. When it finds the function, JavaScript calls it on myArray and unsurprisingly receives the value of 5. We get to the second call to getLength, and JavaScript goes though the exact same process all over again, including having to go retrieve the length function in memory.

    By the time we reach [1], the JavaScript engine recognizes the pattern that we are only passing in Array objects. It then assumes that x will always be an Array and optimizes by keeping around the Array length function, rather than having to go fetch it in memory each time getLength is called.

    However, when we reach [2], JavaScript realizes that x is no longer an Array, so the engine deoptimizes by discarding the Array length function and repeats the original process by looking up the length function for x.

    JavaScript's optimization engine is really good at optimizing blocks of code involving large amounts of function calls with similar arguments types.

    There are 2 ways to help out JavaScript's optimization engine:

    First, you can cluster similar types together.

    Consider the following:
     // Arrays
     getLength(myArray );
     getLength(myArray );
     getLength(myArray );    // Opt
     getLength(myArray );
                .
                .
                .
     getLength(myArray );
    
     // Strings
     getLength(myString);    // De-opt
     getLength(myString);
     getLength(myString);    // Opt
     getLength(myString);
                .
                .
                .
     getLength(myString);
    
    Compare this with:
     // Interleave the types
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
     getLength(myString);
    

    Notice that in the code above, JavaScript never gets a chance to optimize.

    NOTE: Most engines are smart enough to recognize that you are passing in one of two values, and will then optimize for two values. However, you have still harmed performance to some degree by interleaving the types. The worst case scenario is when you are passing in a large variety of types in an arbitrary order, making it difficult for the engine to perform optimizations effectively.

    Second, you can create separate functions for different types.

    Consider the following:
     var myArray  = [1, 2, 3, 4, 5];
     var myString = '12345';
    
     function getArrayLength  (x) {return x.length};
     function getStringLength (x) {return x.length};
    
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
     getLength(myString);
     getLength(myArray );    // Opt
     getLength(myString);    // Opt
     getLength(myArray );
     getLength(myString);
     getLength(myArray );
                .
                .
                .
     getLength(myString);
    
     // No de-opts!
    

    Remember, this is only worth it if you are calling these functions at a large frequency.

    Although the utilization of polymorphism in JavaScript is convenient, it is also expensive. If possible, send the same types into functions. In other words, call functions monomorphically.

  3. Now, back to bob.

    The only difference between bob and a normal Person object is that bob now has a new gender property. Because of this difference, and because JavaScript needs to access these objects in memory differently, JavaScript actually treats bob and a traditional Person object as two completely different types in terms of optimization. This means that sending in bob and sam to a function is as severe as sending an Array and a String to a function!

    This can be harmful to optimization. Keep this in mind when passing objects into functions. Just because they have the same underlying class does not mean JavaScript will treat them as the same type.

  4. Avoid boxing (or wrapping) primitives.

    Remember in one of the previous chapters, I mentioned that a primitive gets wrapped by an object wrapper when treated like an object (e.g. calling one of its methods or adding to it a property). This is also known as boxing.

    Also remember that objects are stored on the stack as a pointer to the actual object stored in the heap.

    Primitives, on the other hand, are stored in memory directly, so there is much faster access to them.

    (Writing in progress...)

results matching ""

    No results matching ""