 

Why is 'delete' slow in javascript?

I just stumbled upon this jsperf result: http://jsperf.com/delet-is-slow

It shows that using delete is slow in JavaScript, but I am not sure I understand why. What is the JavaScript engine doing behind the scenes to make things slow?

asked Dec 11 '22 by Spearfisher

2 Answers

I think the question is not really why delete itself is slow; the speed of a single delete operation is hardly worth measuring.

The jsperf test that you link to does the following:

  • Create two arrays of 6 elements each.
  • Delete the element at one of the indexes of one of the arrays.
  • Iterate through all the indexes of each array.

The script shows that iterating through an array on which delete was applied is slower than iterating through a normal array.

The question you should ask yourself is: why does delete make an array slow?

The engine internally stores array elements in a contiguous block of memory and accesses them using a numeric index.

That's what is often called a fast-access array.

If you delete one of the elements in this ordered, contiguous sequence, you force the array to switch into dictionary mode: what used to be the exact location of the item in the array (the numeric index) becomes a key in a dictionary under which the engine has to look the element up.

So iterating becomes slow, because you no longer simply move to the next spot in memory; instead you perform a hash lookup over and over again.
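
A minimal sketch that roughly reproduces what the jsperf test measures (exact numbers and the internal representations involved vary by engine and version):

// Compare iterating a normal array with iterating one that had delete applied.
function sum(arr) {
    var total = 0;
    for (var i = 0; i < arr.length; i++) total += arr[i] || 0; // || 0 skips the hole
    return total;
}

var dense = [1, 2, 3, 4, 5, 6];
var holey = [1, 2, 3, 4, 5, 6];
delete holey[3]; // leaves a hole; many engines abandon the fast, contiguous representation

var t = Date.now();
for (var n = 0; n < 1000000; n++) sum(dense);
console.log('dense: ' + (Date.now() - t) + 'ms');

t = Date.now();
for (var n = 0; n < 1000000; n++) sum(holey);
console.log('holey: ' + (Date.now() - t) + 'ms');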

answered Dec 22 '22 by Adrian Salazar


You'll get a lot of answers here about micro-optimisation, but delete really does have severe problems in certain scenarios, where it becomes incredibly slow, and people writing JS should be aware of them. To my knowledge these are edge cases and may or may not apply to you.

I recommend profiling and benchmarking in different browsers to detect these anomalies.

I honestly don't know the exact reasons, as I tend to just work around it, but I would guess some combination of quirks in the GC (it might be getting invoked too often), brutal rehashing, optimisations aimed at other cases, and awkward object structure / bad time complexity.

The cases usually involve moderate to large numbers of keys, for example:

Delete from objects with many keys:

(function() {
    var o = {}, s, i, c = console;
    s = new Date(); for (i = 0; i < 1000000; i += 10) o[i] = i;  c.log('Set: ' + (new Date() - s));
    s = new Date(); for (i = 0; i < 50000; i += 10) delete o[i]; c.log('Delete: ' + (new Date() - s));
})();

Chrome: Set: 21 Delete: 2084

Firefox: Set: 74 Delete: 2

I have encountered a few variations of this and they are not always easy to reproduce. A telltale sign is that it usually seems to degrade exponentially. In one case in Firefox, delete inside a for...in loop would degrade to around 3-6 operations a second, whereas deleting while iterating over Object.keys would be fine.
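
For illustration, the two patterns referred to above look roughly like this (obj and shouldRemove here are made-up examples; whether the slowdown shows up depends heavily on the engine and version):

var obj = { a: 1, b: 2, c: 3, d: 4 };
function shouldRemove(k) { return k === 'b' || k === 'd'; } // example predicate

// Pattern that degraded badly in that Firefox case: delete while iterating with for...in
for (var k in obj) {
    if (shouldRemove(k)) delete obj[k];
}

// Workaround that behaved fine there: snapshot the keys first, then delete over that list
Object.keys(obj).forEach(function (k) {
    if (shouldRemove(k)) delete obj[k];
});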

I personally tend to think that these cases can be considered bugs. You get massive, disproportionate performance degradation that you can work around in ways that shouldn't change the time or space complexity, or that might even normally make performance moderately worse. This means that, treating JS as a declarative language, the implementation/optimisations get it wrong. Map does not have the same problem with delete, as far as I have seen.
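
If you want to check the Map comparison yourself, a benchmark mirroring the object one above is easy to write (numbers will differ per engine):

(function() {
    var m = new Map(), s, i;
    s = new Date(); for (i = 0; i < 1000000; i += 10) m.set(i, i); console.log('Map set: ' + (new Date() - s));
    s = new Date(); for (i = 0; i < 50000; i += 10) m.delete(i);   console.log('Map delete: ' + (new Date() - s));
})();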

Reasons:

To be sure, you would have to look into the source code or run some profiling.

delete in various scenarios can change speed arbitrarily based on how engines are written and this can change from version to version.

JavaScript objects tend not to be used with large numbers of properties, and delete is called relatively infrequently in everyday usage. At the same time, objects are used heavily as part of the language itself (they're essentially associative arrays); virtually everything relies on the object implementation. If you create a function, that creates an object; even a numeric literal gets wrapped in an object when you treat it as one (for example, when calling a method on it).

It's quite possible for delete to be slow purely because it hasn't been well optimised (whether through neglect or other priorities) or due to mistakes in the implementation.

There are some common possible causes aside from a lack of optimisation and outright mistakes.

Garbage Collection:

A poorly implemented delete function may inadvertently trigger garbage collection excessively.

Garbage collection has to iterate over everything in memory to find out if there are any orphans, traversing variables and references as a graph.

The need to detect circular references can make GC especially expensive. GC without circular reference detection can be done using reference counters and a simple check. Circular reference checking requires traversing the reference graph or another structure for managing references and in either case that's a lot slower than using reference counters.
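
As a concrete illustration of why reference counting alone is not enough:

// Two objects that reference each other form a cycle.
var a = {};
var b = {};
a.other = b;
b.other = a;

// Drop the only outside references. With pure reference counting each object
// still has a count of 1 (the other's pointer), so neither would ever be freed;
// a tracing GC has to walk the reference graph to see that both are unreachable.
a = null;
b = null;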

In normal implementations, rather than freeing and rearranging memory every time something is deleted, things are usually just marked as deletable, with the GC deferred so that it can run in bulk or in batches intermittently.

Mistakes can potentially lead to a full GC being triggered too frequently. That might be someone having put in an instruction to run the GC on every delete, or a mistake in its statistical heuristics.

Resizing:

For large objects in particular, the memory remapping needed to shrink them might not be well optimised. When the engine has dynamically sized structures internally, it can be expensive to resize them. Engines also have varying levels of their own memory management on top of what the operating system provides, which significantly complicates things.

Even where an engine manages its own memory efficiently, a delete that frees an object without the need for a full GC run (because it has no circular references) can still trigger the need to rearrange memory internally to fill the gap.

That may involve allocating a block of the new size, then copying everything from the old memory to the new memory before freeing the old memory. In some cases it can also require updating all of the pointers to the old memory (i.e. where a pointer-to-pointer indirection [all roads lead to Rome] wasn't used).
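
Conceptually the work is similar to this JavaScript-level sketch of shrinking a backing store (purely illustrative; real engines do this in native code on raw memory):

// Illustrative only: shrink a "backing store" by allocating a smaller block
// and copying the surviving entries across, the way an engine might compact storage.
function shrink(backing, liveCount) {
    var smaller = new Array(liveCount);  // allocate new, smaller storage
    var j = 0;
    for (var i = 0; i < backing.length; i++) {
        if (backing[i] !== undefined) smaller[j++] = backing[i]; // copy survivors
    }
    return smaller;                      // the old storage becomes garbage
}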

Rehashing:

The engine may also rehash on deletes (the object is, after all, an associative array). Often implementations only rehash on demand, when there are hash collisions, but some engines may also rehash on deletes.

Rehashing on deletes prevents a problem where you add 10 items to an object, then add a million more items, then remove that million, and the object left with its 10 items both takes up more memory and is slower than it should be.

A hash with ten items needs ten slots with an optimal hash, though in practice it will be rounded up to 16 slots. At the size of a pointer, that's 16 * 8 bytes, or 128 bytes. When you add the million items it needs about a million slots, which rounds up to 2^20, or roughly 8 megabytes at 8 bytes per slot. If you delete the million keys without rehashing, the object with its ten items is taking up 8 megabytes when it only needs 128 bytes. That makes it important to rehash on item removal.

The problem with this is that when you add an item you know whether you need to rehash, because there will be a collision. When deleting, you don't know whether you need to rehash or not. It's easy to make a performance mistake there and rehash on every delete.

There are a number of strategies for choosing reasonable downhashing intervals, although without making things complicated it is easy to make a mistake. Simple approaches (e.g. time since the last rehash, used size versus minimum size, history of collisions, tombstone keys with bulk deletion as part of GC, etc.) tend to work moderately well on average, but can also easily get stuck in corner cases. Some engines might switch to a different hash implementation for large objects (for example a nested one), whereas others might try to use a single implementation for everything.
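
One of the simpler strategies might look something like this sketch (the table shape, thresholds and the rehashInto helper are all invented purely for illustration):

// Hypothetical shrink heuristic: after a delete, rebuild into a smaller table
// once the number of live entries falls well below the table's capacity.
function maybeShrinkAfterDelete(table) {
    var used = table.count;            // live entries
    var capacity = table.slots.length; // allocated slots
    if (capacity > 16 && used < capacity / 4) {
        rehashInto(table, capacity / 2); // hypothetical helper: rehash into half the slots
    }
}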

In simple implementations, rehashing tends to work the same way as resizing: make an entirely new table and insert the old entries into it. However, rehashing requires more work per entry, since every key has to be hashed again into the new table.

No Bulk:

The language doesn't let you delete a bunch of things at once from a hash. It's usually much more efficient to delete items from a hash in bulk (most operations on the same structure work better in bulk), but the delete keyword only lets you remove properties one by one. That makes it slow by design for cases with many deletes on the same object.

Because of this, only a handful of implementations of delete would be comparable in speed to creating a new object and inserting only the items you want to keep (though this doesn't explain why, in some engines, delete is slow in its own right for a single call).
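
The rebuild-instead-of-delete workaround mentioned above looks roughly like this (worth benchmarking against repeated delete for your own data and engine):

// Instead of deleting many keys one by one, build a fresh object
// containing only the entries you want to keep.
function withoutKeys(obj, keysToDrop) {
    var drop = {};
    keysToDrop.forEach(function (k) { drop[k] = true; });

    var result = {};
    Object.keys(obj).forEach(function (k) {
        if (!drop[k]) result[k] = obj[k];
    });
    return result;
}

// Example: keep everything except 'b' and 'c'.
var trimmed = withoutKeys({ a: 1, b: 2, c: 3, d: 4 }, ['b', 'c']);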

V8:

Slow delete of object properties in JS in V8

Apparently this was caused by switching between implementations, and by a problem similar to, but different from, the downsizing problem seen with hashes (downsizing to a flat array / linear run-through implementation rather than a smaller hash). At least I was close.

Downsizing is a fairly common problem in programming which results in what is somewhat akin to a game of Jenga.

answered Dec 21 '22 by jgmjgm