Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are there so many data structures absent from high level languages?

Why is it that higher level languages (Javascript, PHP, etc.) don't offer data structures such as linked lists, queues, binary trees, etc. as part of their standard library? Is it for historical/practical/cultural reasons or is there something more fundamental that I'm missing.

like image 550
Olivier Lalonde Avatar asked Nov 27 '10 16:11

Olivier Lalonde


2 Answers

linked lists

You can implement a linked list fairly easily in most dynamic languages, but they aren't that useful. For most cases, dynamic arrays (which most dynamic languages have built-in support for) are a better fit: they have better usage and cache coherence, better performance on indexed lookup, and decent performance on insert and delete. At the application level, there aren't many use cases where you really need a linked list.

queues

Easily implemented using a dynamic array.

binary trees

Binary trees are easily implemented in most dynamic languages. Binary search trees are, as ever, a chore to implement, but rarely needed. Hashtables will give you about the same performance and often better for many use cases. Most dynamic languages provide a built-in hashtable (or dictionary, or map, or table, or whatever you want to call it).

It's definitely important to know these foundational data structures, and if you're writing low-level code, you'll find yourself using them. At the application level where most dynamic languages are used though, hashtables and dynamic arrays really do cover 95% of your data structure needs.

like image 70
munificent Avatar answered Oct 22 '22 06:10

munificent


There are many definitions of "high level" when it comes to languages, but I will take it in this context to refer to languages that are purpose-built for a specific domain (4GL anyone?). Such "specific domains" are typically restricted in scope, for example: web page construction, report writing, database querying, etc. Within that limited scope, there is frequently little need for anything but the most basic data structures.

Is There a Need?

Let's consider the case of Javascript. The scope of this language was originally very bounded, being a scripting language than ran within the confines of a web browser. It was concerned primarily with providing a small amount of dynamic behaviour on otherwise static web pages. Furthermore, the limitations of the technology made it impractical to write large components in that environment (notably performance and the sandbox model).

Since Javascript was confined to address "small problems", there was little need for a rich set of data structures. As data structures go, the Javascript's map is very flexible. You must remember that Basic and FORTRAN went a long way providing only arrays -- maps are considerably more flexible than that. Javascript appears to be undergoing a transformation, escaping the sandbox. Some very ambitious systems are being built in Javascript, both within and outside of the browser. And the technology is advancing to keep up with it (witness the new Javascript engines, persistence models, and so on). I anticipate that the demand for more interesting data structures will increase, and that the demand will be met.

Library capabilities generally appear as needs arise. Many of the basic data structures are so easy to implement that it hardly seems worth adding them to a library -- especially if that library needs to go through some sort of standardization process. This is why so many languages (of all levels) do not provide them out of the box. But I think that there is another force at work that will change all that... the rise of multiprogramming.

A New Need Arising?

It wasn't too long ago that the code that most developers wrote ran within the confines of a single thread. But now, our systems are full of threads, web workers, agents, coroutines, clusters, clouds and all manner of concurrent systems. This changes the whole complexion of implementing data structures from scratch.

In a single-threaded context, it is trivial to implement a linked list in almost any language. But add concurrency to the mix and now it takes a great deal of effort to get it right. One really needs to be a specialist to stand a chance at all. That is why you see rich collection frameworks in all the latest languages. The need to share data structures across thread boundaries (or worse) is being fulfilled.

History... But Not The Future

So, to summarize, I think the reason why rich data structures are conspicuously absent from many languages is largely historical. The need was not great enough to justify the effort. But there are new forces at work, in the form of highly concurrent systems, that force language libraries to provide industrial strength implementations of the richer data structures.

like image 39
WReach Avatar answered Oct 22 '22 08:10

WReach