Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Racket, what is the advantage of lists over vectors?

In my experience with Racket so far, I've not given much thought to vectors, because I gathered that their main benefit — constant-time access to elements — was not significant until you're working with lot of elements.

This doesn't seem quite accurate, however. Even with a small number of elements, vectors have a performance advantage. For example, allocating a list is slower than allocating a vector:

#lang racket

(time (for ([i (in-range 1000000)]) (make-list 50 #t)))
(time (for ([i (in-range 1000000)]) (make-vector 50 #t)))

>cpu time: 1337 real time: 1346 gc time: 987
>cpu time: 123 real time: 124 gc time: 39

And retrieving an element is slower too:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 1000000)]) (list-ref l 49)))
(time (for ([i (in-range 1000000)]) (vector-ref v 49)))

>cpu time: 77 real time: 76 gc time: 0
>cpu time: 15 real time: 15 gc time: 0

BTW this performance ratio holds if we increase to 10 million:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 10000000)]) (list-ref l 49)))
(time (for ([i (in-range 10000000)]) (vector-ref v 49)))

>cpu time: 710 real time: 709 gc time: 0
>cpu time: 116 real time: 116 gc time: 0

Sure, these are synthetic examples, to the extent that most programs don't allocate structures or use list-ref a million times in a loop. (And yes, I'm deliberately grabbing the 50th element to illustrate the performance difference.)

But they're also not, because across a whole program that relies on lists, you're going to be incurring a little extra overhead every time you touch those lists, and all those little inefficiencies will add up into slower running time for the overall program.

Thus my question: why not just use vectors all the time? In what situations should we expect better performance from lists?

My best guess is that because it's just as fast to get an item off the front of a list, e.g.:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 1000000)]) (list-ref l 0)))
(time (for ([i (in-range 1000000)]) (vector-ref v 0)))

>cpu time: 15 real time: 16 gc time: 0
>cpu time: 12 real time: 11 gc time: 0

... that lists are preferred in recursion siutations, because you're mostly working with cons and car and cdr, and it saves space to work with a list (vectors cannot be broken and put back together without copying the whole vector, right?)

But in situations where you're storing and retrieving data elements, it seems that vectors have the advantage, no matter the length.

like image 597
Matthew Butterick Avatar asked Dec 20 '14 21:12

Matthew Butterick


People also ask

What are the advantages of list over vector in R?

Vector may have a default size. List does not have default size. In vector, each element only requires the space for itself only. In list, each element requires extra space for the node which holds the element, including pointers to the next and previous elements in the list.

What is the main difference between a list and a vector?

A list holds different data such as Numeric, Character, logical, etc. Vector stores elements of the same type or converts implicitly. Lists are recursive, whereas vector is not. The vector is one-dimensional, whereas the list is a multidimensional object.

Are lists immutable in racket?

In Racket, lists are immutable, which means that whenever you insert, delete an element from a list, an entire new list is constructed and returned and for large lists, this is very inefficient.

Which is better list or vector in C++?

For a small number of elements, the vector is comparatively much faster than the List as the searching and the insertion becomes very easy and the array which is implemented in the Vector can be traversed and accessed easily.


2 Answers

Since list-ref uses time linear to the index, it is rarely okay to use unless for short lists. If the access pattern is sequential however and the number of elements can vary, then lists are fine. It would be interesting to see a benchmark for summing the elements of a 50 element long list of fixnums.

The access pattern to a data structure is not always sequential though.

Here is how I choose which data structure to use in Racket:

DATA STRUCTURE   ACCESS       NUMBER     INDICES
List:            sequential   Variable   not used
Struct:          random       Fixed      names
Vector:          random       Fixed      integer
Growable vector: random       Variable   integer
Hash:            random       Variable   hashable
Splay:           random       Variable   non-integer, total order
like image 164
soegaard Avatar answered Sep 18 '22 14:09

soegaard


Vectors are the same as arrays in most programming languages. As any arrays they have a fixed size they have O(1) access/update. Increasing the size is expensive since you need to copy every element to a new vector of a larger size. If you do a loop across all elements you can do it O(n).

Lists are singly linked lists. They have dynamic size, but random access/update is O(n). Accessing/modifying the head of the list is O(1) so if you iterate from beginning to end or create from end to beginning. Since list iteration do that each step a whole iteration over n elements is still done O(n) as with vectors. Doing list-ref instead would make it O(n^2) so you don't.

The reason why you have both lists and vectors are because they both have strength and weaknesses. Lists are the heart of functional programming languages as they can be used as immutable objects. You chain one and one pair in each iteration and you end up with a list with size determined by the full procedure. Imaging this:

(define odds (filter odd? lst)) 

This takes a list of numbers of any size and creates a new list with all the odd numbers in the the list. In order to do this with vector you need to do two passes. One that checks what size the resulting vector should have and one that copies every odd element from the old to the new. However, if you need to have random access to any element at any time vectors (or hash tables if you're programming in #!racket) are the obvious choice.

like image 23
Sylwester Avatar answered Sep 18 '22 14:09

Sylwester