Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between a list and a pairlist in R?

Tags:

list

r

In reading the documentation for lists, I found references to pairlists, but it wasn't clear to me how they were different from lists.

like image 401
Christopher Bottoms Avatar asked Apr 02 '15 15:04

Christopher Bottoms


2 Answers

Pairlists in day to day R

There are two places that pair lists will show up commonly in day to day R. One is as function formals:

str(formals(var))

The other is as language objects. For example:

quote(1 + 1)

produces a pairlist of type language (LANGSXP internally). The principal reason why you would even care about being aware of this is that operations such as length(<language object>) or language_object[[x]] can be slow because of how pairlist are stored internally (though long pairlist language objects are somewhat rare; note expressions are not pairlists).

Note that empty elements are just zero length symbols, and you can actually store them in lists if you cheat a bit (though you probably shouldn't do this):

list(x=substitute(x, alist(x=)))  # hack alert

All that said, for the most part, OP is correct that you don't need to worry about pairlists too much unless you are writing C code for use in R.

Internal differences between lists and pairlists

Pairlists and list are different principally in their storage structure. Pairlists are stored as a chain of nodes, where each node points to the location of the next node in addition to the node's contents and the node's "name" (See CAR/CDR wiki article for generic discussion). Among other things this means you can't know how many elements there are in a pairlist unless you know what element is the first one, and you then traverse the entire list.

Pairlists are used extensively in the R internals, and do exist in normal R use, but most of the time are disguised by the print or access methods and/or coerced to lists when accessed.

Lists are also a list of addresses, but unlike pairlists, all the addresses are stored in one contiguous memory location and the total length is tracked. This makes it easy to access any arbitrary member of the list by location since you can just look up the address in the memory table. With a pairlist, you would have to jump from node to node until you eventually got to the desired node. Names are also stored as attributes of the list proper, instead of being attached to each node of a pairlist.

Benefits of pairlists

One (generally small) benefit of pairlists is that you can add to them with minimal overhead since you only need modify at most two nodes (the node ahead of the new node, and the new node itself), whereas with a list you may need to re-allocate the entire address table with an increase in size (this is typically not much of an issue since the address table is usually very small compared to the size of the data the table points to). There are also many algorithms that specialize in pairlist manipulation (e.g. sorting, indexing, etc.), but those can be ported to normal lists as well.

Less relevant for day-to-day use since you can only do this in internals, it is very easy to modify list from a programming perspective by changing what any arbitrary element points to.

Loosely related to the above, pairlists are likely be more efficient when you have highly nested objects. lists can easily replicate this structure, but each list and nested list will be saddled with the extra memory address table. This is likely the reason pairlists are used for language objects that very likely have a high nesting / element ratio.

For more details see R Internals (look for LISTSXP and VECSXP, pairlists and lists respectively, in the linked location).

edit: interestingly an experiment to compare the memory footprint of a list to a pairlist shows the pairlist to be larger, so the storage efficiency argument may be incorrect (not sure if object.size can be trusted here):

> plist_to_list <- function(x) {
+   if(is.call(x)) x <- as.list(x)
+   if(length(x) > 1) for(i in 2:length(x)) x[[i]] <- Recall(x[[i]])
+   x
+ }
> add_quote <- function(x, y) call("+", x, y)
> x <- Reduce(add_quote, lapply(letters, as.name))
> object.size(x)
7056 bytes
> y <- plist_to_list(x)
> object.size(y)
4656 bytes
like image 180
BrodieG Avatar answered Nov 03 '22 00:11

BrodieG


First of all, pairlists are deprecated

pairlists are deprecated for normal use because "generic vectors" are typically more efficient. You won't ever need to worry about them unless you are working on R internals.


lists can contain named elements

Each element in a list in R can have a name. You can access each element in a list either by name or by its numerical index.

Here is an example of a list in which the second element is named 'second':

> my.list <- list('A',second='B','C')
> my.list
[[1]]
[1] "A"

$second
[1] "B"

[[3]]
[1] "C"

All elements can be indexed by its position in the list. Named elements can additionally be accessed by name:

> my.list[[2]]
[1] "B"
> my.list$second
[1] "B"

Also, each element in a list is a vector, even if it is only a vector containing a single element. For more about lists, see How to Correctly Use Lists in R?.


pairlists can contain empty named elements

A pairlist is basically the same as a list, except that a pairlist can contain an empty named element, but a list cannot. Also, a pairlist is constructed using the alist function.

> list('A',second=,'C')
Error in as.pairlist(list(...)) : argument is missing, with no default
> alist('A',second=,'C')
[[1]]
[1] "A"

$second


[[3]]
[1] "C"

But, as mentioned earlier, they are deprecated. They do not have any benefit or advantage over lists that I know of.

like image 32
Christopher Bottoms Avatar answered Nov 02 '22 22:11

Christopher Bottoms