How are vectors, matrices, and data frames implemented in R?

Tags: c, data-structures, r

I've been trying to learn about the data structures used in the popular languages I have experience with, such as lists and dictionaries in Python, associative arrays in PHP (essentially hash tables), vectors in C++, etc.

I have a lot of colleagues who use R religiously, and I was wondering how vectors, matrices, and data frames are implemented in R. What are their strengths and weaknesses? I looked through the source code but couldn't find the data structures themselves. Where in the source code are these definitions located?

asked Dec 18 '12 21:12

grasingerm

2 Answers

As already mentioned, check out the "R internals" manual, as well as this part of "Writing R extensions".

answered Sep 27 '22 16:09

Theodore Lytras

From R Internals, 1.1 SEXPs:

... the basic building blocks of R objects are often called nodes... Both types of node structure have as their first three fields a 32-bit sxpinfo header and then three pointers (to the attributes and the previous and next node in a doubly-linked list)

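So every R object, including a vector, is a node with a header. Note that the doubly-linked list in that quote is the list of nodes that the garbage collector walks, not the layout of a vector's elements: the elements themselves are stored contiguously right after the header. It also appears that there is no data structure smaller than a vector of length one, since even a single number can be indexed: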

> a <- 4
> a[1]
[1] 4
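
To make that layout concrete, here is a simplified C sketch of a vector node: a small header followed directly by the element data in one contiguous allocation. The names `ToyVec`, `toy_alloc`, `toy_scalar` and `toy_get` are hypothetical and are not R's actual definitions; the sketch only illustrates why a scalar is a vector of length one and why indexing is a constant-time offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an R vector node.
   Not R's real struct: the real header also carries type and GC bits
   plus the previous/next pointers used by the garbage collector. */
typedef struct ToyVec {
    struct ToyVec *attrib;  /* attributes (e.g. dim, names); NULL if none */
    size_t length;          /* number of elements */
    double data[];          /* elements stored contiguously after the header */
} ToyVec;

/* Allocate header and data in one block; elements start at zero. */
static ToyVec *toy_alloc(size_t n) {
    ToyVec *v = calloc(1, sizeof(ToyVec) + n * sizeof(double));
    if (v != NULL) {
        v->attrib = NULL;
        v->length = n;
    }
    return v;
}

/* A "scalar" is simply a vector of length 1, like `a <- 4` in R. */
static ToyVec *toy_scalar(double x) {
    ToyVec *v = toy_alloc(1);
    if (v != NULL) v->data[0] = x;
    return v;
}

/* 1-based element access, mirroring a[i] in R: a constant-time offset. */
static double toy_get(const ToyVec *v, size_t i) {
    assert(i >= 1 && i <= v->length);
    return v->data[i - 1];
}
```

With this sketch, `toy_get(toy_scalar(4.0), 1)` plays the role of `a[1]` in the R session above.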

As mentioned by others: in the R sources, src/main/builtin.c has do_makevector and do_makelist, and src/main/array.c has the source for do_matrix. In addition, array.c contains the source for allocMatrix, and src/main/memory.c contains the source for allocVector.

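While a lot of what was going on was over my head, a matrix is not a list of lists: it is an ordinary vector, stored column by column, that carries a 'dim' attribute giving the number of rows and columns. Row and column names of a matrix are stored in its 'dimnames' attribute. A data frame is a list of equal-length column vectors whose attributes hold the column names ('names'), the row names ('row.names'), and the class 'data.frame'.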
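
R stores a matrix as a flat vector with a 'dim' attribute, filled column by column. The following C sketch shows how a two-dimensional index maps onto that flat storage. `ToyMatrix` and `toy_mat_get` are hypothetical names, and the dimensions are plain struct fields here rather than an attribute as in R.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a matrix: a flat array plus its dimensions.
   In R the dimensions live in the vector's "dim" attribute. */
typedef struct {
    const double *data;  /* nrow * ncol elements, column by column */
    size_t nrow;
    size_t ncol;
} ToyMatrix;

/* 1-based m[i, j] using column-major order, as R does. */
static double toy_mat_get(const ToyMatrix *m, size_t i, size_t j) {
    assert(i >= 1 && i <= m->nrow && j >= 1 && j <= m->ncol);
    return m->data[(i - 1) + (j - 1) * m->nrow];
}
```

For the values 1 to 6 laid out as a 2 x 3 matrix (the equivalent of `matrix(1:6, nrow = 2)`), `toy_mat_get(&m, 1, 2)` reads offset 2 of the flat array and yields 3.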

The answer to "what are the strengths and weaknesses" of these data structures would be that (from my limited knowledge) contiguous storage makes accessing an arbitrary element such as v[99] a constant-time offset calculation, with no need to walk through earlier elements. The weakness is that growing a vector is costly: appending an element generally means allocating a new, larger vector and copying the existing elements into it, which adds up when done repeatedly in a loop.
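
The cost of growing contiguous storage can be sketched in C as follows. `toy_append` is a hypothetical helper, not a function from the R sources; it builds a new array and copies every old element, which is roughly what happens behind `v <- c(v, x)` in R.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NEW array holding the n old elements plus x.
   The old array is left untouched. Each call costs O(n) because every
   existing element is copied. The caller frees the returned array. */
static double *toy_append(const double *old, size_t n, double x) {
    assert(old != NULL || n == 0);
    double *grown = malloc((n + 1) * sizeof(double));
    if (grown == NULL) return NULL;
    if (n > 0) memcpy(grown, old, n * sizeof(double));
    grown[n] = x;
    return grown;
}
```

Appending n elements one at a time this way copies on the order of n squared elements in total, which is why preallocating a vector of the final length is the usual advice in R.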

Is this correct?

answered Sep 27 '22 18:09

grasingerm
