Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how are vectors, matrices, and data frames implemented in R?

I've been trying to learn about the different data structures used in popular languages that I have experience with such as lists and dictionaries in Python, associative arrays in PHP (essentially hash tables), vectors in C++, etc.

I have a lot of colleagues that use R religiously and I was wondering how vectors, matrices, and data frames are implemented in R. What are their strengths and weaknesses? I was looking through source code but I couldn't find the data structures themselves. Where in the source code are these definitions located?

like image 215
grasingerm Avatar asked Dec 18 '12 21:12

grasingerm


People also ask

How do you implement a matrix in R?

To create a matrix in R you need to use the function called matrix(). The arguments to this matrix() are the set of elements in the vector. You have to pass how many numbers of rows and how many numbers of columns you want to have in your matrix. Note: By default, matrices are in column-wise order.

What is matrix and data frame in R?

In a data frame the columns contain different types of data, but in a matrix all the elements are the same type of data. A matrix in R is like a mathematical matrix, containing all the same type of thing (usually numbers). R often but not always lets these be used interchangably.

What is a vector and matrix in R?

Vectors in R are the same as the arrays in C language which are used to hold multiple data values of the same type. Vectors can also be used to create matrices. Matrices can be created with the help of Vectors by using pre-defined functions in R Programming Language.

How do you create a data frame in RStudio?

How to Create a Data Frame. We can create a dataframe in R by passing the variable a,b,c,d into the data. frame() function. We can R create dataframe and name the columns with name() and simply specify the name of the variables.


2 Answers

As already mentioned, check out the "R internals" manual, as well as this part of "Writing R extensions".

like image 158
Theodore Lytras Avatar answered Sep 27 '22 16:09

Theodore Lytras


From R Internals, 1.1 SEXPs:

... the basic building blocks of R objects are often called nodes... Both types of node structure have as their first three fields a 32-bit spxinfo header and then three pointers (to the attributes and the previous and next node in a doubly-linked list)

So vectors in R are implemented as a doubly-linked list. And, it even appears that there is no data structure smaller than a single-node linked list. This is evident by:

> a <- 4
> a[1]
4

As mentioned by others: builtin.c has do_makevector and do_makelist, and array.c has the source for do_matrix. In addition array.c contains source for allocMatrix and memory.c contains the source for allocVector.

While a lot of things that were going on were over my head, it seems evident that a matrix is simply a doubly-linked list of doubly-linked lists. I believe (though am unsure) that row and column names (like those stored in a data frame) are stored in the 'attributes' of each list.

The response to the "what the strengths and weaknesses" of the implementation of the data structures would be that (from my limited knowledge) doubly linked lists have a strength in that dynamic memory allocation is simpler and doesn't require the overhead of copying and reallocating an entire array, and the weakness being that (depending on how many pointers there are to the list: head, tail, middle, quarters, etc.) accessing a random value v[99] can take the overhead of iterating through several elements before the desired one is found.

Is this correct?

like image 28
grasingerm Avatar answered Sep 27 '22 18:09

grasingerm