
How to create a list in R from two vectors (one would be the keys, the other the values)?

Q: How do I create a vector list in R?

You can stick a vector (a restricted structure where all components have to be of the same type) into a list (unrestricted). But you cannot do the reverse. Use lists of lists of lists ... and then use lapply et al to extract.

Q: How do I make a list of values in R?

We can use the list() function to create a list. Another way to create a list is to use the c() function. c() coerces its elements to a common type, so if any of the elements is a list, all elements become components of a list.
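As a minimal sketch of the difference between the two functions (the variable names are only illustrative):

```r
# list() keeps each argument as a separate component, whatever its type
a <- list(1:3, "x", TRUE)
length(a)    # three components

# c() on atomic values coerces them to one common type (character here)
b <- c(1, "x", TRUE)
class(b)

# if any argument is a list, c() returns a list with one component per element
d <- c(list(1, 2), "x", 3:4)
class(d)
length(d)    # 2 + 1 + 2 components
```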

Q: How do I combine vectors in R?

Vectors can be concatenated with the combine function c(). For example, three vectors x, y, and z are concatenated with c(x, y, z). Vectors of different types can also be concatenated in the same call, in which case the result is coerced to a common type.

Q: Is a list the same as a vector in R?

A list is actually still a vector in R, but it's not an atomic vector. We construct a list explicitly with list() but, like atomic vectors, most lists are created some other way in real life.
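A short sketch of that distinction using base R predicates (the example values are arbitrary):

```r
v <- 1:3
l <- list(1, "a", TRUE)

is.vector(v)    # an atomic vector is a vector
is.vector(l)    # a plain list also counts as a vector
is.atomic(v)    # TRUE for the integer vector
is.atomic(l)    # FALSE: a list is not atomic

# lists often arrive as return values rather than from list(), e.g. strsplit()
parts <- strsplit("a,b,c", ",")
is.list(parts)
```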

Tags:

r

I have two vectors and I want to create a list in R where one vector provides the keys and the other the values. I thought I would easily find the answer in my books or by googling around, and I was expecting a solution like the one for adding names to a vector (names(v) <- names_vector), but I failed.

I have come up with two possible solutions myself, but neither of them seems elegant to me. R is not my main programming language, but I assume that, R being so pragmatic, a better solution should exist (something like list(keys=x, values=y)).

My solution 1: the classical loop solution:

    xx <- 1:3
    yy <- letters[1:3]
    zz <- list()
    # store each value under its key, one element at a time
    for (i in seq_along(yy)) zz[[yy[i]]] <- xx[i]

My solution 2: an indirect path through named vectors:

    names(xx) <- letters[1:3]
    as.list(xx)

It seems that I have a solution, but my vectors have 1 million or more elements, and I am worried not only about coding style (important to me) but also about efficiency (I don't know how to do profiling in R). Is there a more appropriate way of doing this? Is it good practice to use the named vector shortcut?

[[UPDATE]] My apologies, I probably oversimplified the question to make it reproducible. I wanted to give names to the elements of a list. I tried names() first, but it seems I did something wrong and it did not work, so I got the wrong idea that names() does not work with lists. It does indeed work, as shown by the accepted answer.
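For reference, a minimal sketch of naming the elements of an existing list directly with names(), using the same toy vectors as above:

```r
xx <- 1:3
yy <- letters[1:3]

zz <- as.list(xx)    # unnamed list, one component per value
names(zz) <- yy      # attach the keys as names

names(zz)            # the keys
zz[["b"]]            # the value stored under key "b"
```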

989

asked Jul 06 '13 15:07

Pablo Marin-Garcia

3 Answers

If your values are all scalars, then there's nothing wrong with having a "key-value store" that's just a vector.

    vals <- 1:1000000
    keys <- paste0("key", 1:1000000)
    names(vals) <- keys

You can then retrieve the value corresponding to a given key with

    vals["key42"]
    key42
       42

IIRC R uses hashing for character-based indexing, so lookups should be fast regardless of the size of your vector.
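Whether lookup time really is independent of the vector's size is easy to measure. The following is only a sketch using base R's system.time(); the helper name time_lookup, the sizes, and the repetition count are arbitrary choices made for illustration, and the timings will vary by machine and R version.

```r
# time repeated single-key lookups on a named vector with n elements
# (time_lookup is a hypothetical helper, not part of any package)
time_lookup <- function(n, reps = 10) {
  vals <- seq_len(n)
  names(vals) <- paste0("key", vals)
  system.time(for (i in seq_len(reps)) vals["key42"])[["elapsed"]]
}

small <- time_lookup(1e3)
large <- time_lookup(1e6)
c(small = small, large = large)
```

If the second number comes out much larger than the first, lookups are not constant-time in practice. The microbenchmark results further down this page report a median of roughly 115 milliseconds per lookup on a million-element named vector.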

If your values can be arbitrary objects, then you do need a list.

    vals <- list(1:100, lm(speed ~ dist, data=cars), function(x) x^2)
    names(vals) <- c("numbers", "model", "function")

    sq <- vals[["function"]]
    sq(5)
    [1] 25

~~If your question is about constructing the list, I wouldn't be too worried. R internally is copy-on-write (objects are only copied if their contents are modified), so doing something like~~

    vals <- list(1:1000000, 1:1000000, <other big objects>)

will not actually make extra copies of everything.

Edit: I just checked, and R will copy everything if you do lst <- list(....). Go figure. So if you're already close to the memory limit on your machine, this won't work. On the other hand, if you do names(lst) <- ...., it won't make another copy of lst. Go figure again.

102

answered Oct 21 '22 08:10

Hong Ooi

It can be done in one statement using setNames:

    xx <- 1:3
    yy <- letters[1:3]

To create a named list:

    as.list(setNames(xx, yy))
    # $a
    # [1] 1
    #
    # $b
    # [1] 2
    #
    # $c
    # [1] 3

Or a named vector:

    setNames(xx, yy)
    # a b c
    # 1 2 3

In the case of the list, this is programmatically equivalent to your "named vector" approach but maybe a little more elegant.
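Once built, the named list can be used as a key-value lookup. A small sketch with the same toy vectors (the name kv is just illustrative):

```r
xx <- 1:3
yy <- letters[1:3]

kv <- as.list(setNames(xx, yy))

kv[["b"]]             # value stored under key "b"
kv$c                  # same kind of lookup with $ syntax
"a" %in% names(kv)    # test whether a key exists
kv[c("a", "c")]       # sub-list for several keys at once
```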

Here are some benchmarks that show the two approaches are just as fast. Also note that the order of operations is very important in avoiding an unnecessary and costly copy of the data:

    f1 <- function(xx, yy) {
      names(xx) <- yy
      as.list(xx)
    }

    f2 <- function(xx, yy) {
      out <- as.list(xx)
      names(out) <- yy
      out
    }

    f3 <- function(xx, yy) as.list(setNames(xx, yy))
    f4 <- function(xx, yy) setNames(as.list(xx), yy)

    library(microbenchmark)
    microbenchmark(
      f1(xx, yy),
      f2(xx, yy),
      f3(xx, yy),
      f4(xx, yy)
    )
    # Unit: microseconds
    #        expr    min      lq  median      uq     max neval
    #  f1(xx, yy) 41.207 42.6390 43.2885 45.7340 114.853   100
    #  f2(xx, yy) 39.187 40.3525 41.5330 43.7435 107.130   100
    #  f3(xx, yy) 39.280 41.2900 42.1450 43.8085 109.017   100
    #  f4(xx, yy) 76.278 78.1340 79.1450 80.7525 180.825   100
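Before comparing timings it can be worth confirming that the four variants return the same object. A minimal check in base R, with the functions repeated so the snippet runs on its own:

```r
xx <- 1:3
yy <- letters[1:3]

f1 <- function(xx, yy) { names(xx) <- yy; as.list(xx) }
f2 <- function(xx, yy) { out <- as.list(xx); names(out) <- yy; out }
f3 <- function(xx, yy) as.list(setNames(xx, yy))
f4 <- function(xx, yy) setNames(as.list(xx), yy)

# compare every result against the first one
results <- list(f1(xx, yy), f2(xx, yy), f3(xx, yy), f4(xx, yy))
all_same <- all(vapply(results, identical, logical(1), results[[1]]))
all_same
```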

answered Oct 21 '22 08:10

flodel

Another serious option here is to use data.table, which uses the key to sort your structure and is very fast at accessing elements, especially when you have a large number of them. Here is an example:

    library(data.table)
    DT <- data.table(xx = 1:1e6,
                     k = paste0("key", 1:1e6), key = "k")

DT is a data.table with 2 columns, where I set the column k as the key.

    DT
                 xx         k
          1:      1      key1
          2:     10     key10
          3:    100    key100
          4:   1000   key1000
          5:  10000  key10000
         ---
     999996: 999995 key999995
     999997: 999996 key999996
     999998: 999997 key999997
     999999: 999998 key999998
    1000000: 999999 key999999

Now I can access my data.table using the key like this:

    DT['key1000']
             k   xx
    1: key1000 1000

Here is a benchmark comparing the data.table solution to a named vector:

    vals <- 1:1000000
    DT <- data.table(xx = vals,
                     k = paste0("key", vals), key = "k")
    keys <- paste0("key", vals)
    names(vals) <- keys
    library(microbenchmark)
    microbenchmark(vals["key42"], DT["key42"], times = 100)

    Unit: microseconds
              expr        min          lq     median         uq        max neval
     vals["key42"] 111938.692 113207.4945 114924.010 130010.832 361077.210   100
       DT["key42"]    768.753    797.0085   1055.661   1067.987   2058.985   100

answered Oct 21 '22 08:10

agstudy
