
How to convert DataFrames to nested lists

Tags: regex, r

Given a data.frame() with the following column structure:

var1.gender var1.score.raw var1.score.lower var1.score.upper [...] var2.gender var2.score.raw var2.score.lower var2.score.upper [...]

How do I convert this to a nested list, split on "."?

Sample data:

df <- data.frame('var1.gender' = c(1,1,3,3),
                 'var1.score.raw' = c(12.3, 12.4, 14.5, 13.2),
                 'var1.score.lower' = c(11,11,13,12),
                 'var1.score.upper' = c(13,13,15,14),
                 'var2.gender' = c(1,1,3,3),
                 'var2.score.raw' = c(12.3, 12.4, 14.5, 13.2),
                 'var2.score.lower' = c(11,11,13,12),
                 'var2.score.upper' = c(13,13,15,14))

The resulting list should look something like this:

$var1
$var1$gender
[1] 1 1 3 3

$var1$score
$var1$score$raw
[1] 12.3 12.4 14.5 13.2

$var1$score$lower
[1] 11 11 13 12

$var1$score$upper
[1] 13 13 15 14



$var2
$var2$gender
[1] 1 1 3 3

$var2$score
$var2$score$raw
[1] 12.3 12.4 14.5 13.2

$var2$score$lower
[1] 11 11 13 12

$var2$score$upper
[1] 13 13 15 14
Comfort Eagle asked Oct 30 '22 08:10


1 Answer

Given the way "df" is structured, one straightforward approach to building the desired list is to evaluate a call like list[["X"]][["Y"]][["Z"]][...] = df$X.Y.Z... for each column of "df". This can be done dynamically by manipulating "language" objects.
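As a rough illustration of what "manipulating language objects" means here, the sketch below builds one such call by hand and evaluates it. The names "lst" and "path" are made up for this example and do not appear in the rest of the answer.

```r
# Illustrative only: "lst" and "path" are hypothetical names
lst = list()
path = c("a", "b")

# Start with lst[["a"]], then wrap it in a further [[ ]] for each remaining name
cl = bquote(lst[[.(path[1])]])
for(s in path[-1]) cl = bquote(.(cl)[[.(s)]])

# Turn the index expression into an assignment call and evaluate it
cl = call("<-", cl, 1:3)
print(cl)   # shows the constructed, not yet evaluated, call
eval(cl)    # performs the nested assignment on "lst"
str(lst)
```

The call is an ordinary R object until eval() runs it, which is what allows the index path to be assembled from a character vector of arbitrary length.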

Defining a function that accepts a list, a character vector of names/indices and a value to assign at that level, we have:

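assign_list_element = function(x, inds, val)
{
    # build the index expression x[["a"]][["b"]]... one level at a time
    cl = bquote(x[[.(inds[1])]])
    for(s in inds[-1]) cl = bquote(.(cl)[[.(s)]])

    # wrap it in an assignment: x[["a"]][["b"]]... <- val
    cl = call("<-", cl, bquote(.(val)))
    print(cl); flush.console()  # show the constructed call

    eval(cl)  # evaluated inside the function, so it modifies the local "x"

    return(x)
}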

Some of the bquote calls could be simplified or replaced with substitute, but using bquote as above constructs a better-formatted call with respect to the indices (for printing purposes).
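For comparison, here is a small sketch of the same single-level index call built once with bquote and once with substitute. The placeholder name "i" and the index "var1" are chosen only for illustration.

```r
# Illustrative comparison; "i" is a hypothetical placeholder symbol
cl_bquote = bquote(x[[.("var1")]])                       # splice the value in with .()
cl_subst  = substitute(x[[i]], list(i = "var1"))         # replace the symbol i by its value

print(cl_bquote)
print(cl_subst)
```

Both produce a call object that indexes "x" by the string; the difference is mainly in how the value gets spliced in.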

Then, for each column of "df", update an initially empty list:

nms = strsplit(names(df), ".", TRUE)
l = list()
for(i in seq_along(nms)) l = assign_list_element(l, nms[[i]], df[[i]])
#x[["var1"]][["gender"]] <- c(1, 1, 3, 3)
#x[["var1"]][["score"]][["raw"]] <- c(12.3, 12.4, 14.5, 13.2)
#x[["var1"]][["score"]][["lower"]] <- c(11, 11, 13, 12)
#x[["var1"]][["score"]][["upper"]] <- c(13, 13, 15, 14)
#x[["var2"]][["gender"]] <- c(1, 1, 3, 3)
#x[["var2"]][["score"]][["raw"]] <- c(12.3, 12.4, 14.5, 13.2)
#x[["var2"]][["score"]][["lower"]] <- c(11, 11, 13, 12)
#x[["var2"]][["score"]][["upper"]] <- c(13, 13, 15, 14)

str(l)
#List of 2
# $ var1:List of 2
#  ..$ gender: num [1:4] 1 1 3 3
#  ..$ score :List of 3
#  .. ..$ raw  : num [1:4] 12.3 12.4 14.5 13.2
#  .. ..$ lower: num [1:4] 11 11 13 12
#  .. ..$ upper: num [1:4] 13 13 15 14
# $ var2:List of 2
#  ..$ gender: num [1:4] 1 1 3 3
#  ..$ score :List of 3
#  .. ..$ raw  : num [1:4] 12.3 12.4 14.5 13.2
#  .. ..$ lower: num [1:4] 11 11 13 12
#  .. ..$ upper: num [1:4] 13 13 15 14

Using this approach, the list is re-structured at every iteration, though its elements are not copied.
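If building and evaluating calls feels too indirect, a plain recursive helper is one possible alternative. The sketch below is not part of the approach above: "set_nested" and "l2" are hypothetical names, the data frame is a cut-down stand-in following the X.Y.Z naming pattern, and it assumes no column name is a prefix of another (e.g. both "a.b" and "a.b.c"), since a leaf vector cannot also hold children.

```r
# Hypothetical helper: set x[[p1]][[p2]]... = val by recursion, without eval()
set_nested = function(x, path, val)
{
    if(is.null(x)) x = list()   # create missing levels on the way down
    if(length(path) == 1L) {
        x[[path]] = val         # last name in the path: store the column
    } else {
        # descend one level and store the updated sub-list back
        x[[path[1]]] = set_nested(x[[path[1]]], path[-1], val)
    }
    x
}

# Cut-down example data (assumed names, same pattern as the question)
df_small = data.frame(var1.gender = c(1, 1, 3, 3),
                      var1.score.raw = c(12.3, 12.4, 14.5, 13.2),
                      var1.score.lower = c(11, 11, 13, 12))

l2 = list()
for(nm in names(df_small))
    l2 = set_nested(l2, strsplit(nm, ".", fixed = TRUE)[[1]], df_small[[nm]])
str(l2)
```

This trades the printed, inspectable calls of the language-object version for a function that is arguably easier to step through in a debugger.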

alexis_laz answered Nov 15 '22 10:11