Given a list variable, I'd like to have a data frame of the positions of each element. For a simple non-nested list, it seems quite straightforward.
For example, here's a list of character vectors.
l <- replicate(
10,
sample(letters, rpois(1, 2), replace = TRUE),
simplify = FALSE
)
The list l looks like this:
[[1]]
[1] "m"
[[2]]
[1] "o" "r"
[[3]]
[1] "g" "m"
# etc.
To get the data frame of positions, I can use:
d <- data.frame(
value = unlist(l),
i = rep(seq_len(length(l)), lengths(l)),
j = rapply(l, seq_along, how = "unlist"),
stringsAsFactors = FALSE
)
head(d)
##   value i j
## 1     m 1 1
## 2     o 2 1
## 3     r 2 2
## 4     g 3 1
## 5     m 3 2
## 6     w 4 1
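Because the list above is random, here is a small reproducible sketch of the same idea. The list l_fixed is made up for illustration, and sequence() is just an alternative way to build j that I believe is equivalent to the rapply() call for a flat list.

```r
# Hypothetical fixed input, chosen so the result does not depend on sample().
l_fixed <- list("m", c("o", "r"), character(), c("g", "m"))

d_fixed <- data.frame(
  value = unlist(l_fixed),
  # i: index of the containing list element, repeated once per value it holds
  i = rep(seq_along(l_fixed), lengths(l_fixed)),
  # j: position within each element; sequence() concatenates 1..n for each length n
  j = sequence(lengths(l_fixed)),
  stringsAsFactors = FALSE
)
d_fixed
```

The empty third element contributes no rows, so i jumps from 2 to 4.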
Given a trickier nested list, for example:
l2 <- list(
"a",
list("b", list("c", c("d", "a", "e"))),
character(),
c("e", "b"),
list("e"),
list(list(list("f")))
)
this doesn't easily generalize.
The output I expect for this example is:
data.frame(
value = c("a", "b", "c", "d", "a", "e", "e", "b", "e", "f"),
i1 = c(1, 2, 2, 2, 2, 2, 4, 4, 5, 6),
i2 = c(1, 1, 2, 2, 2, 2, 1, 2, 1, 1),
i3 = c(NA, 1, 1, 2, 2, 2, NA, NA, 1, 1),
i4 = c(NA, NA, 1, 1, 2, 3, NA, NA, NA, 1),
i5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 1)
)
How do I get a data frame of positions for a nested list?
Here's an approach that yields a slightly different output than you showed, but it'll be useful further down the road.
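f <- function(l) {
  # Name the elements at this level "1", "2", ..., "n"
  names(l) <- seq_along(l)
  lapply(l, function(x) {
    # Name the child's own elements too, then recurse if it is a list
    x <- setNames(x, seq_along(x))
    if (is.list(x)) f(x) else x
  })
}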
Function f simply iterates (recursively) through all levels of the given list and names its elements 1, 2, ..., n, where n is the length of the (sub)list. Then we can make use of the fact that unlist has a use.names argument that is TRUE by default and takes effect when used on a named list (that's why we have to use f to name the list first).
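To illustrate the naming behaviour that this relies on, here is a minimal sketch with a small hand-named list. The object names (named, flat) are mine, and the names are typed by hand to mimic what f would assign.

```r
# A small list named by hand in the same "1", "2", ... style that f() produces.
named <- list(
  `1` = c(`1` = "a"),
  `2` = list(
    `1` = c(`1` = "b"),
    `2` = c(`1` = "c", `2` = "d")
  )
)

# use.names = TRUE is the default: names from each level are joined with "."
flat <- unlist(named)
names(flat)

# With use.names = FALSE the path information is dropped entirely.
names(unlist(named, use.names = FALSE))
```

Each name in flat is the full path to that value, which is what makes it possible to split the names back into index columns later.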
For the nested list l2, it returns:
unlist(f(l2))
#       1.1     2.1.1   2.2.1.1   2.2.2.1   2.2.2.2   2.2.2.3       4.1       4.2     5.1.1 6.1.1.1.1
#       "a"       "b"       "c"       "d"       "a"       "e"       "e"       "b"       "e"       "f"
Now, in order to return a data.frame as asked for in the question, I'd do this:
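g <- function(l) {
  require(tidyr)  # provides separate() and re-exports %>%
  # Flatten the named list; each name encodes the path, e.g. "2.2.1.1"
  vec <- unlist(f(l))
  # The deepest path determines how many index columns are needed
  n <- max(lengths(strsplit(names(vec), ".", fixed = TRUE)))
  data.frame(
    value = unname(vec),
    i = names(vec),
    stringsAsFactors = FALSE
  ) %>%
    # Split the path into i1..in, padding shallower paths with NA on the right
    separate(i, paste0("i", 1:n), sep = "\\.", fill = "right", convert = TRUE)
}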
And apply it like this:
g(l2)
#   value i1 i2 i3 i4 i5
#1      a  1  1 NA NA NA
#2      b  2  1  1 NA NA
#3      c  2  2  1  1 NA
#4      d  2  2  2  1 NA
#5      a  2  2  2  2 NA
#6      e  2  2  2  3 NA
#7      e  4  1 NA NA NA
#8      b  4  2 NA NA NA
#9      e  5  1  1 NA NA
#10     f  6  1  1  1  1
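If you'd rather avoid extra packages, the same table can probably be built in base R. The sketch below is not the original g: g_base is a hypothetical stand-in of mine that pads the split names by hand. It also checks the result by indexing back into l2 with each row's path, relying on [[ accepting a vector for recursive indexing.

```r
# f() as defined above: name every level 1, 2, ..., n.
f <- function(l) {
  names(l) <- seq_along(l)
  lapply(l, function(x) {
    x <- setNames(x, seq_along(x))
    if (is.list(x)) f(x) else x
  })
}

# g_base() is a hypothetical base-R variant of g(), written for illustration only.
g_base <- function(l) {
  vec <- unlist(f(l))
  parts <- strsplit(names(vec), ".", fixed = TRUE)
  n <- max(lengths(parts))
  # Indexing past the end of a vector yields NA, which pads short paths on the right.
  idx <- do.call(rbind, lapply(parts, function(p) as.integer(p)[seq_len(n)]))
  colnames(idx) <- paste0("i", seq_len(n))
  data.frame(value = unname(vec), idx, stringsAsFactors = FALSE)
}

l2 <- list(
  "a",
  list("b", list("c", c("d", "a", "e"))),
  character(),
  c("e", "b"),
  list("e"),
  list(list(list("f")))
)
pos <- g_base(l2)

# Round trip: l2[[c(i1, i2, ...)]] indexes recursively, so each row's path
# (with the NA padding dropped) should return the value stored in that row.
recovered <- vapply(seq_len(nrow(pos)), function(r) {
  path <- unlist(pos[r, -1], use.names = FALSE)
  l2[[path[!is.na(path)]]]
}, character(1))
stopifnot(identical(recovered, pos$value))
```

The round trip is a cheap sanity check that the i1..in columns really are positions in the original list and not just labels.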
An improved version of g, contributed by @AnandaMahto (thanks!), would use data.table:
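g <- function(inlist) {
  require(data.table)
  temp <- unlist(f(inlist))
  # Split each path name into columns V1, V2, ... (padded with NA),
  # then add the values as a new column by reference
  setDT(tstrsplit(names(temp), ".", fixed = TRUE))[, value := unname(temp)][]
}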
Edit (credits go to @TylerRinkler - thanks!)
This has the benefit of easily being converted to a data.tree object, which can then be converted to many other data types. With a slight mod to g:
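g <- function(l) {
  require(tidyr)
  vec <- unlist(f(l))
  n <- max(lengths(strsplit(names(vec), ".", fixed = TRUE)))
  # Same as before, except the index columns now come before value,
  # so each row reads left to right as a path ending in the value
  data.frame(
    i = names(vec),
    value = unname(vec),
    stringsAsFactors = FALSE
  ) %>%
    separate(i, paste0("i", 1:n), sep = "\\.", fill = "right", convert = TRUE)
}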
library(data.tree)
x <- data.frame(top=".", g(l2))
x$pathString <- apply(x, 1, function(x) paste(trimws(na.omit(x)), collapse="/"))
mytree <- data.tree::as.Node(x)
mytree
#                        levelName
#1  .
#2   ¦--1
#3   ¦   °--1
#4   ¦       °--a
#5   ¦--2
#6   ¦   ¦--1
#7   ¦   ¦   °--1
#8   ¦   ¦       °--b
#9   ¦   °--2
#10  ¦       ¦--1
#11  ¦       ¦   °--1
#12  ¦       ¦       °--c
#13  ¦       °--2
#14  ¦           ¦--1
#15  ¦           ¦   °--d
#16  ¦           ¦--2
#17  ¦           ¦   °--a
#18  ¦           °--3
#19  ¦               °--e
#20  ¦--4
#21  ¦   ¦--1
#22  ¦   ¦   °--e
#23  ¦   °--2
#24  ¦       °--b
#25  ¦--5
#26  ¦   °--1
#27  ¦       °--1
#28  ¦           °--e
#29  °--6
#30      °--1
#31          °--1
#32              °--1
#33                  °--1
#34                      °--f
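To see what the pathString step contributes to that tree, here is a small base-R sketch that needs no data.tree. The data frame x_demo is a hypothetical three-row excerpt of the positions table, typed by hand; only the apply() line mirrors the code above.

```r
# Hypothetical excerpt of the positions table, with the "." root column added.
x_demo <- data.frame(
  top   = ".",
  i1    = c(1L, 2L, 4L),
  i2    = c(1L, 1L, 2L),
  i3    = c(NA, 1L, NA),
  value = c("a", "b", "b"),
  stringsAsFactors = FALSE
)

# apply() works on a character matrix, so numbers may carry padding (hence trimws);
# na.omit() drops the NA padding so shallow paths stay short.
path_string <- apply(x_demo, 1, function(row) {
  paste(trimws(na.omit(row)), collapse = "/")
})
unname(path_string)
```

Each row becomes one slash-separated path from the root to a leaf, which is the form as.Node() reads from the pathString column.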
And to produce a nice plot:
plot(mytree)
Other forms of presenting the data:
as.list(mytree)
ToDataFrameTypeCol(mytree)
More on converting data.tree types:
https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#tree-conversion
http://www.r-bloggers.com/how-to-convert-an-r-data-tree-to-json/