I have created a doubly nested structure for some data. How can I access the data on the 2nd level (or, for that matter, the nth level)?
library(gapminder)
library(purrr)
library(tidyr)
library(dplyr)  # group_by(), filter() and select() come from dplyr
gapminder
nest_data <- gapminder %>% group_by(continent) %>% nest(.key = by_continent)
nest_2<-nest_data %>% mutate(by_continent = map(by_continent, ~.x %>% group_by(country) %>% nest(.key = by_country)))
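Depending on your tidyr version (1.0 or later), nest(.key = ...) may emit a deprecation warning. Here is a minimal sketch of the same two-level nesting in the new_col = -cols form. It uses a small made-up tibble called toy in place of gapminder, and its values are placeholders, not gapminder figures:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up toy data standing in for gapminder (values are placeholders)
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe"),
  country   = c("China", "China", "India", "France"),
  year      = c(2002L, 2007L, 2007L, 2007L),
  value     = c(1, 2, 3, 4)
)

# Level 1: one row per continent; all other columns go into by_continent
nest_1 <- toy %>% nest(by_continent = -continent)

# Level 2: inside each continent's tibble, one row per country
nest_2 <- nest_1 %>%
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))
```

Each element of nest_2$by_continent is a tibble with the columns country and by_country, which mirrors the structure built from gapminder above.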
How can I now get the data for China into a dataframe or tibble from nest_2?
I can get the data for all of Asia, but I'm unable to isolate China.
a<-nest_2[nest_2$continent=="Asia",]$by_continent ##Any better way of isolating Asia from nest_2?
I thought I could then do
b<-a[a$country=="China",]$by_country
But I get the following error
Error in a[a$country == "China", ] : incorrect number of dimensions
> glimpse(a)
List of 1
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 33 obs. of 2 variables:
..$ country : Factor w/ 142 levels "Afghanistan",..: 1 8 9 19 25 56 59 60 61 62 ...
..$ by_country:List of 33
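The glimpse output shows that a is a plain list of length 1 holding one tibble. That is why a[a$country == "China", ] fails: a list has no rows and columns to index with [i, ]. Here is a minimal sketch of unwrapping it with [[1]], again on made-up toy data rather than gapminder:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up toy data (placeholder values), nested two levels deep
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe"),
  country   = c("China", "China", "India", "France"),
  year      = c(2002L, 2007L, 2007L, 2007L),
  value     = c(1, 2, 3, 4)
)
nest_2 <- toy %>%
  nest(by_continent = -continent) %>%
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))

# $ on a list-column returns a list, so `a` is a list of length 1
a <- nest_2[nest_2$continent == "Asia", ]$by_continent
is.data.frame(a)  # FALSE

# Take the single tibble out with [[1]] before row-indexing it
asia  <- a[[1]]
china <- asia[asia$country == "China", ]$by_country[[1]]
china
```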
So my big mistake was not recognizing that the result was a list, which can be fixed by adding [[1]] at the end. However, I very much liked the solution by @Floo0. I took the liberty of writing a version that takes the column names as arguments, in case the order of the columns differs from the one above.
select_unnest <- function(df, listcol, var, var_val){ ### listcol, var and var_val must be passed as quoted strings
df[[listcol]][df[[var]]==var_val][[1]]
}
nest_2 %>% select_unnest(listcol = "by_continent", var = "continent", var_val = "Asia") %>%
select_unnest(listcol = "by_country", var = "country", var_val = "China")
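For the nth level asked about in the question, you can apply the same step once per level. Below is a sketch that assumes a hypothetical helper, select_path(), which is not from any package. It takes one c(var, listcol, value) step per level and applies the steps in order with base R's Reduce(). The toy data is made up:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up toy data (placeholder values), nested two levels deep
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe"),
  country   = c("China", "China", "India", "France"),
  year      = c(2002L, 2007L, 2007L, 2007L),
  value     = c(1, 2, 3, 4)
)
nest_2 <- toy %>%
  nest(by_continent = -continent) %>%
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))

# Hypothetical helper: each step picks the row where `var == value`
# and returns the tibble stored in `listcol` for that row.
# Errors with "subscript out of bounds" if a value has no match.
select_path <- function(df, ...) {
  steps <- list(...)
  Reduce(function(d, s) d[[s[["listcol"]]]][d[[s[["var"]]]] == s[["value"]]][[1]],
         steps, df)
}

china <- select_path(
  nest_2,
  c(var = "continent", listcol = "by_continent", value = "Asia"),
  c(var = "country",   listcol = "by_country",   value = "China")
)
```

Adding a level only means passing one more step.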
Negative indexes in R do not count backward from the end of a list; they exclude elements. So L[-1] returns every item except the first, and L[-length(L)] every item except the last. To get the last item itself, use L[[length(L)]], or tail(L, 1) for a length-1 list containing it.
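A short base R illustration of negative indexing versus getting the last item:

```r
L <- list("a", "b", "c", "d")

L[-1]            # every element except the first: "b", "c", "d"
L[-length(L)]    # every element except the last: "a", "b", "c"
L[[length(L)]]   # the last element itself: "d"
tail(L, 1)       # a length-1 list holding the last element
```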
The items of a list can be accessed by their index numbers, and R indexing starts at 1: x[[1]] is the first item of a list, and v[1] the first element of a vector. Character strings are different. A string such as "Hello" is a single element, not a sequence of characters, so you get its first character "H" with substr("Hello", 1, 1) rather than by indexing.
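A small sketch contrasting [ and [[ on a list, and extracting characters from a string:

```r
x <- list(first = 1:3, second = "Hello")

x[1]        # a sub-list of length 1 (still a list)
x[[1]]      # the element itself: the integer vector 1:3
x[[1]][2]   # second entry of that vector: 2
x$second    # access by name: "Hello"

# Strings are not indexable by position; use substr() or strsplit()
substr("Hello", 1, 1)           # "H"
strsplit("Hello", "")[[1]][2]   # "e"
```

The [ versus [[ distinction is the same one that produced the "incorrect number of dimensions" error above.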
R lists have no insert(), extend() or pop() methods. To insert an item at a specific position, use append(x, values, after = i). To merge one list into another, use c(). To remove an item, assign NULL to it (x[[i]] <- NULL), first saving the item if you need the removed value.
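A minimal base R sketch of those three operations:

```r
L <- list("a", "b", "d")

L <- append(L, list("c"), after = 2)   # insert at position 3: "a", "b", "c", "d"
L <- c(L, list("e", "f"))              # merge another list onto the end

removed <- L[[1]]                      # keep the item you are about to drop
L[[1]] <- NULL                         # remove it from the list
```

R has no in-place pop(), so saving the item and then assigning NULL takes two steps.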
To extract only the first element of each component of a list, use sapply() with "[[" as the function and 1 as its extra argument. For example, if LIST contains 5 elements, each containing 20 elements, then sapply(LIST, "[[", 1) returns a vector of the 5 first sub-elements.
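A sketch with a made-up LIST of 5 components, each holding 20 numbers:

```r
# Component i holds i * 1, i * 2, ..., i * 20
LIST <- lapply(1:5, function(i) i * 1:20)

firsts <- sapply(LIST, "[[", 1)   # first element of each component
firsts                            # 1 2 3 4 5
```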
This is a pipe-able (%>%) base R approach:
select_unnest <- function(x, select_val){
  # assumes column 1 holds the values to match and column 2 the list-column
  x[[2]][x[[1]]==select_val][[1]]
}
nest_2 %>% select_unnest("Asia") %>% select_unnest("China")
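Floo0's function depends on column positions: the key must be in column 1 and the list-column in column 2. To select by name while staying in a pipe, here is a sketch that uses dplyr's filter() and pull(), plus purrr's pluck(1) to unwrap the single tibble. It runs on made-up toy data:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up toy data (placeholder values), nested two levels deep
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe"),
  country   = c("China", "China", "India", "France"),
  year      = c(2002L, 2007L, 2007L, 2007L),
  value     = c(1, 2, 3, 4)
)
nest_2 <- toy %>%
  nest(by_continent = -continent) %>%
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))

china <- nest_2 %>%
  filter(continent == "Asia") %>%   # keep the Asia row
  pull(by_continent) %>%            # list-column -> list of length 1
  pluck(1) %>%                      # unwrap the single tibble
  filter(country == "China") %>%
  pull(by_country) %>%
  pluck(1)
```

This version is wordier, but it keeps working if the columns are reordered.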
Comparing the timings:
Unit: microseconds
      expr      min        lq      mean   median        uq       max neval
  aosmith1 3202.105 3354.0055 4045.9602 3612.126 4179.9610 17119.495   100
  aosmith2 5797.744 6191.9380 7327.6619 6716.445 7662.6415 24245.779   100
     Floo0  227.169  303.3280  414.3779  346.135  400.6735  4804.500   100
Ben Bolker  622.267  720.6015  852.9727  775.172  875.5985  1942.495   100
Code (each expression is named so the output rows carry the labels used in the table above):
microbenchmark::microbenchmark(
  aosmith1 = {
    a <- nest_2[nest_2$continent == "Asia", ]$by_continent
    flatten_df(a) %>%
      filter(country == "China") %>%
      unnest()
  },
  aosmith2 = {
    nest_2 %>%
      filter(continent == "Asia") %>%
      select(by_continent) %>%
      unnest() %>%
      filter(country == "China") %>%
      unnest()
  },
  Floo0 = {nest_2 %>% select_unnest("Asia") %>% select_unnest("China")},
  `Ben Bolker` = {
    n1 <- nest_2$by_continent[nest_2$continent == "Asia"][[1]]
    n2 <- n1 %>% filter(country == "China")
    n2$by_country[[1]]
  }
)