
Unlisting nested lists and plotting using ggplot

I have a monstrous nested list structure of my own making that looks like this:

str(CMaster)
List of 4
 $ :List of 6
  ..$ :List of 5
  .. ..$ :List of 15
  .. .. ..$ : num [1, 1:14] 0.144 0.2 0.256 0.352 0.446 ...
  .. .. ..$ : num [1, 1:47] 0.144 0.2 0.375 0.54 0.694 ...
etc
 $ :List of 6
  ..$ :List of 1
  .. ..$ :List of 15
  .. .. ..$ : num [1, 1:14] 0.144 0.2 0.256 0.352 0.446 ...
  .. .. ..$ : num [1, 1:47] 0.144 0.2 0.375 0.54 0.694 ...

The structure is fixed, but the innermost list of 15 could grow to as many as 150K elements, and I need to plot it. I'd like boxplots categorised by the List of 4 variable, with one box for each element of the List of 6, each box pooling all of the numerical data in the lists below it. Do I need to unlist it all first? Is there an easier way to make a data.frame or data.table which preserves the names of the lists and makes them factors for plotting?

dfs <- lapply(CMaster, data.frame, stringsAsFactors = FALSE)

EDIT: I've added example code

Example code (that gets close to the real structure).

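# Note: DNSIM is filled but never stored in DMaster, so the result is a
# 4 x 6 x 15 nested list of numeric vectors of length 15.
D <- list()
DNSIM <- list()
DTime <- list()
DMaster <- list()

for (CC in 1:4) {
  for (t in 1:6) {
    for (N in 1:5) {
      for (i in 1:15) {
        Dmatrix <- runif(15)
        D[[i]] <- Dmatrix
      }
      DTime[[t]] <- D
    }
    DNSIM[[N]] <- DTime
  }
  DMaster[[CC]] <- DTime
}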


Dput

It's too big to copy, and my organisation won't allow a shareable link to OneDrive. Is there an easy workaround?

EDIT2

tibble(lists = CMaster) %>%
  mutate(CleaningType = row_number()) %>%
  unnest_longer(lists, indices_to = "TimePoint") %>%
  unnest_longer(lists, indices_to = "Replicate") %>%
  unnest_longer(lists, indices_to = "BehaviourObservation")
# A tibble: 1,800 x 5
   lists                 BehaviourObservation Replicate TimePoint CleaningType
   <list>                               <int>     <int>     <int>        <int>
 1 <dbl[,14] [1 × 14]>                      1         1         1            1
 2 <dbl[,47] [1 × 47]>                      2         1         1            1
 3 <dbl[,11] [1 × 11]>                      3         1         1            1
 4 <dbl[,40] [1 × 40]>                      4         1         1            1
 5 <dbl[,40] [1 × 40]>                      5         1         1            1
 6 <dbl[,34] [1 × 34]>                      6         1         1            1
 7 <dbl[,92] [1 × 92]>                      7         1         1            1
 8 <dbl[,31] [1 × 31]>                      8         1         1            1
 9 <dbl[,5] [1 × 5]>                        9         1         1            1
10 <dbl[,103] [1 × 103]>                   10         1         1            1
# … with 1,790 more rows

I then tried to unnest one more level (the sub-sub-list) and now get an "incompatible sizes" error. Any thoughts on this, please?

tibble(lists = CMaster) %>%
  mutate(CleaningType = row_number()) %>%
  unnest_longer(lists, indices_to = "TimePoint") %>%
  unnest_longer(lists, indices_to = "Replicate") %>%
  unnest_longer(lists, indices_to = "BehaviourObservation") %>%
  unnest_longer(lists, indices_to = "sub_sub_observation")

Error: Can't combine `..1$lists` <double[,14]> and `..2$lists` <double[,47]>.
✖ Incompatible sizes 14 and 47 along axis 2.
Run `rlang::last_error()` to see where the error occurred.
HCAI asked Jul 17 '20 08:07


2 Answers

If you don't mind using the tidyverse, find below some code to rectangle your data using tidyr::unnest_longer. See here for a nice tutorial on how to use unnest_longer (and in general how to turn nested lists into data.frames).

I'm not sure what the difference is between observation and sub_observation in the result, or whether this plot is what you actually want.

This might be (too) slow on your large data-set.

library(tidyverse)

df <- tibble(lists = DMaster) %>% 
  mutate(facet = row_number()) %>% 
  unnest_longer(lists, indices_to = "boxplot") %>% 
  unnest_longer(lists, indices_to = "observation") %>%
  unnest_longer(lists, indices_to = "sub_observation")
  
df %>% 
  ggplot(aes(boxplot, lists, group = boxplot)) + 
  geom_boxplot() +
  facet_wrap(~ facet)

Which gives a data.frame with facet (1 to 4), boxplot (1 to 6), observation (1 to 15), sub_observation (1 to 15) and lists (your actual numeric values), and the following plot: [plot: boxplots of lists by boxplot, one panel per facet]
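
The "incompatible sizes" error in the question's EDIT2 appears to come from the leaves being 1 x n matrices with differing n, which cannot be stacked on top of each other. Below is a minimal sketch of one possible workaround, assuming the leaves can be treated as plain numeric vectors: drop the matrix dimensions with base R's rapply() before unnesting. The toy list, the helper make_leaf and the column name value are hypothetical stand-ins, not part of the original data.

```r
library(dplyr)
library(tidyr)

set.seed(42)

# Hypothetical stand-in for CMaster: a 2 x 2 x 2 x 3 nested list whose
# leaves are 1 x n matrices with n = 2, 3, 4
make_leaf <- function(k) matrix(runif(k + 1), nrow = 1)
toy <- lapply(1:2, function(a)
  lapply(1:2, function(b)
    lapply(1:2, function(r)
      lapply(1:3, make_leaf))))

# Turn every matrix leaf into a plain vector, keeping the list structure
flat <- rapply(toy, as.vector, how = "list")

long <- tibble(lists = flat) %>%
  mutate(CleaningType = row_number()) %>%
  unnest_longer(lists, indices_to = "TimePoint") %>%
  unnest_longer(lists, indices_to = "Replicate") %>%
  unnest_longer(lists, indices_to = "BehaviourObservation") %>%
  # the last level now holds numeric vectors, so it unnests to one row per value
  unnest_longer(lists, values_to = "value", indices_to = "sub_sub_observation")
```

The result has one row per number, so value can go straight onto the y axis of a boxplot. Whether this is fast enough for 150K leaves is untested here.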

Bas answered Oct 20 '22 20:10


For the sake of completeness, the melt() function from the reshape2 package has a method for lists which recursively melts each component.

library(magrittr) # piping used to improve readability
reshape2::melt(DMaster) %>% 
  head()
       value L3 L2 L1
1 0.20653283  1  1  1
2 0.96955498  1  1  1
3 0.07924116  1  1  1
4 0.98602539  1  1  1
5 0.72998492  1  1  1
6 0.16022710  1  1  1

Combined with ggplot()

library(ggplot2)
reshape2::melt(DMaster) %>% 
  ggplot(aes(x = L2, y = value, group = L2)) +
  geom_boxplot() +
  facet_wrap(~ L1)

we get: [plot: boxplots of value by L2, one panel per L1]
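
On the question of keeping the list names as factors: melt()'s list method fills the L1, L2, ... columns with the list names where they exist and with integer positions otherwise. Below is a small sketch, assuming the two outer levels are named. The names wet, dry, t1, t2 and the new column labels are hypothetical.

```r
library(reshape2)

set.seed(1)

# Hypothetical named nested list: 2 cleaning types x 2 time points x 2 vectors
toy <- list(
  wet = list(t1 = list(runif(3), runif(4)), t2 = list(runif(2), runif(5))),
  dry = list(t1 = list(runif(3), runif(4)), t2 = list(runif(2), runif(5)))
)

long <- melt(toy)  # columns: value, L3, L2, L1

# Give the level columns meaningful (hypothetical) names
names(long)[match(c("L1", "L2", "L3"), names(long))] <-
  c("CleaningType", "TimePoint", "Observation")

# Convert the name columns to factors, keeping the original list order
long$CleaningType <- factor(long$CleaningType, levels = names(toy))
long$TimePoint    <- factor(long$TimePoint, levels = names(toy[[1]]))
```

With factors in place, the same ggplot() call as above should work with x = TimePoint and facet_wrap(~ CleaningType), and the panel and axis labels then show the list names instead of indices.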


reshape2::melt() also has a method for arrays, so the issue with matrices as list elements reported by the OP is covered as well.

Here is a dummy example of a double nested list of matrices

rep(list(list(matrix(1:4, ncol = 2), matrix(11:19, ncol = 3))), 2) %T>% str() %>% 
  reshape2::melt()
List of 2
 $ :List of 2
  ..$ : int [1:2, 1:2] 1 2 3 4
  ..$ : int [1:3, 1:3] 11 12 13 14 15 16 17 18 19
 $ :List of 2
  ..$ : int [1:2, 1:2] 1 2 3 4
  ..$ : int [1:3, 1:3] 11 12 13 14 15 16 17 18 19
   Var1 Var2 value L2 L1
1     1    1     1  1  1
2     2    1     2  1  1
3     1    2     3  1  1
4     2    2     4  1  1
5     1    1    11  2  1
6     2    1    12  2  1
7     3    1    13  2  1
8     1    2    14  2  1
9     2    2    15  2  1
10    3    2    16  2  1
11    1    3    17  2  1
12    2    3    18  2  1
13    3    3    19  2  1
14    1    1     1  1  2
15    2    1     2  1  2
16    1    2     3  1  2
17    2    2     4  1  2
18    1    1    11  2  2
19    2    1    12  2  2
20    3    1    13  2  2
21    1    2    14  2  2
22    2    2    15  2  2
23    3    2    16  2  2
24    1    3    17  2  2
25    2    3    18  2  2
26    3    3    19  2  2
Uwe answered Oct 20 '22 20:10