
R: Add a column to each nested data set

As part of a more complex procedure, I got stuck on the following step. Below is a reproducible example of what I am dealing with. I need to add a column to each nested data set whose value is constant within a data set but differs between data sets. Specifically, the value has to be the one in c1$Age for that row. The code cbind(k, AgeGroup = 3) is only for demonstration. When I use cbind(k, AgeGroup = Age) instead, R gives me the following error: Error in mutate_impl(.data, dots): Evaluation error: arguments imply differing number of rows: 5, 2.

library(dplyr)
library(purrr)
library(magrittr)
library(tidyr)

c <- read.table(header = TRUE, text = "Age Verbal  Fluid  Speed 
2     89     94    103    
1     98     88    100    
1    127    115    102    
2     83    101     71    
2    102     92     87   
1     91     97    120   
1     96    129     98   
2     79     92     84    
2    107     95    102")

c1 <- c %>% 
  group_by(Age) %>% 
  nest() %>% 
  dplyr::mutate(db = data %>% map(function(k) cbind(k, AgeGroup = 3)))

#> c1
# A tibble: 2 x 3
#    Age data             db                  
#  <int> <list>           <list>              
#1     2 <tibble [5 x 3]> <data.frame [5 x 4]>
#2     1 <tibble [4 x 3]> <data.frame [4 x 4]>

This is what I have now:

#> c1$db
#[[1]]
#  Verbal Fluid Speed AgeGroup
#1     89    94   103        3
#2     83   101    71        3
#3    102    92    87        3
#4     79    92    84        3
#5    107    95   102        3
#
#[[2]]
#  Verbal Fluid Speed AgeGroup
#1     98    88   100        3
#2    127   115   102        3
#3     91    97   120        3
#4     96   129    98        3

This is what I would like to get:

#> c1$db
#[[1]]
#  Verbal Fluid Speed AgeGroup
#1     89    94   103        2
#2     83   101    71        2
#3    102    92    87        2
#4     79    92    84        2
#5    107    95   102        2
#
#[[2]]
#  Verbal Fluid Speed AgeGroup
#1     98    88   100        1
#2    127   115   102        1
#3     91    97   120        1
#4     96   129    98        1
Michael Matta asked Oct 16 '25 17:10


2 Answers

You could replace map with map2, which iterates over data and Age in parallel, so each nested data frame is paired with its own value of Age:

c1 <- c %>% group_by(Age) %>% nest() %>% 
  dplyr::mutate(db = data %>% map2(Age, function(k, age) cbind(k, AgeGroup = age)))
c1$db
# [[1]]
#   Verbal Fluid Speed AgeGroup
# 1     89    94   103        2
# 2     83   101    71        2
# 3    102    92    87        2
# 4     79    92    84        2
# 5    107    95   102        2
#
# [[2]]
#   Verbal Fluid Speed AgeGroup
# 1     98    88   100        1
# 2    127   115   102        1
# 3     91    97   120        1
# 4     96   129    98        1

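When you tried cbind(k, AgeGroup = Age) directly, the problem was that Age was the whole vector 2:1 rather than the single value corresponding to each nested data frame. That is why the error reports 5 rows against 2: a 5-row data frame cannot be combined with a length-2 vector.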
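The length mismatch can be reproduced outside of mutate. The following is a minimal base R sketch, not the original pipeline: k is a hypothetical one-column data frame standing in for the 5-row nested data set, and age stands in for the full Age column.

```r
# Hypothetical stand-ins for one nested data set and the whole Age column
k   <- data.frame(Verbal = c(89L, 83L, 102L, 79L, 107L))
age <- c(2L, 1L)

# Binding a length-2 vector to a 5-row data frame fails;
# capture the error message instead of stopping the script
msg <- tryCatch(
  cbind(k, AgeGroup = age),
  error = function(e) conditionMessage(e)
)
print(msg)

# Binding a single value works: it is recycled to all 5 rows
ok <- cbind(k, AgeGroup = age[1])
print(ok)
```

Note that a 4-row data frame would not raise this error with a length-2 vector, because base R recycles a vector whose length divides the number of rows evenly. The column would silently alternate between 2 and 1, which is another reason to pass exactly one value per nested data frame.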

Julius Vainora answered Oct 19 '25 08:10


We can use map2 to loop over the data and Age columns in parallel and add the new column to each nested data frame with mutate.

library(dplyr)
library(purrr)
library(magrittr)
library(tidyr)

c1 <- c %>% 
  group_by(Age) %>% 
  nest()

c2 <- c1 %>%
  mutate(data = map2(data, Age, ~mutate(.x, AgeGroup = .y)))

c2$data
# [[1]]
# # A tibble: 5 x 4
#   Verbal Fluid Speed AgeGroup
#    <int> <int> <int>    <int>
# 1     89    94   103        2
# 2     83   101    71        2
# 3    102    92    87        2
# 4     79    92    84        2
# 5    107    95   102        2
# 
# [[2]]
# # A tibble: 4 x 4
#   Verbal Fluid Speed AgeGroup
#    <int> <int> <int>    <int>
# 1     98    88   100        1
# 2    127   115   102        1
# 3     91    97   120        1
# 4     96   129    98        1
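For comparison, the same pairing of each subset with its own key can be sketched in base R with split and Map. This is a swapped-in technique, not what either answer uses. The names dat, pieces and db are chosen here for illustration, and dat replaces c only to avoid masking base::c.

```r
# Same data as in the question, under a different name
dat <- read.table(header = TRUE, text = "Age Verbal Fluid Speed
2  89  94 103
1  98  88 100
1 127 115 102
2  83 101  71
2 102  92  87
1  91  97 120
1  96 129  98
2  79  92  84
2 107  95 102")

# Split every column except Age into one data frame per Age value;
# the list is named by Age ("1", "2")
pieces <- split(dat[setdiff(names(dat), "Age")], dat$Age)

# Walk over the pieces and their names in parallel,
# adding the matching Age as a constant column
db <- Map(
  function(k, age) cbind(k, AgeGroup = as.integer(age)),
  pieces, names(pieces)
)

db[["2"]]
db[["1"]]
```

Unlike nest(), split orders the pieces by the sorted values of Age, so the list starts with Age 1 rather than Age 2. Indexing by name, as above, avoids relying on that order.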
www answered Oct 19 '25 09:10