
Assigning data frame name to all rows in a column

Tags:

r

I want to add the data frame name to all the rows in a column, for each data frame in a list.

Dummy data:

test_df <- data.frame(x = 1:5, y = c("a","b","c","d","e"))

What I want to end up with is this:

x    y    ref
1    a    test_df
2    b    test_df
3    c    test_df
4    d    test_df
5    e    test_df

The reason is that I am going to rbind multiple data frames later, and I want to be able to filter on which data frame the values came from. I tried the following:

library(dplyr)

test <- function(df) {
  df <- df %>%
    mutate(ref = deparse(substitute(df)))
  return(df)
}

But this only creates a column named ref with the value "df" in each row. Any suggestions using dplyr are greatly appreciated. Alternatively, is there a way to create this column directly in the rbind call?

asked Nov 16 '25 18:11 by Haakonkas


1 Answer

Using dplyr, try this:

library(dplyr)
library(lazyeval)
test <- function(df) {
  # expr_label() returns the argument's expression wrapped in backticks
  df <- df %>% mutate(ref = expr_label(df))
  return(df)
}
test(test_df)
  x y       ref
1 1 a `test_df`
2 2 b `test_df`
3 3 c `test_df`
4 4 d `test_df`
5 5 e `test_df`

Alternatively, this also works, but does not use dplyr:

test2 <- function(df) {
  df$ref <- deparse(substitute(df))
  return(df)
}
test2(test_df)
  x y     ref
1 1 a test_df
2 2 b test_df
3 3 c test_df
4 4 d test_df
5 5 e test_df
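
The dplyr attempt from the question can probably be rescued in the same spirit: capture the name before the pipeline starts, since substitute(df) evaluated inside mutate() apparently no longer sees the caller's expression. The following is a minimal sketch, assuming dplyr is installed; add_ref is a hypothetical helper name that does not appear in the original post.

```r
library(dplyr)

test_df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))

# add_ref is a hypothetical helper name used only for this sketch.
add_ref <- function(df) {
  # Capture the caller's expression first, outside of mutate().
  nm <- deparse(substitute(df))
  # !! injects the captured string, so a column called `nm` cannot shadow it.
  df %>% mutate(ref = !!nm)
}

tagged <- add_ref(test_df)
```

Unlike the expr_label() version, this stores the plain name without backticks, which is usually easier to filter on later.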

Making this work with a list of data frames and lapply is trickier: lapply calls the function as FUN(X[[i]]), so deparse(substitute(df)) would return "X[[i]]" rather than the original name. The following workaround works:

test_df <- data.frame(x = 1:5, y = c("a","b","c","d","e"))
test_df2 <- data.frame(x = 11:15, y = c("aa","bb","cc","dd","ee"))

Here I create a named list of dataframes:

dfs <- setNames(list(test_df, test_df2), c("test_df", "test_df2"))
dfs
$test_df
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

$test_df2
   x  y
1 11 aa
2 12 bb
3 13 cc
4 14 dd
5 15 ee
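
To see why a one-argument helper is not enough here, it may help to run test2 from above directly through lapply. In the sketch below (the data and test2 are repeated so it runs on its own), the ref column should end up holding the expression lapply uses internally rather than the data frame name; in current versions of R that is the literal string "X[[i]]".

```r
test_df  <- data.frame(x = 1:5,   y = c("a", "b", "c", "d", "e"))
test_df2 <- data.frame(x = 11:15, y = c("aa", "bb", "cc", "dd", "ee"))
dfs <- list(test_df = test_df, test_df2 = test_df2)

test2 <- function(df) {
  df$ref <- deparse(substitute(df))
  df
}

# lapply() calls test2(X[[i]]), so substitute() sees that expression,
# not the name the data frame had in the calling environment.
naive <- lapply(dfs, test2)
unique(naive$test_df$ref)
```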

Now I change the helper function to accept the name as an argument:

test3 <- function(df, nm) {
  df$ref <- nm
  return(df)
}

Here I only pass the names to lapply and retrieve each dataframe from the named list dfs that I have defined above.

lapply(names(dfs), function(x) test3(dfs[[x]], x))
[[1]]
  x y     ref
1 1 a test_df
2 2 b test_df
3 3 c test_df
4 4 d test_df
5 5 e test_df

[[2]]
   x  y      ref
1 11 aa test_df2
2 12 bb test_df2
3 13 cc test_df2
4 14 dd test_df2
5 15 ee test_df2

That is not the most elegant way, but it works.
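
A slightly tidier variant of the same idea, in base R only, is to let Map walk over the list and its names in parallel. This is just a sketch; add_ref is a hypothetical name that plays the role of test3 above, and the final rbind step is the one the question says it plans to do later.

```r
test_df  <- data.frame(x = 1:5,   y = c("a", "b", "c", "d", "e"))
test_df2 <- data.frame(x = 11:15, y = c("aa", "bb", "cc", "dd", "ee"))
dfs <- list(test_df = test_df, test_df2 = test_df2)

# Same job as test3: attach a given name as a constant column.
add_ref <- function(df, nm) {
  df$ref <- nm
  df
}

# Map() pairs each data frame with its name and keeps the list names.
tagged <- Map(add_ref, dfs, names(dfs))

# Stack the tagged data frames into one.
combined <- do.call(rbind, tagged)
```

Compared with the lapply version, the result keeps the names of dfs, so tagged$test_df2 still works.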

Having said that, if you want to combine the dataframes into one single dataframe, there is not much to add to @markus's suggestion of using bind_rows, as in

bind_rows(dfs, .id="ref")
        ref  x  y
1   test_df  1  a
2   test_df  2  b
3   test_df  3  c
4   test_df  4  d
5   test_df  5  e
6  test_df2 11 aa
7  test_df2 12 bb
8  test_df2 13 cc
9  test_df2 14 dd
10 test_df2 15 ee
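
Since the stated goal was to filter on the source data frame afterwards, here is a short sketch of that last step, assuming dplyr is installed. The variable names combined and only_second are made up for illustration.

```r
library(dplyr)

test_df  <- data.frame(x = 1:5,   y = c("a", "b", "c", "d", "e"))
test_df2 <- data.frame(x = 11:15, y = c("aa", "bb", "cc", "dd", "ee"))
dfs <- list(test_df = test_df, test_df2 = test_df2)

# .id = "ref" turns the list names into a column called ref.
combined <- bind_rows(dfs, .id = "ref")

# Keep only the rows that came from test_df2.
only_second <- filter(combined, ref == "test_df2")
```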

answered Nov 18 '25 19:11 by coffeinjunky


