Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use dplyr:mutate to mulitply pairs of columns specified by parts of the variable name

Tags:

r

dplyr

names

I have the following example:

df <- data.frame(
id = c(1,2,3),
  fix_01.2012 = c(2,5,7),
  fix_02.2012 = c(5,1,7),
  fix_03.2012 = c(6,1,5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
  )

  id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012
1  1           2           5           6         0.4         0.6         0.7
2  2           5           1           1         0.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9

The table below is what I want to get. I want to create a new column for each date (e.g. "01.2012"):

res_date = fix_date * fox_date

As I have many dates / pairs of dates, I guess this needs to be done by looping through the names.

 id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012 res_01.2012 res_02.2012 res_03.2012
1  1           2           5           6         0.4         0.6         0.7         0.8         3.0         4.2
2  2           5           1           1         0.5         0.5         0.5         2.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9         4.9         5.6         4.5

Anyone can help? Thanks very much in advance!

like image 286
user138089 Avatar asked Sep 06 '25 10:09

user138089


2 Answers

If you want a tidyverse approach, it will take using a bit of tidy evaluation to get what you want.

library(tidyverse)

df <- data.frame(
  id = c(1,2,3),
  fix_01.2012 = c(2,5,7),
  fix_02.2012 = c(5,1,7),
  fix_03.2012 = c(6,1,5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)

# colnames with "fix" 
fix <- names(df)[grepl("fix",names(df))]

# colnames with "fox"
fox <- names(df)[grepl("fox",names(df))]

# Iterate over the two vectors of names and column bind the results (map2_dfc).  
# Since these are strings, we need to have them evaluated as symbols
# Creating the column name just requires the string to be evaluated.

map2_dfc(fix, fox, ~transmute(df, !!paste0("res", str_extract(.x, "_(0\\d)")) := !!sym(.x) * !!sym(.y)))

#>   res_01 res_02 res_03
#> 1    0.8    3.0    4.2
#> 2    2.5    0.5    0.5
#> 3    4.9    5.6    4.5
like image 123
Jake Kaupp Avatar answered Sep 08 '25 03:09

Jake Kaupp


Here is an idea that uses split.default to split the data frame based on similar column names (based on your conditions). We then loop over that list and multiply the columns. In this case, we use Reduce (rather than i[1]*i[2]) to multiply in order to account for more than two columns

do.call(cbind, 
   lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))), function(i) Reduce(`*`, i)))

#     01.2012 02.2012 03.2012
#[1,]     0.8     3.0     4.2
#[2,]     2.5     0.5     0.5
#[3,]     4.9     5.6     4.5

Bind them back to the original with cbind.data.frame()

like image 22
Sotos Avatar answered Sep 08 '25 04:09

Sotos