
dplyr: how to reference columns by column index rather than column name using mutate?

Tags: r, dplyr

Using dplyr, you can do something like this:

    iris %>% head %>% mutate(sum = Sepal.Length + Sepal.Width)

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
    1          5.1         3.5          1.4         0.2  setosa 8.6
    2          4.9         3.0          1.4         0.2  setosa 7.9
    3          4.7         3.2          1.3         0.2  setosa 7.9
    4          4.6         3.1          1.5         0.2  setosa 7.7
    5          5.0         3.6          1.4         0.2  setosa 8.6
    6          5.4         3.9          1.7         0.4  setosa 9.3

But above, I referenced the columns by name. How can I use the column indices 1 and 2 to achieve the same result?

Here I have the following, but I feel it's not as elegant.

    iris %>% head %>% mutate(sum = apply(select(., 1, 2), 1, sum))

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
    1          5.1         3.5          1.4         0.2  setosa 8.6
    2          4.9         3.0          1.4         0.2  setosa 7.9
    3          4.7         3.2          1.3         0.2  setosa 7.9
    4          4.6         3.1          1.5         0.2  setosa 7.7
    5          5.0         3.6          1.4         0.2  setosa 8.6
    6          5.4         3.9          1.7         0.4  setosa 9.3
Alby asked Sep 16 '15 21:09



People also ask

How do I reference specific columns in R?

To select a column in R you can use brackets, e.g., YourDataFrame['Column'] takes the column named "Column". You can also use dplyr's select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) takes the columns named "A" and "B" from the data frame.
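As a rough illustration of the bracket and select() forms just described, here is a minimal sketch using the built-in iris data. The variable names (df, by_name, and so on) are made up for the example.

```r
library(dplyr)

df <- head(iris)

# Base R: single brackets return a data frame, double brackets return a vector
by_name  <- df["Sepal.Length"]   # one-column data frame
by_index <- df[[1]]              # numeric vector holding the first column

# dplyr: select() accepts column names or column positions
sel_names <- select(df, c("Sepal.Length", "Sepal.Width"))
sel_index <- select(df, 1, 2)
```

Both select() calls should pick the same two columns; the positional form is the one relevant to the question above.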

What does mutate in dplyr do?

mutate() is a dplyr function that, according to its documentation, adds new variables and preserves existing ones. Use it when you want to add new variables or change one already in the dataset. Given a data frame df, you can add columns computed from existing ones.

Which function of the dplyr package helps in adding or modifying a column of a data frame?

In my opinion, the best way to add a column to a data frame in R is with the mutate() function from dplyr.


2 Answers

You can try:

    iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
    1          5.1         3.5          1.4         0.2  setosa 8.6
    2          4.9         3.0          1.4         0.2  setosa 7.9
    3          4.7         3.2          1.3         0.2  setosa 7.9
    4          4.6         3.1          1.5         0.2  setosa 7.7
    5          5.0         3.6          1.4         0.2  setosa 8.6
    6          5.4         3.9          1.7         0.4  setosa 9.3
jeremycg answered Oct 21 '22 20:10
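A note on why the `.[[1]]` form works, with a possible alternative: inside a magrittr pipeline, `.` stands for the data frame on the left of `%>%`, so `.[[1]]` is simply its first column as a vector. The sketch below compares that with pick(), which is an assumption on my part that dplyr 1.1.0 or later is installed; it is not part of the answer above.

```r
library(dplyr)

df <- head(iris)

# Form from the answer: `.` is the data frame piped in by %>%
via_dot <- df %>% mutate(sum = .[[1]] + .[[2]])

# Assumes dplyr >= 1.1.0: pick() selects columns by position inside mutate(),
# and rowSums() adds them up row by row
via_pick <- df %>% mutate(sum = rowSums(pick(1, 2)))

# Compare the two results, ignoring any names on the vectors
same <- isTRUE(all.equal(as.numeric(via_dot$sum), as.numeric(via_pick$sum)))
```

Two caveats worth keeping in mind: `.` always refers to the whole piped data frame, so the `.[[1]]` form is likely to misbehave after group_by(), and the native `|>` pipe does not define `.` at all.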



I'm a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that will do exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.

    sum_cols <- function(x, col1, col2) {
      x[[col1]] + x[[col2]]
    }

    iris %>%
      head %>%
      mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
SavedByJESUS answered Oct 21 '22 21:10
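One consequence of the helper-function approach is that `[[` accepts either a position or a name, so the same sum_cols() should work with both. The sketch below shows that, plus a hypothetical generalisation, sum_many(), which is not from the answer and is only one way to extend the idea to any number of columns.

```r
library(dplyr)

# Helper from the answer: adds two columns chosen by position or name
sum_cols <- function(x, col1, col2) {
  x[[col1]] + x[[col2]]
}

df <- head(iris)

# Positions and names give the same result because `[[` accepts both
by_index <- df %>% mutate(sum = sum_cols(., 1, 2))
by_name  <- df %>% mutate(sum = sum_cols(., "Sepal.Length", "Sepal.Width"))

# Hypothetical generalisation (not from the answer): sum any set of columns
sum_many <- function(x, cols) {
  unname(rowSums(x[cols]))
}
many <- df %>% mutate(sum = sum_many(., 1:2))
```

Keeping the data frame as the first argument means the helper can also be called outside a pipeline, for example sum_cols(iris, 1, 2).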
