 

specify dplyr column names [duplicate]

How can I pass column names to dplyr if I do not know the column name, but want to specify it through a variable?

For example, this works:

require(dplyr)
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
gdf <- df %.% group_by(group) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))

But this does not:

require(dplyr)
someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
# does not group by the column whose name is stored in someColumn
gdf <- df %.% group_by(someColumn) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
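To see what goes wrong, here is a small sketch assuming a current dplyr 1.x, with `%>%` in place of the old `%.%` pipe. `group_by()` treats the bare name `someColumn` as a reference to a column literally called `someColumn`, not as the string it holds, so the call fails because `df` has no such column.

```r
library(dplyr)

df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
someColumn <- "group"

# group_by() looks for a column named "someColumn" in df and, finding none,
# raises an error; tryCatch() captures the error message instead
res <- tryCatch(df %>% group_by(someColumn),
                error = function(e) conditionMessage(e))
print(res)
```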
user3241888 asked Jan 27 '14 at 19:01





2 Answers

I just gave a similar answer over at "Group by multiple columns in dplyr, using string vector input", but for good measure: functions that let you refer to columns by strings have been added to dplyr. They have the same names as the regular dplyr functions but end in an underscore. They are described in detail in this vignette.

Given df and someColumn from the OP, this now works a treat:

gdf <- df %>% group_by_(someColumn) %>% summarise(m1=mean(V1),m2=mean(V2),m3=mean(V3))

Note that this uses group_by_ rather than group_by, and the %>% operator instead of %.%, which is deprecated.
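The underscore verbs such as `group_by_()` were themselves deprecated later in favour of tidy evaluation. Here is a minimal sketch of the same grouping, assuming dplyr 1.x (rlang is installed as one of its dependencies). `rlang::sym()` turns the string into a symbol, and `!!` injects that symbol into the call, so `group_by()` sees the bare column name `group`.

```r
library(dplyr)

df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
someColumn <- "group"

# sym() converts the string "group" into the symbol `group`;
# !! splices that symbol into group_by() as if it had been typed directly
gdf <- df %>%
  group_by(!!rlang::sym(someColumn)) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
print(gdf)
```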

edward answered Sep 30 '22 at 20:09



Here's an answer to this straightforward question, obtained by picking through Hadley's solution to the duplicate question he linked.

gdf <- df %.% regroup(lapply(someColumn, as.symbol)) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))

FWIW, my use case involved grouping by one variable column and one constant column. The solution to that is:

gdf <- df %.% regroup(lapply(c('constant_column', someColumn), as.symbol)) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
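`regroup()` is no longer part of current dplyr. Assuming dplyr 1.x, a rough equivalent of `regroup(lapply(cols, as.symbol))` is `group_by(!!!rlang::syms(cols))`. `syms()` is a vectorised counterpart of `as.symbol()`, and `!!!` splices the resulting list of symbols into the call as separate arguments. The `constant_column` below is hypothetical and exists only to mirror the use case described above.

```r
library(dplyr)

df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
df$constant_column <- "x"   # hypothetical fixed grouping column
someColumn <- "group"

# syms() turns each string into a symbol; !!! splices them all into group_by()
cols <- c("constant_column", someColumn)
gdf <- df %>%
  group_by(!!!rlang::syms(cols)) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3), .groups = "drop")
print(gdf)
```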

Finally, the eval solution posted here doesn't work: it just creates a new column whose values are all whatever someColumn evaluates to.
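The effect described above can be reproduced under dplyr 1.x. This sketch assumes the eval solution amounted to wrapping the variable in `eval()`. `eval(someColumn)` simply returns the string `"group"`, so `group_by()` computes a new column from that expression, recycles the single value to every row, and ends up with one group instead of grouping by `df$group`.

```r
library(dplyr)

df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
someColumn <- "group"

# eval(someColumn) evaluates to the constant "group", so group_by() adds a
# computed column holding that constant and groups on it
bad <- df %>% group_by(eval(someColumn))
print(n_groups(bad))   # a single group covering all rows
print(ncol(bad))       # one more column than df
```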

StatSandwich answered Sep 30 '22 at 21:09
