
dplyr mutate in R - add column as concat of columns

I have a problem using the mutate function from dplyr to add a new column to a data frame. I want the new column to be of character type and to consist of the sorted values from the other columns (which are also of character type), concatenated together. For example, for the following data frame:

> library(datasets)
> states.df <- data.frame(name = as.character(state.name),
+                         region = as.character(state.region),
+                         division = as.character(state.division))
>
> head(states.df, 3)
     name region           division
1 Alabama  South East South Central
2  Alaska   West            Pacific
3 Arizona   West           Mountain

I would like to get a new column with the following first element:

"Alabama_East South Central_South"

I tried this:

mutate(states.df,
   concated_column = paste0(sort(name, region, division), collapse="_"))

But I received an error:

Error in sort(1:50, c(2L, 4L, 4L, 2L, 4L, 4L, 1L, 2L, 2L, 2L, 4L, 4L,  :
  'decreasing' must be a length-1 logical vector.
Did you intend to set 'partial'?

Thank you for any help in advance!

Marta Karas asked Feb 13 '14 11:02


2 Answers

You need to use sep = rather than collapse =, and it is not clear why you call sort. Note that I used paste rather than paste0, because paste0 has no sep argument.

library(dplyr)
states.df <- data.frame(name = as.character(state.name),
                        region = as.character(state.region),
                        division = as.character(state.division))
res = mutate(states.df,
   concated_column = paste(name, region, division, sep = '_'))
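To make the difference between sep and collapse concrete, here is a minimal base R sketch. The vectors `first` and `second` are made-up stand-ins for two data frame columns, not part of the original question.

```r
# Two equal-length character vectors standing in for data frame columns
# (hypothetical illustration data)
first  <- c("Alabama", "Alaska")
second <- c("South", "West")

# sep joins the vectors element by element, giving one string per row
by_row <- paste(first, second, sep = "_")
print(by_row)

# collapse then squashes that whole result into a single string,
# which is why it is the wrong tool for building a per-row column
one_string <- paste(first, second, sep = "_", collapse = "|")
print(one_string)
```

Inside mutate you want one value per row, so sep is the argument that matters; collapse would reduce the whole column to a single string.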

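As for the sorting, sort is not being used correctly: it takes a single vector, and any extra arguments are matched to its other parameters such as decreasing, which is what the error message is complaining about. Maybe you want: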

as.data.frame(lapply(states.df, sort)) 

This sorts each column independently and creates a new data.frame from the sorted columns. Be aware that the values in a given row then no longer belong to the same state.
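If the goal is instead the output shown in the question, where the values within each row are sorted before being joined, one possible base R sketch is below. It is an assumption about what the asker wants, not something either answer states, and the helper name `concat_sorted` is hypothetical.

```r
# Rebuild the data frame from the question with plain character columns
states.df <- data.frame(name = state.name,
                        region = as.character(state.region),
                        division = as.character(state.division),
                        stringsAsFactors = FALSE)

# Hypothetical helper: sort the values of one row, then join them with "_"
concat_sorted <- function(row) paste(sort(row), collapse = "_")

# apply(, 1, ) hands each row to the helper as a character vector,
# so collapse is appropriate here: it reduces one row to one string
states.df$concated_column <- apply(
  states.df[, c("name", "region", "division")], 1, concat_sorted
)

# First element: "Alabama_East South Central_South"
head(states.df$concated_column, 3)
```

Here sort receives a single vector per call, which avoids the error from passing three columns as separate arguments.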

Paul Hiemstra answered Oct 14 '22 06:10


Adding on to Paul's answer: if you want to sort the rows, you could try order. Here is an example:

res1 <- mutate(states.df,
          concated_column = apply(states.df[order(name, region, division), ], 1,
                                  function(x) paste0(x, collapse = "_")))

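Here order sorts the rows of states.df by name, breaking ties first by region and then by division. Note that the pasted values come from the reordered rows but are attached to the rows in their original order, so the two line up only when the data frame is already sorted that way (as it is here, because state.name is alphabetical).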
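A small sketch of one way to keep the concatenated values aligned with their rows: reorder the data frame first, then paste. The toy data frame `df` and the names `ord` and `sorted` are hypothetical and chosen only so that the reordering is visible.

```r
# Toy data that is deliberately NOT in sorted order (hypothetical values)
df <- data.frame(name = c("b", "a", "a"),
                 region = c("x", "z", "y"),
                 stringsAsFactors = FALSE)

# order() returns the row indices that sort by name, ties broken by region
ord <- order(df$name, df$region)

# Reorder the rows first ...
sorted <- df[ord, ]

# ... then build the column from the already reordered rows,
# so each pasted value sits next to the row it was built from
sorted$concated_column <- paste(sorted$name, sorted$region, sep = "_")
print(sorted)
```

Doing the reordering and the pasting as two separate steps avoids mixing sorted values into unsorted rows.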

Ray answered Oct 14 '22 07:10