 

Concatenate row-wise across specific columns of dataframe

I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

    > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...

For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_") 

And below is the undesired result:

    > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
     $ id             : chr [1:680420, 1:4] "9" "9" "37" "37" ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "V1" "V2" "V3" "V4"

Any help would be greatly appreciated.

Jubbles asked Jun 10 '11 15:06




2 Answers

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_") 

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.
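To see the difference, the sketch below runs the question's sapply() attempt and the vectorized paste() call side by side on a small hypothetical data frame (three made-up rows mirroring the str() output). sapply() iterates over columns, not rows, so paste() is called once per column with that column as its only argument; sep has nothing to join, and sapply() simplifies the four results into an n x 4 character matrix, which is what ended up in id. Calling paste() once on the four columns instead joins their i-th elements, giving one string per row.

```r
# Hypothetical three-row stand-in for the question's data
data <- data.frame(
  C = c("2011-01-26", "2011-01-26", "2011-02-09"),
  D = c("AAA", "AAA", "BCB"),
  E = c("A00001", "A00002", "B00002"),
  F = c(9L, 9L, 37L),
  stringsAsFactors = FALSE
)

# The question's attempt: paste() runs once per column, so nothing is joined
# and sapply() returns a 3 x 4 character matrix with columns V1..V4
bad <- sapply(as.data.frame(cbind(data$F, data$E, data$D, data$C)),
              paste, sep = "_")
dim(bad)

# Vectorized version: element i joins row i of F, E, D and C with "_"
data$id <- paste(data$F, data$E, data$D, data$C, sep = "_")
data$id
```

Because id is an ordinary character column, it can then be passed to base R's split(), e.g. split(data, data$id), which returns a list of data frames, one per distinct id.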

Edit: Even better is

 data <- within(data, id <- paste(F, E, D, C, sep="_")) 
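within() evaluates the expression with the data frame's columns visible as plain variables and returns a modified copy, which is why F, E, D and C need no data$ prefix. A minimal sketch on a hypothetical three-row data frame (made-up values), using the underscore separator the question asks for:

```r
# Hypothetical three-row stand-in for the question's data
data <- data.frame(
  C = c("2011-01-26", "2011-01-26", "2011-02-09"),
  D = c("AAA", "AAA", "BCB"),
  E = c("A00001", "A00002", "B00002"),
  F = c(9L, 9L, 37L),
  stringsAsFactors = FALSE
)

# Inside within(), F, E, D and C refer to the columns of data (so F is the
# column, not the FALSE shorthand); assigning id adds a new column to the copy
data <- within(data, id <- paste(F, E, D, C, sep = "_"))
data$id
```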
Dirk Eddelbuettel answered Oct 08 '22 21:10


Use unite() from the tidyr package:

library(tidyr)  # provides unite() and re-exports the %>% pipe
data <- data %>% unite(id, F, E, D, C, sep = '_')

The first argument after the piped-in data is the name of the new column; the arguments that follow, up to sep, are the columns to concatenate, in that order.
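By default unite() drops the columns it combines (remove = TRUE), so F, E, D and C disappear from the result. If they are still needed, remove = FALSE keeps them next to the new id column. A minimal sketch, assuming the tidyr package is installed and using a hypothetical three-row data frame with made-up values:

```r
library(tidyr)  # unite() and the re-exported %>% pipe

# Hypothetical three-row stand-in for the question's data
data <- data.frame(
  C = c("2011-01-26", "2011-01-26", "2011-02-09"),
  D = c("AAA", "AAA", "BCB"),
  E = c("A00001", "A00002", "B00002"),
  F = c(9L, 9L, 37L),
  stringsAsFactors = FALSE
)

# id is the new column; F, E, D and C are joined left to right with "_".
# remove = FALSE keeps the source columns instead of dropping them.
data <- data %>% unite(id, F, E, D, C, sep = "_", remove = FALSE)
data$id
```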

JelenaČuklina answered Oct 08 '22 19:10