I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.
> str(data)
'data.frame': 680420 obs. of 10 variables:
 $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D : chr "AAA" "AAA" "BCB" "CCC" ...
 $ E : chr "A00001" "A00002" "B00002" "B00001" ...
 $ F : int 9 9 37 37 37 37 191 191 191 191 ...
 $ G : int NA NA NA NA NA NA NA NA NA NA ...
 $ H : int 4 4 4 4 4 4 4 4 4 4 ...
For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:
data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")
And below is the undesired result:
> str(data)
'data.frame': 680420 obs. of 10 variables:
 $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D : chr "AAA" "AAA" "BCB" "CCC" ...
 $ E : chr "A00001" "A00002" "B00002" "B00001" ...
 $ F : int 9 9 37 37 37 37 191 191 191 191 ...
 $ G : int NA NA NA NA NA NA NA NA NA NA ...
 $ H : int 4 4 4 4 4 4 4 4 4 4 ...
 $ id : chr [1:680420, 1:4] "9" "9" "37" "37" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "V1" "V2" "V3" "V4"
Any help would be greatly appreciated.
Try
data$id <- paste(data$F, data$E, data$D, data$C, sep="_")
instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.
Edit: Even better is
data <- within(data, id <- paste(F, E, D, C, sep = "_"))
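To see the difference on something small, here is a minimal sketch using a toy data frame. The two rows are made-up values modelled on the question's str() output, and id2 is a name introduced only for illustration. As far as I can tell, the original attempt failed because cbind() builds a four-column character matrix and sapply() then calls paste() on each column separately, so nothing is ever joined across a row. paste() called directly on the columns works element-wise instead.

```r
# Toy data frame with made-up values mirroring the question's columns
data <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# paste() is vectorized: it combines the columns element by element,
# producing one string per row
data$id <- paste(data$F, data$E, data$D, data$C, sep = "_")

# Same result when the columns are picked by name; do.call() passes
# each selected column to paste() as a separate argument
id2 <- do.call(paste, c(data[c("F", "E", "D", "C")], sep = "_"))

print(data$id)
# [1] "9_A00001_AAA_2011-01-26"  "37_B00002_BCB_2011-02-09"
```

The do.call() form may be handy when the column names are held in a character vector rather than typed out.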
Use unite() from the tidyr package:
library(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')
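The first argument after the data is the name of the new column; the arguments that follow, up to sep, are the columns to concatenate. Note that unite() drops the source columns by default; pass remove = FALSE to keep them.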