Combine column values in an R dataframe all at once

Tags:

r

Is there a way to combine R data columns with other columns all at once?

For example,

asd <- data.frame(a = c("A","B"), b = c("d","f"), c = c("x","y"))
asd

  a b c
1 A d x
2 B f y

Expected output (combine column 'a' with both column b and c):

  a  b  c
1 A Ad Ax
2 B Bf By

asked Jun 30 '21 07:06

user11740857

4 Answers

You can paste0 the first column, asd[[1]], with the other columns flattened into one vector, unlist(asd[-1]), and assign the result back to the data.frame in place of those columns, asd[-1].

asd[-1] <- paste0(asd[[1]], unlist(asd[-1]))
#  a  b  c
#1 A Ad Ax
#2 B Bf By
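
To see why this works, the intermediate values can be inspected step by step. The following is only an illustrative sketch; the variable names flat and combined are not part of the answer.

```r
asd <- data.frame(a = c("A", "B"), b = c("d", "f"), c = c("x", "y"))

# unlist() flattens the remaining columns one after another (column by column)
flat <- unlist(asd[-1], use.names = FALSE)
flat
# "d" "f" "x" "y"

# paste0() recycles the shorter first column along the flattened vector
combined <- paste0(asd[[1]], flat)
combined
# "Ad" "Bf" "Ax" "By"

# the assignment fills the selected columns column by column again
asd[-1] <- combined
```

The approach relies on the flattened vector and the assignment both running in column order, so the recycled first column lines up with the right rows.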

Disabling recursive and use.names in unlist might improve the performance:

asd[-1] <- paste0(asd[[1]], unlist(asd[-1], FALSE, FALSE))

The same but using names:

S <- c("b", "c")
asd[S] <- paste0(asd$a, unlist(asd[S]))

Another way is to use paste0 in Map: asd[-1] selects every column except the first, and asd[rep(1,2)] selects the first column twice, once for each of them.

asd[-1] <- Map(paste0, asd[rep(1,2)], asd[-1])

The same but using names:

S <- c("b", "c")
asd[S] <- Map(paste0, asd[rep("a", length(S))], asd[S])
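
As a variation on the Map approach (a sketch that is not part of the answer and was not benchmarked here), the explicit rep can be dropped, because Map recycles shorter arguments. Wrapping the first column in list() gives a length-one argument that is paired with each remaining column.

```r
asd <- data.frame(a = c("A", "B"), b = c("d", "f"), c = c("x", "y"))

# Map() recycles shorter arguments, so a length-one list holding the first
# column is paired with every remaining column in turn.
asd[-1] <- Map(paste0, list(asd$a), asd[-1])
asd
#   a  b  c
# 1 A Ad Ax
# 2 B Bf By
```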

Another way is to use a for loop:

for(i in 2:3) {asd[[i]] <- paste0(asd[[1]], asd[[i]])}

The same but using names:

for(i in c("b", "c")) {asd[[i]] <- paste0(asd$a, asd[[i]])}

Comparing the methods:

getDf <- function(nr, nc) { #function to create an example dataset
    data.frame(a = sample(LETTERS, nr, TRUE),
               setNames(replicate(nc, sample(letters, nr, TRUE), simplify=FALSE), paste0("b", seq_len(nc))))
}

library(dplyr)
library(stringr)
library(purrr)
M <- alist(
    unlist = (function(asd) {asd[,-1] <- paste0(asd[,1], unlist(asd[,-1], FALSE, FALSE)); asd})(D)
  , Map = (function(asd) {asd[-1] <- Map(paste0, asd[rep(1,ncol(asd)-1)], asd[-1]); asd})(D)
  , "for" = (function(asd) {for(i in 2:ncol(asd)) {asd[[i]] <- paste0(asd[,1], asd[,i])}; asd})(D)
  , "for+str_c" = (function(asd) {for(i in 2:ncol(asd)) {asd[[i]] <- str_c(asd[,1], asd[,i])}; asd})(D)
  , lapply = (function(asd) {asd[-1] <- lapply(asd[-1], function(x) paste0(asd$a, x)); asd})(D)
  , across = (function(asd) {asd <- asd %>% mutate(across(-a, ~str_c(a, .x))); asd})(D)
  , pmap = (function(asd) {asd <- asd %>%
  pmap_dfr(~ c(list(...)[1], setNames(paste(..1, c(...)[-1], sep = ""), names(asd)[-1]))); as.data.frame(asd)})(D)
  , "row+matrix" = (function(asd) {asd[-1] <- paste0(asd$a[row(asd[-1])], as.matrix(asd[-1])); asd})(D)
  , apply = (function(asd) {asd[-1] <- apply(asd[-1], 2, function(x) paste0(asd[[1]], x)); asd})(D)
)

D <- getDf(1e5,2) #1e5 rows and 2 columns to combine
bench::mark(exprs = M)
#  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 unlist     29.07ms 29.92ms    29.5     12.68MB    11.8     15     6      509ms
#2 Map        22.94ms 23.02ms    42.6      1.53MB     1.94    22     1   516.38ms
#3 for        22.84ms 22.96ms    42.8      1.53MB     1.94    22     1   514.15ms
#4 for+str_c   9.78ms    10ms    97.2      1.53MB     3.97    49     2   503.89ms
#5 lapply     22.89ms 23.01ms    42.7      1.53MB     1.94    22     1   514.82ms
#6 across     12.29ms 12.57ms    77.8      1.53MB     1.99    39     1   501.43ms
#7 pmap         2.95s   2.95s     0.339    9.54MB     6.45     1    19      2.95s
#8 row+matrix 30.64ms 32.65ms    19.8     14.97MB     6.09    13     4   656.35ms
#9 apply      32.93ms 34.12ms    27.7     19.55MB     5.94    14     3   504.85ms
#Warning message:
#Some expressions had a GC in every iteration; so filtering is disabled.

D <- getDf(1e2, 1e3)
bench::mark(exprs = M)
#  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 unlist      21.4ms  21.7ms     45.2    18.08MB     9.68    14     3      310ms
#2 Map           28ms  28.1ms     35.3    12.53MB     4.41    16     2      453ms
#3 for         39.3ms  39.4ms     25.4      8.5MB     2.11    12     1      473ms
#4 for+str_c   34.1ms  34.3ms     29.1      8.5MB     4.48    13     2      447ms
#5 lapply      21.9ms  22.1ms     44.7    12.48MB     7.46    18     3      402ms
#6 across      80.3ms  80.9ms     12.3     5.98MB     4.93     5     2      406ms
#7 pmap       113.9ms   114ms      8.74    17.5MB     5.83     3     2      343ms
#8 row+matrix  24.5ms  24.6ms     40.2    19.31MB    10.7     15     4      373ms
#9 apply       32.3ms  32.5ms     30.5    21.72MB    11.1     11     4      360ms

Regarding memory usage, across and the for loop can be recommended. Regarding speed among the paste0 variants, Map, for and lapply are fastest with two columns, while unlist and lapply are fastest with 1000 columns, so overall lapply can be recommended. Using str_c instead of paste0 can also improve performance, most clearly with many rows (compare for+str_c with for).
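
Since lapply comes out as a reasonable overall choice, it could be wrapped in a small helper. This is only a sketch: prefix_cols and its arguments by and cols are hypothetical names, not part of any package.

```r
# Hypothetical helper: prefix the columns named in `cols` with the values of
# column `by`. By default every column except `by` is modified.
prefix_cols <- function(df, by, cols = setdiff(names(df), by)) {
  df[cols] <- lapply(df[cols], function(x) paste0(df[[by]], x))
  df
}

asd <- data.frame(a = c("A", "B"), b = c("d", "f"), c = c("x", "y"))

res_all <- prefix_cols(asd, "a")              # combine a with b and c
res_b   <- prefix_cols(asd, "a", cols = "b")  # combine a with b only
```

The function returns a modified copy, so the original data frame stays untouched unless the result is assigned back to it.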

If all columns have the same type, consider storing the data in a matrix, which shows advantages when there are many columns.

M <- as.matrix(asd)

M[,-1] <- paste0(M[,1], M[,-1])

M
#     a   b    c   
#[1,] "A" "Ad" "Ax"
#[2,] "B" "Bf" "By"
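
If a data frame is needed again afterwards, the matrix can be converted back. A minimal sketch, assuming the conversion cost is acceptable (it is not included in the benchmarks here); res is an illustrative name.

```r
asd <- data.frame(a = c("A", "B"), b = c("d", "f"), c = c("x", "y"))

M <- as.matrix(asd)                  # character matrix
M[, -1] <- paste0(M[, 1], M[, -1])   # first column recycled over the others

# Convert back once the matrix work is done; stringsAsFactors = FALSE keeps
# the columns as character on R versions before 4.0 as well.
res <- as.data.frame(M, stringsAsFactors = FALSE)
```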

D <- getDf(1e5,2)
M <- as.matrix(D)
bench::mark(check = FALSE #One gives a data frame, the other a matrix
 , lapply = (function(asd) {asd[-1] <- lapply(asd[-1], function(x) paste0(asd$a, x))})(D)
 , lapplyStr_C = (function(asd) {asd[-1] <- lapply(asd[-1], function(x) stringr::str_c(asd$a, x))})(D)
 , matrix = (function(M) {M[,-1] <- paste0(M[,1], M[,-1])})(M)
 , matrixStr_C = (function(M) {M[,-1] <- stringr::str_c(M[,1], M[,-1])})(M)
)
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr>  <bch:t> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 lapply       28.3ms 28.8ms      34.7    1.53MB     0       18     0      519ms
#2 lapplyStr_C  13.6ms 13.9ms      71.6    1.53MB     2.05    35     1      489ms
#3 matrix       34.1ms 34.4ms      28.9    7.25MB     7.24    12     3      415ms
#4 matrixStr_C  17.8ms 18.2ms      53.9    7.25MB     7.35    22     3      408ms

D <- getDf(1e2, 1e3)
M <- as.matrix(D)
bench::mark(check = FALSE #One gives a data frame, the other a matrix
 , lapply = (function(asd) {asd[-1] <- lapply(asd[-1], function(x) paste0(asd$a, x))})(D)
 , lapplyStr_C = (function(asd) {asd[-1] <- lapply(asd[-1], function(x) stringr::str_c(asd$a, x))})(D)
 , matrix = (function(M) {M[,-1] <- paste0(M[,1], M[,-1])})(M)
 , matrixStr_C = (function(M) {M[,-1] <- stringr::str_c(M[,1], M[,-1])})(M)
)
#  expression       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 lapply       32.41ms  32.66ms      30.5   12.48MB    15.2     10     5
#2 lapplyStr_C  26.85ms  27.11ms      36.9   12.48MB    18.4     12     6
#3 matrix       16.28ms  16.94ms      59.4    2.32MB     2.05    29     1
#4 matrixStr_C   7.51ms   7.77ms     127.     2.32MB     6.90    55     3

answered Nov 16 '22 12:11

GKi

You can use lapply in base R -

asd[-1] <- lapply(asd[-1], function(x) paste0(asd$a, x))

Or across in dplyr -

library(dplyr)
library(stringr)

asd %>% mutate(across(-a, ~str_c(a, .x)))

#  a  b  c
#1 A Ad Ax
#2 B Bf By

answered Nov 16 '22 12:11

Ronak Shah

We can also use the pmap function from purrr:

library(purrr)

asd %>%
  pmap_dfr(~ c(list(...)[1], setNames(paste(..1, c(...)[-1], sep = ""), names(asd)[-1])))

# A tibble: 2 x 3
  a     b     c
  <chr> <chr> <chr>
1 A     Ad    Ax
2 B     Bf    By

answered Nov 16 '22 11:11

Anoushiravan R

You can try

asd[-1] <- paste0(asd$a[row(asd[-1])], as.matrix(asd[-1]))

which gives

> asd
  a  b  c
1 A Ad Ax
2 B Bf By
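
The indexing step may be easier to follow when broken apart. The sketch below only inspects the intermediate values; idx is an illustrative name.

```r
asd <- data.frame(a = c("A", "B"), b = c("d", "f"), c = c("x", "y"))

# row() returns a matrix of row numbers with the same shape as asd[-1]
idx <- row(asd[-1])
idx
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2

# Indexing the first column with it repeats that column once per target
# column, in the same column-by-column order that as.matrix(asd[-1]) uses
first_rep <- asd$a[idx]
first_rep
```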

answered Nov 16 '22 12:11

ThomasIsCoding
