 

Rolling paste strings across columns

Tags:

r

dplyr

I have this type of data:

df <- data.frame(
  w1 = c("A", "B", "C", "E", "F", "G"),
  w2 = c("B", "G", "C", "D", "E", "V"),
  w3 = c("D", "S", "O", "F", NA, "N"),
  w4 = c("E", "U", NA, "T", NA, NA),
  w5 = c("C", NA, NA, NA, NA, NA)
)

I need to iterate through adjacent column pairs and rolling-paste the separate strings into bigrams. Note that in the actual data the strings are of type character and vary in length. I've tried this, but it fails:

df[, paste0("bigr_", 1:4, "_", 2:5)] <- lapply(df[, 1:5], 
                                               function(x) paste(x[i], x[i+1], sep = " "))
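The attempt above cannot work as written. `lapply()` passes the function one column at a time, so `x[i]` and `x[i+1]` would index rows within a single column rather than adjacent columns, and `i` is never defined. The minimal base R sketch below keeps the same `lapply()` plus `paste0()` naming idea but iterates over column positions 1 to 4, so each step can see column `i` and column `i + 1`. It assumes the bigram should be `NA` whenever either word is missing, as in the expected output.

```r
# Data from the question
df <- data.frame(
  w1 = c("A", "B", "C", "E", "F", "G"),
  w2 = c("B", "G", "C", "D", "E", "V"),
  w3 = c("D", "S", "O", "F", NA, "N"),
  w4 = c("E", "U", NA, "T", NA, NA),
  w5 = c("C", NA, NA, NA, NA, NA)
)

# Loop over column positions 1..4 so each step can see column i and column i + 1
df[paste0("bigr_", 1:4, "_", 2:5)] <- lapply(1:4, function(i) {
  first  <- df[[i]]
  second <- df[[i + 1]]
  # paste() would turn NA into the string "NA", so return a real NA instead
  ifelse(is.na(first) | is.na(second), NA_character_, paste(first, second, sep = " "))
})

df
```

The right-hand side is evaluated before the assignment, so the new `bigr_*` columns do not interfere with the positions 1 to 5 used inside the loop.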

The expected output is:

  w1 w2   w3   w4   w5 bigr_1_2 bigr_2_3 bigr_3_4 bigr_4_5
1  A  B    D    E    C      A B      B D      D E      E C
2  B  G    S    U <NA>      B G      G S      S U     <NA>
3  C  C    O <NA> <NA>      C C      C O     <NA>     <NA>
4  E  D    F    T <NA>      E D      D F      F T     <NA>
5  F  E <NA> <NA> <NA>      F E     <NA>     <NA>     <NA>
6  G  V    N <NA> <NA>      G V      V N     <NA>     <NA>

I'd be most interested in a dplyr solution but am open and grateful for other solutions as well.

Chris Ruehlemann asked Dec 19 '25 02:12



1 Answer

Since you're most interested in a dplyr solution: this can be done with mutate() and across(), pasting each of the first four columns to the column immediately after it. You can alter the function applied to each column if this doesn't give exactly the desired output.
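One detail that affects whether the output matches exactly: `paste()` does not propagate missing values, it converts them to the literal string `"NA"`. The sketch below shows this and a small NA-aware helper. `paste_bigram()` is a hypothetical name introduced here, not part of dplyr; it assumes a bigram should be `NA` whenever either word is missing, as in the question's expected output, and could stand in for the plain `paste()` call inside the `across()` function.

```r
# Plain paste() converts a missing value into the literal string "NA"
paste("U", NA, sep = " ")
#> [1] "U NA"

# Hypothetical helper (not part of dplyr): return NA when either word is missing
paste_bigram <- function(first, second) {
  ifelse(is.na(first) | is.na(second), NA_character_, paste(first, second, sep = " "))
}

paste_bigram(c("S", "U"), c("U", NA))
#> [1] "S U" NA
```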

library(dplyr)

df %>%
  mutate(
    across(
      # For the first four columns (i.e. has number 1-4 in column name)
      matches("[1-4]"),
      
      # Apply custom function
      function(col) {
        
        # The data in the next column along
        next_col <- cur_data()[[which(names(cur_data()) == cur_column()) + 1]]
        
        # Paste together; paste() would turn NA into the string "NA",
        # so return NA when either word is missing
        ifelse(is.na(col) | is.na(next_col), NA_character_, paste(col, next_col, sep = " "))
      },
      .names = "{gsub(pattern = 'w', replacement = 'bigr_', {col})}" # alter name of new cols (replace 'w' with 'bigr_')
    )
  ) %>%

  # EDIT: added to rename columns to match desired output
  rename_with(.cols = matches("bigr"),
              .fn = function(colname) {
                paste0(colname, "_", as.numeric(gsub(pattern = "bigr_", replacement = "", colname))+1)
              })
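The column naming in this answer happens in two steps: the `.names` glue spec inside `across()` turns `w1` into `bigr_1`, and `rename_with()` then appends the next column number. The plain R sketch below reproduces those two string transformations outside of dplyr so the resulting names can be checked; `old`, `step1` and `step2` are illustrative variable names only.

```r
# Names of the columns that across() works on (those matching "[1-4]")
old <- c("w1", "w2", "w3", "w4")

# Step 1: the .names glue spec replaces "w" with "bigr_"
step1 <- gsub(pattern = "w", replacement = "bigr_", old)
step1
#> [1] "bigr_1" "bigr_2" "bigr_3" "bigr_4"

# Step 2: rename_with() appends "_" plus the next column number
step2 <- paste0(step1, "_", as.numeric(gsub(pattern = "bigr_", replacement = "", step1)) + 1)
step2
#> [1] "bigr_1_2" "bigr_2_3" "bigr_3_4" "bigr_4_5"
```

Both steps rely on the column names looking like `w1` to `w5`. With other names the selectors need adjusting: `matches("[1-4]")` would, for example, also select a column named `w10`, and `gsub("w", ...)` replaces every `w` in a name, not just a leading one.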
cnbrownlie answered Dec 20 '25 14:12



