
tidyr::unite across column patterns

Tags: r, dplyr, tidyr

I have a dataset that looks something like this

site <- c("A", "B", "C", "D", "E")
D01_1 <- c(1, 0, 0, 0, 1)
D01_2 <- c(1, 1, 0, 1, 1)
D02_1 <- c(1, 0, 1, 0, 1)
D02_2 <- c(0, 1, 0, 0, 1)
D03_1 <- c(1, 1, 0, 0, 0)
D03_2 <- c(0, 1, 0, 0, 1)
df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)

I am trying to unite the D0x_1 and D0x_2 columns so that the values in the columns are separated by a slash. I can do this with the following code and it works just fine:

library(dplyr)
library(tidyr)

df.unite <- df %>%
  unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
  unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
  unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)

...but the problem is that it requires me to type out each unite pair multiple times and it is unwieldy across the large number of columns in my dataset. Is there a way in dplyr to unite across similarly patterned column names and then loop across the columns? unite_each doesn't seem to exist.

asked Mar 15 '16 03:03 by Steven




2 Answers

Two options, which are really the same thing rearranged.


Option 1: Nested calls

First, you can use lapply to apply unite_ (the standard-evaluation version of unite, which accepts column names as strings) programmatically across columns. To do so, build a vector of the new column names for it to use, wrap the lapply in do.call(cbind, ...) to bind the resulting columns together, and then cbind site back onto the result. Altogether:

cols <- unique(substr(names(df)[-1], 1, 3))
cbind(site = df$site, do.call(cbind,
        lapply(cols, function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                                        sep = '/', remove = TRUE) %>% select_(x)})
        ))

#   site D01 D02 D03
# 1    A 1/1 1/0 1/0
# 2    B 0/1 0/1 1/1
# 3    C 0/0 1/0 0/0
# 4    D 0/1 0/0 0/0
# 5    E 1/1 1/1 0/1
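
To see how the pieces of Option 1 fit together, it can help to look at the two intermediate values on their own. The following is a minimal base-R sketch on the question's df; it only inspects cols and the grep() result, and pairs_D01 is a name introduced here purely for illustration.

```r
# Sample data from the question
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# New column names: the first three characters of every non-site column
cols <- unique(substr(names(df)[-1], 1, 3))
cols
# [1] "D01" "D02" "D03"

# The columns that get united into "D01" (pairs_D01 is a hypothetical name)
pairs_D01 <- grep("D01", names(df), value = TRUE)
pairs_D01
# [1] "D01_1" "D01_2"
```

Note that substr(..., 1, 3) assumes every prefix is exactly three characters long; if the real column names vary in length, something like sub("_\\d+$", "", names(df)[-1]) may be a safer way to derive the prefixes.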

Option 2: Chained

Alternately, if you really like pipes, you can actually hack the whole thing into a chain (lapply included!), swapping out a few of the base functions for dplyr ones:

df %>% select(-site) %>% names() %>% substr(1,3) %>% unique() %>%
  lapply(function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                            sep = '/', remove = TRUE) %>% select_(x)}) %>%
  bind_cols() %>% mutate(site = as.character(df$site)) %>% select(site, starts_with('D'))

# Source: local data frame [5 x 4]
# 
#    site   D01   D02   D03
#   (chr) (chr) (chr) (chr)
# 1     A   1/1   1/0   1/0
# 2     B   0/1   0/1   1/1
# 3     C   0/0   1/0   0/0
# 4     D   0/1   0/0   0/0
# 5     E   1/1   1/1   0/1

Check out the intermediate products to see how it fits together, but it's pretty much the same logic as the base approach.
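
The underscore verbs unite_ and select_ used above are deprecated in later tidyr and dplyr releases. As a rough sketch of the same idea without them, assuming a reasonably recent tidyr in which unite() accepts an unquoted string (!!) for the new column name, a plain loop over the prefixes might look like this. The names prefixes and out are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Prefixes shared by each pair, e.g. "D01" for D01_1 and D01_2
prefixes <- unique(sub("_\\d+$", "", names(df)[-1]))

# Unite one prefix at a time; each pass replaces the pair with a single column
out <- df
for (p in prefixes) {
  out <- unite(out, !!p, starts_with(paste0(p, "_")), sep = "/")
}

names(out)
# [1] "site" "D01"  "D02"  "D03"
```

Because unite() drops the input columns by default and puts the new column where the first of them was, site stays in place and no cbind step is needed.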

answered Oct 23 '22 02:10 by alistaire


This is a solution with mostly base functions (magrittr is loaded only for the pipe). First, I find the indexes of the columns whose names end in _1. I also build the column names for the final result using gsub() and unique(). The sapply() part pastes each pair of columns together with /: if x = 1, then x + 1 = 2, so it always takes two columns next to each other. This assumes every _1 column sits immediately to the left of its _2 partner. Then I add site back with cbind() and create a data frame. The last step is to assign the column names.

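library(magrittr)  # provides %>%

# Positions of the first column of each pair (names ending in "_1")
ind <- grep(pattern = "_1$", x = names(df))

# Final column names: strip the "_1"/"_2" suffix and drop duplicates
new_names <- unique(gsub(pattern = "_\\d+$",
                         replacement = "", x = names(df)))

# Paste each "_1" column with its right-hand neighbour, then prepend site
out <- sapply(ind, function(x) {
         paste(df[, x], df[, x + 1], sep = "/")
       }) %>%
  cbind(as.character(df$site), .) %>%
  data.frame()

names(out) <- new_names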

#  site D01 D02 D03
#1    A 1/1 1/0 1/0
#2    B 0/1 0/1 1/1
#3    C 0/0 1/0 0/0
#4    D 0/1 0/0 0/0
#5    E 1/1 1/1 0/1
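
The approach above relies on each pair of columns being adjacent. If that cannot be guaranteed, one possible variation is to group the columns by name instead of by position. This is only a sketch under the assumption that every column to combine is named prefix_number; prefix, groups and out are illustrative names, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Strip the trailing "_<number>" to get the prefix of each non-site column
value_cols <- names(df)[-1]
prefix <- sub("_\\d+$", "", value_cols)

# Group column names by prefix: list(D01 = c("D01_1", "D01_2"), ...)
groups <- split(value_cols, prefix)

# Paste the columns of each group row-wise, whatever their position in df
out <- data.frame(
  site = df$site,
  lapply(groups, function(nm) do.call(paste, c(df[nm], sep = "/"))),
  stringsAsFactors = FALSE
)

out$D01
# [1] "1/1" "0/1" "0/0" "0/1" "1/1"
```

A side effect of grouping by name is that it also handles prefixes with more than two columns, since do.call(paste, ...) pastes however many columns each group contains.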

answered Oct 23 '22 02:10 by jazzurro