tidyr::unite across column patterns

Tags: r, dplyr, tidyr

I have a dataset that looks something like this:

site <- c("A", "B", "C", "D", "E")
D01_1 <- c(1, 0, 0, 0, 1)
D01_2 <- c(1, 1, 0, 1, 1)
D02_1 <- c(1, 0, 1, 0, 1)
D02_2 <- c(0, 1, 0, 0, 1)
D03_1 <- c(1, 1, 0, 0, 0)
D03_2 <- c(0, 1, 0, 0, 1)
df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)

I am trying to unite the D0x_1 and D0x_2 columns so that the values in the columns are separated by a slash. I can do this with the following code and it works just fine:

library(dplyr)
library(tidyr)

df.unite <- df %>%
  unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
  unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
  unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)

...but the problem is that I have to type out a separate unite call for every pair, which is unwieldy across the large number of columns in my dataset. Is there a way in dplyr to unite across similarly patterned column names and then loop across the columns? unite_each doesn't seem to exist.


asked Mar 15 '16 03:03

Steven

2 Answers

Two options, which are really the same thing rearranged.

Option 1: Nested calls

First, you can use lapply to apply unite_ (the standard-evaluation version, to which you can pass strings) programmatically across columns. To do so, build a vector of column-name prefixes for it to use, wrap the lapply in do.call(cbind, ...) to combine the resulting columns, and then cbind site back onto the result. Note that unite_ and select_ are deprecated in current versions of tidyr and dplyr and may emit warnings. Altogether:

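# Column prefixes shared by each pair: "D01", "D02", "D03"
cols <- unique(substr(names(df)[-1], 1, 3))

# Unite each pair, keep only the new column, then bind everything to site
cbind(
  site = df$site,
  do.call(cbind, lapply(cols, function(x) {
    unite_(df, x, grep(x, names(df), value = TRUE), sep = '/', remove = TRUE) %>%
      select_(x)
  }))
)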

#   site D01 D02 D03
# 1    A 1/1 1/0 1/0
# 2    B 0/1 0/1 1/1
# 3    C 0/0 1/0 0/0
# 4    D 0/1 0/0 0/0
# 5    E 1/1 1/1 0/1
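
Since unite_() and select_() are deprecated, here is a minimal sketch of the same idea using only unite() with a tidyselect helper. It assumes every column to be merged is named <prefix>_<number>; the names prefixes and out are illustrative and not part of the original answer.

```r
library(tidyr)

df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Prefixes shared by each group of columns (assumed naming: <prefix>_<number>)
prefixes <- unique(sub("_\\d+$", "", names(df)[-1]))

out <- df
for (p in prefixes) {
  # !!p unquotes the string so it becomes the new column's name;
  # starts_with() selects every column belonging to that prefix
  out <- unite(out, !!p, starts_with(paste0(p, "_")), sep = "/", remove = TRUE)
}
out
```

Each pass through the loop replaces one group of columns with a single united column, so this should also cope with groups of more than two columns as long as the naming pattern holds.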

Option 2: Chained

Alternatively, if you really like pipes, you can hack the whole thing into a single chain (lapply included!), swapping out a few of the base functions for dplyr ones:

df %>% select(-site) %>% names() %>% substr(1, 3) %>% unique() %>%
  lapply(function(x) {
    unite_(df, x, grep(x, names(df), value = TRUE), sep = '/', remove = TRUE) %>%
      select_(x)
  }) %>%
  bind_cols() %>%
  mutate(site = as.character(df$site)) %>%
  select(site, starts_with('D'))

# Source: local data frame [5 x 4]
# 
#    site   D01   D02   D03
#   (chr) (chr) (chr) (chr)
# 1     A   1/1   1/0   1/0
# 2     B   0/1   0/1   1/1
# 3     C   0/0   1/0   0/0
# 4     D   0/1   0/0   0/0
# 5     E   1/1   1/1   0/1

Inspect the intermediate results to see how the pieces fit together; the logic is essentially the same as in Option 1.
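
As a rough illustration of what those intermediate results look like, the sketch below runs only the first part of the chain and then the grep() call for a single prefix. The variable names prefixes and first_group are made up for this example.

```r
library(dplyr)

df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Step 1: the prefixes that lapply() iterates over
prefixes <- df %>% select(-site) %>% names() %>% substr(1, 3) %>% unique()
prefixes
# [1] "D01" "D02" "D03"

# Step 2: for one prefix, the columns that grep() passes on to be united
first_group <- grep(prefixes[1], names(df), value = TRUE)
first_group
# [1] "D01_1" "D01_2"
```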


answered Oct 23 '22 02:10

alistaire

This is a solution with base functions. First, I find the indices of the columns whose names end in _1. I also build the column names for the final result, using gsub() and unique(). The sapply() step pastes each of those columns to the column immediately to its right, separated by /: for a column at index x, its partner is at x + 1, so you always pair two adjacent columns. This relies on each _1 column being directly followed by its _2 partner. Then I add site with cbind() and convert the result to a data frame. The last step is to assign the column names.

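library(magrittr)

# Indices of the first column in each pair (names ending in "_1")
ind <- grep(pattern = "_1$", x = names(df))

# Final column names: "site", "D01", "D02", "D03"
# (called new_names to avoid masking base::names)
new_names <- unique(gsub(pattern = "_\\d+$",
                         replacement = "", x = names(df)))

# Paste each "_1" column to its right-hand neighbour
sapply(ind, function(x) {
  paste(df[, x], df[, x + 1], sep = "/")
}) %>%
  cbind(as.character(df$site), .) %>%
  data.frame -> out

names(out) <- new_names
out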

#  site D01 D02 D03
#1    A 1/1 1/0 1/0
#2    B 0/1 0/1 1/1
#3    C 0/0 1/0 0/0
#4    D 0/1 0/0 0/0
#5    E 1/1 1/1 0/1
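
If the paired columns are not guaranteed to sit next to each other, one possible base-R variation is to group the column names by prefix instead of by position. This is only a sketch under the assumption that columns are named <prefix>_<number>; value_cols, groups and pasted are illustrative names.

```r
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# All columns except site, and the prefix each one belongs to
value_cols <- names(df)[-1]
prefix <- sub("_\\d+$", "", value_cols)

# Named list: one vector of column names per prefix
groups <- split(value_cols, prefix)

# Paste the columns of each group row-wise, separated by "/"
pasted <- lapply(groups, function(cols) do.call(paste, c(df[cols], sep = "/")))

out <- data.frame(site = df$site, pasted, stringsAsFactors = FALSE)
out
```

Because split() groups by name, the column order in df no longer matters, and a group may contain any number of columns.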

answered Oct 23 '22 02:10

jazzurro
