
Gathering specific pairs of columns into rows by dplyr in R [duplicate]

Tags: r, multiple-columns, dplyr, tidyverse

I am trying to convert a data frame from wide to long format by gathering specific pairs of columns, as in the example below.

Example data frame:

df <- data.frame(id=c(1,2,3,4,5), var=c("a","d","g","f","i"),a1=c(3,5,1,2,2), b1=c(2,4,1,2,3), a2=c(8,1,2,5,1), b2=c(1,6,4,7,2), a3=c(7,7,2,3,1), b3=c(1,1,4,9,6))

Initial table:

  id var a1 b1 a2 b2 a3 b3
1  1   a  3  2  8  1  7  1
2  2   d  5  4  1  6  7  1
3  3   g  1  1  2  4  2  4
4  4   f  2  2  5  7  3  9
5  5   i  2  3  1  2  1  6

Desired result:

   id  var a  b
 1  1   a  3  2
 2  1   a  8  1
 3  1   a  7  1
 4  2   d  5  4
 5  2   d  1  6
 6  2   d  7  1
 7  3   g  1  1
 8  3   g  2  4
 9  3   g  2  4
10  4   f  2  2
11  4   f  5  7
12  4   f  3  9
13  5   i  2  3
14  5   i  1  2
15  5   i  1  6

Conditions:

- Each pair of ai and bi should be gathered together: there are three pairs ("a1 and b1", "a2 and b2", "a3 and b3"), and their values should be moved into a single pair of columns "a" and "b", so each record is replicated three times.
- The first and second fields (the id of each sample and its common variable) should be kept in every replicated row.

I thought this might be possible with gather() in tidyverse; however, as far as I understand, gather() may not be suitable for gathering specific pairs of fields into multiple target columns (two columns in this case).

It is possible to prepare three data frames separately and bind them into one (example script below); however, I would prefer to do it in one continuous pipe in tidyverse, without interrupting the chain.

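library(dplyr)
# Rename each pair to a/b while selecting; otherwise bind_rows() keeps a1..b3 as separate columns
df1 <- df %>% dplyr::select(id, var, a = a1, b = b1)
df2 <- df %>% dplyr::select(id, var, a = a2, b = b2)
df3 <- df %>% dplyr::select(id, var, a = a3, b = b3)
df.fin <- bind_rows(df1, df2, df3) %>% arrange(id)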

I would appreciate your elegant suggestions using tidyverse.

=================Additional Questions==================

@Akrun & Camille Thank you for your suggestions, and sorry for my late reply. I am now trying to apply your idea to my actual data frame but am still struggling with another issue.

The following are the column names in the actual data frame (sorry, I have not included any values, as they should not matter here).

colnames(df) <- c("hid","mid","rel","age","gen","mlic","vlic",
                  "wtaz","staz","ocp","ocpot","emp","empot","expm",
                  "minc","otaz1","op1","dtime1","atime1","dp1","dtaz1",
                  "pur1", "repm1","lg1t1","lg2t1","lg3t1","lg4t1","expt1",
                  "otaz2","op2","dtime2","atime2","dp2","dtaz2","pur2",
                  "repm2","lg1t2","lg2t2","lg3t2","lg4t2","expt2",
                  "otaz3","op3","dtime3","atime3","dp3","dtaz3","pur3",
                  "repm3","lg1t3","lg2t3","lg3t3","lg4t3","expt3",
                  "otaz4","op4","dtime4","atime4","dp4","dtaz4","pur4",
                  "repm4","lg1t4","lg2t4","lg3t4","lg4t4","expt4",
                  "otaz5","op5","dtime5","atime5","dp5","dtaz5","pur5",
                  "repm5","lg1t5","lg2t5","lg3t5","lg4t5","expt5"
                  )

I then tried to apply your suggestions as below. In the data frame, columns 1:15 are common variables and the others are repeated variables with 5 repetitions (the index 1 to 5 is at the end of each variable name). I could run the following script but still have a problem:

#### Convert member table into activity table
## Common variables
hm.com <- names(hm)[c(1:15)]
## Repeating variables
hm.rep <- names(hm)[c(-1:-15)]
hm.rename <- unique(sub("\\d+$","",hm.rep))
## Extract members with trips
hm.trip <- hm %>% filter(otaz!=0) %>% data.frame()
## Convert from member into trip table
test <- split(hm.rep, sub(".*[^1-9$]", "", hm.rep)) %>%
    map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>% 
    rename_at(16:28, ~ hm.rename) %>%
    arrange(hid,mid,dtime,atime) %>%
    data.frame()

The result still has an issue:

I could rename the first set of repeated variables; however, the fields for sets 2 to 5 still remain, and the records are not stored appropriately in the data frame. That is, a set of repeated variables, for instance otaz2 to expt2, is not stacked as additional rows under otaz~expt but stays in its original columns (otaz2 to expt2). I suppose map_df is not working correctly in my case.

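========== Problem Solved ==========

The script above contained an incorrect manipulation: rename_at() was applied after map_df() had already bound the rows, instead of inside the mapped function.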

Wrong:

map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>% 
        rename_at(16:28, ~ hm.rename)

Correct:

map_df(~ hm.trip %>% dplyr::select(hm.com, .x) %>% 
        rename_at(16:28, ~ hm.rename))

Thank you; I can now move on to the next step.
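
For reference, the difference between the two versions can be reproduced on the small example data frame. This is only a minimal sketch, assuming dplyr >= 1.0 and purrr are installed. The names common, repeated, stems, groups, and stacked_wide are illustrative and not from the original script, rename_with() stands in for rename_at(), and map() followed by bind_rows() stands in for map_df().

```r
library(dplyr)
library(purrr)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

common   <- names(df)[1:2]                        # columns kept in every row
repeated <- names(df)[-(1:2)]                     # a1, b1, a2, b2, a3, b3
stems    <- unique(sub("[0-9]+$", "", repeated))  # names without the index
# Group the repeated columns by their trailing index ("1", "2", "3")
groups   <- split(repeated, sub(".*[^0-9]", "", repeated))

# No renaming before binding: bind_rows() matches columns by name,
# so a1..b3 stay as separate columns padded with NA
stacked_wide <- groups %>%
  map(~ df %>% select(all_of(c(common, .x)))) %>%
  bind_rows()

# Renaming inside the mapped function: every piece has the same
# column names before it is bound, so the rows stack as intended
long <- groups %>%
  map(~ df %>%
        select(all_of(c(common, .x))) %>%
        rename_with(~ stems, all_of(.x))) %>%
  bind_rows() %>%
  arrange(id)
```

Here stacked_wide still has all eight original columns, while long has only id, var, a, and b. The grouping pattern ".*[^0-9]" strips everything up to the last non-digit character, so it should also cope with names such as lg1t1 that contain a digit in the middle.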

asked May 07 '18 16:05

HSJ

1 Answer

We could do this with melt from data.table, which can take multiple patterns in the measure argument to reshape into 'long' format. Here, one pattern matches column names that start (^) with "a" followed by digits, and the other matches names that start with "b" followed by digits.

library(data.table)  
melt(setDT(df), measure = patterns("^a\\d+", "^b\\d+"), 
       value.name = c("a", "b"))[order(id)][, variable := NULL][]
#    id var a b
# 1:  1   a 3 2
# 2:  1   a 8 1
# 3:  1   a 7 1
# 4:  2   d 5 4
# 5:  2   d 1 6
# 6:  2   d 7 1
# 7:  3   g 1 1
# 8:  3   g 2 4
# 9:  3   g 2 4
#10:  4   f 2 2
#11:  4   f 5 7
#12:  4   f 3 9
#13:  5   i 2 3
#14:  5   i 1 2
#15:  5   i 1 6
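
A variant of the same idea, sketched under the assumption that the original df should stay a plain data.frame: setDT() converts df by reference, whereas as.data.table() works on a copy. The measure columns are passed as an explicit list built with grep() instead of patterns(); stems and measure_cols are illustrative names, not part of data.table.

```r
library(data.table)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

dt <- as.data.table(df)  # a copy; df itself is not modified

# One character vector of column names per output column
stems <- c("a", "b")
measure_cols <- lapply(stems, function(s) {
  grep(paste0("^", s, "[0-9]+$"), names(dt), value = TRUE)
})

# Each list element is melted into its own value column
long <- melt(dt, measure.vars = measure_cols, value.name = stems)
long <- long[order(id)]
long[, variable := NULL]  # drop the index column created by melt
```

Building the list from a vector of stems may be convenient when there are many repeated variables, since the stems can be derived from the column names rather than typed as separate patterns.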

Or, using tidyverse, we gather the columns of interest into 'long' format, separate the 'key' column into two, and spread back to 'wide' format. Be cautious when the groups of columns have different classes, since gather puts all values into a single column and coerces them to a common type; melt is more useful in that case.

library(tidyverse)
df %>% 
  gather(key, val, a1:b3) %>%
  separate(key, into = c("key1", "key2"), sep=1) %>%
  spread(key1, val) %>%
  select(-key2)
#   id var a b
#1   1   a 3 2
#2   1   a 8 1
#3   1   a 7 1
#4   2   d 5 4
#5   2   d 1 6
#6   2   d 7 1
#7   3   g 1 1
#8   3   g 2 4
#9   3   g 2 4
#10  4   f 2 2
#11  4   f 5 7
#12  4   f 3 9
#13  5   i 2 3
#14  5   i 1 2
#15  5   i 1 6
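
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A possible one-step equivalent is sketched below. The special ".value" entry in names_to tells pivot_longer() to use the matched part of each column name ("a" or "b") as the output column name, so the pairs stay together and the "a" and "b" columns are never mixed into one value column. The helper column name set is an arbitrary choice.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

long <- df %>%
  pivot_longer(
    cols = a1:b3,
    # first capture group -> output column name, second -> index column "set"
    names_to = c(".value", "set"),
    names_pattern = "([a-z]+)([0-9]+)"
  ) %>%
  select(-set)
```

The result is a tibble with the columns id, var, a, and b, with the three rows for each id kept together in the order of the original pairs.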

answered Sep 23 '22 05:09

akrun
