Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Tags: r, r-faq, reshape, tidyr, reshape2

I have a dataframe in a wide format, with repeated measurements taken within different date ranges. In my example there are three different periods, all with their corresponding values. E.g. the first measurement (Value1) was measured in the period from DateRange1Start to DateRange1End:

ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3

I'm looking to reshape the data to a long format such that the DateRangeXStart and DateRangeXEnd columns are grouped. Thus, what was 1 row in the original table becomes 3 rows in the new table:

ID DateRangeStart DateRangeEnd Value
1 1/1/90 3/1/90 4.4
1 4/5/91 6/7/91 6.2
1 5/5/95 6/6/96 3.3

I know there must be a way to do this with reshape2/melt/recast/tidyr, but I can't seem to figure out how to map the multiple sets of measure variables into single sets of value columns in this particular way.

asked Sep 17 '12 20:09

daj

3 Answers

reshape(dat, idvar="ID", direction="long", 
             varying=list(Start=c(2,5,8), End=c(3,6,9), Value=c(4,7,10)),
             v.names = c("DateRangeStart", "DateRangeEnd", "Value") )
#-------------
    ID time DateRangeStart DateRangeEnd Value
1.1  1    1          1/1/90        3/1/90    4.4
1.2  1    2          4/5/91        6/7/91    6.2
1.3  1    3          5/5/95        6/6/96    3.3

(Added the v.names per Josh's suggestion.)
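
For a self-contained run, the question's table can be typed in as a data frame first. The sketch below is one way to do that. It assumes the dates are kept as plain strings, refers to the columns by name instead of position, and uses timevar = "period" as an illustrative name for the index column (the call above leaves it at the default, "time").

```r
# Rebuild the question's one-row table; dates are kept as character strings here
dat <- data.frame(
  ID = 1L,
  DateRange1Start = "1/1/90", DateRange1End = "3/1/90", Value1 = 4.4,
  DateRange2Start = "4/5/91", DateRange2End = "6/7/91", Value2 = 6.2,
  DateRange3Start = "5/5/95", DateRange3End = "6/6/96", Value3 = 3.3,
  stringsAsFactors = FALSE
)

# Each list element names the wide columns that collapse into one long column
long <- reshape(
  dat, idvar = "ID", direction = "long",
  varying = list(
    c("DateRange1Start", "DateRange2Start", "DateRange3Start"),
    c("DateRange1End", "DateRange2End", "DateRange3End"),
    c("Value1", "Value2", "Value3")
  ),
  v.names = c("DateRangeStart", "DateRangeEnd", "Value"),
  timevar = "period"  # illustrative name; the default is "time"
)
rownames(long) <- NULL  # drop the "1.1", "1.2", ... row names
print(long)
```

Selecting by name is a little longer than c(2,5,8), but it keeps working if the column order of the wide table changes.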

answered Oct 13 '22 21:10

IRTFM

data.table's melt function can melt into multiple columns. Using that, we can simply do:

library(data.table)
melt(setDT(dat), id=1L,
     measure=patterns("Start$", "End$", "^Value"), 
     value.name=c("DateRangeStart", "DateRangeEnd", "Value"))

#    ID variable DateRangeStart DateRangeEnd Value
# 1:  1        1         1/1/90       3/1/90   4.4
# 2:  1        2         4/5/91       6/7/91   6.2
# 3:  1        3         5/5/95       6/6/96   3.3

Alternatively, you can also reference the three sets of measure columns by the column position:

melt(setDT(dat), id = 1L, 
     measure = list(c(2,5,8), c(3,6,9), c(4,7,10)), 
     value.name = c("DateRangeStart", "DateRangeEnd", "Value"))
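
Both calls select the same columns. As a rough check, the three regular expressions can be tried against the column names with base grep(). This is only meant to illustrate the matching, not to describe how patterns() works internally.

```r
# Column names as in the question's wide table
nms <- c("ID",
         "DateRange1Start", "DateRange1End", "Value1",
         "DateRange2Start", "DateRange2End", "Value2",
         "DateRange3Start", "DateRange3End", "Value3")

# grep() returns the positions of the names matching each regular expression
start_cols <- grep("Start$", nms)
end_cols   <- grep("End$", nms)
value_cols <- grep("^Value", nms)

print(list(Start = start_cols, End = end_cols, Value = value_cols))
```

The positions found this way are the ones passed explicitly in the second melt() call.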

answered Oct 13 '22 21:10

Arun

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() and then spread() (see answer by @AndrewMacDonald), because attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather(). However, it offers additional functionality such as multiple value columns. With only one value column, all column names of the wide data set would go into one long column with the name given in names_to. For multiple value columns, names_to may receive multiple new names.
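
As a point of comparison, the single-value-column case might look like the sketch below. The tibble vals is hypothetical toy data (only the Value columns of the example), and names_prefix is used to strip the leading "Value" so that only the period number is left.

```r
library(tidyr)

# Hypothetical toy data with only one set of measure columns
vals <- tibble::tibble(ID = 1L, Value1 = 4.4, Value2 = 6.2, Value3 = 3.3)

# All former column names end up in the single column named by names_to;
# names_prefix strips the leading "Value" so only the period number remains
long_vals <- pivot_longer(vals, cols = -ID,
                          names_to = "group", names_prefix = "Value",
                          values_to = "Value")
print(long_vals)
```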

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc. Therefore, I renamed the columns in the first step.
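
The renaming relies on a regular expression with two capture groups: the digit and whatever word characters follow it, which are then swapped and joined with an underscore. A minimal sketch on a few of the names (the vectors old and new are just illustrative):

```r
# A few of the original column names
old <- c("ID", "DateRange1Start", "DateRange1End", "Value1")

# (\\d) captures the period number, (\\w*) the rest of the name after it;
# the replacement puts the rest first, then "_", then the number
new <- sub("(\\d)(\\w*)", "\\2_\\1", old)
print(new)
```

Names without a digit, such as ID, contain no match and are left unchanged.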

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

# With released tidyr (>= 1.0.0), a spec is applied with pivot_longer_spec()
pivot_longer_spec(a, spec)
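
A spec is an ordinary data frame with one row per wide column: .name holds the wide column name, .value the long column it feeds, and any further columns become key columns. As a sketch of that idea, the spec can also be written by hand with base string functions and applied with tidyr's pivot_longer_spec(). The object names a_wide, wide_cols and long are my own, and the data is rebuilt with the original, un-renamed column names.

```r
library(tidyr)

# The example data with its original column names
a_wide <- tibble::tibble(
  ID = 1L,
  DateRange1Start = as.Date("1990-01-01"), DateRange1End = as.Date("1990-01-03"),
  Value1 = 4.4,
  DateRange2Start = as.Date("1991-05-04"), DateRange2End = as.Date("1991-07-06"),
  Value2 = 6.2,
  DateRange3Start = as.Date("1995-05-05"), DateRange3End = as.Date("1996-06-06"),
  Value3 = 3.3
)

wide_cols <- setdiff(names(a_wide), "ID")

# One row per wide column: where it comes from, where it goes, which period
spec <- tibble::tibble(
  .name  = wide_cols,
  .value = sub("\\d", "", wide_cols),              # e.g. "DateRangeStart"
  group  = as.integer(gsub("\\D", "", wide_cols))  # e.g. 1L
)

long <- pivot_longer_spec(a_wide, spec)
print(long)
```

Because the spec maps names explicitly, the renaming step is not needed here, and group comes out as an integer rather than a character column.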

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html

answered Oct 13 '22 23:10

hplieninger
