gather on first two rows

I have some poorly formatted data that I must work with. It contains two identifiers in the first two rows, followed by the data. The data looks like:

     V1       V2       V3
1  Date 12/16/18 12/17/18
2 Equip        a        b
3    x1        1        2
4    x2        3        4
5    x3        5        6

I want to gather the data to make it tidy, but gathering only works when you have single column names. I've tried looking at spreading as well. The only solutions I've come up with are very hacky and don't feel right. Is there an elegant way to deal with this?

Here's what I want:

      Date Equip metric value
1 12/16/18     a     x1     1
2 12/16/18     a     x2     3
3 12/16/18     a     x3     5
4 12/17/18     b     x1     2
5 12/17/18     b     x2     4
6 12/17/18     b     x3     6

This approach gets me close, but I don't know how to deal with the poor formatting (no header, no row names). It should be easy to gather if the formatting was proper.

> as.data.frame(t(df))
         V1    V2 V3 V4 V5
V1     Date Equip x1 x2 x3
V2 12/16/18     a  1  3  5
V3 12/17/18     b  2  4  6

And here's the dput

structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c("12/16/18", 
"a", "1", "3", "5"), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c("V1", 
"V2", "V3"), row.names = c(NA, -5L))

What does gather() do in R?

gather() collects multiple columns and collapses them into key-value pairs. The names of the gathered columns go into a single key column, and their cell values go into a single value column. The values in the columns that are not gathered are repeated once for each key-value pair.
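As a minimal sketch of that behavior (the `wide` data frame and its column names are made up for illustration and are not from the question):

```r
library(tidyr)

# Hypothetical wide table: one column per metric
wide <- data.frame(
  id = c("a", "b"),
  x1 = c(1, 2),
  x2 = c(3, 4)
)

# Column names x1/x2 go into "metric", their cells go into "value";
# the id column is repeated for each key-value pair
long <- gather(wide, key = "metric", value = "value", x1, x2)
long
```

Each of the two input rows appears once per gathered column, so `long` has four rows.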

What is the opposite of gather in R?

The function spread() does the reverse of gather(). It takes two columns (a key and a value) and spreads them into multiple columns, one per distinct key.
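A small sketch of spreading, again on a made-up data frame (`long` and its columns are illustrative assumptions):

```r
library(tidyr)

# Hypothetical long table: one row per (id, metric) pair
long <- data.frame(
  id     = c("a", "a", "b", "b"),
  metric = c("x1", "x2", "x1", "x2"),
  value  = c(1, 3, 2, 4)
)

# Each distinct value of "metric" becomes its own column,
# filled from the matching cells of "value"
wide <- spread(long, key = metric, value = value)
wide
```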

What is the tidyr package in R?

'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

How are the gathering and spreading R functions related?

gather() does the reverse of spread(). gather() collects a set of column names and places them into a single "key" column. It also collects the cells of those columns and places them into a single value column. You can use gather() to tidy table4.
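To illustrate the round trip, here is a sketch that assumes the table4 mentioned above corresponds to the `table4a` dataset bundled with tidyr (one row per country, one column per year). The names `year` and `cases` are chosen here for illustration.

```r
library(tidyr)

# Gather the two year columns into year/cases pairs
long <- gather(table4a, key = "year", value = "cases", `1999`, `2000`)

# spread() undoes the gather, restoring one column per year
wide <- spread(long, key = year, value = cases)
```

After the round trip, `wide` has the same columns as `table4a`.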

Thanks for posting a nicely reproducible question. Here's some gentle tidyr/dplyr massaging.

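library(tidyr)  # also re-exports the %>% pipe

df %>%
    # stack V2 and V3 into long form, keeping V1 as the row label
    gather(key = measure, value = value, -V1) %>%
    # turn each V1 label (Date, Equip, x1, x2, x3) into its own column
    spread(key = V1, value = value) %>%
    # drop the helper column holding the old column names (V2, V3)
    dplyr::select(-measure) %>%
    # gather the metric columns back into metric/value pairs
    gather(key = metric, value = value, x1:x3) %>%
    dplyr::arrange(Date, Equip, metric)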
#>       Date Equip metric value
#> 1 12/16/18     a     x1     1
#> 2 12/16/18     a     x2     3
#> 3 12/16/18     a     x3     5
#> 4 12/17/18     b     x1     2
#> 5 12/17/18     b     x2     4
#> 6 12/17/18     b     x3     6

Updated for tidyr v1.0.0:

This is just a little bit cleaner syntax with the pivot functions.

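df %>%
  # long form: one row per (V1 label, original column)
  pivot_longer(cols = -V1) %>%
  # one column per V1 label: Date, Equip, x1, x2, x3
  pivot_wider(names_from = V1) %>%
  # stack the x* columns into metric/value pairs
  pivot_longer(cols = matches("x\\d"), names_to = "metric") %>%
  # drop the helper column of old column names (V2, V3)
  dplyr::select(-name)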
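One thing to watch: because every column in the original data is character, Date and value come out of either pipeline as character too. A possible follow-up, sketched here under the assumption that the dates are month/day/two-digit-year and the values are all numeric, is to convert them with `dplyr::mutate()`:

```r
library(tidyr)
library(dplyr)

# The question's data, rebuilt from its dput() output
df <- data.frame(
  V1 = c("Date", "Equip", "x1", "x2", "x3"),
  V2 = c("12/16/18", "a", "1", "3", "5"),
  V3 = c("12/17/18", "b", "2", "4", "6"),
  stringsAsFactors = FALSE
)

tidy <- df %>%
  pivot_longer(cols = -V1) %>%
  pivot_wider(names_from = V1) %>%
  pivot_longer(cols = matches("x\\d"), names_to = "metric") %>%
  select(-name) %>%
  # Assumption: dates are %m/%d/%y and every value parses as a number
  mutate(
    Date  = as.Date(Date, format = "%m/%d/%y"),
    value = as.numeric(value)
  )
```

If the real data mixes formats, the conversion step would need adjusting.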

Tags: r, tidyr, reshape2

Asked by Lloyd Christmas; answered by camille.
