Spread with data.frame/tibble with duplicate identifiers

Question

The documentation for tidyr suggests that gather and spread are complementary, so that one undoes the other, but the following example with the "iris" data shows that they are not, and it is not clear why. Any clarification would be greatly appreciated.

iris.df = as.data.frame(iris)
long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df" but received the following error instead:

"Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9..."

My general question is how to reverse an application of "gather" on this sort of dataset.

Amit Kohli · Accepted Answer

Hadley's intervention was unsurprisingly perfect, but I ended up tweaking the syntax a bit after that. For what it's worth, here is the fully working code (sorry, my syntax is a bit different from the one above):

library(tidyr)
library(dplyr)

wide <- 
  iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row) %>%
  spread(vars, val)

head(wide)
#   Species row Petal.Length Petal.Width Sepal.Length Sepal.Width
# 1  setosa   1          1.4         0.2          5.1         3.5
# 2  setosa   2          1.4         0.2          4.9         3.0
# 3  setosa   3          1.3         0.2          4.7         3.2
# 4  setosa   4          1.5         0.2          4.6         3.1
# 5  setosa   5          1.4         0.2          5.0         3.6
# 6  setosa   6          1.7         0.4          5.4         3.9

head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

They are the same; you just need to reorder the columns if you feel like it:

wide <- wide[, c(5, 6, 3, 4, 1)]  ## Reorder to match iris and drop the "row" column

And done.
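For anyone wondering why the row id matters, here is a minimal sketch (not part of the original answer) of what seems to be going on. Without an id, every Species/variable pair in the long data matches many rows, so spread has no way to decide which values belong together on one output row. The names long_no_id, dup_counts and round_trip are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Without a row id, each Species/variable pair is shared by many rows,
# which is what the "Duplicate identifiers" error is complaining about.
long_no_id <- iris %>% gather(vars, val, -Species)
dup_counts <- long_no_id %>% count(Species, vars)
range(dup_counts$n)  # number of rows sharing each Species/variable pair

# With a row id, every (row, Species, variable) combination is unique,
# so spread() can rebuild one output row per original row.
round_trip <- iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row) %>%
  spread(vars, val) %>%
  arrange(row)

# Put the columns back in the original order; this also drops "row".
round_trip <- round_trip[, names(iris)]

# Compare with the original data, ignoring attributes such as row names.
all.equal(round_trip, iris, check.attributes = FALSE)
```

Selecting by names(iris) instead of by position avoids having to count columns by hand. Note that in tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider; the same row-id idea should carry over to those functions.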

Tags:

r

tidyr

John D Lee
