Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spread with duplicate identifiers for rows [duplicate]

There has been questions on this topic before here, but I am still struggling with spreading this. I would like so each state to have its own column of temperatures values.

Here is a dput() of my data. I'll call it df

structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20", 
"2018-01-20", "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18", 
"2018-01-17", "2018-01-17", "2018-01-16", "2018-01-16", "2018-01-15", 
"2018-01-15", "2018-01-14", "2018-01-14", "2018-01-12", "2018-01-12", 
"2018-01-11", "2018-01-11", "2018-01-10", "2018-01-10", "2018-01-09", 
"2018-01-09", "2018-01-08", "2018-01-08", "2018-01-07", "2018-01-07", 
"2018-01-06", "2018-01-06", "2018-01-05", "2018-01-05", "2018-01-04", 
"2018-01-04", "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03", 
"2018-01-02", "2018-01-02"), tmin = c(24, 31, 31, 29, 44, 17, 
32, 7, 31, 7, 31, 6, 30, 13, 30, 1, 43, 20, 33, 52, 42, 29, 30, 
29, 26, 32, 33, -2, 29, 0, 23, 3, 19, 11, NA, -3, 22, -3, 24, 
-4), state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", 
"OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH")), class = "data.frame", row.names = c(NA, 
-40L), .Names = c("date", "tmin", "state"))

The code I run is

df %>% spread(state,tmin)

which I expected to give me the following format

date UT  OH
... ... ...

but I get the error message

Error: Duplicate identifiers for rows (36, 38), (35, 37)

I have tried a few different things. One thing I tried was grouping by date. I was thinking that rows of the same date were causing a problem for spread. I also tried making a new row with add_rownames() then using spread(state,tmin), but that also failed to solve the issue.

like image 511
Alex Avatar asked Jan 22 '18 14:01

Alex


1 Answers

In order for spread to work as intended, the resulting data frame must have uniquely identified rows and columns. In the case of your data, the "date" column is the only unique identifier after spreading. However, rows 36 and 38 are identical:

         date tmin state
36 2018-01-03   -3    OH
38 2018-01-03   -3    OH

This puts tidyr in the impossible position of trying to resolve two data points to the same row and column. In addition, rows 35 and 37 both have the same date and state, once again creating the impossible situation of placing two different values into the same position in the new data frame:

         date tmin state
35 2018-01-03   NA    UT
37 2018-01-03   22    UT

The following data cleanup will make spreading possible:

df %>% 
  filter(!is.na(tmin)) %>% # remove NA values
  unique %>% # remove duplicated rows
  spread(state, tmin)

         date OH UT
1  2018-01-02 -4 24
2  2018-01-03 -3 22
3  2018-01-04 11 19
4  2018-01-05  3 23
5  2018-01-06  0 29
...
like image 163
jdobres Avatar answered Oct 22 '22 20:10

jdobres