I am trying to spread() two key/value pairs, but the rows that share a common "order" value do not collapse into one. I think it may have to do with some earlier processing, or, more likely, I do not know the right way to spread two or more key/value pairs to get the result I expect.
I'm starting with this data set:
library(tidyverse)
df <- tibble(order = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))
There are two pre-spread steps that define the order of the "count" values created by the gather() call below. The first step records the original position (sector) of each "count" value within its order, using row_number():
ntrl <- df %>%
  gather(line_1, line_2, line_3,
         key = "sector",
         value = "count") %>%
  group_by(order) %>%
  mutate(sector_ord = row_number()) %>%
  arrange(order, sector)
The second pre-spread step sorts the "count" values within each order and labels them ord_1, ord_2, ord_3 from smallest to largest:
ord <- ntrl %>%
  arrange(order, count) %>%
  group_by(order) %>%
  # paste0() never inserts a separator, so it takes no sep argument
  mutate(num_ord = paste0("ord_", row_number()))
And then finally the spread code that I have been using:
wide <- ord %>%
  group_by(order) %>%
  spread(key = sector, value = count) %>%
  spread(key = num_ord, value = sector_ord)
What I'm getting is this:
order line_1 line_2 line_3 ord_1 ord_2 ord_3
1 1 23 NA NA 1 NA NA
2 1 NA 63 NA NA NA 2
3 1 NA NA 62 NA 3 NA
4 2 8 NA NA 1 NA NA
5 2 NA 25 NA NA NA 2
6 2 NA NA 12 NA 3 NA
7 3 21 NA NA NA 1 NA
8 3 NA 25 NA NA NA 2
9 3 NA NA 10 3 NA NA
... and so on, through 21 rows covering all 7 "order" values
What I expect is for all rows with the same "order" value to collapse into a single row, giving the following:
order line_1 line_2 line_3 ord_1 ord_2 ord_3
1 1 23 63 62 1 3 2
2 2 8 25 12 1 3 2
3 3 21 25 10 2 3 1
4 4 45 24 56 2 1 3
... and so on, I think that paints the picture
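The ord_* columns in this expected output are the rank of each line's count within its order: for order 3 (counts 21, 25, 10) they read 2, 3, 1. The num_ord step above labels the sectors the other way round, by which sector holds the smallest, middle and largest count, which for order 3 gives 3, 1, 2 (as in the output above). A minimal base-R sketch of the difference, using only the order 3 values:

```r
# Counts for order 3, listed in sector order (line_1, line_2, line_3)
counts <- c(21, 25, 10)

# rank(): the rank of each sector's count within the order.
# This matches the expected ord_1..ord_3 for order 3.
rank(counts)   # 2 3 1

# order(): which sector holds the smallest, middle and largest count.
# This is what spreading sector_ord by num_ord produces.
order(counts)  # 3 1 2
```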
I have reviewed the questions and answers about spreading with duplicate identifiers and the use of the index of row numbers but that does not help.
I suspect the double spreading is the problem, but I cannot figure out how to fix it.
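A minimal sketch of why the double spread does not collapse. spread() treats every column other than key and value as a row identifier. After the first spread(), sector_ord and num_ord are still present and differ on every row, so all 21 rows stay distinct; the second spread() then keys on the line_* columns, which differ as well. Spreading each pair separately with only order as the identifier, then joining on order, is one way around this; the names values, positions and wide are illustrative, not from the question.

```r
library(tidyverse)

df <- tibble(order  = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))

# Both pre-spread steps from the question, condensed into one pipeline
ord <- df %>%
  gather(line_1, line_2, line_3, key = "sector", value = "count") %>%
  group_by(order) %>%
  mutate(sector_ord = row_number()) %>%      # position of the sector
  arrange(order, count) %>%
  mutate(num_ord = paste0("ord_", row_number())) %>%  # sorted position
  ungroup()

# sector_ord and num_ord remain as identifiers, so nothing collapses
step1 <- ord %>% spread(key = sector, value = count)
nrow(step1)  # 21

# Spread each key/value pair with only `order` as the identifier ...
values <- ord %>%
  select(order, sector, count) %>%
  spread(sector, count)
positions <- ord %>%
  select(order, num_ord, sector_ord) %>%
  spread(num_ord, sector_ord)

# ... then join the two wide tables: one row per order
wide <- left_join(values, positions, by = "order")
wide
```

The ord_* values in this sketch follow the num_ord definition (sector of the k-th smallest count), not the ranks shown in the expected output.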
Thanks for your help.
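A solution using tidyverse, starting from your df. After the two spread() calls each order still has three rows, but every column holds exactly one non-NA value per order. The key is to use summarise_all(funs(.[which(!is.na(.))])) to keep that single non-NA value in each column.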
library(tidyverse)
df2 <- df %>%
  gather(Lines, Value, -order) %>%
  group_by(order) %>%
  mutate(Rank = dense_rank(Value),
         RankOrder = paste0("ord_", row_number())) %>%
  spread(Lines, Value) %>%
  spread(RankOrder, Rank) %>%
  summarise_all(funs(.[which(!is.na(.))]))
df2
# A tibble: 7 x 7
order line_1 line_2 line_3 ord_1 ord_2 ord_3
<int> <dbl> <dbl> <dbl> <int> <int> <int>
1 1 23 63 62 1 3 2
2 2 8 25 12 1 3 2
3 3 21 25 10 2 3 1
4 4 45 24 56 2 1 3
5 5 68 48 67 3 1 2
6 6 31 24 25 3 1 2
7 7 24 63 35 1 3 2
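funs() has been deprecated since dplyr 0.8.0. A sketch of the same pipeline with only the last step rewritten using across() (dplyr >= 1.0.0), assuming a current tidyverse; group_by(order) is repeated before summarise() only to make the grouping explicit:

```r
library(tidyverse)

df <- tibble(order  = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))

df2 <- df %>%
  gather(Lines, Value, -order) %>%
  group_by(order) %>%
  mutate(Rank = dense_rank(Value),                    # rank of each line
         RankOrder = paste0("ord_", row_number())) %>% # line_k -> ord_k
  spread(Lines, Value) %>%
  spread(RankOrder, Rank) %>%
  group_by(order) %>%
  # Each non-grouping column has exactly one non-NA value per order;
  # across() replaces the deprecated funs() helper
  summarise(across(everything(), ~ .x[!is.na(.x)]))

df2
```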
Starting from df:
df %>%
  gather(headers, line, -order) %>%
  separate(headers, into = c('dummy', 'rn')) %>%
  select(-dummy) %>%
  group_by(order) %>%
  mutate(ord = rank(line, ties.method = 'first')) %>%
  ungroup() %>%
  # setDT() must be namespace-qualified unless data.table is attached
  {data.table::dcast(data.table::setDT(.), order ~ rn, value.var = c("line", "ord"))}
# order line_1 line_2 line_3 ord_1 ord_2 ord_3
#1: 1 23 63 62 1 3 2
#2: 2 8 25 12 1 3 2
#3: 3 21 25 10 2 3 1
#4: 4 45 24 56 2 1 3
#5: 5 68 48 67 3 1 2
#6: 6 31 24 25 3 1 2
#7: 7 24 63 35 1 3 2
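If you would rather avoid the data.table dependency, tidyr's pivot_longer() and pivot_wider() (tidyr >= 1.0.0) can do the same reshaping, since pivot_wider() accepts several values_from columns. This is a swapped-in sketch, not the answer's method:

```r
library(tidyverse)

df <- tibble(order  = 1:7,
             line_1 = c(23, 8, 21, 45, 68, 31, 24),
             line_2 = c(63, 25, 25, 24, 48, 24, 63),
             line_3 = c(62, 12, 10, 56, 67, 25, 35))

wide <- df %>%
  # "line_1" -> "1" etc., so rn plays the same role as in the dcast() answer
  pivot_longer(-order, names_to = "rn", names_prefix = "line_",
               values_to = "line") %>%
  group_by(order) %>%
  mutate(ord = rank(line, ties.method = "first")) %>%
  ungroup() %>%
  # Two value columns give line_1..line_3 and ord_1..ord_3
  pivot_wider(names_from = rn, values_from = c(line, ord))

wide
```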