I can't find an exact answer to this problem, so I hope I'm not duplicating a question.
I have a dataframe as follows
groupid col1 col2 col3 col4
1 0 n NA 2
1 NA NA 2 2
What I'm trying to convey with this is that there are duplicate IDs where the total information is spread across both rows and I want to combine these rows to get all the information into one row. How do I go about this?
I've tried playing around with group_by and paste, but that makes the data messier (for example, I get 22 instead of 2 in col4). sum() does not work either: some columns are strings, and the ones that are not are categorical variables, so summing them would change the information.
Is there something I can do to collapse the rows and leave consistent data unchanged while filling in NAs?
EDIT:
Sorry, the desired output is as follows:
groupid col1 col2 col3 col4
1 0 n 2 2
Is this what you want? It uses zoo + dplyr (also check the link here).
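library(dplyr)
library(zoo)
df %>%
group_by(groupid) %>%
# carry the last non-NA value forward within each group
mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
# keep only the last row of each group
filter(row_number() == n())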
# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2
EDIT1
Without the filter, this gives back the whole data frame.
df %>%
group_by(groupid) %>%
mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))
# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2
The filter here just slices the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.
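To see why the last row ends up complete, here is a small sketch of na.locf on two plain vectors that mimic col1 and col3 from the question. The vector names are made up for illustration only.

```r
library(zoo)

# Hypothetical vectors mirroring col1 and col3 of the example group
col1_like <- c(0, NA)
col3_like <- c(NA, 2)

# na.rm = FALSE keeps leading NAs instead of dropping them,
# so the result has the same length as the input
filled1 <- na.locf(col1_like, na.rm = FALSE)  # 0 0
filled3 <- na.locf(col3_like, na.rm = FALSE)  # NA 2

print(filled1)
print(filled3)
```

In both cases the last element is non-NA, which is why keeping only the last row per group works. Note that this picks the most recent non-NA value in row order, so the result depends on how the rows are sorted.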
Also, based on @thelatemail's recommendation, you can do the following, which gives back the same answer.
df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))
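In more recent dplyr versions (1.0.0 and later), the scoped verbs such as summarise_all and the funs() helper are superseded by across(). A minimal sketch of the same "first non-NA value per group" idea, assuming a current dplyr is installed; first_non_na and collapsed are names invented here for illustration:

```r
library(dplyr)

df <- read.table(text = "groupid col1 col2 col3 col4
1 0 n NA 2
1 NA NA 2 2", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: first non-missing value of a vector
# (returns NA if every value is missing)
first_non_na <- function(x) x[!is.na(x)][1]

collapsed <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

print(collapsed)
```

Unlike the na.locf approach, this takes the first non-NA value rather than the last one, so the two only agree when the rows of a group do not conflict.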
EDIT2
Assuming you have conflicting values and you want to show them all:
df <- read.table(text="groupid col1 col2 col3 col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
groupid col1 col2 col3 col4
1 1 0 n NA 2
2 1 1 <NA> 2 2
df %>%
group_by(groupid) %>%
summarise_all(funs(toString(unique(na.omit(.)))))#unique for duplicated like col4
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2
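If you would rather find out where the conflicts are before collapsing, one option is to count the distinct non-NA values per group and column. This is only a sketch, assuming dplyr 1.0.0 or later; the name conflicts is made up for illustration.

```r
library(dplyr)

df <- read.table(text = "groupid col1 col2 col3 col4
1 0 n NA 2
1 1 NA 2 2", header = TRUE, stringsAsFactors = FALSE)

# Number of distinct non-NA values per group and column;
# anything greater than 1 marks a conflict
conflicts <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)),
            .groups = "drop")

print(conflicts)
```

Here only col1 has more than one distinct value (0 and 1), which matches the "0, 1" shown in the toString output above.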