I have a data frame that looks something like this:
participant | Sex | Age | interval | reproduction | condition |
---|---|---|---|---|---|
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.536131 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.416826 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.549845 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.542681 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.265929 | NA | NA |
22014 | Female | 18 | NA | 1.2531 | NA |
22014 | Female | 18 | NA | 1.2507 | NA |
22014 | Female | 18 | NA | 1.7841 | NA |
22014 | Female | 18 | NA | 1.3536 | NA |
22014 | Female | 18 | NA | 0.8031 | NA |
22014 | Female | 18 | NA | NA | Non-Causal |
etc.
I need to do 3 things:
'backfill' the values in 'condition' upwards so that every cell in 'condition' upwards from a valid entry (here Non-Causal) is filled with that valid entry.
match the 5 entries in 'reproduction' with the 5 entries in 'interval' in corresponding order, i.e. so that 1.2531 is moved up to be next to 1.536131, and 1.2507 with 1.416826 etc
get rid of the NA rows so that in the end there are only 5 rows left, with valid entries in each of the columns
Any hints on how to tackle this? The actual dataframe is much longer, and 'condition' takes on different values; there will always be 5 entries, though ,per condition, and they should have matched interval and reproduction entries
You can group and summarize:
library(dplyr)
dat %>%
group_by(participant, Sex, Age) %>%
summarize(across(c(interval, reproduction, condition), ~ .[!is.na(.)])) %>%
ungroup()
# # A tibble: 5 x 6
# participant Sex Age interval reproduction condition
# <int> <chr> <int> <dbl> <dbl> <chr>
# 1 22014 Female 18 1.54 1.25 Non-Causal
# 2 22014 Female 18 1.42 1.25 Non-Causal
# 3 22014 Female 18 1.55 1.78 Non-Causal
# 4 22014 Female 18 1.54 1.35 Non-Causal
# 5 22014 Female 18 1.27 0.803 Non-Causal
(This will glitch if the number of non-NA
in condition
is other than 1
, or if the number of non-NA
in the other columns is not the same.)
You can so most of the work with dplyr
and tidyr
. For example if your data is in a data.frame named dd
,
library(dplyr)
library(tidyr)
dd %>%
group_by(participant, Sex, Age) %>%
fill(condition, .direction="up") %>%
summarize(across(everything(), ~head(na.omit(.x), 5)))
We use tidyr::fill
to back fill the condition, then use use dplyr::summarize()
to keep only the first 5 non-NA for all the columns that are not use for grouping the rows.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With