
How can I match up rows in an R data frame?

Tags: dataframe, r, dplyr

I have a data frame that looks something like this:

participant Sex    Age interval reproduction condition
22014       Female 18  NA       NA           NA
22014       Female 18  1.536131 NA           NA
22014       Female 18  NA       NA           NA
22014       Female 18  1.416826 NA           NA
22014       Female 18  NA       NA           NA
22014       Female 18  1.549845 NA           NA
22014       Female 18  NA       NA           NA
22014       Female 18  1.542681 NA           NA
22014       Female 18  NA       NA           NA
22014       Female 18  1.265929 NA           NA
22014       Female 18  NA       1.2531       NA
22014       Female 18  NA       1.2507       NA
22014       Female 18  NA       1.7841       NA
22014       Female 18  NA       1.3536       NA
22014       Female 18  NA       0.8031       NA
22014       Female 18  NA       NA           Non-Causal

etc.

I need to do 3 things:

  1. 'backfill' the values in 'condition' upwards, so that every cell above a valid entry (here Non-Causal) is filled with that entry.

  2. match the 5 entries in 'reproduction' with the 5 entries in 'interval' in corresponding order, i.e. so that 1.2531 is moved up next to 1.536131, 1.2507 next to 1.416826, etc.

  3. get rid of the NA rows, so that in the end only 5 rows are left, with valid entries in every column.

Any hints on how to tackle this? The actual data frame is much longer, and 'condition' takes on different values. There will always be 5 entries per condition, though, and each condition should end up with matched interval and reproduction entries.

Marc Buehner asked Dec 17 '22 11:12


2 Answers

You can group and summarize:

library(dplyr)
dat %>%
  group_by(participant, Sex, Age) %>%
  summarize(across(c(interval, reproduction, condition), ~ .[!is.na(.)])) %>%
  ungroup()
# # A tibble: 5 x 6
#   participant Sex      Age interval reproduction condition 
#         <int> <chr>  <int>    <dbl>        <dbl> <chr>     
# 1       22014 Female    18     1.54        1.25  Non-Causal
# 2       22014 Female    18     1.42        1.25  Non-Causal
# 3       22014 Female    18     1.55        1.78  Non-Causal
# 4       22014 Female    18     1.54        1.35  Non-Causal
# 5       22014 Female    18     1.27        0.803 Non-Causal

(This will fail if the number of non-NA values in condition within a group is other than 1, or if interval and reproduction do not have the same number of non-NA values.)
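For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample from the question and applies the same idea. It assumes dplyr 1.1.0 or later, where summarize() returning more than one row per group is deprecated and reframe() is the suggested replacement; swapping in reframe() is my substitution, not part of the answer as originally written. The object names dat and res are just placeholders.

```r
library(dplyr)

# Sample data rebuilt from the question (one participant, one condition).
dat <- data.frame(
  participant  = 22014L,
  Sex          = "Female",
  Age          = 18L,
  interval     = c(NA, 1.536131, NA, 1.416826, NA, 1.549845, NA, 1.542681,
                   NA, 1.265929, rep(NA, 6)),
  reproduction = c(rep(NA, 10), 1.2531, 1.2507, 1.7841, 1.3536, 0.8031, NA),
  condition    = c(rep(NA, 15), "Non-Causal")
)

# Assumption: dplyr >= 1.1.0, so reframe() is available.
# Drop the NAs column by column; the single non-NA condition is recycled
# to the length of the other two columns within each group.
res <- dat %>%
  group_by(participant, Sex, Age) %>%
  reframe(across(c(interval, reproduction, condition), ~ .x[!is.na(.x)]))

print(res)
```

reframe() always returns an ungrouped result, so the trailing ungroup() is not needed in this variant.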

r2evans answered Jan 15 '23 14:01


You can do most of the work with dplyr and tidyr. For example, if your data is in a data.frame named dd:

library(dplyr)
library(tidyr)
dd %>% 
  group_by(participant, Sex, Age) %>% 
  fill(condition, .direction="up") %>% 
  summarize(across(everything(), ~head(na.omit(.x), 5)))

We use tidyr::fill() to backfill condition, then use dplyr::summarize() to keep only the first 5 non-NA values in each column that is not used for grouping the rows.
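Since the question mentions that the real data holds several conditions per participant, here is a hedged sketch of how the same pipeline might be extended: after the backfill, condition is added as a grouping variable so that each condition keeps its own 5 matched rows. The helper make_block(), the second condition label "Hypothetical-B" and its numbers are all made up for illustration, and reframe() again assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: builds one 16-row block laid out like the sample
# in the question (intervals on every second row, then the reproductions,
# then a final row carrying the condition label).
make_block <- function(intervals, reproductions, condition) {
  data.frame(
    participant  = 22014L,
    Sex          = "Female",
    Age          = 18L,
    interval     = c(as.vector(rbind(NA, intervals)), rep(NA, 6)),
    reproduction = c(rep(NA, 10), reproductions, NA),
    condition    = c(rep(NA, 15), condition)
  )
}

dd <- rbind(
  make_block(c(1.536131, 1.416826, 1.549845, 1.542681, 1.265929),
             c(1.2531, 1.2507, 1.7841, 1.3536, 0.8031), "Non-Causal"),
  # Made-up values for a second condition, purely for illustration.
  make_block(c(1.1, 1.2, 1.3, 1.4, 1.5),
             c(1.0, 1.1, 1.2, 1.3, 1.4), "Hypothetical-B")
)

out <- dd %>%
  group_by(participant, Sex, Age) %>%
  fill(condition, .direction = "up") %>%       # backfill within participant
  group_by(condition, .add = TRUE) %>%         # then split by condition too
  reframe(across(c(interval, reproduction), ~ head(na.omit(.x), 5)))

print(out)
```

This relies on each condition label sitting on the last row of its block, as in the sample; if the real data is laid out differently, the fill direction would need adjusting.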

MrFlick answered Jan 15 '23 14:01