Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivoting/joining/expanding dataframes based on condition in R

I am struggling with the following issue, respectively I am able to make this work with looping over each row of liveDataset and call custom function on the go - but me real dataset is quite huge so I need to do it in "more R way" to avoid looping.

I have tried multiple approaches founded here or somewhere else, still was not able to do what I need.

Here is the data

liveDataset <- 
          data.frame(
            Value1 = 
              c(
                100, 110, 115, 108, 90, 95
              ),
            Value2 = 
              c(
                105, 108, 118, 103, 94, 104
              ),
            PredValues = 
              c(
                0.7, 1.6, 1.3, 0.1, 0.4, -0.6
              )
          )

controlDataset <- 
          data.frame(
            Value1 = 
              c(
                98, 111, 114, 109, 85, 95
              ),
            Value2 = 
              c(
                100, 108, 115, 103, 90, 104
              ),
            PredValues = 
              c(
                1, 2, 1, 0.5, -2, -0.3
              )
          )


desiredOutput <- 
          data.frame(
            GroupId =
              c(
                1, 1, 1, 1,
                2, 2, 2, 2, 2,
                3, 3, 3, 3, 3, 3, 3,
                4, 4, 4,
                5, 5,
                6, 6
              ),
            Value1 = 
              c(
                100, 98, 85, 95,
                110, 98, 109, 85, 95,
                115, 98, 111, 114, 109, 85, 95,
                108, 98, 85, 
                90, 85,
                95, 85
                
              ),
            Value2 = 
              c(
                100, 103, 90, 104,
                108, 100, 103, 90, 104,
                118, 100, 108, 115, 103, 90, 104,
                103, 100, 90, 
                94, 90,
                104, 90 
                
              ),
            PredValues =
              c(
                0.7, 1, -2, -0.3,
                1.6, 1, 0.5, -2, -0.3,
                1.3, 1, 2, 1, 0.5, -2, -0.3,
                0.1, 1, -2,
                0.4, -2,
                -0.6, -2
                
              )
          )

This is just handmade example and there is my explanation/description of the issue.

I am working with liveDataset and for each row I need to subset controlDataset and add this subset bellow the liveDataset.

The condition applied in this handmade example is like

controlDataset %>%
   filter(
     Value1 < liveDataset$Value1 & Value2 < liveDataset$Value2
     )

What am I struggling with all approaches I have tested is that does not work for each row separately (rowwise didnt help or dunno how to use it properly).

Another issue is with desired GroupId because I really cannot imagine now how to handle this without looping.

Any ideas?

PS: I need to have the original row from liveDataset as first within the group (or give it another tag/id to be able define what was originally in liveDataset and what was not).

Thank you in advance, any help really appreciated.

Otto

like image 479
Bury Avatar asked Apr 14 '26 19:04

Bury


1 Answers

library(tidyverse)


map_dfr(1:nrow(liveDataset), \(x) bind_rows(liveDataset |> slice(x),
                                            controlDataset |> filter(Value1 < liveDataset$Value1[x] & Value2 < liveDataset$Value2[x])) |> 
          mutate(GroupId = x, .before = 1))

   GroupId Value1 Value2 PredValues
1        1    100    105        0.7
2        1     98    100        1.0
3        1     85     90       -2.0
4        1     95    104       -0.3
5        2    110    108        1.6
6        2     98    100        1.0
7        2    109    103        0.5
8        2     85     90       -2.0
9        2     95    104       -0.3
10       3    115    118        1.3
11       3     98    100        1.0
12       3    111    108        2.0
13       3    114    115        1.0
14       3    109    103        0.5
15       3     85     90       -2.0
16       3     95    104       -0.3
17       4    108    103        0.1
18       4     98    100        1.0
19       4     85     90       -2.0
20       5     90     94        0.4
21       5     85     90       -2.0
22       6     95    104       -0.6
23       6     85     90       -2.0

I return one less row that you did in your desired output, the last one for GroupId 4, but I am not sure that one should be included since 104>103?

like image 142
Lennyy Avatar answered Apr 16 '26 09:04

Lennyy