Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R data.table: (dynamic) forward looking Cross-Joins

I was wondering if there is an option for the CJ() method in data.table to take vectors formed by an evaluated condition instead of running the full cross join.

Data

library(data.table)
df<-data.table(
  ID=c(18L,  18L,  18L,  46L,  74L,  74L, 165L, 165L), 
  cat=c(1300L, 1320L, 1325L, 1300L, 1300L, 1325L, 1300L, 1325L),
  low=c(24.625, 16.250, 14.500, 43.625, 58.250, 45.375, 90.750, 77.875),
  high=c(26.625, 17.500, 15.500, 45.625, 60.000, 47.375, 92.750, 79.875)
  )

df
    ID  cat    low   high
1:  18 1300 24.625 26.625
2:  18 1320 16.250 17.500
3:  18 1325 14.500 15.500
4:  46 1300 43.625 45.625
5:  74 1300 58.250 60.000
6:  74 1325 45.375 47.375
7: 165 1300 90.750 92.750
8: 165 1325 77.875 79.875

Here, I have a total of 8 observations of 4 different items (IDs 18, 46, 74 and 165). Each item is recorded in several categories (cat 1300, 1320, 1325) and two measurements are taken (low and high).

Desired Output

I now want to create a table that for each item (ID) joins the low value of each category (cat) with all high values of the categories that are larger via cross join. My desired output thus looks like

    ID  cat  cat_large    low   high
1:  18 1300      1320  24.625 17.500
2:  18 1300      1325  24.625 15.500
3:  18 1320      1325  16.250 15.500
4:  74 1300      1325  58.250 47.375
5: 165 1300      1325  90.750 79.875

where I added cat_high to indicate which two categories are being joined in low/high.

Unfortunately, I cannot find the correct way to amend my full cross join, df[,CJ(low=low,high=high),by=.(ID)], to behave like this. I am grateful for any help/hints.

like image 646
Daedalus Avatar asked Dec 11 '22 09:12

Daedalus


2 Answers

I think you can do this via the .EACHI in a non-equi join. You can then use the i. prefix to select which table to take the output variable from:

df[, c(.SD,.(larger_cat=cat))][
  df, on=.(ID==ID, cat > cat), .(larger_cat, low=i.low, high), by=.EACHI, nomatch=0
]

#    ID  cat larger_cat    low   high
#1:  18 1300       1320 24.625 17.500
#2:  18 1300       1325 24.625 15.500
#3:  18 1320       1325 16.250 15.500
#4:  74 1300       1325 58.250 47.375
#5: 165 1300       1325 90.750 79.875
like image 159
thelatemail Avatar answered Dec 28 '22 22:12

thelatemail


One way:

df[, c(
  CJ(cat = cat, lcat = cat, sorted = FALSE),
  CJ(low = low, high = high, sorted = FALSE)  
), by=ID][lcat > cat]

    ID  cat lcat    low   high
1:  18 1300 1320 24.625 17.500
2:  18 1300 1325 24.625 15.500
3:  18 1320 1325 16.250 15.500
4:  74 1300 1325 58.250 47.375
5: 165 1300 1325 90.750 79.875
like image 26
Frank Avatar answered Dec 28 '22 23:12

Frank