I was wondering if there is an option for the CJ() method in data.table to take vectors formed by an evaluated condition instead of running the full cross join.
Data
library(data.table)
df<-data.table(
ID=c(18L, 18L, 18L, 46L, 74L, 74L, 165L, 165L),
cat=c(1300L, 1320L, 1325L, 1300L, 1300L, 1325L, 1300L, 1325L),
low=c(24.625, 16.250, 14.500, 43.625, 58.250, 45.375, 90.750, 77.875),
high=c(26.625, 17.500, 15.500, 45.625, 60.000, 47.375, 92.750, 79.875)
)
df
ID cat low high
1: 18 1300 24.625 26.625
2: 18 1320 16.250 17.500
3: 18 1325 14.500 15.500
4: 46 1300 43.625 45.625
5: 74 1300 58.250 60.000
6: 74 1325 45.375 47.375
7: 165 1300 90.750 92.750
8: 165 1325 77.875 79.875
Here, I have a total of 8 observations of 4 different items (IDs 18, 46, 74 and 165). Each item is recorded in several categories (cat 1300, 1320, 1325) and two measurements are taken (low and high).
Desired Output
I now want to create a table that for each item (ID) joins the low value of each category (cat) with all high values of the categories that are larger via cross join. My desired output thus looks like
ID cat cat_large low high
1: 18 1300 1320 24.625 17.500
2: 18 1300 1325 24.625 15.500
3: 18 1320 1325 16.250 15.500
4: 74 1300 1325 58.250 47.375
5: 165 1300 1325 90.750 79.875
where I added cat_high to indicate which two categories are being joined in low/high.
Unfortunately, I cannot find the correct way to amend my full cross join, df[,CJ(low=low,high=high),by=.(ID)]
, to behave like this. I am grateful for any help/hints.
I think you can do this via the .EACHI
in a non-equi join. You can then use the i.
prefix to select which table to take the output variable from:
df[, c(.SD,.(larger_cat=cat))][
df, on=.(ID==ID, cat > cat), .(larger_cat, low=i.low, high), by=.EACHI, nomatch=0
]
# ID cat larger_cat low high
#1: 18 1300 1320 24.625 17.500
#2: 18 1300 1325 24.625 15.500
#3: 18 1320 1325 16.250 15.500
#4: 74 1300 1325 58.250 47.375
#5: 165 1300 1325 90.750 79.875
One way:
df[, c(
CJ(cat = cat, lcat = cat, sorted = FALSE),
CJ(low = low, high = high, sorted = FALSE)
), by=ID][lcat > cat]
ID cat lcat low high
1: 18 1300 1320 24.625 17.500
2: 18 1300 1325 24.625 15.500
3: 18 1320 1325 16.250 15.500
4: 74 1300 1325 58.250 47.375
5: 165 1300 1325 90.750 79.875
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With