I have a data.table DT
with the current (F0YR
) and the next (F1YR
) fiscal year-end (FYE) encoded as integers. Since every next FYE will eventually become
a current FYE, the integer will be both in the column F1YR
and F0YR
. Also, my data contains monthly observations so the same FYE will be in the data set
multiple times:
library(data.table)
DT <- data.table(ID = rep(c("A", "B"), each=9),
MONTH = rep(100L:108L, times=2),
F0YR = rep(c(1L, 4L, 7L), each=3, times=2),
F1YR = rep(c(4L, 7L, 9L), each=3, times=2),
value = c(rep(1:5, each=3), 6, 6, 7),
key = "ID,F0YR")
DT
ID MONTH F0YR F1YR value
[1,] A 100 1 4 1
[2,] A 101 1 4 1
[3,] A 102 1 4 1
[4,] A 103 4 7 2
[5,] A 104 4 7 2
[6,] A 105 4 7 2
[7,] A 106 7 9 3
[8,] A 107 7 9 3
[9,] A 108 7 9 3
[10,] B 100 1 4 4
[11,] B 101 1 4 4
...
For every ID
and F1YR
combination, I want to get the value for the ID
and F0YR
combination. As an example: Company A had a value of 2
for FOYR==4
. Now,
I want an additional column for all combinations with ID=="A"
and F1YR==4
which is set to 2, next to the already existent value of 1.
intDT <- DT[CJ(unique(ID), unique(F0YR)), list(ID, F0YR, valueNew = value), mult="last"]
setkey(intDT, ID, F0YR)
setkey(DT, ID, F1YR)
DT <- intDT[DT]
setnames(DT, c("F0YR.1", "F0YR"), c("F0YR", "F1YR"))
DT
ID F1YR valueNew MONTH F0YR value
[1,] A 4 2 100 1 1
[2,] A 4 2 101 1 1
[3,] A 4 2 102 1 1
[4,] A 7 3 103 4 2
[5,] A 7 3 104 4 2
[6,] A 7 3 105 4 2
[7,] A 9 NA 106 7 3
[8,] A 9 NA 107 7 3
[9,] A 9 NA 108 7 3
[10,] B 4 5 100 1 4
[11,] B 4 5 101 1 4
...
(Note that I use mult="last"
here because, although the values should only change with F0YR or F1YR changes, sometimes they don't and this is just my
tie breaker).
This looks improvable. First of all, I have to make a copy of my DT. Second, since I join basically the same data.table
, all the column names have the same name
and I have to rename them. I thought that a self join
would be the way forward, but I tried and tried and couldn't get a nice solution. I have the hope
that there is something easy out there which I just don't see...Does anyone have a clue? Or is my data set up in such a way that it is actually hard
(maybe because I have monthly observations, but want to join only quarterly or yearly changing values).
Merge cellsClick the first cell and press Shift while you click the last cell in the range you want to merge. Important: Make sure only one of the cells in the range has data. Click Home > Merge & Center.
In use cases like this, the mantra "aggregate first, then join with that" often helps. So, starting with your DT
, and using v1.8.1 :
> agg = DT[,last(value),by=list(ID,F0YR)]
> agg
ID F0YR V1
1: A 1 1
2: A 4 2
3: A 7 3
4: B 1 4
5: B 4 5
6: B 7 7
I called it agg
because I couldn't think of a better name. In this case you wanted last
which isn't really an aggregate as such, but you know what I mean.
Then update DT
by reference by group. Here we're grouping by i
.
setkey(DT,ID,F1YR)
DT[agg,newcol:=V1]
ID MONTH F0YR F1YR value newcol
1: A 100 1 4 1 2
2: A 101 1 4 1 2
3: A 102 1 4 1 2
4: A 103 4 7 2 3
5: A 104 4 7 2 3
6: A 105 4 7 2 3
7: A 106 7 9 3 NA
8: A 107 7 9 3 NA
9: A 108 7 9 3 NA
10: B 100 1 4 4 5
11: B 101 1 4 4 5
12: B 102 1 4 4 5
13: B 103 4 7 5 7
14: B 104 4 7 5 7
15: B 105 4 7 5 7
16: B 106 7 9 6 NA
17: B 107 7 9 6 NA
18: B 108 7 9 7 NA
Is that right? Not sure I fully followed. Those ops should be very fast, without any copies, and should scale to large data. At least, that's the intention.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With