Say I have these two data frames:
big.table <- data.frame("idx" = 1:100)
small.table <- data.frame("idx" = sample(1:100, 10), "color" = sample(colors(),10))
I want to merge them together like this:
merge(small.table, big.table, by = "idx", all.y=TRUE)
idx color
1 1 <NA>
2 2 <NA>
3 3 salmon2
4 4 <NA>
5 5 <NA>
6 6 <NA>
...
20 20 <NA>
21 21 <NA>
22 22 blue4
23 23 grey99
24 24 <NA>
25 25 <NA>
26 26 <NA>
...
Now I need to fill the 'color' column down the table, so that each NA is replaced by the most recent non-NA value above it.
NOTES: The problem involves a log file generated by a computer program, not in any standard log format. Blocks of lines in this log file belong to a 'process' that is identified in the first line of the block. I've pulled out the information in the relevant lines of the log file, most of which belong to a process, and created a data table containing that information (the line number, time stamp, etc.). Now I need to fill into this table the 'process' name that corresponds to each line, taken from a small.table that records the line number where each process starts.
There might not be a 'process' (color in the example above) for the lines at the top of the big.table. Those lines should remain NA.
Once the first 'process' starts, every line between that process start line and the next belongs to the first process. When the second process starts, every line between that process start line and the next process start line belongs to the second process. And so on. A process start line never has the same line number as any of the other lines that I've collected into my log file data frame.
My plan is to create the big.table as a sequence of all log line numbers and merge the small table to it. Then I can "fill down" the process name and merge the big table back onto the log file data frame, keeping only the rows of the log file with everything joined to them.
I'm open to other approaches.
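It sounds like you need na.locf from the package zoo (the name stands for "last observation carried forward"):
library(zoo)
# Keep every idx from big.table; rows without a match in small.table get NA
tbl <- merge(small.table, big.table, by = "idx", all.y = TRUE)
# Carry the last non-NA color forward; na.rm = FALSE keeps the leading NAs
tbl$color2 <- na.locf(tbl$color, na.rm = FALSE)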
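If you would rather not depend on zoo, the same "carry forward" idea can be sketched in base R. This is only an illustration: fill_down is a made-up helper name, not a function from any package, and it does nothing more than replace each NA with the most recent non-NA value above it.

```r
# Hypothetical helper (not part of zoo or base R): last observation carried forward.
fill_down <- function(x) {
  seen <- !is.na(x)
  vals <- x[seen]            # the non-NA values, in order
  idx  <- cumsum(seen)       # position in vals of the latest non-NA value so far
  idx[idx == 0] <- NA        # nothing seen yet, so leading NAs stay NA
  vals[idx]
}

x <- c(NA, "salmon2", NA, NA, "blue4", NA)
fill_down(x)
# [1] NA        "salmon2" "salmon2" "salmon2" "blue4"   "blue4"
```

With the merged table from above, this would be used as tbl$color2 <- fill_down(tbl$color).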
A data.table solution:
library(data.table)
b <- data.table(big.table, key = "idx")
s <- data.table(small.table, key = "idx")
# Rolling join: for each idx in b, take the row of s with the nearest idx at or before it
s[b, roll = TRUE]
# idx color
# 1: 1 NA
# 2: 2 NA
# 3: 3 NA
# 4: 4 blue3
# 5: 5 blue3
# 6: 6 blue3
# 7: 7 blue3
# 8: 8 blue3
# 9: 9 blue3
# 10: 10 blue3
# 11: 11 navajowhite1
# 12: 12 navajowhite1
# . . . .
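Since the question is open to other approaches, here is a rough base R sketch that skips the big table entirely and looks up the process for each log line with findInterval. The data frames starts and log.df, and the process names in them, are invented for illustration; the only real requirement is that the start lines are sorted in increasing order.

```r
# Hypothetical process start lines (must be sorted by idx)
starts <- data.frame(idx = c(4, 11, 20),
                     process = c("procA", "procB", "procC"),
                     stringsAsFactors = FALSE)

# Hypothetical log lines pulled out of the log file
log.df <- data.frame(idx = c(1, 3, 5, 10, 12, 19, 25))

# For each log line, find the last start line at or before it (0 if there is none)
pos <- findInterval(log.df$idx, starts$idx)
pos[pos == 0] <- NA          # lines before the first process stay NA

log.df$process <- starts$process[pos]
log.df$process
# [1] NA      NA      "procA" "procA" "procB" "procB" "procC"
```

This gives the same kind of result as the fill-down on the merged table, but it only touches the rows that are actually in the log data frame.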