I've been doing some logging to try to illustrate to Comcast Business the frequency of their service interruptions at my office. I'm logging ping response times to a file, then parsing that file with R. In the log file, a value of 1000 means the ping timed out. My script logs a ping every 5 seconds, so if my Comcast service is down for 30 seconds, that results in ~6 log entries with a value of 1000. I'd like to parse my logs so that I can create a summary table showing when each outage started and how long it lasted. What are some good ways to do this?
Here's some example data from today and some graphs that illustrate my time series:
require(xts)
# Read the ping log: one timestamp and one response time (ms) per row
outFile <- "http://pastebin.com/raw.php?i=SJuMQ9rD"
pingLog <- read.csv(outFile, header=FALSE,
                    col.names = c("time","ms"),
                    colClasses=c("POSIXct", "numeric"))
xPingLog <- as.xts(pingLog$ms, order.by=pingLog$time)
# Keep only the timed-out pings (logged as 1000 ms)
outages <- subset(pingLog, ms==1000)
xOutages <- as.xts(outages$ms, order.by=outages$time)
# Plot the full series above the outage-only series
par(mfrow=c(2,1))
plot(xPingLog)
plot(xOutages)
outages
You've got to love run-length encoding, alias rle:
offline <- ifelse(pingLog$ms==1000, TRUE, FALSE)
rleOffline <- rle(offline)
offlineTable <- data.frame(
endtime = pingLog$time[cumsum(rleOffline$lengths)],
duration = rleOffline$lengths * 5,
offline = rleOffline$values
)
Results in:
offlineTable
              endtime duration offline
1 2011-11-20 13:20:19     1030   FALSE
2 2011-11-20 13:20:35        5    TRUE
3 2011-11-20 13:24:37      240   FALSE
4 2011-11-20 13:25:57       25    TRUE
5 2011-11-20 13:53:28     1640   FALSE
First, construct a logical vector that indicates online vs. offline. ifelse is handy for this.
offline <- ifelse(pingLog$ms==1000, TRUE, FALSE)
Then use rle to calculate the run-length encoding:
rle(offline)
Run Length Encoding
lengths: int [1:5] 206 1 48 5 328
values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
This output tells you how many runs of either TRUE or FALSE occurred, and how long each run was. In this case, the first run was 206 periods with a value of FALSE (i.e. online for 206*5 = 1030 seconds).
The final step is to use the rle information to index against the original pingLog to find the times. The extra bit of magic is to use cumsum to calculate the cumulative sum of the run lengths. The real-world meaning of this is the index position where each run terminated.
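Since the question asks when each outage started, the same cumsum trick can be pushed one step further: subtracting each run's length from its end index and adding 1 gives the index where the run began. Here is a minimal sketch of that idea on made-up toy data (the values in pingLog below are invented for illustration, and the names runEnd, runStart and outageTable are my own, not from the original log). It assumes a fixed 5-second ping interval, as in the question.

```r
# Toy data: one ping every 5 seconds; 1000 ms marks a timeout (made-up values)
pingLog <- data.frame(
  time = as.POSIXct("2011-11-20 13:00:00", tz = "UTC") + 5 * (0:9),
  ms   = c(20, 25, 1000, 1000, 1000, 30, 22, 1000, 28, 24)
)

offline <- pingLog$ms == 1000
runs    <- rle(offline)

# Index of the last and the first observation in each run
runEnd   <- cumsum(runs$lengths)
runStart <- runEnd - runs$lengths + 1

# Build one row per run, then keep only the offline runs
outageTable <- data.frame(
  start    = pingLog$time[runStart],
  end      = pingLog$time[runEnd],
  duration = runs$lengths * 5   # seconds, assuming a 5-second ping interval
)[runs$values, ]
rownames(outageTable) <- NULL
outageTable
```

On the toy data this leaves two rows, one per outage. Note that duration counts logged timeouts times 5 seconds, so it is only as precise as the ping interval.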