
Improve R code, getting numbers with regular expressions

Tags:

r

I want to plot the ping times to a specific server, so I am parsing this output:

[1] "PING google.de (216.58.213.195): 56 data bytes"                
[2] "64 bytes from 216.58.213.195: icmp_seq=0 ttl=58 time=15.583 ms"
[3] "64 bytes from 216.58.213.195: icmp_seq=1 ttl=58 time=11.057 ms"
[4] "64 bytes from 216.58.213.195: icmp_seq=2 ttl=58 time=10.866 ms"
[5] ""                                                              
[6] "--- google.de ping statistics ---"                             
[7] "3 packets transmitted, 3 packets received, 0.0% packet loss"   
[8] "round-trip min/avg/max/stddev = 10.866/12.502/15.583/2.180 ms" 

I use regular expressions to search for 'time=' and ' ms', which gives me the positions where the time value begins and ends. Then I use substr() to extract the number. It works, but this is my very first attempt at parsing a string in R, and I feel my solution could be more elegant. Could you help me? Thanks.

X <- system("ping -c 3 google.de",intern=TRUE)
start<-regexpr("time=",X)
end<-regexpr(" ms",X)
start<-start+5
end<-end-1

erg<-substr(X,start,end)
erg<-erg[2:4]
erg

erg<-as.numeric(erg)

hist(erg)
UDE_Student asked Nov 03 '15 18:11


2 Answers

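We can use str_extract() from the stringr package to extract the numbers. The lookbehind (?<=time=) and the lookahead (?=\s*ms) anchor the match without becoming part of it; lines without a match yield NA, which na.omit() drops.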

library(stringr)
na.omit(as.numeric(str_extract(X, '(?<=time=)[0-9.]+(?=\\s*ms)')))
#[1] 15.583 11.057 10.866

data

X <- c("PING google.de (216.58.213.195): 56 data bytes", 
"64 bytes from 216.58.213.195: icmp_seq=0 ttl=58 time=15.583 ms", 
"64 bytes from 216.58.213.195: icmp_seq=1 ttl=58 time=11.057 ms", 
"64 bytes from 216.58.213.195: icmp_seq=2 ttl=58 time=10.866 ms", 
"", "--- google.de ping statistics ---", 
"3 packets transmitted, 3 packets received, 0.0% packet loss", 
"round-trip min/avg/max/stddev = 10.866/12.502/15.583/2.180 ms")
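
If loading stringr is not wanted, the same lookaround pattern should also work in base R. This is only a minimal sketch, assuming perl = TRUE is acceptable (the lookbehind needs the PCRE engine); the names m and times are just illustrative.

```r
# Sample ping output from the question, so the example runs without a network call.
X <- c("PING google.de (216.58.213.195): 56 data bytes",
       "64 bytes from 216.58.213.195: icmp_seq=0 ttl=58 time=15.583 ms",
       "64 bytes from 216.58.213.195: icmp_seq=1 ttl=58 time=11.057 ms",
       "64 bytes from 216.58.213.195: icmp_seq=2 ttl=58 time=10.866 ms",
       "", "--- google.de ping statistics ---",
       "3 packets transmitted, 3 packets received, 0.0% packet loss",
       "round-trip min/avg/max/stddev = 10.866/12.502/15.583/2.180 ms")

# regexpr() locates the first match in each line; perl = TRUE enables lookarounds.
m <- regexpr("(?<=time=)[0-9.]+(?=\\s*ms)", X, perl = TRUE)

# regmatches() returns only the matched substrings, so no NA handling is needed.
times <- as.numeric(regmatches(X, m))
times
# [1] 15.583 11.057 10.866
```

Because regmatches() skips lines without a match, the result can go straight into hist(times).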
akrun answered Sep 22 '22 02:09


With your current vector X, you could try grep() to get the relevant lines, then gsub() to get the times. The numbers below will differ from yours because I ran the first line of your code to assign X.

tms <- grep("time=", X, fixed = TRUE, value = TRUE)
as.numeric(gsub(".*time=(\\d+\\.?\\d*).*", "\\1", tms))
# [1]  19.7  21.3 162.0
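
To try the grep-then-substitute idea without a network call, it can be wrapped in a small function and run on fixed input. This is a sketch only: extract_ping_times is a made-up helper name, and the last sample line is hypothetical, added to show that a time without a decimal part is still picked up when the fractional digits are optional in the pattern.

```r
# Two lines from the question plus one hypothetical line with a whole-number time.
X <- c(
  "PING google.de (216.58.213.195): 56 data bytes",
  "64 bytes from 216.58.213.195: icmp_seq=0 ttl=58 time=15.583 ms",
  "64 bytes from 216.58.213.195: icmp_seq=1 ttl=58 time=9 ms"  # hypothetical line
)

# Hypothetical helper: keep only lines containing "time=", then capture the number.
extract_ping_times <- function(lines) {
  hits <- grep("time=", lines, fixed = TRUE, value = TRUE)
  # [0-9]+ is the integer part; \\.?[0-9]* makes the fractional part optional.
  as.numeric(sub(".*time=([0-9]+\\.?[0-9]*).*", "\\1", hits))
}

extract_ping_times(X)
# [1] 15.583  9.000
```

sub() is enough here because each line needs only one replacement; gsub() would give the same result.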

However, since you are already getting the ping data via a system() call, you could try doing the rest of the work from the command line as well.

X <- as.numeric(system(
    "ping -c 3 google.de | grep time= | cut -d '=' -f 4 | cut -d ' ' -f 1", 
    intern = TRUE
))
X
# [1] 29.2 17.8 23.8

Or you could use awk instead of having two cut calls.

as.numeric(system(
    "ping -c 3 -n google.de | grep time= | awk -F '=| ' '{ print $10 }'",
    intern = TRUE
))
# [1] 23.4 19.6 29.3

Another option would be sed, but I will leave that one to you.
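
For readers who want a starting point, one possible sed version is sketched below. It assumes a POSIX sed is on the PATH, and it feeds the sample lines from the question to sed through the input argument of system() instead of running a live ping, so it can be tried offline. The names sed_cmd and times are illustrative.

```r
# Sample ping output from the question, used as standard input for sed.
X <- c("PING google.de (216.58.213.195): 56 data bytes",
       "64 bytes from 216.58.213.195: icmp_seq=0 ttl=58 time=15.583 ms",
       "64 bytes from 216.58.213.195: icmp_seq=1 ttl=58 time=11.057 ms",
       "64 bytes from 216.58.213.195: icmp_seq=2 ttl=58 time=10.866 ms",
       "", "--- google.de ping statistics ---",
       "3 packets transmitted, 3 packets received, 0.0% packet loss",
       "round-trip min/avg/max/stddev = 10.866/12.502/15.583/2.180 ms")

# -n suppresses default printing; the trailing p prints only lines where the
# substitution succeeded, i.e. lines that contain "time=<number> ms".
sed_cmd <- "sed -n 's/.*time=\\([0-9.]*\\) ms.*/\\1/p'"

# input = X writes the lines to a temporary file and redirects it to sed's stdin.
times <- as.numeric(system(sed_cmd, input = X, intern = TRUE))
times
```

For live data, the same sed expression could be placed after "ping -c 3 google.de |" in the system() call, in place of the grep and cut steps.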

Rich Scriven answered Sep 20 '22 02:09