Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl: Is this good or bad regex, and how to improve it?

Tags:

regex

perl

I'm trying to capture the temperature output of sensors, for which I have the following relevant lines:

temp1:       +39.5 C  (crit = +105.0 C)
Core 0:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 1:      +40.0 C  (high = +100.0 C, crit = +100.0 C)

I only need the first temperature of each line (39.5, 40.0, 40.0). The issue of course is that I can't really on the word number since there's an extra space in "Core 0" / "Core 1".

I've come up with the following regex which does work, however I've been told that use of .* is a somewhat lazy and dirty approach to regex.

$core_data =~ s/^.*\+(.*)C\ .*$/$1/g;

I was wondering, is there a tighter or better way to accomplish this or am I doing OK?

like image 493
DanH Avatar asked May 29 '11 09:05

DanH


5 Answers

A more concise regex

/\+(\d+\.?\d*) C/

this will match the first temperature with optional decimal value.

#!/usr/bin/perl
use strict;
use warnings;

my $re = qr{\+(\d+\.?\d*) C};
while (my $line = <DATA>){
    $line =~$re and print $1,"\n";
}
__DATA__
temp1:       +39.5 C  (crit = +105.0 C)
Core 0:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 1:      +40.0 C  (high = +100.0 C, crit = +100.0 C)

output:

39.5
40.0
40.0
like image 60
Toto Avatar answered Nov 15 '22 06:11

Toto


A more precise regex

 $core_data =~ s/^.*\+([\d.]+ )C\ .*$/$1/g;

But probably the following is enough because only the numeical value seems to be interesting.

 $cpu_head = $1 if m/:\s*\+([\d.]+) C/;

Note : \s stands for any space and \d for any digit.

like image 23
VGE Avatar answered Nov 15 '22 07:11

VGE


I don't understand why you're doing a search and replace with your regex (s///g), if you're just trying to capture the first temperature. Your regex appears to rely on .* being greedy. Assuming you can rely on the name: temp C (... format, this regex will work without having to match the whole string:

$core_data =~ m/^(?:\w*\b)*:\s*(\+?\d+\.\d+)/;

... or to capture without the + in front:

$core_data =~ m/^(?:\w*\b)*:\s*\+?(\d+\.\d+)/;
like image 21
CrystaLight Avatar answered Nov 15 '22 07:11

CrystaLight


IMHO, .* is perfectly fine when it makes sense, although when when you can narrow it down to something more specific, then all the better.

In your case, you could say

S/^[^+]+\+([0-9.]) C.*$/$1/g

In this regex, I focus on what I am looking for and characterize the temperature as a sequence of digits with a point somewhere while the rest is just not relevant to me. Since you have two temperatures in each line, and you only want the first one, I used [^+] at the beginning, which matches everything that is not a +, so it will stop right where the first temperature starts. once I got the temperature, I gobble everything out using .* up to the end of the line.

This is just an example of reasoning, it does not pretend to be the best regex you can come up with to solve your problem.

like image 26
sergio Avatar answered Nov 15 '22 05:11

sergio


This looks more suited to a split than a regex. split will clear all the unnecessary whitespace automatically, and you do not need to plan ahead for changes in the data.

my $tag;
($tag, $core_data) = split (/:/, $core_data);
my @fields = split (/\s/, $core_data);
my $temp   = $fields[0];

That will store the strings "+39.5", and "+40.0" in the different example lines, which can be converted to a number automagically, I believe.

Also, you will have easy access to the label of the line in $tag.

If you want, you can chop off the added info inside the parentheses with a regex:

if ($core_data =~ s/\(([^\)]*)\)//) {
    my $tmp = $1;
    $tmp =~ s/[\s\+C]//g; # clear away junk
    %data = split (/=/, (split (/,/, $tmp)));
}
for my $key (keys %data) {
    printf "%-7s = %s\n", $key, $data{$key};
}
like image 45
TLP Avatar answered Nov 15 '22 06:11

TLP