
Regex to find integers and decimals in string

I have a string like:

$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";

I'd like to get the amount from the string, whether it is an integer or a decimal (12 or 1.5), and then grab the measurement that immediately follows it (ounces).

I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.

Thanks for your help!

HWD asked Dec 07 '22 13:12


2 Answers

If you just want to grab the data, a loose regex is enough:

([\d.]+)\s+(\S+)
  • ([\d.]+): [\d.]+ matches a run consisting only of digits and . characters. This means input such as 4.5.6 or .... will also match, but such cases are uncommon and this pattern is only meant for grabbing data. The parentheses capture the matched text. Because the . is inside a character class [], it does not need escaping.

  • This is followed by one or more whitespace characters (\s+) and then the longest possible run of non-whitespace characters (\S+, which is greedy). Non-whitespace really means non-whitespace: \S matches almost everything in Unicode except whitespace such as space, tab, newline, and carriage return.

You can get the number in the first capturing group, and the unit in the 2nd capturing group.
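
As a rough sketch of how the loose pattern could be applied in PHP (the language used in the question), it can be passed to preg_match and the two groups read from the match array. The function name extractAmountAndUnit is made up for this illustration and is not part of the answer.

```php
<?php
// Hypothetical helper: returns [number, unit] as strings, or null if nothing matches.
function extractAmountAndUnit(string $input): ?array
{
    // Loose pattern: digits/dots, whitespace, then a run of non-whitespace.
    if (preg_match('/([\d.]+)\s+(\S+)/', $input, $m) === 1) {
        return [$m[1], $m[2]]; // group 1 = number, group 2 = unit
    }
    return null;
}

var_dump(extractAmountAndUnit("12 ounces"));          // "12" and "ounces"
var_dump(extractAmountAndUnit("1.5 ounces chopped")); // "1.5" and "ounces"
```

Both values come back as strings; cast the number with (float) if you need to do arithmetic on it.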

You can be a bit stricter on the number:

(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
  • The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. It is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It matches an integer such as 34 and a number with a decimal part such as 3.40000, and it also lets .5 and 34. through. It does not accept a number with extra . characters, or a lone ., as a whole; note that because the pattern is unanchored, it can still match the tail of such input (5.6 in 4.5.6). The | acts as OR and separates two alternatives: \d+(?:\.\d*)? and \.\d+.
  • \d+(?:\.\d*)?: This requires at least one digit in the integer part, followed by an optional . (escaped with \ because an unescaped . means any character) and a fractional part of zero or more digits. The ? at the end makes the (?:...) group optional. () is used for grouping and capturing; when capturing is not needed, (?:) groups without capturing, which saves a little memory.
  • \.\d+: This handles cases such as .78. It matches a . followed by at least one digit (signified by +).
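
To see which number forms the stricter sub-pattern accepts, one option is to test it on its own, anchored with ^ and $. The anchors and the helper name isStrictNumber are additions for this sketch and are not part of the answer's regex.

```php
<?php
// Hypothetical helper: does the whole string count as a number under the strict pattern?
function isStrictNumber(string $s): bool
{
    // ^...$ anchors are added here so that partial matches do not count.
    return preg_match('/^(?:\d+(?:\.\d*)?|\.\d+)$/', $s) === 1;
}

// Prints each candidate with "accepted" or "rejected".
foreach (['34', '3.40000', '.5', '34.', '4.5.6', '.'] as $candidate) {
    printf("%-8s %s\n", $candidate, isStrictNumber($candidate) ? 'accepted' : 'rejected');
}
```

With the anchors in place, 34, 3.40000, .5 and 34. are accepted, while 4.5.6 and a lone . are rejected.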

This is not a good solution if you need to be sure that what you extract from the input is meaningful. For that, you need to define all expected units before you can write a regex that captures only valid data.
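
One way to follow this advice is to list the accepted units inside the pattern itself. The sketch below assumes a small, purely illustrative unit list (the question only mentions ounces), and parseMeasurement is a hypothetical name.

```php
<?php
// Hypothetical helper: accepts only an amount followed by a known unit.
function parseMeasurement(string $input): ?array
{
    // Assumed unit list for illustration; extend it to match your data.
    $units = ['ounces', 'ounce', 'cups', 'cup', 'grams', 'gram'];

    // Escape each unit so it is treated literally inside the regex.
    $alternation = implode('|', array_map(function ($u) {
        return preg_quote($u, '/');
    }, $units));

    // Strict number, whitespace, then one of the known units as a whole word.
    $pattern = '/(\d+(?:\.\d*)?|\.\d+)\s+(' . $alternation . ')\b/i';

    if (preg_match($pattern, $input, $m) === 1) {
        return ['amount' => (float) $m[1], 'unit' => strtolower($m[2])];
    }
    return null; // no amount followed by a known unit
}

var_dump(parseMeasurement("1.5 ounces chopped"));
var_dump(parseMeasurement("3 handfuls")); // NULL, since "handfuls" is not in the list
```

The trade-off is that any unit missing from the list is silently skipped, so the list has to be kept in step with the data.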

nhahtdh answered Dec 24 '22 16:12


Use this regular expression, which matches only the number and accepts either . or , as the decimal separator:

\b\d+([\.,]\d+)?
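
A minimal sketch of how this pattern might be used in PHP: preg_match_all collects every number in the string, and a decimal comma is normalised to a dot before casting. The helper name findNumbers and the normalisation step are assumptions added for illustration, not part of the answer.

```php
<?php
// Hypothetical helper: finds every number in the input and returns them as floats.
function findNumbers(string $input): array
{
    preg_match_all('/\b\d+([\.,]\d+)?/', $input, $matches);

    // $matches[0] holds the full matches; turn a decimal comma into a dot.
    return array_map(function ($n) {
        return (float) str_replace(',', '.', $n);
    }, $matches[0]);
}

var_dump(findNumbers("1,5 ounces or 12 ounces")); // 1.5 and 12
```

Because the comma is treated as a decimal separator, input that uses it as a thousands separator (such as 1,000) would be read as 1.0, so this only suits data where that cannot occur.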

burning_LEGION answered Dec 24 '22 16:12