Extracting numbers from a String in Java by splitting on a regex

Tags: java, regex

I want to extract numbers from Strings like this:

String numbers[] = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34".split(PATTERN);

From such String I'd like to extract these numbers:

  • 0.286
  • -3.099
  • -0.44
  • -2.901
  • -0.436
  • 123
  • 0.123
  • .34

That is:

  • There can be garbage characters like "M", "c", ","
  • The "-" sign is part of the number, not something to split on
  • A "number" can be anything that Float.parseFloat can parse, so .34 is valid

What I have so far:

String PATTERN = "([^\\d.-]+)|(?=-)";

This works to some degree, but it is far from perfect:

  • Doesn't skip the starting garbage "M" in the example
  • Doesn't handle consecutive garbage, like the ,,, in the middle
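
For reference, a minimal sketch of what split returns with this PATTERN. The class name SplitDemo and the helper splitNumbers are only illustrative:

```java
import java.util.Arrays;

class SplitDemo {
    // The pattern from the question: split on runs of garbage, or just before a "-".
    static final String PATTERN = "([^\\d.-]+)|(?=-)";

    static String[] splitNumbers(String input) {
        return input.split(PATTERN);
    }

    public static void main(String[] args) {
        String input = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34";
        // Empty strings in the printed array mark the places where the split misbehaves.
        System.out.println(Arrays.toString(splitNumbers(input)));
    }
}
```

In this run the empty strings appear at the start (where "M" is split off) and wherever a garbage character is directly followed by "-", because the lookahead then produces a second, zero-width split at the same position.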

How to fix PATTERN to make it work?

janos asked Oct 06 '14 21:10


2 Answers

Instead of splitting, you could match the numbers directly with a regex like this:

([-.]?\d+(?:\.\d+)?)

Match Information:

MATCH 1
1.  [1-6]   `0.286`
MATCH 2
1.  [6-12]  `-3.099`
MATCH 3
1.  [12-17] `-0.44`
MATCH 4
1.  [18-24] `-2.901`
MATCH 5
1.  [25-31] `-0.436`
MATCH 6
1.  [34-37] `123`
MATCH 7
1.  [38-43] `0.123`
MATCH 8
1.  [44-47] `.34`
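
Note that this pattern describes the numbers themselves rather than the separators, so it is meant for Matcher.find() and not for String.split. A minimal sketch of how it might be applied in Java; the class and method names (NumberExtractor, extractNumbers) are made up for illustration, and the outer capturing group is dropped because group() already returns the whole match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // The regex from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern NUMBER = Pattern.compile("[-.]?\\d+(?:\\.\\d+)?");

    // Hypothetical helper: returns every substring the pattern matches, in order.
    static List<String> extractNumbers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34";
        for (String n : extractNumbers(input)) {
            // Each match is then handed to Float.parseFloat, as the question requires.
            System.out.println(n + " -> " + Float.parseFloat(n));
        }
    }
}
```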

Update

Jawee's approach

As Jawee pointed out in his comment, the regex above has a problem with input like .34.34: it matches the whole of .34.34 as a single number. His regex fixes this; thanks to Jawee for pointing it out.

(-?(?:\d+)?\.?\d+)

Engine explanation:

1st Capturing group (-?(?:\d+)?\.?\d+)
   -? -> matches the character - literally, zero or one time
   (?:\d+)? -> \d+ matches a digit [0-9] one or more times (non-capturing group); the trailing ? makes the whole group optional
   \.? -> matches the character . literally, zero or one time
   \d+ -> matches a digit [0-9] one or more times
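
To see the difference between the two patterns, here is a small sketch that runs both on .34.34. The class name PatternComparison and the helper findAll are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparison {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "[-.]?\\d+(?:\\.\\d+)?"; // first regex in this answer
        String jawee = "-?(?:\\d+)?\\.?\\d+";      // Jawee's regex

        // The first regex swallows ".34.34" whole, which Float.parseFloat rejects.
        System.out.println(findAll(original, ".34.34")); // [.34.34]
        // Jawee's regex stops before the second dot and finds two numbers.
        System.out.println(findAll(jawee, ".34.34"));    // [.34, .34]
    }
}
```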
Federico Piazza answered Nov 04 '22 04:11


Try this one:

(-?(?:\d+)?\.?\d+)

Thanks to nhahtdh for the comments; they are right, and the pattern can be updated as follows:

[-+]?(?:\d+(?:\.\d*)?|\.\d+)
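
The comments themselves are not reproduced here, so this is an assumption about what the update is meant to fix: the new pattern also accepts a leading "+" and a trailing decimal point such as 24., both of which Float.parseFloat can parse. A small sketch comparing the two patterns (the class name UpdatedPatternDemo, the helper findAll and the sample input are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpdatedPatternDemo {
    static final String PREVIOUS = "-?(?:\\d+)?\\.?\\d+";
    static final String UPDATED = "[-+]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)";

    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "+1.5,24.,.5"; // made-up sample, not from the question
        System.out.println(findAll(PREVIOUS, input)); // [1.5, 24, .5]
        System.out.println(findAll(UPDATED, input));  // [+1.5, 24., .5]
    }
}
```

Both results parse to the same values here; the difference is that the updated pattern consumes the sign and the trailing dot as part of the number instead of leaving them behind as garbage.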

Actually, if we want to cover every String format that Float.parseFloat accepts (e.g. Infinity, -Infinity, 00, 0xffp23d, 88F), it gets a little more complicated. It can still be done, as in the Java code below:

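import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FloatExtractor {
    public static void main(String[] args) {
        // Building blocks; (?>...) is an atomic group, so the engine does not backtrack into it.
        String sign = "[-+]?";
        // Hexadecimal floating-point literal, e.g. 0xffp23d (the binary exponent is mandatory).
        String hexFloat = "(?>0[xX](((\\p{XDigit}+)\\.?)|((\\p{XDigit}*)\\.(\\p{XDigit}+)))[pP]([-+])?(\\p{Digit}+)[fFdD]?)";
        String nan = "(?>NaN)";
        String inf = "(?>Infinity)";

        // Decimal forms: 12, 12., 12.5 or .5, with an optional exponent and type suffix.
        String dig = "(?>\\d+(?:\\.\\d*)?|\\.\\d+)";
        String exp = "(?:[eE][-+]?\\d+)?";
        String suf = "[fFdD]?";
        String digFloat = "(?>" + dig + exp + suf + ")";

        // hexFloat is tried first so that "0xffp23d" is not cut short at the leading "0".
        String wholeFloat = sign + "(?>" + hexFloat + "|" + nan + "|" + inf + "|" + digFloat + ")";

        String s = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123d,.34d.34.34M24.NaNNaN,Infinity,-Infinity00,0xffp23d,88F";

        Pattern floatPattern = Pattern.compile(wholeFloat);
        Matcher matcher = floatPattern.matcher(s);
        int i = 0;
        while (matcher.find()) {
            String f = matcher.group();
            // Print the index, the matched text, and the value Float.parseFloat gives for it.
            System.out.println(i++ + " : " + f + " --- " + Float.parseFloat(f));
        }
    }
}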

The output is:

0 : 0.286 --- 0.286
1 : -3.099 --- -3.099
2 : -0.44 --- -0.44
3 : -2.901 --- -2.901
4 : -0.436 --- -0.436
5 : 123 --- 123.0
6 : 0.123d --- 0.123
7 : .34d --- 0.34
8 : .34 --- 0.34
9 : .34 --- 0.34
10 : 24. --- 24.0
11 : NaN --- NaN
12 : NaN --- NaN
13 : Infinity --- Infinity
14 : -Infinity --- -Infinity
15 : 00 --- 0.0
16 : 0xffp23d --- 2.13909504E9
17 : 88F --- 88.0
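
The loop above only prints the matches. To end up with a collection of numbers, which is what the question's split call was after, the matches can be parsed as they are found. A minimal sketch, using only the decimal part of the pattern with the optional exponent (no hex, NaN, Infinity or type suffix) for brevity; FloatCollector and extractFloats are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FloatCollector {
    // Decimal numbers only: optional sign, digits with optional fraction (or .digits),
    // and an optional exponent. This is a simplification of wholeFloat above.
    private static final Pattern DECIMAL =
            Pattern.compile("[-+]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)(?:[eE][-+]?\\d+)?");

    // Hypothetical helper: finds every decimal number in input and parses it.
    static List<Float> extractFloats(String input) {
        List<Float> result = new ArrayList<>();
        Matcher m = DECIMAL.matcher(input);
        while (m.find()) {
            result.add(Float.parseFloat(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "M0.286-3.099-0.44c-2.901,-0.436,,,123,0.123,.34";
        System.out.println(extractFloats(input));
    }
}
```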
jawee answered Nov 04 '22 05:11