How to solve €25.99 vs 25,99€ preg_match problem?

Tags: regex, php, euro

If I have these strings:

$string1 = "This book costs €25.99 in our shop.";

and on the other side

$string2 = "This book costs 25,99€ in our shop.";

How do I extract "€25.99" or "25,99€" using preg_match? What would the code look like?

Please note that there are two ways of writing the euro symbol. The correct way in the EU is to write the symbol after the number, as in 25,99€, with a comma as the decimal separator. However, a lot of people in the US stick to the dollar convention (€25.99), with a dot as the decimal separator.

How can I check for both cases and get the value together with the symbol in the cleanest and most efficient way?

bradinwerk asked Aug 04 '11 19:08


2 Answers

Here's the raw regex: €\d+(?:[,.]\d+)?|\d+(?:[,.]\d+)?€

preg_match("/€\d+(?:[,.]\d+)?|\d+(?:[,.]\d+)?€/u", $string1, $matches);

If you want to consider optional spaces between euro and the value, use this:

preg_match("/€ ?\d+(?:[,.]\d+)?|\d+(?:[,.]\d+)? ?€/u", $string1, $matches);
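
As a minimal sketch of how this pattern could be applied to both example strings: the helper name findEuroPrice is hypothetical, and the u modifier assumes the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper: returns the first euro price found, or null if none.
function findEuroPrice(string $text): ?string
{
    // Symbol first or symbol last, one optional space, optional decimals.
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/€ ?\d+(?:[,.]\d+)?|\d+(?:[,.]\d+)? ?€/u';

    return preg_match($pattern, $text, $matches) === 1 ? $matches[0] : null;
}

$string1 = "This book costs €25.99 in our shop.";
$string2 = "This book costs 25,99€ in our shop.";

echo findEuroPrice($string1), "\n"; // €25.99
echo findEuroPrice($string2), "\n"; // 25,99€
```

Returning null instead of relying on $matches keeps the "no price found" case explicit for the caller.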
agent-j answered Sep 23 '22 14:09


agent-j's pattern is on the right track, but I would do something slightly more restrictive:

/€\d+(?:[.,]\d{2})?|\d+(?:[.,]\d{2})?€/

The only difference is that the decimal part is limited to 2 places, if it exists. I don't think you want to allow something like 99,999€, especially since that could mean "99 thousand, 999 euros" if written in the American style.
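
One caveat: limiting the decimals does not by itself reject an input like 99,999€, because preg_match searches anywhere in the string and can settle on the tail 999€. The sketch below compares the two-decimal pattern with a stricter variant. The lookarounds and the variable names $loose and $strict are assumptions added for illustration, not part of the pattern above.

```php
<?php
// Two-decimal pattern, with the non-capturing groups written as (?: ... ).
$loose = '/€\d+(?:[.,]\d{2})?|\d+(?:[.,]\d{2})?€/u';

// Assumed stricter variant: a symbol-first amount must not be followed by
// further digits, and a symbol-last amount must not be preceded by any.
$strict = '/€\d+(?:[.,]\d{2})?(?![.,]?\d)'
        . '|(?<!\d)(?<!\d[.,])\d+(?:[.,]\d{2})?€/u';

foreach (['25,99€', '€25.99', '99,999€', '€25.999'] as $input) {
    $looseHit  = preg_match($loose, $input, $m) === 1 ? $m[0] : '(none)';
    $strictHit = preg_match($strict, $input, $m) === 1 ? $m[0] : '(none)';
    echo "$input  loose: $looseHit  strict: $strictHit\n";
}
```

With the loose pattern, 99,999€ yields 999€ and €25.999 yields €25.99. The strict variant reports no match for either, while still accepting 25,99€ and €25.99.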

What I think you're getting at with "the cleanest and most efficient way" is that the above pattern looks awkward and redundant. It's basically the \d+(?:[.,]\d{2})? portion written twice, with the € symbol switching sides. This feels wrong, but it isn't. You can't really get around it without bringing in just as much complexity, if not more. Even if you try to get around it with fancy lookarounds, it's going to look something like this:

/^(?=.*€)€?\d+(?:[.,]\d{2})?((?<!€.*)€)?$/

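Clearly not an improvement, and PHP's PCRE engine would reject it anyway, because it does not allow an unbounded lookbehind such as (?<!€.*). Sometimes the most obvious solution is the best one, even if it makes you feel dirty.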

Note: If you want to get really crazy with it, you can try a variation (caution: untested, and I haven't done much PHP in a while):

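// Single quotes stop PHP from reading \1 as an octal escape; \g{-1} points at the
// nearest preceding capture group, so the fragment can safely appear twice.
$inner = '(?:\d{1,3}(?:([.,])\d{3})*(?:(?!\g{-1})[.,]\d{2})?|\d*(?:[.,]\d{2})?)(?!\d)';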

Usage:

preg_match("/€" . $inner . "|" . $inner . "€/u", $string1, $matches);

That should also accept things like 99,999.99; 999999,99; 9.999.999,99; .99; etc.
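
A rough way to check those inputs is sketched below. It makes three assumptions beyond the note: single quotes so PHP leaves the backslashes alone, a relative backreference \g{-1} so the fragment works in both halves of the alternation, and a trailing (?!\d) so the grouped branch cannot stop in the middle of a run of digits.

```php
<?php
// Amount sub-pattern adapted from the note. First branch: groups of three
// digits with one kind of separator, then optional decimals introduced by a
// different separator. Second branch: ungrouped digits with optional decimals.
$inner = '(?:\d{1,3}(?:([.,])\d{3})*(?:(?!\g{-1})[.,]\d{2})?|\d*(?:[.,]\d{2})?)(?!\d)';
$pattern = '/€' . $inner . '|' . $inner . '€/u';

$samples = ['€99,999.99', '999999,99€', '9.999.999,99€', '€.99'];
foreach ($samples as $sample) {
    preg_match($pattern, $sample, $matches);
    echo $sample, ' => ', $matches[0] ?? '(no match)', "\n";
}
```

Each sample is matched in full. Because every part of the second branch is optional, a lone € also matches, so callers who need an actual number should check that the match contains a digit.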

Justin Morgan answered Sep 23 '22 14:09