Regular Expression to extract label-value pairs in Java

I have a file containing several lines similar to:

Name: Peter
Address: St. Serrano número 12, España
Country: Spain

And I need to extract the address using a regular expression, taking into account that it can contain dots, special characters (ñ, ç), áéíóú...

The current code works, but it looks quite ugly:

Pattern p = Pattern.compile("^(.+?)Address: ([a-zA-Z0-9ñÑçÇáéíóú., ]+)(.+?)$",
                            Pattern.MULTILINE | Pattern.DOTALL);
Matcher m = p.matcher(content);
if (m.matches()) { ... }

Edit: The Address field can also span multiple lines:

Name: Peter
Address: St. Serrano número 12,   
Madrid
España
Country: Spain

Edit: I can't use a Properties object or a YAML parser, as the file contains other kinds of information, too.

Guido asked Dec 25 '08 19:12


2 Answers

I don't know Java's regex objects that well, but something like this pattern will do it:

^Address:\s*((?:(?!^\w+:).)+)$

assuming multiline and dotall modes are on (Pattern.MULTILINE and Pattern.DOTALL in Java).

This matches a line starting with "Address:" and captures everything after it, up to (but not including) the next line that begins with a single word followed by a colon.
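As a rough sketch, the pattern above could be applied in Java as follows. The class and method names (AddressExtractor, extractAddress) are made up for illustration, and find() is used instead of matches() because the pattern only describes the Address field, not the whole file.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class AddressExtractor {
    // MULTILINE makes ^ and $ match at line boundaries;
    // DOTALL lets '.' match line terminators, so the value may span lines.
    private static final Pattern ADDRESS = Pattern.compile(
            "^Address:\\s*((?:(?!^\\w+:).)+)$",
            Pattern.MULTILINE | Pattern.DOTALL);

    static Optional<String> extractAddress(String content) {
        Matcher m = ADDRESS.matcher(content);
        // find() looks for the pattern anywhere in the input,
        // whereas matches() would require the entire input to match.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String content = "Name: Peter\n"
                + "Address: St. Serrano número 12,\n"
                + "Madrid\n"
                + "España\n"
                + "Country: Spain\n";
        System.out.println(extractAddress(content).orElse("(no address found)"));
    }
}
```

The captured group keeps the line breaks that were inside the address, so callers may want to normalize whitespace afterwards.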

If you know the next field has to be "Country", you can simplify this a little bit:

^Address:\s*((?:(?!^Country:).)+)$

The trick is in the lookahead assertion in the repeating group. '(?!^Country:).' matches any single character, provided a line beginning with 'Country:' does not start at that position. We stick it in noncapturing parentheses (?:...) and quantify it with +, then group all of that in normal capturing parentheses.
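To make the pieces of that construct easier to see, here is a minimal sketch that builds the 'Country' variant part by part. The class name AddressBeforeCountry is hypothetical, and collapsing the line breaks into single spaces is only an assumption about how the value might be used, not something the question asks for.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class AddressBeforeCountry {
    private static final Pattern ADDRESS = Pattern.compile(
            "^Address:\\s*"              // the label at the start of a line
            + "("                        // capturing group 1: the address text
            +   "(?:(?!^Country:).)+"    // consume one char at a time, unless a "Country:" line starts here
            + ")$",                      // the match must end at a line end
            Pattern.MULTILINE | Pattern.DOTALL);

    static Optional<String> extract(String content) {
        Matcher m = ADDRESS.matcher(content);
        if (!m.find()) {
            return Optional.empty();
        }
        // Assumption for illustration: flatten a multi-line address into one line.
        // \R matches any line break (Java 8+).
        String flattened = m.group(1).replaceAll("\\s*\\R\\s*", " ").trim();
        return Optional.of(flattened);
    }

    public static void main(String[] args) {
        String content = "Name: Peter\n"
                + "Address: St. Serrano número 12,   \n"
                + "Madrid\n"
                + "España\n"
                + "Country: Spain\n";
        System.out.println(extract(content).orElse("(no address found)"));
    }
}
```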

ʞɔıu answered Sep 28 '22 03:09


You might want to look into the java.util.Properties class instead of a regex. It can read key-value pairs from plain text or XML files.

After loading your example file into a Properties object, you can read the values like this:

Properties properties = new Properties();
properties.load(/* InputStream of your file */);

Assert.assertEquals("Peter", properties.getProperty("Name"));
Assert.assertEquals("St. Serrano número 12, España", properties.getProperty("Address"));
Assert.assertEquals("Spain", properties.getProperty("Country"));
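For a self-contained version of the same idea, the sketch below loads the single-line sample from a String through a Reader. The class and method names (PropertiesDemo, parse) are hypothetical. A Reader is used here because Properties.load(InputStream) decodes the input as ISO 8859-1, which may not match the encoding of the real file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class; the names are illustrative only.
final class PropertiesDemo {
    static Properties parse(String content) {
        Properties properties = new Properties();
        // load(Reader) works on characters, so no byte decoding is involved here.
        try (Reader reader = new StringReader(content)) {
            properties.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        String content = "Name: Peter\n"
                + "Address: St. Serrano número 12, España\n"
                + "Country: Spain\n";
        Properties properties = parse(content);
        // ':' is accepted as a key-value separator, just like '='.
        System.out.println(properties.getProperty("Address"));
    }
}
```

Note that this only covers the single-line form: in the properties format a value continues onto the next line only when the line ends with a backslash, so an address split across plain lines would not come back as one value.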
Cem Catikkas answered Sep 28 '22 03:09