 

Java - Extract strings with Regex


I have this string:

String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45"; 

and I need to extract these three substrings:
1234
06:30
07:45

If I use the regex \\d{2}\\:\\d{2} I'm only able to extract the first time, 06:30.

Pattern depArrHours = Pattern.compile("\\d{2}\\:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // IndexOutOfBoundsException: No group 1

matcher.group(1) throws an exception.
Also, I don't know how to extract 1234. This value can change, but it always comes after 'XX~ '.
Do you have any idea how to match these strings with regular expressions?

UPDATE

Thanks to Adam's suggestion, I now have this regex that matches my string:

Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");

I get the number and the two times with matcher.group(1), matcher.group(2) and matcher.group(3).

asked Aug 03 '09 22:08 by mickthompson



1 Answer

The matcher.group() function takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses "(...)", and anything matched inside the parentheses is captured. Groups are numbered from left to right (again, starting from 1) by their opening parenthesis, which means that groups can be nested. Since there are no parentheses in your regular expression, there is no group 1.
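To make the numbering concrete, here is a minimal sketch that nests two groups inside a third. The class name GroupNumbering, the method name and the shortened sample input are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns {group(0), group(1), group(2), group(3)} for the first time found,
    // or null when the input contains no time.
    static String[] groupsOfFirstTime(String input) {
        // Groups are numbered by their opening parenthesis:
        // group 1 = whole time, group 2 = hours, group 3 = minutes.
        Pattern p = Pattern.compile("((\\d{2}):(\\d{2}))");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null; // group(n) must not be called without a successful match
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // index 0 is the entire match
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = groupsOfFirstTime("ABC~01/01/2010 06:30~BCD");
        System.out.println(String.join(" | ", g)); // prints "06:30 | 06:30 | 06 | 30"
    }
}
```

Without any parentheses in the pattern, groupCount() would be 0 and only group(0) would be valid.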

The javadoc on the Pattern class covers the regular expression syntax.

If you are looking for a pattern that might recur some number of times, you can call Matcher.find() repeatedly until it returns false. Calling Matcher.group(0) once on each iteration then returns what matched that time.
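A rough sketch of that loop, applied to the string from the question. The class and method names (FindAllTimes, findTimes) are just placeholders, and the pattern assumes times are always written as two digits, a colon and two digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllTimes {
    // Collects every HH:mm-looking substring, in order of appearance.
    static List<String> findTimes(String input) {
        Pattern p = Pattern.compile("\\d{2}:\\d{2}");
        Matcher m = p.matcher(input);
        List<String> times = new ArrayList<>();
        while (m.find()) {          // each call advances to the next match
            times.add(m.group(0));  // group(0) is the text matched this time
        }
        return times;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findTimes(myString)); // prints [06:30, 07:45]
    }
}
```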

If you want to build one big regular expression that matches everything at once (which I believe is what you want), put a pair of capturing parentheses around each of the three things you want to capture, call Matcher.matches(), and then call Matcher.group(n) with n equal to 1, 2 and 3. Of course, Matcher.matches() might also return false, in which case the pattern did not match the whole input and you can't retrieve any of the groups.
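For the string in the question, the single-expression approach could look roughly like this. It is only a sketch under assumptions that the question does not fully confirm: the value after "XX~ " is made of digits only, and both times are zero-padded to two digits. The class and method names are invented for the example. Note the lazy ".*?" and the fixed "\\d{2}": a greedy ".*" in front of "\\d{1,2}" would swallow the leading zero and capture "6:30" instead of "06:30".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractAllAtOnce {
    // Group 1 = digits after "XX~ ", group 2 = first time, group 3 = second time.
    private static final Pattern LINE = Pattern.compile(
            ".*XX~ (\\d+).*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2})");

    // Returns {number, firstTime, secondTime}, or null if the input does not match.
    static String[] extract(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) { // matches() requires the pattern to cover the whole input
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        String[] parts = extract(myString);
        System.out.println(String.join(", ", parts)); // prints "1234, 06:30, 07:45"
    }
}
```

Because the pattern ends with the third group and matches() must consume the whole input, this sketch also assumes that the second time is the last thing on the line.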

In your example, what you probably want is to match some preceding text, then start a capturing group, match the digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.

Let's say I had strings of the form:

Eat 12 carrots at 12:30
Take 3 pills at 01:15

And I wanted to extract the quantity and times. My regular expression would look something like:

"\w+ (\d+) [\w ]+ (\d{2}:\d{2})"

The code would look something like:

Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline);
if (m.matches()) {
    System.out.println("The quantity is " + m.group(1));
    System.out.println("The time is " + m.group(2));
}

The regular expression means "a string containing a word, a space, one or more digits (captured in group 1), a space, a run of words and spaces ending with a space, followed by a time (captured in group 2)". The time pattern assumes that the hour is always zero-padded to two digits. I would give an example closer to what you are looking for, but the description of the possible input is a little vague.

answered Oct 11 '22 00:10 by Adam Batkin