This question has been bugging me for a long time: I'm looking for the most efficient way to grab all Strings between two Strings.
The way I have been doing it for many months is with a bunch of temporary indices, strings, and substrings, and it's really messy. (Why does Java not have a native method such as String substring(String start, String end)?)
Say I have a String:
abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff
The end goal would be to output foo and bar (and later add them to a JList).
I've been trying to incorporate regex in .split() but haven't been successful. I've tried syntax using *'s and .'s, but I don't think it does quite what I intend, especially since .split() only takes one argument to split against.
Otherwise, I think another way is to use the Pattern and Matcher classes, but I'm really fuzzy on the appropriate procedure.
To get a substring between two characters: get the index just after the first occurrence of the character (indexOf() plus one), get the index of the last occurrence of the character (lastIndexOf()), then pass both indices to the String.substring() method to get the text between the two characters.
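In Java, the index-based steps above could be sketched as follows. The class name BetweenChars and the method between are made up for illustration, and returning null when the delimiter occurs fewer than two times is an assumption about the desired behavior, not something the text specifies.

```java
class BetweenChars {
    // Hypothetical helper: returns the text between the first and the last
    // occurrence of delimiter, or null if it occurs fewer than two times.
    static String between(String text, char delimiter) {
        int first = text.indexOf(delimiter);
        int last = text.lastIndexOf(delimiter);
        if (first < 0 || first == last) {
            return null; // zero or one occurrence: nothing lies "between"
        }
        // first + 1 skips the opening delimiter; the end index is exclusive
        return text.substring(first + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(between("abc|foo|bar|xyz", '|')); // prints foo|bar
    }
}
```

Note that this returns everything between the outermost pair of delimiters, so it only yields a single result per input.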
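To escape a metacharacter you use the Java regular expression escape character - the backslash character. Escaping a character means preceding it with the backslash character. For instance, like this: \. Inside a Java string literal the backslash itself has to be escaped as well, so the regex \. is written as "\\." in source code.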
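As a small illustration of escaping, the sketch below splits on a literal | in two ways: with a hand-written backslash escape, and with Pattern.quote(), which escapes an arbitrary delimiter for you. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class EscapeDemo {
    // "\\|" in Java source is the regex \| which matches a literal pipe.
    static List<String> splitOnPipe(String text) {
        return Arrays.asList(text.split("\\|"));
    }

    // Pattern.quote() turns any delimiter into a literal pattern,
    // so metacharacters such as . or | need no manual escaping.
    static List<String> splitOnLiteral(String text, String delimiter) {
        return Arrays.asList(text.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        System.out.println(splitOnPipe("a|b|c"));         // prints [a, b, c]
        System.out.println(splitOnLiteral("a.b.c", ".")); // prints [a, b, c]
    }
}
```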
You can construct the regex to do this for you:
    // pattern1 and pattern2 are String objects
    String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
This will treat pattern1 and pattern2 as literal text, and the text in between the patterns is captured in the first capturing group. You can remove Pattern.quote() if you want pattern1 and pattern2 to be interpreted as regex, but I don't guarantee anything if you do that.
You can customize how the match occurs by adding flags to the regexString:
- For Unicode-aware case-insensitive matching, add (?iu) at the beginning of regexString, or supply the Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE flags to the Pattern.compile method.
- To capture the content even when it spans multiple lines, add (?s) before (.*?), i.e. "(?s)(.*?)", or supply the Pattern.DOTALL flag to the Pattern.compile method.
Then compile the regex, obtain a Matcher object, iterate through the matches and save them into a List (or any Collection, it's up to you).
    Pattern pattern = Pattern.compile(regexString);
    // text contains the full text that you want to extract data from
    Matcher matcher = pattern.matcher(text);

    while (matcher.find()) {
        String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
        // You can insert match into a List/Collection here
    }
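Putting the pieces above together, a minimal self-contained sketch might look like the following. The class BetweenFinder and the method findBetween are hypothetical names, and passing the flags as an int parameter is just one possible design. It uses the bracketed input from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenFinder {
    // Hypothetical helper: collects every piece of text found between the
    // literal strings start and end. flags is passed to Pattern.compile
    // (0 for none, or e.g. Pattern.DOTALL, Pattern.CASE_INSENSITIVE).
    static List<String> findBetween(String text, String start, String end, int flags) {
        String regex = Pattern.quote(start) + "(.*?)" + Pattern.quote(end);
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the lazy (.*?) capture
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        // The brackets are regex metacharacters; Pattern.quote makes them literal.
        System.out.println(findBetween(text, "[pattern1]", "[pattern2]", 0)); // prints [foo, bar]
    }
}
```

Without Pattern.DOTALL the dot does not match line terminators, so content that spans several lines is only found when that flag is supplied.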
Testing code:
    String pattern1 = "hgb";
    String pattern2 = "|";
    String text = "sdfjsdkhfkjsdf hgb sdjfkhsdkfsdf |sdfjksdhfjksd sdf sdkjfhsdkf | sdkjfh hgb sdkjfdshfks|";

    Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
    Matcher m = p.matcher(text);
    while (m.find()) {
        System.out.println(m.group(1));
    }
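Do note that if you search for the text between foo and bar in the input "foo text foo text bar text bar" with the method above, you will get one match, which is " text foo text ". The engine starts at the leftmost foo, and the lazy (.*?) only controls where the match ends, not where it begins.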
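The behavior described in this note can be reproduced with the sketch below. The innermost method is not part of the answer: it is one possible alternative, shown under the assumption that you want the shortest span instead, and it works by forbidding another start delimiter inside the capture with a negative lookahead. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedStartDemo {
    // Collects capturing group 1 of every match of regex in text.
    static List<String> collect(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // The approach from the answer: literal delimiters with a lazy capture.
    static List<String> lazy(String text, String start, String end) {
        return collect(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), text);
    }

    // Alternative (an assumption, not from the answer): each captured
    // character must not begin another start delimiter.
    static List<String> innermost(String text, String start, String end) {
        String s = Pattern.quote(start);
        return collect(s + "((?:(?!" + s + ").)*?)" + Pattern.quote(end), text);
    }

    public static void main(String[] args) {
        String text = "foo text foo text bar text bar";
        System.out.println(lazy(text, "foo", "bar"));      // prints [ text foo text ]
        System.out.println(innermost(text, "foo", "bar")); // prints [ text ]
    }
}
```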
Here's a one-liner that does it all:
    List<String> strings = Arrays.asList(
        input.replaceAll("^.*?pattern1", "")
             .split("pattern2.*?(pattern1|$)"));
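The breakdown is:
- replaceAll() removes everything up to and including the first pattern1, so the remaining input begins with the first wanted string
- split() splits on the text (matched non-greedily by .*?) between pattern2 and pattern1 (or end of input)
- Arrays.asList() is used to generate a List<String>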
Here's some test code:
    import java.util.Arrays;
    import java.util.List;

    public class Test {
        public static void main(String[] args) {
            String input = "abcabc pattern1foopattern2 abcdefg pattern1barpattern2 morestuff";
            List<String> strings = Arrays.asList(
                input.replaceAll("^.*?pattern1", "").split("pattern2.*?(pattern1|$)"));
            System.out.println(strings);
        }
    }
Output:
[foo, bar]
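In the one-liner, pattern1 and pattern2 are embedded directly in the regex, so delimiters containing metacharacters (such as the brackets in the question's [pattern1]) would be misread. A possible adaptation, assuming the delimiters should be treated as literal text, wraps them in Pattern.quote(). The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class OneLinerQuoted {
    // Variant of the one-liner that treats both delimiters as literal text.
    static List<String> between(String input, String start, String end) {
        String s = Pattern.quote(start);
        String e = Pattern.quote(end);
        return Arrays.asList(
            input.replaceAll("^.*?" + s, "")           // drop text up to the first start
                 .split(e + ".*?(" + s + "|$)"));      // split on end ... next start (or end of input)
    }

    public static void main(String[] args) {
        String input = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        System.out.println(between(input, "[pattern1]", "[pattern2]")); // prints [foo, bar]
    }
}
```

One caveat of the split-based approach: if the input contains no start delimiter at all, nothing is removed or split, and the whole input comes back as a single element. The Matcher-based approach returns an empty result in that case.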