Java: How to extract substring between two characters from a string?

Tags:

java

regex

I am relatively new to Java and I need some help extracting multiple substrings from a string. An example of such a string is given below:

String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";

Desired result: WRB MD PRP VB DT NN IN NNS POS JJ NNS

I have a text file with possibly thousands of similar POS-tagged lines. I need to extract the POS tags from each line and do some calculations based on them.

I have tried using a tokenizer but didn't get the result I wanted. I also tried using split() and saving the pieces to arrays, since I need to store the tags and use them later, but that still didn't work.

Lastly, I tried using Pattern and Matcher, but I am having problems with the regex: it returns each tag with the forward slash still attached.

Regex: [\/](.*?)\s\b
Result: /WRB /MD ....

If there's a better way to do this, please let me know; otherwise, I'd appreciate help figuring out what's wrong with my regex.

Cryssie asked Sep 03 '12 11:09
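As a sketch of what goes wrong with the regex above (an editorial illustration, not code from the question): the `/WRB /MD` output suggests the code prints `Matcher.group()`, which returns the whole match with the slash included, while `group(1)` returns only the parenthesized part. In addition, the trailing `\s\b` succeeds only when the next token starts with a word character, so tags sitting before punctuation tokens such as `'/POS` and `?/.` get merged into a longer match or dropped. The version below swaps in the pattern `\S+/(\S+)`, which is an assumption of this sketch, and reads the tag from group 1; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupTags {
    // \S+ (greedy) consumes the word and backtracks to the LAST slash in the token,
    // so group 1 holds only the characters after that slash, i.e. the tag.
    private static final Pattern TOKEN = Pattern.compile("\\S+/(\\S+)");

    static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // group() would be "word/TAG"; group(1) is just the captured TAG
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String s = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        System.out.println(String.join(" ", extractTags(s)));
    }
}
```

Because `\S+` is greedy, a word that itself contains a slash (a hypothetical `and/or/CC`) still yields just `CC`. The sentence-final `.` tag is returned too; filter it out if, as in the desired result, it should be excluded.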


3 Answers

This should work:

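String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
// [^/]+/ consumes each word and its slash; ([^ ]+ ?) captures the tag plus at most one trailing space, which "$1" keeps
System.out.println(string.replaceAll("[^/]+/([^ ]+ ?)", "$1"));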

Prints: WRB MD PRP VB DT NN IN NNS POS JJ NNS .

sp00m answered Oct 12 '22 20:10
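Since the question also needs the tags stored for later calculations, here is a minimal sketch that takes the string produced by the `replaceAll()` call above, splits it on whitespace into a list, and, as a hypothetical example of such a calculation, counts how often each tag occurs. The class name `StoreTags` and both helper methods are illustrative assumptions, not part of the answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StoreTags {
    // Reuses the replaceAll() pattern from the answer, then splits on whitespace
    static List<String> tagsOf(String line) {
        String tagged = line.replaceAll("[^/]+/([^ ]+ ?)", "$1").trim();
        if (tagged.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(tagged.split("\\s+"));
    }

    // Hypothetical calculation: how often each tag occurs (TreeMap keeps keys sorted)
    static Map<String, Integer> countTags(List<String> tags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String s = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        List<String> tags = tagsOf(s);
        System.out.println(tags);
        System.out.println(countTags(tags));
    }
}
```

`Arrays.asList` returns a fixed-size list; copy it into an `ArrayList` if tags need to be added or removed later.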


If you still want to use pattern matching, look at positive lookbehinds. A lookbehind lets you match a word that comes after a slash without including the slash itself in the match.

An example would be something like this:

(?<=/).+?(?= |$)

This matches the shortest run of characters that is preceded by a slash and followed by either a space or the end of the string.

Here is a working example written in Java:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.LinkedList;

public class SO {
    public static void main(String[] args) {
        String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        Pattern pattern = Pattern.compile("(?<=/).+?(?= |$)");
        Matcher matcher = pattern.matcher(string);

        LinkedList<String> list = new LinkedList<String>();

        // Loop through and find all matches and store them into the List
        while(matcher.find()) { 
            list.add(matcher.group()); 
        }

        // Print out the contents of this List
        for(String match : list) { 
            System.out.println(match); 
        }
    }
}

Jay answered Oct 12 '22 19:10
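On Java 9 or later, the same lookbehind pattern can be collected without an explicit loop, because `Matcher.results()` returns a stream of every `MatchResult` that repeated `find()` calls would produce. This is a sketch assuming Java 9+; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LookbehindStream {
    // Same pattern as the lookbehind answer: text preceded by "/" and
    // followed by a space or the end of the input
    private static final Pattern TAG = Pattern.compile("(?<=/).+?(?= |$)");

    static List<String> extractTags(String line) {
        // Matcher.results() (Java 9+) streams every match that find() would return
        return TAG.matcher(line).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        System.out.println(extractTags(s));
    }
}
```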


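String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";

// "\\S+/" deletes each word together with its slash; replace(".", "") then drops the
// sentence-final "." tag (a literal replace, not a regex), and trim() removes the leftover space
string = string.replaceAll("\\S+/", "").replace(".", "").trim();

System.out.println(string);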

subodh answered Oct 12 '22 19:10
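The question mentions a text file with thousands of tagged lines. A minimal sketch of applying the `replaceAll` approach above to every line, keeping each line's tags as a `String[]` for later use, might look like this. It assumes one sentence per line, that `args[0]` is the file path, and that the file fits in memory for `Files.readAllLines()`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TagFile {
    // Drop every "word/" prefix, then the literal "." tag, then surrounding spaces
    static String stripWords(String line) {
        return line.replaceAll("\\S+/", "").replace(".", "").trim();
    }

    // One String[] of tags per non-empty input line, kept for later calculations
    static List<String[]> tagsPerLine(List<String> lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            String tags = stripWords(line);
            if (!tags.isEmpty()) {
                result.add(tags.split("\\s+"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the POS-tagged text file (assumed UTF-8, one sentence per line)
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String[] tags : tagsPerLine(lines)) {
            System.out.println(Arrays.toString(tags));
        }
    }
}
```

For files too large to hold in memory, `Files.newBufferedReader()` with a `readLine()` loop would process one line at a time instead.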