I am relatively new to Java and need some help extracting multiple substrings from a string. Here is an example string:
String = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/."
Desired result: WRB MD PRP VB DT NN IN NNS POS JJ NNS
I have a text file with possibly thousands of similar POS-tagged lines. I need to extract the POS tags from each line and then do some calculations based on them.
I have tried using a tokenizer but didn't get the result I wanted. I also tried using split() and saving the results to arrays, because I need to store the tags and use them later, but that still didn't work.
Lastly, I tried using Pattern and Matcher, but I am having problems with the regex, as it returns each tag together with the forward slash:
Regex: [\/](.*?)\s\b
Result: /WRB /MD ....
If there is a better way to do this, please let me know. Otherwise, can anyone help me figure out what's wrong with my regex?
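One way to keep the Pattern/Matcher approach is to read the capture group, `matcher.group(1)`, instead of `matcher.group()`, which returns the whole match including the slash. Switching groups alone is probably not enough for `[\/](.*?)\s\b`, though. The `\b` needs a word character right after the space, so the matches around punctuation tokens such as `'/POS` and `?/.` go wrong, and the last token has no trailing whitespace to match at all. The sketch below therefore swaps in the simpler pattern `/(\S+)`, assuming each token contains exactly one slash. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupTags {
    // A slash followed by one or more non-whitespace characters.
    // Only the part after the slash is inside the capture group.
    private static final Pattern TAG = Pattern.compile("/(\\S+)");

    public static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(line);
        while (matcher.find()) {
            // group(1) is the captured tag without the slash;
            // group() would return the whole match, slash included
            tags.add(matcher.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        System.out.println(String.join(" ", extractTags(line)));
    }
}
```

This keeps the tag of the final `?/.` token, `.`; filter it out if it is not wanted.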
To find the position of a substring within a string, use the indexOf() method. It returns the index of the first occurrence, or -1 if the substring is not present. For example, given String str = "testdemo";, str.indexOf("demo") returns 4.
The Java String class's trim() method removes leading and trailing whitespace. More precisely, it strips every leading and trailing character whose code point is less than or equal to '\u0020' (the space character), which also covers tabs and newlines, and returns the trimmed string.
You can extract a substring from a String using the substring() method of the String class. Pass it the start index (inclusive) and the end index (exclusive) of the substring you want.
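Applied to the question, these methods can pull the tag out of a single `word/TAG` token. A minimal sketch, assuming the tag follows the first slash in each token; `TokenTag` and `tagOf` are hypothetical names. Splitting a line on whitespace and calling `tagOf` on each piece would then give the full list of tags.

```java
public class TokenTag {
    // Returns the POS tag of one "word/TAG" token, or "" if it has no slash.
    // Assumes the tag follows the first '/' in the token.
    public static String tagOf(String token) {
        String t = token.trim();          // strip leading/trailing whitespace
        int slash = t.indexOf('/');       // index of the first '/', or -1
        if (slash < 0) {
            return "";
        }
        // substring(begin, end): begin is inclusive, end is exclusive
        return t.substring(slash + 1, t.length());
    }

    public static void main(String[] args) {
        System.out.println(tagOf("  How/WRB "));
        System.out.println(tagOf("'/POS"));
        System.out.println(tagOf("?/."));
    }
}
```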
This should work:
```java
String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
System.out.println(string.replaceAll("[^/]+/([^ ]+ ?)", "$1"));
```
It prints: WRB MD PRP VB DT NN IN NNS POS JJ NNS .
Note that the tag of the final ?/. token, ".", is kept.
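Since the question mentions storing the tags for later use, the `replaceAll` result can be split into an array. A minimal sketch, assuming the tags are separated by whitespace; the class name `ReplaceAllTags` is hypothetical. The stream filter drops the sentence-final `.` tag so the output matches the desired result in the question.

```java
import java.util.Arrays;

public class ReplaceAllTags {
    // Applies the replaceAll from the answer above, then splits the
    // remaining tags into an array.
    public static String[] tags(String line) {
        String onlyTags = line.replaceAll("[^/]+/([^ ]+ ?)", "$1");
        // trim() drops any trailing space; "\\s+" also tolerates repeated spaces
        return onlyTags.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        String[] all = tags(line);
        // Drop the "." tag of the final "?/." token
        String[] withoutPeriod = Arrays.stream(all)
                .filter(tag -> !tag.equals("."))
                .toArray(String[]::new);
        System.out.println(Arrays.toString(withoutPeriod));
    }
}
```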
If you still want to use pattern matching, look at positive lookbehinds. A lookbehind lets you require that a match be preceded by a slash without including the slash in the match itself.
An example would be something like this:
(?<=/).+?(?= |$)
This matches the shortest run of characters that is preceded by a slash and followed by either a space or the end of the string.
Here is a working example written in Java:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SO {
    public static void main(String[] args) {
        String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        Pattern pattern = Pattern.compile("(?<=/).+?(?= |$)");
        Matcher matcher = pattern.matcher(string);
        List<String> list = new ArrayList<>();

        // Loop through, find all matches and store them in the list
        while (matcher.find()) {
            list.add(matcher.group());
        }

        // Print out the contents of the list
        for (String match : list) {
            System.out.println(match);
        }
    }
}
```
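Alternatively, delete every word together with its slash, then remove the "." tag:
```java
String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
// "\\S+/" deletes each word and its slash (e.g. "How/"), leaving only the tags;
// replace(".", "") removes the literal "." tag; trim() drops the leftover trailing space
string = string.replaceAll("\\S+/", "").replace(".", "").trim();
System.out.println(string);
```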
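The question also mentions a text file with thousands of tagged lines and some later calculation, which it does not specify. As a hypothetical example of such a calculation, this sketch counts how often each tag occurs across all lines. It uses `lastIndexOf` and `substring` per token rather than a regex, and assumes whitespace-separated `word/TAG` tokens; the file path comes from the first command-line argument, and the class name is illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TagCounter {
    // Counts occurrences of each POS tag across all lines.
    // Assumes whitespace-separated "word/TAG" tokens; the tag is whatever
    // follows the last '/' in a token.
    public static Map<String, Integer> countTags(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by tag name
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                int slash = token.lastIndexOf('/');
                if (slash < 0 || slash == token.length() - 1) {
                    continue; // empty token, or no tag after the slash
                }
                counts.merge(token.substring(slash + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the POS-tagged file (hypothetical input)
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : List.of("How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.");
        countTags(lines).forEach((tag, n) -> System.out.println(tag + "\t" + n));
    }
}
```

`Files.readAllLines` loads the whole file into memory, which is fine for a few thousand lines; for much larger files, streaming with `Files.lines` would be an alternative.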