Get words around a position in a string

Tags: java, string

I would like to get the words around a certain position in a string, for example the two words before it and the two words after it.

For example consider the string:

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";

for (int index = str.indexOf(find); index >= 0; index = str.indexOf(find, index + 1))
{
    System.out.println(index);
}

This prints the index of each occurrence of "I". But I want to get a substring containing the words around these positions.

I want to be able to print out "John and I like to" and "and hiking I have two".

Multi-word search strings should work too: searching for "John and" should return "name is John and I like".

Is there any neat, smart way of doing this?

user1506145 asked May 05 '13 18:05



3 Answers

Single word:

You can achieve that using String's split() method. This solution is O(n).

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and "+
                         "hiking I have two sisters and one brother.";
    String find = "I";

    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so early matches are not skipped
        if (sp[i].equals(find)) {
            // the index checks below avoid an ArrayIndexOutOfBoundsException
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}

Output:

John and I like to
and hiking I have two
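
The same idea can be wrapped in a reusable method with a configurable window. This is only a sketch: the class name WordContext, the method name surrounding and the window parameter are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordContext {
    // Hypothetical helper: returns every occurrence of `word` together with
    // up to `window` words on each side.
    static List<String> surrounding(String text, String word, int window) {
        String[] sp = text.trim().split(" +"); // " +" tolerates repeated spaces
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sp.length; i++) {
            if (!sp[i].equals(word)) {
                continue;
            }
            // Clamp the window so it never leaves the array.
            int from = Math.max(0, i - window);
            int to = Math.min(sp.length, i + window + 1); // exclusive end
            result.add(String.join(" ", Arrays.copyOfRange(sp, from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and "
                   + "hiking I have two sisters and one brother.";
        for (String s : surrounding(str, "I", 2)) {
            System.out.println(s);
        }
    }
}
```

Clamping with Math.max and Math.min replaces the four separate index checks, so a match in the first or last word simply yields a shorter context.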

Multi-word:

Regex is a great and clean solution for the case when find is multi-word. By its nature, though, it misses the cases where the surrounding words also match find, because successive regex matches do not overlap (see the example below).

The algorithm below handles all cases, including overlapping ones. Bear in mind that, due to the nature of the problem, this solution is O(n*m) in the worst case (with n being str's length and m being find's length).

public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";

    String[] sp = str.split(" +"); // "+" for multiple spaces

    String[] spMulti = find.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so early matches are not skipped
        int j = 0;
        while (j < spMulti.length && i+j < sp.length
                                  && sp[i+j].equals(spMulti[j])) {
            j++;
        }
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}

Output:

name is John and John and
John and John and I like
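
For comparison, a regex can also report overlapping contexts if the whole pattern sits inside a zero-width lookahead, because the matcher then consumes nothing and retries from the next character. The sketch below is one possible way to do that, not the approach used above. The class and method names are invented for illustration, and it assumes the words of the phrase are separated by single spaces in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingContext {
    // Hypothetical helper: two words before and two words after each
    // occurrence of `phrase`, including occurrences whose contexts overlap.
    static List<String> find(String text, String phrase) {
        // (?<!\S) only allows a match to begin at the start of a word.
        // (?=( ... )) captures the context without consuming any input.
        Pattern p = Pattern.compile(
            "(?<!\\S)(?=(\\S+\\s+\\S+\\s+" + Pattern.quote(phrase)
            + "\\s+\\S+\\s+\\S+))");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        for (String s : find(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

Unlike the loop above, this sketch requires two full words on each side, so occurrences near the start or end of the text are not reported.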
acdcjunior answered Oct 14 '22 13:10


Here is another way I found, using regex:

        // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";

        String find = "I";

        // group(1): the two words before find; group(2): the two words after it.
        // Pattern.quote() keeps regex metacharacters in find from being interpreted.
        Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+)");
        Matcher matcher = pattern.matcher(str);

        while (matcher.find())
        {
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
        }

Output:

John and
like to
and hiking
have two
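
A pattern that demands exactly two words on each side finds nothing when the search word sits in the first or last two words of the text. One possible variation, sketched here under assumptions of my own (the class and method names are hypothetical), makes the surrounding words optional with a {0,2} quantifier and returns the whole match with whitespace runs collapsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeTolerantContext {
    // Hypothetical helper: up to two words before and after each occurrence
    // of `word`, also when the occurrence is near either end of the text.
    static List<String> find(String text, String word) {
        // (?<!\S) and (?!\S) make sure `word` is matched as a whole word.
        Pattern p = Pattern.compile(
            "(?:\\S+\\s+){0,2}(?<!\\S)" + Pattern.quote(word)
            + "(?!\\S)(?:\\s+\\S+){0,2}");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // Collapse runs of whitespace so the context reads cleanly.
            result.add(m.group().replaceAll("\\s+", " "));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : find("I like to go and I", "I")) {
            System.out.println(s);
        }
    }
}
```

Because successive matches do not overlap, two occurrences that are close together still share words: the later match only gets whatever the earlier one left unconsumed on its left side.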
Vishy answered Oct 14 '22 13:10


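Use String.split() to split the text into words. Then search for "I" and concatenate the words back together:

String[] parts = str.split(" ");

for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("I")) {
        // clamp the window so i-2 and i+2 never fall outside the array
        int from = Math.max(0, i - 2);
        int to = Math.min(parts.length, i + 3); // exclusive end
        String out = String.join(" ", java.util.Arrays.copyOfRange(parts, from, to));
        System.out.println(out);
    }
}

Of course, you need to make sure that i-2 and i+2 are valid indexes, which is what the Math.max and Math.min calls do. If you concatenate the words by hand instead and have a lot of data, a StringBuilder would be handy performance-wise.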

bluevoid answered Oct 14 '22 13:10