How can I remove punctuation from input text in Java?

I am trying to read a sentence from user input in Java, and I need to make it lowercase and remove all punctuation. Here is my code:

    String[] words = instring.split("\\s+");
    for (int i = 0; i < words.length; i++) {
        words[i] = words[i].toLowerCase();
    }
    String[] wordsout = new String[50];
    Arrays.fill(wordsout, "");
    int e = 0;
    for (int i = 0; i < words.length; i++) {
        if (words[i] != "") {
            wordsout[e] = words[e];
            wordsout[e] = wordsout[e].replaceAll(" ", "");
            e++;
        }
    }
    return wordsout;

I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

TheDoctor asked Sep 16 '13 14:09





1 Answer

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:

    String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the elements.
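As a rough illustration, the one-liner above can be wrapped in a small runnable class. This is only a sketch: the class name `PunctuationDemo` and the helper `toWords` are made up for the example, and the `trim()` call is an addition that is not in the original line. It is there because `split` would otherwise return an empty first element when the input starts with a space.

```java
import java.util.Arrays;

public class PunctuationDemo {

    // Hypothetical helper (not from the answer); wraps the one-liner shown above.
    public static String[] toWords(String instring) {
        return instring
                .replaceAll("[^a-zA-Z ]", "") // drop everything except ASCII letters and spaces
                .toLowerCase()                // fold to lowercase
                .trim()                       // assumption: added so leading spaces do not yield an empty first element
                .split("\\s+");               // split on runs of whitespace
    }

    public static void main(String[] args) {
        String[] words = toWords("Hello, World! It's 9 o'clock.");
        System.out.println(Arrays.toString(words)); // prints [hello, world, its, oclock]
    }
}
```

Note that the character class keeps only ASCII letters and the space character, so digits, accented letters, tabs, and newlines are removed as well, and words joined only by punctuation (such as `a,b`) are merged into one.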

Bohemian answered Oct 13 '22 01:10
