Remove beginning punctuation from a word

Tags: java, regex

I have seen a couple of threads here that kind of match what I am asking, but none of them are concrete. If I have a string like "New Delhi", I want my code to extract New Delhi, with the quotes stripped off. More generally, I want to strip any punctuation at the start and end of the string.

So far, this strips the punctuation at the end:

String replacedString = replaceable_string.replaceAll("\\p{Punct}*([a-z]+)\\p{Punct}*", "$1");

What am I doing wrong here? My output is "New Delhi, with the opening quote still there.

Knight asked Apr 04 '13 18:04



1 Answer

The following will remove a punctuation character from both the beginning and end of a String object if present:

String s = "\"New, Delhi\"";

// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}|\\p{Punct}$", ""));

In the regex, ^ anchors the match at the beginning of the text and $ anchors it at the end. So ^\p{Punct} matches a punctuation character only when it is the first character, and \p{Punct}$ matches one only when it is the last. I used | (OR) to match either expression, which gives ^\p{Punct}|\p{Punct}$.
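As a small sketch of what that pattern does and does not do, the snippet below wraps it in a helper and runs it on a few inputs. The class name SinglePunctDemo and the method stripOne are made up for illustration; only the regex comes from the answer. Note that it removes at most one character per side, so nested punctuation is left behind.

```java
final class SinglePunctDemo {
    // Hypothetical helper: applies the single-character pattern from the answer,
    // removing at most one punctuation character from each end of the input.
    static String stripOne(String s) {
        return s.replaceAll("^\\p{Punct}|\\p{Punct}$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripOne("\"New, Delhi\""));   // New, Delhi
        System.out.println(stripOne("\"[New, Delhi]\"")); // [New, Delhi]  (inner brackets remain)
        System.out.println(stripOne("New Delhi"));        // New Delhi     (nothing to strip)
    }
}
```

The comma inside the text is untouched in every case, because neither alternative can match in the middle of the string.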

In case you want to remove all punctuation characters from the beginning and the end of the String object, you can use the following:

String s = "\"[{New, Delhi}]\"";

// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""));

I simply added a + after each \p{Punct}. The + quantifier means "one or more", so each alternative now matches a whole run of punctuation characters at the beginning or end of the text.
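If the same stripping is applied to many strings, one possible way to package it is a precompiled Pattern, roughly as sketched below. The names PunctuationTrimmer, trim and trimUnicode are hypothetical. The second pattern is an assumption that goes beyond the answer: by default \p{Punct} covers only ASCII punctuation, so it will not remove curly quotes, whereas the Unicode category \p{P} does. On the other hand, \p{P} does not include symbol characters such as $ or +, which \p{Punct} matches.

```java
import java.util.regex.Pattern;

final class PunctuationTrimmer {
    // Same expression as in the answer, compiled once so it can be reused.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    // Assumed variant (not from the answer): Unicode general category P,
    // which also covers characters such as curly quotes.
    private static final Pattern EDGE_PUNCT_UNICODE =
            Pattern.compile("^\\p{P}+|\\p{P}+$");

    // Removes every leading and trailing ASCII punctuation character.
    static String trim(String s) {
        return EDGE_PUNCT.matcher(s).replaceAll("");
    }

    // Removes every leading and trailing Unicode punctuation character.
    static String trimUnicode(String s) {
        return EDGE_PUNCT_UNICODE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trim("\"[{New, Delhi}]\""));             // New, Delhi
        System.out.println(trim("New Delhi"));                      // New Delhi
        // \u201C and \u201D are the left and right curly double quotes.
        System.out.println(trimUnicode("\u201CNew Delhi\u201D"));   // New Delhi
    }
}
```

Which of the two patterns fits depends on the input: for plain ASCII text the answer's \p{Punct} version is enough.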

Hope this is what you were looking for :)

Ben Barkay answered Oct 14 '22 19:10