I have seen a couple of threads here that kind of match what I am asking, but none are concrete. If I have a string like "New Delhi" (quotes included), I want my code to extract New Delhi, with the quotes stripped off. In general, I want to strip any punctuation at the start and end of the string.
So far, this strips the punctuation at the end:
String replacedString = replaceable_string.replaceAll("\\p{Punct}*([a-z]+)\\p{Punct}*", "$1");
What am I doing wrong here? My output is "New Delhi with the beginning quote still there.
One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. translate() takes a translation table, which we can build with the str.maketrans() method.
To remove punctuation with pandas, we can use a column's str.replace method. We call str.replace with a regex that matches all punctuation characters and replace them with empty strings. str.replace returns a new Series, which we assign back to df['text'].
The standard solution for removing punctuation from a String in Java is the replaceAll() method, which replaces each substring that matches the given regular expression. You can use the POSIX character class \p{Punct} to build a regular expression that finds punctuation characters. By default, \p{Punct} covers ASCII punctuation only.
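As a minimal sketch of that approach, the snippet below removes every ASCII punctuation character, wherever it appears in the string. The class and method names are made up for illustration and are not part of any library.

```java
class RemoveAllPunctuation {
    // Hypothetical helper: deletes every character matched by \p{Punct},
    // not just the ones at the edges of the string.
    static String removeAll(String input) {
        return input.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAll("\"New, Delhi!\"")); // prints: New Delhi
    }
}
```

Note that this also drops the comma inside the text, which is why the question needs a pattern tied to the start and end of the string instead.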
The following will remove a single punctuation character from both the beginning and the end of a String object, if present:
String s = "\"New, Delhi\"";
// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}|\\p{Punct}$", ""));
The ^ part of the regex represents the beginning of the text, and $ represents the end of the text. So ^\p{Punct} will match a punctuation character that is the first character, and \p{Punct}$ will match a punctuation character that is the last character. I used | (OR) to match either the first expression or the second one, resulting in ^\p{Punct}|\p{Punct}$.
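To see what the anchors change, here is a small sketch that runs the pattern from the question next to the anchored one on the same input. The class and method names are hypothetical. The pattern from the question only matches around runs of lowercase letters ([a-z]+), so the opening quote, which is followed by an uppercase N, never becomes part of a match.

```java
class AnchorComparison {
    // Pattern from the question: punctuation is only consumed when it
    // directly surrounds a run of lowercase letters.
    static String unanchored(String s) {
        return s.replaceAll("\\p{Punct}*([a-z]+)\\p{Punct}*", "$1");
    }

    // Anchored pattern: one punctuation character at the very start
    // or at the very end of the string.
    static String anchored(String s) {
        return s.replaceAll("^\\p{Punct}|\\p{Punct}$", "");
    }

    public static void main(String[] args) {
        String s = "\"New Delhi\"";
        System.out.println(unanchored(s)); // prints: "New Delhi
        System.out.println(anchored(s));   // prints: New Delhi
    }
}
```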
In case you want to remove all punctuation characters from the beginning and the end of the String object, you can use the following:
String s = "\"[{New, Delhi}]\"";
// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""));
I simply added the + sign after each \p{Punct}. The + sign means "one or more", so it will match a whole run of punctuation characters if they are present at the beginning or end of the text.
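If this is needed in more than one place, the same regex could be wrapped in a small helper with a precompiled Pattern. This is only a sketch: the class name PunctuationTrimmer and the method trim are invented here, and it assumes that only ASCII punctuation (the default meaning of \p{Punct}) needs to be handled.

```java
import java.util.regex.Pattern;

class PunctuationTrimmer {
    // Same regex as above, compiled once so it can be reused across calls.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    // Hypothetical helper: strips runs of punctuation from both ends
    // and leaves punctuation inside the text untouched.
    static String trim(String input) {
        return EDGE_PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trim("\"[{New, Delhi}]\""));  // prints: New, Delhi
        System.out.println(trim("...Hello, world!?"));   // prints: Hello, world
        System.out.println(trim("New Delhi"));           // prints: New Delhi
        System.out.println(trim("!!!").isEmpty());       // prints: true
    }
}
```

A string made up entirely of punctuation comes back empty, which callers may want to check for.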
Hope this is what you were looking for :)