
How can I remove all leading and trailing punctuation?

I want to remove all the leading and trailing punctuation in a string. How can I do this?

Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.

  1. ., @, _, &, /, - are allowed if surrounded by letters or digits
  2. ' is allowed if preceded by a letter or digit

I tried

Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
    term=term.replaceAll("(^\\p{Punct})", "");

but it didn't work!!

user1618820 asked Sep 20 '12 05:09


2 Answers

OK. So basically you want to find a pattern in your string and act if the pattern is matched.

Doing this the naive way would be tedious. A naive solution could involve something like

while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
  myString = myString.substring(1);

If you wanted to do a slightly more complex task, it could even be impossible to do it the way I mentioned.

That's why we use regular expressions. A regular expression is a "language" with which you can define a pattern; the computer can then tell whether a string matches that pattern. To learn about regular expressions, just search for the term on Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial

As for your problem, you could try this:

myString = myString.replaceFirst("^[^a-zA-Z]+", "");

The meaning of the regex:

  • The first ^ means that what comes next in the pattern has to be at the start of the string.

  • The [] define a set of characters. Here the second ^, placed right after the [, negates the set, so it matches anything that is NOT a letter (a-zA-Z).

  • The + sign means that the thing before it must occur one or more times, so the whole run of non-letters is matched at once.
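To make the three pieces concrete, here is a small sketch. The class name, method names, and sample strings are made up for illustration; only the pattern itself comes from the answer.

```java
public class LeadingDemo {
    // Strips a leading run of non-letters, using the pattern explained above.
    static String stripLeading(String s) {
        return s.replaceFirst("^[^a-zA-Z]+", "");
    }

    // Same pattern without the +: removes at most one leading non-letter.
    static String stripOneLeading(String s) {
        return s.replaceFirst("^[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeading("...hello"));    // hello
        System.out.println(stripOneLeading("...hello")); // ..hello
        // The ^ anchor means a comma in the middle is left alone.
        System.out.println(stripLeading("he,llo"));      // he,llo
    }
}
```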

You can use a similar regex to remove trailing chars.

myString = myString.replaceAll("[^a-zA-Z]+$", "");

The $ means "at the end of the string".
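Put together, the two replacements might look like the sketch below. The helper names are hypothetical. Note that [^a-zA-Z] also strips leading and trailing digits, which the question wants to keep, so the second helper swaps in \p{Punct} as one possible alternative; that variant is an assumption about what the asker needs, not part of this answer.

```java
public class TrimDemo {
    // Applies the two replacements from this answer in sequence.
    static String trimNonLetters(String s) {
        return s.replaceFirst("^[^a-zA-Z]+", "")
                .replaceFirst("[^a-zA-Z]+$", "");
    }

    // Hypothetical variant: strips only ASCII punctuation, so digits survive.
    static String trimPunct(String s) {
        return s.replaceFirst("^\\p{Punct}+", "")
                .replaceFirst("\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimNonLetters("--foo@bar.com!!")); // foo@bar.com
        System.out.println(trimNonLetters("'42nd'"));          // nd
        System.out.println(trimPunct("'42nd'"));               // 42nd
    }
}
```

Punctuation between letters is untouched in both helpers because each pattern is anchored to one end of the string.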

K.L. answered Sep 21 '22 17:09


You could use a regular expression:

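import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern PATTERN =
    Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

public static String trimPunctuation(String s) {
  Matcher m = PATTERN.matcher(s);
  // find() returns false if s contains an embedded line break, because .
  // does not match line terminators by default; return s unchanged then.
  return m.find() ? m.group(1) : s;
}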

The boundary matchers ^ and $ ensure the whole input is matched.

A dot . matches any single character.

A star * means "match the preceding thing zero or more times".

The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).

The ? in (.*?) means you want the match to be non-greedy, otherwise the trailing punctuation would be included in the group.
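The effect of the ? is easiest to see side by side. The following is a minimal sketch; the class name, the GREEDY pattern, and the sample input are made up for comparison and are not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The answer's pattern, with a non-greedy group.
    static final Pattern LAZY = Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");
    // Same pattern with a greedy group, for comparison only.
    static final Pattern GREEDY = Pattern.compile("^\\p{Punct}*(.*)\\p{Punct}*$");

    // Returns group 1 of the pattern, or the input unchanged if nothing matches.
    static String capture(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        String input = "\"don't stop!\"";
        System.out.println(capture(LAZY, input));   // don't stop
        System.out.println(capture(GREEDY, input)); // don't stop!"
    }
}
```

In the greedy case the group swallows everything up to the end of the input, so the trailing \p{Punct}* is left with nothing to match.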

dnault answered Sep 17 '22 17:09