 

How to remove a specific special character pattern from a string

Tags: java, string

I have a string named s:

String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";  

I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove the tags:

s.replaceAll("[<NOUN>,</NOUN>]","");

It does remove the tags, but it also removes the letters 'U' and 'O' from the rest of the string, which gives me the following output:

 Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel

Can anyone please tell me how to do this correctly?

Asked Aug 03 '12 at 08:08 by Roshanck




2 Answers

Try:

s = s.replaceAll("<NOUN>|</NOUN>", "");

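In a regex, the syntax [...] is a character class: it matches any single character listed inside the brackets, regardless of the order they appear in. In your example, every occurrence of "<", "N", "O", "U", ">", "/" and "," is therefore removed on its own. Instead, use the pipe (|), which means alternation, to match either the whole sequence "<NOUN>" or the whole sequence "</NOUN>".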
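To make the difference concrete, here is a small sketch that runs both patterns on a shortened version of the question's string. The class and method names (CharClassVsAlternation, withCharClass, withAlternation) are illustrative only; the two regexes are the ones from the question and from this answer.

```java
public class CharClassVsAlternation {

    // "[...]" is a character class: it matches ONE character from the set
    // {<, N, O, U, >, ',', /}, wherever that character appears.
    public static String withCharClass(String s) {
        return s.replaceAll("[<NOUN>,</NOUN>]", "");
    }

    // "|" is alternation: it matches the whole sequence "<NOUN>" or the
    // whole sequence "</NOUN>", and nothing else.
    public static String withAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> at the Blue Olympiad Hotel";
        System.out.println(withCharClass(s));   // Sam at the Blue lympiad Hotel
        System.out.println(withAlternation(s)); // Sam at the Blue Olympiad Hotel
    }
}
```

The character-class version strips the capital "O" from "Olympiad", exactly as in the question's output, while the alternation version only touches the tags.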

The following also works and is arguably more DRY and elegant. The "?" after the forward slash makes the slash optional, so a single pattern matches the tag both with and without it:

s = s.replaceAll("</?NOUN>", "");
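The String.replaceAll Javadoc describes str.replaceAll(regex, repl) as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so the regex is compiled on every call. If the cleanup runs many times (for example, over every sentence of a tagged corpus), a minimal sketch like the one below compiles "</?NOUN>" once and reuses it. The class, field and method names here are hypothetical.

```java
import java.util.regex.Pattern;

public class NounTagStripper {

    // "/?" makes the slash optional, so this single pattern matches both
    // <NOUN> and </NOUN>. It is compiled once and reused; Pattern objects
    // are immutable and safe to share between threads.
    private static final Pattern NOUN_TAG = Pattern.compile("</?NOUN>");

    public static String stripNounTags(String s) {
        return NOUN_TAG.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        // prints "Sam , a student of the University of oxford"
        System.out.println(stripNounTags(s));
    }
}
```

For a one-off call, the plain s.replaceAll("</?NOUN>", "") line is just as good; precompiling only matters when the same pattern is applied repeatedly.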
Answered Oct 05 '22 at 08:10 by Hubro



String.replaceAll() takes a regular expression as its first argument. The regexp:

"[<NOUN>,</NOUN>]"

defines, within the brackets, a set of characters, each of which is matched individually and removed. So you're asking to remove every occurrence of the characters <, >, /, N, O, U and the comma.
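Because a character class is just a set, repeating characters inside it changes nothing. The sketch below checks that the question's class behaves exactly like the minimal class "[<>/NOU,]" and shows that the comma really is removed as well. The class and method names are illustrative, not from the original post.

```java
public class CharClassIsASet {

    public static String strip(String regex, String s) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        // Repeated characters inside [...] add nothing, so both classes
        // describe the same seven-character set {<, >, /, N, O, U, ','}.
        String original = strip("[<NOUN>,</NOUN>]", s);
        String minimal = strip("[<>/NOU,]", s);
        System.out.println(original.equals(minimal)); // true
        // The comma is in the set too, so " , " collapses to two spaces:
        System.out.println(original); // "Sam  a student of the niversity of oxford"
    }
}
```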

Perhaps the simplest way to do what you want is:

s = s.replaceAll("<NOUN>","").replaceAll("</NOUN>","");

which is explicit in what it's removing. More complex regular expressions are obviously possible.
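A closely related variant, sketched here as an alternative rather than as part of this answer, uses String.replace(CharSequence, CharSequence). It replaces every occurrence of its target but treats the target literally, so no regex syntax is involved at all. That makes it hard to reintroduce the character-class mistake from the question. The helper name removeNounTags is hypothetical.

```java
public class LiteralTagRemoval {

    // String.replace(CharSequence, CharSequence) matches its target
    // literally, so nothing in "<NOUN>" or "</NOUN>" is treated as regex
    // syntax. Strings are immutable: the result must be assigned or returned.
    public static String removeNounTags(String s) {
        return s.replace("<NOUN>", "").replace("</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        s = removeNounTags(s);
        System.out.println(s); // Sam , a student of the University of oxford
    }
}
```

Despite its name, replace removes all occurrences, not just the first. The difference from replaceAll is only that its arguments are not regular expressions.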

Answered Oct 05 '22 at 09:10 by Brian Agnew
