I have a string named s:
String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";
I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove the tags:
s.replaceAll("[<NOUN>,</NOUN>]","");
It does remove the tags, but it also removes the 'U' and 'O' characters from the rest of the string, which gives me the following output:
Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel
Can anyone please tell me how to do this correctly?
Try:
s.replaceAll("<NOUN>|</NOUN>", "");
In regex, the syntax [...] is a character class: it matches any single character listed inside the brackets, regardless of the order they appear in. Therefore, in your example, every occurrence of "<", "N", "O" etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".
The following should also work (and could be considered more DRY and elegant) since it will match the tag both with and without the forward slash:
s.replaceAll("</?NOUN>", "");
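To see the difference between the patterns side by side, here is a minimal sketch using a shortened version of the string from the question. The class name TagRegexDemo and its method names are made up for illustration. Note that Java strings are immutable, so the result of replaceAll() has to be assigned or returned; calling it on its own leaves s unchanged.

```java
public class TagRegexDemo {
    // Character class: every '<', 'N', 'O', 'U', '>', ',' and '/' is removed individually.
    static String withCharacterClass(String s) {
        return s.replaceAll("[<NOUN>,</NOUN>]", "");
    }

    // Alternation: only the literal tags <NOUN> and </NOUN> are removed.
    static String withAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    // Optional slash: a single pattern covers both the opening and the closing tag.
    static String withOptionalSlash(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        System.out.println(withCharacterClass(s)); // also loses the 'U' and the comma
        System.out.println(withAlternation(s));
        System.out.println(withOptionalSlash(s));
    }
}
```

The last two methods should produce the same result; the optional-slash form is just shorter to maintain if the tag name changes.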
String.replaceAll() takes a regular expression as its first argument. The regexp:
"[<NOUN>,</NOUN>]"
defines within the brackets the set of characters to be matched and thus removed. So you're asking to remove the characters <, >, /, N, O, U and the comma.
Perhaps the simplest way to do what you want is:
s.replaceAll("<NOUN>","").replaceAll("</NOUN>","");
which is explicit in what it's removing. More complex regular expressions are obviously possible.
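Since neither tag needs any regex features, one possible variant (a suggestion, not something taken from the answers above) is String.replace(), which treats its arguments as literal text and still replaces every occurrence. The class and method names below are hypothetical.

```java
public class LiteralTagRemoval {
    // String.replace(CharSequence, CharSequence) does a literal, non-regex
    // replacement of all occurrences, so no characters need escaping.
    static String stripNounTags(String s) {
        return s.replace("<NOUN>", "").replace("</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> won at the Blue Olympiad Hotel";
        // Strings are immutable, so keep the returned value.
        s = stripNounTags(s);
        System.out.println(s);
    }
}
```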