
Ruby on Rails regular expression to remove HTML tags and their contents from text

I want a regular expression in Ruby on Rails that removes all the HTML tags and their contents from a given text.

For example, if my input text is:

<span id="span_is"><br><br><u><i>Hi</i></u></span> 

then the output should be:

Hi

In short, I want a regular expression or a function that removes <> and whatever content is between the < and >.

Thanks & Regards,

Salil Gaikwad

Salil asked Mar 19 '10 07:03


3 Answers

'<span id="span_is"><br><br><u><i>Hi</i></u></span>'.gsub(/<\/?[^>]+>/, '')
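
As a minimal sketch, the same substitution can be wrapped in a small method, since the question also asks for a function. The names TAG_PATTERN and strip_html_tags are hypothetical and are not part of Ruby or Rails:

```ruby
# Hypothetical helper (not a Ruby or Rails API): removes anything that looks
# like an opening or closing tag and keeps the text between the tags.
TAG_PATTERN = /<\/?[^>]+>/

def strip_html_tags(text)
  text.gsub(TAG_PATTERN, '')
end

puts strip_html_tags('<span id="span_is"><br><br><u><i>Hi</i></u></span>')
# prints "Hi"
```

This only covers simple, well-formed markup like the sample input.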
Jimmy answered Nov 16 '22 20:11


Your string is quite simple, so that solution might work. However, you shouldn't reinvent the wheel: Rails already includes some powerful sanitization helpers.

string = '<span id="span_is"><br><br><u><i>Hi</i></u></span>'
strip_tags(string)
Simone Carletti answered Nov 16 '22 20:11


Don't do this. Please.

While your sample input is fairly trivial, you mention that you want to use it in a much broader scope.

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

For Ruby, you can try using http://hpricot.com/ to parse HTML instead.
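
To make the concern concrete, here is a short sketch of inputs where the tag-stripping pattern from the first answer gives surprising results. The method name naive_strip is hypothetical and exists only for this illustration:

```ruby
# Hypothetical helper for illustration only: the regex from the first answer.
def naive_strip(text)
  text.gsub(/<\/?[^>]+>/, '')
end

# A ">" inside an attribute value ends the match too early.
puts naive_strip('<a title="a > b">link</a>')      # prints ' b">link'

# The tags are removed, but the script body is kept as text.
puts naive_strip('<script>alert(1)</script>Hi')    # prints 'alert(1)Hi'

# A plain comparison is mistaken for a tag.
puts naive_strip('1 < 2 and 3 > 2')                # prints '1  2'
```

An HTML parser is designed to handle cases like these, which a single pattern cannot distinguish.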

Commander Keen answered Nov 16 '22 20:11