Strip text from HTML document using Ruby

Tags: html, ruby, nokogiri, hpricot

There are lots of examples of how to strip HTML tags from a document using Ruby. Hpricot and Nokogiri both have inner_text methods that remove all the HTML for you easily and quickly.

What I am trying to do is the opposite: remove all the text from an HTML document, leaving just the tags and their attributes.

I considered looping through the document and setting inner_html to nil on each element, but that would have to be done in reverse: the first element (the root) has an inner_html containing the entire rest of the document. Ideally I'd start at the innermost elements and set inner_html to nil while moving up through the ancestors.

Does anyone know a neat little trick for doing this efficiently? I was thinking regexes might do it, but probably not as efficiently as an HTML tokenizer/parser would.
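
For reference, the regex idea could look roughly like the sketch below. The helper name strip_text_with_regex and the sample markup are made up for illustration. It only removes runs of characters that sit between a ">" and the next "<", so it is likely to misbehave on script or style blocks, comments, a ">" inside an attribute value, and text before the first tag or after the last one.

```ruby
# Hypothetical helper: drops every run of characters found between
# a closing ">" and the next opening "<". Tags and attributes are untouched.
def strip_text_with_regex(html)
  html.gsub(/>[^<]+</, "><")
end

sample = '<div class="box"><p id="intro">Hello <b>world</b></p></div>'
puts strip_text_with_regex(sample)
# prints <div class="box"><p id="intro"><b></b></p></div>
```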

asked Sep 30 '09 11:09

davidsmalley

1 Answer

This works too:

require "nokogiri"
doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").remove # detach every text node; tags and attributes stay
puts doc.to_html

answered Sep 23 '22 12:09

andre-r
