
Need to remove newlines from object/embed tags only using Nokogiri

I need to remove newlines from any object/embed tags. I am currently attempting to do so using Nokogiri by doing the following:

s = "<div>
<object height='450' width='600'>
<param name='allowfullscreen' value='true'>
<param name='allowscriptaccess' value='always'>
<param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>"
doc = Nokogiri::HTML(s)
doc.css('object').each { |o| o.inner_html.gsub!(/\n/, ""); puts o.inner_html }

Please note that the example is for object tags only.

Printing o.inner_html at the end of the block shows that no replacement has occurred, even though the gsub text appears correct. Also, once that part is resolved, I need to make sure that the actual object node in the doc object is saved with the updated values.

Any help is most appreciated. Thanks.

modulaaron asked Dec 15 '25 10:12


1 Answer

Got it!

require 'nokogiri'
s = <<ENDHTML
<div>
<object height='450' width='600'>
  <param name='allowfullscreen' value='true'><param name='allowscriptaccess' value='always'>
  <param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>
ENDHTML

doc = Nokogiri::HTML(s)
doc.css('object,embed').each{ |e| e.inner_html = e.inner_html.gsub(/\n/,'') }
puts doc.serialize( save_with: 0 )

#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body><div>
#=> <object height="450" width="600"><param name="allowfullscreen" value="true"><param name="allowscriptaccess" value="always"><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1"><embed src="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" height="450" width="600"></embed></object>
#=> </div></body></html>
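  1. Removing all text nodes does not fully clean the document; you need to rewrite each element's inner_html.
  2. Calling inner_html.gsub! is not the same as inner_html = inner_html.gsub. The inner_html getter builds a new string on every call, so gsub! changes only that temporary string and leaves the node untouched; assigning the result back through inner_html= is what updates the node.
  3. As shown, you need to call serialize with the hash :save_with => 0 passed in to prevent Nokogiri from generating newlines between tags in the output.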
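
To illustrate the second point without depending on Nokogiri, here is a minimal sketch using a made-up FakeNode class. It is not Nokogiri's API; it only assumes, as a real node appears to do, that the inner_html getter builds a fresh String each time it is called.

```ruby
# Hypothetical stand-in for a parsed node; NOT the real Nokogiri class.
class FakeNode
  def initialize(children)
    @children = children
  end

  # Getter: serializes the children into a brand-new String on every call.
  def inner_html
    @children.join
  end

  # Setter: replaces the node's content with the given markup.
  def inner_html=(markup)
    @children = [markup]
  end
end

node = FakeNode.new(["<param>\n", "<embed>\n"])

# gsub! mutates only the temporary String returned by the getter,
# so the node still holds its newlines afterwards.
node.inner_html.gsub!(/\n/, "")
after_bang = node.inner_html

# Assigning the gsub result back goes through the setter and updates the node.
node.inner_html = node.inner_html.gsub(/\n/, "")
after_assign = node.inner_html

puts after_bang.inspect   #=> "<param>\n<embed>\n"
puts after_assign.inspect #=> "<param><embed>"
```

The same reasoning would explain why the block in the question printed unchanged markup: each o.inner_html call there produced a separate string.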
Phrogz answered Dec 17 '25 01:12


