
Find and replace HTML tags

I have the following HTML:

<html>
<body>
<h1>Foo</h1>
<p>The quick brown fox.</p>
<h1>Bar</h1>
<p>Jumps over the lazy dog.</p>
</body>
</html>

I'd like to change it into the following HTML:

<html>
<body>
<p class="title">Foo</p>
<p>The quick brown fox.</p>
<p class="title">Bar</p>
<p>Jumps over the lazy dog.</p>
</body>
</html>

How can I find and replace certain HTML tags? I can use the Nokogiri gem.

asked Mar 04 '09 by Javier




2 Answers

Try this: select every h1 element with XPath, rename it to p, and give it class="title":

require 'nokogiri'

html_text = "<html><body><h1>Foo</h1><p>The quick brown fox.</p><h1>Bar</h1><p>Jumps over the lazy dog.</p></body></html>"

doc = Nokogiri::HTML(html_text)
# Rename each <h1> to <p> in place and mark it as a title
doc.xpath("//h1").each { |node| node.name = "p"; node.set_attribute("class", "title") }
puts doc.to_html
answered Oct 21 '22 by SimonV


This seems to work:

require 'nokogiri'

markup = Nokogiri::HTML.parse(<<-somehtml)
<html>
<body>
<h1>Foo</h1>
<p>The quick brown fox.</p>
<h1>Bar</h1>
<p>Jumps over the lazy dog.</p>
</body>
</html>
somehtml

markup.css('h1').each do |el|
  el.name = 'p'
  el.set_attribute('class','title')
end

puts markup.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p class="title">Foo</p>
# >> <p>The quick brown fox.</p>
# >> <p class="title">Bar</p>
# >> <p>Jumps over the lazy dog.</p>
# >> </body></html>
answered Oct 21 '22 by dylanfm