Extract IMG tags in Ruby

Is it possible to extract the IMG tag (or just the src attribute of an IMG tag) from a block of HTML in Ruby?

For example, if I have a block of HTML such as:

<p>Lorem ipsum dolor sit amet, labore et dolore magna aliqua.<img src="example.jpg" alt="" /> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>

Could I extract just the IMG tag, or the src of that IMG tag, with a regex or some other method?

Thanks in advance for any suggestions!

ericalli asked Apr 28 '11 03:04


3 Answers

Using Nokogiri:

require 'nokogiri' # gem install nokogiri
doc = Nokogiri::HTML( my_html_string )
img_srcs = doc.css('img').map{ |i| i['src'] } # Array of strings
Phrogz answered Nov 15 '22 14:11


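You can use this regular expression, which returns the src of the first IMG tag it finds (assuming the attribute value is double-quoted):

html_str[/img.*?src="(.*?)"/i,1]

If you want a more advanced HTML parser, I recommend Nokogiri.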
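
The one-liner above returns only the first match. If the block of HTML may contain several images, a rough sketch along the same lines could use String#scan to collect every src. The names IMG_SRC and img_srcs are made up for illustration, and the pattern assumes the attribute value is quoted (single or double) and that the tag contains no stray ">" before the src:

```ruby
# Hypothetical helper (not from the answer above): collect every img src
# from an HTML string using only the standard library.
# Assumes src values are quoted and no '>' appears inside the tag before src.
IMG_SRC = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/i

def img_srcs(html)
  # With capture groups, scan yields one [quote, value] pair per match;
  # keep only the value.
  html.scan(IMG_SRC).map { |_quote, src| src }
end

html = <<~HTML
  <p>Lorem ipsum <img src="example.jpg" alt="" /> dolor
  <IMG alt="x" src='b.png'></p>
HTML

p img_srcs(html) # prints ["example.jpg", "b.png"]
```

This is still a regex over markup, so unquoted attributes or unusual formatting will slip through; for anything beyond simple, known input a real parser such as Nokogiri is the safer choice.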

Jhony Fung answered Nov 15 '22 14:11


Use Nokogiri to parse the HTML and search for img tags to extract the src attribute from.

JohnD answered Nov 15 '22 13:11