
How to retrieve web site favicons?

I am using Ruby on Rails v3.0.9 and I would like to retrieve the favicon.ico image of each web site for which I set a link.

That is, if I set the URL http://www.facebook.com/ in my application, I would like to retrieve Facebook's icon and insert it into my web pages. Of course, I would like to do the same for every other web site.

How can I retrieve favicons from web sites in an "automatic" way? By "automatic" I mean searching a web site for its favicon and getting the link to it. I don't think I can simply assume a fixed path, because not all web sites have a favicon named exactly 'favicon.ico', so I would like to detect the actual icon automatically.

P.S.: What I would like to build is something like what Facebook does when you add a link to your Facebook page: it recognizes the related web site's logo and attaches it to the link.

Backo asked Aug 25 '11 13:08


3 Answers

http://getfavicon.appspot.com/ works great for fetching favicons. Just give it the URL of the site and you'll get the favicon back:

http://g.etfv.co/http://www.google.com
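A minimal sketch of building such a request URL in Ruby, assuming the service keeps accepting the pattern shown above (endpoint followed by the full site URL). The constant FAVICON_SERVICE and the method favicon_service_url are hypothetical names introduced only for illustration.

```ruby
# Assumption: the endpoint is the one used in the example URL above.
FAVICON_SERVICE = 'http://g.etfv.co/'

# Hypothetical helper: returns the service URL that should serve
# the favicon of +site_url+.
def favicon_service_url(site_url)
  FAVICON_SERVICE + site_url
end
```

In a Rails view the result could be passed to image_tag, for example image_tag(favicon_service_url('http://www.google.com')).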

magnushjelm answered Oct 23 '22 14:10


I recently wrote a similar solution.

To find a favicon URL that may not be an .ico file and may not live in the site root, we have to parse the target site's HTML.

In Ruby on Rails, I used the nokogiri gem for HTML parsing. First we look for meta tags whose itemprop attribute contains the keyword image. This covers sites that use the https://schema.org/WebPage markup, which is a more modern approach than a plain link tag.

If we find one, we can use its content attribute as the favicon URL. We should still check that the URL really exists, just to be sure.

If we can't find such a meta tag, we search for standard link tags whose rel attribute contains the keyword icon. This is the standard W3C approach (https://www.w3.org/2005/10/howto-favicon).

Here is some code from my solution:

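require 'nokogiri'
require 'open-uri'

# Returns the absolute icon URL declared by +site+, or nil if none is found.
def site_icon_link(site)
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML(URI.open(site))

  # Prefer schema.org markup: <meta itemprop="image" content="...">
  node = doc.at_css('meta[itemprop*=image]')
  url = node && node['content']

  # Fall back to the standard <link rel="icon" href="..."> tag.
  if url.nil? || url.empty?
    node = doc.at_css('link[rel*=icon]')
    url = node && node['href']
  end

  return nil if url.nil? || url.empty?

  # Resolve relative paths such as "/favicon.ico" against the site URL;
  # absolute URLs are returned unchanged.
  URI.join(site, url).to_s
rescue URI::InvalidURIError
  nil
end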
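
The existence check mentioned above is not part of the posted code. One possible sketch, using only Ruby's standard library, sends a HEAD request and treats any 2xx response as success. The names icon_reachable? and DEFAULT_HEAD are hypothetical, and the 5-second timeouts are an arbitrary assumption.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Default probe (hypothetical name): sends a HEAD request, returns the response.
DEFAULT_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.request_uri)
  end
end

# Hypothetical helper: true when +url+ is an http(s) URL that answers
# with a 2xx status. The probe is injectable so it can be stubbed in tests.
def icon_reachable?(url, head: DEFAULT_HEAD)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  head.call(uri).is_a?(Net::HTTPSuccess)
rescue URI::InvalidURIError, SocketError, SystemCallError,
       Timeout::Error, OpenSSL::SSL::SSLError
  false
end
```

A caller could then keep the result of site_icon_link only when icon_reachable? returns true. Redirects are not followed in this sketch, so a 3xx response counts as unreachable.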

Rustery answered Oct 23 '22 14:10


Favicons can be found in two ways. First, there is the traditional 'hardcoded' location, 'http://example.com/favicon.ico'.

Second, an HTML page may declare its favicon in the <head> section with <link rel="icon"...> and a few other variants. (You may want to read the Wikipedia article about favicons.)

So your program can fetch the main page of the given website, parse it, and check whether it contains the appropriate <link> tags; then, as a fallback, it can try the 'hardcoded' favicon.ico name.
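
The fallback step can be sketched in Ruby as follows. This is only a minimal illustration: resolve_favicon is a hypothetical helper name, and declared_href stands for whatever href was extracted from the page's <link> tag (nil when none was found). Fetching and parsing the page are left out.

```ruby
require 'uri'

# Hypothetical helper: returns the absolute favicon URL for +site+.
# Uses the href declared in the page when present, otherwise falls
# back to the traditional /favicon.ico in the site root.
def resolve_favicon(site, declared_href)
  href = declared_href.to_s.strip
  href = '/favicon.ico' if href.empty?

  # URI.join resolves relative references against the site URL and
  # leaves absolute URLs untouched.
  URI.join(site, href).to_s
end
```

For example, resolve_favicon('http://example.com/blog/post', nil) yields the root-level favicon.ico URL, while a declared href such as 'img/icon.png' is resolved relative to the page.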

Arsen7 answered Oct 23 '22 13:10