I want to scrape the user pages of SO to give the users of my toolbar updated information on their questions/answers/etc.
This means I need to do this in the background, parse the pages, extract the content, compare it with the last run, and then present the results either on the toolbar, on the status bar, or alternatively in a pop-up window of some kind. And all of this has to happen while the user goes about their business, without being interrupted or even needing to be on SO.
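The compare-with-the-last-run step can be sketched as a plain diff of two stat objects. This is only an illustration: `diffStats` is a hypothetical helper, and the field names (`questions`, `answers`) are assumptions about how the scraped numbers might be stored, not an actual SO page format.

```javascript
// Compare the previous run's stats with the current ones and report
// only the fields that changed. Field names are hypothetical examples.
function diffStats(lastRun, current) {
  var changes = {};
  for (var key in current) {
    if (current.hasOwnProperty(key) && current[key] !== lastRun[key]) {
      changes[key] = { from: lastRun[key], to: current[key] };
    }
  }
  return changes;
}

// Example: only "questions" changed, so only it appears in the result.
var changes = diffStats({ questions: 12, answers: 40 },
                        { questions: 13, answers: 40 });
// changes → { questions: { from: 12, to: 13 } }
```

The returned object is empty when nothing changed, which makes it easy to decide whether the status bar or pop-up needs updating at all.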
I've searched quite thoroughly both on Google and on the Mozilla Wiki for some kind of hint. I've even gone to the extent of downloading a few other extensions that I think do the same. Unfortunately I've not had the time to go through all of them, and the ones I've looked at all use data APIs (services, web services, XML), not HTML scraping.
Old question text
I'm looking for a nice place to learn how I can load a page inside a function called by the infamous setTimeout() to perform screen scraping in the background.
My idea is to present the results of such scraping in a status bar extension, in case anything changed from the last run.
Is there a hidden overlay or some other subterfuge?
In the case of XUL/Firefox, what you need is the nsIIOService interface, which you can get like this:
var mIOS = Components.classes["@mozilla.org/network/io-service;1"]
                     .getService(Components.interfaces.nsIIOService);
Then you need to create a channel, and open an asynchronous link:
var channel = mIOS.newChannel(urlToOpen, 0, null);
channel.asyncOpen(new StreamListener(), channel);
The key here is the StreamListener object:
var StreamListener = function() {
  return {
    QueryInterface: function(aIID) {
      if (aIID.equals(Components.interfaces.nsIStreamListener) ||
          aIID.equals(Components.interfaces.nsISupportsWeakReference) ||
          aIID.equals(Components.interfaces.nsISupports))
        return this;
      throw Components.results.NS_NOINTERFACE;
    },

    onStartRequest: function(aRequest, aContext)
    { return 0; },

    onStopRequest: function(aRequest, aContext, aStatusCode)
    { return 0; },

    onDataAvailable: function(aRequest, aContext, aStream, aOffset, aCount)
    { return 0; }
  };
};
You have to fill in the details in the onStartRequest, onStopRequest, and onDataAvailable functions, but that should be enough to get you going. You can have a look at how I used this interface in my Firefox extension (it is called IdentFavIcon, and it can be found on the Mozilla add-ons site).
The part which I'm uncertain about is how you can trigger this page request from time to time; setTimeout() should probably work, though.
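The periodic trigger can be sketched as a self-rescheduling setTimeout loop. `makePoller` is a hypothetical helper, and the timer argument is only there as a seam so the loop can be driven without real delays; by default it falls back to the ordinary setTimeout.

```javascript
// Minimal polling sketch: call fetchFn (e.g. the asyncOpen code above),
// then reschedule the next run after intervalMs milliseconds.
// timerFn is an optional stand-in for setTimeout (useful for testing).
function makePoller(fetchFn, intervalMs, timerFn) {
  var timer = timerFn || function (fn, ms) { setTimeout(fn, ms); };
  function tick() {
    fetchFn();                // kick off one scrape
    timer(tick, intervalMs);  // schedule the next run
  }
  return tick;                // call the returned function once to start
}

// e.g. makePoller(scrapeUserPage, 15 * 60 * 1000)();  // every 15 minutes
```

A self-rescheduling setTimeout is usually preferable to setInterval here, because the next run is only queued after the previous scrape has at least been started, so slow requests don't pile up.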
HTH.