Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Programmatically get web request initiator

The Chrome Dev Tools network tab has an initiator column that will show you exactly what code initiated the network request.

network tab of chrome dev tools

I'd like to be able to get network request initiator information programmatically, so I could run a script with a url and request search string argument, and it would return details about where every request with a url matching request search string came from on the page at url. So given the arguments www.stackoverflow.com and google the output might look something like this (showing requesting url, line number, and requested url):

/   19  http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
/   4291    http://www.google-analytics.com/analytics.js

I looked into PhantomJS, but its onResourceRequested callback doesn't provide any initiator information, or context from which it can be derived, according to the documentation: http://phantomjs.org/api/webpage/handler/on-resource-requested.html

Is it possible to do with with PhantomJS at all, or some other tool or service such as selenium?

UPDATE

From the comments and answers so far it seems as though this isn't currently supported by Phantom, Selenium or anything else. So here's an alternative approach that might work: Load the page, and all of the assets, and then find any occurrences of request search string in all of the files. How could I do that?

like image 466
Ben Dowling Avatar asked Nov 26 '15 04:11

Ben Dowling


People also ask

How do I get a request initiator chain?

# Request Initiator Chains in the Initiator tabAfter logging network activity in the Network panel, click a resource and then go to the Initiator tab to view its Request Initiator Chain: The inspected resource is bold. In the screenshot above, https://web.dev/default-627898b5.js is the inspected resource.

What is request initiator chain?

The Show request initiator chain option shows the chain of requests leading up to the selected request. The Request Chain tab shows a waterfall that only contains requests that lead up to this request. If the selected request is an initiator for other requests these Initiated Requests will also be shown.


2 Answers

You should file a feature request in the issue tracker against the DevTools. The initiator information is not exported in the HAR, so getting it out of there isn't going to work. As far as I know, no existing API allows for this either.

like image 168
Garbee Avatar answered Sep 18 '22 00:09

Garbee


I've been able to implement a solution that uses PhantomJS to get all of the URLs loaded by a page, and then use a combination of xargs, curl and grep to find the search string at those URLs.

The first piece is this PhantomJS script, which simply outputs every URL requested by a page:

system = require('system');
var page = require('webpage').create();

page.onResourceRequested= function(req) {
    console.log(req.url);
};

page.open(system.args[1], function(status) {
    phantom.exit(1);
});

Here it is in action:

$ phantomjs urls.js http://www.stackoverflow.com | head -n6
http://www.stackoverflow.com/
http://stackoverflow.com/
http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
http://cdn.sstatic.net/Js/stub.en.js?v=06bb9dbfaca7
http://cdn.sstatic.net/stackoverflow/all.css?v=af4b547e0e9f
http://cdn.sstatic.net/img/share-sprite-new.svg?v=d09c08f3cb07

For my problem I'm not interested in images, and those can be fitlered out by adding the phantomjs arg --load-images=no.

The second piece is taking all of the URLs and searching them. It's not enough to just output the match, I also need the context around which URL was matched, and ideally which line number too. Here's how to do that:

$ cat urls | xargs -I% sh -c "curl -s % | grep -E -n -o '(.{0,30})SEARCH_TERM(.{0,30})' | sed 's#^#% #'"

We can wrap this all up in a little script, where we'll pipe the output back through grep to get color highlighting on the search string:

#!/bin/bash
phantomjs --load-images=no urls.js $1 | xargs -I% sh -c "curl -s % | grep -E -n -o '(.{0,30})$2(.{0,30})' | sed 's#^#% #' | grep $2 --color=always"

We can then use it to search for any term on any site. Here we're looking for adzerk.net on stackoverflow.com:

enter image description here

So you can see that the adzerk.net request gets initiated somewhere around line 4158 of the main stackoverflow page. It's not a perfect solution because the invocation might be somewhere completely different from where the URL is defined, but it's probably a close, and certainly a good point to start tracking down the exact invocation site.

There might be a better way to search the contents of each URL. It doesn't look like PhantonJS's onResourceReceived handler currently exposes the resource content, but there is ongoing work to address that, and once that's available all of this will be much simpler.

like image 28
Ben Dowling Avatar answered Sep 19 '22 00:09

Ben Dowling