The Chrome Dev Tools network tab has an initiator column that will show you exactly what code initiated the network request.
I'd like to be able to get network request initiator information programmatically, so that I could run a script with a url and a request search string as arguments, and it would return details about where every request with a URL matching request search string came from on the page at url. So given the arguments www.stackoverflow.com and google, the output might look something like this (showing requesting URL, line number, and requested URL):
/ 19 http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
/ 4291 http://www.google-analytics.com/analytics.js
I looked into PhantomJS, but its onResourceRequested callback doesn't provide any initiator information, or context from which it can be derived, according to the documentation: http://phantomjs.org/api/webpage/handler/on-resource-requested.html
Is it possible to do this with PhantomJS at all, or with some other tool or service such as Selenium?
UPDATE
From the comments and answers so far it seems as though this isn't currently supported by Phantom, Selenium or anything else. So here's an alternative approach that might work: Load the page, and all of the assets, and then find any occurrences of request search string in all of the files. How could I do that?
# Request Initiator Chains in the Initiator tab
After logging network activity in the Network panel, click a resource and then go to the Initiator tab to view its Request Initiator Chain. The inspected resource is shown in bold. In the documentation's screenshot, https://web.dev/default-627898b5.js is the inspected resource.
The Show request initiator chain option shows the chain of requests leading up to the selected request. The Request Chain tab shows a waterfall that only contains requests that lead up to this request. If the selected request is an initiator for other requests these Initiated Requests will also be shown.
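To make the idea of an initiator chain concrete, here is a minimal sketch that walks such a chain outside of DevTools. It assumes you already have, from some source, a map from each requested URL to the URL that initiated it. The `initiatorChain` function, the `initiators` map, and the example.com URLs are all hypothetical and not part of any DevTools API.

```javascript
// Hypothetical helper: given a Map of requested URL -> initiating URL,
// return the chain of requests leading up to `url`, outermost initiator first.
function initiatorChain(initiators, url) {
  const chain = [url];
  const seen = new Set(chain); // guards against cycles in the input data
  let current = initiators.get(url);
  while (current !== undefined && !seen.has(current)) {
    chain.unshift(current);
    seen.add(current);
    current = initiators.get(current);
  }
  return chain;
}

// Made-up data for illustration only.
const exampleInitiators = new Map([
  ['https://example.com/app.js', 'https://example.com/'],
  ['https://example.com/chunk.js', 'https://example.com/app.js'],
]);

console.log(initiatorChain(exampleInitiators, 'https://example.com/chunk.js').join(' -> '));
// prints "https://example.com/ -> https://example.com/app.js -> https://example.com/chunk.js"
```

The inspected request is always the last element, which mirrors how the Initiator tab lists the requests leading up to the selected one.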
You should file a feature request in the issue tracker against the DevTools. The initiator information is not exported in the HAR, so getting it out of there isn't going to work. As far as I know, no existing API allows for this either.
I've been able to implement a solution that uses PhantomJS to get all of the URLs loaded by a page, and then uses a combination of xargs, curl and grep to find the search string at those URLs.
The first piece is this PhantomJS script, which simply outputs every URL requested by a page:
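// urls.js: print every URL requested while loading the page given as the first argument
var system = require('system');
var page = require('webpage').create();

page.onResourceRequested = function (req) {
    console.log(req.url);
};

page.open(system.args[1], function (status) {
    // Exit with a non-zero code only if the page failed to load
    phantom.exit(status === 'success' ? 0 : 1);
});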
Here it is in action:
$ phantomjs urls.js http://www.stackoverflow.com | head -n6
http://www.stackoverflow.com/
http://stackoverflow.com/
http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js
http://cdn.sstatic.net/Js/stub.en.js?v=06bb9dbfaca7
http://cdn.sstatic.net/stackoverflow/all.css?v=af4b547e0e9f
http://cdn.sstatic.net/img/share-sprite-new.svg?v=d09c08f3cb07
For my problem I'm not interested in images, and those can be filtered out by adding the phantomjs arg --load-images=no.
The second piece is taking all of the URLs and searching them. It's not enough to just output the match; I also need some surrounding context, the URL that was matched, and ideally the line number too. Here's how to do that:
$ cat urls | xargs -I% sh -c "curl -s % | grep -E -n -o '(.{0,30})SEARCH_TERM(.{0,30})' | sed 's#^#% #'"
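For readers who would rather do the matching step in JavaScript than in grep, the same idea (line number plus up to 30 characters of context on either side) can be sketched roughly as follows. The `findMatches` and `escapeRegExp` helpers are hypothetical names introduced here for illustration. Unlike the grep command above, this sketch treats the search term as a literal string rather than as a regular expression.

```javascript
// Hypothetical helper: escape a string so it can be used literally inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: roughly mimics `grep -n -o '(.{0,30})TERM(.{0,30})'`
// on a string, returning the 1-based line number and the matched snippet.
function findMatches(body, term, context) {
  const width = context === undefined ? 30 : context;
  const pattern = new RegExp(
    '.{0,' + width + '}' + escapeRegExp(term) + '.{0,' + width + '}',
    'g'
  );
  const results = [];
  body.split('\n').forEach(function (line, index) {
    // With the `g` flag, match() returns every non-overlapping match on the line.
    (line.match(pattern) || []).forEach(function (snippet) {
      results.push({ line: index + 1, snippet: snippet });
    });
  });
  return results;
}

// Example call on a made-up two-line document, with 5 characters of context.
const sampleBody = 'var a = 1;\nload("http://www.google-analytics.com/analytics.js");';
console.log(findMatches(sampleBody, 'google', 5));
```

Escaping the term avoids surprises with search strings such as adzerk.net, where an unescaped dot would match any character.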
We can wrap this all up in a little script, where we'll pipe the output back through grep to get color highlighting on the search string:
#!/bin/bash
# $1 = page URL, $2 = search term (quoted so that characters such as ? are not glob-expanded)
phantomjs --load-images=no urls.js "$1" | xargs -I% sh -c "curl -s % | grep -E -n -o '(.{0,30})$2(.{0,30})' | sed 's#^#% #' | grep --color=always '$2'"
We can then use it to search for any term on any site. Here we're looking for adzerk.net on stackoverflow.com:
So you can see that the adzerk.net request gets initiated somewhere around line 4158 of the main stackoverflow page. It's not a perfect solution because the invocation might be somewhere completely different from where the URL is defined, but it's probably close, and certainly a good point to start tracking down the exact invocation site.
There might be a better way to search the contents of each URL. It doesn't look like PhantomJS's onResourceReceived handler currently exposes the resource content, but there is ongoing work to address that, and once that's available all of this will be much simpler.