What are the best options for scraping a web page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.
The important thing is to make the scraping look like a normal web request: no indications of AJAX or XMLHttpRequest, such as the X-Requested-With: XMLHttpRequest or Origin headers.
The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.
Are there any hooks in any WebKit/Chrome-specific APIs that can be used to make a normal web request and get the results for manipulation?
    var pageContent = getPageContent(url); // TODO: Implement
    var items = $(pageContent).find('.item');
    // Display items with further selections
Bonus points for making this work from a local file on disk, for initial debugging. But if that is the only thing stopping a solution, then disregard the bonus points.
Attempt to use XHR2 responseType = "document" and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch. See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (synchronously checking response === null on an object URL created from a text/html blob).
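As a rough illustration of that approach, here is a minimal sketch. The names getPageDocument and mimeTypeOf are hypothetical, and it assumes a browser whose DOMParser accepts text/html (either natively or through the patch mentioned above). It checks the responseType property after assignment rather than using the detection technique from the gist, and it trims the Content-Type header because a value such as "text/html; charset=utf-8" is not a MIME type DOMParser accepts.

```javascript
// Pure helper (hypothetical name): reduce a Content-Type header such as
// "text/html; charset=utf-8" to the bare MIME type that DOMParser expects.
function mimeTypeOf(contentType) {
  if (!contentType) return 'text/html'; // assumption: treat a missing header as HTML
  return contentType.split(';')[0].trim().toLowerCase();
}

// Fetch a URL and pass a parsed Document to callback(error, doc).
function getPageDocument(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  try {
    // Browsers that do not support this value ignore the assignment.
    xhr.responseType = 'document';
  } catch (e) {
    // Keep the default text response and use the fallback below.
  }

  xhr.onload = function () {
    if (xhr.responseType === 'document') {
      // The browser parsed the body itself; response is null if it could not.
      callback(null, xhr.response);
      return;
    }
    // Fallback: parse the response text ourselves.
    try {
      var type = mimeTypeOf(xhr.getResponseHeader('Content-Type'));
      callback(null, new DOMParser().parseFromString(xhr.responseText, type));
    } catch (err) {
      callback(err); // e.g. a MIME type that DOMParser does not support
    }
  };
  xhr.onerror = function () {
    callback(new Error('Request failed: ' + url));
  };
  xhr.send();
}
```

With jQuery loaded as in the question, the callback could then run $(doc).find('.item') on the returned document, or read doc.documentElement.outerHTML if a string is needed. For a cross-origin URL, the extension would presumably also need a matching host permission in its manifest.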
Use the Chrome WebRequest API to hide headers such as X-Requested-With.
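A hedged sketch of what such a listener might look like follows. It assumes an extension whose manifest grants the webRequest and webRequestBlocking permissions plus host access to the target site (blocking listeners of this kind are a Manifest V2 feature). The stripHeaders helper, the BLOCKED_HEADERS list, and the example.com URL pattern are illustrative names, not part of any API.

```javascript
// Header names to strip, lower-cased for case-insensitive comparison.
var BLOCKED_HEADERS = ['x-requested-with', 'origin'];

// Pure helper (hypothetical name): return a copy of the header list without
// the blocked names. Entries have the { name, value } shape used by webRequest.
function stripHeaders(requestHeaders, blocked) {
  return requestHeaders.filter(function (header) {
    return blocked.indexOf(header.name.toLowerCase()) === -1;
  });
}

// Register the listener only when running inside an extension context.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    function (details) {
      // Returning requestHeaders replaces the headers sent with the request.
      return {
        requestHeaders: stripHeaders(details.requestHeaders, BLOCKED_HEADERS)
      };
    },
    // Hypothetical target site; limit the listener to XHR traffic only.
    { urls: ['https://example.com/*'], types: ['xmlhttprequest'] },
    ['blocking', 'requestHeaders']
  );
}
```

Depending on the Chrome version, some headers may only be visible to the listener when 'extraHeaders' is also included in the last argument, so it is worth checking the outgoing request in the network panel.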