Web Scraping in a Google Chrome Extension (JavaScript + Chrome APIs)

What are the best options for scraping a page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.

The important thing is that the scraping request looks like a normal web request, with no indications of AJAX or XMLHttpRequest such as the X-Requested-With: XMLHttpRequest or Origin headers.

The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.

Are there any hooks in any WebKit/Chrome-specific APIs that can be used to make a normal web request and get the results for manipulation?

    var pageContent = getPageContent(url); // TODO: Implement
    var items = $(pageContent).find('.item');
    // Display items with further selections

Bonus points for making this work from a local file on disk, for initial debugging. But if that is the only thing standing in the way of a solution, disregard the bonus points.

217

asked Jun 28 '11 14:06

Seb Nilsson

1 Answer

Try XHR2 with responseType = "document" first, and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch. See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (by synchronously checking response === null on an object URL created from a text/html blob).
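A minimal sketch of the approach described above, not the gist's code. It assumes the script runs in an extension page where XMLHttpRequest and DOMParser exist, and that the manifest grants access to the target host. The names getPageDocument and mimeFromContentType are hypothetical, and the feature check is deliberately simpler than the detection in the linked gist.

```javascript
// Reduce a Content-Type header to a bare MIME type, because
// DOMParser.parseFromString rejects values that carry parameters
// such as "; charset=utf-8".
function mimeFromContentType(header) {
  var mime = (header || "").split(";")[0].trim().toLowerCase();
  return mime || "text/html"; // assumption: treat a missing header as HTML
}

// Hypothetical helper: fetch `url` and pass a parsed Document
// (or null on failure) to `callback`.
function getPageDocument(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  try {
    xhr.responseType = "document"; // XHR2: let the browser parse the response
  } catch (e) {
    // Engines without support may throw or ignore the assignment.
  }
  xhr.onload = function () {
    var doc = xhr.responseType === "document" ? xhr.response : null;
    if (!doc) {
      try {
        // Fallback: parse the raw text ourselves.
        var mime = mimeFromContentType(xhr.getResponseHeader("Content-Type"));
        doc = new DOMParser().parseFromString(xhr.responseText, mime);
      } catch (e) {
        doc = null; // responseText is unavailable or the MIME type is unsupported
      }
    }
    callback(doc);
  };
  xhr.onerror = function () {
    callback(null);
  };
  xhr.send();
}
```

With this in place, the snippet from the question could become something like getPageDocument(url, function (doc) { var items = doc ? doc.querySelectorAll('.item') : []; }), or $(doc).find('.item') if jQuery is loaded.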

Use the Chrome WebRequest API to hide headers such as X-Requested-With from the outgoing request.
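As a rough illustration of the header hiding, the sketch below registers a blocking chrome.webRequest.onBeforeSendHeaders listener in the extension's background script. It assumes the manifest declares the webRequest and webRequestBlocking permissions plus host access for the target sites. The names HIDDEN_HEADERS and stripHeaders are hypothetical. Newer Chrome versions may also need "extraHeaders" in the last argument before Origin is exposed, and blocking listeners are restricted under Manifest V3, so treat this as a starting point only.

```javascript
// Headers to hide; names are compared case-insensitively.
var HIDDEN_HEADERS = ["x-requested-with", "origin"];

// Return a copy of `headers` without the entries whose name is in `hidden`.
function stripHeaders(headers, hidden) {
  return headers.filter(function (h) {
    return hidden.indexOf(h.name.toLowerCase()) === -1;
  });
}

// Only register the listener when running inside an extension.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    function (details) {
      // Hand the filtered header list back to the browser.
      return {
        requestHeaders: stripHeaders(details.requestHeaders, HIDDEN_HEADERS)
      };
    },
    // Assumption: narrow this URL pattern to the sites being scraped.
    { urls: ["<all_urls>"], types: ["xmlhttprequest"] },
    ["blocking", "requestHeaders"]
  );
}
```

Keeping the filtering in a plain function makes it easy to check outside the browser, which may help with the local debugging mentioned in the question.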

101

answered Oct 06 '22 09:10

Eli Grey

Tags: javascript, xmlhttprequest, google-chrome, google-chrome-extension, web-scraping
