I would like to make a web scraping application that can log in to a website (I was able to do this with twill in Python) and can also execute JavaScript that triggers access to other pages.
I would definitely prefer to use something in Python, but I am ready to try something new. I have installed mechanize, watir, Hojocki, etc., but I am not sure whether any of these really help.
Web scraping with JavaScript is a very useful technique to extract data from the Internet for presentation or analysis.
Python is your best bet. Libraries such as requests or HTTPX make it very easy to scrape websites that don't require JavaScript to work correctly. Python offers a lot of simple-to-use HTTP clients, and once you get the response, it's also very easy to parse the HTML, for example with BeautifulSoup.
Web scraping is the process of extracting data from a website in an automated way, and Node.js can be used for it. Even though other languages and frameworks are more popular for web scraping, Node.js can do the job well too.
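To make the "fetch, then parse" idea concrete, here is a minimal sketch for pages that do not need JavaScript. It deliberately uses only the standard library (urllib and html.parser) as a stand-in, so it runs without installing anything; requests or HTTPX would replace `fetch`, and BeautifulSoup would replace the parser class. The names `LinkCollector`, `extract_links` and `fetch` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page as text. Only the static HTML is returned; no JavaScript runs."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like `extract_links(fetch("https://example.com/"))` (a placeholder URL) would list the links on a static page. Anything the page builds with JavaScript after loading will not appear in the result, which is why the answers below suggest a real browser engine.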
I'd recommend PhantomJS.
It's a full WebKit browser, but headless and scriptable.
It's ideal for this sort of thing.
I believe there are a few modules (such as Ghost), but I have used Selenium/WebDriver for things like this. It is ostensibly a testing framework, but it provides you with a lot of methods to allow you to interact with the page just as if you had loaded it as a normal user. You also have the benefit of running it so that a browser actually opens and you can watch the code execute (makes debugging easier), or in a 'headless' mode where the code just executes (there are other sites/SO answers with much better explanations than I can give :) ).
That being said, Ghost looks great as well, so try them both and hopefully one will get you what you need!
Also, see the question "Javascript (and HTML rendering) engine without a GUI for automation?" for a similar discussion that may have some additional answers.
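For the Selenium/WebDriver route, a rough sketch of "log in, let the JavaScript run, then grab the page" might look like the following. It assumes Selenium 4 with Chrome available locally. The field names and CSS selectors are hypothetical placeholders and have to be adapted to the actual site. The Selenium imports sit inside the function only so that the snippet can be loaded without Selenium installed.

```python
# Hypothetical locators: replace them with the ones used by the real login page.
USERNAME_FIELD = "username"
PASSWORD_FIELD = "password"
SUBMIT_SELECTOR = "button[type='submit']"
LOGGED_IN_SELECTOR = "#content"  # an element that only exists after login


def login_and_get_html(login_url, username, password, headless=True):
    """Log in through a real browser and return the HTML after scripts have run."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    if headless:
        # Run without opening a window; pass headless=False to watch it work.
        options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(login_url)
        driver.find_element(By.NAME, USERNAME_FIELD).send_keys(username)
        driver.find_element(By.NAME, PASSWORD_FIELD).send_keys(password)
        driver.find_element(By.CSS_SELECTOR, SUBMIT_SELECTOR).click()

        # Wait up to 10 seconds for the post-login page to appear.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, LOGGED_IN_SELECTOR))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if a step above fails.
        driver.quit()
```

Running it with `headless=False` opens a visible browser window, which is the debugging benefit mentioned above. The returned HTML can then be handed to whatever parser you prefer.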