
What is the easiest web scraping tool that handles JavaScript? [closed]

I would like to build a web scraping application that can log in to a website (I was able to do this with twill in Python) and can also execute JavaScript that triggers access to other pages.

I would definitely prefer to use something in Python, but I am ready to try something new. I have installed mechanize, watir, Hojocki, etc., but I am not sure whether any of these really help.

RockridgeKid asked Aug 15 '12 15:08



2 Answers

I'd recommend PhantomJS.

It's a full WebKit browser, but headless and scriptable.

It's ideal for this sort of thing.

SDC answered Oct 18 '22 11:10


I believe there are a few modules for this (such as Ghost), but I have used Selenium/WebDriver for this kind of task. It is ostensibly a testing framework, but it provides many methods for interacting with a page just as if you had loaded it as a normal user. You can run it so that a browser actually opens and you can watch the code execute (which makes debugging easier), or in a 'headless' mode where the code just runs without a visible window (other sites and SO answers explain this much better than I can :) ).

That being said, Ghost looks great as well, so try them both and hopefully one will get you what you need!

Also, see "Javascript (and HTML rendering) engine without a GUI for automation?" for a similar question that may have some additional answers.
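
For what it's worth, here is a rough sketch of how the Selenium/WebDriver approach described above might look in Python. It assumes Selenium 4 or later with Firefox installed, and the field names `username` and `password`, the helper names, and the URL and link text in the usage note are all made up for illustration; the real site will need its own locators.

```python
def make_headless_driver():
    """Start Firefox without a visible window (assumes Selenium 4+ and Firefox)."""
    # Imported here so the helper below does not require Selenium to be importable.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    # Keep retrying element lookups for up to 10 seconds, which gives
    # JavaScript-generated content time to appear.
    driver.implicitly_wait(10)
    return driver


def login_and_follow(driver, login_url, username, password, link_text):
    """Log in, click a link, and return the resulting URL and rendered HTML.

    The form field names "username" and "password" are assumptions.
    """
    driver.get(login_url)
    # "name" and "link text" are the locator strategy strings that
    # By.NAME and By.LINK_TEXT stand for.
    driver.find_element("name", "username").send_keys(username)
    password_box = driver.find_element("name", "password")
    password_box.send_keys(password)
    password_box.submit()
    # The click runs in a real browser, so any JavaScript attached to the
    # link executes as it would for a normal user.
    driver.find_element("link text", link_text).click()
    return driver.current_url, driver.page_source
```

To try it, you would create a driver with `make_headless_driver()`, call something like `login_and_follow(driver, "https://example.com/login", "me", "secret", "Reports")`, and call `driver.quit()` when you are done. Leaving out the `-headless` argument opens a visible browser window, which is handy while debugging.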

RocketDonkey answered Oct 18 '22 11:10