
How Can I Scrape Twitter Now That They Require JavaScript?

I have a couple of sites that monitor Twitter for specific types of statements and scrape relevant Tweets using curl in PHP. A few days ago those sites stopped scraping Twitter. I assumed they had redesigned the layout of the mobile.twitter.com site and that all I would have to do was point my XPath query at a different class or something similar. Instead I found that whenever you visit Twitter without JavaScript enabled, you get a prompt to enable JavaScript, and there seems to be no way around it. Before this change a version of Twitter that did not require JavaScript was available, so I could scrape Tweets with a simple curl request and an XPath query.

I have searched Google for ways to enable JavaScript support for curl requests but have found nothing. Is it possible to add something to a curl request to parse JavaScript, or do I need to find some other solution?

PostAlmostAnything asked Dec 22 '20 04:12





1 Answer

You cannot "enable" JavaScript in curl. It is not a browser; it only makes HTTP requests and never executes any script contained in the response. Have you considered using the Twitter API?
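As a rough illustration of the API route, here is a minimal sketch in Python (the same shape carries over to curl in PHP). It assumes the v2 recent-search endpoint and a bearer token from a Twitter developer account; `build_search_request` and the placeholder token are hypothetical names used only for this example. The request is built but not sent.

```python
import urllib.parse
import urllib.request

# Assumption: the Twitter API v2 recent-search endpoint.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=10):
    """Build (but do not send) a GET request for Tweets matching `query`."""
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        # The API authenticates with a bearer token instead of a browser session.
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


if __name__ == "__main__":
    req = build_search_request("linux kernel", "YOUR_BEARER_TOKEN")
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req), then json-decode the body.
```

The response is JSON, so no XPath query or JavaScript execution is involved at all.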

You can also inspect the XHR requests that Twitter's pages make, using your browser's developer tools, and work your way through them to figure out which HTTP requests you need to make in order to get the data you want.
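Once you have found the request in the Network tab, replaying it looks something like the sketch below. Everything specific here is an assumption: the URL and headers are whatever you copy out of the developer tools, and the `"text"` key is only a guess at where the Tweet body might sit in the JSON, so check it against the real response. `build_replay_request` and `extract_texts` are hypothetical helper names.

```python
import json
import urllib.request


def build_replay_request(url, captured_headers):
    """Recreate a request observed in the browser's Network tab (not sent here)."""
    return urllib.request.Request(url, headers=dict(captured_headers))


def extract_texts(body, key="text"):
    """Walk a JSON document and collect every string stored under `key`."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for name, value in node.items():
                if name == key and isinstance(value, str):
                    found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(body))
    return found
```

The recursive walk is deliberate: it keeps working if the payload nests the items differently from what you first saw, as long as the key name stays the same.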

Another solution is to use a scriptable "headless" browser; check out CasperJS. Simply put, it is a fully functional browser that does not show any UI, and you can control it via JavaScript.
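To give a feel for the headless approach, here is a small sketch that swaps in headless Chrome rather than CasperJS, driven from Python through its command line. It assumes a Chrome or Chromium binary is installed; the name `google-chrome` is an assumption and may be `chromium` or similar on your system. `dump_dom_command` and `render` are hypothetical helper names.

```python
import subprocess


def dump_dom_command(url, browser="google-chrome"):
    """Command line asking headless Chrome to load `url` and print the resulting DOM."""
    return [browser, "--headless", "--dump-dom", url]


def render(url, browser="google-chrome", timeout=60):
    """Run the browser and return the HTML after the page's scripts have run."""
    result = subprocess.run(
        dump_dom_command(url, browser),
        capture_output=True,  # collect stdout, where the DOM is printed
        text=True,
        timeout=timeout,
        check=True,           # raise if the browser exits with an error
    )
    return result.stdout
```

The returned HTML can then go through the same XPath query as before. Content that the page loads lazily after the initial render may still be missing, so treat this as a starting point.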

kdcode answered Oct 17 '22 02:10
