
Clicking a Javascript link to make a post request in Python

I'm writing a webscraper/automation tool. This tool needs to use POST requests to submit form data. The final action uses this link:

<a id="linkSaveDestination" href='javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("linkSaveDestination", "", true, "", "", false, true))'>Save URL on All Search Engines</a>

to submit data from this form:

<input name="sem_ad_group__destination_url" type="text" maxlength="1024" id="sem_ad_group__destination_url" class="TextValueStyle" style="width:800px;">

I've been using requests and BeautifulSoup. I understand that these libraries can't interact with JavaScript, and people recommend Selenium. But as I understand it, Selenium can't do POSTs. How can I handle this? Is it possible to do without opening an actual browser the way Selenium does?

asked Jan 15 '15 by Steven Werner


2 Answers

Yes. You can absolutely duplicate what the link is doing by just submitting a POST to the proper URL (this is, in effect, what the JavaScript that fires when the link is clicked eventually does anyway).

You'll find the relevant section in the requests docs here: http://docs.python-requests.org/en/latest/user/quickstart/#more-complicated-post-requests

So, that'll look something like this for your particular case:

import requests

payload = {'sem_ad_group__destination_url': 'yourTextValueHere'}
r = requests.post("theActionUrlForTheFormHere", data=payload)

If you're having trouble figuring out what URL the form actually posts to, just monitor the network tab (in Chrome dev tools) while you manually click the link yourself; you should be able to find the right request and pull any information you need off of it.
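Since the link calls WebForm_DoPostBackWithOptions, the page is almost certainly an ASP.NET WebForms page, so the captured request will generally also carry the standard hidden fields (__EVENTTARGET, __VIEWSTATE and, if present, __EVENTVALIDATION). A rough sketch of what replaying that request could look like; the URL and the copied values are placeholders, and the hidden values normally have to come from a fresh load of the page:

import requests

# Placeholder URL: use whatever URL the network tab shows the form posting to
FORM_URL = 'https://example.com/TheFormPage.aspx'

payload = {
    '__EVENTTARGET': 'linkSaveDestination',   # the control the postback "clicks"
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': 'valueCopiedFromTheCapturedRequest',
    '__EVENTVALIDATION': 'valueCopiedFromTheCapturedRequest',  # only if the page sends it
    'sem_ad_group__destination_url': 'yourTextValueHere',
}

r = requests.post(FORM_URL, data=payload)
print(r.status_code)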

Good Luck!

answered by RutledgePaulV


With Selenium you mimic real-user interactions in a real browser: tell it to locate an input, type text into it, click a button, and so on. It's a high-level approach; you don't even need to know what's under the hood, you see what a real user sees. The downside is that a real browser is involved, which, at the very least, slows things down. You can, though, automate a headless browser (PhantomJS), or use an Xvfb virtual framebuffer if you can't open a browser with a UI. Example:

from selenium import webdriver

# PhantomJS is a headless browser, so no UI window is opened
driver = webdriver.PhantomJS()
driver.get('url here')  # the page that contains the form

button = driver.find_element_by_id('linkSaveDestination')
button.click()
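If you also need to type the destination URL into the text box, that works the same way, just before the button.click() call above; a small sketch continuing the code above, with the element id taken from the question's HTML and a placeholder value:

field = driver.find_element_by_id('sem_ad_group__destination_url')
field.clear()
field.send_keys('yourTextValueHere')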

With requests+BeautifulSoup you are going down to the bare metal: using the browser developer tools you research and analyze what requests are made to the server and mimic them in your code. Sometimes the way a page is constructed and the requests it makes are too complicated to automate, or there are anti-web-scraping techniques in use.
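For this particular page that would mean fetching it once, copying the hidden ASP.NET WebForms fields into your payload, and then replaying the POST. A rough sketch under those assumptions; the URL is a placeholder:

import requests
from bs4 import BeautifulSoup

PAGE_URL = 'https://example.com/TheFormPage.aspx'  # placeholder

session = requests.Session()
soup = BeautifulSoup(session.get(PAGE_URL).text, 'html.parser')

# Copy every named hidden input (__VIEWSTATE, __EVENTVALIDATION, ...) so server-side checks pass
payload = {inp['name']: inp.get('value', '')
           for inp in soup.find_all('input', type='hidden') if inp.get('name')}

payload['__EVENTTARGET'] = 'linkSaveDestination'
payload['sem_ad_group__destination_url'] = 'yourTextValueHere'

r = session.post(PAGE_URL, data=payload)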

There are pros and cons to both approaches; which option to choose depends on many things.

answered by alecxe