 

Looking for a Requests equivalent of Mechanize's capabilities

I am interested in seeing whether Requests can handle some tasks I have primarily been doing in Mechanize.

Mechanize makes it easy to fill out and submit forms, but I am having a hard time doing the same thing in Requests.

For example,

import mechanize
br = mechanize.Browser()
url = "https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7"
br.open(url)
br.select_form(nr=1)
br.form['format']=['2']
br.form['date_format']=['2']
response = br.submit().read()

Would the Requests equivalent not be:

import requests
url = "https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7"
payload = {'format':'2','date_format':'2'}
r = requests.post(url, data=payload)

Doesn't requests.post submit the form and return the CSV download that the page provides?

For reference, here is what the forms on the page look like:

for form in br.forms():
    print(form)

<POST https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7  application/x-www-form-urlencoded
    <TextControl(search_block_form=)>
    <SubmitControl(op=Search) (readonly)>
    <RadioControl(search_type=[*quote, site])>
    <HiddenControl(form_build_id=form-af2eb21e9b6448ffca4e358d0b52f499) (readonly)>
    <HiddenControl(form_id=search_block_form) (readonly)>
    <HiddenControl(search_target=search_instruments) (readonly)>
    <HiddenControl(search_language=&lan=) (readonly)>>
<POST https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7 application/x-www-form-urlencoded
  <RadioControl(format=[*1, 2, 3])>
  <RadioControl(layout=[*2, 1])>
  <RadioControl(decimal_separator=[*1, 2])>
  <RadioControl(date_format=[*1, 2])>
  <SubmitControl(op=Go) (readonly)>
  <SubmitControl(op=Cancel) (readonly)>
  <HiddenControl(form_build_id=form-37e81285a4dbf60e091037f904bac2eb) (readonly)>
  <HiddenControl(form_id=nyx_download_form) (readonly)>>
Jake DeVries asked Dec 25 '22 04:12


1 Answer

requests does not fill the same role as Mechanize.

Mechanize loads the actual HTML form and parses it, letting you fill in values for the various elements in the form. When you then ask Mechanize to submit the form, it uses all the information in the form to produce a valid request to the server. That includes any form elements you didn't provide a new value for (using their default values, if present) and any hidden form elements that are not visible in your browser.
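To illustrate what "using all the information in the form" means, here is a minimal standard-library sketch of that default-collection step. It is not Mechanize's code: FormDefaults and fill_form are hypothetical names, the parser deliberately ignores textarea, buttons and a few other cases, and PAGE is an abbreviated, hypothetical reconstruction of the page's markup based on the Mechanize dump in the question.

```python
from html.parser import HTMLParser


class FormDefaults(HTMLParser):
    """Collect, per <form>, the name/value pairs a browser submits by default.

    Simplified on purpose: handles <input> and <select>/<option> only, and
    skips buttons, disabled controls, <textarea> and file uploads.
    """

    SKIPPED_INPUTS = {"submit", "button", "image", "reset", "file"}

    def __init__(self):
        super().__init__()
        self.forms = []        # one {name: value} dict per <form>, in page order
        self._in_form = False  # True between <form> and </form>
        self._select = None    # name of the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.forms.append({})
            self._in_form = True
            return
        if not self._in_form or "disabled" in a:
            return  # ignore controls outside a form and disabled controls
        fields = self.forms[-1]
        name = a.get("name")
        if tag == "input" and name:
            kind = (a.get("type") or "text").lower()
            value = a.get("value")
            if kind in ("radio", "checkbox"):
                if "checked" in a:  # unchecked boxes are not submitted
                    fields[name] = "on" if value is None else value
            elif kind not in self.SKIPPED_INPUTS:
                fields[name] = value or ""  # text, hidden, etc.
        elif tag == "select" and name:
            self._select = name
        elif tag == "option" and self._select:
            # The first option is the default unless another is marked selected.
            # (Real browsers fall back to the option's text if value is absent.)
            if "selected" in a or self._select not in fields:
                fields[self._select] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False
        elif tag == "select":
            self._select = None


def fill_form(html, nr, **values):
    """Roughly br.select_form(nr=nr) plus br.form[...] = ...: defaults + overrides."""
    parser = FormDefaults()
    parser.feed(html)
    fields = dict(parser.forms[nr])
    fields.update(values)
    return fields


# Hypothetical, abbreviated markup reconstructed from the Mechanize dump.
PAGE = """
<form method="post"><input type="text" name="search_block_form">
  <input type="hidden" name="form_id" value="search_block_form"></form>
<form method="post">
  <input type="radio" name="format" value="1" checked><input type="radio" name="format" value="2">
  <input type="radio" name="layout" value="2" checked><input type="radio" name="layout" value="1">
  <input type="radio" name="decimal_separator" value="1" checked>
  <input type="radio" name="decimal_separator" value="2">
  <input type="radio" name="date_format" value="1" checked>
  <input type="radio" name="date_format" value="2">
  <input type="submit" name="op" value="Go">
  <input type="hidden" name="form_build_id" value="form-37e81285a4dbf60e091037f904bac2eb">
  <input type="hidden" name="form_id" value="nyx_download_form">
</form>
"""
```

Calling fill_form(PAGE, 1, format="2", date_format="2") mirrors the question's select_form(nr=1) plus the two assignments: it returns the two chosen values together with layout="2", decimal_separator="1" and the hidden form_build_id and form_id, which is the part the bare requests.post call leaves out.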

Use a project like robobrowser instead; it wraps requests and BeautifulSoup to load web pages, parse out the form elements, help you fill them in, and submit them back again.

If you want to use just requests, you'll need to make sure you are posting all the fields defined by the form. This means looking at the form's method attribute (defaults to GET), its action attribute (defaults to the current URL), and all of its input, select, textarea and button elements. The server may also expect additional information in the HTTP request, such as cookies or the Referer (sic) header.
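Those defaults can be applied mechanically. The sketch below uses only the standard library; form_request is a hypothetical helper (not part of requests) that turns a form's method and action attributes, its field values and an optional clicked button into keyword arguments for requests' Session.request(). One detail worth knowing: for a GET form, browsers replace any query string already on the action URL with the form data, which the helper mimics.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def form_request(page_url, fields, method=None, action=None, submit=None):
    """Build keyword arguments for requests' Session.request() from a form.

    page_url: URL the form was loaded from.
    fields:   every value to send (form defaults plus your overrides).
    method, action: the <form> attributes, or None when they are absent.
    submit:   optional (name, value) of the button being "clicked".
    """
    method = (method or "get").upper()  # HTML default method is GET
    # A missing or empty action means "submit back to the current URL";
    # a relative action is resolved against the page URL.
    url = urljoin(page_url, action) if action else page_url
    payload = dict(fields)
    if submit is not None:  # only the clicked button's name/value is sent
        name, value = submit
        payload[name] = value
    if method == "GET":
        # Browsers replace the action URL's query string with the form data.
        url = urlunsplit(urlsplit(url)._replace(query=""))
        return {"method": "GET", "url": url, "params": payload}
    return {"method": method, "url": url, "data": payload}
```

For the download form in the question, which the Mechanize dump shows as a POST back to the page's own URL, you would call form_request(url, fields, method="post", submit=("op", "Go")) and pass the result to session.request(**kwargs).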

The Mechanize output you printed shows that it parsed several more fields from the forms, fields for which you did not provide values. The download form, for example, also contains the hidden input fields form_build_id and form_id, which the server may be relying on. Mechanize would also have captured any cookies set when the form page was first requested, and those cookies may be required for the server to accept the submission. robobrowser would take the same context into account.
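Putting the cookie and hidden-field points together, a requests-only version might look like the sketch below. It assumes the page still serves the form as shown in the question's dump; HiddenFields and download are hypothetical helpers, and the session is passed in so that any object with requests.Session-style get() and post() methods works. Reusing one session for both requests is what carries the cookies over, and the Referer header is set to the form page.

```python
from html.parser import HTMLParser


class HiddenFields(HTMLParser):
    """Collect the hidden <input> fields of each <form>, in page order."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one {name: value} dict per <form>
        self._in_form = False  # True between <form> and </form>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.forms.append({})
            self._in_form = True
        elif (tag == "input" and self._in_form and a.get("name")
              and (a.get("type") or "").lower() == "hidden"):
            self.forms[-1][a["name"]] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def download(session, page_url, choices, form_id="nyx_download_form"):
    """Load the form page, then post the chosen form back on the same session.

    session: a requests.Session (or anything with the same get/post methods).
    Reusing one session sends back any cookies the first GET received.
    choices: the visible fields to submit, including the clicked button.
    """
    page = session.get(page_url)
    page.raise_for_status()
    parser = HiddenFields()
    parser.feed(page.text)
    # Each form on this page carries a hidden form_id; pick the download form.
    hidden = next((f for f in parser.forms if f.get("form_id") == form_id), None)
    if hidden is None:
        raise ValueError(f"no form with form_id={form_id!r} on {page_url}")
    payload = {**hidden, **choices}
    # The form in the question posts back to the page URL itself.
    response = session.post(page_url, data=payload, headers={"Referer": page_url})
    response.raise_for_status()
    return response
```

With requests installed, a call might look like download(requests.Session(), url, {"format": "2", "layout": "2", "decimal_separator": "1", "date_format": "2", "op": "Go"}).text, where layout and decimal_separator carry the defaults from the Mechanize dump and op=Go is the submit button Mechanize would typically click for you. The server may still check more than this sketch sends.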

Martijn Pieters answered Dec 28 '22 06:12