I am porting a bash script that uses curl to POST the payloads defined in the code to the site's URLs, and that script works. The basic issue is that, with RoboBrowser, I'm running into trouble POSTing through the page forms.
Stepping through the site:
I have been able to authenticate to the site and perform GETs with both RoboBrowser and Requests+bs4; however, I'm confused about POSTing back to the pages themselves.
Using RoboBrowser (liboncall.py)
```python
#!/usr/bin/python
from robobrowser import RoboBrowser
from bs4 import BeautifulSoup as BS

oc_mailbox = '123456'
oc_password_hashed = 'ABCDEFG'

oc_base_uri = 'http://example.com'
auth_uri = oc_base_uri + '/SubLogin.aspx'
find_uri = oc_base_uri + '/FindMe.aspx'
phne_uri = oc_base_uri + '/PhoneLists.aspx'

# Form payloads for each page
p_auth_payload = {
    'SubLoginControl:javascriptTest': 'true',
    'SubLoginControl:mailbox': oc_mailbox,
    'SubLoginControl:phoneNumber': '',
    'SubLoginControl:password': oc_password_hashed,
    'SubLoginControl:btnLogOn': 'Logon',
    'SubLoginControl:webLanguage': 'en-US',
    'SubLoginControl:initialLanguage': 'en-US',
    'SubLoginControl:errorCallBackNumber': 'Entered telephone number contains non-dialable characters.',
    'SubLoginControl:cookieMailbox': 'mailbox',
    'SubLoginControl:cookieCallbackNumber': 'callbackNumber',
    'SubLoginControl:serverDomain': ''
}

p_find_payload = {
    'FindMeControl:enableFindMe': 'on',
    'FindMeControl:MasterDataControl:focusElement': '',
    'FindMeControl:MasterDataControl:masterList:_ctl0:enabled': 'on',
    'FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
    'FindMeControl:MasterDataControl:hidSelectedScheduleName': '',
    'FindMeControl:MasterDataControl:hidbtnStatus': '',
    'FindMeControl:MasterDataControl:hidScheduleXML': '',
    'FindMeControl:MasterDataControl:tempScheduleXML': '',
    'FindMeControl:MasterDataControl:hidSelectedScheduleGUID': '',
    'FindMeControl:MasterDataControl:hidChangedScheduleList': '',
    'FindMeControl:btnPhoneLists': 'Phone Lists',
    'FindMeControl:enableFindMeHidden': '',
    'FindMeControl:applySet': 'false'
}

p_phne_payload = {
    '__EVENTARGUMENT': '',
    '__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton',
    'PhoneListsControl:MasterDataControl:focusElement': '',
    'PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
    'PhoneListsControl:MasterDataControl:hidSelectedScheduleName': '',
    'PhoneListsControl:MasterDataControl:hidbtnStatus': '',
    'PhoneListsControl:MasterDataControl:hidScheduleXML': '',
    'PhoneListsControl:MasterDataControl:tempScheduleXML': '',
    'PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID': '',
    'PhoneListsControl:MasterDataControl:hidChangedScheduleList': '',
    'PhoneListsControl:applySet': 'false'
}


def auth(mailbox, password):
    """Log in through the SubLogin.aspx form and return the logged-in browser."""
    browser = RoboBrowser(history=False)
    browser.open(auth_uri)
    signin = browser.get_form(id='aspnetForm')
    signin['SubLoginControl:mailbox'].value = mailbox
    signin['SubLoginControl:password'].value = password
    signin['SubLoginControl:javascriptTest'].value = 'true'
    signin['SubLoginControl:btnLogOn'].value = 'Logon'
    signin['SubLoginControl:webLanguage'].value = 'en-US'
    signin['SubLoginControl:initialLanguage'].value = 'en-US'
    # Plain text here: the form encoder turns the spaces into '+' itself
    signin['SubLoginControl:errorCallBackNumber'].value = 'Entered telephone number contains non-dialable characters.'
    signin['SubLoginControl:cookieMailbox'].value = 'mailbox'
    signin['SubLoginControl:cookieCallbackNumber'].value = 'callbackNumber'
    signin['SubLoginControl:serverDomain'].value = ''
    browser.submit_form(signin)
    return browser
```
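When porting curl --data payloads, it helps to remember that the form encoder does the URL-encoding itself, so dictionary values should hold plain text. A minimal stdlib sketch of how a dict of form data becomes an application/x-www-form-urlencoded body, which is essentially the encoding requests (and therefore RoboBrowser) applies; the payload is a trimmed copy of p_auth_payload with placeholder values:

```python
from urllib.parse import urlencode, parse_qs

# Trimmed copy of p_auth_payload; values are placeholders
payload = {
    'SubLoginControl:mailbox': '123456',
    'SubLoginControl:btnLogOn': 'Logon',
    'SubLoginControl:errorCallBackNumber': 'Entered telephone number',
}

# Spaces become '+', and ':' in the field names becomes '%3A'
body = urlencode(payload)
print(body)

# A value that was already URL-encoded (as on a curl command line)
# is encoded a second time: '+' turns into '%2B'
double = urlencode({'x': 'Entered+telephone'})
print(double)  # x=Entered%2Btelephone

# Decoding the body gives back the plain-text values
decoded = parse_qs(body)
print(decoded['SubLoginControl:errorCallBackNumber'])
```

So a value copied verbatim from a curl command, such as 'Entered+telephone+number', reaches the server with literal '+' characters instead of spaces.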
Log in to the site and check the URL to verify we're in:
```
In [20]: from liboncall import *
In [21]: m = auth(oc_mailbox, oc_password_hashed)
In [22]: m.url
Out[22]: u'http://example.com/OptionsSummary.aspx'
```
Open "/FindMe.aspx":
```
In [24]: m.open(find_uri)
In [25]: m.url
Out[25]: u'http://example.com/FindMe.aspx'
```
Initially, "/FindMe.aspx" loads a form with a "Phone Lists" button (FindMeControl:btnPhoneLists).
```
In [26]: m.select('title')
Out[26]: [<title>Find Me</title>]

In [27]: form_find_a = m.get_form(action="FindMe.aspx")

In [28]: for i in form_find_a.keys():
             print(i)

__VIEWSTATE
__EVENTVALIDATION
FindMeControl:enableFindMe
FindMeControl:MasterDataControl:focusElement
FindMeControl:MasterDataControl:masterList:_ctl0:enabled
FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid
FindMeControl:MasterDataControl:btnAdd
FindMeControl:MasterDataControl:btnDelete
FindMeControl:MasterDataControl:btnRename
FindMeControl:MasterDataControl:btnCancel
FindMeControl:MasterDataControl:btnEnter
FindMeControl:MasterDataControl:btnUpdate
FindMeControl:MasterDataControl:hidSelectedScheduleName
FindMeControl:MasterDataControl:hidbtnStatus
FindMeControl:MasterDataControl:hidScheduleXML
FindMeControl:MasterDataControl:tempScheduleXML
FindMeControl:MasterDataControl:hidSelectedScheduleGUID
FindMeControl:MasterDataControl:hidChangedScheduleList
FindMeControl:btnApply
FindMeControl:btnSchedules
FindMeControl:btnPhoneLists
FindMeControl:enableFindMeHidden
FindMeControl:applySet
```
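The field listing above is what RoboBrowser parsed out of the form, including the ASP.NET state fields `__VIEWSTATE` and `__EVENTVALIDATION`, which generally have to be posted back unchanged. For comparison, a rough stdlib-only sketch of the same idea that collects the name, type and value of every `<input>`; the class name `InputCollector` and the sample HTML (with placeholder state values) are made up for illustration, and `<select>`/`<textarea>` are ignored:

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect (name, type, value) for every named <input> in a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<input ... />) also end up here, because the
        # default handle_startendtag() calls handle_starttag()
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if name:
            kind = (attrs.get('type') or 'text').lower()
            self.inputs.append((name, kind, attrs.get('value') or ''))


sample = '''
<form action="FindMe.aspx" method="post" id="aspnetForm">
  <input type="hidden" name="__VIEWSTATE" value="placeholder-state" />
  <input type="hidden" name="__EVENTVALIDATION" value="placeholder-validation" />
  <input type="submit" name="FindMeControl:btnApply" value="Apply" />
  <input type="submit" name="FindMeControl:btnPhoneLists" value="Phone Lists" />
</form>
'''

collector = InputCollector()
collector.feed(sample)
for name, kind, value in collector.inputs:
    print(name, kind, value)
```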
Remove the unneeded form fields, fill out the form and submit it:
```
In [29]: find_remove = (
             'FindMeControl:MasterDataControl:btnAdd',
             'FindMeControl:MasterDataControl:btnDelete',
             'FindMeControl:MasterDataControl:btnRename',
             'FindMeControl:MasterDataControl:btnCancel',
             'FindMeControl:MasterDataControl:btnEnter',
             'FindMeControl:MasterDataControl:btnUpdate',
             'FindMeControl:btnApply',
             'FindMeControl:btnSchedules')

In [30]: for i in find_remove:
             form_find_a.fields.pop(i)

In [31]: form_find_a['FindMeControl:enableFindMe'].value = 'on'
         form_find_a['FindMeControl:MasterDataControl:focusElement'].value = ''
         form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:enabled'].value = 'on'
         form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
         form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleName'].value = ''
         form_find_a['FindMeControl:MasterDataControl:hidbtnStatus'].value = ''
         form_find_a['FindMeControl:MasterDataControl:hidScheduleXML'].value = ''
         form_find_a['FindMeControl:MasterDataControl:tempScheduleXML'].value = ''
         form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
         form_find_a['FindMeControl:MasterDataControl:hidChangedScheduleList'].value = ''
         form_find_a['FindMeControl:btnPhoneLists'].value = 'Phone Lists'
         form_find_a['FindMeControl:enableFindMeHidden'].value = ''
         form_find_a['FindMeControl:applySet'].value = 'false'
Out[31]: ...

In [32]: m.submit_form(form_find_a)
```
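Popping the other buttons mirrors what a real browser does: only the submit button that was actually clicked is sent with the form (RoboBrowser's `submit_form()` also accepts a `submit=` argument for choosing that button). A hedged sketch of the filtering step over plain `(name, type, value)` tuples; `build_payload` is a hypothetical helper, and the button labels and state value are placeholders:

```python
def build_payload(inputs, clicked):
    """Return the form data a browser would send when `clicked` is pressed.

    inputs: list of (name, type, value) tuples scraped from the form
    clicked: name of the submit button that was pressed
    """
    payload = {}
    for name, kind, value in inputs:
        # Browsers only send the submit button that was actually clicked
        if kind == 'submit' and name != clicked:
            continue
        payload[name] = value  # note: repeated names keep only the last value
    return payload


inputs = [
    ('__VIEWSTATE', 'hidden', 'placeholder-state'),
    ('FindMeControl:applySet', 'hidden', 'false'),
    ('FindMeControl:btnApply', 'submit', 'Apply'),
    ('FindMeControl:btnSchedules', 'submit', 'Schedules'),
    ('FindMeControl:btnPhoneLists', 'submit', 'Phone Lists'),
]

payload = build_payload(inputs, clicked='FindMeControl:btnPhoneLists')
print(payload)
```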
Verify that the page has updated and contains the list item "Work":
```
In [33]: m.parsed.find('title')
Out[33]: <title>Phone Lists</title>

In [34]: m.parsed.find('a', id='PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton')
Out[34]: <a class="linkButtonItem" href="javascript:__doPostBack('PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')" id="PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton" onclick="javascript:onClick();">Work</a>
```
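The href of that "Work" link already holds the two values the postback needs, in the form `__doPostBack('<target>','<argument>')`. A small regex-based sketch for pulling them out instead of hard-coding them, assuming single-quoted arguments as in the output above; `parse_postback` is a hypothetical helper:

```python
import re

# Matches __doPostBack('target','argument') with single-quoted arguments
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


href = ("javascript:__doPostBack("
        "'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')")
target, argument = parse_postback(href)
print(target)
print(repr(argument))
```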
Get the "PhoneLists.aspx" form, remove the unneeded fields, fill it out and submit it:
```
In [35]: form_find_b = m.get_form(action='PhoneLists.aspx')

In [36]: phne_remove = (
             'PhoneListsControl:MasterDataControl:btnAdd',
             'PhoneListsControl:MasterDataControl:btnDelete',
             'PhoneListsControl:MasterDataControl:btnRename',
             'PhoneListsControl:MasterDataControl:btnCancel',
             'PhoneListsControl:MasterDataControl:btnEnter',
             'PhoneListsControl:MasterDataControl:btnUpdate',
             'PhoneListsControl:btnApply',
             'PhoneListsControl:btnBack')

In [37]: for i in phne_remove:
             form_find_b.fields.pop(i)

In [38]: form_find_b['PhoneListsControl:MasterDataControl:focusElement'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:hidChangedScheduleList'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:hidScheduleXML'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleName'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:hidbtnStatus'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
         form_find_b['PhoneListsControl:MasterDataControl:tempScheduleXML'].value = ''
         form_find_b['PhoneListsControl:applySet'].value = 'false'

In [39]: m.submit_form(form_find_b)
```
Check the response to see whether the user list loaded. In this instance, it did not:
```
In [40]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[40]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText"></div>]
```
If it had been successful, the above would have returned:
```html
<div id="PhoneListsControl_phoneListMembersText" class="displayText" style="top: 315px; left: 281px;"> Work </div>
```
along with the following inputs in a table (PhoneListsControl_phoneListDetail):
```html
<input name="PhoneListsControl:phoneListDetail:_ctl2:number" type="text" value="95551234567" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...
<input name="PhoneListsControl:phoneListDetail:_ctl3:number" type="text" value="95551236789" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...
```
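Once the list does load, the numbers sit in text inputs named `PhoneListsControl:phoneListDetail:_ctlN:number`. A stdlib sketch for collecting them by matching that name pattern; the pattern is inferred from the two inputs shown above, so other rows on the real page may be named differently, and `PhoneNumberCollector` is a hypothetical name:

```python
import re
from html.parser import HTMLParser

# Name pattern inferred from the sample inputs (assumption)
NUMBER_NAME = re.compile(r'^PhoneListsControl:phoneListDetail:_ctl\d+:number$')


class PhoneNumberCollector(HTMLParser):
    """Collect the value of every phone-list number <input>."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and NUMBER_NAME.match(attrs.get('name') or ''):
            self.numbers.append(attrs.get('value') or '')


html = '''
<input name="PhoneListsControl:phoneListDetail:_ctl2:number" type="text" value="95551234567" maxlength="50">
<input name="PhoneListsControl:phoneListDetail:_ctl3:number" type="text" value="95551236789" maxlength="50">
<input name="PhoneListsControl:applySet" type="hidden" value="false">
'''

collector = PhoneNumberCollector()
collector.feed(html)
print(collector.numbers)
```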
At this point I figured out that RoboBrowser isn't including all of the form data required for the POST to "PhoneLists.aspx" to work as expected, namely '__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton' and '__EVENTARGUMENT'. Setting those params and then calling submit_form(form_find_b) does not achieve the desired result either. I wonder whether add_field() from robobrowser.forms.form would work, but I don't understand how to use it properly (or whether it can be used the way I want at all, i.e. to add the __EVENTTARGET and __EVENTARGUMENT hidden input fields to the form).
Is there something else I am missing, or do RoboBrowser/Requests not support this type of POST? Does the form require JavaScript to execute, as mentioned here with mechanize?
After much googling, re-posting for help on Reddit and then randomly stumbling upon this RoboBrowser issue, which showed me how to properly use the form's add_field() method, the problem is solved.
For example:

```python
# Build hidden <input> fields from HTML snippets so they can be added to the form
import robobrowser.forms.fields

b_e_arg = robobrowser.forms.fields.Input('<input name="__EVENTARGUMENT" value="" />')
b_e_target = robobrowser.forms.fields.Input(
    '<input name="__EVENTTARGET" '
    'value="PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton" />')
```

```
In [30]: form_find_b.add_field(b_e_target)
In [31]: form_find_b.add_field(b_e_arg)
```
Once the form is updated with these fields, submitting it to "PhoneLists.aspx" works as expected:
```
In [33]: m.submit_form(form_find_b)

In [34]: m.url
Out[34]: u'http://example.com/PhoneLists.aspx'

In [35]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[35]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText"> Work </div>]

In [36]: m.parsed.findAll('input', id='PhoneListsControl_phoneListDetail__ctl2_number')
Out[36]: [<input id="PhoneListsControl_phoneListDetail__ctl2_number" maxlength="50" name="PhoneListsControl:phoneListDetail:_ctl2:number" onkeyup="enableApplyButton('PhoneListsControl_')" type="text" value="95551234567"/>]
```
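The fix works because ASP.NET's client-side `__doPostBack(target, argument)` copies its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and then submits the form; without a JavaScript engine, those two values have to be added by hand. A minimal sketch of the resulting POST data as a plain dict, assuming `fields` already holds the form's other values (including `__VIEWSTATE`/`__EVENTVALIDATION`) with all submit buttons removed, since a script-driven submit sends no button; `postback_data` is a hypothetical helper and the state value is a placeholder:

```python
from urllib.parse import urlencode


def postback_data(fields, target, argument=''):
    """Form data equivalent to running __doPostBack(target, argument) in a browser.

    fields: the form's other values, with submit buttons already removed
    """
    data = dict(fields)  # copy so the caller's dict is left untouched
    data['__EVENTTARGET'] = target
    data['__EVENTARGUMENT'] = argument
    return data


fields = {
    '__VIEWSTATE': 'placeholder-state',
    'PhoneListsControl:applySet': 'false',
}
data = postback_data(
    fields, 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton')
print(urlencode(data))
```

With plain requests, a dict like this could be passed as data= to a session's post() call against the PhoneLists.aspx URL.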
I hope anyone else that has to scrape ASPX sites finds this useful. Happy hacking to all!