I'm trying to fetch some tabular content from a webpage using the script below. To populate the content manually, it is necessary to choose options from the dropdowns shown in this image before hitting the Submit button. I've tried to mimic the POST HTTP requests accordingly, but I must have gone wrong somewhere, which is why the script isn't working. To be specific, this is what I'm trying to fetch.
This is what I've tried:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx'
headers = {
    'x-microsoftajax': 'Delta=true',
    'origin': 'https://www.lgindiasocial.com',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'referer': 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx',
}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(URL)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState:'] = 'Assam'
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Golaghat'
    s.headers.update(headers)
    r = s.post(URL,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    item = soup.select_one("table")
    print(item)
When I run the script, I get None as output.
How can I fetch the tabular content from the search results using post requests?
EDIT: If I copy the payload directly from the browser's dev tools and use it as-is, I get the desired results:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx'
payload = "ScriptManager1=UpdatePanel1%7Cbtnsubmit&hidcity=&ddlState=Assam&ddlCity=Golaghat&ddllocation=Golaghat&__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=M%2BqldpZhV90EX2sawXMrHD7jYtOMXnrPuP8XfVtS21GKmxK0YYuBnqm3I7tU%2BKMtFGZgzWpsYK%2FYJtfTBUK%2F0WobR21tjbWjdrZiXS5FlLcS6qgYMNKfqyZRcK13dbz667H7T6QZqpITTRSqsM%2BrM91VW989KXoknFdx0H6EkRFCJRu4WsBsUxeJnd5Lf5IAUN%2BTNKDYE5GuclDNKnmU1pMmHhrjKQysvYtw8cjD5DdDkNb7NDkLiVxm7DISyXZtVJyOBV6dFa%2Blm1%2FR9M7F2nyepARAl0XIiNP9dhFvomLNdlP%2BU%2FNyllJ5IXW4D%2Fl5Kfx5yaRP8XSKURtAc915i%2F2T48a0dyAR42tJ40eit1IWs7MCwgesNtF35zkuKN1SRhyhHqcnKjcMYW%2BkLqKsLvKpLQcDuXrIAzYyqlgJZ%2FlBQJo%2BiM4tTOH4mEqDkSZW%2Fk94KX1OM70s9%2FS%2Fd5trrHIgNoKw1bCRI8IQ41ZEicMsJPTp67KnqoMZz0F0cCmo%2F49zYkuHw0kqaZmKCrRUNW8Xcr%2F5A3AfNg%2FB8WURD0g2x%2BwzcLXDcVCJ6ngf0LdOc%2BTppM6EOZpTGJGjjDqK116tzWAOPfiJHgBuIPkiZJTaEHnwwjcYXuuLN%2FTgPFUJkXVjBSyRdCnPXsebInNd4Wsu2lnNdwZUO3rnNuu5eY%2FHf7YemcmCEzji%2FxLG%2FynnG0sG61TC1bJCyFw2E3V6ZGshbuqDfh7QQyxqPDEt2uaCN7s%2FOZ%2FwiXeVY2henUVBZSVrxUvF6QT0eO4SIY0OlNYBLK7cO4YG4zC0tURSBr7lZwR%2B%2FowLieNGSO7sOeLQVwL71GKnzBAOZVQH1hw%2B8FIRPoc0pn3v7RjK5CMgTtrZlar67Cv1lTi2nUyAIpX%2BhGkaQeOsg%2ByaIqDIo%2FWwcrg9VV9QP%2FdmwP8hTtq3KTVs0Ncja4Yvizm12BkEwWtMJ9fqzLBXt%2F2J2EjsG7GudgXypwSU7U8oY%2Fq%2BCk93y%2FeTr1ftEFbpGRTRm4hNVXeoCYRyuJceU%2BvO4U5E29ZPqBIolidYtKKH7lnRxKNk2BHtY93VNHPZEjTEDnHcGbgtHmxlBjHRQZlzJKWTjY5ccdFABihGx%2FzY0VCwaehpx2BWxy5qXqW1fX7e5uxxxHteYVt7YyrzYPsX%2B%2FlKiYwt23fsJzmmVkHwmu5%2FTSk1Ms9yJmBE%2B8pEF%2Bum01L8jRH4zxyTaD4s779uLZwLAUUzpi5cfseKTrjGv7uNjCpNci9BXbSdCdqrKa8aPiJX0lWUH9zid%2B8Jc7Jhx%2Bb6nzJpbZ8E9sPpUlcHVGUSzqixsiK91W%2FDDk2LCOvTqJJ9JXmy5cwRhL9r95okWq%2BDImTetFhdYk9%2F9VH3JsACpv4dqqdviEjjFpvmEp7SBMLSWw7toPUIRortPtriz3u9velTqNpHgmbmig8Znb%2F4Q8JrYfjPZzfRxN%2FuQXQyxUNUY2IsYbC5Bm7JWTMZe869muBdE%2FlMLujUkOFCXaOwZXuZHbr7neq0nro3RvYUggBLqxGFlG1Bp52iDNklcx8nfjVMOhOybfCMcxz6mq4Ew2hdLv4IslLRawI5u%2FPQe0vu0TG9LeBeR6Ok1sf72rWpvhD6yl4GTy8oJC1UglabWo8i5aMprxxAWuz%2BzLzizI3aRTQsl1MFKsD9gIGZsaFNAIb7gEgFgw%2B%2BSjTGR51mGES3sOUYXscIJVBciBs3F9vnr8u5gfKD3hLwqvc4djKMBxVQfjLEs%2FQwb7mlOx8XodaV6uOrkiZpw2WZNja5RPBIp4VXeXKXIxqBNsNA4eGT%2Bx2b2JadVB8%3D&__VIEWSTATEGENERATOR=06ED1D24&__VIEWSTATEENCRYPTED=&__ASYNCPOST=true&btnsubmit=Submit"
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    s.headers.update({'content-type':'application/x-www-form-urlencoded; charset=UTF-8'})
    r = s.post(URL,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    item = soup.select_one("table")
    print(item)
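Since the copied payload works, a quick way to find out what the script does differently is to decode that string into a dict and compare it, key by key, with the dict the script builds. A minimal sketch using only the standard library; `diff_payloads` is a hypothetical helper name, and `built` below is a trimmed-down, hand-written stand-in for the script's payload, not its actual output. The `copied` string is a shortened excerpt of the dev-tools payload with `__VIEWSTATE` left out for brevity:

```python
from urllib.parse import parse_qsl


def diff_payloads(copied, built):
    """Compare a urlencoded payload copied from dev tools with a payload dict.

    Returns (keys only the browser sent, keys only the script sent,
    keys present in both whose values differ), each sorted.
    """
    # parse_qsl undoes the percent-encoding, e.g. %7C -> '|'
    browser = dict(parse_qsl(copied, keep_blank_values=True))
    only_browser = sorted(browser.keys() - built.keys())
    only_script = sorted(built.keys() - browser.keys())
    changed = sorted(k for k in browser.keys() & built.keys() if browser[k] != built[k])
    return only_browser, only_script, changed


# Illustrative inputs only (see the note above)
copied = ("ScriptManager1=UpdatePanel1%7Cbtnsubmit&ddlState=Assam"
          "&ddlCity=Golaghat&ddllocation=Golaghat&btnsubmit=Submit")
built = {"ScriptManager1": "UpdatePanel1|btnsubmit", "ddlState:": "Assam",
         "ddlCity": "Golaghat", "ddllocation": "Golaghat", "btnsubmit": "Submit"}
print(diff_payloads(copied, built))  # (['ddlState'], ['ddlState:'], [])
```

Run against the full copied string and the real payload dict, the same comparison would also list any hidden fields the script is missing or sending with stale values.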
First, a small typo in your code (an extra colon in the key):
payload['ddlState:'] = 'Assam'
The larger problem has to do with the way the page is constructed. The page has three dropdowns, and each of them triggers its own POST request when its value changes. Each of those POST requests returns a modified __VIEWSTATE that needs to be included in the form data (the payload) of the subsequent request.
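A related detail, stated as an assumption about how ASP.NET AJAX usually behaves: when a POST carries the `x-microsoftajax: Delta=true` header (as the question's `headers` dict does), the server normally replies not with a full HTML page but with a pipe-delimited "delta" made of `length|type|id|content|` segments. The refreshed `__VIEWSTATE` then arrives as a `hiddenField` segment rather than as an `<input>`, so `soup.select('input[name]')` would not find it. The code that follows sends those headers only with its final request, so the intermediate responses are full pages whose hidden inputs can be scraped directly. If you ever need to chain async postbacks instead, here is a minimal, hypothetical parser for that format (`parse_delta` and `hidden_fields` are made-up names, and the sample response is invented):

```python
def parse_delta(text):
    """Split an ASP.NET AJAX delta response into (type, id, content) tuples.

    Assumes the segment layout length|type|id|content| where `length`
    is the number of characters in `content`.
    """
    entries = []
    pos = 0
    while pos < len(text):
        # length, type and id are plain '|'-terminated fields
        bar = text.index("|", pos)
        length = int(text[pos:bar])
        pos = bar + 1
        bar = text.index("|", pos)
        kind = text[pos:bar]
        pos = bar + 1
        bar = text.index("|", pos)
        ident = text[pos:bar]
        pos = bar + 1
        # content is read by length, because it may itself contain '|'
        content = text[pos:pos + length]
        pos += length
        if text[pos:pos + 1] != "|":
            raise ValueError(f"malformed delta segment at offset {pos}")
        pos += 1
        entries.append((kind, ident, content))
    return entries


def hidden_fields(text):
    """Return the hiddenField segments (e.g. __VIEWSTATE) as a dict."""
    return {ident: content for kind, ident, content in parse_delta(text)
            if kind == "hiddenField"}


print(hidden_fields("1|#||4|5|hiddenField|__VIEWSTATE|ab|cd|"))  # {'__VIEWSTATE': 'ab|cd'}
```

Reading the content by its declared length, instead of splitting on `|`, matters because base64 view state and HTML fragments can contain the delimiter.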
In your code, you take the __VIEWSTATE from the input[name] fields of the original GET response only; you need to take it from the response to the last POST instead. So the following should work:
import requests
from bs4 import BeautifulSoup

URL = 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx'
# AJAX headers from the question; only sent with the final POST
headers = {
    'x-microsoftajax': 'Delta=true',
    'origin': 'https://www.lgindiasocial.com',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'referer': 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx',
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(URL)
    soup = BeautifulSoup(r.text, "lxml")

    # first POST = Select State
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState'] = 'Assam'
    payload['ddlCity'] = 'Select City'
    payload['ddllocation'] = 'Select Location'
    payload['__EVENTTARGET'] = 'ddlState'
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")

    # second POST = Select City (hidden fields, incl. __VIEWSTATE, come from the previous response)
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState'] = 'Assam'  # a browser resends every dropdown value on each postback
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Select Location'
    payload['__EVENTTARGET'] = 'ddlCity'
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")

    # third POST = Select Location and submit
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState'] = 'Assam'
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Golaghat'
    payload['__EVENTTARGET'] = ''
    s.headers.update(headers)
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")
    item = soup.select_one("table")
    print(item)
There's room for some optimisation of this code; I kept it verbose to make the problem transparent.
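One possible way to remove the repetition is to describe each postback as a dict of form overrides and loop over them, rebuilding the payload from the latest response every time. This is a sketch under assumptions, not a tested drop-in replacement: the helper names (`form_fields`, `postback_steps`, `fetch_table`) are hypothetical, it resends `ddlState` on every step as a browser would, and it uses Python's built-in `html.parser` instead of `lxml` to need one dependency fewer:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx"
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36")
# Same AJAX headers as in the question; only sent with the final POST.
AJAX_HEADERS = {
    "x-microsoftajax": "Delta=true",
    "origin": "https://www.lgindiasocial.com",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "referer": URL,
}


def form_fields(html):
    """Collect every named <input>, including the hidden ASP.NET state fields."""
    soup = BeautifulSoup(html, "html.parser")
    return {i["name"]: i.get("value", "") for i in soup.select("input[name]")}


def postback_steps(state, city, location):
    """Form overrides for the three postbacks, in the order a browser sends them."""
    return [
        {"__EVENTTARGET": "ddlState", "ddlState": state,
         "ddlCity": "Select City", "ddllocation": "Select Location"},
        {"__EVENTTARGET": "ddlCity", "ddlState": state,
         "ddlCity": city, "ddllocation": "Select Location"},
        {"__EVENTTARGET": "", "ddlState": state,
         "ddlCity": city, "ddllocation": location},
    ]


def fetch_table(state, city, location):
    """Replay the postbacks, feeding each response's hidden fields into the next POST."""
    with requests.Session() as s:
        s.headers["User-Agent"] = USER_AGENT
        html = s.get(URL).text
        steps = postback_steps(state, city, location)
        for n, overrides in enumerate(steps):
            payload = form_fields(html)  # fresh __VIEWSTATE etc. from the last response
            payload["ScriptManager1"] = "UpdatePanel1|btnsubmit"
            payload.update(overrides)
            is_last = n == len(steps) - 1
            html = s.post(URL, data=payload,
                          headers=AJAX_HEADERS if is_last else None).text
        return BeautifulSoup(html, "html.parser").select_one("table")
```

Usage: `print(fetch_table("Assam", "Golaghat", "Golaghat"))`. Keeping the step list as plain data makes it easy to check what each request sends without hitting the site.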