Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can't parse tabular content from some search results using post requests

I'm trying to fetch some tabular content from a webpage using the script below. To populate the content manually, it is necessary to choose the options from the dropdown shown in this image before hitting the Submit button. I've tried to mimic the post http requests accordingly. However, I might have gone somewhere wrong and which is why the script is not working. To be specific, this is what I'm trying to fetch.

This is how I've tried:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx'

headers = {
    'x-microsoftajax': 'Delta=true',
    'origin': 'https://www.lgindiasocial.com',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'referer': 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx',
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(URL)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState:'] = 'Assam'
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Golaghat'
    s.headers.update(headers)
    r = s.post(URL,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    item = soup.select_one("table")
    print(item)

When I run the script, I get None as output.

How can I fetch the tabular content from the search results using post requests?

EDIT: If I copy the content of payload directly from dev tools and use the same within payload I get desired results.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.lgindiasocial.com/microsites/brand-store-web-five/locate.aspx'

payload = "ScriptManager1=UpdatePanel1%7Cbtnsubmit&hidcity=&ddlState=Assam&ddlCity=Golaghat&ddllocation=Golaghat&__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE=M%2BqldpZhV90EX2sawXMrHD7jYtOMXnrPuP8XfVtS21GKmxK0YYuBnqm3I7tU%2BKMtFGZgzWpsYK%2FYJtfTBUK%2F0WobR21tjbWjdrZiXS5FlLcS6qgYMNKfqyZRcK13dbz667H7T6QZqpITTRSqsM%2BrM91VW989KXoknFdx0H6EkRFCJRu4WsBsUxeJnd5Lf5IAUN%2BTNKDYE5GuclDNKnmU1pMmHhrjKQysvYtw8cjD5DdDkNb7NDkLiVxm7DISyXZtVJyOBV6dFa%2Blm1%2FR9M7F2nyepARAl0XIiNP9dhFvomLNdlP%2BU%2FNyllJ5IXW4D%2Fl5Kfx5yaRP8XSKURtAc915i%2F2T48a0dyAR42tJ40eit1IWs7MCwgesNtF35zkuKN1SRhyhHqcnKjcMYW%2BkLqKsLvKpLQcDuXrIAzYyqlgJZ%2FlBQJo%2BiM4tTOH4mEqDkSZW%2Fk94KX1OM70s9%2FS%2Fd5trrHIgNoKw1bCRI8IQ41ZEicMsJPTp67KnqoMZz0F0cCmo%2F49zYkuHw0kqaZmKCrRUNW8Xcr%2F5A3AfNg%2FB8WURD0g2x%2BwzcLXDcVCJ6ngf0LdOc%2BTppM6EOZpTGJGjjDqK116tzWAOPfiJHgBuIPkiZJTaEHnwwjcYXuuLN%2FTgPFUJkXVjBSyRdCnPXsebInNd4Wsu2lnNdwZUO3rnNuu5eY%2FHf7YemcmCEzji%2FxLG%2FynnG0sG61TC1bJCyFw2E3V6ZGshbuqDfh7QQyxqPDEt2uaCN7s%2FOZ%2FwiXeVY2henUVBZSVrxUvF6QT0eO4SIY0OlNYBLK7cO4YG4zC0tURSBr7lZwR%2B%2FowLieNGSO7sOeLQVwL71GKnzBAOZVQH1hw%2B8FIRPoc0pn3v7RjK5CMgTtrZlar67Cv1lTi2nUyAIpX%2BhGkaQeOsg%2ByaIqDIo%2FWwcrg9VV9QP%2FdmwP8hTtq3KTVs0Ncja4Yvizm12BkEwWtMJ9fqzLBXt%2F2J2EjsG7GudgXypwSU7U8oY%2Fq%2BCk93y%2FeTr1ftEFbpGRTRm4hNVXeoCYRyuJceU%2BvO4U5E29ZPqBIolidYtKKH7lnRxKNk2BHtY93VNHPZEjTEDnHcGbgtHmxlBjHRQZlzJKWTjY5ccdFABihGx%2FzY0VCwaehpx2BWxy5qXqW1fX7e5uxxxHteYVt7YyrzYPsX%2B%2FlKiYwt23fsJzmmVkHwmu5%2FTSk1Ms9yJmBE%2B8pEF%2Bum01L8jRH4zxyTaD4s779uLZwLAUUzpi5cfseKTrjGv7uNjCpNci9BXbSdCdqrKa8aPiJX0lWUH9zid%2B8Jc7Jhx%2Bb6nzJpbZ8E9sPpUlcHVGUSzqixsiK91W%2FDDk2LCOvTqJJ9JXmy5cwRhL9r95okWq%2BDImTetFhdYk9%2F9VH3JsACpv4dqqdviEjjFpvmEp7SBMLSWw7toPUIRortPtriz3u9velTqNpHgmbmig8Znb%2F4Q8JrYfjPZzfRxN%2FuQXQyxUNUY2IsYbC5Bm7JWTMZe869muBdE%2FlMLujUkOFCXaOwZXuZHbr7neq0nro3RvYUggBLqxGFlG1Bp52iDNklcx8nfjVMOhOybfCMcxz6mq4Ew2hdLv4IslLRawI5u%2FPQe0vu0TG9LeBeR6Ok1sf72rWpvhD6yl4GTy8oJC1UglabWo8i5aMprxxAWuz%2BzLzizI3aRTQsl1MFKsD9gIGZsaFNAIb7gEgFgw%2B%2BSjTGR51mGES3sOUYXscIJVBciBs3F9vnr8u5gfKD3hLwqvc4djKMBxVQfjLEs%2FQwb7mlOx8XodaV6uOrkiZpw2WZNja5RPBIp4VXeXKXIxqBNsNA4eGT%2Bx2b2JadVB8%3D&__VIEWSTATEGENERATOR=06ED1D24&__VIEWSTATEENCRYPTED=&__ASYNCPOST=true&btnsubmit=Submit"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    s.headers.update({'content-type':'application/x-www-form-urlencoded; charset=UTF-8'})
    r = s.post(URL,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    item = soup.select_one("table")
    print(item)
like image 591
MITHU Avatar asked Oct 16 '22 01:10

MITHU


1 Answers

First a small typo in your code, (extra colon in there)

payload['ddlState:'] = 'Assam'

The larger problem is to do with the way the page is constructed. The page has three dropdowns, and those dropdowns send a POST request. Each of the POST requests returns a modified __VIEWSTATE that needs to be included in the header of the subsequent request.

In your code, you are taking the __VIEWSTATE from the input[form] on the original GET request only, you need to get the __VIEWSTATE from the last POST request. So the following should work:

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(URL)
    soup = BeautifulSoup(r.text, "lxml")

    # first POST = Select State
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlState'] = 'Assam'
    payload['ddlCity'] = 'Select City'
    payload['ddllocation'] = 'Select Location'
    payload['__EVENTTARGET'] = 'ddlState'
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")

    # second POST = Select City
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Select Location'
    payload['__EVENTTARGET'] = 'ddlCity'
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")

    # third POST = Select Location
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['ScriptManager1'] = 'UpdatePanel1|btnsubmit'
    payload['ddlCity'] = 'Golaghat'
    payload['ddllocation'] = 'Golaghat'
    payload['__EVENTTARGET'] = ''

    s.headers.update(headers)
    r = s.post(URL, data=payload)
    soup = BeautifulSoup(r.text, "lxml")
    item = soup.select_one("table")
    print(item)

There's room for some optimisation of this code. I tried to make the problem transparent.

like image 122
Rusticus Avatar answered Oct 19 '22 02:10

Rusticus