I am trying to scrape Shopee product pages using requests. Take this page as an example: https://shopee.co.id/Paha-Fillet-Ayam-Organik-Lacto-Farm-500gr-Paha-Fillet-Segar-Ayam-Probiotik-Organik-Paha-Boneless-Ayam-MPASI-Ayam-Sehat-Ayam-Anti-Alergi-Daging-Ayam-MPASI-i.382368918.8835294847
I noticed that the page loads its data from an API.
My current code is as follows:
import requests
url='https://shopee.co.id/api/v4/item/get?itemid=8835294847&shopid=382368918'
header={
"x-api-source": 'pc',
'af-ac-enc-dat': 'null'
}
response=requests.get(url,headers=header,verify=True)
This is the response JSON I am getting:
{'tracking_id': '396e3995-dff2-4813-82e7-f7326026d714',
'action_type': 2,
'error': 90309999,
'is_customized': False,
'is_login': False,
'platform': 0,
'report_extra_info': 'eyJlbmNyeXB0X2tleSI6Im.....}
The response headers are as follows:
{'Server': 'SGW', 'Date': 'Sat, 14 Jan 2023 02:14:33 GMT',
'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked',
'Connection': 'keep-alive', 'Vary': 'Accept-Encoding',
'cache-control': 'no-store, max-age=0', 'Content-Encoding': 'gzip'}
Can someone help? I don't understand why the request does not return the product JSON.
The site does not return the product JSON because the request is missing the Cross-Site Request Forgery (CSRF) protection token. You will need to add the X-CSRFToken header to the request. The token can usually be retrieved from:
the csrftoken cookie
the CSRF token meta tag in the HTML
Shopee does have a csrftoken cookie, but at the moment I can't figure out how it gets set (usually the server sends it in a Set-Cookie response header, but Shopee doesn't do that).
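To make the header idea concrete, here is a minimal sketch of copying a CSRF token from a cookie mapping into request headers. The helper name build_csrf_headers is made up for illustration, the cookie name csrftoken is an assumption to verify in the browser's dev tools, and the token value in the demo is fake. It only shows the general pattern and is not confirmed to be sufficient for Shopee.

```python
from typing import Mapping


def build_csrf_headers(cookies: Mapping[str, str], cookie_name: str = "csrftoken") -> dict:
    """Return request headers carrying the CSRF token found in `cookies`.

    Raises KeyError if the cookie is absent, so a missing token fails loudly
    instead of silently sending a request without the header.
    """
    token = cookies[cookie_name]  # assumed cookie name, check it in the browser
    return {"X-CSRFToken": token, "x-api-source": "pc"}


# Demo with a plain dict standing in for a real cookie jar (fake token value).
demo_cookies = {"csrftoken": "abc123"}
headers = build_csrf_headers(demo_cookies)
print(headers)
```

With requests, the mapping could come from session.cookies.get_dict() after the session has visited the site, and the result can be passed as headers= to session.get.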
Edit:
I forgot that the site also sends the af-ac-enc-dat header in the https://shopee.co.id/api/v4/item/get?itemid=8835294847&shopid=382368918 request, and I have no idea how to generate it. So instead I wrote request interception code with Selenium WebDriver to capture the response to this request, and it works!
Install Selenium Wire to capture requests:
pip install selenium-wire
Code:
from seleniumwire import webdriver
import zlib
site_url = "https://shopee.co.id/Paha-Fillet-Ayam-Organik-Lacto-Farm-500gr-Paha-Fillet-Segar-Ayam-Probiotik-Organik-Paha-Boneless-Ayam-MPASI-Ayam-Sehat-Ayam-Anti-Alergi-Daging-Ayam-MPASI-i.382368918.8835294847"
driver = webdriver.Chrome()
driver.maximize_window()
# limit capturing to requests whose URL matches this regex
# (raw string, so that \? is not treated as an invalid string escape)
driver.scopes = [r"https://shopee.co.id/api/v4/item/get\?.*"]
print("starting to capture")
driver.get(site_url)
assert driver.last_request, "no request found"
target_response = driver.last_request.response
target_encoding = target_response.headers["content-encoding"]
target_response_data = target_response.body
if target_encoding:
    if target_encoding == "gzip":
        print("content is encoded")
        # from https://stackoverflow.com/a/2695575
        target_response_data = zlib.decompress(target_response_data, 16 + zlib.MAX_WBITS)
    else:
        raise ValueError("unsupported encoding")
print()
print("found data: ")
print(target_response_data)
print()
print("closing window")
driver.close()
Outputs:
starting to capture
content is encoded
found data:
b'{"error":null,"error_msg":null,"data":{"itemid":8835294847,"shopid":382368918,"userid":0,"price_max_before_discount":-1,"has_lowest_price_guarantee":false,"price_before_discount":0,"price_min_before_discount":-1,"exclusive_price_info":null,"hidden_price_display":null,"price_min":7500000000,"price_max":7500000000,"price":7500000000,"stock":50,"discount":null,"historical_sold":12,"sold":0,"show_discount":0,"raw_discount":0,"min_purchase_limit":0,"overall_purchase_limit":{"order_max_purchase_limit":0,"overall_purchase_limit":null,"item_overall_quota":null,"start_date":null,"end_date":null},"pack_size":null,"is_live_streaming_price":null,"show_free_return":null,"name":"Paha Fillet Ayam Organik Lacto Farm 500gr, Paha Fillet Segar, Ayam Probiotik Organik, Paha Boneless | Ayam MPASI | Ayam Sehat | Ayam Anti Alergi | Daging Ayam MPASI"........ cut line'
closing window
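The captured body is still raw bytes, so a follow-up step could decode it and parse the JSON. The sketch below is one way to do that, with these assumptions: the helper names decode_body, parse_item and capture_item_response are made up for illustration, only gzip and unencoded bodies are handled, and a non-null "error" field is treated as a failed request (as in the question's response). capture_item_response is an untested variant of the code above that uses Selenium Wire's wait_for_request instead of reading last_request right after the page load, on the assumption that the API call may finish after driver.get returns.

```python
import gzip
import json


def decode_body(body: bytes, encoding=None) -> bytes:
    """Undo the Content-Encoding of a response body (gzip or none only)."""
    if not encoding or encoding == "identity":
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    raise ValueError(f"unsupported encoding: {encoding}")


def parse_item(body: bytes, encoding=None) -> dict:
    """Decode the body, parse it as JSON and return the "data" object."""
    payload = json.loads(decode_body(body, encoding))
    if payload.get("error"):
        # The blocked response in the question carries a non-null error code.
        raise RuntimeError(f"API returned error {payload['error']}")
    return payload["data"]


def capture_item_response(site_url: str, timeout: int = 30):
    """Hypothetical helper: load the page and wait for the item API call."""
    # Imported lazily because it needs selenium-wire and a local Chrome.
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(site_url)
        # Blocks until a request matching the regex is seen, or times out.
        request = driver.wait_for_request(r"/api/v4/item/get\?", timeout=timeout)
        response = request.response
        return response.body, response.headers.get("Content-Encoding")
    finally:
        driver.quit()


# Offline demo: a gzip-compressed payload shaped like the output above.
sample = gzip.compress(
    json.dumps({"error": None, "data": {"itemid": 8835294847, "stock": 50}}).encode()
)
item = parse_item(sample, "gzip")
print(item["itemid"], item["stock"])  # prints: 8835294847 50
```

With a real browser session, the two values returned by capture_item_response(site_url) can be passed straight to parse_item.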