
Scraping dynamically generated (JS) content from a site

I'm new to scraping. I'm trying to scrape the value shown with the Buy Now button on this site.
This is what I've tried:

import sys

from bs4 import BeautifulSoup
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage

class Client(QWebPage):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        # self.loadFinished.connect(self.on_page_load)
        # self.mainFrame().load(QUrl(url))
        # self.app.exec_()
    def on_page_load(self):
        self.app.quit()
    def mypage(self, url):
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
client_response = Client()
def parse(url):                # OSRS + RS3
    client_response.mypage(url)
    source = client_response.mainFrame().toHtml()
    soup = BeautifulSoup(source, 'html.parser')
    # Minimum order quantity of the first number input on the page
    quantity = soup.findAll('input', attrs={'type': 'number'})[0]['min']
    # Price shown on the page, rounded to 3 decimal places
    price = round(float(soup.findAll('span', attrs={'id': 'goldprice'})[0].text), 3)
    if quantity == '1':
        final_osrs = price                  # shown price is already per unit
        print(final_osrs)
    else:
        final_rs3 = price / int(quantity)   # convert shown price to per unit
        print(final_rs3)

This approach isn't good because it takes too long to scrape. I also tried Selenium, but that is more than I need at the moment.
Can you suggest a better way to scrape the value? Here is what I need. Any help will be highly appreciated. Thanks.



P.S.: I tried this library because the content is dynamically generated.

asked Dec 07 '25 19:12 by woloho


1 Answer

I am not sure how much of a performance difference you will get, but you can try this solution.

import requests
from bs4 import BeautifulSoup

baseUrl = 'https://www.rsmalls.com/osrs-gold'
postUrl = 'https://www.rsmalls.com/index.php?route=common/quickbuy/rsdetail'

with requests.Session() as session:
    res = session.get(baseUrl)
    soup = BeautifulSoup(res.text, 'lxml')
    game_id = soup.select_one("#choose-game > option[selected]")['value']
    response = session.post(postUrl, data={'game_id': game_id}).json()
    print(f"Gold Price: {response['price']}")

In this code, I first fetch the id of "Runescape 2007" from the page, in case the website owner changes it. If you are sure it will not change, you can skip that step and pass the value '345' directly as the id in the subsequent POST request.
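The two options above (reading the id from the page, or hard-coding '345') can be combined into a lookup with a fallback. The following is only a sketch using the standard library's HTMLParser instead of BeautifulSoup. It assumes the element with id "choose-game" is a <select>, and the names SelectedOptionParser and find_game_id are hypothetical helpers, not part of the site or of any library.

```python
from html.parser import HTMLParser

# Fallback id quoted in the answer; assumed to still be valid for the site.
DEFAULT_GAME_ID = "345"


class SelectedOptionParser(HTMLParser):
    """Finds the value of the selected <option> inside <select id="choose-game">."""

    def __init__(self):
        super().__init__()
        self.in_select = False
        self.game_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "choose-game":
            self.in_select = True
        elif tag == "option" and self.in_select and "selected" in attrs:
            # Keep only the first selected option that is found.
            if self.game_id is None:
                self.game_id = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False


def find_game_id(html, default=DEFAULT_GAME_ID):
    """Return the selected game id from the page HTML, or the default if absent."""
    parser = SelectedOptionParser()
    parser.feed(html)
    return parser.game_id or default
```

With this, find_game_id(res.text) could replace the select_one(...) lookup, and the request would still go out with '345' if the dropdown markup changes.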

As you mentioned, the price is loaded by JS code. Using the browser dev tools, I found the actual POST request that fetches the price; it requires the id selected in the dropdown. The POST request to https://www.rsmalls.com/index.php?route=common/quickbuy/rsdetail returns a JSON response like:

{"success":true,"product_id":"30730","price":0.85,"server_id":"1661","server_option":"463","quantity":"1|5|10|20|50|100|200|300|500|1000|1500|2000","name":"M"}

So I parse the response as JSON and read the price from it.
Let me know if you have any questions.
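As a small illustration of reading that response, here is a sketch that parses the sample JSON shown above without touching the network. Treating the "quantity" field as a pipe-separated list of selectable quantities is my assumption from its shape, and parse_price_response is a hypothetical helper name.

```python
import json

# Sample response copied from the answer above.
SAMPLE = (
    '{"success":true,"product_id":"30730","price":0.85,"server_id":"1661",'
    '"server_option":"463","quantity":"1|5|10|20|50|100|200|300|500|1000|1500|2000",'
    '"name":"M"}'
)


def parse_price_response(raw):
    """Return (price, quantities) from the raw JSON text of the response."""
    data = json.loads(raw)
    if not data.get("success"):
        raise ValueError("response did not report success")
    # Assumption: "quantity" lists the selectable quantities, separated by "|".
    quantities = [int(q) for q in data["quantity"].split("|")]
    return float(data["price"]), quantities


price, quantities = parse_price_response(SAMPLE)
print(price, quantities)
```

Checking the "success" flag before reading "price" avoids a confusing KeyError if the site rejects the request.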

EDIT:

A different POST request is made on https://rsmalls.com/runescape3-gold, so the same solution doesn't work there. The POST request can differ for each page, website, or piece of data. You can find such POST requests yourself using the browser devtools, as shown here. On the right, where you can see that a POST request is made to a URL, you will also find the data sent with the request at the bottom. Also note that the response to this request always contains the price of 1 unit, so it may not match the price shown on the website if the default number of units there is more than 1 (like 5 in the screenshot below).

enter image description here
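To compare the per-unit price from the response with what the page shows for a larger default quantity, the conversion can be sketched as below. It assumes pricing is linear in the quantity (no bulk discount), which I have not verified for this site. Both function names are hypothetical, and the numbers in the usage note only combine the sample price 0.85 with the example quantity 5 for illustration.

```python
from decimal import Decimal


def total_for_quantity(unit_price, quantity):
    """Price the page would show for `quantity` units, assuming linear pricing."""
    return Decimal(str(unit_price)) * quantity


def unit_price_from_total(displayed_total, quantity):
    """Per-unit price recovered from a total shown for `quantity` units."""
    return Decimal(str(displayed_total)) / quantity
```

For example, total_for_quantity(0.85, 5) gives Decimal('4.25'), and unit_price_from_total(4.25, 5) gives back Decimal('0.85'), which is the same division the question performs with price/int(quantity).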

answered Dec 09 '25 09:12 by Kamal