Required item not in soup object - BeautifulSoup Python

Question

So I want to extract the element with class "bilibili-player-video-info-people-number" from this page: https://www.bilibili.com/video/BV1a44y167wK. When I create my BeautifulSoup object and search it, this class is not there. Is it due to the parser? I tried lxml and html5lib, but neither did any better.

<span class="bilibili-player-video-info-people-number">585</span>

That's the full element I want to extract. The number updates every minute to show how many people are currently viewing.

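from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service(r'C:\Users\Rob\Downloads\chromedriver.exe'))

driver.get('https://www.bilibili.com/video/BV1a44y167wK')

# page_source is already a str, so BeautifulSoup can take it directly.
content = driver.page_source
soup = BeautifulSoup(content, 'html5lib')  # requires the html5lib package

# Search for the class named above (the "-number" span, not "-text").
viewers = soup.find_all('span', class_='bilibili-player-video-info-people-number')

print(viewers[0])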

print(viewers[0]) raises an IndexError (list index out of range) because the viewers list is empty.

Thank you!

baduker · Accepted Answer

Almost the entire site is rendered by JavaScript, so bs4 can't help unless the element you want is already in the HTML the server returns. In your case, it's not.
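BeautifulSoup can only search the markup it is handed, and switching parsers doesn't change that. Below is a minimal sketch using the span from the question plus a hypothetical stand-in (unrendered) for HTML served before the player has been drawn. For the latter, find returns None and find_all returns an empty list, which is exactly why viewers[0] raises IndexError. The helper name find_viewer_count is illustrative only.

```python
from bs4 import BeautifulSoup

CLASS_NAME = "bilibili-player-video-info-people-number"


def find_viewer_count(html):
    """Return the viewer count as an int, or None if the span is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", class_=CLASS_NAME)
    # find() returns None instead of raising when nothing matches.
    return int(span.get_text(strip=True)) if span is not None else None


# The element from the question, as it looks once JavaScript has run.
rendered = '<span class="bilibili-player-video-info-people-number">585</span>'
# Hypothetical markup standing in for a page whose player is not rendered yet.
unrendered = '<div id="player"></div>'

print(find_viewer_count(rendered))    # 585
print(find_viewer_count(unrendered))  # None
print(BeautifulSoup(unrendered, "html.parser").find_all("span", class_=CLASS_NAME))  # []
```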

However, there's an API endpoint that you can query that carries this data (and much more).
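The endpoint identifies the video by three query parameters, cid, aid and bvid, which the code below fills in. As a small sketch (the helper names are hypothetical and the cid/aid values are placeholders), the URL can be assembled with urllib.parse.urlencode instead of string formatting. Taking bvid from the parsed URL path also means a trailing slash or a query string such as ?p=1 doesn't end up inside it.

```python
from urllib.parse import urlencode, urlparse

PLAYER_API = "https://api.bilibili.com/x/player/v2"


def bvid_from_url(page_url):
    """Take the BV id from the last path segment, ignoring any query string."""
    return urlparse(page_url).path.rstrip("/").rsplit("/", 1)[-1]


def player_api_url(cid, aid, bvid):
    """Build the player endpoint URL from its three query parameters."""
    return f"{PLAYER_API}?{urlencode({'cid': cid, 'aid': aid, 'bvid': bvid})}"


bvid = bvid_from_url("https://www.bilibili.com/video/BV1a44y167wK/?p=1")
print(bvid)  # BV1a44y167wK
print(player_api_url("111", "222", bvid))
# https://api.bilibili.com/x/player/v2?cid=111&aid=222&bvid=BV1a44y167wK
```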

With a bit of regex and requests, you can get the online count (the number of current viewers).

Here's how:

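import re

import requests

with requests.Session() as connection:
    page_url = "https://www.bilibili.com/video/BV1a44y167wK"
    page = connection.get(page_url).text
    # The video page embeds JSON-like metadata; pull cid and aid out of it.
    cid = re.search(r"cid\":(\d+),\"page", page).group(1)
    aid = re.search(r"aid\":(\d+),", page).group(1)
    # The BV id is the last path segment of the video URL.
    bvid = page_url.rsplit("/", 1)[-1]
    # The player endpoint reports the current viewer count as data.online_count.
    url = f"https://api.bilibili.com/x/player/v2?cid={cid}&aid={aid}&bvid={bvid}"
    print(connection.get(url).json()["data"]["online_count"])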

The script prints the current online count. The number changes as viewers come and go.
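Both regexes depend on how the page happens to embed its data, so if bilibili changes its markup, re.search returns None and .group(1) fails with an unhelpful AttributeError. Here is a hedged sketch that keeps the answer's patterns but fails with a clear message, and reads online_count from the API's JSON text. The helper names are illustrative, and the page fragment and response body are hypothetical samples, not real captures.

```python
import json
import re


def extract_ids(page_html):
    """Return (cid, aid) as strings, using the same patterns as the answer."""
    cid = re.search(r"cid\":(\d+),\"page", page_html)
    aid = re.search(r"aid\":(\d+),", page_html)
    if cid is None or aid is None:
        # Clearer than the AttributeError from calling .group() on None.
        raise ValueError("cid/aid not found; the page layout may have changed")
    return cid.group(1), aid.group(1)


def online_count(api_body):
    """Read data.online_count from the player API's JSON response text."""
    return json.loads(api_body)["data"]["online_count"]


# Hypothetical page fragment and API response, for illustration only.
sample_page = '{"aid":222,"bvid":"BV1a44y167wK","cid":111,"page":1}'
sample_body = '{"code":0,"data":{"online_count":585}}'

print(extract_ids(sample_page))   # ('111', '222')
print(online_count(sample_body))  # 585
```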

Tags: python, beautifulsoup, web-scraping

Asked by exoticdisease
