I have this HTML code which I'm creating the script for:
https://i.sstatic.net/KFSPT.jpg
I would like to extract the highlighted text ("some text") and print it.
I tried going through every nested div in the way to the div I needed, like this:
import requests
from bs4 import BeautifulSoup
url = "the url this is from"
r = requests.get(url)
for div in soup.find_all("div", {"id": "main"}):
for div2 in div.find_all("div", {"id": "app"}):
for div3 in div2.find_all("div", {"id": "right-sidebar"}):
for div4 in div3.find_all("div", {"id": "chat"}):
for div5 in div4.find_all("div", {"id": "chat-messages"}):
for div6 in div5.find_all("div", {"class": "chat-message"}):
for div7 in div6.find_all("div", {"class": "chat-message-content selectable"}):
print(div7.text.strip())
I implemented what I've seen in guides and similar questions online, but I bet this is not even close and there must be a much easier way.
This doesn't work. It doesn't print anything, and I'm a bit lost. How can I print the highlighted line (which is essentially the very first div child of the div with the id "chat-messages")?
HTML CODE:
<!DOCTYPE html>
<html>
<head>
<title>
</title>
</head>
<body>
<div id="main">
<div data-reactroot="" id="app">
<div class="top-bar-authenticated" id="top-bar">
</div>
<div class="closed" id="navigation-bar">
</div>
<div id="right-sidebar">
<div id="chat">
<div id="chat-head">
</div>
<div id="chat-title">
</div>
<div id="chat-messages">
<div class="chat-message">
<div class="chat-message-avatar" style="background-image: url("https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/65/657dcec97cc00bc378629930ecae1776c0d981e0.jpg");">
</div>
<a class="chat-message-username clickable">
<div class="iron-color">
aloe
</div></a>
<div class="chat-message-content selectable">
<!-- react-text: 2532 -->some text<!-- /react-text -->
</div>
</div>
<div class="chat-message">
</div>
<div class="chat-message">
</div>
<div class="chat-message">
</div>
<div class="chat-message">
</div>
<div class="chat-message">
</div>
Using lxml parser (i.e. soup = BeautifulSoup(data, 'lxml')) you can use .find with multiple classes just as simple as single classes to find nested divs:
soup.find('div',{'class':'chat-message-content selectable'}).text
The line above should work for you as long as the occurence of that class is the only one in the html.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With