Extract text inside HTML paragraph using BeautifulSoup in Python

Question

<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    <strong>Severity: Normal Severity</strong><br />
    <strong>Published: Thursday, December 04, 2014 20:27</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
    <br />
</p>

This is a paragraph I want to extract from an HTML page using BeautifulSoup in Python. I can get the values inside the tags using the .children and .string attributes, but I cannot get the text "Several new Point of Sale malware fa...", which sits directly inside the paragraph without any enclosing tag. I tried soup.p.text, .get_text(), etc., with no success.

golden_boy · Accepted Answer

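import urllib.request
from bs4 import BeautifulSoup

# Example page; substitute the URL of the page you want to scrape.
url = "https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed"

# Download the page and parse it with Python's built-in HTML parser.
html = urllib.request.urlopen(url)
htmlParse = BeautifulSoup(html, 'html.parser')

# Print the full text of every <p> element, including text nested in child tags.
for para in htmlParse.find_all("p"):
    print(para.get_text())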
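
Note that the loop above prints all the text in each paragraph, tagged or not. As a rough offline illustration (a shortened copy of the question's markup is parsed from a string, so no network request is made), the sketch below lists every text fragment in the paragraph via .stripped_strings. The untagged sentence shows up alongside the text from <strong> and <em>, which is why get_text() alone cannot isolate it. The variable names here are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the paragraph from the question (assumption: same structure).
html = """<p>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
</p>"""

soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields every non-blank text fragment under <p>,
# whether it sits inside a child tag or directly inside the paragraph.
fragments = list(soup.p.stripped_strings)

for fragment in fragments:
    print(fragment)
```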

alecxe · Answer

Use find_all() with text=True (the older name for the string=True argument introduced in Beautiful Soup 4.4.0) to match only text nodes, and recursive=False to restrict the search to direct children of the parent p tag:

from bs4 import BeautifulSoup

data = """
<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    <strong>Severity: Normal Severity</strong><br />
    <strong>Published: Thursday, December 04, 2014 20:27</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
    <br />
</p>
"""

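# Name the parser explicitly to avoid Beautiful Soup's "no parser specified" warning.
soup = BeautifulSoup(data, "html.parser")
# text=True is the older spelling of string=True (Beautiful Soup 4.4.0+).
print(''.join(text.strip() for text in soup.p.find_all(text=True, recursive=False)))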

Prints:

Several new Point of Sale malware families have emerged recently, to include LusyPOS,..
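
The same idea can be written without find_all(): walk the paragraph's direct children and keep only the text nodes. This is a minimal sketch rather than part of the original answer. The helper name direct_text is hypothetical, and the markup is a shortened copy of the question's paragraph.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened copy of the paragraph from the question (assumption: same structure).
html = """
<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
</p>
"""

soup = BeautifulSoup(html, "html.parser")


def direct_text(tag):
    """Return text held directly by `tag`, skipping text inside child tags.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    pieces = []
    for child in tag.children:
        # Text nodes are NavigableString instances; child elements are Tag instances.
        if isinstance(child, NavigableString):
            stripped = child.strip()
            if stripped:  # ignore whitespace-only nodes between tags
                pieces.append(stripped)
    return " ".join(pieces)


bare_text = direct_text(soup.p)
print(bare_text)
```

Joining with a space keeps multiple untagged fragments readable if the paragraph contains more than one.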

Tags: python, html, beautifulsoup, web-scraping

Remis Haroon - رامز
