I'm extracting the text of an article from a website with the help of Python and BeautifulSoup. Now I have a strange problem: I just want to print out the text inside multiple p tags which are located in a div with class dr_article. My code currently looks like this:
    from bs4 import BeautifulSoup

    def getArticleText(webtext):
        soup = BeautifulSoup(webtext)
        divTag = soup.find_all("div", {"class":"dr_article"})
        for tag in divTag:
            pData = tag.find_all("p").text
            print pData
I'm getting the following error:
    Traceback (most recent call last):
      File "<pyshell#14>", line 1, in <module>
        execfile("word_rank/main.py")
      File "word_rank/main.py", line 7, in <module>
        articletext.getArticleText(webtext)
      File "word_rank\articletext.py", line 7, in getArticleText
        pData = tag.find_all("p").text
    AttributeError: 'list' object has no attribute 'text'
But when I select just the first element with [0] before .text, I don't get the error and it works as expected: it returns the text of the first element. To be precise, I modified my code so it looks like this:
    from bs4 import BeautifulSoup

    def getArticleText(webtext):
        soup = BeautifulSoup(webtext)
        divTag = soup.find_all("div", {"class":"dr_article"})
        for tag in divTag:
            pData = tag.find_all("p")[0].text
            print pData
My question is: how can I get the text from all the elements at once? What should I modify so that I get the text from every element rather than only one?
find_all returns every matching element, so what you get back is a list, and a list has no .text attribute. Loop over it and read .text from each element:
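    from bs4 import BeautifulSoup

    def getArticleText(webtext):
        soup = BeautifulSoup(webtext)
        divTag = soup.find_all("div", {"class":"dr_article"})
        for tag in divTag:
            # find_all("p") returns a list, so handle each <p> in turn
            for element in tag.find_all("p"):
                pData = element.text
                # print inside the inner loop so every paragraph is shown
                print pData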
Or you can select each element separately by index. With N matching elements, the valid indices run from 0 to N - 1; index N would raise an IndexError:

    tag.find_all("p")[0].text
    tag.find_all("p")[1].text
    tag.find_all("p")[..].text
    tag.find_all("p")[N - 1].text
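If you want the text of all the paragraphs as a single string rather than printing them one by one, something like the following should work. This is only a sketch written for Python 3: the function name get_article_text, the sample HTML, and the explicit "html.parser" argument are my own choices for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup


def get_article_text(webtext):
    """Return the text of every <p> inside div.dr_article, one paragraph per line."""
    # Assumption: use the parser bundled with Python so no extra install is needed.
    soup = BeautifulSoup(webtext, "html.parser")
    paragraphs = []
    for div in soup.find_all("div", {"class": "dr_article"}):
        # find_all returns a list-like result, so read the text element by element.
        for p in div.find_all("p"):
            paragraphs.append(p.get_text())
    return "\n".join(paragraphs)


# Hypothetical sample input, only for demonstration.
sample_html = """
<div class="dr_article"><p>First paragraph.</p><p>Second paragraph.</p></div>
<div class="other"><p>Not part of the article.</p></div>
"""

print(get_article_text(sample_html))
```

Returning the joined string instead of printing inside the loop keeps the function reusable, for example if the text is passed on to further processing.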