I have extracted data wrapped in multiple HTML p tags from a webpage using BeautifulSoup4 and stored it in a list. However, all of the extracted text ends up in a single list element. I want each piece of extracted data to be a separate list element, separated by commas.
HTML content structure:
<ul>
<li>
<p>
<span class="TextRun">
<span class="NormalTextrun"> Data 1 </span>
</span>
</p>
</li>
<li>
<p>
<span class="TextRun">
<span class="NormalTextrun"> Data 2 </span>
</span>
</p>
</li>
<li>
<p>
<span class="TextRun">
<span class="NormalTextrun"> Data 3 </span>
</span>
</p>
</li>
</ul>
Code to extract:
for data in elem.find_all('span', class_="TextRun"):
    data = ''.join([' '.join(item.text.split()) for item in elem.select(".NormalTextRun")])
    data = data.replace(u'\xa0', '')
    events_parsed_thisweek.append(data)
print (events_parsed_thisweek)
Current output: [Data1Data2Data3]
Expected output: [Data1, Data2, Data3]
Any help is much appreciated!
data = [x.text.strip() for x in elem.find_all('span', {'class': 'NormalTextrun'})]
Printing data will give you: ['Data 1', 'Data 2', 'Data 3']
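For reference, here is a minimal self-contained sketch of the same idea, run against the HTML from the question. The original loop merges everything because ''.join(...) concatenates the text of every matching span into one string before appending it, whereas a list comprehension keeps one entry per span. The names html and soup are assumptions made for this example (the question only shows elem), and the built-in "html.parser" backend is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; `html` and `soup` are
# illustrative names, not taken from the original code.
html = """
<ul>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 1 </span></span></p></li>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 2 </span></span></p></li>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 3 </span></span></p></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One list element per inner span. split() + join collapses runs of
# whitespace (including non-breaking spaces) into single spaces.
events_parsed_thisweek = [
    " ".join(span.get_text().split())
    for span in soup.find_all("span", class_="NormalTextrun")
]

# Optional: a display string without the quotation marks.
formatted = "[" + ", ".join(events_parsed_thisweek) + "]"

print(events_parsed_thisweek)
print(formatted)
```

Note that the class name is matched exactly as written in the markup ("NormalTextrun"), so it is worth checking the capitalization against the page source.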
I think what @Sagun Shrestha suggests works. To target the inner span and strip the extra spaces, you could try the following (b here is presumably your parsed BeautifulSoup object):
data = [s.text.strip() for s in b.find_all('span', class_='NormalTextrun')]
print(data)
If you specifically want the output as a string without the quotation marks, you can try this:
data = [s.text.strip() for s in b.find_all('span', class_='NormalTextrun')]
print('[', ', '.join(data), ']', sep='')
Hope it's what you want.