
Extracting from BS4 and storing as list elements in Python

I have extracted data wrapped in multiple HTML p tags from a webpage using BeautifulSoup4 and stored all of it in a list, but the items end up concatenated together. I want each extracted item to be a separate list element.

HTML content structure:

<ul>
   <li>
      <p>
        <span class="TextRun">
          <span class="NormalTextrun"> Data 1 </span>
        </span>
      </p>
   </li>
   <li>
      <p>
        <span class="TextRun">
          <span class="NormalTextrun"> Data 2 </span>
        </span>
      </p>
   </li>
   <li>
      <p>
        <span class="TextRun">
          <span class="NormalTextrun"> Data 3 </span>
        </span>
      </p>
   </li>
</ul>

Code to extract:

for data in elem.find_all('span', class_="TextRun"):
    # The inner comprehension re-joins the text of every .NormalTextrun
    # span in elem on each pass, so all the items run together.
    data = ''.join([' '.join(item.text.split()) for item in elem.select(".NormalTextrun")])
    data = data.replace(u'\xa0', '')
    events_parsed_thisweek.append(data)
    print(events_parsed_thisweek)

Current output: [Data1Data2Data3]

Expected output: [Data1, Data2, Data3]

Any help is much appreciated!

asked by Pooja
2 Answers

data = [x.text.strip() for x in elem.find_all('span', {'class': 'NormalTextrun'})]

Printing data will give you: ['Data 1', 'Data 2', 'Data 3']
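
For reference, here is a minimal self-contained sketch of this approach (the html string, the soup name, and the parser choice are illustrative assumptions, not from the question):

from bs4 import BeautifulSoup

html = """
<ul>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 1 </span></span></p></li>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 2 </span></span></p></li>
  <li><p><span class="TextRun"><span class="NormalTextrun"> Data 3 </span></span></p></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# One list element per inner span; strip() removes the surrounding
# whitespace (non-breaking spaces included) from each item.
data = [x.text.strip() for x in soup.find_all('span', {'class': 'NormalTextrun'})]
print(data)  # ['Data 1', 'Data 2', 'Data 3']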

answered by Keyur Potdar


I think what @Sagun Shrestha suggested works. To deal with the details like the inner span and the extra spaces, maybe you should try:

data = [s.text.strip() for s in b.find_all('span', class_='NormalTextrun')]
print(data)

If you specifically want the output printed as a string without the quotation marks, you can try this:

data = [s.text.strip() for s in b.find_all('span', class_='NormalTextrun')]
print('[', ', '.join(data), ']', sep='')
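
The same selection can also be written with BeautifulSoup's CSS selector interface, select, if you prefer that style (a small variant, assuming b is the BeautifulSoup object from the snippets above):

data = [s.text.strip() for s in b.select('span.NormalTextrun')]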

Hope it's what you want.

answered by gepcel


