BeautifulSoup and class with spaces

With BeautifulSoup and Python, I want to use find_all to get all the tr elements whose class attribute contains multiple names, like this one:

<tr class="admin-bookings-table-row bookings-history-row  paid   ">

I have tried several ways to match that class, including regular expressions and wildcards, but I always get an empty list.

Is there a way to match this class, whether with regular expressions, wildcards, or some other approach?

The same question has been posted here, but it has no answer.

RuBiCK asked Oct 12 '17 20:10


2 Answers

You can use a CSS selector to match several classes at once:

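from bs4 import BeautifulSoup

html = '''
<tr class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr class="admin-bookings-table-row  nope  paid   "></tr>
'''
# Parse the markup (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(html, 'lxml')

# Chained class selectors match a tr that carries all three classes,
# regardless of their order or the spacing inside the attribute
res = soup.select('tr.admin-bookings-table-row.bookings-history-row.paid')
print(res)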

>>> [<tr class="admin-bookings-table-row bookings-history-row paid "></tr>]

Otherwise, this answer may also help: https://stackoverflow.com/a/46719501/6655211
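
If a CSS selector is not convenient, one possible alternative is to pass a function to find_all and compare sets of class names. This is only a sketch: the helper name has_all_classes and the wanted set are made up for illustration, and html.parser is used here instead of lxml to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = '''
<tr class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr class="admin-bookings-table-row  nope  paid   "></tr>
'''
soup = BeautifulSoup(html, 'html.parser')

# Hypothetical set of class names that a row must carry
wanted = {'admin-bookings-table-row', 'bookings-history-row', 'paid'}

def has_all_classes(tag):
    # tag.get('class', []) is a list of class names (empty if there is no class)
    return tag.name == 'tr' and wanted.issubset(tag.get('class', []))

rows = soup.find_all(has_all_classes)
print(rows)
```

Because the comparison works on the parsed list of names, extra spaces and the order of the classes in the attribute should not matter.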

PRMoureu answered Oct 21 '22 22:10


An HTML class name can't contain spaces. The class attribute holds a space-separated list of names, so this element has multiple classes.

Searching by any one of these classes works:

from bs4 import BeautifulSoup

html = '<tr id="history_row_938220" style="" class="admin-bookings-table-row bookings-history-row  paid   ">'


soup = BeautifulSoup(html, 'html.parser')

print(soup.find_all(attrs={'class': 'admin-bookings-table-row'}))
print(soup.find_all(attrs={'class': 'bookings-history-row'}))
print(soup.find_all(attrs={'class': 'paid'}))

All three calls print the same output:

[<tr class="admin-bookings-table-row bookings-history-row paid " 
 id="history_row_938220" style=""></tr>]
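
These searches work because BeautifulSoup treats class as a multi-valued attribute and splits it on whitespace. The sketch below, which adds a second made-up row for contrast, prints the parsed list and then applies a regular expression to the individual class names, which may explain why a pattern written against the whole attribute string comes back empty.

```python
import re
from bs4 import BeautifulSoup

# The second row is invented here purely for contrast
html = '''
<tr id="history_row_938220" class="admin-bookings-table-row bookings-history-row  paid   "></tr>
<tr class="admin-bookings-table-row  nope  paid   "></tr>
'''
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('tr')
# class is a list of names; drop empty entries in case an old version keeps them
classes = [name for name in first['class'] if name]
print(classes)

# The pattern is tested against the individual class names
history_rows = soup.find_all('tr', class_=re.compile(r'^bookings-'))
print(len(history_rows))
```

Only the first row has a class name starting with bookings-, so only that row should be matched.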
DeepSpace answered Oct 21 '22 22:10