
BeautifulSoup partial div class matching

I need to fetch milestone information from GitHub by scraping. The milestone information is embedded in divs with one of two class values: table-list-item milestone notdue and table-list-item milestone.

How can I retrieve the information contained in both classes?

I have milestones = soup.find_all('div', {'class': 'table-list-item milestone'}), but this finds none of the table-list-item milestone notdue divs.

Right now I am doing the following (ugly hack):

milestones = soup.find_all('div', {'class': 'table-list-item milestone'})
milestones.extend(soup.find_all('div', {'class': 'table-list-item milestone notdue'}))

Is there a more elegant solution for this?

As per this question, BeautifulSoup is supposed to return all matching ones. My issue is exactly the opposite!

okkhoy asked Mar 25 '26 17:03



1 Answer

soup.find_all('div', {'class': 'milestone'})

Or use a CSS selector:

soup.select('.milestone')

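In bs4, class is a multi-valued attribute.

Its value is stored as a list, here ['table-list-item', 'milestone', 'notdue'] and ['table-list-item', 'milestone']. A multi-class string such as 'table-list-item milestone' is compared against the exact value of the class attribute, so it does not match the notdue variant.

What you need to do is search for a value the two lists share, such as milestone.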
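To see the difference side by side, here is a minimal sketch. The HTML snippet and the milestone names in it are made up for illustration and only imitate the class values from the question; the real GitHub markup may differ. It assumes bs4 is installed and uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the milestones page (not real GitHub HTML).
html = """
<div class="table-list-item milestone notdue">Milestone A</div>
<div class="table-list-item milestone">Milestone B</div>
<div class="table-list-item">Not a milestone</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so bs4 exposes it as a list of separate values.
first_classes = soup.div["class"]

# A multi-class string is matched against the exact attribute value,
# so the "notdue" div is missed.
exact = soup.find_all("div", {"class": "table-list-item milestone"})

# A single shared class matches any div whose class list contains it.
shared = soup.find_all("div", class_="milestone")

# CSS selector requiring both classes, in any order, extra classes allowed.
selected = soup.select("div.table-list-item.milestone")

exact_texts = [d.get_text() for d in exact]
shared_texts = [d.get_text() for d in shared]
selected_texts = [d.get_text() for d in selected]

print(first_classes)
print(exact_texts)
print(shared_texts)
print(selected_texts)
```

The selector form is worth considering if milestone alone could also appear on unrelated elements, since it requires both table-list-item and milestone while still tolerating extra classes such as notdue.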

宏杰李 answered Mar 29 '26 11:03
