
Python: How to get a table row by a string inside it via BeautifulSoup?

<tr class="list even">
    <td class="list">5</td>
    <td class="list"><s>BI</s>→MU</td>
    <td class="list"><s>TEACHER</s>→TEACHER</td>
    <td class="list">Hello I am a Text</td>
    <td class="list">5b</td>
    <td class="list">BI3</td></tr>

This is one of the table rows. Some rows contain a single cell that acts as an inline header, but I don't care about those.

I want to get only the rows that contain the string "8f": not just the matching td, but the whole tr. If several rows contain the string, I want a list of them.

for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row)
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)

for row in soup.find_all('tr', class_='list odd'):
    if '5b' in row.text:
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)

I have this now, but it adds a newline before the last table field: https://haste.thevillage.chat/foguvakixa.py

if "5b" not in cell.text:

This check is there because if I request the data for 5d, I don't need to be told again that it is 5d. It just filters out the class cell itself.

Niwla23 asked Nov 30 '25 07:11


2 Answers

You could use pandas read_html to grab the table, then filter on the class column, (Klasse(n)).

import pandas as pd

def get_lectures_two(df, klasse):    
    new_df = df[df['(Klasse(n))'] == klasse]
    return new_df

def get_df(url):
    df = pd.read_html(url)[0]
    df = df[~df['Stunde'].str.contains("LEHRER")]
    return df

df = get_df('https://niwla23.gitlab.io/download/vertreterdemo.html')
print(get_lectures_two(df, '5b'))

With bs4 4.7.1+ you can use :contains and :has, along with the appropriate column index via nth-of-type, to target the appropriate rows. In newer Soup Sieve releases :contains is deprecated in favour of :-soup-contains. (I use pandas here just to quickly generate a nice tabular output for viewing; you already have the list of lists from bs4 and could, for example, write it out with csv.)

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

def get_lectures(klasse):
    rows = []
    for row in soup.select(f'.mon_list tr:has(td:nth-of-type(5):contains("{klasse}"))'):
        rows.append([td.text.replace('\xa0','') for td in row.select('td')])
    return rows

r = requests.get('https://niwla23.gitlab.io/download/vertreterdemo.html')
soup = bs(r.content, 'lxml')
headers = [th.text for th in soup.select('th.list')]
klasse = '5b'

df = pd.DataFrame(get_lectures(klasse), columns = headers)
print(df)
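
For anyone who wants to try the selector without hitting the live page, here is a minimal sketch against inline markup. The sample HTML, the function name rows_for_class and the column position (4 in this cut-down table) are assumptions made for illustration, and it uses :-soup-contains, which needs a reasonably recent Soup Sieve.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the row in the question.
SAMPLE_HTML = """
<table class="mon_list">
<tr class="list even">
    <td class="list">5</td>
    <td class="list">MU</td>
    <td class="list">Hello I am a Text</td>
    <td class="list">5b</td>
    <td class="list">BI3</td></tr>
<tr class="list odd">
    <td class="list">6</td>
    <td class="list">MA</td>
    <td class="list">Another text</td>
    <td class="list">8f</td>
    <td class="list">R12</td></tr>
</table>
"""


def rows_for_class(soup, klasse, column=4):
    """Return the cell texts of every row whose `column`-th td contains klasse."""
    selector = (
        f'.mon_list tr:has(td:nth-of-type({column}):-soup-contains("{klasse}"))'
    )
    return [
        [td.get_text(strip=True) for td in row.select("td")]
        for row in soup.select(selector)
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = rows_for_class(soup, "5b")
print(result)
```

Note that the match is a substring match, so "5b" would also select a cell containing "15b".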
QHarr answered Dec 02 '25 22:12


Try the following code. It fetches each row's text and checks whether it contains 5b.

from bs4 import BeautifulSoup
import requests
res=requests.get("http://niwla23.gitlab.io/download/vertreterdemo.html")
soup=BeautifulSoup(res.text,'lxml')

for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row.text)
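
The loop above only looks at the "list even" rows and prints instead of collecting. A possible variation, sketched here under the assumption that every data row carries the class "list" (as in the question's markup), returns the matching tr elements as a list and strips whitespace from each cell, which may also help with the stray newline mentioned in the question. The sample HTML and the helper names rows_containing and other_cells are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the row in the question.
SAMPLE_HTML = """
<table class="mon_list">
<tr class="list even">
    <td class="list">5</td>
    <td class="list">MU</td>
    <td class="list">Hello I am a Text</td>
    <td class="list">5b</td>
    <td class="list">BI3</td></tr>
<tr class="list odd">
    <td class="list">6</td>
    <td class="list">MA</td>
    <td class="list">Another text</td>
    <td class="list">8f</td>
    <td class="list">R12</td></tr>
</table>
"""


def rows_containing(soup, needle):
    """Return every <tr> with class "list" (even or odd) whose text contains needle."""
    return [row for row in soup.select("tr.list") if needle in row.get_text()]


def other_cells(row, needle):
    """Return the stripped text of each cell that does not itself contain needle."""
    return [
        td.get_text(strip=True)
        for td in row.find_all("td")
        if needle not in td.get_text()
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
matches = rows_containing(soup, "5b")  # a list of whole <tr> Tag objects
for row in matches:
    print(other_cells(row, "5b"))
```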
KunduK answered Dec 02 '25 21:12


