
Parsing table data using BeautifulSoup

I am trying to parse this webpage.

Each player page lists ability stats. I eventually want to parse all abilities into a dict, e.g. {'corners': 15, 'crossing': 15, ...}

I started by trying to parse a single stat, corners:

from bs4 import BeautifulSoup as bs
import requests
url = 'https://fmdataba.com/19/p/1165/lionel-messi/'
page = requests.get(url)
soup = bs(page.content, 'html.parser')
print(soup.prettify())
soup.find({"id": "fm_cro"})

but this returns an empty list.

Could anyone please help?

Dawn17 asked May 06 '26 13:05


2 Answers

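With bs4 4.7.1 or later you can use :nth-child(odd) and :nth-child(even) to pick the name and value tds within each row and build an inner dict, and use :has and :contains to locate the right table for each keyword. The outer dict then holds one inner dict per keyword. (Newer soupsieve releases rename :contains to :-soup-contains and deprecate the old spelling.)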

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://fmdataba.com/19/p/1165/lionel-messi/', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
abilities = ['TECHNICAL', 'MENTAL' , 'PHYSICAL']

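def get_abilities(soup, keyword):
    # table inside the div that directly follows the div whose h3 contains the keyword
    table = soup.select_one('div:has(h3:contains("' + keyword + '")) + div > table')
    # odd td = stat name, even td = stat value
    d = {item.select_one('td:nth-child(odd)').text: int(item.select_one('td:nth-child(even)').text) for item in table.select('tr')}
    return d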

results = {}

for ability in abilities:
    results[ability] = get_abilities(soup, ability)

print(results)  

Output: a nested dict with one inner dict per keyword (TECHNICAL, MENTAL, PHYSICAL), each mapping stat names to integer values.
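For the single-stat lookup in the question, it may help to note that the first positional argument of find is the tag-name filter, so a dict passed there is not treated as an attribute filter. A minimal sketch of the attribute-based alternatives follows; the HTML fragment, the placement of the id on a td, and the value 15 are all assumptions made for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Made-up fragment; the real page's markup may differ.
html = '<table><tr><td>Corners</td><td id="fm_cro">15</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Three equivalent ways to look an element up by its id attribute.
by_kwarg = soup.find(id="fm_cro")
by_attrs = soup.find(attrs={"id": "fm_cro"})
by_css = soup.select_one("#fm_cro")

# Convert the cell text to an integer stat value.
value = int(by_kwarg.text)
print(value)
```

On the live site the request may also need the User-Agent header used in the answers here.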


CSS explanation:

The CSS selector line is:

soup.select_one('div:has(h3:contains("' + ability + '")) + div > table')

select_one is like select in that it applies the css selector within to the soup object but only returns the first match.
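The difference between select and select_one can be seen on a tiny fragment. This is only a sketch; the HTML and its values are made up for the example.

```python
from bs4 import BeautifulSoup

# Made-up fragment with two cells in one row.
html = "<table><tr><td>Corners</td><td>15</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

all_cells = soup.select("td")       # list-like result with every match
first_cell = soup.select_one("td")  # only the first match
missing = soup.select_one("th")     # None when nothing matches

print([cell.text for cell in all_cells])
print(first_cell.text)
print(missing)
```

Because select_one can return None, it is worth checking the result before calling methods on it when scraping a page whose layout may change.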

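:has and :contains are pseudo-classes, like :nth-child(). Reading the selector left to right for the first ability table:

  1. div:has(h3:contains("TECHNICAL")) matches a div that contains an h3 whose text includes TECHNICAL.
  2. + div is the adjacent sibling combinator: it moves to the div immediately following that one.
  3. > table is the child combinator: it takes a table that is a direct child of that second div.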
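The selector can be tried on a small hand-written fragment. The sketch below assumes a layout of heading div followed by table div, which is what the selector implies; the fragment and its values are invented, and it uses :-soup-contains, the newer spelling of :contains in recent soupsieve releases.

```python
from bs4 import BeautifulSoup

# Invented fragment: each heading div is followed by a div holding a table.
html = (
    '<div><h3>TECHNICAL</h3></div>'
    '<div><table>'
    '<tr><td>Corners</td><td>15</td></tr>'
    '<tr><td>Crossing</td><td>16</td></tr>'
    '</table></div>'
    '<div><h3>MENTAL</h3></div>'
    '<div><table><tr><td>Vision</td><td>20</td></tr></table></div>'
)
soup = BeautifulSoup(html, "html.parser")

def table_to_dict(soup, keyword):
    # div with a matching h3 -> next sibling div -> its direct child table
    selector = 'div:has(h3:-soup-contains("' + keyword + '")) + div > table'
    table = soup.select_one(selector)
    # first (odd) td is the stat name, second (even) td is the value
    return {
        row.select_one("td:nth-child(odd)").text:
            int(row.select_one("td:nth-child(even)").text)
        for row in table.select("tr")
    }

technical = table_to_dict(soup, "TECHNICAL")
mental = table_to_dict(soup, "MENTAL")
print(technical, mental)
```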


Additional reading:

  1. Pseudo class selectors
  2. Adjacent sibling combinator
  3. Child combinator
  4. CSS selectors (general)
  5. select_one

QHarr answered May 08 '26 03:05


You can also use pandas:

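from io import StringIO

import pandas as pd
import requests

url = 'https://fmdataba.com/19/p/1165/lionel-messi/'
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

# read_html returns one DataFrame per <table>; wrapping the text in StringIO
# avoids the literal-string deprecation in recent pandas versions
tables = pd.read_html(StringIO(page.text))
all_data = {}
for idx, name in [(2, 'TECHNICAL'), (3, 'MENTAL'), (4, 'PHYSICAL')]:
    tbl = tables[idx]
    # first column holds the stat name, second column its value
    data = {r.iloc[0]: r.iloc[1] for _, r in tbl.iterrows()}
    all_data[name] = data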

tables[2] is the TECHNICAL table, tables[3] is the MENTAL table and tables[4] is the PHYSICAL table.
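The table-to-dict step can also be written without iterrows. The sketch below uses a hand-built DataFrame as a stand-in for one entry of tables; the two unnamed columns and the values are assumptions for illustration, not data from the site.

```python
import pandas as pd

# Stand-in for tables[2]: column 0 is the stat name, column 1 its value (made up).
tbl = pd.DataFrame([["Corners", 15], ["Crossing", 16], ["Dribbling", 20]])

# Pair the two columns positionally and convert values to plain Python ints.
data = {name: int(value) for name, value in zip(tbl.iloc[:, 0], tbl.iloc[:, 1])}
print(data)
```

Using iloc keeps the lookup positional, so it does not depend on how read_html happens to label the columns.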

spadarian answered May 08 '26 03:05


