Parsing table data using BeautifulSoup

Question

I am trying to parse this webpage: https://fmdataba.com/19/p/1165/lionel-messi/

Each player page lists the ability stats. I eventually want to parse all abilities into a single object, e.g. {'corners': 15, 'crossing': 15, ...}

I started by trying to parse a single stat, corners:

from bs4 import BeautifulSoup as bs
import requests
url = 'https://fmdataba.com/19/p/1165/lionel-messi/'
page = requests.get(url)
soup = bs(page.content, 'html.parser')
print(soup.prettify())
soup.find({"id": "fm_cro"})

but this returns an empty list.

Could anyone please help?


QHarr · Accepted Answer

With bs4 4.7.1 or later you can use :nth-child(odd) and :nth-child(even) to pick out the name and value tds within each row and build an inner dict from them. Use :has and :contains to select the right table for each keyword, then build an outer dict that maps each keyword to its inner dict.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://fmdataba.com/19/p/1165/lionel-messi/', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
abilities = ['TECHNICAL', 'MENTAL' , 'PHYSICAL']

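def get_abilities(soup, keyword):
    # Find the div that holds the heading, then the table in the div right after it.
    # Newer Soup Sieve releases spell :contains as :-soup-contains.
    table = soup.select_one('div:has(h3:contains("' + keyword + '")) + div > table')
    # The first td in each row is the stat name, the second is its integer value.
    d = {item.select_one('td:nth-child(odd)').text: int(item.select_one('td:nth-child(even)').text) for item in table.select('tr')}
    return d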

results = {}

for ability in abilities:
    results[ability] = get_abilities(soup, ability)

print(results)

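The row-to-dict step can be tried without hitting the site. The following is a minimal sketch on a hypothetical two-row table (the HTML and the values in it are made up for illustration, and the real page markup may differ); it uses the same :nth-child(odd) and :nth-child(even) selectors as the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column table, shaped like the ability tables discussed above.
html = """
<table>
  <tr><td>Corners</td><td>15</td></tr>
  <tr><td>Crossing</td><td>15</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

stats = {}
for row in soup.select("tr"):
    # The first (odd) td holds the stat name, the second (even) td its value.
    name = row.select_one("td:nth-child(odd)").get_text(strip=True)
    value = row.select_one("td:nth-child(even)").get_text(strip=True)
    stats[name] = int(value)

print(stats)
```

If a value cell is empty or not numeric, int() raises ValueError, so real pages may need a guard around the conversion.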

CSS explanation:

The CSS selector line is as follows:

soup.select_one('div:has(h3:contains("' + keyword + '")) + div > table')

select_one is like select in that it applies the CSS selector to the soup object, but it returns only the first match (or None if nothing matches) instead of a list of all matches.
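A small sketch of the difference, using a made-up HTML fragment rather than the page from the question:

```python
from bs4 import BeautifulSoup

# Made-up fragment, only to show the return types.
html = "<ul><li>first</li><li>second</li></ul>"
soup = BeautifulSoup(html, "html.parser")

all_items = soup.select("li")        # list containing every matching tag
first_item = soup.select_one("li")   # the first matching tag only
missing = soup.select_one("table")   # None, because nothing matches

print(len(all_items), first_item.text, missing)
```

Because select_one can return None, checking the result before chaining further calls avoids an AttributeError when the page layout changes.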

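:has and :contains are pseudo-classes, like :nth-child(). Reading the selector from left to right for the first ability table:

div:has(h3:contains("TECHNICAL")) matches a div that contains an h3 whose text includes the keyword.
+ div is the adjacent sibling combinator; it moves to the div that immediately follows that one.
> table is the child combinator; it selects a table that is a direct child of that second div.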
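The selector can also be tried offline. This is a rough sketch under stated assumptions: the HTML below is a hypothetical reduction of the layout the selector implies (a heading div followed by a sibling div that holds the table), the values are invented, and find_table is an illustrative helper name. It uses :-soup-contains, the newer Soup Sieve spelling of :contains.

```python
from bs4 import BeautifulSoup

# Hypothetical layout: each heading div is followed by a div holding its table.
html = """
<div><h3>TECHNICAL</h3></div>
<div><table><tr><td>Corners</td><td>15</td></tr></table></div>
<div><h3>MENTAL</h3></div>
<div><table><tr><td>Vision</td><td>20</td></tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")


def find_table(soup, keyword):
    """Return the table in the div right after the heading div, or None."""
    selector = f'div:has(h3:-soup-contains("{keyword}")) + div > table'
    return soup.select_one(selector)


mental = find_table(soup, "MENTAL")
print(mental.get_text(" ", strip=True))
```

Asking for a heading that is not in the fragment, such as find_table(soup, "PHYSICAL"), returns None rather than raising.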

Additional reading:

Pseudo-class selectors
Adjacent sibling combinator
Child combinator
CSS selectors (general)
select_one
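A side note on the attempt in the question: the first positional argument of find is the tag name filter, so a dictionary passed there is not treated as attributes. Attribute lookups go through a keyword argument, the attrs parameter, or a CSS id selector. The sketch below uses a made-up fragment; only the id fm_cro comes from the question, and the cell contents are assumptions.

```python
from bs4 import BeautifulSoup

# Made-up fragment; only the id "fm_cro" is taken from the question.
html = '<table><tr><td>Crossing</td><td id="fm_cro">15</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

by_keyword = soup.find(id="fm_cro")            # attribute as a keyword argument
by_attrs = soup.find(attrs={"id": "fm_cro"})   # attribute dict via attrs=
by_css = soup.select_one("#fm_cro")            # CSS id selector

print(by_keyword.text, by_attrs.text, by_css.text)
```

Note that find returns a single tag or None, not a list; find_all is the variant that returns a list.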

spadarian · Answer

You can also use pandas:

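from io import StringIO

import pandas as pd
import requests

url = 'https://fmdataba.com/19/p/1165/lionel-messi/'
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

# Wrap the HTML in StringIO; recent pandas versions deprecate passing a literal HTML string.
tables = pd.read_html(StringIO(page.text))
all_data = {}
for idx, name in [(2, 'TECHNICAL'), (3, 'MENTAL'), (4, 'PHYSICAL')]:
    tbl = tables[idx]
    # Column 0 holds the stat name, column 1 its value.
    data = {r[0]: r[1] for _, r in tbl.iterrows()}
    all_data[name] = data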

tables[2] is the TECHNICAL table, tables[3] is the MENTAL table and tables[4] is the PHYSICAL table.
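The frame-to-dict step can be checked on its own. This sketch assumes, as the r[0] and r[1] lookups above imply, that each table comes back with integer column labels 0 and 1; the frame here is built by hand with invented values instead of being read from the page.

```python
import pandas as pd

# Hand-built stand-in for one parsed table: column 0 is the name, column 1 the value.
tbl = pd.DataFrame([["Corners", 15], ["Crossing", 15]])

# Pair the two columns directly instead of iterating row by row.
data = dict(zip(tbl[0], tbl[1]))

print(data)
```

Zipping the columns gives the same mapping as the iterrows comprehension and avoids building a Series for every row.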

Tags: python, html, beautifulsoup, web-scraping

Dawn17
