
Can't Scrape a Specific Table using BeautifulSoup4 (Python 3)

I would like to scrape a table from the Ligue 1 football website, specifically the table that contains information on cards and referees.

http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1

I am using the following code:

import requests
from bs4 import BeautifulSoup
import csv

r = requests.get("http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1")

soup = BeautifulSoup(r.content, "html.parser")
table = soup.find_all('table')

This returns a different table from elsewhere in the HTML. I have tried to work around this by indexing the result of find_all with [0], [1], etc., but that returns nothing. I have also searched for tr and td elements, with similar results. I have no idea why Beautiful Soup ignores this table.

The table I am looking for is in the HTML code below

<table>
<thead>
  <tr>
    <th class="{sorter: false} hide position">Position</th>
    <th class="{sorter: false} joueur">Referees</th>
    <th class="chiffre header"><span class="icon icon_carton_jaune">Yellow card</span></th>
    <th class="chiffre header"><span class="icon icon_carton_rouge">Red card</span></th>
    <th class="chiffre header">Matches</th>
  </tr>
</thead>
    <tbody><tr>
  <td class="position"></td>
  <td class="joueur">Benoît BASTIEN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/245">25</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/245">4</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Hakim BEN EL HADJ</td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">55</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">4</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Wilfried BIEN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">44</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">3</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Ruddy BUQUET</td>
  <td class="chiffre"><a href="/stats_arbitre_details/269">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/269">2</a></td>
  <td class="chiffre">7</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Tony CHAPRON</td>
  <td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Amaury DELERUE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/343">30</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/343">0</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Saïd ENNJIMI</td>
  <td class="chiffre"><a href="/stats_arbitre_details/113">27</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/113">1</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Fredy FAUTREL</td>
  <td class="chiffre"><a href="/stats_arbitre_details/338">25</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/338">2</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Antony GAUTIER</td>
  <td class="chiffre"><a href="/stats_arbitre_details/331">31</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/331">8</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Johan HAMEL</td>
  <td class="chiffre"><a href="/stats_arbitre_details/334">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/334">7</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Lionel JAFFREDO</td>
  <td class="chiffre"><a href="/stats_arbitre_details/124">40</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/124">2</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Stéphane JOCHEM</td>
  <td class="chiffre"><a href="/stats_arbitre_details/294">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/294">4</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Stéphane LANNOY</td>
  <td class="chiffre"><a href="/stats_arbitre_details/127">24</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/127">0</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Mikael LESAGE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/286">38</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/286">3</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Jérôme MIGUELGORRY</td>
  <td class="chiffre"><a href="/stats_arbitre_details/239">32</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/239">1</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Benoît MILLOT</td>
  <td class="chiffre"><a href="/stats_arbitre_details/287">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/287">0</a></td>
  <td class="chiffre">11</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Sébastien MOREIRA</td>
  <td class="chiffre"><a href="/stats_arbitre_details/148">38</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/148">5</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Nicolas RAINVILLE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/188">40</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/188">7</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Frank SCHNEIDER</td>
  <td class="chiffre"><a href="/stats_arbitre_details/247">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/247">4</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Clément TURPIN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/333">26</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/333">3</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Bartolomeu VARELA</td>
  <td class="chiffre"><a href="/stats_arbitre_details/288">35</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/288">3</a></td>
  <td class="chiffre">9</td>
</tr>
</tbody></table>

I have also tried searching for td elements with a specific class, which should work, but it cannot find the table in the first place.

Richard Hudson asked Oct 31 '22 13:10



1 Answer

The problem is that (I assume) you are looking at the HTML generated by the browser; what you are missing is that the table is added to the page by JavaScript.

You can confirm this in Chrome (or any other browser): instead of "Inspect", use "View Page Source", and you will see that the server response contains no such table.
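The same check can be done from Python. This is only a minimal sketch: the helper name has_referee_cells and the two HTML strings are made up for illustration, standing in for the raw server response and for the DOM the browser shows after JavaScript has run.

```python
from bs4 import BeautifulSoup


def has_referee_cells(html):
    """Return True if the markup contains at least one <td class="joueur">."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("td", class_="joueur") is not None


# Hypothetical stand-in for what the server sends: no table yet.
static_html = "<html><body><div id='stats'></div></body></html>"

# Hypothetical stand-in for the DOM after the script has injected the table.
rendered_html = "<table><tr><td class='joueur'>Tony CHAPRON</td></tr></table>"

print(has_referee_cells(static_html))
print(has_referee_cells(rendered_html))
```

In practice you would pass requests.get(url).text to the helper; if it returns False for the page URL, the table is not in the server response.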

The URL the page calls is "http://www.ligue1.com/stats_arbitre?competition=D1", but there is a catch: you must indicate via HTTP headers that the request is an XHR. If you open this URL directly in the browser, you will get a 500 response.

Try this curl example to check that it returns the table you want.

curl --header "X-Requested-With: XMLHttpRequest" "http://www.ligue1.com/stats_arbitre?competition=D1"

In your code, do this:

import requests
from bs4 import BeautifulSoup
import csv

headers = {'X-Requested-With': 'XMLHttpRequest'}
r = requests.get('http://www.ligue1.com/stats_arbitre?competition=D1', headers=headers)

...
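What comes after the request is up to you. As a rough sketch, assuming the XHR response has the same structure as the table in the question, the rows could be pulled out and written as CSV like this. The function name parse_referee_rows, the CSV column names, and SAMPLE_HTML (two rows copied from the question) are illustrative, not part of the site or of any library.

```python
import csv
import io

from bs4 import BeautifulSoup

# Two rows copied from the table in the question, used here instead of r.content.
SAMPLE_HTML = """
<table>
<thead>
  <tr><th class="joueur">Referees</th><th class="chiffre header">Matches</th></tr>
</thead>
<tbody>
<tr class="odd">
  <td class="position"></td>
  <td class="joueur">Hakim BEN EL HADJ</td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">55</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">4</a></td>
  <td class="chiffre">10</td>
</tr>
<tr>
  <td class="position"></td>
  <td class="joueur">Wilfried BIEN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">44</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">3</a></td>
  <td class="chiffre">9</td>
</tr>
</tbody>
</table>
"""


def parse_referee_rows(html):
    """Return a list of (referee, yellow, red, matches) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        name = tr.find("td", class_="joueur")
        numbers = tr.find_all("td", class_="chiffre")
        # Skip the header row (it uses <th>) and anything unexpected.
        if name is None or len(numbers) != 3:
            continue
        yellow, red, matches = (int(td.get_text(strip=True)) for td in numbers)
        rows.append((name.get_text(strip=True), yellow, red, matches))
    return rows


rows = parse_referee_rows(SAMPLE_HTML)

# Write to an in-memory buffer; swap in open("referees.csv", "w", newline="")
# to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["referee", "yellow", "red", "matches"])
writer.writerows(rows)
print(buffer.getvalue())
```

With the real response you would call parse_referee_rows(r.content) instead of using SAMPLE_HTML.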

Hope it helps

Nodiel Clavijo Llera answered Nov 08 '22 15:11
