I would like to scrape a table from the Ligue 1 football website, specifically the table that contains information on cards and referees:
http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1
I am using the following code:
import requests
from bs4 import BeautifulSoup
import csv
r = requests.get("http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1")
soup = BeautifulSoup(r.content, "html.parser")
table = soup.find_all('table')
This returns another table from somewhere else in the HTML. I have tried to work around this by indexing the result with [0], [1], etc. after the find_all call, but that returns nothing. I have also searched for tr and td but get similar results. I have no idea why Beautiful Soup ignores this table.
The table I am looking for is in the HTML code below
<table>
<thead>
<tr>
<th class="{sorter: false} hide position">Position</th>
<th class="{sorter: false} joueur">Referees</th>
<th class="chiffre header"><span class="icon icon_carton_jaune">Yellow card</span></th>
<th class="chiffre header"><span class="icon icon_carton_rouge">Red card</span></th>
<th class="chiffre header">Matches</th>
</tr>
</thead>
<tbody><tr>
<td class="position"></td>
<td class="joueur">Benoît BASTIEN</td>
<td class="chiffre"><a href="/stats_arbitre_details/245">25</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/245">4</a></td>
<td class="chiffre">8</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Hakim BEN EL HADJ</td>
<td class="chiffre"><a href="/stats_arbitre_details/259">55</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/259">4</a></td>
<td class="chiffre">10</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Wilfried BIEN</td>
<td class="chiffre"><a href="/stats_arbitre_details/162">44</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/162">3</a></td>
<td class="chiffre">9</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Ruddy BUQUET</td>
<td class="chiffre"><a href="/stats_arbitre_details/269">33</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/269">2</a></td>
<td class="chiffre">7</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Tony CHAPRON</td>
<td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td>
<td class="chiffre">8</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Amaury DELERUE</td>
<td class="chiffre"><a href="/stats_arbitre_details/343">30</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/343">0</a></td>
<td class="chiffre">6</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Saïd ENNJIMI</td>
<td class="chiffre"><a href="/stats_arbitre_details/113">27</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/113">1</a></td>
<td class="chiffre">6</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Fredy FAUTREL</td>
<td class="chiffre"><a href="/stats_arbitre_details/338">25</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/338">2</a></td>
<td class="chiffre">8</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Antony GAUTIER</td>
<td class="chiffre"><a href="/stats_arbitre_details/331">31</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/331">8</a></td>
<td class="chiffre">9</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Johan HAMEL</td>
<td class="chiffre"><a href="/stats_arbitre_details/334">43</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/334">7</a></td>
<td class="chiffre">9</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Lionel JAFFREDO</td>
<td class="chiffre"><a href="/stats_arbitre_details/124">40</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/124">2</a></td>
<td class="chiffre">9</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Stéphane JOCHEM</td>
<td class="chiffre"><a href="/stats_arbitre_details/294">33</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/294">4</a></td>
<td class="chiffre">8</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Stéphane LANNOY</td>
<td class="chiffre"><a href="/stats_arbitre_details/127">24</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/127">0</a></td>
<td class="chiffre">6</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Mikael LESAGE</td>
<td class="chiffre"><a href="/stats_arbitre_details/286">38</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/286">3</a></td>
<td class="chiffre">9</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Jérôme MIGUELGORRY</td>
<td class="chiffre"><a href="/stats_arbitre_details/239">32</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/239">1</a></td>
<td class="chiffre">10</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Benoît MILLOT</td>
<td class="chiffre"><a href="/stats_arbitre_details/287">43</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/287">0</a></td>
<td class="chiffre">11</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Sébastien MOREIRA</td>
<td class="chiffre"><a href="/stats_arbitre_details/148">38</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/148">5</a></td>
<td class="chiffre">10</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Nicolas RAINVILLE</td>
<td class="chiffre"><a href="/stats_arbitre_details/188">40</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/188">7</a></td>
<td class="chiffre">10</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Frank SCHNEIDER</td>
<td class="chiffre"><a href="/stats_arbitre_details/247">33</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/247">4</a></td>
<td class="chiffre">10</td>
</tr>
<tr class="odd">
<td class="position"></td>
<td class="joueur">Clément TURPIN</td>
<td class="chiffre"><a href="/stats_arbitre_details/333">26</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/333">3</a></td>
<td class="chiffre">8</td>
</tr>
<tr>
<td class="position"></td>
<td class="joueur">Bartolomeu VARELA</td>
<td class="chiffre"><a href="/stats_arbitre_details/288">35</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/288">3</a></td>
<td class="chiffre">9</td>
</tr>
</tbody></table>
I have also tried searching for td with a specific class, which should work, but it cannot pick out the table in the first place.
The problem (I assume) is that you are looking at the HTML generated by the browser. What you are missing is that the table is added to the page by JavaScript.
You can confirm this in Chrome (or any other browser): instead of "Inspect", use "View Page Source", and you will see that the table is not in the server response.
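The same check can be done from Python instead of the browser. This is only a rough sketch: the helper name response_has_table and the default marker are my own choices, picked because the referee cells in the question's HTML carry class="joueur".

```python
def response_has_table(html: str, marker: str = 'class="joueur"') -> bool:
    """Return True if the raw server response contains the referee cells.

    `marker` is an assumption: any string that only appears in the
    table you are after (a class name, a referee name) will do.
    """
    return marker in html


# Usage against the live page (needs the third-party `requests` package):
#   import requests
#   r = requests.get("http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1")
#   print(response_has_table(r.text))
```

If this prints False for the page URL, the table is not in the initial response, and Beautiful Soup never sees it.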
The URL the page calls is "http://www.ligue1.com/stats_arbitre?competition=D1", but there is a catch: you must indicate via an HTTP header that the request is an XHR. If you open this URL directly in the browser, you will get a 500 response.
Try this curl example to check that it returns the table you want:
curl --header "X-Requested-With: XMLHttpRequest" "http://www.ligue1.com/stats_arbitre?competition=D1"
In your code, do this:
import requests
from bs4 import BeautifulSoup
import csv
headers = {'X-Requested-With': 'XMLHttpRequest'}
r = requests.get('http://www.ligue1.com/stats_arbitre?competition=D1', headers=headers)
...
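For the part after the request, something along these lines might work. It is a sketch that assumes the XHR response has the same markup as the table in the question (referee name in td.joueur, then three td.chiffre cells for yellow cards, red cards and matches). The function names parse_referee_rows and rows_to_csv and the CSV column names are made up for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup


def parse_referee_rows(html):
    """Return one dict per referee row found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        name = tr.find("td", class_="joueur")
        if name is None:
            continue  # header row (uses <th>) or an unrelated row
        # Assumed cell order, as in the question: yellow, red, matches.
        numbers = [td.get_text(strip=True)
                   for td in tr.find_all("td", class_="chiffre")]
        if len(numbers) != 3:
            continue  # layout differs from what this sketch expects
        yellow, red, matches = (int(n) for n in numbers)
        rows.append({
            "referee": name.get_text(strip=True),
            "yellow": yellow,
            "red": red,
            "matches": matches,
        })
    return rows


def rows_to_csv(rows):
    """Serialise the parsed rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["referee", "yellow", "red", "matches"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the r from the request above, you would call parse_referee_rows(r.text) and write the result of rows_to_csv to a file. Using r.text rather than r.content lets requests handle the decoding, which matters for the accented names.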
Hope it helps