Python requests.get() returns broken source code instead of expected source code?

Question

I made a request to a Wikipedia page. Specifically, I need to scrape the "Results" matrix from https://en.wikipedia.org/wiki/2017%E2%80%9318_La_Liga#Results

selectedSeasonPage = requests.get('https://en.wikipedia.org/wiki/2017–18_La_Liga', features='html5lib')

When I run pprint.pprint(selectedSeasonPage.text) and jump to the matrix's source, I can see that it is incomplete.

Snippet of the HTML returned by requests.get():

<table class="wikitable plainrowheaders" style="text-align:center;font-size:100%;">
.
.
<th scope="row" style="text-align:right;"><a href="/wiki/Deportivo_Alav%C3%A9s" title="Deportivo Alavés">Alavés</a></th>
<td style="font-weight: normal;background-color:transparent;">— </td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">2–1</td>

Viewing the HTML returned by requests.get() in a browser confirms that it is incomplete. See this image for reference.

Snippet from the browser's view-source, which is the output I need:

<table class="wikitable plainrowheaders" style="text-align:center;font-size:100%;">
.
.
<a href="/wiki/Deportivo_Alav%C3%A9s" title="Deportivo Alavés">Alavés</a></th>
<td style="font-weight: normal;background-color:transparent;">&#8212;</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">3–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">0–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">0–2</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">2–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">1–0</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">1–2</td>

I'm posting only a sample of the HTML because the entire output is too long to include. I can post more specific parts if required.

My question: how can I get the entire source of the matrix without losing any values?

From what I've read in previous questions, requests fails to return the expected output when part of a page is rendered by JavaScript. But this page seems to be plain HTML and CSS (at least the part I need). I can't use Selenium because I need to scrape multiple pages, so I'd be grateful for a solution using requests or something equivalent.
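Because the matrix is served as plain HTML (the accepted answer below finds the scores directly in `response.text`), plain requests should be enough to scrape several seasons without Selenium. The following is a minimal sketch under stated assumptions: every season page follows the `YYYY–YY_La_Liga` title pattern from the question (with an en dash, U+2013), and `season_url`, `fetch_seasons`, and the User-Agent string are hypothetical names chosen for illustration. Wikimedia's User-Agent policy asks automated clients to identify themselves, so the sketch sets a descriptive header on a shared `requests.Session`.

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"


def season_url(start_year):
    """Build a season page URL, assuming the 'YYYY–YY_La_Liga' title
    pattern from the question (en dash U+2013, not a hyphen)."""
    title = f"{start_year}\u2013{(start_year + 1) % 100:02d}_La_Liga"
    # quote() percent-encodes the en dash as %E2%80%93; '_' stays as-is.
    return WIKI_BASE + quote(title)


def fetch_seasons(start_years, timeout=30):
    """Download several season pages over one requests.Session and return
    {start_year: html}. There is no `features` argument here: that is a
    BeautifulSoup constructor argument, not a requests one."""
    import requests  # third-party; imported here so season_url() needs no install

    pages = {}
    with requests.Session() as session:
        # Hypothetical identifier and contact address; replace with your own.
        session.headers["User-Agent"] = "la-liga-scraper/0.1 (you@example.com)"
        for year in start_years:
            response = session.get(season_url(year), timeout=timeout)
            # Fail loudly on 4xx/5xx instead of parsing an error page.
            response.raise_for_status()
            pages[year] = response.text
    return pages
```

For example, `fetch_seasons([2016, 2017])` would return the raw HTML of the 2016–17 and 2017–18 pages keyed by start year. Reusing one session keeps the connection open between requests, which helps when looping over many seasons.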

Requests version is 2.19.1. Python version is 3.7.0.

Am I missing anything? I'm new to this, so any help is appreciated.

Noah B. Johnson · Accepted Answer

Running almost exactly your code, but without the features parameter in the get() call:

import requests
selectedSeasonPage = requests.get('https://en.wikipedia.org/wiki/2017–18_La_Liga')
print(selectedSeasonPage.text)

Gives me:

<th scope="row" style="text-align:right;"><a href="/wiki/Deportivo_Alav%C3%A9s" title="Deportivo Alavés">Alavés</a>
</th>
<td style="font-weight:normal;background:transparent;">&#8212;</td>
<td style="white-space:nowrap;font-weight:normal;background:#BBF3FF;">3–1</td>
<td style="white-space:nowrap;font-weight:normal;background:#FBB;">0–1</td>
<td style="white-space:nowrap;font-weight:normal;background:#FBB;">0–2</td>
<td style="white-space:nowrap;font-weight:normal;background:#BBF3FF;">2–1</td>
<td style="white-space:nowrap;font-weight:normal;background:#BBF3FF;">1–0</td>
<td style="white-space:nowrap;font-weight:normal;background:#FBB;">1–2</td>
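Once `response.text` contains the full matrix, the cell values still have to be pulled out of the table. Below is a minimal sketch that uses only the standard library's `html.parser`. It is swapped in for BeautifulSoup so that it runs without extra installs; if you do use BeautifulSoup, `features='html5lib'` belongs in `BeautifulSoup(...)`, not in `requests.get()`. Based on the snippets above, the sketch assumes that the matrix is a `<table>` whose class list includes `wikitable` and `plainrowheaders`. Other tables on the page may share those classes, so it returns every match and leaves the choice to you. `WikitableParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collect cell text from every <table> whose class list contains all of
    `required_classes` (assumed: 'wikitable' and 'plainrowheaders')."""

    def __init__(self, required_classes=("wikitable", "plainrowheaders")):
        super().__init__()  # convert_charrefs=True turns &#8212; into its character
        self.required = set(required_classes)
        self.tables = []   # one entry per matching table: a list of rows
        self._depth = 0    # 0 = outside a match, 1 = in it, >1 = in a nested table
        self._row = None   # cell texts of the row being read
        self._cell = None  # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._depth:
                self._depth += 1  # a table nested inside a matching one
                return
            classes = set((dict(attrs).get("class") or "").split())
            if self.required <= classes:
                self._depth = 1
                self.tables.append([])
        elif self._depth == 1:
            if tag == "tr":
                self._end_row()  # tolerate an omitted </tr>
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._end_cell()  # tolerate an omitted </td>
                self._cell = []

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "table":
            if self._depth == 1:
                self._end_row()
            self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th"):
            self._end_cell()
        elif self._depth == 1 and tag == "tr":
            self._end_row()

    def handle_data(self, data):
        # Text inside links (e.g. the team name in <a>) is kept as cell text.
        if self._depth and self._cell is not None:
            self._cell.append(data)
```

For the live page, call `parser.feed(response.text)` followed by `parser.close()`, and then look in `parser.tables` for the table whose first row holds the away-team column headers. Each following row should then start with the home team's name and continue with one score (or the dash on the diagonal) per opponent.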

Tags:

python

python-3.x

python-requests
