
Parsing json var inside script tag

Tags: python, lxml

I'm trying to scrape the JSON embedded in the following page: 'https://sports.bovada.lv/soccer/premier-league'

Its source contains the following:

<script type="text/javascript">var swc_market_lists = {"items":[{"description":"Game Lines","id":"23", ... </script>

I'm trying to get the contents of the swc_market_lists variable.

The problem is that when I run the following code:

import requests
from lxml import html



url = 'https://sports.bovada.lv/soccer/premier-league'
r = requests.get(url)
tree = html.fromstring(r.content)
var = tree.xpath('//script')
print(var)

I get an empty var value.

I have also tried saving r.text and viewing it, but I don't see the script tags in there.

What am I missing?

Asked by nadermx, Feb 10 '16 04:02




1 Answer

You need to pass the User-Agent header to make it work:

r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})

To get the desired script, you can check for the presence of swc_market_lists in its text:

script = tree.xpath('//script[contains(., "swc_market_lists")]/text()')[0]
print(script)

To extract the swc_market_lists variable value:

import re

data = re.search(r"var swc_market_lists = (.*?);$", script).group(1)
print(data)
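
Note that without re.MULTILINE the $ in this pattern only matches at the end of the string, so the regex relies on the assignment being the last statement in the script. If that is not the case, one possible alternative is to let the JSON decoder find the end of the value itself. The sketch below is only an illustration: the script string is a hypothetical, shortened stand-in for the real page content, extract_js_object is a made-up helper name, and it assumes the assigned value is strict JSON.

```python
import json
import re

# Hypothetical stand-in for the text of the matching <script> element;
# the real payload on the page is much larger.
script = (
    'var swc_market_lists = {"items":[{"description":"Game Lines","id":"23"}]};\n'
    'var other = 1;'
)


def extract_js_object(source, var_name):
    """Return the JSON value assigned to `var_name` in a JavaScript snippet."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", source)
    if match is None:
        raise ValueError("variable %r not found" % var_name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a semicolon, further statements, ...).
    value, _end = json.JSONDecoder().raw_decode(source, match.end())
    return value


data = extract_js_object(script, "swc_market_lists")
print(data["items"][0]["description"])
```

With this approach the result is already a Python dictionary, and trailing JavaScript after the object does not affect the match.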

Then, to make it easier to work with, load it into a Python dictionary with json.loads():

import json
data = json.loads(data)
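
Once the value is loaded, it behaves like any other dictionary. As a rough illustration, the sketch below walks the "items" list and maps each id to its description. The raw string is a reduced, hypothetical sample containing only the two fields visible in the question; the real structure has more keys that are not shown here.

```python
import json

# Reduced, hypothetical sample: only the fields visible in the question.
raw = '{"items":[{"description":"Game Lines","id":"23"}]}'
data = json.loads(raw)

# Map each item's id to its description; .get() avoids a KeyError
# if an item happens to lack one of the fields.
markets = {
    item.get("id"): item.get("description")
    for item in data.get("items", [])
}
print(markets)
```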
Answered by alecxe, Oct 24 '22 04:10
