Python web scraping - how to get resources with beautiful soup when page loads contents via JS?

Tags: python, beautifulsoup, urllib, screen-scraping

I am trying to scrape a table from a specific website using BeautifulSoup and urllib. My goal is to create a single list from all the data in this table. The same code works fine on tables from other websites, but on this website soup.find() returns None instead of the table, so the loop fails on a NoneType object. Can someone help me with this? I've looked for other answers online without much luck.

Here's the code:

import urllib.request

from bs4 import BeautifulSoup
html = urllib.request.urlopen("http://www.teamrankings.com/ncaa-basketball/stat/free-throw-pct").read()
soup = BeautifulSoup(html, "html.parser")

# For this page, find() returns None here, so the loop below fails with an AttributeError
table = soup.find("table", attrs={'class':'sortable'})

data = []
rows = table.find_all("tr")
for tr in rows:
    cols = tr.find_all("td")
    for td in cols:
        # get_text() returns the cell's text ('' for an empty cell)
        text = td.get_text(strip=True)
        data.append(text)

print(data)


asked Apr 20 '15 16:04

QwErTy99

1 Answer

It looks like this data is loaded via an AJAX call after the page loads. You can see that request in your browser's web inspector.
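One way to confirm this is to check whether the table exists in the raw HTML that urllib receives, before any JavaScript runs. The sketch below uses two hypothetical HTML snippets rather than the real page. In the first, the table is only a placeholder that JavaScript would fill in later. In the second, the table is already present in the markup. find_sortable_table is an illustrative helper name. You could call it on the bytes urlopen returns to test the real page.

```python
from bs4 import BeautifulSoup


def find_sortable_table(html):
    """Return the first <table class="sortable"> in html, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", attrs={"class": "sortable"})


# Hypothetical page: the table would be injected later by JavaScript.
js_page = '<div id="stats"></div><script>loadStats();</script>'

# Hypothetical page: the table is already in the HTML.
static_page = '<table class="sortable"><tr><td>Team A</td><td>75.0%</td></tr></table>'

print(find_sortable_table(js_page))                   # None: nothing for find() to match
print(find_sortable_table(static_page) is not None)   # True
```

BeautifulSoup only parses the HTML it is given and never executes scripts. Content added by JavaScript is therefore invisible to it.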

You should target that URL instead: http://www.teamrankings.com/ajax/league/v3/stats_controller.php

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


params = {
    "type":"team-detail",
    "league":"ncb",
    "stat_id":"3083",
    "season_id":"312",
    "cat_type":"2",
    "view":"stats_v1",
    "is_previous":"0",
    "date":"04/06/2015"
}

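# Passing data= makes urlopen send a POST request with a form-encoded body
body = urllib.parse.urlencode(params).encode('utf8')
content = urllib.request.urlopen("http://www.teamrankings.com/ajax/league/v3/stats_controller.php", data=body).read()
soup = BeautifulSoup(content, "html.parser")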

table = soup.find("table", attrs={'class':'sortable'})

data = []
rows = table.find_all("tr")
for tr in rows:
    cols = tr.find_all("td")
    for td in cols:
        # get_text() returns the cell's text ('' for an empty cell)
        text = td.get_text(strip=True)
        data.append(text)

print(data)
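If the endpoint stops returning the table, the loop above fails with an AttributeError on None, which is the same symptom as in the question. A minimal sketch of a clearer alternative wraps the extraction in a function that raises a descriptive error instead. table_to_flat_list is a hypothetical helper, and the sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def table_to_flat_list(html):
    """Flatten the text of every <td> in the first sortable table into one list."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "sortable"})
    if table is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError("no <table class='sortable'> found; "
                         "the content may be loaded by JavaScript")
    # find_all("td") searches all rows in document order; <th> cells are skipped.
    return [td.get_text(strip=True) for td in table.find_all("td")]


sample = """
<table class="sortable">
  <tr><th>Team</th><th>FT%</th></tr>
  <tr><td>Team A</td><td> 75.0% </td></tr>
  <tr><td><a href="#">Team B</a></td><td>70.5%</td></tr>
</table>
"""

print(table_to_flat_list(sample))  # ['Team A', '75.0%', 'Team B', '70.5%']
```

You would pass it the content returned by urlopen, for example data = table_to_flat_list(content).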

Using your web inspector you can also view the parameters that are passed along with the POST request.
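To compare what the code sends with what the inspector shows, you can build the request object without opening it. This sketch reuses the URL and parameters from the answer. It does not check whether the endpoint still accepts them today.

```python
import urllib.parse
import urllib.request

# URL and parameters copied from the answer above; the endpoint may have changed since.
STATS_URL = "http://www.teamrankings.com/ajax/league/v3/stats_controller.php"
params = {
    "type": "team-detail",
    "league": "ncb",
    "stat_id": "3083",
    "season_id": "312",
    "cat_type": "2",
    "view": "stats_v1",
    "is_previous": "0",
    "date": "04/06/2015",
}

# Encode the form fields the same way the answer does before passing them as data=.
body = urllib.parse.urlencode(params).encode("utf8")

# Build the request without sending it, so no network access is needed.
req = urllib.request.Request(STATS_URL, data=body)

print(req.get_method())  # POST, because data is not None
print(body.decode())
```

The second line prints type=team-detail&league=ncb&stat_id=3083&season_id=312&cat_type=2&view=stats_v1&is_previous=0&date=04%2F06%2F2015. You can check it field by field against the form data listed in the inspector.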


Servers often check for these values and reject a request that is missing some or all of them. The code snippet above ran fine for me. I used urllib because I generally prefer that library.

If the data loads in your browser it is possible to scrape it. You just need to mimic the request your browser sends.
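Mimicking the browser can mean sending its headers as well as its form fields. Below is a sketch of the same POST using the requests library. Several parts are assumptions. The User-Agent is a placeholder. X-Requested-With is the header that jQuery and similar libraries add to AJAX calls. The Referer is set to the stat page from the question. Copy the real values from your inspector instead. build_stats_request and fetch_stats_html are hypothetical helper names.

```python
import requests

STATS_URL = "http://www.teamrankings.com/ajax/league/v3/stats_controller.php"

# Assumed browser-like headers; replace them with the ones your inspector shows.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "http://www.teamrankings.com/ncaa-basketball/stat/free-throw-pct",
}


def build_stats_request(params):
    """Prepare, but do not send, a form-encoded POST that mimics the browser's AJAX call."""
    return requests.Request("POST", STATS_URL, data=params,
                            headers=BROWSER_HEADERS).prepare()


def fetch_stats_html(params, timeout=10):
    """Send the prepared request and return the response body (needs network access)."""
    with requests.Session() as session:
        response = session.send(build_stats_request(params), timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        return response.text


# Inspect what would be sent, without touching the network.
prepared = build_stats_request({"stat_id": "3083", "date": "04/06/2015"})
print(prepared.method)  # POST
print(prepared.body)    # stat_id=3083&date=04%2F06%2F2015
```

Preparing the request separately lets you compare its method, headers and body with the browser's request before actually calling fetch_stats_html.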

answered Oct 07 '22 02:10

Farmer Joe
