Parse the JavaScript returned from BeautifulSoup

I would like to parse the webpage http://dcsd.nutrislice.com/menu/meadow-view/lunch/ to grab today's lunch menu. (I've built an Adafruit IoT thermal printer and I'd like to automatically print the menu each day.)

I initially approached this using BeautifulSoup, but it turns out that most of the data lives in JavaScript rather than in the HTML. Since BeautifulSoup only parses markup and doesn't run scripts, I'm not sure it can handle this. If you view the page source, you'll see the relevant data assigned to bootstrapData['menuMonthWeeks'].

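import urllib.request
from bs4 import BeautifulSoup

url = "http://dcsd.nutrislice.com/menu/meadow-view/lunch/"
# Fetch the raw HTML; BeautifulSoup parses it but does not execute its JavaScript
soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")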

This gives me an easy way to fetch the page source and look through it.
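Although a parser can't run the page's scripts, it can still hand you the raw text of each `<script>` element, which is where an assignment like bootstrapData['menuMonthWeeks'] = ... would live. The sketch below swaps in the standard library's `html.parser.HTMLParser` instead of BeautifulSoup so it has no third-party dependency; `ScriptCollector` and `find_script_containing` are hypothetical names for illustration. With bs4 the rough equivalent would be checking the text of each tag returned by `soup.find_all("script")`.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # A script body may arrive in several chunks, so append rather than overwrite
        if self._in_script:
            self.scripts[-1] += data


def find_script_containing(html, marker):
    """Return the first <script> body that mentions `marker`, or None."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return next((s for s in parser.scripts if marker in s), None)
```

For example, `find_script_containing(html, "bootstrapData")` would narrow the page down to the one script worth parsing further.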

My question is: what is the easiest way to extract this data so that I can work with it? All I really want is a string like:

Southwest Cheese Omelet, Potato Wedges, The Harvest Bar (THB), THB - Cheesy Pesto Bread, Ham Deli Sandwich, Red Pepper Sticks, Strawberries

I've thought about using WebKit to render the page and get the resulting HTML (i.e., what a browser does), but that seems unnecessarily complex. I'd rather find something that can simply parse the bootstrapData['menuMonthWeeks'] data.

asked Jan 11 '14 23:01

Wade

1 Answer

Something like PhantomJS (a headless browser that actually runs the page's JavaScript) may be more robust, but here's some basic Python code to extract the full menu:

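import json
import re
import urllib.request

url = 'http://dcsd.nutrislice.com/menu/meadow-view/lunch/'
# Assumes the page is UTF-8 encoded
text = urllib.request.urlopen(url).read().decode('utf-8')
# Capture everything after "bootstrapData['menuMonthWeeks'] =" up to the last ';' on that line
match = re.search(r"bootstrapData\['menuMonthWeeks'\]\s*=\s*(.*);", text)
menu = json.loads(match.group(1))

print(menu)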
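The `(.*);` pattern depends on the whole value sitting on one line (in Python's `re`, `.` does not match a newline by default), and it greedily runs to the *last* `;` on that line, so a second statement on the same line would end up in the capture. A minimal alternative sketch, assuming the assigned value is valid JSON, is to find the `=` after the marker and let `json.JSONDecoder.raw_decode` parse exactly one JSON value from there. `extract_assigned_json` is a hypothetical helper name.

```python
import json

MARKER = "bootstrapData['menuMonthWeeks']"


def extract_assigned_json(script_text, marker=MARKER):
    """Decode the JSON value assigned to `marker` in a piece of JavaScript.

    Assumes the right-hand side is valid JSON (no functions, single-quoted
    strings, or trailing commas). Raises ValueError if the marker or '='
    is missing, or if the value isn't valid JSON.
    """
    start = script_text.index(marker) + len(marker)
    # Move past the '=' that follows the marker, then skip any whitespace,
    # because raw_decode does not skip leading whitespace itself
    start = script_text.index("=", start) + 1
    while start < len(script_text) and script_text[start].isspace():
        start += 1
    # raw_decode parses one JSON value and ignores whatever follows it (e.g. ';')
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value
```

Because the decoder stops where the JSON value ends, a value that spans several lines or is followed by more code on the same line is still handled.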

After that, you'll want to search through the menu for the date you're interested in.
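A plain-loop sketch of that search, assuming the structure the code here relies on: a list of weeks, each with a `'days'` list whose entries carry a `'date'` string such as `'2014-01-13'`. `find_day` is a hypothetical name.

```python
def find_day(menus, date_string):
    """Return the first day dict whose 'date' equals date_string, or None."""
    for week in menus:
        for day in week["days"]:
            if day["date"] == date_string:
                return day
    return None
```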

EDIT: Some overkill on my part:

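import itertools
import json
import re
import urllib.request

url = 'http://dcsd.nutrislice.com/menu/meadow-view/lunch/'
# Assumes the page is UTF-8 encoded
text = urllib.request.urlopen(url).read().decode('utf-8')
menus = json.loads(re.search(r"bootstrapData\['menuMonthWeeks'\]\s*=\s*(.*);", text).group(1))

# menus is a list of weeks; flatten their 'days' lists into one iterator of days
days = itertools.chain.from_iterable(menu['days'] for menu in menus)

# Skip days until the target date; next() returns None if it never appears
day = next(itertools.dropwhile(lambda day: day['date'] != '2014-01-13', days), None)

if day:
    # One food description per line
    print('\n'.join(item['food']['description'] for item in day['menu_items']))
else:
    print('Day not found.')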
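To get the one-line format from the question for the current day, a small sketch: build today's date in the same `YYYY-MM-DD` form as `'2014-01-13'`, use it in place of the hard-coded date, and join the descriptions with `', '` instead of newlines. `format_menu` is a hypothetical name, and skipping items without a `'food'` object is a defensive assumption; I haven't verified whether the real data contains such items.

```python
import datetime


def format_menu(day):
    """Join a day's food descriptions with ', ' (the one-line format asked for)."""
    return ", ".join(
        item["food"]["description"]
        for item in day["menu_items"]
        # Defensive: skip entries that have no 'food' object
        if item.get("food")
    )


# isoformat() yields YYYY-MM-DD, matching date strings like '2014-01-13'
today = datetime.date.today().isoformat()
```

With `'2014-01-13'` replaced by `today` in the dropwhile call, `print(format_menu(day))` would print something like the comma-separated string in the question.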

answered Sep 22 '22 04:09

user94559
