I'm trying to parse today's moon phase data using Python's BeautifulSoup library.
from bs4 import BeautifulSoup
import urllib2

moon_url = "http://www.moongiant.com/phase/today/"

try:
    rqest = urllib2.urlopen(moon_url)
    moon_Soup = BeautifulSoup(rqest, 'lxml')
    moon_angle = 0
    moon_illumination = 0
    main_data = moon_Soup.find('div', {'id': 'moonDetails'})
    print main_data
except urllib2.URLError:
    print "Error"
But instead of this expected output:
<div id="moonDetails">
Phase: <span>Waxing Crescent</span><br>Illumination: <span>36%
</span><br>Moon Age: <span>6.00 days</span><br>Moon Angle: <span>0.55</span><br>Moon Distance: <span>364,</span>434.78 km<br>Sun Angle: <span>0.53</span><br>Sun Distance: <span>149,</span>571,918.47 km<br>
</div>
it only prints this:
<div id="moonDetails">
</div>
Any idea?
As stated by RaminNietzsche in the comments, the data you want is embedded in a script tag, so you should extract it from that tag's text. You can use a regex or built-in string methods (like split(), strip() and replace(), for example); a split()-based alternative is sketched at the end of this answer.
Code:
from bs4 import BeautifulSoup
import requests
import re
import json

moon_url = "http://www.moongiant.com/phase/today/"
html_source = requests.get(moon_url).text
moon_soup = BeautifulSoup(html_source, 'html.parser')

data = moon_soup.find_all('script', {'type': 'text/javascript'})
for d in data:
    d = d.text
    if 'var jArray=' in d:
        # the jArray variable in the script holds a JSON object with the moon data
        jArray = re.search(r'\{(.*?)\}', d).group()
        moon_data = json.loads(jArray)
        print(moon_data)
        # if you want the mArray data too, you just have to:
        # 1. add `'var mArray=' in d` to the if clause, and
        # 2. uncomment the following lines
        #mArray = re.search(r'\[+(.*?)\];', d).group()
        #print(mArray)
Output:
{'3': ['<b>April 4</b>', '58%\n', 'Sun Angle: 0.53291621763825', 'Sun Distance: 149657950.85286', 'Moon Distance: 369697.55153449', 'Moon Age: 8.1316595947356', 'Moon Angle: 0.53870564539409', 'Waxing Gibbous', 'April 4'], '2': ["<span style='color:#c7b699'><b>April 3</b></span>", 'Illumination: <span>47%\n</span>', 'Sun Angle: <span>0.53', 'Sun Distance: <span>149,</span>614,</span>943.28', 'Moon Distance: <span>366,</span>585.35', 'Moon Age: <span>7.08', 'Moon Angle: <span>0.54', 'First Quarter', '<b>Monday, April 3, 2017</b>', 'April', 'Phase: <span>First Quarter</span>', 'April 3'], '1': ['<b>April 2</b>', '36%\n', 'Sun Angle: 0.53322274612254', 'Sun Distance: 149571918.46739', 'Moon Distance: 364434.77975454', 'Moon Age: 6.002888839693', 'Moon Angle: 0.54648504798072', 'Waxing Crescent', 'April 2'], '4': ['<b>April 5</b>', '69%\n', 'Sun Angle: 0.53276322269153', 'Sun Distance: 149700928.5008', 'Moon Distance: 373577.14506795', 'Moon Age: 9.1657967733025', 'Moon Angle: 0.53311119464703', 'Waxing Gibbous', 'April 5'], '0': ['<b>April 1</b>', '25%\n', 'Sun Angle: 0.53337618944887', 'Sun Distance: 149528889.15122', 'Moon Distance: 363387.67496992', 'Moon Age: 4.9078487808877', 'Moon Angle: 0.54805974945761', 'Waxing Crescent', 'April 1']}
Since it's loaded as JSON, you can navigate through it like this:
Example Code:
print(moon_data['4'])
print('-' * 5)
print(moon_data['4'][2])
Output:
['<b>April 5</b>', '69%\n', 'Sun Angle: 0.53276322269153', 'Sun Distance: 149700928.5008', 'Moon Distance: 373577.14506795', 'Moon Age: 9.1657967733025', 'Moon Angle: 0.53311119464703', 'Waxing Gibbous', 'April 5']
-----
Sun Angle: 0.53276322269153
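If you'd rather avoid the regex, here is a minimal sketch of the same extraction using only built-in string methods. It assumes the script assigns the data in a single statement like `var jArray={...};` and that no ';' appears inside the JSON itself, so adjust accordingly if the page changes:
from bs4 import BeautifulSoup
import requests
import json

moon_url = "http://www.moongiant.com/phase/today/"
html_source = requests.get(moon_url).text
moon_soup = BeautifulSoup(html_source, 'html.parser')

for script in moon_soup.find_all('script', {'type': 'text/javascript'}):
    text = script.text
    if 'var jArray=' in text:
        # take everything after 'var jArray=' and cut it off at the first ';'
        jArray = text.split('var jArray=', 1)[1].split(';', 1)[0].strip()
        moon_data = json.loads(jArray)
        print(moon_data)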
Actually, after RaminNietzsche's comment I used the dryscrape library.
from bs4 import BeautifulSoup
import dryscrape

moon_url = "http://www.moongiant.com/phase/today/"

# dryscrape renders the page's JavaScript, so the div gets filled in
session = dryscrape.Session()
session.visit(moon_url)
response = session.body()

soup = BeautifulSoup(response, 'lxml')
moon_data = soup.find('div', {'id': 'moonDetails'})
print moon_data
As a result, the output now is:
<div id="moonDetails">
Phase: <span>Waxing Crescent</span><br>Illumination: <span>36%
</span><br>Moon Age: <span>6.00 days</span><br>Moon Angle: <span>0.55</span><br>Moon Distance: <span>364,</span>434.78 km<br>Sun Angle: <span>0.53</span><br>Sun Distance: <span>149,</span>571,918.47 km<br>
</div>
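If you then want the individual values rather than the raw HTML, here is a sketch of one way to do it, assuming the `soup` object from the snippet above and the "Label: value" layout shown in the output (the `fields` name is just for illustration):
# assumes `soup` from the dryscrape snippet above
details = soup.find('div', {'id': 'moonDetails'})

# replace each <br> with a newline so every "Label: value" pair sits on its own line
for br in details.find_all('br'):
    br.replace_with('\n')

fields = {}
for line in details.get_text().splitlines():
    if ':' in line:
        label, value = line.split(':', 1)
        fields[label.strip()] = value.strip()

print(fields.get('Phase'))         # e.g. Waxing Crescent
print(fields.get('Illumination'))  # e.g. 36%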
Thanks, everyone, for the answers!