I am attempting to extract campaign_hearts and postal_code from the code in the script tag here (the entire code is too long to post):
<script>
...
"campaign_hearts":4817,"social_share_total":11242,"social_share_last_update":"2020-01-17T10:51:22-06:00","location":{"city":"Los Angeles, CA","country":"US","postal_code":"90012"},"is_partner":false,"partner":{},"is_team":true,"team":{"name":"Team STEVENS NATION","team_pic_url":"https://d2g8igdw686xgo.cloudfront.net
...
I can identify the script I need with the following code:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from time import sleep
import requests
import re
import json
page = requests.get("https://www.gofundme.com/f/eric-stevens-care-trust")
soup = BeautifulSoup(page.content, 'html.parser')
all_scripts = soup.find_all('script')
all_scripts[0]
However, I'm at a loss for how to extract the values I want. (I'm very new to Python.) This thread recommended the following solution for a similar problem (edited to reflect the html I'm working with).
data = json.loads(all_scripts[0].get_text()[27:])
However, running this produces an error: JSONDecodeError: Expecting value: line 1 column 1 (char 0).
What can I do to extract the values I need now that I have the correct script identified? I have also tried the solutions listed here, but had trouble importing Parser.
You can parse the content of <script>
with json
module and then get your values. For example:
import re
import json
import requests
url = 'https://www.gofundme.com/f/eric-stevens-care-trust'
txt = requests.get(url).text
data = json.loads(re.findall(r'window\.initialState = ({.*?});', txt)[0])
# print( json.dumps(data, indent=4) ) # <-- uncomment this to see all data
print('Campaign Hearts =', data['feed']['campaign']['campaign_hearts'])
print('Postal Code =', data['feed']['campaign']['location']['postal_code'])
Prints:
Campaign Hearts = 4817
Postal Code = 90012
The more libraries you use; the more inefficient a code becomes! Here is a simpler solution-
#This imports the website content.
import requests
url = "https://www.gofundme.com/f/eric-stevens-care-trust"
a = requests.post(url)
a= (a.content)
print(a)
#These will show your data.
campaign_hearts = str(a,'utf-8').split('campaign_hearts":')[1]
campaign_hearts = campaign_hearts.split(',"social_share_total"')[0]
print(campaign_hearts)
postal_code = str(a,'utf-8').split('postal_code":"')[1]
postal_code = postal_code.split('"},"is_partner')[0]
print(postal_code)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With