I am trying to scrape the Google result for the search "What is 2+2", but the following code raises 'NoneType' object has no attribute 'text'. Please help me get this working.
text="What is 2+2"
search=text.replace(" ","+")
link="https://www.google.com/search?q="+search
headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
source=requests.get(link,headers=headers).text
soup=BeautifulSoup(source,"html.parser")
answer=soup.find('span',id="cwos")
self.respond(answer.text)
The only problem is with the id in soup.find; however, I chose this id very carefully, so I shouldn't be mistaken. I also tried answer=soup.find('span',class_="cwcot gsrt"), but neither worked.
A big gotcha when parsing websites is that the source code can look very different in your browser compared to what requests sees. The difference is JavaScript, which can heavily modify the DOM in a JavaScript-capable browser.
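As a minimal sketch of how to check for this, assuming BeautifulSoup is installed: soup.find returns None when no element matches, and calling .text on that None is what produces the 'NoneType' object has no attribute 'text' error. The helper name find_text_or_none and the two HTML strings below are made up for illustration; they are not real Google responses.

```python
from bs4 import BeautifulSoup


def find_text_or_none(html, tag, element_id):
    """Return the stripped text of the first matching element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag, id=element_id)
    # soup.find returns None when nothing matches, so check before using .text
    if element is None:
        return None
    return element.get_text(strip=True)


# Hypothetical stand-ins for a browser-rendered page and a raw HTTP response.
rendered_html = '<div><span id="cwos"> 4 </span></div>'
raw_html = "<div><noscript>Enable JavaScript</noscript></div>"

print(find_text_or_none(rendered_html, "span", "cwos"))  # 4
print(find_text_or_none(raw_html, "span", "cwos"))       # None
```

Running the same check on the text returned by requests.get shows whether the element is present before any JavaScript has run.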
I'd suggest 3 options:
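Use requests to get the page, and then examine it closely: does that tag exist when the page is retrieved by a non-JS-enabled agent?
Next time, use the query string exactly as it appears in the browser's address bar. In a query string a literal + is decoded as a space, so the plus sign in 2+2 has to be sent as %2B.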
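import requests
from bs4 import BeautifulSoup

# "+" is percent-encoded as %2B; a literal "+" in a query string means a space
search="2%2B2"
link="https://www.google.com/search?q="+search
headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
source=requests.get(link,headers=headers).text
soup=BeautifulSoup(source,"html.parser")
# find() returns None when the span is not in the returned HTML
answer=soup.find('span',id="cwos")
print(answer.text if answer is not None else "answer element not found")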
Output:
4
Visit these URLs; they do not return the same results:
https://www.google.com/search?q=What+is+2+2
https://www.google.com/search?q=2%2B2
https://www.google.com/search?q=2+2
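To see why these three URLs differ, the standard library can decode each query string the way a form-encoded query is normally read. This is only a sketch: decoded_query and build_search_url are hypothetical helper names, and how Google actually interprets each query is not shown here.

```python
from urllib.parse import parse_qs, urlencode


def decoded_query(raw_query):
    """Decode the q parameter the way a form-encoded query string is read."""
    return parse_qs("q=" + raw_query)["q"][0]


def build_search_url(text):
    """Build a search URL with the text percent-encoded (space -> +, + -> %2B)."""
    return "https://www.google.com/search?" + urlencode({"q": text})


for raw in ("What+is+2+2", "2%2B2", "2+2"):
    print(raw, "->", decoded_query(raw))
# What+is+2+2 -> What is 2 2
# 2%2B2 -> 2+2
# 2+2 -> 2 2

print(build_search_url("What is 2+2"))
# https://www.google.com/search?q=What+is+2%2B2
```

Building the URL with urlencode instead of text.replace(" ","+") keeps the plus sign of the original question intact.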