Python code to request the URL:
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'} #using agent to solve the blocking issue
response = requests.get('https://www.naukri.com/jobs-in-andhra-pradesh', headers=agent)
#making the request to the link
Output when printing the HTML:
<!DOCTYPE html>
<html>
<head>
<title>Naukri reCAPTCHA</title> <!-- not the title of the page I actually requested -->
<meta name="robots" content="noindex, nofollow">
<link rel="stylesheet" href="https://static.naukimg.com/s/4/101/c/common_v62.min.css" />
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
</head>
</html>
If your web scraper is encountering CAPTCHAs, your first recourse should be to rotate your IP address. This helps surprisingly often, especially if you're using a quality proxy network. Otherwise, there are two main approaches to bypassing CAPTCHAs: you can either try to solve the challenge or avoid it altogether.
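If you're scripting the requests yourself, rotating proxies can be as simple as the sketch below; the proxy addresses are placeholders for endpoints from your own provider.
import random
import requests

PROXIES = [  # placeholder endpoints -- substitute your provider's proxies
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def get_with_rotation(url, headers, attempts=3):
    for _ in range(attempts):
        proxy = random.choice(PROXIES)
        try:
            r = requests.get(url, headers=headers,
                             proxies={"http": proxy, "https": proxy},
                             timeout=10)
            if "reCAPTCHA" not in r.text:  # still blocked? try another IP
                return r
        except requests.RequestException:
            continue  # dead proxy, move on to the next one
    return None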
Use a VPN. Changing VPN locations lets you legitimately get around Google's reCAPTCHA roadblocks. For the best results, choose a well-known VPN service rather than a free one, which comes with its own set of problems. Good VPNs disguise your traffic, protect your device details, and don't keep logs.
You'll find the site-key when you inspect the page's elements: look for the data-sitekey attribute on the reCAPTCHA element. Copy this site key and store it. When we send a solve request to 2captcha, we receive the solved token in the response, which then needs to be entered into the hidden text field with the ID g-recaptcha-response.
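Assuming you have a 2captcha account, a minimal sketch of that flow against its documented in.php/res.php HTTP endpoints looks like this (the API key and site key below are placeholders):
import time
import requests

API_KEY = "YOUR_2CAPTCHA_KEY"         # placeholder: your 2captcha account key
SITE_KEY = "SITE_KEY_FROM_INSPECTOR"  # placeholder: the site-key copied above
PAGE_URL = "https://www.naukri.com/jobs-in-andhra-pradesh"

# 1. Submit the reCAPTCHA job to 2captcha
submit = requests.get("http://2captcha.com/in.php", params={
    "key": API_KEY, "method": "userrecaptcha",
    "googlekey": SITE_KEY, "pageurl": PAGE_URL, "json": 1,
}).json()
captcha_id = submit["request"]

# 2. Poll until a human worker has solved it (usually 15-60 seconds)
while True:
    time.sleep(5)
    result = requests.get("http://2captcha.com/res.php", params={
        "key": API_KEY, "action": "get", "id": captcha_id, "json": 1,
    }).json()
    if result["status"] == 1:
        token = result["request"]  # goes into g-recaptcha-response
        break
How you submit the token afterwards depends on the site; typically you POST it back with the page's form data in the g-recaptcha-response field.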
Using Google Cache along with a referer in the header will also help you bypass the captcha.
For example:
import requests

header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/",
}
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh", headers=header)
This gives the full cached page: printing r.content now returns some 2,500 lines of the job listings' HTML instead of the reCAPTCHA stub.
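To double-check that the cached copy came back rather than another CAPTCHA page, a quick sanity check on the title (a sketch using BeautifulSoup, continuing from r above) works:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.content, "html.parser")
# Should print the job-listings title, not "Naukri reCAPTCHA"
print(soup.title.string if soup.title else "no <title> found")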