I have written a web scraper in Python that tells me when free bet offers on various bookmaker websites have changed or new ones have been added. However, the bookmakers tend to record information about IP traffic and MAC addresses in order to flag matched bettors. How can I spoof my IP address when using the Request() method from the urllib.request module?
My code is below:
from urllib.request import Request, urlopen
import bs4

req = Request('https://www.888sport.com/online-sports-betting-promotions/', headers={'User-Agent': 'Mozilla/5.0'})
site = urlopen(req).read()
content = bs4.BeautifulSoup(site, 'html.parser')
Internet Protocol (IP) spoofing hides the true source of IP packets so it is hard to tell where they came from: the sender forges the source IP address to impersonate a different system, disguise their identity, or both. For scraping, though, what you actually want is not packet-level spoofing but proxies, so the target site sees the proxy's IP address instead of yours. You can write a script to grab all the proxies you need and rebuild that list dynamically every time you initialize your web scraper. Once you have a list of proxy IPs to rotate through, the rest is easy: a function such as get_proxies can return a set of proxy strings that can be passed to the request object as proxy config.
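A minimal sketch of such a get_proxies helper (the table layout and the regex are assumptions based on how free proxy-list pages typically render their ip/port cells; in practice you would fetch the page HTML from a site like https://www.sslproxies.org/ on every scraper start-up so the pool stays fresh):

```python
import re

def get_proxies(html):
    """Return a set of 'ip:port' strings scraped from a proxy-list page.

    `html` is the page source of a free proxy-list site; the regex
    assumes the IP and port sit in adjacent <td> cells, which is how
    these tables are typically rendered.
    """
    pattern = re.compile(r'<td>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td>(\d{1,5})</td>')
    return {ip + ':' + port for ip, port in pattern.findall(html)}
```

Each returned string can then be fed to Request.set_proxy or a ProxyHandler.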
I faced the same problem a while ago. Here is the code snippet I use to scrape anonymously.
from urllib.request import Request, urlopen
from fake_useragent import UserAgent
import random
from bs4 import BeautifulSoup
from IPython.core.display import clear_output

# Here I provide some proxies for not getting caught while scraping
ua = UserAgent()  # From here we generate a random user agent
proxies = []      # Will contain proxies as {'ip': ..., 'port': ...}

# Retrieve a random proxy index (we need the index to delete it if not working)
def random_proxy():
    return random.randint(0, len(proxies) - 1)

# Main function
def main():
    # Retrieve latest proxies
    proxies_req = Request('https://www.sslproxies.org/')
    proxies_req.add_header('User-Agent', ua.random)
    proxies_doc = urlopen(proxies_req).read().decode('utf8')

    soup = BeautifulSoup(proxies_doc, 'html.parser')
    proxies_table = soup.find(id='proxylisttable')

    # Save proxies in the list
    for row in proxies_table.tbody.find_all('tr'):
        proxies.append({
            'ip': row.find_all('td')[0].string,
            'port': row.find_all('td')[1].string
        })

    # Choose a random proxy
    proxy_index = random_proxy()
    proxy = proxies[proxy_index]

    for n in range(1, 20):
        # Every 10 requests, generate a new proxy (rotate *before* the
        # request is built, so the fresh proxy applies to this request)
        if n % 10 == 0:
            proxy_index = random_proxy()
            proxy = proxies[proxy_index]

        req = Request('http://icanhazip.com')
        req.set_proxy(proxy['ip'] + ':' + proxy['port'], 'http')

        # Make the call
        try:
            my_ip = urlopen(req).read().decode('utf8')
            print('#' + str(n) + ': ' + my_ip)
            clear_output(wait=True)
        except Exception:  # On error, delete this proxy and find another one
            del proxies[proxy_index]
            print('Proxy ' + proxy['ip'] + ':' + proxy['port'] + ' deleted.')
            proxy_index = random_proxy()
            proxy = proxies[proxy_index]

if __name__ == '__main__':
    main()
That will give you a pool of working proxies. Then this part:
user_agent_list = (
#Chrome
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
'Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    #Internet Explorer
'Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)',
'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)',
'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
)
This gives you different headers, so the script pretends to be a browser. Last but not least, just pass those into your request:
from requests import get

# Make a GET request
user_agent = random.choice(user_agent_list)
headers = {'User-Agent': user_agent, 'Accept-Language': 'en-US, en;q=0.5'}
proxy = random.choice(proxies)
# requests expects a scheme-to-URL mapping, not the raw {'ip', 'port'} dict
proxy_config = {'http': 'http://' + proxy['ip'] + ':' + proxy['port']}
response = get('your url', headers=headers, proxies=proxy_config)
Hope that helps with your problem.
Otherwise look here: https://www.scrapehero.com/how-to-fake-and-rotate-user-agents-using-python-3/
Cheers
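To tie this back to the question's own code: once you have a working ip/port pair, you can attach it to the original Request with set_proxy before calling urlopen. A minimal sketch (the proxy values here are placeholders, not a real server):

```python
from urllib.request import Request, urlopen

# Placeholder proxy; in practice pick one from a scraped pool
proxy = {'ip': '203.0.113.7', 'port': '8080'}

req = Request('https://www.888sport.com/online-sports-betting-promotions/',
              headers={'User-Agent': 'Mozilla/5.0'})
# Route the HTTPS connection through the proxy so the site sees
# the proxy's address instead of yours
req.set_proxy(proxy['ip'] + ':' + proxy['port'], 'https')

# urllib rewrites the connection target to the proxy host
print(req.host)  # 203.0.113.7:8080
# urlopen(req).read() would now go out via the proxy
```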
To overcome IP rate bans and hide your real IP, you need to use proxies. There are a lot of different services that provide them; consider using one, as managing proxies yourself is a real headache and the cost would be much higher. I suggest https://botproxy.net among others. They provide rotating proxies through a single endpoint. Here is how you can make requests using this service:
#!/usr/bin/env python
import urllib.request

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler(
        {'http': 'http://user-key:[email protected]:8080',
         'https': 'http://user-key:[email protected]:8080'}))
print(opener.open('https://httpbin.org/ip').read())
or, using the requests library:
import requests

res = requests.get(
    'http://httpbin.org/ip',
    proxies={
        'http': 'http://user-key:[email protected]:8080',
        'https': 'http://user-key:[email protected]:8080'
    },
    headers={
        'X-BOTPROXY-COUNTRY': 'US'
    })
print(res.text)
They also have proxies in different countries.
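If you would rather not pass an opener object around, urllib can also install the proxy globally, so every plain urlopen() call in the script is routed through it. A sketch with the same placeholder credentials as above:

```python
import urllib.request

# Placeholder endpoint and credentials; substitute your provider's values
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://user-key:[email protected]:8080',
    'https': 'http://user-key:[email protected]:8080',
})
opener = urllib.request.build_opener(proxy_handler)
# After this call, the module-level urlopen() uses the proxy automatically
urllib.request.install_opener(opener)
```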