
How to rotate proxies with Python requests

I'm trying to do some scraping, but I get blocked after every 4 requests. I have tried changing proxies, but the error is the same. How should I rotate them properly?

Here is the code I tried. First I get proxies from a free proxy website. Then I make the request with the new proxy, but it doesn't work because I still get blocked.

from fake_useragent import UserAgent
import requests

def get_player(id, proxy):
    ua = UserAgent()
    headers = {'User-Agent': ua.random}

    url = 'https://www.transfermarkt.es/jadon-sancho/profil/spieler/' + str(id)

    try:
        print(proxy)
        r = requests.get(url, headers=headers, proxies=proxy)
    except requests.RequestException:

....
code to manage the data
....

Getting proxies

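from bs4 import BeautifulSoup

def get_proxies():
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    url = 'https://free-proxy-list.net/'

    r = requests.get(url, headers=headers)
    page = BeautifulSoup(r.text, 'html.parser')

    proxies = []

    # Each table row holds one proxy: first cell is the IP, second is the port.
    for proxy in page.find_all('tr'):
        i = ip = port = 0

        for data in proxy.find_all('td'):
            if i == 0:
                ip = data.get_text()
            if i == 1:
                port = data.get_text()
            i += 1

        # Header rows have no <td> cells, so ip and port stay 0 and are skipped.
        if ip != 0 and port != 0:
            proxies += [{'http': 'http://' + ip + ':' + port}]

    return proxies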

Calling functions

proxies=get_proxies()
for i in range(1,100):
    player=get_player(i,proxies[i//4])

....
code to manage the data  
....

I know the proxies are scraped correctly because when I print them I see something like: {'http': 'http://88.12.48.61:42365'}. I would like to avoid getting blocked.
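One way to think about the rotation itself is a round-robin over the proxy list that moves on whenever a proxy fails. The following is only a minimal sketch under stated assumptions: `fetch_with_rotation` and `send` are hypothetical names that do not appear in the original code, and `send(url, proxy)` stands for whatever function actually performs the request.

```python
from itertools import cycle


def fetch_with_rotation(url, proxies, send, max_attempts=5):
    """Try `url` through successive proxies until one attempt succeeds.

    `send(url, proxy)` is a hypothetical callable supplied by the caller.
    It should raise an exception, or return None, when the proxy fails
    or the request is blocked.
    """
    if not proxies:
        raise ValueError("proxy list is empty")

    pool = cycle(proxies)  # endless round-robin over the proxy list
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            result = send(url, proxy)
        except Exception as exc:  # dead proxy, timeout, refused connection
            last_error = exc
            continue
        if result is not None:
            return result
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# A possible `send` built on requests (assumption: status 200 means "not blocked"):
#
# def send(url, proxy):
#     r = requests.get(url, proxies=proxy, timeout=10)
#     return r if r.status_code == 200 else None
```

Passing the request function in as a parameter keeps the rotation logic separate from the HTTP call, so it can be tried out with a stub before pointing it at a real site.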

Javier Jiménez de la Jara asked Apr 26 '19 17:04


People also ask

How do I get a rotating proxy?

A rotating proxy changes your IP address on each request you send to the target. The easiest way to start using a rotating proxy quickly is to buy a residential proxy service.

Should I use sticky or rotating proxies?

When you need to keep the same IP address for a continuous period, proxies with sticky sessions are preferred. In contrast, you would use a rotating proxy when you cannot use the same IP address for long.

How do I use a proxy request in Python?

To use proxies with the requests Python library, you create a dictionary that maps each URL scheme (such as http and https) to a proxy URL made up of the proxy's address and port. The process is the same for any request, including GET and POST requests.
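As a small illustration of that dictionary, here is a sketch with a hypothetical helper named `build_proxies` (not part of requests). requests chooses the entry whose key matches the scheme of the target URL, so a dictionary with only an 'http' key is not applied to an https:// URL.

```python
def build_proxies(ip, port):
    """Return a requests-style proxies dict covering http and https targets."""
    proxy_url = f"http://{ip}:{port}"
    # The keys are the schemes of the *target* URL, not of the proxy itself.
    # An https:// target only goes through the proxy if an 'https' key exists.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    proxies = build_proxies("88.12.48.61", 42365)
    print(proxies)
    # Passing the dict to requests would look like:
    #   requests.get("https://example.com", proxies=proxies, timeout=10)
```

Whether a given free proxy can actually tunnel HTTPS traffic is a separate question and has to be checked per proxy.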


1 Answer

I recently had this same issue. Using free online proxy servers, as recommended in other answers, tends to be risky (from a privacy standpoint), slow, or unreliable.

Instead, you can use the requests-ip-rotator Python library to proxy traffic through AWS API Gateway, which gives you a new IP each time:

pip install requests-ip-rotator

This can be used as follows (for your site specifically):

import requests
from requests_ip_rotator import ApiGateway, EXTRA_REGIONS

# Create the gateway for the target site (this needs AWS credentials).
gateway = ApiGateway("https://www.transfermarkt.es")
gateway.start()

# Route every request to this site through the gateway.
session = requests.Session()
session.mount("https://www.transfermarkt.es", gateway)

response = session.get("https://www.transfermarkt.es/jadon-sancho/profil/spieler/your_id")
print(response.status_code)

# Only run this line if you are no longer going to run the script,
# as it takes longer to boot up again next time.
gateway.shutdown()

Combined with multithreading/multiprocessing, you'll be able to scrape the site in no time.

The AWS free tier provides you with 1 million requests per region, so this option will be free for all reasonable scraping.
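The multithreading suggestion above could be sketched roughly as follows. This is an illustration under assumptions, not the library's documented pattern: `scrape_many` and `fetch` are hypothetical names, and `fetch(player_id)` stands for a function that performs one request (for example through the mounted session) and returns whatever you want to keep.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_many(player_ids, fetch, max_workers=8):
    """Call fetch(player_id) concurrently and return {player_id: result}."""
    ids = list(player_ids)  # materialize once so the ids can be reused below
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input ids.
        results = list(executor.map(fetch, ids))
    return dict(zip(ids, results))


# Possible usage, assuming a `session` set up as in the answer above:
#
# def fetch(player_id):
#     url = f"https://www.transfermarkt.es/jadon-sancho/profil/spieler/{player_id}"
#     return session.get(url).status_code
#
# statuses = scrape_many(range(1, 100), fetch)
```

Threads suit this kind of work because each request spends most of its time waiting on the network. Keep `max_workers` modest so the target site is not hit harder than necessary.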

George answered Sep 29 '22 12:09