I am trying to scrape one of the free proxy listing websites, but I just can't manage to extract the proxies. Below is my code:
import requests
import re
url = 'https://free-proxy-list.net/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
source = requests.get(url, headers=headers, timeout=10).text
proxies = re.findall(r'([0-9]{1,3}\.){3}[0-9]{1,3}(:[0-9]{2,4})?', source)
print(proxies)
I would highly appreciate it if someone could help me do this without additional libraries/modules like BeautifulSoup.
It is generally best to use a parser such as BeautifulSoup to extract data from HTML, rather than regular expressions, because it is very difficult to reproduce BeautifulSoup's accuracy with regex alone. (Incidentally, the reason your findall call prints tuples of fragments instead of addresses is that re.findall returns the capture-group contents, not the whole match, whenever the pattern contains groups.) However, you can try this with pure regex:
import re
import requests

url = 'https://free-proxy-list.net/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
source = requests.get(url, headers=headers, timeout=10).text

# findall returns (group1, group2) tuples, one of which is always empty;
# filter(None, ...) keeps whichever group actually matched the cell text.
data = [list(filter(None, i))[0] for i in re.findall('<td class="hm">(.*?)</td>|<td>(.*?)</td>', source)]

# The cell values repeat in groups of four: ip, port, country code, anonymity.
groupings = [dict(zip(['ip', 'port', 'code', 'using_anonymous'], data[i:i+4])) for i in range(0, len(data), 4)]
Sample output (the full list contains 300 entries):
[{'ip': '47.88.242.10', 'port': '80', 'code': 'SG', 'using_anonymous': 'anonymous'}, {'ip': '118.189.172.136', 'port': '80', 'code': 'SG', 'using_anonymous': 'elite proxy'}, {'ip': '147.135.210.114', 'port': '54566', 'code': 'PL', 'using_anonymous': 'anonymous'}, {'ip': '5.148.150.155', 'port': '8080', 'code': 'GB', 'using_anonymous': 'elite proxy'}, {'ip': '186.227.8.21', 'port': '3128', 'code': 'BR', 'using_anonymous': 'anonymous'}, {'ip': '49.151.155.60', 'port': '8080', 'code': 'PH', 'using_anonymous': 'anonymous'}, {'ip': '52.170.255.17', 'port': '80', 'code': 'US', 'using_anonymous': 'anonymous'}, {'ip': '51.15.35.239', 'port': '3128', 'code': 'NL', 'using_anonymous': 'elite proxy'}, {'ip': '163.172.27.213', 'port': '3128', 'code': 'GB', 'using_anonymous': 'elite proxy'}, {'ip': '94.137.31.214', 'port': '8080', 'code': 'RU', 'using_anonymous': 'anonymous'}]
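As an aside on why the original attempt printed tuples of fragments: when a pattern contains capture groups, re.findall returns the group contents rather than the whole match. A minimal, self-contained sketch of the difference (the HTML snippet here is made up for illustration):

import re

html = '<td>47.88.242.10</td><td>80</td>'  # hypothetical snippet

# With a capturing group, findall returns only the group's last repetition:
print(re.findall(r'([0-9]{1,3}\.){3}[0-9]{1,3}', html))    # ['242.']

# With a non-capturing group (?:...), findall returns the whole match:
print(re.findall(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}', html))  # ['47.88.242.10']

Note that the IP and port sit in separate <td> cells on this page, so even a corrected pattern cannot pull out ip:port pairs in one pass, which is why the cell-by-cell approach above is used.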
Edit: to concatenate the ip and the port, iterate over each grouping and use string formatting:
final_groupings = [{'full_ip':"{ip}:{port}".format(**i)} for i in groupings]
Output:
[{'full_ip': '47.88.242.10:80'}, {'full_ip': '118.189.172.136:80'}, {'full_ip': '147.135.210.114:54566'}, {'full_ip': '5.148.150.155:8080'}, {'full_ip': '186.227.8.21:3128'}, {'full_ip': '49.151.155.60:8080'}, {'full_ip': '52.170.255.17:80'}, {'full_ip': '51.15.35.239:3128'}, {'full_ip': '163.172.27.213:3128'}, {'full_ip': '94.137.31.214:8080'}]
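For comparison, here is what the parser-based approach mentioned at the top of this answer might look like. This is only a minimal sketch: it assumes beautifulsoup4 is installed and that the first <table> on the page is the proxy table, with the IP in the first column and the port in the second.

import requests
from bs4 import BeautifulSoup

url = 'https://free-proxy-list.net/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
source = requests.get(url, headers=headers, timeout=10).text

soup = BeautifulSoup(source, 'html.parser')
proxies = []
table = soup.find('table')  # assumption: the proxy list is the first table
if table:
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) >= 2:  # header rows use <th> and are skipped
            proxies.append('{}:{}'.format(cells[0].get_text(strip=True), cells[1].get_text(strip=True)))
print(proxies[:10])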