I'm trying to write a tool to check if a proxy server is up and available for use. So far, I've come up with two methods in the class below (I've removed setters and getters that are superfluous to this question).
The first method uses cURL and tries to request a page via the proxy; the second uses fsockopen and just tries to open a connection to the proxy.
class ProxyList {
    /**
     * You could set this to localhost, depending on your environment
     * @var string The URL that the proxy validation method will use to check proxies against
     * @see ProxyList::validate()
     */
    const VALIDATION_URL = "http://m.www.yahoo.com/robots.txt";
    const TIMEOUT = 3;
    private static $valid = array(); // Checked and valid proxies
    private $proxies = array();      // An array of proxies to check
    public function validate($useCache=true) {
        $mh = curl_multi_init();
        $handles = array();
        $running = null;
        $proxies = array();
        foreach ( $this->proxies as $p ) {
            // Using the cache and the proxy already exists? Skip the rest of this crap
            if ( $useCache && !empty(self::$valid[$p]) ) {
                $proxies[] = $p;
                continue;
            }
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
            curl_setopt($ch, CURLOPT_URL, self::VALIDATION_URL);
            // Tunnels through the proxy with CONNECT; many proxies only allow CONNECT
            // to port 443, so drop this line to test plain HTTP forwarding instead
            curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
            curl_setopt($ch, CURLOPT_PROXY, $p);
            curl_setopt($ch, CURLOPT_NOBODY, true); // Also sets request method to HEAD
            curl_setopt($ch, CURLOPT_HEADER, false);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, self::TIMEOUT);
            curl_multi_add_handle($mh, $ch);
            $handles[$p] = $ch;
        }
        // Execute the multi-handle; curl_multi_select() waits for socket activity
        // instead of busy-waiting with a usleep() that grows with the list size
        do {
            $mrc = curl_multi_exec($mh, $running);
            if ( $running ) {
                curl_multi_select($mh, 1.0);
            }
        } while ( $running && $mrc == CURLM_OK );
        // Get the results of the requests
        foreach ( $handles as $proxy => $ch ) {
            $status = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
            // Great success
            if ( $status >= 200 && $status < 300 ) {
                self::$valid[$proxy] = true;
                $proxies[] = $proxy;
            }
            else {
                self::$valid[$proxy] = false;
            }
            // Cleanup individual handle
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        // Cleanup multiple handle
        curl_multi_close($mh);
        return $this->proxies = $proxies;
    }
    public function validate2($useCache=true) {
        $proxies = array();
        foreach ( $this->proxies as $proxy ) {
            // Using the cache and the proxy already exists? Skip the rest of this crap
            if ( $useCache && !empty(self::$valid[$proxy]) ) {
                $proxies[] = $proxy;
                continue;
            }
            // Expects proxies in "host:port" form
            list($host, $port) = explode(":", $proxy);
            if ( $conn = @fsockopen($host, $port, $errno, $error, self::TIMEOUT) ) {
                self::$valid[$proxy] = true;
                $proxies[] = $proxy;
                fclose($conn);
            } else {
                self::$valid[$proxy] = false;
            }
        }
        return $this->proxies = $proxies;
    }
}
So far, I prefer the cURL method since it allows me to check large batches of proxies in parallel, which is wicked fast, instead of one at a time like fsockopen.
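With very large lists, firing every request at once can open hundreds of simultaneous connections. A minimal sketch, assuming you want to cap that: split the list into fixed-size batches and run a parallel check (for example a curl_multi-based one like validate() above) on each batch in turn. The function name validateInBatches and the stand-in checker are hypothetical.

```php
<?php
// Splits a long proxy list into batches of at most $batchSize and hands each batch
// to $checkBatch, so a huge list never opens hundreds of connections at once.
// $checkBatch (hypothetical) receives "host:port" strings and returns those that passed.
function validateInBatches(array $proxies, int $batchSize, callable $checkBatch): array
{
    $valid = array();
    foreach (array_chunk($proxies, max(1, $batchSize)) as $batch) {
        // Requests inside a batch run in parallel; batches run one after another
        $valid = array_merge($valid, $checkBatch($batch));
    }
    return $valid;
}

// Stand-in checker for illustration only: treats every proxy on port 8080 as valid
// and records how many proxies each batch contained.
$sizes = array();
$fakeCheck = function (array $batch) use (&$sizes) {
    $sizes[] = count($batch);
    return array_values(array_filter($batch, function ($p) {
        return substr($p, -5) === ':8080';
    }));
};
$valid = validateInBatches(array('10.0.0.1:8080', '10.0.0.2:3128', '10.0.0.3:8080'), 2, $fakeCheck);
```

With the class above, the callable would load a batch into a ProxyList (through one of the setters the question omits) and return the result of validate(). The batch size trades speed against load on your own machine and network.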
I haven't done much work with proxies, so it's hard for me to tell whether either of these methods is sufficient for validating that a proxy is available, or whether there is a better method that I am missing.
Hm. Trying to establish a connection to a safe (most probably available) URL through the proxy, and checking for errors, sounds o.k. to me.
For absolutely maximum security, you may want to add a call to another validation URL (e.g. something at Google), or simply make two calls, just in case.
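A minimal sketch of the multi-URL idea, assuming a proxy should count as available only when every validation URL answers with a 2xx status through it. It checks the URLs one by one and stops at the first failure. The function names are hypothetical, and the Google URL in the commented example is only an illustration.

```php
<?php
// Fetches $url through $proxy with a HEAD request and returns the HTTP status code,
// or 0 when cURL could not complete the request (refused, timed out, ...).
function statusViaProxy(string $proxy, string $url, int $timeout = 3): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: the body is not needed
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // keep any output off stdout
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

function isSuccess(int $status): bool
{
    return $status >= 200 && $status < 300;
}

// The proxy passes only if every validation URL returns 2xx through it.
function proxyPassesAll(string $proxy, array $validationUrls): bool
{
    foreach ($validationUrls as $url) {
        if (!isSuccess(statusViaProxy($proxy, $url))) {
            return false;    // no need to try the remaining URLs
        }
    }
    return count($validationUrls) > 0;   // an empty URL list proves nothing
}

// Example (needs network access; the proxy address is a placeholder):
// var_dump(proxyPassesAll('203.0.113.5:8080', array(
//     'http://m.www.yahoo.com/robots.txt',
//     'http://www.google.com/robots.txt',
// )));
```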
cURL is the preferred way, because of the multi_exec.
I wouldn't bother doing two checks; just do the Google (or proxy judge) call right away. Proxies sometimes accept socket connections but just won't fetch a thing, so your cURL method is the safe choice, and it isn't that slow.
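A proxy that accepts a socket connection but fetches nothing is exactly what the fsockopen approach (validate2() above) would report as valid. A minimal sketch that runs both checks and tells the cases apart; the function names and result labels are hypothetical, and the proxy string is assumed to be in host:port form.

```php
<?php
// Pure decision logic: what the two observations say about the proxy.
function classifyProxy(bool $socketOpen, int $httpStatus): string
{
    if (!$socketOpen) {
        return 'down';               // nothing listening, or unreachable
    }
    if ($httpStatus >= 200 && $httpStatus < 300) {
        return 'working';            // a page actually came back through it
    }
    return 'open-but-useless';       // port accepts connections, but the fetch failed
}

// Runs the fsockopen() check first and the cURL fetch only if the port is open.
// Assumes $proxy looks like "host:port".
function checkProxy(string $proxy, string $url, int $timeout = 3): string
{
    list($host, $port) = explode(':', $proxy, 2);
    $conn = @fsockopen($host, (int) $port, $errno, $errstr, $timeout);
    if ($conn === false) {
        return classifyProxy(false, 0);
    }
    fclose($conn);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    return classifyProxy(true, (int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
}
```

Only "working" means the proxy is usable; the cURL fetch alone would give the same verdict, which is the answer's point about skipping the socket check.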
As Pekka above mentions: it depends on the intended use.
If you used Charon and harvested a load of proxies, I would want them checked against a proxy judge, and I would like to know the turnaround time (to avoid slow proxies) and the anonymity level.
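A minimal sketch of a proxy-judge check that records turnaround time and a rough anonymity level. It assumes the judge page prints the request it received (client address plus $_SERVER-style header keys such as HTTP_VIA), and that $realIp is your own public IP, obtained beforehand (for example by calling the same judge without a proxy). The judge URL, function names, and marker list are assumptions; the transparent/anonymous/elite labels follow the informal convention used by many proxy lists.

```php
<?php
// Fetches the judge page through $proxy; returns null if the proxy is unusable,
// otherwise the turnaround time in seconds and an anonymity label.
function judgeProxy(string $proxy, string $judgeUrl, string $realIp, int $timeout = 5): ?array
{
    $ch = curl_init($judgeUrl);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // we need the body this time
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return array(
        'seconds'   => (float) curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        'anonymity' => classifyAnonymity($body, $realIp),
    );
}

// transparent: our IP reached the judge; anonymous: the proxy announces itself
// but hides our IP; elite: no trace of either (header names are assumptions).
function classifyAnonymity(string $judgeBody, string $realIp): string
{
    // Match the IP as a whole token so 1.2.3.4 does not match 11.2.3.45
    $ipPattern = '/(?<![\d.])' . preg_quote($realIp, '/') . '(?![\d.])/';
    if (preg_match($ipPattern, $judgeBody) === 1) {
        return 'transparent';
    }
    foreach (array('HTTP_VIA', 'HTTP_X_FORWARDED_FOR', 'HTTP_FORWARDED') as $marker) {
        if (stripos($judgeBody, $marker) !== false) {
            return 'anonymous';
        }
    }
    return 'elite';
}
```

Sorting the surviving proxies by 'seconds' and dropping those above a threshold you choose covers the "avoid slow proxies" part.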
If you want to use it as a monitoring system for corporate proxies, I would just want to make sure it can fetch a page.
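For the monitoring case, a single fetch through the proxy is enough. A minimal sketch, assuming a Nagios-compatible monitor that reads the plugin exit code (0 = OK, 2 = CRITICAL) and one line of status text; the function names, proxy host, and URL in the commented example are placeholders.

```php
<?php
// Turns one fetch result into a Nagios-style result: exit code plus status line.
function monitorResult(string $proxy, int $httpStatus, float $seconds): array
{
    if ($httpStatus >= 200 && $httpStatus < 300) {
        return array(0, sprintf('OK - %s fetched the page in %.2fs', $proxy, $seconds));
    }
    return array(2, sprintf('CRITICAL - %s could not fetch the page (HTTP %d)', $proxy, $httpStatus));
}

// Fetches $url through $proxy and returns array(HTTP status, total seconds).
// The status is 0 when the request failed outright.
function fetchThroughProxy(string $proxy, string $url, int $timeout = 10): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    return array(
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        (float) curl_getinfo($ch, CURLINFO_TOTAL_TIME),
    );
}

// Example end of a check script (placeholders, needs network access):
// list($status, $seconds) = fetchThroughProxy('proxy.corp.example:3128', 'http://intranet.example/');
// list($code, $message) = monitorResult('proxy.corp.example:3128', $status, $seconds);
// echo $message, "\n";
// exit($code);
```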
A (chaotic) example of checking a proxy by fetching a URL with cURL: http://www.oooff.com/php-affiliate-seo-blog/php-automation-coding/easy-php-proxy-checker-writing-tutorial/
TLDR: use cURL. It can handle parallel requests and is the most stable option without being too slow (as long as you skip the double check).