I need to write a php script which will accept the csv file as input and then parse the product urls provided in it.
After then I need to validate which product url is exist and which is not.
I have got these two options for it curl()
and get_headers()
.
So can you please let me know which one is more faster and reliable ?
Any help will be much appreciated.
I tested this with 100 unique domains:
$urls = [
"http://familyshare.com/",
"http://elitedaily.com/",
"http://www.pickthebrain.com/",
"http://i100.independent.co.uk/",
"http://thingsorganizedneatly.tumblr.com/",
"http://www.cheatsheet.com/",
"https://jet.com/",
"https://nightwalk.withgoogle.com/en/panorama",
"http://www.vumble.com/",
"http://fusion.net/",
"https://www.zozi.com",
"http://joshworth.com/dev/pixelspace/pixelspace_solarsystem.html",
"http://how-old.net/",
"https://www.dosomething.org",
"https://devart.withgoogle.com/",
"http://www.ranker.com/",
"http://the-toast.net/",
"https://www.futurelearn.com/",
"https://croaciaaudio.com/",
"http://www.thesimpledollar.com/",
"http://giphy.com/giphytv",
"http://snapzu.com/",
"https://www.touchofmodern.com/",
"http://www.howstuffworks.com/",
"http://www.sporcle.com/",
"http://www.factcheck.org/",
"https://www.privacytools.io/",
"http://tiffanithiessen.com/",
"http://www.supercook.com/",
"http://www.livescience.com/",
"http://www.freshnessmag.com",
"http://www.abeautifulmess.com/",
"http://cardboardboxoffice.com/",
"http://www.takepart.com/",
"http://www.fixya.com/",
"http://bestreviews.com/",
"http://theodysseyonline.com/",
"http://justdelete.me/",
"http://adventure.com/",
"http://www.carryology.com/",
"http://whattheysee.tumblr.com/",
"https://unsplash.com/",
"http://fromwhereidrone.com/",
"http://www.attn.com/",
"http://ourworldindata.org/",
"http://www.melty.com/",
"http://www.truthdig.com/",
"https://tosdr.org/",
"https://thinga.com/",
"http://forvo.com/",
"http://tiii.me/",
"https://snapguide.com/",
"http://www.tubefilter.com/",
"http://www.inherentlyfunny.com/",
"http://www.someecards.com/",
"https://this.cm/",
"http://littlebigdetails.com/",
"http://clapway.com/",
"http://www.nerdfitness.com/",
"http://iwantdis.com/",
"http://Racked.com",
"http://thesweetsetup.com/",
"http://www.we-heart.com/",
"https://www.revealnews.org/",
"https://featuredcreature.com/",
"http://www.scotthyoung.com/blog/",
"http://www.thehandandeye.com/",
"http://www.thenorthernpost.com/",
"http://www.welzoo.com/",
"http://www.tickld.com/",
"http://thinksimplenow.com/",
"http://www.quietrev.com/",
"http://www.freshoffthegrid.com/",
"https://www.generosity.com/",
"http://addicted2success.com/",
"http://cubiclane.com/",
"http://waitbutwhy.com/",
"http://toolsandtoys.net/",
"http://googling.co/",
"http://penelopetrunk.com/",
"http://iaf.tv/",
"http://artofvisuals.com/",
"http://www.lifeaftercollege.org/blog",
"http://listverse.com/",
"http://chrisguillebeau.com/",
"http://expeditionportal.com/",
"http://www.marieforleo.com/",
"http://mostexclusivewebsite.com/",
"http://www.alphr.com/",
"http://www.rtings.com/",
"http://all-that-is-interesting.com/",
"http://theunbeatnpath.xyz/",
"http://www.keepinspiring.me/",
"https://paidtoexist.com/blog/",
"http://www.lovethispic.com/",
"http://riskology.co/blog/",
"http://geyserofawesome.com/",
"http://www.eugenewei.com/",
"http://clickotron.com/"
];
$startTime = microtime(true);
stream_context_set_default(
array(
'http' => array(
'method' => 'HEAD'
)
)
);
$headers1 = [];
foreach ($urls as $url) {
$headers1[] = get_headers($url);
}
$endTime = microtime(true);
$elapsed = $endTime - $startTime;
echo "Execution time : $elapsed seconds \n";
$startTime = microtime(true);
$headers2 = [];
foreach ($urls as $url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$header2[] = curl_exec($ch);
}
$endTime = microtime(true);
$elapsed = $endTime - $startTime;
echo "Execution time : $elapsed seconds \n";
get_headers
(GET) vs cURL:
Execution time : 139.95884609222 seconds
Execution time : 65.998840093613 seconds
get_headers
(HEAD) vs cURL:
Execution time : 114.60515785217 seconds
Execution time : 66.077962875366 seconds
So indeed cURL is significantly faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With