
Why does cURL sometimes require the "www." part of a URL to work, and vice versa?

For instance, using this code:

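 $curl = curl_init();
 curl_setopt_array( $curl, array(
      CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
      CURLOPT_URL => $url ) );
 curl_exec( $curl );
 // HTTP status code; 0 means no HTTP response was received at all
 $header = curl_getinfo( $curl, CURLINFO_HTTP_CODE );
 curl_close( $curl );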

$url = "http://upenn.edu" will not work, while $url = "http://www.upenn.edu" will work.

Without the www. the response code I get is 0, whereas with the www. it is 200.

If I use PHP's get_headers("http://upenn.edu") instead, I get two warnings:

Warning: get_headers() [function.get-headers]: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known

and

Warning: get_headers(http://upenn.edu) [function.get-headers]: failed to open stream: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known

However, when I use the exact same code, http://google.com will work (as well as the expected http://www.google.com.)

Then, for a website such as http://www.dogpile.com, including the www. returns a response code of 0, whereas without the www. I get a 302.

Why is this? And is there a better method to ensure reliable results (i.e., one where the response code is still returned even when the www. is not present)?

I am new to using cURL and dealing with headers and response codes, so any help is appreciated. Thank you.

Friendly King asked Jan 14 '23 17:01


2 Answers

Not all domains treat www.domain.com and domain.com the same. Usually they do, but if you wanted to you could have two completely different websites on them.

Personally, I like to have all requests to www.mydomains.com redirected to the www-less version, but that's just my preference.

There is no reliable way of automatically detecting whether or not to use www.

Niet the Dark Absol answered Jan 17 '23 07:01


Your question came up while using curl, but it is actually independent of curl. Other HTTP client libraries behave the same way with these examples, because the cause lies in the Domain Name System (DNS) and in the services running on the remote computer.

Curl is an HTTP library. If you make an HTTP request, by default it will try to connect to port 80 on a remote computer.

The remote computer is identified by an IP address. That is a number like 173.194.35.134 - you probably know that already.

Most often, domain names are used instead of the numbers, for example google.com for 173.194.35.134.

So telling curl to use the URI http://google.com/ will open a connection to

173.194.35.134:80

The domain name system will resolve the domain google.com to the IP address.
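The lookup step can be checked from PHP before any HTTP request is made. The following is only a minimal sketch: resolveHost() is a hypothetical helper (it is not part of PHP or curl), and it assumes an IPv4 lookup through PHP's gethostbyname() is good enough for a quick check. The optional second parameter exists only so the lookup can be swapped out, for example in a test.

```php
<?php
// Sketch: check whether a host name resolves to an IP address.
// resolveHost() is a hypothetical helper, not a PHP or cURL function.

/**
 * Returns the IP address for $host, or null when the name does not resolve.
 * $lookup defaults to PHP's gethostbyname(), which returns its input
 * unchanged when the lookup fails.
 */
function resolveHost(string $host, ?callable $lookup = null): ?string
{
    $lookup = $lookup ?? 'gethostbyname';
    $result = $lookup($host);

    // Only a syntactically valid IP address counts as a successful lookup.
    return filter_var($result, FILTER_VALIDATE_IP) !== false ? $result : null;
}
```

Calling resolveHost('google.com') should give back an address, while a null result corresponds to the "getaddrinfo failed" warning from the question. The actual answers depend on your resolver and on how the domain is configured at the time you ask.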

Domain names are organized in levels, separated by dots (.). The so-called Top Level Domain (TLD) is the rightmost part; for google.com that is com. The Second Level Domain (SLD) is then google. With www.google.com you have another domain name, this time with three levels. The www part is commonly referred to as a subdomain.

The most important part here is that for every different domain name the DNS can return a different IP address, or no address at all. The "getaddrinfo failed" warning in the question means exactly that: the name without www. did not resolve to any address.
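To make the levels concrete, here is a small sketch that splits a host name at the dots. domainLevels() is a hypothetical helper written for this illustration; it is purely textual and does no DNS lookup.

```php
<?php
// Hypothetical helper: split a host name into its dot-separated levels,
// ordered from the top level domain (rightmost label) downwards.
function domainLevels(string $host): array
{
    // Drop an optional trailing dot and normalise case before splitting.
    $labels = explode('.', strtolower(rtrim($host, '.')));

    return array_reverse($labels);
}
```

With this, domainLevels('www.google.com') gives com, google, www, and domainLevels('google.com') gives only com, google. Note that this naive split does not know about suffixes such as co.uk, where registered names sit one level lower.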

Therefore www.google.com and google.com can be two totally different things. The www subdomain is only a common convention to name the webserver on a network organized with a SLD.TLD.

Because this convention is so common, you could try both variants and see which one works. However, I would not try anything beyond the two variants with and without www.
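A rough sketch of that "try both" idea in PHP follows. All three function names (wwwVariants, probe, firstReachable) are made up for this example, the 10 second timeout is an arbitrary assumption, and "reachable" is taken to mean "curl reported a status code other than 0". It is one possible way to do it, not a guaranteed detection method.

```php
<?php
// Sketch of "try both": build the www and www-less variants of a URL,
// then probe each one with cURL. All function names are hypothetical.

/** Returns the URL as given, followed by the variant with "www." toggled. */
function wwwVariants(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return [$url]; // no host found, nothing to toggle
    }
    $host  = $parts['host'];
    $other = stripos($host, 'www.') === 0 ? substr($host, 4) : 'www.' . $host;

    // Swap the first occurrence of the host inside the original URL.
    $alt = substr_replace($url, $other, strpos($url, $host), strlen($host));

    return [$url, $alt];
}

/** Requests $url and returns ['status' => int, 'error' => string]. */
function probe(string $url, int $timeoutSeconds = 10): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    curl_exec($curl);

    // A status of 0 means no HTTP response arrived; curl_error() says why,
    // for example that the host name could not be resolved.
    return [
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'error'  => curl_error($curl),
    ];
}

/** Returns the first variant that produced any HTTP status, or null. */
function firstReachable(string $url, ?callable $probe = null): ?string
{
    $probe = $probe ?? 'probe';
    foreach (wwwVariants($url) as $candidate) {
        if ($probe($candidate)['status'] !== 0) {
            return $candidate;
        }
    }
    return null;
}
```

For example, firstReachable('http://upenn.edu') would first try the URL as given and then http://www.upenn.edu. When both fail, the 'error' string returned by probe() is usually more informative than the bare status code 0.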

hakre answered Jan 17 '23 08:01