Check for a valid domain name in a string?

I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists on the Internet.

demos asked May 24 '10 05:05


People also ask

How do I check if a domain name is valid?

Each label of a valid domain name must satisfy the following conditions: it contains only the letters a-z or A-Z, the digits 0-9, and the hyphen (-); it is between 1 and 63 characters long; and it does not start or end with a hyphen (so -geeksforgeeks.org and geeksforgeeks.org- are invalid).
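The three conditions above can be checked per label with one regular expression. A minimal sketch, assuming ASCII-only input; the function name `is_valid_label` is hypothetical:

```python
import re

# One label: 1-63 characters from A-Z, a-z, 0-9 or '-'.
# (?!-) forbids a leading hyphen, (?<!-) forbids a trailing one.
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_label(label: str) -> bool:
    """Return True if `label` meets the three conditions (ASCII only)."""
    # fullmatch anchors both ends, so no stray characters can slip through.
    return _LABEL_RE.fullmatch(label) is not None


for candidate in ["geeksforgeeks", "-geeksforgeeks", "geeksforgeeks-", "a" * 64]:
    print(candidate[:20], is_valid_label(candidate))
```

Using `fullmatch` rather than `match` with `^...$` avoids the quirk that `$` also matches just before a trailing newline.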

How do I get a valid domain name?

Registries may add their own rules. For example, a domain name may be required to have at least two and at most 63 characters, drawn from the letters a to z, the digits 0 to 9, and the hyphen (-), and it must not have a hyphen in both the third and the fourth position.
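A sketch of those registry-style rules, intended as an optional extra layer on top of the basic label check. The function name and its `min_length`/`max_length` parameters are hypothetical. The hyphen rule exists because labels with `--` in positions 3 and 4 are reserved for encoded internationalized names such as `xn--...`:

```python
import string

# Characters listed as allowed: a-z, 0-9 and '-'.
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def meets_registry_rules(name: str, min_length: int = 2, max_length: int = 63) -> bool:
    """Check one registrable name (no dots) against the registry-style rules."""
    if not min_length <= len(name) <= max_length:
        return False
    # Compare case-insensitively against the allowed character set.
    if not set(name.lower()) <= ALLOWED:
        return False
    # Reject hyphens in both the third and fourth positions (indexes 2 and 3).
    return name[2:4] != "--"
```

Because such rules vary between registries, keeping the limits as parameters seems safer than hard-coding them.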

What is a valid domain name character?

Valid domain name characters are the ASCII letters (a-z, A-Z), the digits (0-9), and the hyphen (-). A valid domain name uses only these characters and respects the length limits: no label may exceed 63 characters, and registered names often have a minimum of 3.


2 Answers

Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):

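from socket import gaierror, getaddrinfo

try:
    result = getaddrinfo("www.google.com", None)
    print(result[0][4])  # socket address of the first result
except gaierror:  # raised when the name cannot be resolved
    print("could not resolve")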

Note that this can technically leave you open to denial of service: if someone submits thousands of invalid domain names, resolving them can take a while. You could simply rate-limit anyone who tries this.
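One way to do that rate limiting is a sliding window per client. A minimal sketch, not tied to any web framework; `RateLimiter`, `max_requests` and `window_seconds` are hypothetical names, and the clock is injectable so the class can be tested without waiting:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Hashable


class RateLimiter:
    """Allow at most `max_requests` lookups per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # client id -> timestamps of that client's recent allowed requests
        self._history: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: Hashable) -> bool:
        now = self.clock()
        history = self._history[client_id]
        # Drop timestamps that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

In use, you would call `limiter.allow(client_ip)` before each `getaddrinfo` lookup and reject the request when it returns `False`.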

The advantage of this is that it'll catch a typo such as "hotmail.con" (for "hotmail.com") as invalid, whereas a regex would accept "hotmail.con".
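The two approaches can also be combined. A cheap syntax check first rejects malformed input without any network traffic, which also reduces the DoS exposure mentioned above. Only names that pass are then resolved. A sketch under those assumptions. The helper names are hypothetical. The `resolver` parameter is a hypothetical design choice so the function can be tested without DNS, and it defaults to `socket.getaddrinfo`:

```python
import re
import socket
from typing import Callable

# One ASCII label: 1-63 chars, letters/digits/hyphens, no hyphen at either end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_valid(name: str) -> bool:
    """Syntactic check only; says nothing about whether the name exists."""
    if name.endswith("."):  # allow one trailing dot (fully qualified form)
        name = name[:-1]
    if not name or len(name) > 253:  # 253 is the longest name in text form
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def resolves(name: str, resolver: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if `name` is syntactically valid and the resolver finds it."""
    if not looks_valid(name):
        return False  # no DNS query is made for malformed input
    try:
        resolver(name, None)
    except socket.gaierror:
        return False
    return True
```

With this split, "hotmail.con" passes `looks_valid` but would fail `resolves` against real DNS, which is exactly the distinction described above.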

Dean Harding answered Oct 02 '22 06:10


Any domain name is (syntactically) valid if it's a dot-separated list of identifiers, each no longer than 63 characters, and made up of letters, digits and dashes (no underscores).

So:

r'[a-zA-Z\d-]{,63}(\.[a-zA-Z\d-]{,63})*'

would be a start. Of course, these days some non-ASCII characters may be allowed (a very recent development), which changes the parameters a lot. Do you need to deal with that?
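That pattern can be tightened, and the non-ASCII case handled, roughly as follows. As written, `{,63}` also allows empty labels (so "a..b" matches), nothing stops a label from starting or ending with a hyphen, and with `re.match` any string that merely begins validly is accepted. For internationalized names, Python's built-in `idna` codec converts the name to its ASCII form (for example "bücher.example" becomes "xn--bcher-kva.example"), which can then be checked with the same pattern. A sketch under those assumptions: the helper name `is_valid_domain` is hypothetical. The built-in codec follows the older IDNA 2003 rules, and the third-party `idna` package implements IDNA 2008:

```python
import re

# Stricter version of the pattern above: every label is 1-63 characters and
# does not start or end with a hyphen; at least one label is required.
_LABEL = r"(?!-)[a-zA-Z\d-]{1,63}(?<!-)"
DOMAIN_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def is_valid_domain(name: str) -> bool:
    """Syntactic check; non-ASCII names are first converted with the idna codec."""
    try:
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # the codec rejects, among others, empty or over-long labels
    return len(ascii_name) <= 253 and DOMAIN_RE.fullmatch(ascii_name) is not None
```

Checking the ASCII form means a single pattern covers both plain and internationalized names.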

Alex Martelli answered Oct 02 '22 08:10