
How can I download all domain WHOIS data?

Tags:

dns

I'm writing some software that analyzes registered domain names and looks for trends. I'm experimenting with some machine learning to help predict what domain names will be purchased in the future based on what types of domains are being registered.

I've been searching for a way to download "all" of the registered domains that exist, but I haven't been able to find one.

It's easy for me to query individual domain names using the whois command line tool, for example:

$ whois google.com
   Domain Name: GOOGLE.COM
   Registry Domain ID: 2138514_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.markmonitor.com
   Registrar URL: http://www.markmonitor.com
   Updated Date: 2018-02-21T18:36:40Z
   Creation Date: 1997-09-15T04:00:00Z
   Registry Expiry Date: 2020-09-14T04:00:00Z
   Registrar: MarkMonitor Inc.
   Registrar IANA ID: 292
   Registrar Abuse Contact Email: [email protected]
   Registrar Abuse Contact Phone: +1.2083895740
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: NS1.GOOGLE.COM
   Name Server: NS2.GOOGLE.COM
   Name Server: NS3.GOOGLE.COM
   Name Server: NS4.GOOGLE.COM
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2018-03-20T03:16:59Z <<<

For more information on Whois status codes, please visit https://icann.org/epp

NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar.  Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.

TERMS OF USE: You are not authorized to access or query our Whois
database through the use of electronic processes that are high-volume and
automated except as reasonably necessary to register domain names or
modify existing registrations; the Data in VeriSign Global Registry
Services' ("VeriSign") Whois database is provided by VeriSign for
information purposes only, and to assist persons in obtaining information
about or related to a domain name registration record. VeriSign does not
guarantee its accuracy. By submitting a Whois query, you agree to abide
by the following terms of use: You agree that you may use this Data only
for lawful purposes and that under no circumstances will you use this Data
to: (1) allow, enable, or otherwise support the transmission of mass
unsolicited, commercial advertising or solicitations via e-mail, telephone,
or facsimile; or (2) enable high volume, automated, electronic processes
that apply to VeriSign (or its computer systems). The compilation,
repackaging, dissemination or other use of this Data is expressly
prohibited without the prior written consent of VeriSign. You agree not to
use electronic processes that are automated and high-volume to access or
query the Whois database except as reasonably necessary to register
domain names or modify existing registrations. VeriSign reserves the right
to restrict your access to the Whois database in its sole discretion to ensure
operational stability.  VeriSign may restrict or terminate your access to the
Whois database for failure to abide by these terms of use. VeriSign
reserves the right to modify these terms at any time.

The Registry database contains ONLY .COM, .NET, .EDU domains and
Registrars.
Domain Name: google.com
Registry Domain ID: 2138514_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2018-02-21T10:45:07-0800
Creation Date: 1997-09-15T00:00:00-0700
Registrar Registration Expiration Date: 2020-09-13T21:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +1.2083895740
Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
Registry Registrant ID: 
Registrant Name: Domain Administrator
Registrant Organization: Google LLC
Registrant Street: 1600 Amphitheatre Parkway, 
Registrant City: Mountain View
Registrant State/Province: CA
Registrant Postal Code: 94043
Registrant Country: US
Registrant Phone: +1.6502530000
Registrant Phone Ext: 
Registrant Fax: +1.6502530001
Registrant Fax Ext: 
Registrant Email: [email protected]
Registry Admin ID: 
Admin Name: Domain Administrator
Admin Organization: Google LLC
Admin Street: 1600 Amphitheatre Parkway, 
Admin City: Mountain View
Admin State/Province: CA
Admin Postal Code: 94043
Admin Country: US
Admin Phone: +1.6502530000
Admin Phone Ext: 
Admin Fax: +1.6502530001
Admin Fax Ext: 
Admin Email: [email protected]
Registry Tech ID: 
Tech Name: Domain Administrator
Tech Organization: Google LLC
Tech Street: 1600 Amphitheatre Parkway, 
Tech City: Mountain View
Tech State/Province: CA
Tech Postal Code: 94043
Tech Country: US
Tech Phone: +1.6502530000
Tech Phone Ext: 
Tech Fax: +1.6502530001
Tech Fax Ext: 
Tech Email: [email protected]
Name Server: ns1.google.com
Name Server: ns4.google.com
Name Server: ns2.google.com
Name Server: ns3.google.com
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2018-03-19T20:13:36-0700 <<<

The Data in MarkMonitor.com's WHOIS database is provided by MarkMonitor.com for
information purposes, and to assist persons in obtaining information about or
related to a domain name registration record.  MarkMonitor.com does not guarantee
its accuracy.  By submitting a WHOIS query, you agree that you will use this Data
only for lawful purposes and that, under no circumstances will you use this Data to:
 (1) allow, enable, or otherwise support the transmission of mass unsolicited,
     commercial advertising or solicitations via e-mail (spam); or
 (2) enable high volume, automated, electronic processes that apply to
     MarkMonitor.com (or its systems).
MarkMonitor.com reserves the right to modify these terms at any time.
By submitting this query, you agree to abide by this policy.

MarkMonitor is the Global Leader in Online Brand Protection.

MarkMonitor Domain Management(TM)
MarkMonitor Brand Protection(TM)
MarkMonitor AntiPiracy(TM)
MarkMonitor AntiFraud(TM)
Professional and Managed Services

Visit MarkMonitor at http://www.markmonitor.com
Contact us at +1.8007459229
In Europe, at +44.02032062220

For more information on Whois status codes, please visit
 https://www.icann.org/resources/pages/epp-status-codes-2014-06-16-en
--

The WHOIS data contains everything I need, but I can't find a way to download the WHOIS data for all currently registered domains.

Is there some way for me to get this data? I feel like it must be publicly available somewhere since the whois CLI tool can so easily query the info.

What am I missing here?

asked Mar 20 '18 03:03 by rdegges



1 Answer

TL;DR: You cannot download all "whois" data.

(Preliminary side note: "whois data", while often used, is somewhat incorrect. You use the whois protocol with a whois client to query a whois server at a registry, more specifically here a domain name registry, which stores contact data about the domain names it sponsors. For the same reason there is no single "whois database".)
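To make the protocol side of that note concrete, here is a minimal sketch of a whois client in Python. It only assumes what RFC 3912 describes: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The function names build_query and whois_query are made up for this illustration, and real clients do much more (choosing the right server per TLD, following referrals, handling encodings).

```python
import socket


def build_query(domain: str) -> bytes:
    """Build the request line: the query text followed by CRLF (RFC 3912)."""
    # The "idna" codec converts internationalized names to their ASCII form.
    return domain.strip().encode("idna") + b"\r\n"


def whois_query(domain: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one whois query to `server` and return the raw text reply.

    Hypothetical helper: it performs a single request and does not follow
    referrals or pick a server for you.
    """
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # The protocol does not define an encoding, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, whois_query("google.com", "whois.verisign-grs.com") should return the registry part of the output shown in the question, assuming that server is reachable and accepts your query.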

Now for the long sad story:

It is not possible, for many obvious technical and non-technical reasons. And you are deeply mistaken if you think the whois CLI command is simple (see my other answer here: https://unix.stackexchange.com/a/407030/211833 for details on that point).

First, your question makes no sense for all TLDs at once. You have to at least separate ccTLDs from gTLDs.

1) ccTLDs

ccTLDs often have stricter rules about the privacy of personal data, and these are likely to become even stricter with European regulations such as GDPR. Some of them already forbid access to the complete list of domain names (often referred to as the "zonefile"), which contains no personal data, so there is no way you will get access to all the content including the personal data. You may try to approach some of them and ask whether anything is possible, for research studies for example, but I doubt you will be successful, and you will need to deal with each ccTLD registry separately, as each one manages its own content (all data on the domain names in the TLD it manages).

2) gTLDs

For them, the situation is quite different.

First, since things are more liberal by default (no protection of personal data), you will see that many registrars/companies provide proxy/privacy services, which means that even in a whois query output you will not see much useful data.

Still, due to GDPR and similar regulations, things are changing. Do a whois on godaddy.com, for example, and look at all the stars in place of contact names and emails, hence the need to go to a website.

However, registrars and registries are under contract with ICANN, which means they both have requirements to meet, and these requirements are uniform.

First, all registries are mandated to give access to their zonefiles. This is often done through the CZDA, for which you can find details on the ICANN website. Note that a zonefile is in fact the list of all domain names published, not exactly the list of all domain names registered, as you can register a domain name and not make it visible in the DNS.
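As a rough illustration of getting a name list out of a zonefile, the sketch below collects the second-level names that have NS records. It assumes a plain master-file style text with one record per line and fully qualified owner names (for example "example.com. 172800 in ns ns1.example.net."); the actual layout of the file you receive may differ, and the function name domains_from_zone is made up here.

```python
def domains_from_zone(lines, tld):
    """Return the sorted, unique names delegated directly under `tld`.

    Assumes one resource record per line with a fully qualified owner
    name in the first column; $ORIGIN and relative names are not handled.
    """
    suffix = "." + tld.lower().strip(".") + "."
    found = set()
    for line in lines:
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        owner = fields[0].lower()
        # Skip the optional TTL and class columns to find the record type.
        rest = [f for f in fields[1:] if not f.isdigit() and f.upper() != "IN"]
        if not rest or rest[0].upper() != "NS":
            continue
        if not owner.endswith(suffix):
            continue
        label = owner[: -len(suffix)]
        if label and "." not in label:  # keep only names directly under the TLD
            found.add(label + suffix.rstrip("."))
    return sorted(found)
```

Only names that are published in the DNS show up this way, which is exactly the limitation mentioned above.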

As for the contact data, that is, the rest of the information visible in whois, there are other points that are not well known. See the registrar agreement at https://www.icann.org/resources/pages/approved-with-specs-2013-09-17-en and especially section 3.3.6, which provides for bulk access to registrar "whois" data. Note how it is tied to a fee (an annual fee of up to US$10,000) and comes with various limitations on what you can do with the data. Remember that you would need to do this per registrar, and in the gTLD world there are more than 1000 of them.

There are no equivalent provisions in the registry agreements for public bulk access (see https://newgtlds.icann.org/sites/default/files/agreements/agreement-approved-31jul17-en.html).

Things are complicated because, as of today and for some months yet, .COM/.NET remains a thin registry, that is, one where the contact data is not stored at registry level, only at the registrars.
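The two blocks in the output quoted in the question come from exactly this split: the registry answers first and names the registrar's whois server, which then holds the contact data. A minimal sketch of that two-step lookup follows. The names referral_server and thin_lookup are made up, the query argument is any callable of the form query(domain, server) that returns the reply text, and the "Registrar WHOIS Server:" label is taken from the output above (other registries may word it differently).

```python
import re

# Label as it appears in the registry output quoted in the question.
_REFERRAL = re.compile(
    r"^\s*Registrar WHOIS Server:\s*(\S+)\s*$", re.IGNORECASE | re.MULTILINE
)


def referral_server(registry_response):
    """Return the registrar whois server named in a registry reply, or None."""
    match = _REFERRAL.search(registry_response)
    return match.group(1).lower() if match else None


def thin_lookup(domain, registry_server, query):
    """Query the registry, then the registrar it refers to (if any).

    `query` is a caller-supplied function query(domain, server) -> str.
    Returns the list of raw replies, registry first.
    """
    first = query(domain, registry_server)
    server = referral_server(first)
    if server is None or server == registry_server.lower():
        return [first]
    return [first, query(domain, server)]
```

Passing the query function in keeps the referral logic separate from the network code, so it can be tried against saved replies first.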

Also, all of the above will change in the coming months/years because of the new regulations, and also because RDAP, a new protocol, is slated to replace whois at some point. RDAP will allow a far greater level of granularity in the access given and the amount of data returned.
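One practical difference is that RDAP replies are JSON rather than free text, so no per-server text parsing is needed. The sketch below summarizes a domain reply using field names from the RDAP JSON format (ldhName, status, events, nameservers). The function names are made up, and the default base URL https://rdap.org/domain/ is an assumption: it is a public redirector, and you may prefer to query the responsible registry's RDAP server directly.

```python
import json
from urllib.request import urlopen


def summarize_rdap(record):
    """Pick a few common fields out of an RDAP domain object (a dict)."""
    events = {
        e.get("eventAction"): e.get("eventDate") for e in record.get("events", [])
    }
    return {
        "domain": record.get("ldhName", "").lower(),
        "status": list(record.get("status", [])),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": sorted(
            ns.get("ldhName", "").lower() for ns in record.get("nameservers", [])
        ),
    }


def fetch_rdap(domain, base_url="https://rdap.org/domain/"):
    """Fetch the RDAP record for `domain`; base_url is an assumed endpoint."""
    with urlopen(base_url + domain, timeout=10) as resp:
        return json.load(resp)
```

Which fields are actually present depends on the server and on what it is willing to disclose to an anonymous client, so every lookup above tolerates missing keys.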

Of course, in all the cases above, nothing technically prevents anyone from just doing regular whois queries and storing the results locally. But as you can see in a whois output, your use of the data is constrained by various terms, and bulk querying whois servers always exposes you to the risk of being blacklisted or at least heavily rate limited. Note that for the input (which names to query the whois server for), it is easy to start with zonefiles, even across TLDs (if site.example exists you can also try site.test even if you do not have the .test zonefile), or search engine queries, or dictionaries, etc.

Multiple companies do that and provide tools to search their data, such as reverse queries and the like. Maybe some could deliver bulk results to you, but certainly not for free.
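A minimal sketch of those two ideas, building cross-TLD candidates from names you already know and then looking them up slowly, could look like the following. All names here are made up, query stands for whatever lookup function you use, and the 2-second default interval is an arbitrary assumption, not a limit published by any server. Throttling does not make bulk querying compliant with a server's terms of use; it only reduces the load you generate.

```python
import time


def cross_tld_candidates(names, tlds):
    """From known names like site.example, propose site.<tld> for other TLDs."""
    known = {n.lower().rstrip(".") for n in names}
    labels = {n.split(".")[0] for n in known}
    return sorted(
        f"{label}.{tld}"
        for label in labels
        for tld in tlds
        if f"{label}.{tld}" not in known
    )


def throttled_lookup(domains, query, min_interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call query(domain) for each domain, at most once per `min_interval` seconds.

    `sleep` and `clock` are injectable so the pacing can be tested
    without actually waiting.
    """
    results = {}
    last = None
    for domain in domains:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        results[domain] = query(domain)
    return results
```

Storing the raw replies in results and parsing them later means that a parser change does not require querying the servers again.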

answered Nov 15 '22 06:11 by Patrick Mevzek