I'm looking for a way to find "all" the sites ending with a given TLD. I had several ideas on how to realize this, but I'm not sure what the best/most effective approach is. I'm aware that pages that are linked nowhere can't be found by spiders etc., so for this example I won't care about isolated pages. What I want to do: I want to have a TLD as input for my program, and I wish to get a list of sites as output. For example:
# <program> .de
- spiegel.de
- deutsche-bank.de
...
- bild.de
So what is the best way to achieve this? Are there tools available to help me, or how would you program this?
This answer might be a bit late but I've just found this.
You could try using Common Crawl's awesome data.
So, what is Common Crawl?
Common Crawl is a 501(c)(3) non-profit organization dedicated to providing a copy of the internet to internet researchers, companies and individuals at no cost for the purpose of research and analysis.
Using their URL search tool, query for .de, then download the result as a JSON file.
You will get a nice file of results, but you will then need to do some work on it, since it includes every crawled URL of a domain (hence the full site map rather than one entry per site).
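That post-processing step can be sketched roughly like this. The snippet below assumes each line of the downloaded file is a JSON object with a `"url"` field (as in Common Crawl's CDX index output; the exact field name in your download may differ) and collapses the per-URL records down to unique domains:

```python
import json
from urllib.parse import urlsplit

def unique_domains(json_lines):
    """Collapse per-URL JSON records down to a sorted list of unique domains.

    Assumes each line is a JSON object with a "url" field; adjust the
    field name to match the actual file you downloaded.
    """
    domains = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        host = urlsplit(record["url"]).hostname
        if not host:
            continue
        # Naive registered-domain extraction: keep the last two labels.
        # This is fine for .de, but multi-part suffixes like .co.uk
        # would need a public suffix list instead.
        labels = host.split(".")
        domains.add(".".join(labels[-2:]))
    return sorted(domains)

# Hypothetical sample records standing in for the downloaded file:
sample = [
    '{"url": "https://www.spiegel.de/politik/"}',
    '{"url": "https://spiegel.de/"}',
    '{"url": "http://www.bild.de/news"}',
]
print(unique_domains(sample))  # ['bild.de', 'spiegel.de']
```

For a real run you would read the lines from the downloaded file instead of the inline sample.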
Another drawback is that some sites use an unwelcoming robots.txt file, so crawlers won't include them. Still, it's the best result I could find so far.