Speed up reverse DNS lookups for large batch of IPs

For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" means at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. lower the processing time per batch.

Wrapping the async version of Dns.GetHostEntry into awaitable tasks has already helped a lot (compared to sequential requests), leading to a throughput of approx. 100-200 IPs/second:

static async Task DoReverseDnsLookups()
{
    // in reality, thousands of IPs
    var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" }; 
    var hosts = new Dictionary<string, string>();

    var tasks =
        ips.Select(
            ip =>
                Task.Factory.FromAsync(Dns.BeginGetHostEntry,
                    (Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry, 
                    ip, null)
                    .ContinueWith(t => 
                    hosts[ip] = ((t.Exception == null) && (t.Result != null)) 
                               ? t.Result.HostName : null));

    var start = DateTime.UtcNow;
    await Task.WhenAll(tasks);
    var end = DateTime.UtcNow;

    Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.", 
      ips.Count(), end - start, 
      ips.Count() / (end - start).TotalSeconds);
}

Any ideas how to further improve the processing rate?

For instance, is there any way to send a batch of IPs to the DNS server?

Btw, I'm assuming that under the covers, I/O Completion Ports are used by the async methods - correct me if I'm wrong please.

Max asked May 29 '14 21:05


2 Answers

Hello, here are some tips to improve throughput:

  1. Cache the results locally, since this information doesn't usually change for days or even years. This way you don't have to resolve the same IP every time.
  2. Most DNS servers cache the information automatically, so the next lookup of the same IP will resolve pretty fast. The cache lifetime is usually 4 hours; at least, that is the default on Windows servers. This means that running the process as one batch in a short period will perform better than resolving the same addresses several times during the day and letting the cache expire in between.
  3. It is good that you are using task parallelism, but you are still querying the same DNS servers configured on your machine. I think that having two machines using different DNS servers will improve the process.

I hope this helps.
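
A minimal sketch of the local cache from tip 1, assuming an in-memory cache that lives for the duration of the process. The ReverseDnsCache class, its GetHostNameAsync method, and the injectable resolver delegate are hypothetical names introduced here for illustration; they are not from the original posts. Entries never expire in this sketch, and failed lookups are cached as null.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper (an assumption, not part of the original answer):
// resolves each distinct IP at most once and hands out the same task afterwards.
public sealed class ReverseDnsCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    private readonly Func<string, Task<string>> _resolve;

    // The resolver is injectable so the cache can be exercised without network access.
    public ReverseDnsCache(Func<string, Task<string>> resolve = null)
    {
        _resolve = resolve ?? ResolveViaDnsAsync;
    }

    // Returns the cached lookup for this IP; Lazy ensures the resolver
    // runs only once per key, even when callers race on the same IP.
    public Task<string> GetHostNameAsync(string ip)
    {
        return _entries
            .GetOrAdd(ip, key => new Lazy<Task<string>>(() => _resolve(key)))
            .Value;
    }

    // Default resolver: returns null when the lookup fails.
    private static async Task<string> ResolveViaDnsAsync(string ip)
    {
        try
        {
            return (await Dns.GetHostEntryAsync(ip)).HostName;
        }
        catch (Exception)
        {
            return null;
        }
    }
}
```

For data that has to stay fresh over days, the cache would also need some expiry policy, which is left out here.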

Baltico answered Oct 20 '22 21:10


  • As always, I would suggest using TPL Dataflow's ActionBlock instead of firing all requests at once and waiting for all of them to complete. Using an ActionBlock with a high MaxDegreeOfParallelism lets the TPL decide for itself how many calls to fire concurrently (up to that limit), which can lead to better utilization of resources:

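// "hosts" is assumed to be a thread-safe collection,
// e.g. a ConcurrentDictionary<string, string>.
var block = new ActionBlock<string>(
    async ip =>
    {
        try
        {
            var host = (await Dns.GetHostEntryAsync(ip)).HostName;
            if (!string.IsNullOrWhiteSpace(host))
            {
                hosts[ip] = host;
            }
        }
        catch
        {
            // Ignore IPs that fail to resolve.
            return;
        }
    },
    // Upper bound on the number of lookups in flight at once.
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000 });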
  • I would also suggest adding a cache, and making sure you don't resolve the same IP more than once.

  • When you use .NET's Dns class, it includes some fallbacks besides DNS (e.g. LLMNR), which makes it very slow. If all you need are DNS queries, you might want to use a dedicated library like ARSoft.Tools.Net.
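
A sketch of how the ActionBlock suggestion might be wired up end to end: posting the IPs, signalling completion, and waiting for the block to drain. The BatchReverseDns class, the ResolveAllAsync method, and the optional resolver parameter are assumptions added for illustration (the resolver lets the pipeline run against a stub instead of real DNS traffic). Duplicate IPs are dropped before posting, in line with the advice not to resolve the same IP twice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchReverseDns
{
    // Default resolver based on the Dns class used in the answer.
    private static async Task<string> DefaultResolveAsync(string ip)
    {
        return (await Dns.GetHostEntryAsync(ip)).HostName;
    }

    // Hypothetical wrapper: returns a map from IP to host name,
    // containing only the IPs that resolved to a non-empty name.
    public static async Task<IDictionary<string, string>> ResolveAllAsync(
        IEnumerable<string> ips,
        int maxParallelism,
        Func<string, Task<string>> resolve = null)
    {
        if (resolve == null)
        {
            resolve = DefaultResolveAsync;
        }

        var hosts = new ConcurrentDictionary<string, string>();

        var block = new ActionBlock<string>(
            async ip =>
            {
                try
                {
                    var host = await resolve(ip);
                    if (!string.IsNullOrWhiteSpace(host))
                    {
                        hosts[ip] = host;
                    }
                }
                catch (Exception)
                {
                    // Failed lookups are skipped, as in the snippet above.
                }
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelism });

        // De-duplicate so that each IP is looked up only once.
        foreach (var ip in new HashSet<string>(ips))
        {
            block.Post(ip);
        }

        block.Complete();        // no more input will be posted
        await block.Completion;  // wait until all queued lookups have finished
        return hosts;
    }
}
```

A call such as `await BatchReverseDns.ResolveAllAsync(ips, 5000)` would then process the whole batch; the best value for the parallelism limit depends on the machine and the DNS server and is worth measuring.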


P.S: Some remarks about your code sample:

  1. You should be using GetHostEntryAsync instead of FromAsync.
  2. The continuations can potentially run on different threads, and Dictionary is not safe for concurrent writes, so you should really be using ConcurrentDictionary.
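
Applied to the question's sample, these two remarks might look roughly like the sketch below. The ReverseDnsSample class and the ResolveIntoAsync helper are names made up for illustration; the null fallback for failed lookups mirrors the question's code. Note that this version still starts all lookups at once and does not throttle them.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

public static class ReverseDnsSample
{
    // Resolves a single IP and stores the host name; null marks a failed
    // lookup, like the null fallback in the question's continuation.
    private static async Task ResolveIntoAsync(
        string ip, ConcurrentDictionary<string, string> hosts)
    {
        try
        {
            hosts[ip] = (await Dns.GetHostEntryAsync(ip)).HostName;
        }
        catch (Exception)
        {
            hosts[ip] = null;
        }
    }

    // Starts one lookup per IP and waits for all of them to finish.
    public static async Task<ConcurrentDictionary<string, string>> DoReverseDnsLookups(
        string[] ips)
    {
        // Safe to write from several threads at once, unlike Dictionary.
        var hosts = new ConcurrentDictionary<string, string>();
        await Task.WhenAll(ips.Select(ip => ResolveIntoAsync(ip, hosts)));
        return hosts;
    }
}
```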
i3arnon answered Oct 20 '22 23:10