For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" meaning, at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. lower the processing time per batch.
Wrapping the async version of Dns.GetHostEntry into await-able tasks has already helped a lot (compared to sequential requests), leading to a throughput of appox. 100-200 IPs/second:
static async Task DoReverseDnsLookups()
{
// in reality, thousands of IPs
var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" };
var hosts = new Dictionary<string, string>();
var tasks =
ips.Select(
ip =>
Task.Factory.FromAsync(Dns.BeginGetHostEntry,
(Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry,
ip, null)
.ContinueWith(t =>
hosts[ip] = ((t.Exception == null) && (t.Result != null))
? t.Result.HostName : null));
var start = DateTime.UtcNow;
await Task.WhenAll(tasks);
var end = DateTime.UtcNow;
Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.",
ips.Count(), end - start,
ips.Count() / (end - start).TotalSeconds);
}
Any ideas how to further improve the processing rate?
For instance, is there any way to send a batch of IPs to the DNS server?
Btw, I'm assuming that under the covers, I/O Completion Ports are used by the async methods - correct me if I'm wrong please.
DNS propagation is the time frame it takes for DNS changes to be updated across the Internet. A change to a DNS record—for example, changing the IP address defined for a specific hostname—can take up to 72 hours to propagate worldwide, although it typically takes a few hours.
Security. A reverse IP lookup can be used to find the IP address' A records, mapping a domain name to the physical IP address of the device hosting that domain. The results help determine the virtual hosts served from a web server and identify server vulnerabilities.
Hello here are some tips so you can improve:
I hope this helps.
TPL Dataflow
's ActionBlock
instead of firing all requests at once and waiting for all to complete. Using an ActionBlock
with a high MaxDegreeOfParallelism
lets the TPL
decide for itself how many calls to fire concurrently, which can lead to a better utilization of resources:
var block = new ActionBlock<string>(
async ip =>
{
try
{
var host = (await Dns.GetHostEntryAsync(ip)).HostName;
if (!string.IsNullOrWhitespace(host))
{
hosts[ip] = host;
}
}
catch
{
return;
}
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000});
I would also suggest adding a cache, and making sure you don't resolve the same ip more than once.
When you use .net's Dns
class it includes some fallbacks beside DNS (e.g LLMNR), which makes it very slow. If all you need are DNS queries you might want to use a dedicated library like ARSoft.Tools.Net.
P.S: Some remarks about your code sample:
GetHostEntryAsync
instead of FromAsync
ConcurrentDictionary
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With