
Pinging ~100,000 servers: is multithreading or multiprocessing better?

I have created a simple script that iterates through a list of servers that I need to both ping and nslookup. The issue is that pinging can take some time, especially when pinging more servers than there are seconds in a day.

I'm fairly new to programming, and I understand that multiprocessing or multithreading could be a solution to make my job run faster.

My plan is to take my server list and either 1. break it into lists of even size, with the number of lists matching the number of threads/processes, or 2. if one of these options supports it, loop through the single list, passing each thread or process a new server name once it finishes its previous ping and nslookup. Option 2 is preferable since it wastes the least time: with option 1, if list 1 has 200 offline servers and list 6 has 2,000, everything would have to wait for the process working on list 6 to finish, even though all the others would be free by that point.

  1. Which one is superior for this task and why?

  2. If possible, how would I make sure that each thread or process has essentially the same runtime?

Code snippet (rather simple right now):

import subprocess
import time

server_file = open(r"myfilepath", "r")
initial_time = time.time()
for i in range(1000):
    server = server_file.readline().strip()  # read each name exactly once
    # Pass args as a list so the server name isn't shell-parsed.
    # This prints the server name and the ping return code.
    print(server + ' ' + str(subprocess.run(['ping', server]).returncode))
server_file.close()
print(time.time() - initial_time)

The issue arises because a failed ping takes over 3 seconds on average. I am also aware that removing the print statement would make it faster, but I wanted to monitor it for a small case. I am pinging something to the effect of 100,000 servers, this will need to be done routinely, and the list will keep growing.
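For reference, ping's own timeout flag can cap that per-failure cost. A minimal sketch, assuming the Windows flags -n (count) and -w (timeout in milliseconds); Linux uses -c and -W (seconds) instead:

import subprocess

# One echo request with a 1-second cap, so a dead host fails fast.
# Flags assumed: Windows -n <count> / -w <ms>; Linux -c <count> / -W <s>.
print(subprocess.run(['ping', '-n', '1', '-w', '1000', 'example.com']).returncode)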

AlbinoRhino asked Jan 30 '20
2 Answers

For best performance you want neither; with 100,000 active jobs it's best to use asynchronous processing, in a single thread or possibly a handful of threads or processes (but not exceeding the number of available cores).

With async I/O, many networking tasks can be performed in a single thread, easily achieving rates of 100,000 operations per second or more thanks to savings on context switching (that is, you could theoretically ping 100,000 machines in 1 second).

Python supports asynchronous I/O via asyncio (here's a nice intro to asyncio and coroutines).

It is also important to not depend on an external process like ping, because spawning a new process is a very costly operation.

aioping is an example of a native Python ping done using asyncio (note that a ping is actually a pair of ICMP request/reply packets). It should be easy to adapt it to perform multiple pings simultaneously.
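A minimal sketch of fanning that out, assuming aioping's ping() coroutine (which returns the round-trip delay, or raises TimeoutError on no reply) and reusing the question's placeholder file path; a semaphore bounds the number of in-flight pings:

import asyncio
import aioping  # third-party: pip install aioping (raw ICMP sockets typically need root)

async def ping_host(host, sem):
    async with sem:  # cap how many pings are in flight at once
        try:
            delay = await aioping.ping(host)  # round-trip time in seconds
            return host, delay
        except TimeoutError:
            return host, None  # no reply

async def main(hosts):
    sem = asyncio.Semaphore(1000)
    results = await asyncio.gather(*(ping_host(h, sem) for h in hosts))
    for host, delay in results:
        print(host, 'down' if delay is None else '%.1f ms' % (delay * 1000))

hosts = [line.strip() for line in open(r"myfilepath") if line.strip()]
asyncio.run(main(hosts))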

rustyx answered Oct 20 '22


TL;DR: multithreading is the solution for you. The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory.

As for question 1:

For I/O tasks, like querying a database or loading a webpage, the CPU does nothing but wait for an answer, which is a waste of resources; thus multithreading is the answer (:

As for question 2:

You can just create a pool of threads; the pool will manage running them simultaneously without you needing to break your head (see the sketch below).
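A minimal sketch with concurrent.futures.ThreadPoolExecutor from the standard library, reusing the question's placeholder file path. The pool hands each idle thread the next server, which also gives the even utilization asked about in question 2; the ping flags are assumed for Windows (-n count, -w timeout in ms), with Linux using -c and -W (seconds) instead:

import subprocess
from concurrent.futures import ThreadPoolExecutor

def ping(server):
    # One ping per server, with a 1-second timeout so dead hosts fail fast.
    result = subprocess.run(['ping', '-n', '1', '-w', '1000', server],
                            stdout=subprocess.DEVNULL)
    return server, result.returncode

with open(r"myfilepath") as f:
    servers = [line.strip() for line in f if line.strip()]

# Threads pull the next server as soon as they finish the previous one,
# so no pre-split chunk can leave workers sitting idle.
with ThreadPoolExecutor(max_workers=100) as pool:
    for server, code in pool.map(ping, servers):
        print(server, code)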

Yoel Nisanov answered Oct 20 '22