 

Thread vs Event Loop - network programming (language agnostic) [closed]

I am writing a simple daemon to receive data from N many mobile devices. The device will poll the server and send the data it needs as simple JSON. In generic terms the server will receive the data and then "do stuff" with it.

I know this topic has been beaten a bunch of times but I am having a hard time understanding the pros and cons.

Would threads or events (think Twisted in Python) work better for this situation as far as concurrency and scalability are concerned? The event model seems to make more sense, but I wanted to poll you guys. Data comes in -> process data -> wait for more data. What if the "do stuff" is something very computationally intensive? What if the "do stuff" is very I/O intensive (such as inserting into a database)? Would that block the event loop? What are the pros and cons of each approach?

asked Feb 13 '23 by user3621014

2 Answers

I can only answer in the context of Python, since that's where most of my experience is. The answer is actually probably a little different depending on the language you choose. Python, for example, is a lot better at parallelizing I/O intensive operations than CPU intensive operations.

Asynchronous programming libraries like twisted, tornado, gevent, etc. are really good at handling lots of I/O in parallel. If your workload involves many clients connecting, doing light CPU operations and/or lots of I/O operations (like db reads/writes), or if your clients are making long-lasting connections primarily used for I/O (think WebSockets), then an asynchronous library will work really well for you. Most of the asynchronous libraries for Python have asynchronous drivers for popular DBs, so you'll be able to interact with them without blocking your event loop.
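As a rough illustration of why the event-loop model handles many lightly-loaded clients so well, here is a minimal sketch using Python's standard `asyncio` (the same idea Twisted/tornado/gevent implement); the `handle_payload` coroutine and the `asyncio.sleep` standing in for an async DB driver call are assumptions for the example:

```python
import asyncio
import json

# Hypothetical handler: parse one device's JSON payload, then do a
# non-blocking "DB write" (simulated here by asyncio.sleep).
async def handle_payload(raw):
    data = json.loads(raw)
    await asyncio.sleep(0.01)  # stands in for an async DB driver call
    return data["device_id"]

async def main():
    # 100 "devices" polling at once; the event loop interleaves all the
    # waits, so total wall time is roughly one 0.01 s wait, not 100.
    payloads = [json.dumps({"device_id": i}) for i in range(100)]
    return await asyncio.gather(*(handle_payload(p) for p in payloads))

results = asyncio.run(main())
print(len(results))
```

The key point is that while one coroutine is waiting on I/O, the loop runs the others; no threads are involved.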

If your server is going to be doing lots of CPU-intensive work, you can still use asynchronous libraries, but you have to understand that every time you do CPU work, the event loop is blocked: no other client will be able to do anything at all until it finishes. However, there are ways around this. You can use thread/process pools to farm the CPU work out, and just wait on the result asynchronously. But obviously that complicates your implementation a little bit.
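The farm-it-out pattern looks roughly like this with `asyncio` and a process pool (the `crunch` function is a made-up stand-in for your CPU-heavy "do stuff"):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound work; run in a worker process so it can't block the event loop
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The coroutine awaits the result; other coroutines keep running meanwhile
        return await loop.run_in_executor(pool, crunch, 1_000)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Twisted has the same idea in `deferToThread`/`deferToProcess`-style helpers; the cost is the extra moving parts of a pool.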

With Python, using threads instead actually doesn't buy you all that much with CPU operations, because in most cases only one thread can run at a time, so you're not really reaping the benefits of having a multi-core CPU (google "Python GIL" to learn more about this). Ignoring Python-specific issues with threads, threads will let you avoid the "blocked event loop" problem completely, and threaded code is usually easier to understand than asynchronous code, especially if you're not familiar with how asynchronous programming works already. But you also have to deal with the usual thread headaches (synchronizing shared state, etc.), and they don't scale as well as asynchronous I/O does with lots of clients (see http://en.wikipedia.org/wiki/C10k_problem).
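The "synchronizing shared state" headache is worth a concrete sketch. Suppose several worker threads each increment a shared counter (say, a count of payloads ingested; the names here are made up for the example). Without the lock, the `counter += 1` read-modify-write can interleave across threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def ingest(n):
    global counter
    for _ in range(n):
        # Without the lock, concurrent += on shared state can lose updates
        with lock:
            counter += 1

threads = [threading.Thread(target=ingest, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; possibly less without it
```

In a single-threaded event loop this whole class of bug mostly disappears, because only one callback runs at a time.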

Both approaches are used very successfully in production, so it's really up to you to decide what fits your needs/preferences better.

answered Apr 27 '23 by dano


I think your question is in the 'it depends' category.

Different languages have different strengths and weaknesses when it comes to threads, processes, and events (Python has a particular weakness in threading tied to its Global Interpreter Lock).

Beyond that, operating systems ALSO have different strengths and weaknesses when you look at processes vs. threads vs. events. What's right on Unix isn't going to be the same as on Windows.

With that said, the way that I sort out multifaceted IO projects is:

These projects are complex; no tool will simply make the complexity go away. Therefore you have two choices for how to deal with it:

  1. Have the OS deal with as much complexity as possible, making life easier for the programmers, but at the cost of machine efficiency
  2. Have the programmers take on as much complexity as is practical, so that they can optimize the design and squeeze as much performance out of the machine as possible, at the cost of more complex code that requires significantly higher-end programmers.

Option 1 is normally best accomplished by breaking the task apart into threads or processes, with one blocking state machine per thread/process
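As a sketch of Option 1 in Python: the stdlib's `socketserver.ThreadingTCPServer` spawns one thread per connection, and each handler is plain blocking code, one state machine per thread (the `DeviceHandler` protocol here, one JSON line in, one JSON line out, is an assumption for the example):

```python
import json
import socket
import socketserver
import threading

class DeviceHandler(socketserver.StreamRequestHandler):
    # One blocking "state machine" per connection; the OS schedules the threads.
    def handle(self):
        data = json.loads(self.rfile.readline())
        reply = json.dumps({"ok": True, "device": data["device_id"]})
        self.wfile.write(reply.encode() + b"\n")

# Port 0 lets the OS pick a free port
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), DeviceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Quick client round-trip against the server above
with socket.create_connection(server.server_address) as sock:
    sock.sendall(b'{"device_id": 7}\n')
    reply = json.loads(sock.makefile().readline())
print(reply)

server.shutdown()
server.server_close()
```

Note how the handler never worries about other clients; the OS's thread scheduler carries all of that complexity.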

Option 2 is normally best accomplished by multiplexing all the tasks into one process and using the OS hooks for an event system (select/poll/epoll/kqueue/WaitForMultipleObjects/CoreFoundation/libevent, etc.).
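A bare-bones sketch of Option 2, using Python's `selectors` module (a thin wrapper over select/epoll/kqueue); a `socketpair` stands in for a client connection, since a real server would `accept()` here:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# A socket pair stands in for a client connection to keep the sketch self-contained
recv_sock, send_sock = socket.socketpair()
recv_sock.setblocking(False)

received = []

def on_readable(sock):
    # Callback dispatched by the event loop when the socket has data
    received.append(sock.recv(1024))

sel.register(recv_sock, selectors.EVENT_READ, on_readable)
send_sock.sendall(b"hello")

# One turn of the event loop: block until something is ready, then dispatch
for key, _events in sel.select(timeout=1):
    key.data(key.fileobj)

print(received)
sel.close()
recv_sock.close()
send_sock.close()
```

This is the shape libevent, Twisted, and Node.js all build on: one process, many registered file descriptors, and the programmer, not the OS, owns the scheduling.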

In my experience, the project framework/internal architecture often comes down to the skills of the programmers at hand (and the budget the project has for hardware).

If you have programmers with a background in OS internals: Twisted will work great for Python, Node.js will work great for JavaScript, and libevent/libev will work great for C or C++. You'll also end up with super-efficient code that can scale easily, though you'll have a nightmare trying to hire more programmers.

If you have newbie programmers and you can dump money into lots of cloud services, then breaking the project into many threads or processes will give you the best chance of getting something working, though scaling will eventually become a problem.

All in all, I would say the sanest pattern for a project with multiple iterations is to prototype in simple blocking tools (Flask) and then rewrite into something harder/more scalable (Twisted); otherwise you're falling into the classic "premature optimization is the root of all evil" trap.

answered Apr 27 '23 by Mike Lutz