 

Node.js GPS device tracking performance considerations

Using Node.js as a TCP server, I am going to manage a relatively large number of GPS devices (~3000). As a first step I will just store the incoming data in a database, but even at this phase I envision some performance issues that bother me, and I'd like to catch them before they bite me.

1 - Looking at similar servers written in languages like Java or Ruby, I see code like the following:

Java:

Thread serverThread = new Thread(() -> {
  System.out.println("Listening to server port 9000");
  while (true) {
    try {
      Socket socket = serverSocket.accept();
  ...

Ruby:

require 'socket'
server = TCPServer.new("127.0.0.1", 8080)
loop do
  Thread.start(server.accept) do |client|
    ...

These examples give a separate thread to every device (socket) that connects to the TCP server. As Node.js is single-threaded and works asynchronously, should I be concerned about incoming connections, or will a simple approach like the following satisfy a large number of simultaneous connections?

const net = require('net');

net.createServer(function(device) {
  device.on('data', function(data) {
    // parse data
    // store in database
  });
}).listen(9000); // accept device connections on port 9000

2 - Should I limit database connections using a connection pool? Since the database is also queried from the other side for GIS and monitoring, how large should the pool be?

3 - How could I benefit from caching (for example with Redis) in such a system?

It would be great if someone shed some light on these thoughts. I would also gladly hear about any other performance concerns you have run into or are aware of when implementing such systems. Thanks.

asked Feb 24 '17 by dNitro

2 Answers

  1. Choosing among the options you have listed, I would say Node.js is actually the better option for your use case because it does not use one thread per connection like the other two. Threads are normally a finite resource on a given machine. Java and Ruby do have 'evented' servers, though, and these are worth looking at if you want an apples-to-apples comparison.

  2. I think you need to say more about the database you intend to use if you want advice on connection pooling. However, reusing connections that are costly to set up is generally a good thing to do. It is probably a good idea to be able to configure the minimum and maximum size of the pool. Ultimately, the correct size is a matter of testing.

  3. I think the benefit of caching in this system would be minimal, as you are mostly writing data. If the data is valuable you will want to write it to disk rather than memory. On the other hand, if you have clients that are reading the collected data, caching their reads in something like Redis might be a good idea (a sketch follows below).
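
For illustration, a read-through cache along those lines could look like the following sketch, using the node-redis (v4) client; the key scheme and the queryLatestFromDb helper are hypothetical stand-ins for your own schema and database code:

const { createClient } = require('redis');

const redis = createClient(); // assumes a local Redis instance
redis.connect().catch(console.error); // node-redis v4 needs an explicit connect

// Hypothetical database lookup; replace with your real query code.
async function queryLatestFromDb(deviceId) {
  return { deviceId: deviceId, lat: 0, lon: 0 }; // placeholder row
}

async function getLatestPosition(deviceId) {
  const key = 'pos:' + deviceId; // illustrative key scheme
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: skip the database

  const position = await queryLatestFromDb(deviceId);
  // expire after 30 seconds so monitoring reads don't hammer the database
  await redis.set(key, JSON.stringify(position), { EX: 30 });
  return position;
}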

answered Nov 14 '22 by Frank Wilson


I'm sure you're aware, but this sounds like you're trying to prematurely optimize your application here.

1- Node being event-driven and non-blocking makes it a perfect candidate for holding a large number of open socket connections; there is no need to fork per connection. As always, though, make sure your application is properly clustered (a minimal sketch follows after these points). I was able to hold ~100k open TCP sockets on a dirt-cheap laptop. If the number of devices you need to support ever grows beyond that, just scale accordingly.

2- I saw you were planning on using Postgres. Pools are always a good thing (see the pool sketch after these points).

3- Caching is useful for 'hot' data: stuff that gets queried a lot, where having it in memory or inside Redis (an in-memory store) makes lookups faster and removes strain on the system. In your case, if you just need to pull certain chunks of data for analytics or more casual use, I would recommend Spark or Solr as opposed to a plain caching layer; either is also going to be much cheaper and easier to maintain.
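
On point 1, a minimal sketch of what "properly clustered" can mean, using Node's built-in cluster module; the port and handler body are carried over from the question:

const cluster = require('cluster');
const net = require('net');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; the master distributes
  // incoming connections across the workers.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own copy of the TCP server on a shared port.
  net.createServer(function(device) {
    device.on('data', function(data) {
      // parse data and store it in the database, as in the question
    });
  }).listen(9000);
}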
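
On point 2, a connection pool with node-postgres might look like the sketch below; the database name, table, and sizing values are assumptions you would tune against your own GIS/monitoring load:

const { Pool } = require('pg');

const pool = new Pool({
  database: 'tracking',     // assumed database name
  max: 20,                  // upper bound on open connections; tune under load
  idleTimeoutMillis: 30000  // release clients idle for 30 seconds
});

// pool.query checks out a client, runs the query, and returns the client to
// the pool, so concurrent device writes reuse a bounded set of connections.
async function storePosition(deviceId, lat, lon) {
  await pool.query(
    'INSERT INTO positions (device_id, lat, lon) VALUES ($1, $2, $3)',
    [deviceId, lat, lon]
  );
}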

answered Nov 14 '22 by NodeNodeNode