
IoT data system design: Google Pub/Sub vs Kafka vs Kinesis vs PubNub for IoT data ingestion?

I'm trying to build an IoT + data analytics system and I'm having trouble deciding on what technology or service to use for ingestion.

A high level description of the end goal is:

  1. IoT devices push data to an IoT gateway (using Zigbee, Z-wave, Bluetooth, etc)
  2. IoT gateway (which is connected to the internet) pushes data to a pub/sub system
  3. Backend services process the data coming out of the pub/sub system, update dashboards, and send out alerts

My question is what kind of pubsub system should we use if we only need ~10 second response time? (E.g. The following is acceptable: IoT device senses event and then about 10 seconds later it shows up on a user's dashboard or sets off an alert)
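To make the ~10 second target concrete, here is a minimal sketch of how a gateway could timestamp each reading so the backend can measure end-to-end latency, whichever pub/sub system sits in between. The field names (`device_id`, `sensed_at`), the helper names, and the budget constant are assumptions for illustration and are not part of any of the services discussed. It also assumes the gateway and backend clocks are reasonably synchronized (e.g. via NTP).

```python
import json
import time

LATENCY_BUDGET_S = 10.0  # assumed target, taken from the ~10 second goal above


def make_envelope(device_id, payload, sensed_at=None):
    """Gateway side: wrap a reading with the time it was sensed (Unix seconds)."""
    return json.dumps({
        "device_id": device_id,
        "sensed_at": time.time() if sensed_at is None else sensed_at,
        "payload": payload,
    })


def latency_seconds(envelope, received_at=None):
    """Backend side: seconds elapsed between sensing and receipt."""
    message = json.loads(envelope)
    now = time.time() if received_at is None else received_at
    return now - message["sensed_at"]


def within_budget(envelope, received_at=None, budget_s=LATENCY_BUDGET_S):
    """True if the message arrived within the latency budget."""
    return latency_seconds(envelope, received_at) <= budget_s
```

Logging `latency_seconds` for every message during a trial run would show whether a candidate ingestion path actually stays inside the budget.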

Below are some questions I have:

  1. I see PubNub advertised a lot for use in IoT. My understanding is that PubNub is basically a very fast pubsub system that guarantees delivery in less than 1/4 second. Is this a correct understanding of it? PubNub does seem a bit expensive, though, compared to using Google's Pub/Sub or maintaining our own Kafka.
  2. Is Google Pub/Sub in a sense similar to PubNub? Unlike a self-managed Kafka cluster running in a single data center, Google Pub/Sub has its own network because it's a part of Google (similar to how PubNub is a "data-streams-network").
  3. If I use Kafka, should the producer be in the gateway devices?
    1. If the producer isn't in the gateway devices, then should the Kafka producers be on our servers and have a REST API to accept messages from the gateway devices?
    2. If the Kafka producer IS in the gateway devices, does there need to be anything special in front of the Kafka brokers for them to accept messages from the gateway devices?
  4. PubNub can be used to send commands back to the IoT devices. Can this also be done with Google Pub/Sub or Kafka?
    1. To push commands out to the IoT devices with Kafka, would every gateway device need a consumer that is waiting for messages from the topics it's subscribed to (e.g. the commands)?
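The command-delivery pattern asked about in question 4 can be sketched without committing to a particular system. The snippet below uses an in-memory stand-in for the broker; it is NOT a Kafka, Pub/Sub, or PubNub client, and the class name, the `commands.<gateway_id>` topic naming scheme, and the message shape are all hypothetical. It only illustrates the idea of each gateway polling its own command topic.

```python
import queue
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for a pub/sub system, for illustration only."""

    def __init__(self):
        # One FIFO queue per topic, created on first use.
        self._topics = defaultdict(queue.Queue)

    def publish(self, topic, message):
        self._topics[topic].put(message)

    def poll(self, topic, timeout_s=0.1):
        """Return the next message on the topic, or None if none arrives in time."""
        try:
            return self._topics[topic].get(timeout=timeout_s)
        except queue.Empty:
            return None


def command_topic(gateway_id):
    # Hypothetical naming scheme: one command topic per gateway.
    return f"commands.{gateway_id}"


# Backend publishes a command; only the addressed gateway sees it.
broker = InMemoryBroker()
broker.publish(command_topic("gw-1"), {"action": "reboot"})
received = broker.poll(command_topic("gw-1"))
```

With a real broker, the gateway-side `poll` would be replaced by that system's consumer or subscriber, which is exactly the component the question asks about.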

Also, not sure if it's worth mentioning, but currently, the team is just me and maybe 2 other full stack developers. We've read up on Kafka and Zookeeper but none of us have gone past rolling out a tutorial example of it.

gunit asked Apr 28 '17 08:04





2 Answers

I would recommend option 3.1, since I personally know it is proven in production for many IoT use cases, including one that involves over 20 million devices. The Confluent Kafka REST Proxy is open source and is a great way to get data from the gateways into Kafka: the gateways send REST requests over the internet (using the firewall- and load-balancer-friendly HTTPS protocol) to a REST Proxy running in the cloud or datacenter, which forwards them into Kafka and on to the back-end dashboard tools, all of which support Kafka very well. Even IBM uses this architecture for their IoT infrastructure on Bluemix MessageHub. If REST is not to your liking, then there are MQTT, CoAP, WebSockets, AMQP, XMPP, and many other Kafka Connectors to choose from.
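As a rough sketch of what option 3.1 looks like from the gateway side, the snippet below builds an HTTPS request for the REST Proxy's v2 produce endpoint (`POST /topics/<topic>` with the `application/vnd.kafka.json.v2+json` content type, as I understand the v2 API). The proxy URL, topic name, and record fields are hypothetical placeholders, and authentication and retries are left out.

```python
import json
import urllib.request

# Hypothetical values: replace with your own proxy address and topic name.
REST_PROXY_URL = "https://kafka-rest.example.com"
TOPIC = "iot-readings"


def build_produce_request(device_id, reading, base_url=REST_PROXY_URL, topic=TOPIC):
    """Build (but do not send) a REST Proxy v2 produce request for one reading."""
    body = json.dumps({"records": [{"key": device_id, "value": reading}]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


def send(request, timeout_s=5):
    """Send the request and return the proxy's parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

A gateway would call `send(build_produce_request("gw-1", {"temp": 21.5}))`. Using the device or gateway ID as the record key keeps each device's messages in one partition, so they stay ordered relative to each other.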

Hans Jespersen answered Oct 15 '22 11:10



All great questions. (Full disclosure: I work for PubNub.)

PubNub is way more than just pub/sub (I will get to that later on). First, PubNub was built as a globally distributed network for a reason: so we could provide low-latency connectivity for all devices around the world through local Points of Presence. Because of this distributed architecture, your devices will always connect to the POP closest to them, and because we replicate messages globally, if a server or node goes down you will automatically be reconnected to the next closest node with no message loss. Because of this, PubNub provides a 99.999% uptime SLA to all customers.

When it comes to the build vs. buy decision, I can tell you that many of our customers started by thinking they could build it themselves but quickly realized the undertaking was way more than they expected. Building and maintaining the client libraries, scaling the backend, 24/7 monitoring, and security are all things for which you would need in-house expertise. Weigh the upfront development costs and ongoing maintenance against downloading an SDK and starting to code today, so you can get to market faster with a known scalable solution.

PubNub is priced based on transactions, so depending on how many devices you have and the level of traffic, I would bet the total cost would still be less than one full-time employee. And for that you get to choose from 70+ client SDKs, tap into a proven scalable architecture, take advantage of the security features already built in, and have a full team ready and waiting to help 24/7, so you can focus on innovation and not infrastructure.

As I said, PubNub is way more than just pub/sub. PubNub provides not only real-time messaging, but also state management and serverless compute via its programmable network. PubNub allows you to write and deploy functions in the network; in fact, there are already 30+ prebuilt functions available in the PubNub BLOCKS Catalog that allow you to send SMS, emails, and more when data changes.

PubNub has also created an open source project for building real-time dashboards called Project EON. This makes it super easy to provide real-time visualization for all of your device data.

You are also correct in stating that PubNub can be used for remote device control. It is being used today for just that by Insteon, Logitech, Samsung, Wink, and many more.

If you have any additional questions, PubNub has an amazing support staff available 24/7 by email or via chat on the website.

wmschott answered Oct 15 '22 12:10
