
Develop a clock and workers in Node.js on Heroku

I'm working on a service that needs to analyze data from social media networks every five minutes for different users. I'm developing it in Node.js and will deploy it on Heroku.

According to this article on the Heroku website, the best way to do that is to separate the scheduler logic from the worker logic. The idea is to dedicate one dyno to scheduling tasks, which avoids duplication. This dyno instructs a farm of workers (n dynos, as needed) to do the tasks.

Here is the Procfile for this architecture:

web:    node web.js
worker: node worker.js
clock:  node clock.js

The problem is how to implement it in Node.js. I googled it, and the usual suggestion is to use a message queue system (like IronMQ, RabbitMQ or CloudAMQP). But I'm trying to keep my code and app simple, with as few add-ons as possible.

The question is: is there a way to communicate directly from my scheduler (clock) to the worker dynos?

Thanks for your answers.

Jordi Romeu asked Jan 21 '15 15:01





2 Answers

Heroku dynos do not have fixed IP addresses, so there is no way to open a direct connection between them. That's why you need to create a separate server instance with a static IP or other fixed endpoint that acts as a go-between.

You have at least two viable options: a RabbitMQ-type message queue, or a stripped-down version using a Redis pub/sub feed. I generally use the latter because it's quick, simple, and sufficiently robust for all my needs (e.g. if a message gets lost once in a blue moon, it's no big deal). If, however, it is essential that you never lose a message, you should use a full-blown message queue like RabbitMQ.

Setting up the Redis implementation is very straightforward. There are several Redis add-ons (I use RedisCloud) with free and inexpensive plans. When you provision one, you get an endpoint to connect to and a password. Then you just connect your web dyno(s) and worker dyno(s) to your Redis instance, so that your web app publishes tasks to a channel and the worker subscribes to that channel.
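The publish/subscribe wiring described above might look roughly like the sketch below, with the clock process from the question's Procfile acting as the publisher. It assumes the `redis` npm package (v4 API) and a `REDIS_URL` environment variable; the variable name depends on the add-on you provision, and the channel name and task shape are made up for illustration.

```javascript
// Sketch only: TASK_CHANNEL, the task shape and REDIS_URL are assumptions.
const TASK_CHANNEL = 'tasks';
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Build one JSON message per user (hypothetical task shape).
function buildTasks(userIds, now = Date.now()) {
  return userIds.map((userId) => JSON.stringify({ userId, scheduledAt: now }));
}

// Publisher side: send one message per user to the channel.
async function publishTasks(publisher, userIds) {
  for (const message of buildTasks(userIds)) {
    await publisher.publish(TASK_CHANNEL, message);
  }
}

// Worker side: decode a message and hand the task to the job function.
function handleMessage(message, runJob) {
  return runJob(JSON.parse(message));
}

// clock.js entry point: publish a batch of tasks every five minutes.
async function startClock(userIds) {
  const { createClient } = require('redis');
  const publisher = createClient({ url: process.env.REDIS_URL });
  publisher.on('error', console.error);
  await publisher.connect();
  setInterval(
    () => publishTasks(publisher, userIds).catch(console.error),
    FIVE_MINUTES_MS
  );
}

// worker.js entry point: a subscribed connection is dedicated to subscribing.
async function startWorker(runJob) {
  const { createClient } = require('redis');
  const subscriber = createClient({ url: process.env.REDIS_URL });
  subscriber.on('error', console.error);
  await subscriber.connect();
  await subscriber.subscribe(TASK_CHANNEL, (message) => handleMessage(message, runJob));
}
```

In this layout `clock.js` would call `startClock(userIds)` and `worker.js` would call `startWorker(analyzeUser)`, where `analyzeUser` stands in for whatever your job actually does.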

If you need the web app to communicate with the client after task completion, you just create another channel for the worker to publish task completion messages and the web app to listen for them.

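Note that plain pub/sub does not prevent duplicate work on its own: every worker subscribed to the channel receives every message. If you run more than one worker, push tasks onto a Redis list instead and have each worker pop from it; each task is then removed from the queue by exactly one worker.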
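A queue in which each task is popped exactly once can be sketched with a Redis list rather than a pub/sub channel (pub/sub itself delivers every message to all subscribers). This is a minimal sketch assuming the `redis` npm package (v4 API), where `brPop` resolves to `{ key, element }` or `null`; the list key and the helper names are hypothetical.

```javascript
// Sketch only: TASK_QUEUE and the helper names are assumptions.
const TASK_QUEUE = 'task-queue';

// Producer (clock): push a task onto the head of the list.
function enqueue(client, task) {
  return client.lPush(TASK_QUEUE, JSON.stringify(task));
}

// Turn a BRPOP reply into a task object, or null if nothing was popped.
function decodeItem(item) {
  return item ? JSON.parse(item.element) : null;
}

// Consumer (worker): block until a task is available, then process it.
// BRPOP removes the element atomically, so only one worker gets each task.
// Use a connection dedicated to this loop, since the blocking call ties it up.
async function workLoop(client, runJob) {
  for (;;) {
    const task = decodeItem(await client.brPop(TASK_QUEUE, 0)); // 0 = wait indefinitely
    if (task) await runJob(task);
  }
}
```

A task popped this way is gone if the worker crashes halfway through, which matches the "losing a message occasionally is acceptable" trade-off described above.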

BarthesSimpson answered Oct 13 '22 04:10



If I understood this correctly, you want to spin a clock as one app, and then spin workers as separate apps? Sure, there is a direct way. You open a connection from the clock app towards the worker app.

For example, have every worker open a client socket connection to the clock. The clock can then communicate with them and relay orders.

Or use WebRTC. That way the workers will talk to the clock, but they can also talk to each other.

Or make an (authenticated) HTTP(s) REST endpoint on the worker where it will receive tasks. Like, POST /tasks will create a task on the worker. If the task is short, it can reply right away, so that the clock knows the job is done. Or if it's a longer task, it can acknowledge it, but later call an endpoint on the clock to say it's done, something like PUT /tasks/32.
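The `POST /tasks` idea could be sketched with Node's built-in `http` module as below. The bearer-token scheme, the `startTask` callback and the response shapes are assumptions for illustration, not a fixed design. Keep in mind that on Heroku only the `web` process type receives routed HTTP traffic, so the receiving side would have to run as a web process (for instance as its own app).

```javascript
const http = require('http');

// Decide the response for one request. Kept free of I/O so it is easy to test.
// `token` is a shared secret; `startTask` is a hypothetical callback that
// starts the job and returns its id.
function route(method, path, authHeader, body, token, startTask) {
  if (!token || authHeader !== `Bearer ${token}`) {
    return { status: 401, body: { error: 'unauthorized' } };
  }
  if (method === 'POST' && path === '/tasks') {
    let task;
    try {
      task = JSON.parse(body);
    } catch {
      return { status: 400, body: { error: 'invalid JSON' } };
    }
    // 202 Accepted: the task was taken on, completion is reported separately.
    return { status: 202, body: { id: startTask(task) } };
  }
  return { status: 404, body: { error: 'not found' } };
}

// Wrap the routing function in an HTTP server.
function createWorkerServer(token, startTask) {
  return http.createServer((req, res) => {
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      const result = route(
        req.method, req.url, req.headers.authorization, body, token, startTask
      );
      res.writeHead(result.status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(result.body));
    });
  });
}
```

The receiving process would start it with something like `createWorkerServer(process.env.TASK_TOKEN, startTask).listen(process.env.PORT)`, where `TASK_TOKEN` is a config var you set yourself.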

Or even more directly, open a direct net connection towards the clock, for example on worker start (and the other way around). Use dgram and send UDP messages between worker and clock.

In any case, I agree with the people suggesting a message queue like RabbitMQ: it is much better to just push jobs/tasks onto it. The queue can then distribute tasks as needed, and based on the unacked count on the job queue, you can spin up more workers when needed.

But your question is very broad, so to get a more specific answer, you could provide a few more details.
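For comparison, pushing jobs onto RabbitMQ and acknowledging them only once they are done might look like this sketch. It assumes the `amqplib` npm package and a `CLOUDAMQP_URL` environment variable (the name depends on the add-on); the queue name and helper names are hypothetical.

```javascript
// Sketch only: QUEUE, the helper names and CLOUDAMQP_URL are assumptions.
const QUEUE = 'tasks';

function encodeTask(task) {
  return Buffer.from(JSON.stringify(task));
}

function decodeTask(msg) {
  return JSON.parse(msg.content.toString());
}

// Producer (clock): publish a persistent message to a durable queue.
async function enqueueTask(channel, task) {
  await channel.assertQueue(QUEUE, { durable: true });
  return channel.sendToQueue(QUEUE, encodeTask(task), { persistent: true });
}

// Consumer (worker): take one message at a time, ack only after success.
// Messages that are delivered but not yet acked make up the unacked count.
async function consumeTasks(channel, runJob) {
  await channel.assertQueue(QUEUE, { durable: true });
  await channel.prefetch(1);
  await channel.consume(QUEUE, async (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    try {
      await runJob(decodeTask(msg));
      channel.ack(msg);
    } catch (err) {
      console.error(err);
      channel.nack(msg, false, false); // reject without requeueing
    }
  });
}

// Open a channel to the broker.
async function openChannel() {
  const amqp = require('amqplib');
  const connection = await amqp.connect(process.env.CLOUDAMQP_URL);
  return connection.createChannel();
}
```

Because a message stays on the broker until it is acked, a worker that dies mid-task does not lose the job; the broker redelivers it to another consumer.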

Zlatko answered Oct 13 '22 05:10
