What performance can be expected of RabbitMQ on EC2? I would appreciate hearing about others' experience.
I am trying to run some performance tests of RabbitMQ on AWS EC2. I have 3 separate EC2 instances running: one each for RabbitMQ, the publisher and the consumer/worker.
In my scenario, the publisher pushes a JSON string (approx. 165-200 bytes) to a direct exchange declared with durable set to true, bound to a queue that also has durable set to true (i.e. both in persistent mode). The consumer/worker runs on a separate box and keeps pulling messages. (Going forward, the worker is expected to persist these messages in MongoDB, and the publisher will be replaced with a RESTful service using RESTEasy.)
To keep things simple, I simulated this scenario using the Multicast sample code. I split the Multicast code into two separate Java files, “Producer” and “Worker”, so that each can run on a separate box. I used “c1.medium” EC2 instances with Ubuntu Server 11.04 32-bit for the producer and consumer, and an “m1.large” with Ubuntu Server 11.04 64-bit for RabbitMQ.
I am able to achieve a throughput of 3-5k messages per second, i.e. while keeping a steady message push rate of up to 5k. (This concurs with http://www.rabbitmq.com/faq.html#performance-latency)
However, when I increase the push rate to 10-12k messages per second, the consumer's throughput drops to 1-2k messages per second and a backlog builds up (many times it even falls below 800 messages per second).
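To put a number on that backlog, here is a minimal arithmetic sketch in Java. It assumes constant publish and consume rates, which real runs will not have, and the class and method names are made up purely for illustration.

```java
/** Back-of-the-envelope backlog arithmetic; names are hypothetical. */
final class BacklogEstimate {

    /** Messages per second by which the queue grows; 0 if the consumer keeps up. */
    static long growthPerSecond(long publishRate, long consumeRate) {
        return Math.max(0, publishRate - consumeRate);
    }

    /** Queue depth after the given number of seconds, assuming constant rates. */
    static long backlogAfter(long publishRate, long consumeRate, long seconds) {
        return growthPerSecond(publishRate, consumeRate) * seconds;
    }

    public static void main(String[] args) {
        // Rates taken from the figures above: 10k/s published, 2k/s consumed.
        System.out.println("growth per second: " + growthPerSecond(10_000, 2_000));
        System.out.println("backlog after 60 s: " + backlogAfter(10_000, 2_000, 60));
    }
}
```

With 10k messages per second going in and 2k coming out, the queue grows by 8,000 messages every second, i.e. 480,000 messages after one minute.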
Given the above scenario, I have the following questions and would appreciate any thoughts or suggestions for improving consumer throughput as well. (NOTE: all the messages in my scenario are expected to be of a similar type, giving no opportunity to group them for routing, so I may need some kind of load-balancer approach.)
1) This performance is observed with one RabbitMQ server, one exchange and one queue. Can anything further be configured or fine-tuned to improve throughput beyond 5k in persistent mode?
2) I understand that clustering could be another option. However, I need to size the cluster based on incoming load, and I may not have any message grouping / identity on which to define routing (since messages are expected to be just log descriptions). Can I have clustering with a load-balancing option for the workers/consumers?
3) I am expected to process several hundred thousand requests per second. I would appreciate it if anyone could share experience and approaches for achieving this.
What type of storage are you using for the EC2 instances? EBS storage is more reliable, but sometimes it has very low throughput (especially if it's a small EBS volume, i.e. <100GB). Instance store, on the other hand, has much better IO performance (from our experience, at least), but can only "live" as long as the instance is running. The instance type you're using also makes quite a difference. m1.small and c1.medium both have moderate IO performance (http://aws.amazon.com/ec2/instance-types/).
We're running RabbitMQ in EC2 with persistence for all the messages. We use only m1.large instances (64-bit with high IO performance). We started with EBS storage, then switched to instance-store to see if there was any improvement, and the instance-store instances are indeed faster in terms of IO throughput. The drawback is that all persisted messages are lost when the instance terminates or fails (although we haven't experienced a failure so far). In our scenario, we don't need such a big throughput, but we do care a lot if our messages get lost :-)
In conclusion, you could try to switch to an instance-store setup, to see how that handles, if there's any improvement. And if that works much better, then I think http://www.rabbitmq.com/pacemaker.html is a solution to overcome failure. At least that's the direction we're switching to.
Cheers
Have you considered adding multiple consumers? This is one of the core benefits of a loosely coupled bus/message architecture as compared to a strictly coupled architecture. It may help to understand the need for the message volume as well. Is this a benchmark just to see what you can do or is this tied to an actual application need?
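To illustrate the idea of several consumers sharing one queue, here is a minimal in-process sketch. It is only an analogy: a `LinkedBlockingQueue` stands in for the RabbitMQ queue and threads stand in for the worker boxes, so it shows how the work gets divided rather than how to call the broker. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process stand-in for several workers consuming from one queue. */
final class CompetingConsumers {

    /** Drains `messages` items with `workers` threads; returns how many each worker handled. */
    static int[] drain(int messages, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add("{\"id\":" + i + "}"); // small JSON payload, as in the question
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                results.add(pool.submit(() -> {
                    int handled = 0;
                    // poll() hands each message to exactly one worker and
                    // returns null once the queue is empty.
                    while (queue.poll() != null) {
                        handled++;
                    }
                    return handled;
                }));
            }
            int[] counts = new int[workers];
            for (int w = 0; w < workers; w++) {
                counts[w] = results.get(w).get(); // wait for worker w to finish
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] counts = drain(100_000, 4);
        for (int w = 0; w < counts.length; w++) {
            System.out.println("worker " + w + " handled " + counts[w]);
        }
    }
}
```

The per-worker counts vary from run to run, but they always add up to the number of messages, because no message is delivered to two workers. No routing key or message grouping is needed for this kind of load sharing.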
Hundreds of thousands of messages per second is very, very high: if RabbitMQ can do that at all, you're looking at partitioning across clustered nodes. These writers found that their EC2 instances could process at most 100K packets/second, so obviously you won't get message throughput higher than that through a single instance.
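As a rough sizing sketch for that kind of partitioning: the code below treats the 100K packets/second figure as a hard per-instance ceiling and optimistically assumes one packet per message, so the result is a lower bound on the node count, not a recommendation. The 300,000 messages/second target is a made-up number for illustration, and the round-robin picker is just one simple way to spread messages that carry no natural routing key. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Lower-bound sizing for a partitioned setup. */
final class PartitionPlan {

    /** Smallest node count such that nodes * perNodeCeiling >= targetPerSecond. */
    static long minNodes(long targetPerSecond, long perNodeCeiling) {
        if (targetPerSecond < 0 || perNodeCeiling <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return (targetPerSecond + perNodeCeiling - 1) / perNodeCeiling; // ceiling division
    }

    public static void main(String[] args) {
        // Assumed target of 300,000 msg/s against the 100K ceiling quoted above.
        System.out.println("minimum nodes: " + minNodes(300_000, 100_000));
        RoundRobin picker = new RoundRobin(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("message " + i + " -> node " + picker.next());
        }
    }
}

/** Picks target nodes in rotation; safe to share between publisher threads. */
final class RoundRobin {
    private final int nodes;
    private final AtomicLong counter = new AtomicLong();

    RoundRobin(int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = nodes;
    }

    /** Index of the node the next message should go to, in the range [0, nodes). */
    int next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        return (int) Math.floorMod(counter.getAndIncrement(), (long) nodes);
    }
}
```

In practice each message costs more than one packet (acks, framing, persistence traffic), so the real node count would be higher than this estimate.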
You might investigate Kafka, written by LinkedIn for a similar sort of vast-firehose model. It pushes some complexity out to consumers in order to allow for genuine distributed-ness and lower message overhead.