
Massive multi-user realtime application with Google App Engine

I'm building a multiuser realtime application with Google App Engine (Python) that would look like the Facebook livestream plugin: https://developers.facebook.com/docs/reference/plugins/live-stream/

That means anywhere from 1 to 1,000,000 users on the same webpage can perform actions, and everyone else is notified of each action instantly. It's like a group chat, but with a lot of people.

My questions:
- Is App Engine able to scale to that kind of number?
- If yes, how would you design it?
- If no, what would be your suggestions?

Right now, this is my design:
- I'm using the App Engine Channel API
- I store every user connected in the memcache
- Every time an action is performed, a notification task is added to a task queue
- The task consists of retrieving all users from memcache and sending each of them a notification.

I know my bottleneck is the task: everybody is notified through the same task/request. Right now, with 30 users connected, it takes about 1 second, so you can imagine how long it would take for 100,000 users.

How would you correct this?

Thanks a lot

Damien asked Dec 03 '11 07:12



1 Answer

How many updates per user do you expect per second? If each of 1,000,000 users updates just once every hour, you'll be sending 10^12 messages per hour, because every message sent results in 1,000,000 more sends. That is roughly 278 million messages per second. Put another way, one message per user per hour works out to about 278 incoming messages per second and about 278 million outgoing messages per second.
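The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are made up for illustration, and the one-update-per-hour rate is the assumption stated above.

```python
# Back-of-the-envelope fan-out arithmetic (assumes every update is
# delivered to every connected user, as in the design in the question).
USERS = 1_000_000
UPDATES_PER_USER_PER_HOUR = 1
SECONDS_PER_HOUR = 3600

incoming_per_hour = USERS * UPDATES_PER_USER_PER_HOUR
outgoing_per_hour = incoming_per_hour * USERS  # each update fans out to all users

incoming_per_second = incoming_per_hour / SECONDS_PER_HOUR
outgoing_per_second = outgoing_per_hour / SECONDS_PER_HOUR

print(f"incoming: {incoming_per_second:.1f} msg/s")
print(f"outgoing: {outgoing_per_second:,.0f} msg/s")
```

The outgoing rate grows with the square of the user count, which is why the per-client push design breaks down long before a million users.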

So I think your basic design is flawed. But the underlying question: "how do I broadcast the same message to lots of users" is still valid, and I'll address it.

As you have discovered, the Channel API isn't great at broadcast because each call takes about 50ms. You could work around this with multiple tasks executing in parallel.
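As a rough sketch of the parallel-task workaround: split the connected clients into batches and give each batch its own task. The code below is plain Python with hypothetical names (`chunk`, `estimated_wall_time`, the `user-N` IDs); in a real App Engine app each batch would presumably be enqueued with `taskqueue.add(...)` and each task would loop over its batch calling `channel.send_message(client_id, message)`. The 50 ms figure is the estimate quoted above, and the number of tasks that actually run concurrently is an assumption.

```python
import math

SEND_LATENCY_S = 0.05  # ~50 ms per Channel API send, the estimate quoted above


def chunk(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def estimated_wall_time(num_users, batch_size, parallel_tasks):
    """Rough broadcast time when `parallel_tasks` batches run at once."""
    num_batches = math.ceil(num_users / batch_size)
    rounds = math.ceil(num_batches / parallel_tasks)
    return rounds * batch_size * SEND_LATENCY_S


client_ids = [f"user-{i}" for i in range(100_000)]  # hypothetical client IDs
batches = list(chunk(client_ids, 100))              # one task per batch

sequential = estimated_wall_time(len(client_ids), 100, parallel_tasks=1)
parallel = estimated_wall_time(len(client_ids), 100, parallel_tasks=100)
print(f"{len(batches)} batches: {sequential:.0f} s sequential, "
      f"{parallel:.0f} s with 100 tasks in parallel")
```

Even under these generous assumptions a single broadcast still costs 100,000 sends, so parallelism shortens the wall-clock time without reducing the total work.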

For cases like this, where lots of clients need the exact same stateless data, I would encourage you to use polling rather than the Channel API. Every client is going to receive the exact same information, so there is no need to send individualized messages to each one. Decide on an acceptable average latency (e.g. 1 second) and poll at an interval of twice that (e.g. every 2 seconds), since a message waits half the polling interval on average. Write a very lightweight, memcache-backed servlet that just returns the most recent block of data, and let the clients de-dupe.
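A minimal sketch of that polling approach, in plain Python so it runs anywhere. `RecentMessages` stands in for the memcache entry holding the most recent block of data, and `PollingClient` shows one possible way for a client to de-dupe, by remembering the highest message ID it has already displayed. Both class names, the ID scheme, and the block size are assumptions for illustration, not part of any App Engine API.

```python
import itertools


class RecentMessages:
    """Holds the latest `limit` messages; stands in for a memcache entry."""

    def __init__(self, limit=50):
        self.limit = limit
        self._ids = itertools.count(1)
        self._block = []  # list of (message_id, text), oldest first

    def publish(self, text):
        message_id = next(self._ids)
        # Append, then keep only the newest `limit` messages.
        self._block = (self._block + [(message_id, text)])[-self.limit:]
        return message_id

    def latest_block(self):
        """What the polling endpoint returns: the same payload for everyone."""
        return list(self._block)


class PollingClient:
    """Client-side de-dupe: skip anything at or below the last seen ID."""

    def __init__(self):
        self.last_seen_id = 0

    def poll(self, server):
        fresh = [(mid, text) for mid, text in server.latest_block()
                 if mid > self.last_seen_id]
        if fresh:
            self.last_seen_id = fresh[-1][0]
        return fresh


server = RecentMessages(limit=3)
client = PollingClient()
server.publish("hello")
server.publish("world")
first_poll = client.poll(server)   # both messages are new to this client
second_poll = client.poll(server)  # nothing new since the last poll
server.publish("again")
third_poll = client.poll(server)   # only the newest message
```

Because every client fetches the same block, the server does one cheap read per request regardless of how many messages were published, and the response can be cached. The trade-off is that a client polling less often than the block turns over will miss messages, so the block size has to be chosen with the polling interval in mind.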

Moishe Lettvin answered Oct 28 '22 18:10
