Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database sharding on Heroku

At some point in the next few months our app will be at the size where we need to shard our DB. We are using Heroku for hosting, Node.js/PostgreSQL stack.

Conceptually, it makes sense for our app to have each logical shard represent one user and all data associated with that user (each user of our app generates a lot of data, and there are no interactions between users). We need to retain the ability for the user to do complex ad-hoc querying on their data. I have read many articles such as this one which talk about sharding: http://www.craigkerstiens.com/2012/11/30/sharding-your-database/

Conceptually, I understand how Sharding works. However in practice I have no idea how to go about implementing this on Heroku, in terms of what code I need to write and what parts of my application I need to modify. A link to a tutorial or some pointers would be much appreciated.

Here are some resources I have already looked at:

  • http://www.craigkerstiens.com/2012/11/30/sharding-your-database/
  • MySQL sharding approaches?
  • Heroku takes care of multiple database servers?
  • http://petrohi.me/post/30848036722/scaling-out-postgres-partitioning
  • http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
  • https://devcenter.heroku.com/articles/heroku-postgres-follower-databases
  • Why do people use Heroku when AWS is present? What distinguishes Heroku from AWS?
like image 588
raviparikh Avatar asked Feb 13 '13 19:02

raviparikh


People also ask

Is sharding possible in Postgres?

Version 10 of PostgreSQL added the declarative table partitioning feature. In version 11 (currently in beta), you can combine this with foreign data wrappers, providing a mechanism to natively shard your tables across multiple PostgreSQL servers.

What databases support sharding?

Cassandra, HBase, HDFS, MongoDB and Redis are databases that support sharding. Sqlite, Memcached, Zookeeper, MySQL and PostgreSQL are databases that don't natively support sharding at the database layer. For databases that don't offer built-in support, sharding logic has to reside in the application.

Does Heroku support databases?

Managed PostgreSQL from HerokuHeroku Postgres delivers the world's most advanced open source database as a trusted, secure, and scalable service that is optimized for developers.

Is Heroku Postgres good?

Heroku's Postgres open-source database is the most effective service for developers who want to build engaging apps. With all of its features and Heroku's experience managing databases, it guarantees security and compliance with GDPR.


1 Answers

As the author of the first article happy to chime in further. When it comes to sharding one of the very key components is what key are you sharding on. The complexity of sharding really comes into play when you have data that is intermingled across different physical nodes. If you're something like a multi-tenant app then modeling all your data around this idea of a tenant or customer can fit very cleanly in this setup. In that case you're going to want to break up all tables that are related to customer and shard them the same way as other tenant related tables.

As for doing this on Heroku, there are two options. You can roll your own with Heroku Postgres and application logic, or using something like Citus (which is an add-on that helps manage more of this for you.

For rolling your own, you'll first create the various application logic to handle creating all your shards and knowing where to route the appropriate queries to. For Rails there are some gems to help wtih this like activerecord-multi-tenant or apartment. When it comes to actually moving to sharding and that migration, what you'll want to do is create a Heroku follower to start. During the migration you'll have it start un-following. Then you'll remove half of the data from the original primary and the other half from the follower you separated accordingly.

like image 195
CraigKerstiens Avatar answered Oct 28 '22 21:10

CraigKerstiens