Heroku Postgres DB slower after upgrade

Tags: postgresql, heroku

2 days ago, I upgraded our Heroku Postgres server from Kappa to Ronin. Our DB had grown to several GB and I figured the extra RAM would help with caching. I used the standard fast swapping technique (create a follower, let it catch up, promote the follower). I know the cache can take time to warm up, but it's been several days and it has been SLOWING down.

Our smaller DB was running around 5ms response times. The new DB jumped to about 10ms after the transfer (cold cache). It has since fluctuated between 10ms and 20ms.

- The new DB is running the exact same version (9.2.4).
- I have noticed more logging occurring (checkpoints).
- The cache hit ratio on the old DB was ~0.91, hence the upgrade. The new DB has already reached a similar hit ratio, so I would expect that cache warmth is no longer the issue.
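For reference, a cache hit ratio like the ~0.91 above is commonly computed from PostgreSQL's `pg_statio_user_tables` view (`heap_blks_hit` vs `heap_blks_read`). The sketch below is a minimal illustration, assuming you fetch the two sums yourself (for example with `heroku pg:psql`); the query string and helper names are illustrative, not Heroku's own tooling. One caveat worth keeping in mind: these counters are cumulative since the last statistics reset, so a ratio read on a freshly promoted instance still includes its cold-start period. Comparing two samples taken some time apart gives a better picture of how warm the cache is *now*.

```python
# Illustrative query: sums the cumulative block counters for user tables.
# pg_statio_user_tables and its heap_blks_* columns are standard PostgreSQL.
CACHE_HIT_QUERY = """
SELECT sum(heap_blks_hit)  AS hits,
       sum(heap_blks_read) AS reads
FROM pg_statio_user_tables;
"""


def cache_hit_ratio(hits: int, reads: int) -> float:
    """Fraction of heap block requests served from shared_buffers.

    heap_blks_read counts blocks requested from the OS; those may still be
    served from the OS page cache rather than from disk.
    """
    total = hits + reads
    if total == 0:
        # Nothing recorded yet (e.g. statistics were just reset).
        return 0.0
    return hits / total


def hit_ratio_between(before: tuple, after: tuple) -> float:
    """Hit ratio over the interval between two (hits, reads) samples.

    Because the counters are cumulative, subtracting an earlier sample
    removes the cold-cache period from the calculation.
    """
    hits = after[0] - before[0]
    reads = after[1] - before[1]
    return cache_hit_ratio(hits, reads)
```

For example, with hypothetical samples `(100, 100)` and then `(1090, 110)`, `hit_ratio_between` reports 0.99 for that interval, even though the lifetime ratio computed from the second sample alone is lower.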

Is there some config that could be different? I know every app is different, but shouldn't the cache have warmed up by now? Are there any undocumented differences between Kappa and Ronin?
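One way to check the "is some config different?" question, assuming both databases are still reachable, is to dump `SELECT name, setting FROM pg_settings` from each and diff the results. `pg_settings` is a standard PostgreSQL view; the helper below is a hypothetical sketch that compares two already-fetched name→value dicts. Given the extra checkpoint log lines, settings such as `log_checkpoints`, `checkpoint_segments` and `checkpoint_timeout` (all present in 9.2) are the first ones worth looking at in the output. More checkpoint lines in the log could simply mean `log_checkpoints` is enabled on one instance and not the other.

```python
# Illustrative query to run against each database (e.g. via `heroku pg:psql`).
SETTINGS_QUERY = "SELECT name, setting FROM pg_settings ORDER BY name;"


def diff_settings(old: dict, new: dict) -> dict:
    """Return {name: (old_value, new_value)} for every setting that differs.

    A setting present on only one side is reported with None for the
    missing side.
    """
    diffs = {}
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

With hypothetical inputs `{"log_checkpoints": "off", "work_mem": "1MB"}` and `{"log_checkpoints": "on", "work_mem": "1MB"}`, only `log_checkpoints` is reported, as `("off", "on")`.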

Thanks

asked Jul 15 '13 18:07

Forrest

1 Answer

I've seen this before with a client who called me for some emergency help.

After some poking around with heroku bash, we eventually concluded that the new instance was on a particularly busy underlying server. We did a failover via follower promotion to another machine, at which point performance improved greatly, though the failover itself was challenging because of the problems with the master.

As far as I know Heroku's instances are Amazon EC2 nodes (Xen VMs) that run an LXC container to isolate each Heroku user's database clusters. LXC offers rather less isolation than a full VM does; instances can contend for RAM, disk I/O, CPU, etc, depending on the exact policy configured with OpenCZ, any control group policies, etc.

If you're on an instance where the other users aren't doing much, and the container permits your DB to use resources that other users don't currently need, you could easily see consistently higher-than-guaranteed performance.

I suspect that people on larger Heroku plans are more likely to actually be using the resources of the host you share with them.

If you do a promotion failover to a bigger instance where every user is there because they really need the resources of the bigger machine, you could actually get fewer resources overall, because everyone is actually using their share.

It's frustrating that Heroku offer so little visibility into the systems that run their DBs. It's hard to tell how/if they load balance between container hosts, what the underlying load on the system is, etc.

In a comment, @Forrest pointed out that Heroku have a useful page on their server details, showing that only the lower tiers are multi-tenant, but higher tiers are not. This would easily explain the performance loss observed here, and would fit in with my comments above that the lower plan was allowing Forrest to borrow unused resources from other users.

answered Oct 03 '22 08:10

Craig Ringer
