
Heroku Postgres DB slower after upgrade

2 days ago, I upgraded our Heroku Postgres server from Kappa to Ronin. Our DB had grown to several GB and I figured the extra RAM would help with the cache. I used the standard fast changeover technique (create a follower, allow the transfer to finish, promote the follower). I know that the cache can take time to warm up, but it's been several days and it's been SLOWING down.

Our smaller DB was running around 5ms response times. The new DB jumped to about 10ms after the transfer (cold cache). It has since fluctuated between 10ms and 20ms.

  • The new DB is running the exact same version (9.2.4).
  • I have noticed there is more logging occurring (checkpoints).
  • The cache hit rate on the old DB was ~0.91, hence the upgrade. The new DB has already reached a similar hit rate, so I would expect that cache warmth is no longer the issue.
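
For readers who want to reproduce a figure like the ~0.91 above, here is a minimal sketch. It assumes that "hit/miss" means the fraction of table block requests served from PostgreSQL's shared buffers, and it reads the counters from the standard `pg_statio_user_tables` statistics view. The helper name `cache_hit_ratio` is hypothetical, and the query has to be run through whatever client you already use.

```python
# Sketch only: assumes the hit rate is computed from the heap block counters
# in the standard pg_statio_user_tables view. Run the query with your own client.
HIT_RATIO_SQL = """
SELECT sum(heap_blks_hit)  AS blks_hit,
       sum(heap_blks_read) AS blks_read
FROM pg_statio_user_tables;
"""


def cache_hit_ratio(blks_hit, blks_read):
    """Return hits / (hits + reads), or None if no blocks were requested yet.

    blks_hit  -- blocks found in shared buffers
    blks_read -- blocks PostgreSQL had to request from outside shared buffers
    """
    total = blks_hit + blks_read
    if total == 0:
        return None  # freshly reset statistics: the ratio is undefined
    return blks_hit / total


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to illustrate the arithmetic.
    print(cache_hit_ratio(91000, 9000))
```

Note that a "read" here only means the block was not in shared buffers; it may still have been served from the operating system's page cache, so two servers with the same ratio can have quite different latencies.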

Is there some config which could be different? I know that every app is different, but shouldn't the cache have warmed by now? Are there any undocumented differences between Kappa and Ronin?

Thanks

Forrest asked Jul 15 '13 18:07



1 Answer

I've seen this before with a client who called me for some emergency help.

After doing some poking around with heroku bash, we eventually concluded that the new instance was on a particularly busy underlying server. We did a failover via follower promotion to another machine, at which point performance greatly improved, though the failover itself was challenging due to the problems with the master.

As far as I know, Heroku's instances are Amazon EC2 nodes (Xen VMs) that run an LXC container to isolate each Heroku user's database cluster. LXC offers rather less isolation than a full VM does; instances can contend for RAM, disk I/O, CPU, etc., depending on the exact policy configured with OpenVZ, any control group policies, etc.

If you're on an instance where the other users aren't doing much, and the container permits your DB to use resources that other users don't currently need, you could easily see consistently higher-than-guaranteed performance.
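
To make that borrowing effect concrete, here is a toy model, not a description of how Heroku actually schedules anything. It assumes a host splits one resource between tenants by max-min fairness, so capacity that idle tenants don't ask for is handed to the busy ones. The function name `max_min_share` and all the numbers are hypothetical.

```python
def max_min_share(capacity, demands):
    """Toy max-min fair split of `capacity` among tenants with given demands.

    Assumption: the host is work-conserving, i.e. capacity that one tenant
    does not ask for is redistributed to tenants that want more.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Tenants whose unmet demand fits within an equal share are fully served.
        satisfied = [i for i in active if demands[i] - alloc[i] <= share]
        if not satisfied:
            # Everyone wants more than an equal share: split evenly and stop.
            for i in active:
                alloc[i] += share
            break
        for i in satisfied:
            remaining -= demands[i] - alloc[i]
            alloc[i] = float(demands[i])
        active = [i for i in active if i not in satisfied]
    return alloc


if __name__ == "__main__":
    # Tenant 0 is "your" database; the other three are neighbours.
    quiet_host = max_min_share(100, [80, 5, 5, 5])    # idle neighbours
    busy_host = max_min_share(100, [80, 80, 80, 80])  # everyone is busy
    print(quiet_host[0], busy_host[0])
```

With idle neighbours, tenant 0 gets everything it asks for; when every tenant is busy, it is cut back to an equal quarter of the host. The same database can therefore look much faster on a quiet shared host than its nominal share would suggest.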

I suspect that people on larger Heroku plans are more likely to actually be using the resources of the host they share with you.

If you do a promotion failover to a bigger instance where all the users are there because they really need the resources offered by the bigger machine, you could actually get fewer resources overall, because everyone is actually using their share.

It's frustrating that Heroku offer so little visibility into the systems that run their DBs. It's hard to tell how/if they load balance between container hosts, what the underlying load on the system is, etc.

In a comment, @Forrest pointed out that Heroku have a useful page on their server details, showing that only the lower tiers are multi-tenant; the higher tiers are not. This would easily explain the performance loss observed here, and would fit with my comments above: the lower plan was allowing Forrest to borrow unused resources from other users.

Craig Ringer answered Oct 03 '22 08:10