Postgres won't start after deleting pg_xlog files

Using Debian Wheezy, Postgresql 9.3

My database went down because the partition where it keeps the WAL files got full. So I deleted everything inside ./pg_xlog/, because I didn't know what those files were (yeah, incredibly stupid of me). Now the Postgres service won't start. According to syslog, the problem is:

00000: could not open tablespace directory "pg_tblspc/16386/PG_9.3_201306121": No such file or directory
LOCATION:  RelationCacheInitFileRemoveInDir, relcache.c:4895
00000: Primary checkpoint record is invalid
LOCATION:  ReadCheckpointRecord, xlog.c:6543
00000: Secondary checkpoint record is invalid
LOCATION:  ReadCheckpointRecord, xlog.c:6547
PANIC: XX000: could not locate a valid checkpoint record
LOCATION:  StartupXLOG, xlog.c:5228

I'm not sure whether the problem is that it can't find the right pg_tblspc directory or that the checkpoint WAL files are gone entirely. The databases are actually stored in /dados/PG_9.3_201306121. What can I do to make the service start again?
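
One way to narrow this down is to look at pg_tblspc directly. In PostgreSQL 9.3 each entry there is normally a symlink named after a tablespace OID that points at the tablespace location, and that location should contain a PG_9.3_201306121 subdirectory; the first log line says this subdirectory could not be opened for tablespace 16386. A minimal Python sketch that lists where each entry points and whether that subdirectory exists (the function name and its default argument are assumptions for illustration):

```python
import os
from pathlib import Path


def check_tablespaces(datadir, version_dir="PG_9.3_201306121"):
    """Report where each pg_tblspc entry points and whether the
    per-version subdirectory exists at that location.

    Returns a list of (oid, target, ok) tuples sorted by entry name.
    `version_dir` is the catalog-version directory named in the log.
    """
    results = []
    tblspc = Path(datadir) / "pg_tblspc"
    for entry in sorted(tblspc.iterdir()):
        # Entries are normally symlinks named after the tablespace OID.
        target = os.readlink(entry) if entry.is_symlink() else str(entry)
        # is_dir() follows the symlink; a dangling link yields False.
        ok = (entry / version_dir).is_dir()
        results.append((entry.name, target, ok))
    return results
```

For example, running it against Debian's default 9.3 data directory, /var/lib/postgresql/9.3/main (an assumed location here), would show whether entry 16386 points somewhere other than /dados.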

EDIT1: Okay, I've managed to get it back online, but some databases were corrupted. I managed to drop two of them with DROPDB (I couldn't even connect to them without forcing a service restart). When I tried the same on another corrupted database, I got an xlog-related error again. I tried a clean restore over it, but the restore was incomplete. Then I created a new database and tried to restore an older backup of this database into it; that restore was also incomplete.

Now I can't drop any databases or create new ones; I always get an "xlog flush request not satisfied" error. I tried running pg_resetxlog, but it didn't seem to do anything. The error also shows "cannot write to block 1 of pg_tblspc/16385/PG_9.3_201306121/36596452/11773, write error may be permanent".

EDIT2: Part of the problem above was that 11773 file. I renamed it to 11773.corrupt, and now I can create and drop databases again.

Arthur asked Apr 27 '16 20:04


1 Answer

Postgres won't start after deleting pg_xlog files

Um, yeah. Don't do that.

What can I do to make the service start again?

Well, you've corrupted your database. Restore from backups. You have backups, right? Preferably a handy PITR archive, such as one from PgBarman, that lets you restore to the state of five minutes ago. No?

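OK, first, archive the damaged copy: with the server stopped, take a complete file-level copy of the data directory and every tablespace directory. See https://wiki.postgresql.org/wiki/Corruption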
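
A minimal Python sketch of taking that copy, assuming the server is already stopped and the archive is written to a different disk (the function name and paths are hypothetical). Because tarfile stores the pg_tblspc entries as symlinks rather than following them, the sketch also adds each tablespace's target directory, which matters in this case since the databases live under /dados:

```python
import tarfile
from pathlib import Path


def archive_cluster(datadir, archive_path):
    """Write a gzipped tar of a stopped cluster's data directory plus
    the target of every tablespace symlink under pg_tblspc.

    Tablespace contents are stored under "tablespaces/<oid>".
    Returns the top-level member names that were added.
    """
    datadir = Path(datadir)
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        # Symlinks inside the datadir are archived as links, not followed.
        tar.add(str(datadir), arcname="datadir")
        added.append("datadir")
        tblspc = datadir / "pg_tblspc"
        if tblspc.is_dir():
            for link in sorted(tblspc.iterdir()):
                target = link.resolve()
                # Skip plain directories and dangling links.
                if link.is_symlink() and target.is_dir():
                    name = f"tablespaces/{link.name}"
                    tar.add(str(target), arcname=name)
                    added.append(name)
    return added
```

Keeping the tablespace contents in the same archive means the damaged state can be reassembled later if a recovery attempt makes things worse.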

Now, if you're lucky, pg_resetxlog will get the server running well enough for a pg_dump of the database to succeed. You can then move the damaged install's data directory aside, initdb a new one, and restore the dump into it.
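
The sequence can be written down as a dry-run plan before anything is executed. This Python sketch only builds the argv lists; it assumes the PostgreSQL 9.3 binaries are on PATH and that postgresql.conf lives in the data directory. On Debian the binaries are under /usr/lib/postgresql/9.3/bin and the configuration is in /etc/postgresql/9.3/main, so pg_ctlcluster would normally replace the pg_ctl steps. The function name and the ".damaged" suffix are hypothetical. pg_resetxlog needs -f here because the cluster was not shut down cleanly.

```python
def recovery_plan(datadir, dbname, dump_file, aside_suffix=".damaged"):
    """Return, in order, argv lists for: reset WAL, dump, move the
    damaged datadir aside, initdb a fresh one, restore the dump.

    Nothing is executed here.
    """
    datadir = str(datadir)
    dump_file = str(dump_file)
    aside = datadir.rstrip("/") + aside_suffix
    return [
        ["pg_resetxlog", "-f", datadir],           # server must be stopped
        ["pg_ctl", "-D", datadir, "-w", "start"],
        ["pg_dump", "-Fc", "-f", dump_file, dbname],
        ["pg_ctl", "-D", datadir, "-w", "stop"],
        ["mv", datadir, aside],                    # keep the damaged copy
        ["initdb", "-D", datadir],
        ["pg_ctl", "-D", datadir, "-w", "start"],
        ["createdb", dbname],
        ["pg_restore", "-d", dbname, dump_file],
    ]
```

Each step would be run as the postgres OS user, for instance with subprocess.run(step, check=True), stopping at the first failure so that a failed pg_dump never reaches the mv and initdb steps.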

If you're unlucky, pg_dump won't succeed, or the restore will fail because of things like duplicate primary keys. In the latter case you might have to repair the dump by hand. If pg_dump fails, the appropriate action depends on why it fails.
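
To find which constraints need hand repair, the restore output can be scanned. When pg_restore has loaded the data and then tries to add a primary key or unique constraint over duplicate rows, PostgreSQL reports could not create unique index "name", followed by a DETAIL line naming the duplicated key. A minimal Python sketch, assuming the log contains those standard English messages (the function name is hypothetical):

```python
import re

# Error raised when a PK/unique index can't be built over duplicate rows.
_INDEX_ERR = re.compile(r'could not create unique index "([^"]+)"')
# Accompanying DETAIL line, e.g. 'DETAIL:  Key (id)=(42) is duplicated.'
_DETAIL = re.compile(r"DETAIL:\s+Key \((.+?)\)=\((.*)\) is duplicated\.")


def duplicate_key_failures(log_lines):
    """Collect (index_name, key_columns, key_values) from restore output.

    A DETAIL line is attached to the most recent index error; entries
    without one get None for columns and values.
    """
    failures = []
    for line in log_lines:
        m = _INDEX_ERR.search(line)
        if m:
            failures.append([m.group(1), None, None])
            continue
        d = _DETAIL.search(line)
        if d and failures and failures[-1][1] is None:
            failures[-1][1], failures[-1][2] = d.group(1), d.group(2)
    return [tuple(f) for f in failures]
```

Each reported index points at a table whose duplicate rows must be removed or corrected before that constraint can be added.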

So yeah. Don't delete pg_xlog.

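There are discussions within the PostgreSQL community about renaming pg_xlog to something that makes it more obvious that it's an important component of the database, and hopefully it'll get done in the 9.7 release. (It was: the release planned as 9.7 shipped as PostgreSQL 10, where pg_xlog became pg_wal and pg_resetxlog became pg_resetwal.)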

Craig Ringer answered Oct 27 '22 21:10