Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a advantage to storing a compressed file in the database

Background on the problem.

  • The database is PostgreSQL 9.1
  • Data is large amounts of text(web page mark-up language)
  • The column is bytea

So I can store compress the text using PHP's gzcompress and possibly decrease a file size up to 70% and then store it in a bytea column. But is the operation really worth it? Isn't bytea already compressed in PostgrSQL using TOAST and adding another layer of compress will not have a significant impact on the size of the data?

like image 642
Devin Dixon Avatar asked Oct 21 '22 15:10

Devin Dixon


1 Answers

Yes and no, it depends on your application.

RE: TOAST, according to PostgreSQL's documentation compression (using LZ), they only invoke compression if the text is larger than a threshold value of 2KiB.

So if the HTML you're storing is less than 2KiB then it might be worthwhile doing your own compression, though in this case I wouldn't bother as most HTML documents tend to be at least 10KiB thesedays, and implementing compression in your application layer seems like trouble and makes your data less portable. There is also a very real performance hit from doing it from within PHP.

However, if you were storing an archive for a very large web forum, for example, where the HTML is going to be less than 2KiB on average, but there's a lot of it (some forums have postcounts into tens-of-billions) then there's a very good case in compressing the data regardless.

So if you have lots (as in, >10GB or so) of small pieces of data, then it might be worthwhile compressing the data by yourself, but always profile and benchmark first!, otherwise don't bother and let PostgreSQL sort it out.

like image 139
Dai Avatar answered Oct 27 '22 01:10

Dai