Background on the problem.
So I can store compress the text using PHP's gzcompress and possibly decrease a file size up to 70% and then store it in a bytea column. But is the operation really worth it? Isn't bytea already compressed in PostgrSQL using TOAST and adding another layer of compress will not have a significant impact on the size of the data?
Yes and no, it depends on your application.
RE: TOAST, according to PostgreSQL's documentation compression (using LZ), they only invoke compression if the text is larger than a threshold value of 2KiB.
So if the HTML you're storing is less than 2KiB then it might be worthwhile doing your own compression, though in this case I wouldn't bother as most HTML documents tend to be at least 10KiB thesedays, and implementing compression in your application layer seems like trouble and makes your data less portable. There is also a very real performance hit from doing it from within PHP.
However, if you were storing an archive for a very large web forum, for example, where the HTML is going to be less than 2KiB on average, but there's a lot of it (some forums have postcounts into tens-of-billions) then there's a very good case in compressing the data regardless.
So if you have lots (as in, >10GB or so) of small pieces of data, then it might be worthwhile compressing the data by yourself, but always profile and benchmark first!, otherwise don't bother and let PostgreSQL sort it out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With