Any way to reduce the size of texts?

Description: I have a huge MySQL table. Its total size is about 10 terabytes, and it contains only text.

A sample text from this database table:

In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.

There are about 50 billion different texts.
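Taken together, the two figures above imply that the average text is only about 200 bytes, close to the 209-byte sample. That matters if each text is compressed on its own, because fixed per-value overhead (headers, checksums) weighs more on short strings. A quick check, assuming "10 terabytes" means decimal terabytes (10^12 bytes each):

```php
<?php
// Assumption: "10 terabytes" is read as decimal TB, i.e. 10 * 10^12 bytes.
$totalBytes = 10 * 10 ** 12;
// "About 50 billion different texts", as stated in the question.
$textCount = 50 * 10 ** 9;
// Integer division: average bytes per text.
$averageBytes = intdiv($totalBytes, $textCount);
echo "Average size per text: {$averageBytes} bytes", PHP_EOL; // prints 200 bytes
```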

What have I tried?

I've tried zipping them all. That worked and reduced the total size. However, I need to be able to search the data, and I can't search anything while it is inside a zip file.

I've tried PHP's base64 encoding. It turned my sample text into:

SW4gb3RoZXIgY2FzZXMsIHNvbWUgY291bnRyaWVzIGhhdmUgZ3JhZHVhbGx5IGxlYXJuZWQgdG8gcHJvZHVjZSB0aGUgc2FtZSBwcm9kdWN0cyBhbmQgc2VydmljZXMgdGhhdCBwcmV2aW91c2x5IG9ubHkgdGhlIFUuUy4gYW5kIGEgZmV3IG90aGVyIGNvdW50cmllcyBjb3VsZCBwcm9kdWNlLiBSZWFsIGluY29tZSBncm93dGggaW4gdGhlIFUuUy4gaGFzIHNsb3dlZC4=
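Base64 maps every 3 input bytes to 4 output characters, so it makes text roughly a third larger rather than smaller. A quick check on the sample text, using only PHP's built-in `base64_encode()`:

```php
<?php
$original = "In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.";
$encoded = base64_encode($original);

// Every 3-byte group becomes 4 characters; a short final group is padded with '='.
$expectedLength = 4 * (int) ceil(strlen($original) / 3);

echo strlen($original), " bytes of text -> ", strlen($encoded), " bytes of base64", PHP_EOL;
// prints "209 bytes of text -> 280 bytes of base64"
```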

What I'd like to accomplish

I want to reduce the size of the texts before sending them to MySQL, but I don't know how to do it. I'm thinking of encrypting and decrypting the data.

Here is an example of what I want to do:

I want to encrypt the text data before storing it. Then I want to fetch the encrypted data from MySQL and decrypt it.

Is there any way to reduce the size of the texts? Base64 does not work for me; is there another way?

asked Sep 22 '12 at 19:09 by Paraiba to Pusan



2 Answers

Please note that neither base64 nor encryption was designed to reduce string length. What you should be looking at is compression; I think you should look at gzcompress and gzdeflate.

Example using the decoded version of your text:

<?php
$original = "In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.";

// Base64 of the uncompressed text is the baseline for the comparison
$base64 = base64_encode($original);
// Compress at level 9 (maximum), then base64-encode so the lengths are comparable
$compressed = base64_encode(gzcompress($original, 9)); // zlib format
$deflate = base64_encode(gzdeflate($original, 9));     // raw DEFLATE, no header
$encode = base64_encode(gzencode($original, 9));       // gzip format (computed, not printed below)

$base64Length = strlen($base64);
$compressedLength = strlen($compressed);
$deflateLength = strlen($deflate);
$encodeLength = strlen($encode);

echo "<pre>";
echo "Using GZ Compress   =  ", round(100 - ($compressedLength / $base64Length) * 100, 2), "% improvement", PHP_EOL;
echo "Using Deflate       =  ", round(100 - ($deflateLength / $base64Length) * 100, 2), "% improvement", PHP_EOL;
echo "</pre>";

Output:

Using GZ Compress   =  32.86% improvement
Using Deflate       =  35.71% improvement
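The percentages above compare the base64 text of each compressed string with the base64 text of the original. If the compressed bytes are stored directly in a binary column (a BLOB, say; that schema choice is an assumption, not something the answer specifies), base64 is not needed at all and the fair baseline is the raw 209-byte text. The sketch below compresses with the three zlib functions from the answer, checks that each round-trips through its matching decompressor (`gzuncompress`, `gzinflate`, `gzdecode`), which is the "store, fetch, restore" workflow the question describes, and reports sizes against the raw text. The `$codecs` and `$sizes` names are illustrative, and the zlib extension is assumed to be enabled.

```php
<?php
// Sample text from the question (209 bytes).
$original = "In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.";
$rawLength = strlen($original);

// Each compressor paired with the function that reverses it.
$codecs = [
    'gzcompress' => ['gzcompress', 'gzuncompress'], // zlib format: 2-byte header + 4-byte checksum
    'gzdeflate'  => ['gzdeflate', 'gzinflate'],     // raw DEFLATE stream, no wrapper
    'gzencode'   => ['gzencode', 'gzdecode'],       // gzip format: 10-byte header + 8-byte trailer
];

$sizes = [];
foreach ($codecs as $name => [$compress, $decompress]) {
    // Binary output: keep it as-is (e.g. in a BLOB column) instead of base64-encoding it.
    $stored = $compress($original, 9);
    // After fetching the value back from the database, restore the original text.
    $restored = $decompress($stored);
    if ($restored !== $original) {
        throw new RuntimeException("$name did not round-trip");
    }
    $sizes[$name] = strlen($stored);
    printf("%-10s %d of %d bytes (%.2f%% smaller)%s",
        $name, $sizes[$name], $rawLength, 100 - $sizes[$name] / $rawLength * 100, PHP_EOL);
}
```

The three formats share the same DEFLATE data and differ only in their wrappers, which is why `gzdeflate` comes out smallest; on texts averaging a couple of hundred bytes, those few bytes of per-value overhead are a noticeable fraction.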
answered Sep 22 '22 at 10:09 by Baba



Base64 is not compression or encryption; it is an encoding. You can pass the text through the gzip compression algorithm (http://php.net/manual/en/function.gzcompress.php) before you store it in the database, but that will make the data unsearchable via MySQL queries.
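To make the searchability problem concrete: once a column holds compressed bytes, a MySQL `LIKE` or full-text query only ever sees those bytes, so finding a word means fetching rows and decompressing each one in the application. A minimal sketch, using hypothetical in-memory rows in place of a real query result (the `$rows` array and the `searchCompressed()` helper are illustrative, not from either answer) and the `gzdeflate`/`gzinflate` pair:

```php
<?php
// Hypothetical rows: id => compressed text, as they might come back from a BLOB column.
$rows = [
    1 => gzdeflate("Real income growth in the U.S. has slowed.", 9),
    2 => gzdeflate("Some countries have gradually learned to produce the same products.", 9),
];

// The database only sees compressed bytes, so it cannot match words inside them.
// Searching means decompressing every row in PHP and scanning the plain text.
function searchCompressed(array $rows, string $needle): array
{
    $matches = [];
    foreach ($rows as $id => $blob) {
        $text = gzinflate($blob);
        // Case-insensitive substring match on the decompressed text.
        if ($text !== false && stripos($text, $needle) !== false) {
            $matches[] = $id;
        }
    }
    return $matches;
}

print_r(searchCompressed($rows, "income"));
```

Every search touches every row, so at 50 billion rows a scan like this is impractical, which is the trade-off this answer points out.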

answered Sep 20 '22 at 10:09 by monitorjbl
