Lossless data compression for extremely large data - Planetary artificial intelligence

I want to create an environment for artificial intelligence, of planetary size. It will simulate underground life on a very large world. According to Wikipedia, planet Earth has a surface area of 510,072,000 km², and I want to create a square of similar proportions, maybe bigger. I will store one square meter in each bit, where 0 means dirt and 1 means a wall of dirt.
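To make the one-bit-per-square-meter layout concrete, here is a minimal sketch of packing the cells of one row into a byte array; the class and method names are only illustrative, not part of my actual code: (C#)

static class BitWorld
{
    // Mark square meter x of a row as a wall of dirt (set its bit to 1).
    public static void SetWall(byte[] row, long x)
    {
        row[x / 8] |= (byte)(1 << (int)(x % 8));
    }

    // True if square meter x is a wall of dirt, false if it is plain dirt.
    public static bool IsWall(byte[] row, long x)
    {
        return (row[x / 8] & (1 << (int)(x % 8))) != 0;
    }
}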

Let's first calculate how to store a single line of this square. One line would be 510,072,000,000 m, and each byte can store 8 meters, so one line would be 59.38 GB and the entire world would be 3.44 PB. And I would like to add at least water and lava to each square meter, which would multiply those figures by 2.
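As a worked check on the per-line figure: 510,072,000,000 m at one bit per meter is 510,072,000,000 bits, and dividing by 8 bits per byte gives 63,759,000,000 bytes, which is about 59.38 GiB in 1024-based units.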

I need to compress this information with lossless data compression algorithms. I first tried a very direct approach with 7-Zip on a smaller world, where one line would be 6375 B. In theory, the world should be 6375^2 B = 38.76 MB, but when I try it I get a file of 155 MB, and I do not know where this difference comes from. When I compress it with 7-Zip, I get a file of 40.1 MB. That is a huge difference, and with that ratio I would convert my 3.44 PB world file into a 912.21 GB file.
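For reference, the theoretical figure works out to 6375 * 6375 = 40,640,625 bytes, which is about 38.76 MiB in 1024-based units.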

My first question is: why am I getting such a large file when the maths say it should be smaller? Maybe the problem is in the code, or maybe I made a mistake in the maths. The code is as follows: (C#)

// Full world: 510,072,000,000 m per line = 63,759,000,000 B per line; testing here with a scaled-down size.
const long SIZE = 6375;

// Create the new, empty data file.
string fileName = tbFile.Text;

FileStream fs = new FileStream(fileName, FileMode.Create);

// Create the writer for data.
BinaryWriter w = new BinaryWriter(fs);

// Use random numbers to fill the data
Random random = new Random();
// Write data to the file.
for (int i = 0; i < SIZE; i++)
{
    for (int j = 0; j < SIZE; j++)
    {
        w.Write(random.Next(0,256));
    }
}

w.Close();

fs.Close();

And the maths are so basic that, if I did something wrong, I cannot see it.

Can you give me any advice? Please focus on data compression; the artificial intelligence side is not a problem, because I have experience with evolutionary algorithms, and the world does not need to run in real time, so it can take all the time it needs.

Thank you all for your time.

asked Jul 06 '12 by user1506205


1 Answer

I don't know much about C#, but it seems you are currently writing 4 bytes each time (6375 * 6375 * 4 bytes ≈ 155 MB). So I guess the Write method is currently writing a 32-bit integer.
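Here is a minimal sketch of the fix, assuming that is indeed the cause; the file name and program structure are only for illustration. Casting the random value to a byte selects BinaryWriter's single-byte overload: (C#)

using System;
using System.IO;

class WorldWriter
{
    static void Main()
    {
        const int SIZE = 6375;

        // "world.bin" is only an example file name.
        using (FileStream fs = new FileStream("world.bin", FileMode.Create))
        using (BinaryWriter w = new BinaryWriter(fs))
        {
            Random random = new Random();
            for (int i = 0; i < SIZE; i++)
            {
                for (int j = 0; j < SIZE; j++)
                {
                    // random.Next(0, 256) returns an int, so writing it directly
                    // uses the Write(int) overload and emits 4 bytes per call.
                    // Casting to byte uses the Write(byte) overload: 1 byte per call.
                    w.Write((byte)random.Next(0, 256));
                }
            }
        }
        // Expected file size: 6375 * 6375 = 40,640,625 bytes (~38.76 MiB).
    }
}

That would also explain why 7-Zip shrinks the 155 MB file to roughly 40 MB: three of every four bytes are zero and compress away, while the remaining random bytes are essentially incompressible, leaving you close to the 38.76 MB the maths predict.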

answered Oct 11 '22 by Scharron