
200 million items in a vector

Tags:

c++

list

vector

Here is my structure:

#include <string>

struct Node
{
    int chrN;
    long int pos;
    int nbp;
    std::string Ref;
    std::string Alt;
};

To fill the structure I read through a file, parse the fields of interest into the structure, and then push it back onto a vector. The problem is that there are around 200 million items and I need to keep all of them in memory (for further steps)! But the program terminates after pushing back 50 million nodes with a bad_alloc error:

terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc

Searching around gave me the idea that I'm out of memory, but the output of top showed only 48% usage when the termination happened.

Additional information which may be useful: I set the stack size to unlimited (ulimit -s unlimited), and I'm running 64-bit Ubuntu (x86_64 GNU/Linux) with 4 GB of RAM.

Any help would be most welcome.

Update:

First, I switched from vector to list; then I stored each ~500 MB chunk in a file and indexed the chunks for further analysis.
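The chunk-to-file approach from the update can be sketched like this (the tab-separated text format, the `chunk_N.txt` file names, and the threshold handling are my assumptions, not the asker's actual code):

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Appends records to the current chunk file and starts a new one
// once the file passes a byte-size threshold (e.g. ~500 MB).
class ChunkWriter {
public:
    explicit ChunkWriter(std::size_t max_bytes) : max_bytes_(max_bytes) { open_next(); }

    void write(int chrN, long pos, int nbp,
               const std::string& Ref, const std::string& Alt) {
        out_ << chrN << '\t' << pos << '\t' << nbp << '\t' << Ref << '\t' << Alt << '\n';
        // tellp reports the current file size for a plain ofstream
        if (static_cast<std::size_t>(out_.tellp()) >= max_bytes_) open_next();
    }

    int chunks() const { return chunk_; }  // number of chunk files opened so far

private:
    void open_next() {
        out_.close();  // harmless if no file is open yet
        out_.open("chunk_" + std::to_string(chunk_++) + ".txt");
    }
    std::ofstream out_;
    std::size_t max_bytes_;
    int chunk_ = 0;
};
```

An index for "further analyses" could then be as simple as a side file recording, for each chunk, the first and last position it contains.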

khikho asked Mar 17 '26 01:03

2 Answers

Vector storage is contiguous; in this case, 200 million * sizeof(Node) bytes are required. For each of the strings in the struct, another small heap allocation may be needed to hold the string's characters. Altogether, this is not going to fit in your available address space, and no (non-compressing) data structure is going to solve this.

Vectors usually grow their backing capacity geometrically (which amortizes the cost of push_back). So when your program was already using about half the available address space, the vector probably attempted to double its capacity (or grow it by 50%), which caused the bad_alloc. The old buffer is not freed until the new one has been allocated, so the memory reported at the moment of the crash was only 48%.

Alexander Gessler answered Mar 19 '26 17:03


That Node structure consumes on the order of 44 bytes per element, plus the actual heap-allocated string buffers. There's no way 200 million of them will fit in 4 GB.

You need to avoid holding your entire dataset in memory at once.

Sneftel answered Mar 19 '26 15:03