Say I have 100G integers and want to insert them into a vector<int>
on a 32-bit machine. Is it possible? If I use a custom allocator
to manage the storage, how can I guarantee that the following operations are always valid?
vector<int> coll;
coll.insert(/* ... 100G integers ... */);  // pseudocode
// memcpy takes a raw pointer, so the iterator is converted with &*; the offset
// needs a 64-bit literal, since 1024 * 1024 * 1024 * 8 overflows a 32-bit int
memcpy(&*(coll.begin() + (1024LL * 1024 * 1024 * 8)), "Hello", 5);
Note that the C++ standard requires the objects stored in a vector
to be contiguous. coll.begin() + (1024 * 1024 * 1024 * 8)
may correspond to a location on the hard disk rather than in RAM.
You can't use native pointers to address 100G of integers directly, because they would consume 400 GB of memory. A 32-bit OS may address 2, 3 or 4 GB of RAM, or up to 64 GB using PAE; but even then, any 32-bit program uses 32-bit pointers, which can address at most 4 GB.
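A minimal sketch of the arithmetic, assuming a 4-byte int and treating 100G as 100 * 2^30 elements:

#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t count = 100ULL * 1024 * 1024 * 1024;  // 100G elements
    const uint64_t bytes = count * sizeof(int);          // ~400 GB, assuming sizeof(int) == 4
    const uint64_t limit = UINT32_MAX;                   // highest address a 32-bit pointer can hold
    std::printf("needed: %llu bytes, addressable: %llu bytes\n",
                (unsigned long long)bytes, (unsigned long long)limit + 1ULL);
}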
All mainstream STL implementations (libstdc++ from GCC, libc++ from LLVM/Clang, STLport, Microsoft's STL...) use native pointers inside their containers, and the native (32-bit) size_t for container sizes.
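You can see the ceiling directly: on a 32-bit build, max_size() is bounded by what a 32-bit size_t can express, which for 4-byte elements is roughly 2^32 / 4 = ~1G, far short of 100G:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> coll;
    // On a 32-bit build this prints at most ~1G (2^32 / sizeof(int)) elements.
    std::printf("max_size() = %zu elements\n", coll.max_size());
}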
You may try a non-standard implementation of STL, e.g. STXXL, http://stxxl.sourceforge.net/ (intro slides), which reimplements some STL containers using disk (HDD) as storage. With a huge (you need at least 400 GB) fast SSD you may be able to fill the vector in several days, or even tens of hours if you are lucky.
The key features of STXXL are: transparent support of parallel disks (the library provides implementations of basic parallel disk algorithms, and STXXL is the only external memory algorithm library supporting parallel disks) and the ability to handle problems of very large size (tested up to dozens of terabytes).
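For illustration, a minimal sketch of filling an external-memory vector with STXXL (the typedef follows the STXXL manual's VECTOR_GENERATOR pattern; disk files are assumed to be configured beforehand in a .stxxl file, and the loop bound here is symbolic):

#include <stxxl/vector>

int main() {
    // stxxl::vector pages its blocks to disk, so capacity is bounded
    // by disk space rather than by RAM or by the pointer width.
    typedef stxxl::VECTOR_GENERATOR<int>::result vector_type;
    vector_type coll;

    const unsigned long long n = 100ULL * 1024 * 1024 * 1024;  // 100G elements
    for (unsigned long long i = 0; i < n; ++i)
        coll.push_back(static_cast<int>(i));
}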
But modern versions of STXXL do not support 32-bit platforms, and I can't say whether any older version would work on a 32-bit platform with data this large: STXXL reuses parts of the STL, and if any size_t-sized arguments are involved, your task will fail.