I have a C++ program that transposes a very large matrix. The matrix is too large to hold in memory, so I was writing each column to a separate temporary file, and then concatenating the temporary files once the whole matrix has been processed. However, I am now finding that I am running up against the problem of having too many open temporary files (i.e. the OS doesn't allow me to open enough temporary files). Is there a system portable method for checking (and hopefully changing) the maximum number of allowed open files?
I realise I could close each temp file and reopen only when needed, but am worried about the performance impact of doing this.
My code works as follows (pseudocode - not guaranteed to work):
int Ncol=5000; // For example - could be much bigger.
int Nrow=50000; // For example - in reality much bigger.
// Stage 1 - create temp files
vector<ofstream *> tmp_files(Ncol); // Vector of temp file pointers.
vector<string> tmp_filenames(Ncol); // Vector of temp file names.
for (unsigned int ui=0; ui<Ncol; ui++)
{
string filename(tmpnam(NULL)); // Get temp filename.
ofstream *tmp_file = new ofstream(filename.c_str());
if (!tmp_file->good())
error("Could not open temp file.\n"); // Call error function
(*tmp_file) << "Column" << ui;
tmp_files[ui] = tmp_file;
tmp_filenames[ui] = filename;
}
// Stage 2 - read input file and write each column to temp file
ifstream input_file(input_filename.c_str());
for (unsigned int s=0; s<Nrow; s++)
{
int input_num;
ofstream *tmp_file;
for (unsigned int ui=0; ui<Ncol; ui++)
{
input_file >> input_num;
tmp_file = tmp_files[ui]; // Get temp file pointer
(*tmp_file) << "\t" << input_num; // Write entry to temp file.
}
}
input_file.close();
// Stage 3 - concatenate temp files into output file and clean up.
ofstream output_file("out.txt");
for (unsigned int ui=0; ui<Ncol; ui++)
{
string tmp_line;
// Close temp file
ofstream *tmp_file = tmp_files[ui];
(*tmp_file) << endl;
tmp_file->close();
// Read from temp file and write to output file.
ifstream read_file(tmp_filenames[ui].c_str());
if (!read_file.good())
error("Could not open tmp file for reading."); // Call error function
getline(read_file, tmp_line);
output_file << tmp_line << endl;
read_file.close();
// Delete temp file.
remove(tmp_filenames[ui].c_str());
}
output_file.close();
Many thanks in advance!
Adam
There are at least two limits:
ulimit to change the limit, within the bounds allowed by the sysadminA better solution is to avoid having so many open files. In one of my own programs, I wrote a wrapper around the file abstraction (this was in Python, but the principle is the same in C), which keeps track of the current file position in each file, and opens/closes files as needed, keeping a pool of currently-open files.
There isn't a portable way to change the max number of open files. Limits like this tend to be imposed by the operating system and are therefore OS-specific.
Your best bet is to reduce the number of files you have open at any one time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With