C/C++ System portable way to change maximum number of open files

I have a C++ program that transposes a very large matrix. The matrix is too large to hold in memory, so I write each column to a separate temporary file and then concatenate the temporary files once the whole matrix has been processed. However, I am now running up against the problem of having too many open temporary files (i.e. the OS doesn't allow me to open enough of them at once). Is there a system-portable method for checking (and hopefully changing) the maximum number of allowed open files?

I realise I could close each temp file and reopen only when needed, but am worried about the performance impact of doing this.

My code works as follows (pseudocode - not guaranteed to work):

#include <cstdio>    // tmpnam, remove
#include <cstdlib>   // exit
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void error(const string &msg) { cerr << msg; exit(1); }

int main()
{
    const size_t Ncol = 5000;   // For example - could be much bigger.
    const size_t Nrow = 50000;  // For example - in reality much bigger.
    string input_filename = "in.txt";

    // Stage 1 - create temp files.
    vector<ofstream *> tmp_files(Ncol);  // Vector of temp file pointers.
    vector<string> tmp_filenames(Ncol);  // Vector of temp file names.
    for (size_t ui = 0; ui < Ncol; ui++)
    {
        string filename(tmpnam(NULL));  // Get temp filename (kept for brevity; not safe for production).
        ofstream *tmp_file = new ofstream(filename.c_str());
        if (!tmp_file->good())
            error("Could not open temp file.\n");
        (*tmp_file) << "Column" << ui;
        tmp_files[ui] = tmp_file;
        tmp_filenames[ui] = filename;
    }

    // Stage 2 - read input file and write each entry to its column's temp file.
    ifstream input_file(input_filename.c_str());
    for (size_t s = 0; s < Nrow; s++)
    {
        int input_num;
        for (size_t ui = 0; ui < Ncol; ui++)
        {
            input_file >> input_num;
            (*tmp_files[ui]) << "\t" << input_num;  // Write entry to temp file.
        }
    }
    input_file.close();

    // Stage 3 - concatenate temp files into output file and clean up.
    ofstream output_file("out.txt");
    for (size_t ui = 0; ui < Ncol; ui++)
    {
        // Finish and close the temp file.
        ofstream *tmp_file = tmp_files[ui];
        (*tmp_file) << endl;
        tmp_file->close();
        delete tmp_file;  // Free the stream allocated in stage 1.

        // Read the single-line temp file and append it to the output file.
        string tmp_line;
        ifstream read_file(tmp_filenames[ui].c_str());
        if (!read_file.good())
            error("Could not open temp file for reading.");
        getline(read_file, tmp_line);
        output_file << tmp_line << endl;
        read_file.close();

        remove(tmp_filenames[ui].c_str());  // Delete temp file.
    }
    output_file.close();
    return 0;
}

Many thanks in advance!

Adam

2 Answers

There are at least two limits:

  • the operating system may impose a limit; in Unix (sh, bash, and similar shells), use ulimit to change the limit, within the bounds allowed by the sysadmin (the programmatic counterpart is sketched just after this list)
  • the C library implementation may have a limit as well; you'll probably need to recompile the library to change that
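
On POSIX systems, the programmatic way to inspect and raise the per-process limit is getrlimit/setrlimit with RLIMIT_NOFILE (the same knob that ulimit -n adjusts from the shell). Note this is POSIX-specific rather than fully portable, and an unprivileged process can only raise the soft limit up to the hard limit. A minimal sketch:

#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX only)
#include <cstdio>          // perror
#include <iostream>

int main()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
    {
        std::perror("getrlimit");
        return 1;
    }
    std::cout << "soft limit: " << lim.rlim_cur
              << ", hard limit: " << lim.rlim_max << std::endl;

    // Raise the soft limit as far as the sysadmin's hard limit allows.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        std::perror("setrlimit");
    return 0;
}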

A better solution is to avoid having so many files open at once. In one of my own programs, I wrote a wrapper around the file abstraction (in Python, but the principle is the same in C) that keeps track of the current position in each file and opens/closes files as needed, maintaining a pool of currently open files.
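
A minimal sketch of that idea in C++ (the class name, pool size, and eviction policy are illustrative choices, not my actual code):

#include <fstream>
#include <list>
#include <map>
#include <string>
using namespace std;

// Keeps at most max_open output streams alive, evicting the oldest when
// the pool is full. Reopening in append mode means a reopened file picks
// up exactly where its previous writes left off.
class FilePool
{
public:
    explicit FilePool(size_t max_open) : max_open_(max_open) {}

    ~FilePool()
    {
        for (map<string, ofstream *>::iterator it = open_.begin();
             it != open_.end(); ++it)
            delete it->second;  // ofstream's destructor closes the file.
    }

    // Returns a stream for filename; valid until the next call to get().
    ofstream &get(const string &filename)
    {
        map<string, ofstream *>::iterator it = open_.find(filename);
        if (it != open_.end())
            return *it->second;

        if (open_.size() >= max_open_)  // Pool full: evict the oldest file.
        {
            string victim = order_.front();
            order_.pop_front();
            delete open_[victim];
            open_.erase(victim);
        }
        ofstream *f = new ofstream(filename.c_str(), ios::app);
        open_[filename] = f;
        order_.push_back(filename);
        return *f;
    }

private:
    size_t max_open_;
    map<string, ofstream *> open_;
    list<string> order_;  // FIFO eviction; a real pool might use LRU.
};

Stage 2 of the question's code would then write through the pool, e.g. pool.get(tmp_filenames[ui]) << "\t" << input_num; with a pool sized well below the OS limit (say, FilePool pool(256);), so the number of simultaneously open descriptors stays bounded no matter how large Ncol gets.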

There isn't a portable way to change the max number of open files. Limits like this tend to be imposed by the operating system and are therefore OS-specific.

Your best bet is to reduce the number of files you have open at any one time.
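
For the question's concrete problem, one way to do that (a sketch under assumed names: in.txt for the input, and block_size as a tuning knob) is to drop the one-temp-file-per-column scheme and instead make several passes over the input, buffering one block of columns in memory per pass, so only two files are ever open at once:

#include <algorithm>  // min
#include <fstream>
#include <string>
#include <vector>
using namespace std;

const size_t Ncol = 5000;       // As in the question.
const size_t Nrow = 50000;
const size_t block_size = 500;  // Columns buffered per pass.

int main()
{
    ofstream output_file("out.txt");
    for (size_t first = 0; first < Ncol; first += block_size)
    {
        size_t last = min(first + block_size, Ncol);
        vector<string> columns(last - first);  // One output row per column.

        ifstream input_file("in.txt");         // Re-read the input each pass.
        for (size_t row = 0; row < Nrow; row++)
            for (size_t col = 0; col < Ncol; col++)
            {
                string num;
                input_file >> num;
                if (col >= first && col < last)
                    columns[col - first] += "\t" + num;
            }

        for (size_t c = 0; c < columns.size(); c++)
            output_file << "Column" << (first + c) << columns[c] << endl;
    }
    return 0;
}

The trade-off is re-reading the input Ncol/block_size times; block_size balances memory use against repeated I/O.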

NPE