
Efficiently read flattened file in C

I'm trying to read a large file in C that has one float per line. For this, I put together the code below. It works fine when testing on small data. However, when reading 600 million numbers this way, it is very slow. Any ideas for how I can speed it up? I'm generating the raw file via Python, so re-formatting the data (for example, to have multiple numbers per line separated by commas) is also an option. Any insight into why this method is so slow would be greatly appreciated.

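void read_file(float *W)
{
   FILE *fp;

   int i = 0;

// In this file, one row should contain only one NUMBER!!
// So flatten the matrix.
   if ((fp = fopen("C:\\Users\\rohit\\Documents\\GitHub\\base\\numerical\\c\\ReadFile1\\Debug\\data.txt", "r")) != NULL) {
      // Stop at EOF or at the first token that is not a number; testing
      // against EOF alone would loop forever on malformed input.
      while (fscanf(fp, "%f", &W[i]) == 1) {
         ++i;
      }
      fclose(fp);   // close once, and only if the open succeeded
   }

   scanf("%d",&i);   // wait for input before returning (e.g. to keep a console window open)
}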
Rohit Pandey Avatar asked Jan 27 '23 16:01



1 Answer

(Comment: This is my second answer.) I see the OP asked in a comment:

Do you happen to have a sample in C for reading the binary floats by any chance?

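A binary version would blow any ASCII version out of the water, and it is shorter. The text version has to parse every number with fscanf, whereas fread copies the bytes straight into the array.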

Here the OP's function signature has been changed to take the maximum number of floats that fit in the buffer W, and to return the number actually read from the file.

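size_t read_file(float *W, size_t maxlen)
{
    // "rb", not "r": in text mode Windows translates line endings and treats Ctrl-Z as end-of-file.
    FILE *fp = fopen("C:\\Users\\rohit\\Documents\\GitHub\\base\\numerical\\c\\ReadFile1\\Debug\\data.txt", "rb");
    const size_t n = fp ? fread(W, sizeof(float), maxlen, fp) : 0;
    if (fp) fclose(fp);
    return n;
}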
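For the function above to work, the file has to contain raw 32-bit floats in the machine's native byte order rather than text. As a minimal sketch of that round trip, the two helpers below write an array out in binary and read it back. The names write_floats and read_floats and the path parameter are illustrative assumptions, not part of the OP's code.

```c
#include <stddef.h>
#include <stdio.h>

/* Write n floats to path as raw bytes; returns the number written. */
size_t write_floats(const char *path, const float *src, size_t n)
{
    FILE *fp = fopen(path, "wb");   /* "b": no newline translation on Windows */
    if (!fp)
        return 0;
    size_t written = fwrite(src, sizeof(float), n, fp);
    if (fclose(fp) != 0)            /* a failed close can mean lost data */
        return 0;
    return written;
}

/* Read up to maxlen floats from path into dst; returns the number read. */
size_t read_floats(const char *path, float *dst, size_t maxlen)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return 0;
    size_t got = fread(dst, sizeof(float), maxlen, fp);
    fclose(fp);
    return got;
}
```

A file written this way is only portable between machines that share the same float representation and byte order, which is the price paid for skipping the text parsing.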

Or, for something even faster, you could use mmap. But this is a POSIX call and is not natively available on Windows, which offers CreateFileMapping and MapViewOfFile instead.
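A rough sketch of the mmap route on a POSIX system might look like the following. It assumes the file holds raw native-endian floats, and the names map_floats and unmap_floats are made up for illustration. Nothing is copied up front: the kernel pages the data in as the array is touched.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file of raw floats read-only. On success returns the mapping and
   stores the number of floats in *count; on failure returns NULL. */
const float *map_floats(const char *path, size_t *count)
{
    struct stat st;
    const int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *count = (size_t)st.st_size / sizeof(float);
    return p;
}

/* Release a mapping obtained from map_floats(). */
int unmap_floats(const float *p, size_t count)
{
    return munmap((void *)p, count * sizeof(float));
}
```

Whether this actually beats a plain fread depends on the access pattern and the operating system, so it is worth timing both on the real data.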


Added: However, unbuffered I/O would perhaps be faster. The following function uses a single malloc and a single unbuffered read to copy a file to the heap. (NB: not yet tested on large files; may need open64.)

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>  
#include <sys/stat.h>  

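// Reads the whole file into a malloc'd buffer and stores its size in *len.
// Returns NULL on failure; the caller frees the buffer.
void *readFileToHeap(const char *file, size_t *len) {
   void *retval = 0;
   ssize_t cnt;
   struct stat st;
   const int fd = open(file, O_RDONLY, 0);
   if (fd < 0)
      return printf("Cannot open %s\n", file), (void *)0;
   if (fstat(fd, &st))
      return perror("fstat()"), close(fd), (void *)0;
   if (!(retval = malloc(st.st_size)))
      return perror("malloc()"), close(fd), (void *)0;
   // A single read() may return fewer bytes than requested; that case is
   // reported below as a partial read.
   cnt = read(fd, retval, st.st_size);
   close(fd); // not the best: could clobber errno
   if (cnt < 0)
      return perror("read()"), free(retval), (void *)0;
   if (cnt != st.st_size)
      return printf("Partial read %zd\n", cnt), free(retval), (void *)0;
   *len = (size_t)cnt; // size_t, not int: 600 million floats is more than INT_MAX bytes
   return retval;
}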
Joseph Quinsey Avatar answered Jan 30 '23 06:01
