I'm trying to read a large file that has one float per line in C. For this, I put together the code below. It works fine when testing on small data. However, when reading 600 million numbers this way, it is very slow. Any ideas for how I can speed it up? I'm generating the raw file via python, so re-formatting the data (to have multiple numbers in a line separated by commas for example) is also an option. Any insight into why this method is so slow would be greatly appreciated.
void read_file(float *W)
{
FILE *fp;
int i = 0;
// In this file, one row should contain only one NUMBER!!
// So flatten the matrix.
if (fp = fopen("C:\\Users\\rohit\\Documents\\GitHub\\base\\numerical\\c\\ReadFile1\\Debug\\data.txt", "r")) {
while (fscanf(fp, "%f", &W[i]) != EOF) {
++i;
}
fclose(fp);
}
fclose(fp);
scanf("%d",&i);
}
(Comment: This is my second answer.) I see the OP asked in a comment:
Do you happen to have a sample in C for reading the binary floats by any chance?
A binary version would blow any ascii version out-of-the-water. And is shorter.
Here the OP's function signature has been changed to include the maximum number of floats in the return W
, and to return the number actually read from the file.
size_t read_file(float *W, size_t maxlen)
{
FILE *fp = fopen("C:\\Users\\rohit\\Documents\\GitHub\\base\\numerical\\c\\ReadFile1\\Debug\\data.txt", "r");
return fp ? fread(W, sizeof(float), maxlen, fp) : 0;
}
Or for something even faster, you could use mmap
... . But this is not available on Windows.
Added: However, unbuffered I/O is would perhaps be faster. The following function uses a single malloc
and a single unbuffered read
to copy a file to the heap. (NB: not yet tested on large files; may need open64
.)
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
void *readFileToHeap(const char *file, int *len) {
void *retval = 0;
ssize_t cnt;
struct stat st;
const int fd = open(file, O_RDONLY, 0);
if (fd < 0)
return printf("Cannot open %s\n", file), (void *)0;
if (fstat(fd, &st))
return perror("fstat()"), close(fd), (void *)0;
if (!(retval = malloc(st.st_size)))
return perror("malloc()"), close(fd), (void *)0;
cnt = read(fd, retval, st.st_size);
close(fd); // not the best: could clobber errno
if (cnt < 0)
return perror("read()"), free(retval), (void *)0;
if (cnt != st.st_size)
return printf("Partial read %d\n", cnt), free(retval), (void *)0;
*len = cnt;
return retval;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With