
Can open small ASCII file, but not large binary file?

I am using the below code to open a large (5.1GB) binary file in MSVC on Windows. The machine has plenty of RAM. The problem is the length is being retrieved as zero. However, when I change the file_path to a smaller ASCII file the code works fine.

Why can I not load the large binary file? I prefer this approach because I want a pointer to the file contents.

FILE * pFile;
uint64_t lSize;
char * buffer;
size_t result;

pFile = fopen(file_path, "rb");
if (pFile == NULL) { 
    fputs("File error", stderr); exit(1); 
}

// obtain file size:
fseek(pFile, 0, SEEK_END);
lSize = ftell(pFile);                                // RETURNS ZERO
rewind(pFile);

// allocate memory to contain the whole file:
buffer = (char*)malloc(sizeof(char)*lSize);
if (buffer == NULL) {
    fputs("Memory error", stderr); exit(2); 
}

// copy the file into the buffer:
result = fread(buffer, 1, lSize, pFile);             // RETURNS ZERO TOO
if (result != lSize) {                               // THIS FAILS
    fputs("Reading error", stderr); exit(3); 
}

/* the whole file is now loaded in the memory buffer. */

It's not the file permissions or anything; they are fine.

mezamorphic asked Sep 24 '16 15:09





2 Answers

If you allocate 5.1 GB, you had better be sure that you have compiled your code as 64-bit and are running it on a 64-bit version of Windows. Otherwise, the address space is limited to at most 3 GB on 32-bit Windows, and to 4 GB for 32-bit code on 64-bit Windows.

By the way, ftell() returns a signed long. You have to check that no error occurred here (such as an overflow if the OS allows larger file sizes), i.e. that the value is not -1.

Edit:

Note that with MSVC, long is currently a 32-bit number even when compiling for 64-bit. This means that ftell() will give you a meaningful result only if the file size is below 2 GB (because of the sign bit).
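The limit can be checked with fixed-width types. This is only a sketch: int32_t stands in for MSVC's 32-bit long, the helper name is hypothetical, and 5.1 GB is taken to mean roughly 5.1 * 1024^3 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate size of the file from the question: 5.1 * 1024^3 bytes. */
#define EXAMPLE_FILE_SIZE UINT64_C(5476083302)

/* Hypothetical helper: reports whether a byte count can be returned through
   a 32-bit signed long, which is the return type of ftell() on MSVC. */
int fits_in_32bit_long(uint64_t byte_count)
{
    return byte_count <= (uint64_t)INT32_MAX;   /* INT32_MAX is 2147483647 */
}
```

With these definitions, fits_in_32bit_long(EXAMPLE_FILE_SIZE) is false, so the true position cannot come back from ftell() intact.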

You could use the non-portable, OS-specific WinAPI function GetFileSizeEx() to get the size of large files as a signed 64-bit number.

malloc() takes a size_t, which is an unsigned 64-bit number in a 64-bit build. So on this side you're safe.

An alternative would be to use file mapping.
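Since the goal is a pointer to the file contents, file mapping could look roughly like the sketch below. The helper names map_file_readonly() and unmap_file() are hypothetical. The Windows branch uses GetFileSizeEx() and assumes a 64-bit build. The POSIX branch is only there so the sketch also compiles elsewhere, and it assumes a 64-bit off_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical helper: maps a whole file read-only. Returns a pointer to its
   bytes and stores the length in *size_out, or returns NULL on failure
   (including an empty file, which cannot be mapped). */
const char *map_file_readonly(const char *path, uint64_t *size_out)
{
    assert(path != NULL && size_out != NULL);
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return NULL;

    LARGE_INTEGER size;                 /* signed 64-bit size */
    if (!GetFileSizeEx(file, &size) || size.QuadPart <= 0) {
        CloseHandle(file);
        return NULL;
    }
    /* Maximum size 0/0 means "use the current size of the file". */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    CloseHandle(file);                  /* the mapping object keeps the file open */
    if (mapping == NULL) return NULL;

    /* Offset 0 and length 0 map the whole file. */
    const char *view = (const char *)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(mapping);               /* the view keeps the mapping alive */
    if (view == NULL) return NULL;
    *size_out = (uint64_t)size.QuadPart;
    return view;
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *view = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (view == MAP_FAILED) return NULL;
    *size_out = (uint64_t)st.st_size;
    return (const char *)view;
#endif
}

/* Releases a view returned by map_file_readonly(). */
void unmap_file(const char *view, uint64_t size)
{
    if (view == NULL) return;
#ifdef _WIN32
    (void)size;                         /* Windows unmaps by base address only */
    UnmapViewOfFile(view);
#else
    munmap((void *)view, (size_t)size);
#endif
}
```

The OS pages the data in on demand, so no 5.1 GB malloc() and fread() are needed. The view is read-only and must be released with unmap_file() rather than free().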

Second edit:

I looked at your edits about the value received for the size, which differs from what I expected. I could reproduce the error on my system: I got a size that was not zero, but a number much, much larger than the file.

Looking at this CERT security recommendation, it appears that the guarantees offered by the standard for fseek() in combination with SEEK_END are insufficient (a binary stream need not meaningfully support seeking to SEEK_END) and make this a very unsafe approach.

So let's repeat: the safest way to get the size is to use the native OS function, i.e. GetFileSizeEx() on Windows. There is a workaround on 64-bit Windows: use _fseeki64() and _ftelli64():

...
if (_fseeki64(pFile, 0, SEEK_END)) {
    fputs("File seek error", stderr); 
    return (1);
}
lSize = _ftelli64(pFile);                            // RETURNS EXACT SIZE
...

This worked very well (the initial problem seemed to be linked to the return type, which was not large enough). However, keep in mind that it's a workaround, and I fear that there could be other error conditions that could lead to the vulnerability reported by CERT.
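Put together with the error checks, the workaround might look like the sketch below. The helper name load_file() and the SEEK64/TELL64 macros are hypothetical. The non-Windows branch assumes a POSIX system with a 64-bit off_t and is only there so the sketch compiles elsewhere.

```c
#ifndef _WIN32
#define _FILE_OFFSET_BITS 64            /* ask for a 64-bit off_t on POSIX */
#define _POSIX_C_SOURCE 200809L         /* expose fseeko() and ftello() */
#endif

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#define SEEK64(f, off, whence) _fseeki64((f), (off), (whence))
#define TELL64(f)              _ftelli64(f)
#else
#define SEEK64(f, off, whence) fseeko((f), (off), (whence))
#define TELL64(f)              ftello(f)
#endif

/* Hypothetical helper: reads a whole file into a malloc'd buffer and stores
   its length in *size_out. Returns NULL on any error; the caller frees the
   buffer with free(). */
char *load_file(const char *path, uint64_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    char *buffer = NULL;
    if (SEEK64(f, 0, SEEK_END) == 0) {
        long long size = TELL64(f);     /* -1 signals an error */
        if (size >= 0 && (unsigned long long)size <= SIZE_MAX
            && SEEK64(f, 0, SEEK_SET) == 0) {
            /* Request at least 1 byte so an empty file still yields a buffer. */
            buffer = malloc(size > 0 ? (size_t)size : 1);
            if (buffer != NULL
                && fread(buffer, 1, (size_t)size, f) != (size_t)size) {
                free(buffer);           /* short read: treat as failure */
                buffer = NULL;
            }
            if (buffer != NULL) *size_out = (uint64_t)size;
        }
    }
    fclose(f);
    return buffer;
}
```

The SIZE_MAX comparison makes a 32-bit build fail cleanly instead of silently truncating the allocation size.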

Christophe answered Oct 12 '22 13:10



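The data type long is too small to represent your file size. Use the stat() function (on MSVC, the _stat64() variant, which reports a 64-bit size) or the Windows-specific alternative GetFileAttributesEx() to read the file size. Note that plain GetFileAttributes() returns only the attribute flags, not the size.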
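A minimal sketch of the stat() route, assuming MSVC's _stat64() on Windows and a 64-bit off_t elsewhere. The helper name file_size_by_stat() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the size of the file in bytes, or -1 if the
   file cannot be examined. No stream is opened, so fseek()/ftell() and the
   32-bit long are not involved at all. */
int64_t file_size_by_stat(const char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    struct __stat64 info;               /* MSVC variant with a 64-bit st_size */
    if (_stat64(path, &info) != 0) return -1;
#else
    struct stat info;                   /* assumes a 64-bit off_t */
    if (stat(path, &info) != 0) return -1;
#endif
    return (int64_t)info.st_size;
}
```

The returned value can then be checked against SIZE_MAX before it is passed to malloc().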

RotatingPieces answered Oct 12 '22 11:10
