Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mapping large files using MapViewOfFile

I have a very large file and I need to read it in small pieces and then process each piece. I'm using MapViewOfFile function to map a piece in memory, but after reading first part I can't read the second. It throws when I'm trying to map it.

    char *tmp_buffer = new char[bufferSize];
    LPCWSTR input = L"input";   
    OFSTRUCT tOfStr;
    tOfStr.cBytes = sizeof tOfStr;

    HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ); 
    HANDLE fileMap = CreateFileMapping(inputFile, NULL, PAGE_READONLY, 0, 0, input);

    while (offset < fileSize)
    {
        long k = 0;
        bool cutted = false;
        offset -= tempBufferSize;

        if (fileSize - offset <= bufferSize)
        {
            bufferSize = fileSize - offset;
        }

        char *buffer = new char[bufferSize + tempBufferSize];

        for(int i = 0; i < tempBufferSize; i++)
        {
            buffer[i] = tempBuffer[i];
        }

        char *tmp_buffer = new char[bufferSize];
        LPCWSTR input = L"input";
        HANDLE inputFile;
        OFSTRUCT tOfStr;
        tOfStr.cBytes = sizeof tOfStr;

        long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
        long long offsetLow = (offset & 0xFFFFFFFF);

        tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);

        memcpy(&buffer[tempBufferSize], &tmp_buffer[0], bufferSize);

        UnmapViewOfFile(tmp_buffer);

        offset += bufferSize;
        offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
        offsetLow = (offset & 0xFFFFFFFF);

        if (offset < fileSize)
        {
            char *next;
            next = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, 1);

            if (next[0] >= '0' && next[0] <= '9')
            {
                cutted = true;
            }

            UnmapViewOfFile(next);
        }

        ostringstream path_stream;
        path_stream << tempPath << splitNum;

        ProcessChunk(buffer, path_stream.str(), cutted, bufferSize);

        delete buffer;

        cout << (splitNum + 1) << " file(s) sorted" << endl;
        splitNum++;
    }
like image 857
Alexander Taraymovich Avatar asked Mar 27 '12 12:03

Alexander Taraymovich


1 Answers

One possibility is that you're not using an offset that's a multiple of the allocation granularity. From MSDN:

The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.

If you try to map at something other than a multiple of the allocation granularity, the mapping will fail and GetLastError will return ERROR_MAPPED_ALIGNMENT.

Other than that, there are many problems in the code sample that make it very difficult to see what you're trying to do and where it's going wrong. At a minimum, you need to solve the memory leaks. You seem to be allocating and then leaking completely unnecessary buffers. Giving them better names can make it clear what they are actually used for.

Then I suggest putting a breakpoint on the calls to MapViewOfFile, and then checking all of the parameter values you're passing in to make sure they look right. As a start, on the second call, you'd expect offsetHigh to be 0 and offsetLow to be bufferSize.

A few suspicious things off the bat:

HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ); 

Every cast should make you suspicious. Sometimes they are necessary, but make sure you understand why. At this point you should ask yourself why every other file API you're using requires a HANDLE and this function returns an HFILE. If you check OpenFile documentation, you'll see, "This function has limited capabilities and is not recommended. For new application development, use the CreateFile function." I know that sounds confusing because you want to open an existing file, but CreateFile can do exactly that, and it returns the right type.

long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);

What type is offset? You probably want to make sure it's an unsigned long long or equivalent. When bitshifting, especially to the right, you almost always want an unsigned type to avoid sign-extension. You also have to make sure that it's a type that has more bits than the amount you're shifting by--shifting a 32-bit value by 32 (or more) bits is actually undefined in C and C++, which allows the compilers to do certain types of optimizations.

long long offsetLow = (offset & 0xFFFFFFFF);

In both of these statements, you have to be careful about the 0xFFFFFFFF value. Since you didn't cast it or give it a suffix, it can be hard to predict whether the compiler will treat it as an int or unsigned int. In this case, it'll be an unsigned int, but that won't be obvious to many people. In fact, I got this wrong when I first wrote this answer. [This paragraph corrected 16-MAY-2017] With bitwise operations, you almost always want to make sure you're using unsigned values.

tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);

You're casting offsetHigh and offsetLow to ints, which are signed values. The API actually wants DWORDs, which are unsigned values. Rather than casting in the call, I would declare offsetHigh and offsetLow as DWORDs and do the casting in the initialization, like this:

DWORD offsetHigh = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD offsetLow  = static_cast<DWORD>( offset        & 0xFFFFFFFFul);
tmp_buffer = reinterpret_cast<const char *>(MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize));

Those fixes may or may not resolve your problem. It's hard to tell what's going on from the incomplete code sample.

Here's a working sample you can compare to:

// Calls ProcessChunk with each chunk of the file.
void ReadInChunks(const WCHAR *pszFileName) {
  // Offsets must be a multiple of the system's allocation granularity.  We
  // guarantee this by making our view size equal to the allocation granularity.
  SYSTEM_INFO sysinfo = {0};
  ::GetSystemInfo(&sysinfo);
  DWORD cbView = sysinfo.dwAllocationGranularity;

  HANDLE hfile = ::CreateFileW(pszFileName, GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, 0, NULL);
  if (hfile != INVALID_HANDLE_VALUE) {
    LARGE_INTEGER file_size = {0};
    ::GetFileSizeEx(hfile, &file_size);
    const unsigned long long cbFile =
        static_cast<unsigned long long>(file_size.QuadPart);

    HANDLE hmap = ::CreateFileMappingW(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hmap != NULL) {
      for (unsigned long long offset = 0; offset < cbFile; offset += cbView) {
        DWORD high = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
        DWORD low  = static_cast<DWORD>( offset        & 0xFFFFFFFFul);
        // The last view may be shorter.
        if (offset + cbView > cbFile) {
          cbView = static_cast<int>(cbFile - offset);
        }
        const char *pView = static_cast<const char *>(
            ::MapViewOfFile(hmap, FILE_MAP_READ, high, low, cbView));
        if (pView != NULL) {
          ProcessChunk(pView, cbView);
        }
      }
      ::CloseHandle(hmap);
    }
    ::CloseHandle(hfile);
  }
}
like image 152
Adrian McCarthy Avatar answered Oct 09 '22 04:10

Adrian McCarthy