How can I make C++ file writes reach the speed measured by CrystalDiskMark?

I produce about 3.6 GB of data per second in memory, and I need to write it to my SSD continuously. I used CrystalDiskMark to test my SSD's write speed and it reports almost 6 GB per second, so I had expected this task not to be that hard.

CrystalDiskMark result for my SSD: https://plus.google.com/u/0/photos/photo/106876803948041178149/6649598887699308850?authkey=CNbb5KjF8-jxJQ

My computer runs Windows 10, and I'm using Visual Studio 2017 Community.

I found this question and tried the highest-voted answer. Unfortunately, its option_2 only reached about 1 s/GB, far slower than what CrystalDiskMark measured. Next I tried memory mapping, which was faster at about 630 ms/GB, but still much slower. Then I tried memory mapping from multiple threads: with 4 threads I got about 350 ms/GB, and adding more threads did not increase the speed any further.
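Converting these timings into throughput makes the gap easier to see. A small sketch with hypothetical helper names, assuming "GB" means the same unit in every figure:

```cpp
#include <cassert>

// Converts a measured "milliseconds per GB" figure into GB per second.
double GBPerSecond(double msPerGB) {
    assert(msPerGB > 0.0);
    return 1000.0 / msPerGB;
}

// Time budget (in ms) available for each GB if the writer has to keep up
// with a producer that generates producerGBPerSecond GB every second.
double MsBudgetPerGB(double producerGBPerSecond) {
    assert(producerGBPerSecond > 0.0);
    return 1000.0 / producerGBPerSecond;
}
```

By this arithmetic, 1000, 630 and 350 ms/GB correspond to roughly 1.0, 1.59 and 2.86 GB/s, while keeping up with 3.6 GB/s needs at most about 278 ms per GB (CrystalDiskMark's ~6 GB/s would be about 167 ms per GB).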

Code for memory mapping:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>    // sprintf_s
#include <cstdlib>   // system
#include <cstring>   // memcpy
#include <fstream>
#include <iostream>
#include <numeric>
#include <random>
#include <sstream>
#include <thread>
#include <vector>
#include <windows.h>

// Generate random data
std::vector<int> GenerateData(std::size_t bytes) {
    assert(bytes % sizeof(int) == 0);
    std::vector<int> data(bytes / sizeof(int));
    std::iota(data.begin(), data.end(), 0);
    std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
    return data;
}

// Memory mapping: copy `size` bytes from `data` into D:\data_<id>.bin
int map_write(int* data, int size, int id) {
    char name[100];
    sprintf_s(name, sizeof(name), "D:\\data_%d.bin", id);
    // The file name is a narrow (char) string, so call the ANSI version explicitly
    HANDLE hFile = CreateFileA(name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        return -1;
    }

    Sleep(0);

    DWORD dwFileSize = size;

    char rname[100];
    sprintf_s(rname, sizeof(rname), "data_%d.bin", id);

    // Create the mapping object; this also extends the file to dwFileSize bytes
    HANDLE hFileMap = CreateFileMappingA(hFile, NULL, PAGE_READWRITE, 0, dwFileSize, rname);
    if (hFileMap == NULL) {
        CloseHandle(hFile);
        return -2;
    }

    // Map the whole file into the address space
    PVOID pvFile = MapViewOfFile(hFileMap, FILE_MAP_WRITE, 0, 0, 0);
    if (pvFile == NULL) {
        CloseHandle(hFileMap);
        CloseHandle(hFile);
        return -3;
    }

    // Copy into the view; the memory manager writes the dirty pages back to disk
    memcpy(pvFile, data, dwFileSize);

    UnmapViewOfFile(pvFile);

    CloseHandle(hFileMap);
    CloseHandle(hFile);

    return 0;
}

// Multi-thread memory mapping: each of 4 threads writes a quarter of the data
void Mem2SSD_write(int* data, int size) {
    int part = size / sizeof(int) / 4;

    int index[4];

    index[0] = 0;
    index[1] = part;
    index[2] = part * 2;
    index[3] = part * 3;

    std::thread ta(map_write, data + index[0], size / 4, 10);
    std::thread tb(map_write, data + index[1], size / 4, 11);
    std::thread tc(map_write, data + index[2], size / 4, 12);
    std::thread td(map_write, data + index[3], size / 4, 13);

    ta.join();
    tb.join();
    tc.join();
    td.join();
}

// Test:
int main() {
    const std::size_t kB = 1024;
    const std::size_t MB = 1024 * kB;
    const std::size_t GB = 1024 * MB;

    for (int i = 0; i < 10; ++i) {
        std::vector<int> data = GenerateData(1 * GB);
        auto startTime = std::chrono::high_resolution_clock::now();
        Mem2SSD_write(data.data(), 1 * GB);
        auto endTime = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
        std::cout << "1G writing cost: " << duration << " ms" << std::endl;
    }

    system("pause");
    return 0;
}
```

So I'd like to ask: is there a faster way to write huge files from C++? Or, why can't I write as fast as CrystalDiskMark's test? How does CrystalDiskMark write?

Any help would be greatly appreciated. Thank you!

Asked Jan 23 '19 at 08:01 by Chuang Men


1 Answer

First of all, this is not really a C++ question but an OS question. To get maximum performance you need OS-specific low-level API calls, which don't exist in the general C++ libraries. Your code clearly uses the Windows API, so at a minimum we should look for a Windows solution.

From the CreateFileW function documentation:

When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED, the flags give maximum asynchronous performance, because the I/O does not rely on the synchronous operations of the memory manager.

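So we need to combine these two flags in the call to CreateFileW, or pass FILE_NO_INTERMEDIATE_BUFFERING to NtCreateFile (without FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT, so that the handle is opened for asynchronous I/O).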
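A minimal sketch of opening the file with both flags through the documented Win32 API, assuming a wide-character path; the function name OpenForUnbufferedAsyncWrite is hypothetical. It asks for read access as well as write access because FSCTL_SET_COMPRESSION requires both on the handle:

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>

// Creates (or truncates) `path` for unbuffered, overlapped writes.
// FILE_FLAG_NO_BUFFERING bypasses the system cache, so every write must obey the
// sector alignment rules; FILE_FLAG_OVERLAPPED lets WriteFile return before the
// I/O finishes, so several requests can be in flight at once.
HANDLE OpenForUnbufferedAsyncWrite(const wchar_t* path) {
    assert(path != nullptr);
    return CreateFileW(path,
                       GENERIC_READ | GENERIC_WRITE,  // read access is needed for FSCTL_SET_COMPRESSION
                       0,                             // no sharing while writing
                       nullptr,
                       CREATE_ALWAYS,
                       FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                       nullptr);
}
#endif
```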

Extending the file size and the valid data length also takes time, so if the final file size is known at the beginning, it is better to set it up front: set the final file size via NtSetInformationFile with FileEndOfFileInformation, or via SetFileInformationByHandle with FileEndOfFileInfo, and then set the valid data length with SetFileValidData or via NtSetInformationFile with FileValidDataLengthInformation. Setting the valid data length requires the SE_MANAGE_VOLUME_NAME privilege to be enabled when the file is initially opened (but not when SetFileValidData is called).
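A sketch of these preallocation steps using the documented Win32 calls (SetFileInformationByHandle with FileEndOfFileInfo, then SetFileValidData). The helper names are hypothetical, and enabling the privilege through AdjustTokenPrivileges is an assumption about how to obtain it; it only succeeds if the account actually holds SeManageVolumePrivilege, which usually means running elevated. RoundUpPow2 mirrors the size rounding done in the test code's Create method:

```cpp
#include <cassert>
#include <cstdint>

// Rounds `size` up to the next multiple of `block` (block must be a power of two).
std::uint64_t RoundUpPow2(std::uint64_t size, std::uint64_t block) {
    assert(block != 0 && (block & (block - 1)) == 0);
    return (size + block - 1) & ~(block - 1);
}

#ifdef _WIN32
#include <windows.h>

// Enables SeManageVolumePrivilege (SE_MANAGE_VOLUME_NAME) in the process token.
bool EnableManageVolumePrivilege() {
    HANDLE token = nullptr;
    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return false;
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    bool ok = LookupPrivilegeValueW(nullptr, L"SeManageVolumePrivilege", &tp.Privileges[0].Luid) &&
              AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr) &&
              GetLastError() == ERROR_SUCCESS;  // ERROR_NOT_ALL_ASSIGNED: privilege not held
    CloseHandle(token);
    return ok;
}

// Sets the final file size first, then the valid data length, so the file
// system does not have to extend the file while the writes are running.
bool Preallocate(HANDLE file, std::uint64_t size) {
    FILE_END_OF_FILE_INFO eof = {};
    eof.EndOfFile.QuadPart = static_cast<LONGLONG>(size);
    if (!SetFileInformationByHandle(file, FileEndOfFileInfo, &eof, sizeof(eof)))
        return false;
    return SetFileValidData(file, static_cast<LONGLONG>(size)) != FALSE;
}
#endif
```

Call EnableManageVolumePrivilege before opening the file, then Preallocate(file, RoundUpPow2(size, blockSize)) on the open handle. SetFileValidData skips zero-filling, so until the data is overwritten the file exposes whatever was previously stored in those clusters.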

Also check for file compression: if the file is compressed (it will be compressed by default if it is created in a compressed folder), writing is very slow, so compression needs to be disabled via FSCTL_SET_COMPRESSION.
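Disabling compression through the documented DeviceIoControl call could look like the sketch below (helper names are hypothetical). It supplies an OVERLAPPED with an event so that it works for both synchronous and overlapped handles, and the handle needs both read and write data access:

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>

// True if the file currently has the compressed attribute.
bool IsCompressed(const wchar_t* path) {
    DWORD attrs = GetFileAttributesW(path);
    return attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_COMPRESSED) != 0;
}

// Turns NTFS compression off for an open file handle.
bool DisableCompression(HANDLE file) {
    USHORT format = COMPRESSION_FORMAT_NONE;
    DWORD returned = 0;
    OVERLAPPED ov = {};
    ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);
    if (!ov.hEvent)
        return false;
    BOOL ok = DeviceIoControl(file, FSCTL_SET_COMPRESSION, &format, sizeof(format),
                              nullptr, 0, &returned, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING)   // overlapped handle: wait for completion
        ok = GetOverlappedResult(file, &ov, &returned, TRUE);
    CloseHandle(ov.hEvent);
    return ok != FALSE;
}
#endif
```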

Next, when we use asynchronous I/O (the fastest way), we don't need to create several dedicated threads. Instead, we need to decide how many I/O requests run concurrently. CrystalDiskMark actually runs CdmResource\diskspd\diskspd64.exe for its tests, and this number corresponds to its -o<count> parameter (run diskspd64.exe /? > h.txt to see the parameter list).
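The idea of a queue depth instead of extra threads can be sketched with plain event-based OVERLAPPED writes issued from a single thread. This is a simpler substitute for the completion-callback design in the test code below, not a description of what CrystalDiskMark or diskspd does internally; the function name and parameters are hypothetical, and the handle is assumed to be opened with FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef _WIN32
#include <windows.h>

// Writes `total` bytes (a multiple of `block`) sequentially to `file`, keeping up to
// buffers.size() requests in flight. Each buffer holds `block` aligned bytes.
bool WriteWithQueueDepth(HANDLE file, std::uint64_t total, DWORD block,
                         const std::vector<void*>& buffers) {
    const std::size_t depth = buffers.size();
    assert(depth > 0 && block > 0 && total % block == 0);
    std::vector<OVERLAPPED> ov(depth);
    std::vector<bool> active(depth, false);
    std::uint64_t next = 0;   // file offset of the next block to submit
    std::size_t inFlight = 0;
    bool ok = true;

    // Starts the next block on slot i (does nothing once everything is submitted).
    auto submit = [&](std::size_t i) {
        if (!ok || next >= total) return;
        HANDLE event = ov[i].hEvent;   // keep the slot's event
        ov[i] = OVERLAPPED{};
        ov[i].hEvent = event;
        ov[i].Offset = static_cast<DWORD>(next);
        ov[i].OffsetHigh = static_cast<DWORD>(next >> 32);
        if (!WriteFile(file, buffers[i], block, nullptr, &ov[i]) &&
            GetLastError() != ERROR_IO_PENDING) {
            ok = false;
            return;
        }
        next += block;
        active[i] = true;
        ++inFlight;
    };

    for (std::size_t i = 0; i < depth; ++i) {
        ov[i] = OVERLAPPED{};
        ov[i].hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);
        if (!ov[i].hEvent) ok = false;
    }
    for (std::size_t i = 0; i < depth; ++i) submit(i);

    // Wait for the slots in submission order and refill each one as it completes.
    for (std::size_t i = 0; inFlight > 0; i = (i + 1) % depth) {
        if (!active[i]) continue;
        DWORD written = 0;
        if (!GetOverlappedResult(file, &ov[i], &written, TRUE) || written != block)
            ok = false;
        active[i] = false;
        --inFlight;
        submit(i);
    }

    for (std::size_t i = 0; i < depth; ++i)
        if (ov[i].hEvent) CloseHandle(ov[i].hEvent);
    return ok && next == total;
}
#endif
```

Waiting on the slots in submission order keeps the code short; since the writes are sequential they will usually complete in roughly that order anyway, so the drive still sees about depth outstanding requests.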

Using non-buffered I/O makes the task harder, because there are three additional requirements:

  1. Any ByteOffset passed to WriteFile must be a multiple of the sector size.
  2. The Length passed to WriteFile must be an integral multiple of the sector size.
  3. Buffers must be aligned in accordance with the alignment requirement of the underlying device. To obtain this information, call NtQueryInformationFile with FileAlignmentInformation or GetFileInformationByHandleEx with FileAlignmentInfo.
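The three rules can be checked before each write with a small helper. In this sketch (hypothetical names) the sector size is passed in by the caller, which is an assumption; it could come from, for example, GetDiskFreeSpaceW. The alignment mask is the AlignmentRequirement value returned for FileAlignmentInfo, where 0x1FF means 512-byte alignment:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Checks one write request against the three FILE_FLAG_NO_BUFFERING rules.
bool IsValidUnbufferedWrite(std::uint64_t byteOffset, std::size_t length,
                            const void* buffer, std::size_t sectorSize,
                            std::size_t alignmentMask) {
    assert(sectorSize != 0);
    const auto address = reinterpret_cast<std::uintptr_t>(buffer);
    return byteOffset % sectorSize == 0     // rule 1: offset is a sector multiple
        && length % sectorSize == 0         // rule 2: length is a sector multiple
        && (address & alignmentMask) == 0;  // rule 3: buffer meets device alignment
}

#ifdef _WIN32
#include <windows.h>

// Reads the device's buffer alignment mask for an open file handle.
bool QueryAlignmentMask(HANDLE file, std::size_t* mask) {
    FILE_ALIGNMENT_INFO info = {};
    if (!GetFileInformationByHandleEx(file, FileAlignmentInfo, &info, sizeof(info)))
        return false;
    *mask = info.AlignmentRequirement;
    return true;
}
#endif
```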

In most situations, page-aligned memory will also be sector-aligned, because the case where the sector size is larger than the page size is rare.

So buffers allocated with the VirtualAlloc function, with a size that is a multiple of the page size (4,096 bytes), are almost always fine. In the concrete test below I rely on this assumption to keep the code smaller.
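Following that assumption, a page-aligned buffer could be allocated as below. The 4,096-byte page size is hard-coded as in the text (GetSystemInfo reports the real value), the helper names are hypothetical, and the non-Windows std::aligned_alloc branch is only there so the sketch can be compiled and tried elsewhere:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#endif

constexpr std::size_t kPageSize = 4096;  // assumed page size

// Allocates `size` bytes, rounded up to whole pages, at a page-aligned address.
void* AllocPageAligned(std::size_t size) {
    size = (size + kPageSize - 1) / kPageSize * kPageSize;
#ifdef _WIN32
    return VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#else
    return std::aligned_alloc(kPageSize, size);
#endif
}

void FreePageAligned(void* p) {
#ifdef _WIN32
    VirtualFree(p, 0, MEM_RELEASE);
#else
    std::free(p);
#endif
}

// Page alignment implies sector alignment whenever the sector size divides the page size.
bool IsAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```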

```cpp
// Assumes declarations of the native API used below (NtCreateFile, NtWriteFile,
// NtSetInformationFile, NtFsControlFile, NtClose, RtlSetIoCompletionCallback,
// RtlAdjustPrivilege, DbgPrint), e.g. from the WDK's ntifs.h or the phnt headers,
// plus shlwapi.h for StrFormatByteSizeW; link with ntdll.lib and shlwapi.lib.
// Results are reported with DbgPrint, so they appear in an attached debugger.

struct WriteTest
{
    enum { opCompression, opWrite };

    // One I/O request: the IO_STATUS_BLOCK is handed to the kernel, and the
    // extra fields identify the request in the completion callback.
    struct REQUEST : IO_STATUS_BLOCK
    {
        WriteTest* pTest;
        ULONG opcode;
        ULONG offset;   // offset of this request's buffer inside _pData
    };

    LONGLONG _TotalSize, _BytesLeft;
    HANDLE _hFile;
    ULONG64 _StartTime;
    void* _pData;
    REQUEST* _pRequests;
    ULONG _BlockSize;
    ULONG _ConcurrentRequestCount;
    ULONG _dwThreadId;
    LONG _dwRefCount;

    WriteTest(ULONG BlockSize, ULONG ConcurrentRequestCount)
    {
        if (BlockSize & (BlockSize - 1))   // BlockSize must be a power of two
        {
            __debugbreak();
        }
        _BlockSize = BlockSize, _ConcurrentRequestCount = ConcurrentRequestCount;
        _dwRefCount = 1, _hFile = 0, _pRequests = 0, _pData = 0;
        _dwThreadId = GetCurrentThreadId();
    }

    ~WriteTest()
    {
        if (_pData)
        {
            VirtualFree(_pData, 0, MEM_RELEASE);
        }

        if (_pRequests)
        {
            delete [] _pRequests;
        }

        if (_hFile)
        {
            NtClose(_hFile);
        }

        // Ends the message box loop in WriteSpeed
        PostThreadMessageW(_dwThreadId, WM_QUIT, 0, 0);
    }

    void Release()
    {
        if (!InterlockedDecrement(&_dwRefCount))
        {
            delete this;
        }
    }

    void AddRef()
    {
        InterlockedIncrementNoFence(&_dwRefCount);
    }

    void StartWrite()
    {
        IO_STATUS_BLOCK iosb;

        // Set the final file size and valid data length up front
        FILE_VALID_DATA_LENGTH_INFORMATION fvdl;
        fvdl.ValidDataLength.QuadPart = _TotalSize;
        NTSTATUS status;

        if (0 > (status = NtSetInformationFile(_hFile, &iosb, &_TotalSize, sizeof(_TotalSize), FileEndOfFileInformation)) ||
            0 > (status = NtSetInformationFile(_hFile, &iosb, &fvdl, sizeof(fvdl), FileValidDataLengthInformation)))
        {
            DbgPrint("FileValidDataLength=%x\n", status);
        }

        ULONG offset = 0;
        ULONG dwNumberOfBytesTransfered = _BlockSize;

        _BytesLeft = _TotalSize + dwNumberOfBytesTransfered;

        ULONG ConcurrentRequestCount = _ConcurrentRequestCount;

        REQUEST* irp = _pRequests;

        _StartTime = GetTickCount64();

        // Submit the first ConcurrentRequestCount writes; each completion submits the next one
        do
        {
            irp->opcode = opWrite;
            irp->pTest = this;
            irp->offset = offset;
            offset += dwNumberOfBytesTransfered;
            DoWrite(irp++);
        } while (--ConcurrentRequestCount);
    }

    // Fill a block with a recognizable pattern (its own file offsets)
    void FillBuffer(PULONGLONG pu, LONGLONG ByteOffset)
    {
        ULONG n = _BlockSize / sizeof(ULONGLONG);
        do
        {
            *pu++ = ByteOffset, ByteOffset += sizeof(ULONGLONG);
        } while (--n);
    }

    void DoWrite(REQUEST* irp)
    {
        LONG BlockSize = _BlockSize;

        // Atomically claim the next block; BytesLeft > 0 means there is a block to write
        LONGLONG BytesLeft = InterlockedExchangeAddNoFence64(&_BytesLeft, -BlockSize) - BlockSize;

        if (0 < BytesLeft)
        {
            LARGE_INTEGER ByteOffset;
            ByteOffset.QuadPart = _TotalSize - BytesLeft;

            PVOID Buffer = RtlOffsetToPointer(_pData, irp->offset);

            FillBuffer((PULONGLONG)Buffer, ByteOffset.QuadPart);

            AddRef();

            NTSTATUS status = NtWriteFile(_hFile, 0, 0, irp, irp, Buffer, BlockSize, &ByteOffset, 0);

            if (0 > status)
            {
                OnComplete(status, 0, irp);
            }
        }
        else if (!BytesLeft)
        {
            // write end: every block has been submitted
            ULONG64 time = GetTickCount64() - _StartTime;
            if (!time) time = 1;   // avoid dividing by zero for very short runs

            WCHAR sz[64];
            StrFormatByteSizeW((_TotalSize * 1000) / time, sz, RTL_NUMBER_OF(sz));
            DbgPrint("end:%S\n", sz);
        }
    }

    // Runs on a thread-pool thread when a request completes (see RtlSetIoCompletionCallback)
    static VOID NTAPI _OnComplete(
        _In_    NTSTATUS status,
        _In_    ULONG_PTR dwNumberOfBytesTransfered,
        _Inout_ PVOID Ctx
        )
    {
        reinterpret_cast<REQUEST*>(Ctx)->pTest->OnComplete(status, dwNumberOfBytesTransfered, reinterpret_cast<REQUEST*>(Ctx));
    }

    VOID OnComplete(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, REQUEST* irp)
    {
        if (0 > status)
        {
            DbgPrint("OnComplete[%x]: %x\n", irp->opcode, status);
        }
        else
        switch (irp->opcode)
        {
        default:
            __debugbreak();

        case opCompression:
            StartWrite();
            break;

        case opWrite:
            if (dwNumberOfBytesTransfered == _BlockSize)
            {
                DoWrite(irp);
            }
            else
            {
                DbgPrint(":%I64x != %x\n", dwNumberOfBytesTransfered, _BlockSize);
            }
        }

        Release();
    }

    NTSTATUS Create(POBJECT_ATTRIBUTES poa, ULONGLONG size)
    {
        // One page-aligned buffer of BlockSize bytes per concurrent request
        if (!(_pRequests = new REQUEST[_ConcurrentRequestCount]) ||
            !(_pData = VirtualAlloc(0, _BlockSize * _ConcurrentRequestCount, MEM_COMMIT, PAGE_READWRITE)))
        {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        // Round the total size up to a multiple of BlockSize
        ULONGLONG sws = _BlockSize - 1;
        LARGE_INTEGER as;

        _TotalSize = as.QuadPart = (size + sws) & ~sws;

        HANDLE hFile;
        IO_STATUS_BLOCK iosb;

        // No FILE_SYNCHRONOUS_IO_* option: the handle is asynchronous and unbuffered
        NTSTATUS status = NtCreateFile(&hFile,
            DELETE|FILE_GENERIC_READ|FILE_GENERIC_WRITE&~FILE_APPEND_DATA,
            poa, &iosb, &as, 0, 0, FILE_OVERWRITE_IF,
            FILE_NON_DIRECTORY_FILE|FILE_NO_INTERMEDIATE_BUFFERING, 0, 0);

        if (0 > status)
        {
            return status;
        }

        _hFile = hFile;

        if (0 > (status = RtlSetIoCompletionCallback(hFile, _OnComplete, 0)))
        {
            return status;
        }

        // Disable compression asynchronously; its completion (opCompression) calls StartWrite
        static USHORT cmp = COMPRESSION_FORMAT_NONE;

        REQUEST* irp = _pRequests;

        irp->pTest = this;
        irp->opcode = opCompression;

        AddRef();
        status = NtFsControlFile(hFile, 0, 0, irp, irp, FSCTL_SET_COMPRESSION, &cmp, sizeof(cmp), 0, 0);

        if (0 > status)
        {
            OnComplete(status, 0, irp);
        }

        return status;
    }
};

void WriteSpeed(POBJECT_ATTRIBUTES poa, ULONGLONG size, ULONG BlockSize, ULONG ConcurrentRequestCount)
{
    // Needed for FileValidDataLengthInformation
    BOOLEAN b;
    NTSTATUS status = RtlAdjustPrivilege(SE_MANAGE_VOLUME_PRIVILEGE, TRUE, FALSE, &b);

    if (0 <= status)
    {
        status = STATUS_INSUFFICIENT_RESOURCES;

        if (WriteTest * pTest = new WriteTest(BlockSize, ConcurrentRequestCount))
        {
            status = pTest->Create(poa, size);

            pTest->Release();

            if (0 <= status)
            {
                // Keeps this thread waiting; ~WriteTest posts WM_QUIT, which closes the box
                MessageBoxW(0, 0, L"Test...", MB_OK|MB_ICONINFORMATION);
            }
        }
    }
}
```
Answered Jan 04 '23 at 18:01 by RbMm