
How to speed up creation of a FileStream

My application needs to open a lot of small files: to read all the data for a given day, it opens 1440 files, each containing one minute of data. Each file is only a couple of kB in size. This is a GUI application, so I want the user (== me!) to not have to wait too long.

It turns out that opening the files is rather slow. After some investigation, I found that most of the time is spent creating a FileStream (OpenStream = new FileStream) for each file. Example code:

// Create the stream and reader
FileStream OpenStream;
BinaryReader bReader;

foreach (string file in files)
{
    // Does the file exist? Then read and store it
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;

        // Open the file read-only, otherwise the application can crash
        OpenStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);

        Tijden.Add(sw.ElapsedMilliseconds - Start);

        bReader = new BinaryReader(OpenStream);

        // Read everything in one go; this works well and is fast
        // - Track whether appending is still possible; stop appending if it is not
        blAppend &= Bestanden.Add(file, bReader.ReadBytes((int)OpenStream.Length), blAppend);

        // Close the file
        bReader.Close();
    }
}

Using the stopwatch timer, I see that most (> 80%) of the time is spent on creating the FileStream for each file. Creating the BinaryReader and actually reading the file (Bestanden.Add) takes almost no time.

I'm baffled by this and cannot find a way to speed it up. What can I do to speed up the creation of the FileStream?

Update to the question:

  • This happens on both Windows 7 and Windows 10.
  • The files are local (on an SSD).
  • The directory contains only the 1440 files.
  • Strangely, when I read the same files again later, creating the FileStreams suddenly costs almost no time at all. Somewhere the OS is remembering the files.
  • Even if I close the application and restart it, opening the files "again" also costs almost no time. This makes it pretty hard to track down the performance issue: I had to make a lot of copies of the directory to recreate the problem over and over.

asked Jul 09 '17 11:07 by wvl_kszen



1 Answer

As you mentioned in your comment on the question, FileStream reads the first 4K into its buffer when the object is created. You can change the size of this buffer to better match the size of your data (for example, decrease it if your files are smaller than the buffer). If you read the file sequentially, you can give the OS a hint about this through FileOptions. In addition, you can drop the BinaryReader, because you read each file in its entirety.

    // Create the stream
    FileStream OpenStream;

    foreach (string file in files)
    {
        // Does the file exist? Then read and store it
        if (System.IO.File.Exists(file))
        {
            long Start = sw.ElapsedMilliseconds;

            // Open the file read-only, otherwise the application can crash
            OpenStream = new FileStream(
                file,
                FileMode.Open,
                FileAccess.Read,
                FileShare.ReadWrite,
                bufferSize: 2048, // 2K, for example
                options: FileOptions.SequentialScan);

            Tijden.Add(sw.ElapsedMilliseconds - Start);

            var bufferLength = (int)OpenStream.Length;
            var buffer = new byte[bufferLength];

            // Read may return fewer bytes than requested, so loop until the buffer is full
            int offset = 0, read;
            while (offset < bufferLength &&
                   (read = OpenStream.Read(buffer, offset, bufferLength - offset)) > 0)
            {
                offset += read;
            }

            // Close the file
            OpenStream.Close();

            // Everything has been read in one go
            // - Track whether appending is still possible; stop appending if it is not
            blAppend &= Bestanden.Add(file, buffer, blAppend);
        }
    }
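
To check whether the buffer size and the FileOptions hint make a difference on your machine, you can time the constructor separately from the read. The following is only a minimal sketch: the class name OpenTimingSketch and the method Measure are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class OpenTimingSketch
{
    // Hypothetical helper: opens and fully reads every file, and reports the
    // total time spent in the FileStream constructor and in reading.
    public static void Measure(
        string[] files, int bufferSize, FileOptions options,
        out TimeSpan openTime, out TimeSpan readTime)
    {
        var openWatch = new Stopwatch();
        var readWatch = new Stopwatch();

        foreach (string file in files)
        {
            // Time only the constructor call
            openWatch.Start();
            var stream = new FileStream(file, FileMode.Open, FileAccess.Read,
                                        FileShare.ReadWrite, bufferSize, options);
            openWatch.Stop();

            using (stream)
            {
                // Time reading the whole file into a fresh array
                readWatch.Start();
                var data = new byte[(int)stream.Length];
                int offset = 0, read;
                while (offset < data.Length &&
                       (read = stream.Read(data, offset, data.Length - offset)) > 0)
                {
                    offset += read;
                }
                readWatch.Stop();
            }
        }

        openTime = openWatch.Elapsed;
        readTime = readWatch.Elapsed;
    }
}
```

Calling Measure once with (4096, FileOptions.None) and once with (2048, FileOptions.SequentialScan) gives two pairs of numbers to compare. Because the second pass over the same files is much faster, as noted in the question, each call should run against a fresh copy of the directory.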

I do not know the type of the Bestanden object, but if it has a method that reads from an array, you can also reuse a single buffer for all files.

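    // The buffer must be at least as large as the largest file to read
    var bufferLength = 8192;
    var buffer = new byte[bufferLength];

    foreach (string file in files)
    {
        // skip
        ...
        var fileLength = (int)OpenStream.Length;

        // Read may return fewer bytes than requested, so loop until the whole file is read
        int offset = 0, read;
        while (offset < fileLength &&
               (read = OpenStream.Read(buffer, offset, fileLength - offset)) > 0)
        {
            offset += read;
        }

        blAppend &= Bestanden.Add(file, /*read bytes from buffer */, blAppend);
    }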
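
For completeness, here is a self-contained sketch of the buffer-reuse idea. The names ReusedBufferSketch and ReadInto are hypothetical, and it assumes the consumer can accept a buffer plus a byte count; whether Bestanden.Add can do that is not known from the question.

```csharp
using System;
using System.IO;

public static class ReusedBufferSketch
{
    // Hypothetical helper: reads the whole file into the caller's buffer
    // and returns the number of bytes that were read.
    public static int ReadInto(string path, byte[] buffer)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.ReadWrite, 2048,
                                           FileOptions.SequentialScan))
        {
            if (stream.Length > buffer.Length)
                throw new ArgumentException("Buffer is smaller than the file.", nameof(buffer));

            // Read may return fewer bytes than requested, so loop until
            // the end of the file (Read returns 0) is reached.
            int total = 0, read;
            while ((read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
            return total;
        }
    }
}
```

The caller allocates one byte[] before the loop and, for each file, passes it to ReadInto. Only the first `count` bytes of the buffer are valid afterwards, for example as `new ArraySegment<byte>(buffer, 0, count)`, and the consumer has to copy or process them before the next file overwrites the buffer.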

I hope it helps.

answered Oct 04 '22 10:10 by Ivan R.