
Writing multiple files slows down after x seconds

Tags:

c++

I have code that grabs frames from a camera and saves them to disk. The structure is: multiple threads malloc a buffer, copy their frame into it, and enqueue it. Another thread then removes frames from the queue and writes them to their files using the ffmpeg API (raw video, no compression). (Actually I'm using my own memory pool, so malloc is only called when I need more buffers.) I can have up to 8 files/cams open at the same time enqueuing.

The problem is that for the first 45 sec everything works fine: there's never more than one frame on queue. But after that my queue gets backed up, processing takes just a few ms longer resulting in increased ram usage because I cannot save the frames fast enough so I have to malloc more memory to store them.

I have an 8-core, 16 GB RAM, Windows 7 64-bit computer (NTFS, lots of free space on the second disk drive). The disk is supposed to be able to write up to 6 Gbit/s. To save my data in time I need to be able to write at 50 MB/s. I tested disk speed with "PassMark PerformanceTest", with 8 threads writing files simultaneously exactly like ffmpeg saves files (synchronized, uncached I/O), and it was able to achieve 100 MB/s. So why aren't my writes able to achieve that?

Here's how the ffmpeg writes look from Process monitor logs:

 
Time of Day     Operation   File#   Result  Detail
2:30:32.8759350 PM  WriteFile   8   SUCCESS Offset: 749,535,120, Length: 32,768
2:30:32.8759539 PM  WriteFile   8   SUCCESS Offset: 749,567,888, Length: 32,768
2:30:32.8759749 PM  WriteFile   8   SUCCESS Offset: 749,600,656, Length: 32,768
2:30:32.8759939 PM  WriteFile   8   SUCCESS Offset: 749,633,424, Length: 32,768
2:30:32.8760314 PM  WriteFile   8   SUCCESS Offset: 749,666,192, Length: 32,768
2:30:32.8760557 PM  WriteFile   8   SUCCESS Offset: 749,698,960, Length: 32,768
2:30:32.8760866 PM  WriteFile   8   SUCCESS Offset: 749,731,728, Length: 32,768
2:30:32.8761259 PM  WriteFile   8   SUCCESS Offset: 749,764,496, Length: 32,768
2:30:32.8761452 PM  WriteFile   8   SUCCESS Offset: 749,797,264, Length: 32,768
2:30:32.8761629 PM  WriteFile   8   SUCCESS Offset: 749,830,032, Length: 32,768
2:30:32.8761803 PM  WriteFile   8   SUCCESS Offset: 749,862,800, Length: 32,768
2:30:32.8761977 PM  WriteFile   8   SUCCESS Offset: 749,895,568, Length: 32,768
2:30:32.8762235 PM  WriteFile   8   SUCCESS Offset: 749,928,336, Length: 32,768, Priority: Normal
2:30:32.8762973 PM  WriteFile   8   SUCCESS Offset: 749,961,104, Length: 32,768
2:30:32.8763160 PM  WriteFile   8   SUCCESS Offset: 749,993,872, Length: 32,768
2:30:32.8763352 PM  WriteFile   8   SUCCESS Offset: 750,026,640, Length: 32,768
2:30:32.8763502 PM  WriteFile   8   SUCCESS Offset: 750,059,408, Length: 32,768
2:30:32.8763649 PM  WriteFile   8   SUCCESS Offset: 750,092,176, Length: 32,768
2:30:32.8763790 PM  WriteFile   8   SUCCESS Offset: 750,124,944, Length: 32,768
2:30:32.8763955 PM  WriteFile   8   SUCCESS Offset: 750,157,712, Length: 32,768
2:30:32.8764072 PM  WriteFile   8   SUCCESS Offset: 750,190,480, Length: 4,104
2:30:32.8848241 PM  WriteFile   4   SUCCESS Offset: 750,194,584, Length: 32,768
2:30:32.8848481 PM  WriteFile   4   SUCCESS Offset: 750,227,352, Length: 32,768
2:30:32.8848749 PM  ReadFile    4   END OF FILE Offset: 750,256,128, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
2:30:32.8848989 PM  WriteFile   4   SUCCESS Offset: 750,260,120, Length: 32,768
2:30:32.8849157 PM  WriteFile   4   SUCCESS Offset: 750,292,888, Length: 32,768
2:30:32.8849319 PM  WriteFile   4   SUCCESS Offset: 750,325,656, Length: 32,768
2:30:32.8849475 PM  WriteFile   4   SUCCESS Offset: 750,358,424, Length: 32,768
2:30:32.8849637 PM  WriteFile   4   SUCCESS Offset: 750,391,192, Length: 32,768
2:30:32.8849880 PM  WriteFile   4   SUCCESS Offset: 750,423,960, Length: 32,768, Priority: Normal
2:30:32.8850400 PM  WriteFile   4   SUCCESS Offset: 750,456,728, Length: 32,768
2:30:32.8850727 PM  WriteFile   4   SUCCESS Offset: 750,489,496, Length: 32,768, Priority: Normal

This looks very efficient. However, according to DiskMon the actual disk writes are ridiculously fragmented, jumping back and forth across the disk, which may account for the slow speed. See the graph of the write speed computed from this data (~5 MB/s).

 Time  Write duration  Sector      Length  MB/sec
95.6    0.00208855  1490439632  896 0.409131784
95.6    0.00208855  1488197000  128 0.058447398
95.6    0.00009537  1482323640  128 1.279965529
95.6    0.00009537  1482336312  768 7.679793174
95.6    0.00009537  1482343992  384 3.839896587
95.6    0.00009537  1482350648  768 7.679793174
95.6    0.00039101  1489278984  1152    2.809730729
95.6    0.00039101  1489393672  896 2.185346123
95.6    0.0001812   1482349368  256 1.347354443
95.6    0.0001812   1482358328  896 4.715740549
95.6    0.0001812   1482370616  640 3.368386107
95.6    0.0001812   1482378040  256 1.347354443
95.6    0.00208855  1488197128  384 0.175342193
95.6    0.00208855  1488202512  640 0.292236989
95.6    0.00208855  1488210320  1024    0.467579182
95.6    0.00009537  1482351416  256 2.559931058
95.6    0.00009537  1482360120  896 8.959758703
95.6    0.00009537  1482371896  640 6.399827645
95.6    0.00009537  1482380088  256 2.559931058
95.7    0.00039101  1489394568  1152    2.809730729
95.7    0.00039101  1489396744  352 0.858528834
95.7    0.00039101  1489507944  544 1.326817289
95.7    0.0001812   1482378296  768 4.042063328
95.7    0.0001812   1482392120  768 4.042063328
95.7    0.0001812   1482400568  512 2.694708885
95.7    0.00208855  1488224144  768 0.350684386
95.7    0.00208855  1488232208  384 0.175342193
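
For reference, the MB/sec column above appears to be computed as Length / duration / 2^20, which treats Length as a byte count. The sketch below reproduces that arithmetic. It also shows the figure under the assumption that Length is a count of 512-byte sectors; that unit is a guess and is not stated in the post. The names `DiskRequest`, `mb_per_sec_as_posted` and `mb_per_sec_if_sectors` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// One row of the DiskMon table: request duration in seconds and the
// value of the Length column.
struct DiskRequest {
    double duration_s;
    std::uint64_t length;
};

// Reproduces the MB/sec column as posted: Length / duration / 2^20,
// i.e. Length is treated as a number of bytes.
inline double mb_per_sec_as_posted(const DiskRequest& r) {
    assert(r.duration_s > 0.0);
    return static_cast<double>(r.length) / r.duration_s / (1024.0 * 1024.0);
}

// Same figure under the ASSUMPTION that Length counts sectors of
// `sector_bytes` bytes each (512 is a guess, not taken from the post).
inline double mb_per_sec_if_sectors(const DiskRequest& r,
                                    std::uint64_t sector_bytes = 512) {
    return mb_per_sec_as_posted(r) * static_cast<double>(sector_bytes);
}
```

For the first row (duration 0.00208855 s, Length 896), `mb_per_sec_as_posted` gives about 0.409, matching the table. Several rows share the same duration, so per-row figures may not reflect the rate of any single request either way.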

[image: graph of write speed from the DiskMon data]

I'm pretty confident it's not my code, because I timed everything; for example, enqueuing takes a few µs, which suggests the threads don't get stuck waiting for each other. It must be the disk writes. So the question is: how can I improve my disk writes, and what can I do to profile the actual disk writes? (Remember that I rely on the FFmpeg DLLs to save, so I cannot access the low-level writing functions directly.) If I cannot figure it out, I'll dump all the frames into a single sequential binary file (which should increase I/O speed) and then split it into video files in post-processing.

I don't know how much of my disk I/O is being cached (CacheSet only shows the cache size for disk C), but the following images from Performance Monitor, taken at 0 and 45 sec into the video (just before my queue starts piling up), look weird to me. Basically, the modified set and standby set grew from very little to this large value. Is that the data being cached? Is it possible that data only starts being written to disk at 45 sec, so suddenly everything slows down? [images: two Performance Monitor screenshots] (FYI, LabVIEW is the program that loads my DLL.)

I'll appreciate any help.
M.

Matt asked Aug 24 '11 06:08


2 Answers

With CreateFile it looks like you want one or both of these parameters:

  • FILE_FLAG_NO_BUFFERING
  • FILE_FLAG_WRITE_THROUGH

http://msdn.microsoft.com/en-us/library/cc644950(v=vs.85).aspx

Your delayed performance hit occurs when the OS starts pushing data to the disk.
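
A minimal sketch of what opening a file with both flags could look like, assuming you control the file handle yourself (the question says FFmpeg does the writing, so this may not apply directly). `open_unbuffered` and `round_up_to_sector` are hypothetical helper names. With FILE_FLAG_NO_BUFFERING, write sizes and file offsets have to be multiples of the volume sector size and the buffer address has to be sector-aligned; the helper only covers the size part. The real sector size should be queried, for example with GetDiskFreeSpace, rather than assumed.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>

// Opens `path` for writing with the OS file cache bypassed and
// write-through requested. Returns INVALID_HANDLE_VALUE on failure
// (call GetLastError() for the reason).
inline HANDLE open_unbuffered(const char* path) {
    return CreateFileA(path,
                       GENERIC_WRITE,
                       0,              // no sharing
                       nullptr,        // default security attributes
                       CREATE_ALWAYS,
                       FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                       nullptr);       // no template file
}
#endif

// Rounds `bytes` up to the next multiple of `sector`, as unbuffered
// writes require sector-sized transfers.
inline std::size_t round_up_to_sector(std::size_t bytes, std::size_t sector) {
    assert(sector > 0);
    return (bytes + sector - 1) / sector * sector;
}
```

With a 512-byte sector (an assumed value), the 4,104-byte write in the Process Monitor log would have to be padded to 4,608 bytes, and the unaligned offsets in that log suggest FFmpeg's write pattern would not work unchanged with FILE_FLAG_NO_BUFFERING.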

6 Gb/s is the link speed of the SATA bus (that is SATA III; SATA II is 3 Gb/s), not the throughput of the device connected to it or of the physical platters or flash underneath.

A common problem with AV systems is that a constant high-rate stream of writes can be periodically interrupted by disk overhead tasks. There used to be special AV disks you could purchase that don't do this; these days you can purchase disks with special high-throughput firmware intended explicitly for security video recording.

http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=210671&NewLang=en

Steve-o answered Sep 20 '22 19:09


The problem is the repeated malloc and free, which puts a load on the system. I suggest creating a buffer pool, i.e. allocate N buffers in the initialization stage and reuse them instead of mallocing and freeing the memory. Since you mentioned ffmpeg, here is an example from multimedia: in GStreamer, buffer management takes the form of buffer pools, and in a GStreamer pipeline buffers are usually taken from buffer pools and passed around. Most multimedia systems do this.
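
A rough sketch of such a pool, assuming fixed-size frames. The class name `FramePool` and its interface are made up for illustration; this is not GStreamer's API or the asker's existing pool. All buffers are allocated once up front, and `acquire` returns nullptr instead of growing when the pool runs dry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Fixed-size pool: every frame buffer is allocated once in the
// constructor and recycled afterwards, so the capture path never
// has to call malloc.
class FramePool {
public:
    FramePool(std::size_t count, std::size_t frame_bytes)
        : frame_bytes_(frame_bytes) {
        storage_.reserve(count);
        free_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            storage_.push_back(std::make_unique<std::uint8_t[]>(frame_bytes));
            free_.push_back(storage_.back().get());
        }
    }

    // Returns a free buffer, or nullptr when the pool is exhausted.
    std::uint8_t* acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return nullptr;
        std::uint8_t* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Hands a buffer obtained from acquire() back to the pool.
    void release(std::uint8_t* buf) {
        assert(buf != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(buf);
    }

    std::size_t frame_bytes() const { return frame_bytes_; }

private:
    std::size_t frame_bytes_;
    std::vector<std::unique_ptr<std::uint8_t[]>> storage_;  // owns the memory
    std::vector<std::uint8_t*> free_;                       // buffers not in use
    std::mutex mutex_;
};
```

A camera thread would call `acquire()`, copy its frame in, and enqueue the pointer; the writer thread calls `release()` once the frame is on disk. A nullptr from `acquire()` is the signal that the writer has fallen behind.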

Regarding:

The problem is that for the first 45 sec everything works fine: there's never more than one frame on queue. But after that my queue gets backed up, processing takes just a few ms longer resulting in increased ram usage because I cannot save the frames fast enough so I have to malloc more memory to store them.

The application is thrashing at this point, and calling malloc now will make matters even worse. I suggest implementing a producer-consumer model, where one side waits on the other depending on the case. In your case, set a threshold of N buffers: if there are N buffers in the queue, new frames from the camera are not enqueued until the existing buffers have been processed.
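
One way to sketch that threshold is a bounded queue built on a mutex and two condition variables. `BoundedQueue` is a hypothetical name, and the non-blocking `try_push` is an addition not mentioned above, for the case where dropping a frame is preferable to stalling a camera thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Producer-consumer queue that never holds more than `capacity` items.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: blocks while the queue already holds `capacity` items.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    // Non-blocking variant: returns false when the queue is full, so the
    // caller can drop the frame instead of waiting.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(item));
        not_empty_.notify_one();
        return true;
    }

    // Consumer side: blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};
```

The camera threads call `push` (or `try_push`) and the writer thread loops on `pop`. Memory use is then capped at N frames, at the cost of stalled or dropped frames whenever the disk cannot keep up.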

Another idea: instead of writing raw frames, why not write encoded data? Assuming you want a video, you can at least write an elementary H.264 stream (and ffmpeg comes with a good H.264 encoder!) or, even better, if you have access to an MPEG-4 muxer, an MP4 file. This will reduce the memory requirements and the I/O load dramatically.

vine'th answered Sep 22 '22 19:09