Improving/optimizing file write speed in C++

I've been running into some issues with writing to a file - namely, not being able to write fast enough.

To explain, my goal is to capture a stream of data coming in over gigabit Ethernet and simply save it to a file.

The raw data is coming in at a rate of 10 MS/s, and it's then saved to a buffer and subsequently written to a file.

Below is the relevant section of code:

    std::string path = "Stream/raw.dat";
    ofstream outFile(path, ios::out | ios::app | ios::binary);

    if(outFile.is_open())
        cout << "Yes" << endl;

    while(1)
    {
        rxSamples = rxStream->recv(&rxBuffer[0], rxBuffer.size(), metaData);
        switch(metaData.error_code)
        {
            //Irrelevant error checking...

            //Write data to a file
            std::copy(begin(rxBuffer), end(rxBuffer), std::ostream_iterator<complex<float>>(outFile));
        }
    }

The issue I'm encountering is that it's taking too long to write the samples to a file. After a second or so, the device sending the samples reports its buffer has overflowed. After some quick profiling of the code, nearly all of the execution time is spent on std::copy(...) (99.96% of the time to be exact). If I remove this line, I can run the program for hours without encountering any overflow.

That said, I'm rather stumped as to how I can improve the write speed. I've looked through several posts on this site, and it seems like the most common suggestion (in regard to speed) is to implement file writes as I've already done - through the use of std::copy.

If it's helpful, I'm running this program on Ubuntu x86_64. Any suggestions would be appreciated.

Mlagma, asked Aug 05 '15

2 Answers

So the main problem here is that you try to write in the same thread in which you receive, which means that recv() can only be called again after the copy has completed. A few observations:

  • Move the writing to a different thread (a minimal sketch follows this list). This is about a USRP, so GNU Radio might really be the tool of your choice -- it's inherently multithreaded.
  • Your output iterator is probably not the most performant solution. Simply calling write() on a file descriptor might be faster, but that's a performance measurement you'll have to make yourself.
  • If your hard drive/file system/OS/CPU aren't up to the rates coming in from the USRP, then even decoupling the receiving and writing threads won't help -- get a faster system.
  • Try writing to a RAM disk instead
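
To illustrate the first point, here's a minimal sketch of decoupling reception from disk writes with a writer thread and a shared queue. The buffer type, the queue, the sizes, and the commented-out recv() call are placeholder assumptions for illustration, not the actual UHD API:

    #include <complex>
    #include <condition_variable>
    #include <deque>
    #include <fstream>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Shared queue of received sample buffers, handed from the receive
    // thread to the writer thread.
    std::deque<std::vector<std::complex<float>>> sampleQueue;
    std::mutex queueMutex;
    std::condition_variable queueCv;

    // Writer thread: pops buffers off the queue and writes their raw bytes.
    void writerThread()
    {
        std::ofstream out("Stream/raw.dat", std::ios::binary | std::ios::app);
        std::unique_lock<std::mutex> lock(queueMutex);
        while (true)
        {
            queueCv.wait(lock, [] { return !sampleQueue.empty(); });
            std::vector<std::complex<float>> buf = std::move(sampleQueue.front());
            sampleQueue.pop_front();
            lock.unlock();                           // don't hold the lock while writing
            out.write(reinterpret_cast<const char*>(buf.data()),
                      buf.size() * sizeof(buf[0]));
            lock.lock();
        }
    }

    // Receive loop: fill a buffer, hand it to the writer, repeat.
    void receiveLoop()
    {
        while (true)                                 // mirrors the question's while(1)
        {
            std::vector<std::complex<float>> rxBuffer(4096);   // placeholder size
            // rxSamples = rxStream->recv(&rxBuffer[0], rxBuffer.size(), metaData);
            {
                std::lock_guard<std::mutex> guard(queueMutex);
                sampleQueue.push_back(std::move(rxBuffer));
            }
            queueCv.notify_one();
        }
    }

    int main()
    {
        std::thread writer(writerThread);
        receiveLoop();                               // never returns in this sketch
        writer.join();
    }

This way the receive loop only ever pays for a queue push, and the slow disk write happens concurrently on the other thread.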

In fact, I don't know how you came up with the std::copy approach. The rx_samples_to_file example that comes with UHD does this with a simple write, and you should definitely favor that over copying; file I/O can, on good OSes, often be done with one copy less, and iterating over all elements is probably very slow.
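
For reference, the write in that style boils down to something along these lines; here rxBuffer and rxSamples are assumed to be the vector and returned sample count from the question's code, not UHD's exact variable names:

    // One bulk write of the raw bytes instead of per-element formatted output.
    outFile.write(reinterpret_cast<const char*>(&rxBuffer[0]),
                  rxSamples * sizeof(std::complex<float>));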

Marcus Müller, answered Oct 23 '22


Let's do a bit of math.

Your samples are (apparently) of type std::complex<float>. Given a (typical) 32-bit float, that means each sample is 64 bits. At 10 MS/s, the raw data works out to around 80 megabytes per second--within what you can expect to write to a desktop (7200 RPM) hard drive, but getting fairly close to the limit (which is typically around 100-150 megabytes per second or so).

Unfortunately, despite the std::ios::binary, you're actually writing the data in text format (because std::ostream_iterator basically does stream << data;).

This not only loses some precision, but increases the size of the data, at least as a rule. The exact amount of increase depends on the data--a small integer value can actually decrease the quantity of data, but for arbitrary input, a size increase close to 2:1 is fairly common. With a 2:1 increase, your outgoing data is now around 160 megabytes/second--which is faster than most hard drives can handle.

The obvious starting point for an improvement would be to write the data in binary format instead:

uint32_t nItems = std::end(rxBuffer)-std::begin(rxBuffer);   // number of samples in the buffer
outFile.write((char *)&nItems, sizeof(nItems));              // record the count first
outFile.write((char *)&rxBuffer[0], sizeof(rxBuffer));       // then the raw sample bytes

For the moment I've used sizeof(rxBuffer) on the assumption that it's a real array. If it's actually a pointer or vector, you'll have to compute the correct size (what you want is the total number of bytes to be written).
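
If rxBuffer is a std::vector (which the rxBuffer.size() call in the question suggests), the size computation would look roughly like this instead of sizeof(rxBuffer):

    // Assuming rxBuffer is a std::vector<std::complex<float>>:
    outFile.write((char *)&rxBuffer[0],
                  rxBuffer.size() * sizeof(rxBuffer[0]));    // total bytes, not sizeof the vector object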

I'd also note that, as it stands right now, your code has an even more serious problem: since you haven't specified a separator between elements when writing the data, the values will be written with nothing to separate one item from the next. That means if you wrote two values of (for example) 1 and 0.2, what you'd read back would not be 1 and 0.2, but a single value of 10.2. Adding separators to your text output would add yet more overhead (figure around 15% more data) to a process that's already failing because it generates too much data.

Writing in binary format means each float will consume precisely 4 bytes, so delimiters are not necessary to read the data back in correctly.
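
For completeness, a sketch of reading such a file back in, assuming it was written with the count-then-data layout shown above:

    std::ifstream inFile(path, std::ios::in | std::ios::binary);

    uint32_t nItems = 0;
    inFile.read((char *)&nItems, sizeof(nItems));                     // item count first

    std::vector<std::complex<float>> samples(nItems);
    inFile.read((char *)&samples[0], nItems * sizeof(samples[0]));    // then the raw samples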

The next step after that would be to descend to a lower-level file I/O routine. Depending on the situation, this might or might not make much difference. On Windows, you can specify FILE_FLAG_NO_BUFFERING when you open a file with CreateFile. This means that reads and writes to that file will basically bypass the cache and go directly to the disk.

In your case, that's probably a win--at 10 MS/s, the data will have been evicted from the cache long before you'd ever reread it. In such a case, letting the data go into the cache gains you virtually nothing, but costs you CPU time to copy the data into the cache and then, somewhat later, copy it out to the disk. Worse, it's likely to pollute the cache with all this data, so the cache no longer holds other data that's much more likely to benefit from caching.
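
The question is on Ubuntu rather than Windows; the rough Linux analogue of FILE_FLAG_NO_BUFFERING is opening the file with O_DIRECT. The following is only a minimal sketch under the assumption of a 4096-byte block size--O_DIRECT requires the buffer address, transfer size, and file offset to all be multiples of the device's logical block size:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE          // exposes O_DIRECT in <fcntl.h> with glibc
    #endif
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main()
    {
        // O_DIRECT bypasses the page cache; buffer address, transfer size and
        // file offset must all be aligned to the logical block size.
        int fd = open("Stream/raw.dat",
                      O_WRONLY | O_CREAT | O_APPEND | O_DIRECT, 0644);
        if (fd < 0) { std::perror("open"); return 1; }

        const size_t blockSize  = 4096;       // assumed block size
        const size_t chunkBytes = 1 << 20;    // 1 MiB, a multiple of blockSize

        void *alignedBuf = nullptr;
        if (posix_memalign(&alignedBuf, blockSize, chunkBytes) != 0)
        {
            close(fd);
            return 1;
        }
        std::memset(alignedBuf, 0, chunkBytes);   // stand-in for real sample data

        if (write(fd, alignedBuf, chunkBytes) < 0)
            std::perror("write");

        std::free(alignedBuf);
        close(fd);
        return 0;
    }

Whether this actually beats ordinary buffered writes is something you'd have to measure on your hardware.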

Jerry Coffin, answered Oct 23 '22