memcpy of overlapping buffers [duplicate]

Tags:

c++

c

memcpy

I ran into strange behavior when using the Aztec linear system solver library. Using valgrind, I found out that this library calls memcpy on overlapping buffers. The specification says that the behavior of memcpy on overlapping buffers is undefined.

It turns out that on many machines memcpy behaves as if it were implemented with a simple for loop, so you can safely copy from a higher source address to a lower destination address:

for (int i = 0; i < len; i++)
    dest[i] = source[i];

But on our large cluster, memcpy on overlapping buffers behaves differently, which leads to problems.

Now I wonder whether the overlapping memcpy in the library is normal or is caused by another bug in my code. Since the library is widely used, I would expect the memcpy issue to have been discovered earlier. On the other hand, it is still possible that the vast majority of memcpy implementations behave like the for loop, so nobody has ever run into this problem.

Can anyone share their experience with overlapping memcpy on various machines?
Which part of my computer system actually provides memcpy?

I'd like to point out that this question is about practical experience with various implementations, not about what the specification says.

asked Sep 02 '14 18:09

Michael

2 Answers

memcpy() doesn't support overlapping memory. This allows for optimizations that won't work if the buffers do overlap.

There's not much to really look into, however, because C provides an alternative that does support overlapping memory: memmove(). Its usage is identical to memcpy(). You should use it if the regions might overlap, as it accounts for that possibility.
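As a minimal sketch of the kind of call where the regions overlap, consider shifting the tail of a buffer down to its start. The helper name `drop_prefix` is hypothetical and only for illustration. Source and destination share one array, so `memmove()` is the appropriate call here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: discard the first `count` bytes of `buf` by moving
 * the remaining `len - count` bytes to the front of the same buffer.
 * The source range starts inside the destination's array, so the two
 * regions can overlap; memmove() is specified to handle that case. */
size_t drop_prefix(char *buf, size_t len, size_t count)
{
    assert(count <= len);
    memmove(buf, buf + count, len - count);
    return len - count; /* number of valid bytes now at the start of buf */
}
```

Called on a 10-byte buffer holding "0123456789" with `count` set to 3, this leaves "3456789" in the first 7 bytes and returns 7. Replacing `memmove()` with `memcpy()` in this helper would make the call undefined whenever the two ranges overlap.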

answered Oct 21 '22 00:10

FatalError

I've done some research on this in the past. On Linux, up until fairly recently, the implementation of memcpy() worked in a way that was similar enough to memmove() that overlapping memory wasn't an issue, and in my experience other Unix systems were the same. This doesn't change the fact that this is undefined behavior according to the standard, and you are just lucky that on some platforms it sometimes works -- memmove() is the standard-supported right answer.

However, in 2010, the glibc maintainers rolled out a new, optimized memcpy() for some Intel CPU types. The new version is faster, but it no longer behaves like memmove() [1]. (I seem to recall that the new code path is only triggered for memory segments larger than 80 bytes.) Interestingly, this broke the Linux version of Adobe's Flash player [2], as well as several other open-source packages, back in 2010 when Fedora Linux became the first distribution to adopt the changed memcpy() in glibc.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=12518
[2] https://bugzilla.redhat.com/show_bug.cgi?id=638477
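To illustrate why the copy order matters, here is a small sketch with two hand-written byte loops. It is not glibc's code: `copy_forward` and `copy_backward` are hypothetical stand-ins for two orders an implementation might choose. Both are plain loops with well-defined behavior, so the difference can be observed without calling memcpy() on overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Copy from the lowest address upward, like the for loop in the question. */
void copy_forward(unsigned char *dest, const unsigned char *src, size_t len)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dest[i] = src[i];
}

/* Copy from the highest address downward (an assumed alternative order). */
void copy_backward(unsigned char *dest, const unsigned char *src, size_t len)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = len; i > 0; i--)
        dest[i - 1] = src[i - 1];
}
```

Take an 8-byte buffer `buf` holding "ABCDEFGH". `copy_forward(buf, buf + 2, 6)` yields "CDEFGHGH", the result memmove() would give. `copy_backward(buf, buf + 2, 6)` on a fresh copy yields "GHGHGHGH", because the later source bytes are overwritten before they are read. Code that relies on memcpy() behaving like the forward loop breaks as soon as the library picks a different order.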

answered Oct 21 '22 01:10

JohnH
