
How to use VC++ intrinsic functions w/o run-time library

I'm involved in one of those challenges where you try to produce the smallest possible binary, so I'm building my program without the C or C++ run-time libraries (RTL). I don't link to the DLL version or the static version. I don't even #include the header files. I have this working fine.

Some RTL functions, like memset(), can be useful, so I tried adding my own implementation. It works fine in Debug builds (even for those places where the compiler generates an implicit call to memset()). But in Release builds, I get an error saying that I cannot define an intrinsic function. You see, in Release builds, intrinsic functions are enabled, and memset() is an intrinsic.
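As an illustration of an implicit call, here is a minimal sketch (not from the original post) of code that never names memset() but that many compilers lower to a memset() call or an inlined equivalent. The names Packet and sum_of_zeroed_packet are hypothetical, and whether a real call is emitted depends on the compiler and optimization settings. The includes exist only so the sketch can be checked on a hosted toolchain; a no-RTL build would drop them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregate, similar in shape to a small plain-data struct.
struct Packet {
    int words[10];
    int tail;
};

// Value-initializing an aggregate zeroes every member. The source never
// mentions memset(), but the compiler is free to implement the zeroing by
// calling memset() or by inlining equivalent stores.
int sum_of_zeroed_packet() {
    Packet p = {};
    assert(p.tail == 0);  // hosted-build sanity check only
    int sum = p.tail;
    for (std::size_t i = 0; i < 10; ++i) {
        sum += p.words[i];
    }
    return sum;
}
```

If the compiler does emit a call here and no memset() is available at link time, the result is the same unresolved-external error as for an explicit call.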

I would love to use the intrinsic for memset() in my release builds, since it's probably inlined and smaller and faster than my implementation. But I seem to be in a catch-22. If I don't define memset(), the linker complains that it's undefined. If I do define it, the compiler complains that I cannot define an intrinsic function.

Does anyone know the right combination of definition, declaration, #pragma, and compiler and linker flags to get an intrinsic function without pulling in RTL overhead?

Visual Studio 2008, x86, Windows XP+.

To make the problem a little more concrete:

    extern "C" void * __cdecl memset(void *, int, size_t);

    #ifdef IMPLEMENT_MEMSET
    void * __cdecl memset(void *pTarget, int value, size_t cbTarget) {
        char *p = reinterpret_cast<char *>(pTarget);
        while (cbTarget > 0) {
            *p++ = static_cast<char>(value);
            --cbTarget;
        }
        return pTarget;
    }
    #endif

    struct MyStruct {
        int foo[10];
        int bar;
    };

    int main() {
        MyStruct blah;
        memset(&blah, 0, sizeof(blah));
        return blah.bar;
    }

And I build like this:

    cl /c /W4 /WX /GL /Ob2 /Oi /Oy /Gs- /GF /Gy intrinsic.cpp
    link /SUBSYSTEM:CONSOLE /LTCG /DEBUG /NODEFAULTLIB /ENTRY:main intrinsic.obj

If I compile with my implementation of memset(), I get a compiler error:

error C2169: 'memset' : intrinsic function, cannot be defined 

If I compile this without my implementation of memset(), I get a linker error:

error LNK2001: unresolved external symbol _memset 
Adrian McCarthy asked May 30 '10 14:05


2 Answers

I think I finally found a solution:

First, in a header file, declare memset() with a pragma, like so:

    extern "C" void * __cdecl memset(void *, int, size_t);
    #pragma intrinsic(memset)

That allows your code to call memset(). In most cases, the compiler will inline the intrinsic version.

Second, in a separate implementation file, provide an implementation. The trick to preventing the compiler from complaining about re-defining an intrinsic function is to use another pragma first. Like this:

    #pragma function(memset)
    void * __cdecl memset(void *pTarget, int value, size_t cbTarget) {
        unsigned char *p = static_cast<unsigned char *>(pTarget);
        while (cbTarget-- > 0) {
            *p++ = static_cast<unsigned char>(value);
        }
        return pTarget;
    }

This provides an implementation for those cases where the optimizer decides not to use the intrinsic version.
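To check the fallback's behaviour outside the no-RTL build, the same loop can be exercised on a hosted toolchain under a stand-in name. This is only a sketch: tiny_memset and clear_and_read are hypothetical names, and the includes are there for the check, not for the real build. In the actual project the function is named memset and is preceded by #pragma function(memset) as described above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the fallback definition. The name tiny_memset is hypothetical;
// it avoids clashing with the library memset() when a run-time library is
// linked in.
extern "C" void *tiny_memset(void *pTarget, int value, std::size_t cbTarget) {
    unsigned char *p = static_cast<unsigned char *>(pTarget);
    while (cbTarget-- > 0) {
        // Only the low 8 bits of value are stored, as with standard memset().
        *p++ = static_cast<unsigned char>(value);
    }
    return pTarget;
}

struct MyStruct {
    int foo[10];
    int bar;
};

// Mirrors main() from the question: clear the struct, then read a member.
int clear_and_read() {
    MyStruct blah;
    void *result = tiny_memset(&blah, 0, sizeof(blah));
    assert(result == &blah);  // the function returns its first argument
    return blah.bar;
}
```

Calling clear_and_read() should yield 0, and a zero-length call should leave the buffer untouched.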

The outstanding drawback is that you have to disable whole-program optimization (/GL and /LTCG). I'm not sure why. If someone finds a way to do this without disabling global optimization, please chime in.

Adrian McCarthy answered Sep 28 '22 03:09


  1. I'm pretty sure there's a compiler flag that tells VC++ not to use intrinsics (/Oi enables them, /Oi- disables them).

  2. The source to the runtime library is installed with the compiler. You do have the choice of excerpting functions you want/need, though often you'll have to modify them extensively (because they include features and/or dependencies you don't want/need).

  3. There are other open source runtime libraries available as well, which might need less customization.

  4. If you're really serious about this, you'll need to know (and maybe use) assembly language.

Edited to add:

I got your new test code to compile and link. These are the relevant settings:

    Enable Intrinsic Functions: No
    Whole Program Optimization: No

It's that last one that suppresses "compiler helpers" like the built-in memset.

Edited to add:

Now that it's decoupled, you can copy the asm code from memset.asm into your program--it has one global reference, but you can remove that. It's big enough so that it's not inlined, though if you remove all the tricks it uses to gain speed you might be able to make it small enough for that.

I took your above example and replaced the memset() with this:

    void * __cdecl memset(void *pTarget, char value, size_t cbTarget) {
        _asm {
        push ecx
        push edi

        mov al, value
        mov ecx, cbTarget
        mov edi, pTarget
        rep stosb

        pop edi
        pop ecx
        }
        return pTarget;
    }

It works, but the library's version is much faster.
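A possible alternative to the _asm block, not taken from either answer, is the __stosb compiler intrinsic declared in <intrin.h>, which emits the same rep stosb sequence on x86 and x64. The sketch below assumes the toolchain's <intrin.h> provides __stosb. The name fill_bytes and the HAVE_STOSB macro are hypothetical, and the loop is a fallback for other compilers. The assert is there only for hosted testing and would be dropped in a no-RTL build.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // compiler intrinsics only; declares __stosb
#define HAVE_STOSB 1
#else
#define HAVE_STOSB 0
#endif

// Hypothetical helper that fills cbTarget bytes with the low 8 bits of value.
void *fill_bytes(void *pTarget, int value, std::size_t cbTarget) {
    assert(pTarget != 0 || cbTarget == 0);  // hosted-build sanity check only
#if HAVE_STOSB
    // Expands to REP STOSB without needing an _asm block.
    __stosb(static_cast<unsigned char *>(pTarget),
            static_cast<unsigned char>(value), cbTarget);
#else
    // Portable fallback for other compilers and targets.
    unsigned char *p = static_cast<unsigned char *>(pTarget);
    for (std::size_t i = 0; i < cbTarget; ++i) {
        p[i] = static_cast<unsigned char>(value);
    }
#endif
    return pTarget;
}
```

Like the _asm version, this stores one byte at a time, so it should not be expected to match the speed of the library's tuned memset.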

egrunin answered Sep 28 '22 03:09