I have a function that I would like to provide an assembly implementation for
on amd64 architecture. For the sake of discussion let's just suppose it's an
Add function, but it's actually more complicated than this. I have the
assembly version working but my question concerns getting the godoc to display
correctly. I have a feeling this is currenty impossible, but I wanted to seek
advice.
Some more details:
BMI2) therefore can only be used
following a CPUID capability check.The implementation is structured like this gist. At a high level:
amd64 case) the function is defined by delegating to
addGeneric.amd64 case the function is actually a variable, initially set to
addGeneric but replaced by addAsm in the init function if a cpuid
check passes.This approach works. However the godoc output is crappy because in the
amd64 case the function is actually a variable. Note godoc appears to be
picking up the same build tags as the machine it's running on. I'm not sure
what godoc.org would do.
Alternatives considered:
Add function delegates to addImpl. Then we pull some similar trick
to replace addImpl in the amd64 case. The problem with this is (in my
experiments) Go doesn't seem to be able to inline the call, and the assembly
is now wrapped in two function calls. Since the assembly is so small already
this has a noticable impact on performance.amd64 case we define a plain function Add that has the useAsm
check inside it, and calls one of addGeneric and addAsm depending on the
result. This would have an even worse impact on performance.So I guess the questions are:
See math.Sqrt for an example of how to do this.
To handle the cpuid check, set a package variable in init() and conditionally jump based on that variable in the assembly implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With