I have many small functions I would like to inline, for example to test flags for some condition:
const COND = UInt(1<<BITS_FOR_COND)
function is_cond(flags::UInt)
    return flags & COND != 0
end
I could also make a macro:
macro IS_COND(flags::UInt)
    return :(flags & COND != 0)
end
My motivation is many similar macro functions in the C code I am working with:
#define IS_COND(flags) ((flags) & COND)
I repeatedly timed the function, the macro, the function defined with @inline, and the expression by itself, but none is consistently faster than the others across many runs (a more stable timing setup is sketched after the list below). The generated code for the function calls in 1) and 3) is much longer than for the expression in 4), but I don't know how to compare 2), since @code_llvm etc. don't work on other macros.
1) for j=1:10 @time for i::UInt=1:10000 is_cond(i); end end
2) for j=1:10 @time for i::UInt=1:10000 @IS_COND(i); end end
3) for j=1:10 @time for i::UInt=1:10000 is_cond_inlined(i); end end
4) for j=1:10 @time for i::UInt=1:10000 i & COND != 0; end end
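(Aside: a more stable way to compare operations this small is the BenchmarkTools package. The following is only a minimal sketch; it assumes BenchmarkTools is installed and uses a sample value x.)

using BenchmarkTools

x = UInt(1)
@btime is_cond($x)            # interpolate with $ so the global variable is not part of the measurement
@btime is_cond_inlined($x)    # the @inline-annotated version from 3)
@btime $x & COND != 0         # the bare expression from 4)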
Questions: What is the purpose of @inline? I see from the sparse documentation that it appends the symbol :inline to the :meta expression, but what does that do, exactly? Is there any reason to prefer a function or a macro for this kind of task?
My understanding is that a C function-like macro just substitutes the literal text of the macro at compile time, so the resulting code has no jumps and is therefore more efficient than a regular function call. (Safety is another issue, but let's assume the programmers know what they're doing.) A Julia macro has intermediate steps like parsing its arguments, so it's not obvious to me whether 2) should be faster than 1). Ignoring for the moment that in this case the difference in performance is negligible, which technique results in the most efficient code?
Speed versus size: in C, the main benefit of using macros is faster execution time. During preprocessing, a macro is expanded (replaced by its definition) inline each time it is used, whereas a function definition occurs only once regardless of how many times it is called. Macros are therefore typically faster than functions, since they avoid the overhead of an actual function call.
Macros change existing source code or generate entirely new code. They are not some kind of more powerful function that unlocks secret abilities of Julia; they are just a way to automatically write code that you could have written out by hand anyway.
If the two syntaxes result in exactly the same generated code, should you prefer one over the other? YES. Functions are vastly superior to macros in situations like this.
- It is much easier to get the is_cond function definition right than the @IS_COND definition (you don't want to put a type annotation on the argument, you need to interpolate flags into the returned expression, and you need to use esc to get the hygiene correct).
- The @ sigil is a good warning for "something beyond normal Julia syntax is occurring here." If it's behaving just like a function, though, might as well make it one.
- Functions are first-class values, so they can be passed to higher-order functions like map.
- Julia will inline a small function like this on its own, without any @inline annotation; it just does it. You can use @inline to give the compiler an extra nudge that a bigger function is especially important to inline, but Julia is often good at figuring it out on its own (like here). A sketch of what the annotation actually adds follows this list.
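Here is that sketch: what @inline actually attaches to a definition (assuming a Julia version that provides @macroexpand):

# Expand the annotation and inspect the result: the body of the returned
# definition contains Expr(:meta, :inline), the hint the compiler reads.
@macroexpand @inline is_cond_inlined(flags::UInt) = flags & COND != 0

That :inline entry in the :meta expression is what the documentation mentioned in the question is referring to.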
So, now, do they result in the same generated code? One of the most powerful things about Julia is your ability to ask it for its "intermediate work." First, some setup:
julia> const COND = UInt(1<<7)
       is_cond(flags) = flags & COND != 0
       macro IS_COND(flags)
           return :($(esc(flags)) & COND != 0) # careful!
       end
Now we can start looking at what happens when you use either is_cond
or @IS_COND
. In actual code, you'll be using these definitions within other functions, so let's create some test functions:
julia> test_func(x) = is_cond(x)
       test_macro(x) = @IS_COND(x)
Now we can start moving down the chain to see if there's a difference. The first step is "lowering" — this simply converts the syntax to a limited subset to make life easier for the compiler. You can see that at this stage, the macro gets expanded but the function call still remains:
julia> @code_lowered test_func(UInt(1))
LambdaInfo template for test_func(x) at REPL[2]:1
:(begin
nothing
return (Main.is_cond)(x)
end)
julia> @code_lowered test_macro(UInt(1))
LambdaInfo template for test_macro(x) at REPL[2]:2
:(begin
nothing
return x & Main.COND != 0
end)
The next step, though, is inference and optimization. It's here that function inlining takes effect:
julia> @code_typed test_func(UInt(1))
LambdaInfo for test_func(::UInt64)
:(begin
return (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.and_int)((Base.sle_int)(0,0)::Bool,((Base.box)(UInt64,(Base.and_int)(x,Main.COND)) === (Base.box)(UInt64,0))::Bool))))
end::Bool)
julia> @code_typed test_macro(UInt(1))
LambdaInfo for test_macro(::UInt64)
:(begin
return (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.and_int)((Base.sle_int)(0,0)::Bool,((Base.box)(UInt64,(Base.and_int)(x,Main.COND)) === (Base.box)(UInt64,0))::Bool))))
end::Bool)
Look at that! This step in the internal representation is a little messier, but you can see that the function got inlined (even without @inline!) and now the code looks exactly identical between the two.
We can go farther and ask for the LLVM… and indeed the two are exactly identical:
julia> @code_llvm test_func(UInt(1)) | julia> @code_llvm test_macro(UInt(1))
|
define i8 @julia_test_func_70754(i64) #0 { | define i8 @julia_test_macro_70752(i64) #0 {
top: | top:
%1 = lshr i64 %0, 7 | %1 = lshr i64 %0, 7
%2 = xor i64 %1, 1 | %2 = xor i64 %1, 1
%3 = trunc i64 %2 to i8 | %3 = trunc i64 %2 to i8
%4 = and i8 %3, 1 | %4 = and i8 %3, 1
%5 = xor i8 %4, 1 | %5 = xor i8 %4, 1
ret i8 %5 | ret i8 %5
} | }
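You can go one level further with @code_native to see the actual machine instructions; since the LLVM IR above is identical, the assembly for the two test functions should match as well:

# Same comparison one level lower: identical LLVM IR compiles to identical assembly.
@code_native test_func(UInt(1))
@code_native test_macro(UInt(1))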