I wonder why operating on Float64 values is faster than operating on Float16:
julia> rnd64 = rand(Float64, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @benchmark rnd64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.800 μs … 662.140 μs ┊ GC (min … max): 0.00% … 99.37%
Time (median): 2.180 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.457 μs ± 13.176 μs ┊ GC (mean ± σ): 12.34% ± 3.89%
▁██▄▂▂▆▆▄▂▁ ▂▆▄▁ ▂▂▂▁ ▂
████████████████▇▇▆▆▇▆▅▇██▆▆▅▅▆▄▄▁▁▃▃▁▁▄▁▃▄▁▃▁▄▃▁▁▆▇██████▇ █
1.8 μs Histogram: log(frequency) by time 10.6 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark rnd16.^2
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 5.117 μs … 587.133 μs ┊ GC (min … max): 0.00% … 98.61%
Time (median): 5.383 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.716 μs ± 9.987 μs ┊ GC (mean ± σ): 3.01% ± 1.71%
▃▅█▇▅▄▄▆▇▅▄▁ ▁ ▂
▄██████████████▇▆▇▆▆▇▆▇▅█▇████▇█▇▇▆▅▆▄▇▇▆█▇██▇█▇▇▇▆▇▇▆▆▆▆▄▄ █
5.12 μs Histogram: log(frequency) by time 7.48 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
Maybe you are asking why I expected the opposite: because Float16 values have less floating-point precision:
julia> rnd16[1]
Float16(0.627)
julia> rnd64[1]
0.4375452455597999
Shouldn't calculations with lower precision run faster? Then why would anyone use Float16? They might as well use Float128!
As you can see, the effect you are expecting is present for Float32:
julia> rnd64 = rand(Float64, 1000);
julia> rnd32 = rand(Float32, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @btime $rnd64.^2;
616.495 ns (1 allocation: 7.94 KiB)
julia> @btime $rnd32.^2;
330.769 ns (1 allocation: 4.06 KiB) # faster!!
julia> @btime $rnd16.^2;
2.067 μs (1 allocation: 2.06 KiB) # slower!!
Float64 and Float32 have hardware support on most platforms, but Float16 does not, and must therefore be implemented in software.
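A minimal sketch of what that software path amounts to (an illustration, not Base's exact implementation): on CPUs without native FP16 arithmetic, each Float16 operation widens its operands to Float32, computes there, and narrows the result back.

```julia
# Illustration of software-emulated Float16 arithmetic:
# widen to Float32, compute, narrow back.
a = Float16(0.1)
b = Float16(0.2)

emulated = Float16(Float32(a) * Float32(b))

# The round trip matches Julia's Float16 multiply exactly: the Float32
# product of two 11-bit significands is exact, so only one rounding
# (back to Float16) occurs.
@assert a * b == emulated
```

Those extra widen/narrow steps on every element are what the Float16 benchmark above is paying for.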
Note also that you should use variable interpolation ($) when micro-benchmarking. The difference is significant here, not least in terms of allocations:
julia> @btime $rnd32.^2;
336.187 ns (1 allocation: 4.06 KiB)
julia> @btime rnd32.^2;
930.000 ns (5 allocations: 4.14 KiB)
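The gap comes from `rnd32` being a non-constant global: without interpolation, each sample also pays for a type-unstable global lookup. One way to see the same effect without interpolation is to make the global `const` (variable names here are illustrative):

```julia
using BenchmarkTools

rnd32 = rand(Float32, 1000)         # non-constant global: type-unstable access
const crnd32 = rand(Float32, 1000)  # const global: the compiler knows its type

@btime rnd32 .^ 2    # includes the dynamic global lookup in every sample
@btime crnd32 .^ 2   # roughly comparable to the interpolated $rnd32 version
```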
The short answer is that you probably shouldn't use Float16 unless you are using a GPU or an Apple CPU because (as of 2022) other processors don't have hardware support for Float16.