I am interested in the performance properties of the following intrinsics/instructions:
_mm256_andnot_si256 / vpandn
_mm256_and_si256 / vpand
_mm256_cmpgt_epi32 / vpcmpgtd
Unfortunately, the Intel Intrinsics Guide does not list latency and throughput numbers for these intrinsics/instructions. Where can I find this information?
Three sources of latency and throughput numbers are InstlatX64, Agner Fog's instruction tables, and uops.info.
InstlatX64 lists many instructions in different forms (memory and/or register operands, different operand widths, etc.), but it does not report the number of μops per execution port. For performance optimization, the μops per execution port are highly relevant alongside the latency and throughput numbers. Agner Fog's instruction tables and uops.info provide this information.
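To see why latency and throughput are separate numbers, it can help to time vpand yourself. The following is a rough sketch, not a calibrated benchmark: it assumes GCC or Clang on x86-64 (the target attribute and the inline-asm barrier are compiler extensions), and the names KEEP, and_dependent, and_independent and ns_per_op are made up for this illustration. The first kernel forms a single dependency chain, so it is bound by latency; the second runs four independent chains, so it is closer to being bound by throughput.

```c
#include <immintrin.h>
#include <stdint.h>
#include <time.h>

/* Compiler barrier (GCC/Clang extension): makes the optimizer treat v as
   modified, so the loops below are not folded away. Emits no instruction. */
#define KEEP(v) __asm__ volatile("" : "+x"(v))

/* Latency-bound: every vpand needs the result of the previous one. */
__attribute__((target("avx2")))
int32_t and_dependent(int iters) {
    const __m256i mask = _mm256_set1_epi32(0x0F0F0F0F);
    __m256i a = _mm256_set1_epi32(-1);
    for (int i = 0; i < iters; ++i) {
        a = _mm256_and_si256(a, mask);
        KEEP(a);
    }
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(a));
}

/* Closer to throughput-bound: four chains that do not depend on each other. */
__attribute__((target("avx2")))
int32_t and_independent(int iters) {
    const __m256i mask = _mm256_set1_epi32(0x0F0F0F0F);
    __m256i a = _mm256_set1_epi32(-1), b = a, c = a, d = a;
    for (int i = 0; i < iters; ++i) {
        a = _mm256_and_si256(a, mask);
        b = _mm256_and_si256(b, mask);
        c = _mm256_and_si256(c, mask);
        d = _mm256_and_si256(d, mask);
        KEEP(a); KEEP(b); KEEP(c); KEEP(d);
    }
    /* Combine the chains so none of them is dead code. */
    a = _mm256_and_si256(_mm256_and_si256(a, b), _mm256_and_si256(c, d));
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(a));
}

/* CPU time per vpand in nanoseconds; ops_per_iter is 1 or 4 above.
   Loop overhead is included, so treat the result as an upper bound. */
double ns_per_op(int32_t (*kernel)(int), int iters, int ops_per_iter) {
    clock_t t0 = clock();
    volatile int32_t sink = kernel(iters);
    clock_t t1 = clock();
    (void)sink;
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC
               / ((double)iters * ops_per_iter);
}
```

After checking __builtin_cpu_supports("avx2"), you could compare ns_per_op(and_dependent, 100000000, 1) with ns_per_op(and_independent, 100000000, 4). On a core that can execute more than one vpand per cycle, the second figure should come out smaller. The tables above remain the better source, since a loop like this says nothing about which ports the μops use.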