How is POPCNT implemented in hardware?

According to http://www.agner.org/optimize/instruction_tables.pdf, the POPCNT instruction (which returns the number of set bits in a 32-bit or 64-bit register) has a throughput of 1 instruction per clock cycle on modern Intel and AMD processors. This is much faster than any software implementation, which needs multiple instructions (see "How to count the number of set bits in a 32-bit integer?").
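For comparison, here is a minimal sketch of the kind of software implementation the question has in mind: the widely used SWAR ("SIMD within a register") bit trick, written in Python for readability. The function name `popcount32_swar` is made up for illustration. Each line corresponds to several machine instructions (shift, mask, add or multiply), which is why a software count cannot match a single POPCNT.

```python
def popcount32_swar(x: int) -> int:
    """Count the set bits of a 32-bit value with the classic SWAR trick."""
    x &= 0xFFFFFFFF                                  # treat the input as a 32-bit word
    x = x - ((x >> 1) & 0x55555555)                  # each 2-bit field holds its own bit count
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # each 4-bit field holds its bit count
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # each byte holds its bit count
    # The multiply sums the four byte counts into the top byte of the low 32 bits.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


if __name__ == "__main__":
    for value in (0x0, 0x26, 0x60, 0xFFFFFFFF):
        print(hex(value), popcount32_swar(value))
```

The masks are needed because Python integers are unbounded; in C the same steps would operate on a `uint32_t` directly.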

How is POPCNT implemented so efficiently in hardware?

Siqi Lin asked Mar 02 '15 04:03



1 Answer

There's a patent covering a combined popcnt and bit scan forward / reverse datapath:

US8214414 B2 - Combined set bit count and detector logic

Abstract

A merged datapath for PopCount and BitScan is described. A hardware circuit includes a compressor tree utilized for a PopCount function, which is reused by a BitScan function (e.g., bit scan forward (BSF) or bit scan reverse (BSR)). Selector logic enables the compressor tree to operate on an input word for the PopCount or BitScan operation, based on a microprocessor instruction. The input word is encoded if a BitScan operation is selected. The compressor tree receives the input word, operates on the bits as though all bits have same level of significance (e.g., for an N-bit input word, the input word is treated as N one-bit inputs). The result of the compressor tree circuit is a binary value representing a number related to the operation performed (the number of set bits for PopCount, or the bit position of the first set bit encountered by scanning the input word).
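To make the compressor tree idea concrete, here is a rough software model in Python. It is only a sketch of the general technique (reducing N one-bit inputs with full and half adders), not the specific circuit claimed in the patent; the names `full_adder` and `popcount_compressor_tree` are made up for illustration. In hardware all adders in a layer operate in parallel as combinational logic, so the depth of the tree grows roughly with log N; the loop below just evaluates the same adders one at a time.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: three bits of equal weight in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def popcount_compressor_tree(x: int, width: int = 32) -> int:
    """Count set bits by treating the word as `width` one-bit inputs."""
    # columns[w] holds all bits that currently have weight 2**w.
    columns = {0: [(x >> i) & 1 for i in range(width)]}
    result = 0
    w = 0
    while w in columns:
        bits = columns[w]
        # Compress this column until a single bit of weight 2**w remains.
        while len(bits) > 1:
            if len(bits) >= 3:
                s, carry = full_adder(bits.pop(0), bits.pop(0), bits.pop(0))
            else:
                a, b = bits.pop(0), bits.pop(0)
                s, carry = a ^ b, a & b  # half adder for the last pair
            bits.append(s)                                # sum keeps weight 2**w
            columns.setdefault(w + 1, []).append(carry)   # carry has weight 2**(w+1)
        if bits:
            result |= bits[0] << w
        w += 1
    return result


if __name__ == "__main__":
    for value in (0x0, 0x26, 0xFFFFFFFF):
        print(hex(value), popcount_compressor_tree(value))
```

Every adder preserves the weighted sum of its inputs (a + b + c = sum + 2 * carry), so the bits left at the end are the binary representation of the number of set bits.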

rcgldr answered Sep 30 '22 02:09