Show me the code: cpuid and popcount

 

So, modern CPUs have built-in instructions for a lot of things these days, which can give you blisteringly fast versions of some otherwise rather awkward and slow algorithms.

In my case, I wanted to access the CPU’s popcount instruction (which counts how many set bits there are in a value). The trick is that the CPU either supports it or it doesn’t, and different systems have different ways of testing whether it does and of accessing it if so. Of course, if the CPU doesn’t support it, you still have to do it in software.

So, I thought I’d write something extensible, and I thought I’d share.

So far, I’ve only produced this for WIN32/X64 targets, but it should be relatively trivial to add a few lines to cover other platforms and systems.

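#include <cstddef>     // size_t
#include <type_traits> // std::make_unsigned
#ifdef _WIN32
#include <intrin.h>    // __popcnt, __popcnt64
#endif

typedef enum { UNTESTED, SUPPORTED, UNSUPPORTED } hardware_support_t;
extern hardware_support_t popcount_support;

namespace cpufeatures // Avoid any potential clashes, not that they're likely to occur.
{
    // CPUID function 1 reports popcount support in ECX (index 2 of the register array), bit 23.
    const int POPCOUNT_BYTE = 2, POPCOUNT_BIT = 23;
}

#ifdef _WIN32
extern bool win32_cpu_support(int cmd, int bcheck, int bitnum); // Use magic Windows pixies to check CPU hardware support flags
#endif

// A simple (and probably not very fast) implementation of popcount() in case there's no CPU support.
template<typename T>
size_t software_popcount(const T& val)
{
    // Work on the unsigned equivalent, so negative values are counted properly
    // and the right shift never drags the sign bit along with it.
    auto value = static_cast<typename std::make_unsigned<T>::type>(val);
    size_t count = 0;
    while (value != 0) { // until all bits are zero
        if ((value & 1) == 1) // check lower bit
            count++;
        value >>= 1; // shift bits, removing lower bit
    }
    return count;
}

// Count the number of set bits in an integer type
template<typename T>
size_t popcount(const T& val)
{
#ifdef _WIN32
    // If _WIN32 is defined, we can test the CPU for hardware support
    // If we find that it is supported, we can use the hardware to do this for us in very little time.
    if (popcount_support == UNTESTED)
    {
        bool b = win32_cpu_support(0x00000001, cpufeatures::POPCOUNT_BYTE, cpufeatures::POPCOUNT_BIT);
        popcount_support = b ? SUPPORTED : UNSUPPORTED;
    }
    if (popcount_support == SUPPORTED)
    {
        // Convert to unsigned first, so a negative value isn't sign-extended when it gets widened.
        const auto value = static_cast<typename std::make_unsigned<T>::type>(val);
#ifdef _M_X64
        return static_cast<size_t>(__popcnt64(value)); // The 64 bit version
#else
        // Or the 32 bit version if we're not building for x64; anything wider is counted in two halves
        return __popcnt(static_cast<unsigned int>(value))
             + __popcnt(static_cast<unsigned int>(static_cast<unsigned long long>(value) >> 32));
#endif
    }
#endif // _WIN32
    return software_popcount(val);
}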


And here are the last pieces:

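#include <array>
#ifdef _WIN32
#include <intrin.h> // __cpuid
#endif

hardware_support_t popcount_support = UNTESTED;

#ifdef _WIN32 // If we're not using _WIN32 (which includes x64) then we have to go about this in a very different way
bool win32_cpu_support(int cmd, int bcheck, int bitnum)
{
    std::array<int, 4> cpui; // Receives EAX, EBX, ECX, EDX, in that order

    __cpuid(cpui.data(), cmd);
    // Test the requested bit as unsigned, so that bit 31 works too
    return (static_cast<unsigned int>(cpui[bcheck]) & (1u << bitnum)) != 0;
}

#endif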
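
For the "other platforms" I mentioned, here's a rough sketch of how the same check might look with GCC or Clang on x86. I haven't folded this into the code above, so treat it as a starting point only. The names gcc_cpu_support() and gcc_popcount() are made up for illustration; __get_cpuid() comes from the compilers' <cpuid.h> header, and __builtin_popcountll() is a compiler builtin.

```cpp
// Sketch only: a GCC/Clang counterpart to the Windows check.
// gcc_cpu_support() and gcc_popcount() are hypothetical names used for illustration.
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> // GCC/Clang header providing __get_cpuid()
#endif

// Returns true if bit `bitnum` is set in register `reg` (0=EAX, 1=EBX, 2=ECX, 3=EDX)
// of the requested CPUID function. Always returns false on non-x86 targets.
inline bool gcc_cpu_support(unsigned int cmd, int reg, int bitnum)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int regs[4] = { 0, 0, 0, 0 };
    if (!__get_cpuid(cmd, &regs[0], &regs[1], &regs[2], &regs[3]))
        return false; // this CPU doesn't implement the requested function
    return (regs[reg] & (1u << bitnum)) != 0;
#else
    (void)cmd; (void)reg; (void)bitnum;
    return false;
#endif
}

// Counts the set bits in a 64-bit value. The builtin is always available with
// GCC/Clang; the compiler only emits the hardware instruction when the build
// target allows it (e.g. -mpopcnt) and uses its own software routine otherwise.
inline std::size_t gcc_popcount(std::uint64_t value)
{
    return static_cast<std::size_t>(__builtin_popcountll(value));
}
```

Calling gcc_cpu_support(1, 2, 23) asks the same question as the Windows version: function 1, ECX, bit 23. Note that the builtin gives the right answer whether or not the CPU has the instruction, so on these compilers the runtime check matters mostly if you want to choose between code paths yourself.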

Once all of this passes through your compiler’s optimiser, it’s actually very lightweight, except for the crappy version of software_popcount(). There are dozens of better versions out there, and you should totally snag one. The only real virtue of this one is that it is easy to read, and I don’t expect it to ever get called.
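
As one example of a better fallback, here's a minimal sketch of the well-known "clear the lowest set bit" loop, usually credited to Brian Kernighan. It loops once per set bit rather than once per bit position. The name kernighan_popcount() is my own for illustration, and it assumes an integer type other than bool.

```cpp
// Sketch only: a fallback that iterates once per set bit.
// kernighan_popcount() is a hypothetical name, not part of the code above.
#include <cstddef>
#include <type_traits>

template<typename T>
std::size_t kernighan_popcount(T val)
{
    static_assert(std::is_integral<T>::value, "integer types only");
    // Use the unsigned equivalent so negative inputs behave predictably.
    auto value = static_cast<typename std::make_unsigned<T>::type>(val);
    std::size_t count = 0;
    while (value != 0) {
        value &= (value - 1); // clears the lowest set bit
        ++count;
    }
    return count;
}
```

For a value with only a few bits set this finishes in a few iterations, whereas the simple shifting loop keeps going until it has shifted out the highest set bit.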

If this is something that’s useful to you, have fun with it.