In my application, 20% of CPU time is spent reading bits (in skip()) through my bit reader. Does anyone have an idea how to make the following code faster? At any given time I never need more than 20 valid bits (which is why, in some situations, I can use fast_skip()).
Bits are read in big-endian order, which is why the byte swap is needed.
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;intrin.h&gt; // _byteswap_uint64 (MSVC; target is x64)

class bit_reader
{
    std::uint32_t* m_data;  // read cursor, advanced 32 bits at a time
    std::size_t    m_pos;   // bit offset into the current 64-bit block
    std::uint64_t  m_block; // byte-swapped 64-bit window over m_data

public:
    bit_reader(void* data)
        : m_data(reinterpret_cast<std::uint32_t*>(data))
        , m_pos(0)
        , m_block(_byteswap_uint64(*reinterpret_cast<std::uint64_t*>(data)))
    {
    }

    std::uint64_t value(std::size_t n_bits = 64)
    {
        assert(m_pos + n_bits < 64);
        return (m_block << m_pos) >> (64 - n_bits);
    }

    void skip(std::size_t n_bits) // 20% of CPU time
    {
        assert(m_pos + n_bits < 42);
        m_pos += n_bits;
        if (m_pos > 31)
        {
            m_block = _byteswap_uint64(reinterpret_cast<std::uint64_t*>(++m_data)[0]);
            m_pos -= 32;
        }
    }

    void fast_skip(std::size_t n_bits)
    {
        assert(m_pos + n_bits < 42);
        m_pos += n_bits;
    }
};
Target hardware is x64.
I see from an earlier comment that you are unpacking Huffman/arithmetic-coded streams in JPEG.
skip() and value() are simple enough to be inlined, and there is a good chance the compiler will keep the shift register and buffer pointer in registers the whole time. Marking all the pointers here and in the caller with the restrict qualifier might help further, by telling the compiler that you won't be writing the results of Huffman decoding into the bit buffer, allowing additional optimisation.
The reason skip() is a hot path is that you are calling it so often. You are consuming an entire symbol at once rather than one bit at a time, aren't you? There are some clever tricks you can do by counting the leading 0s or 1s in a symbol and doing a table lookup.
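On the restrict point: standard C++ has no restrict keyword, but MSVC, GCC and Clang all accept __restrict. A minimal sketch of the idea, assuming a hypothetical decode loop (emit_symbols, its arguments, and the fixed 8-bit symbol width are illustrative, not from the question):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoder output loop. __restrict promises the compiler that
// `out` never aliases `bits`, so no store through `out` can invalidate a
// cached read of the bit buffer; the compiler is then free to keep the
// current block in a register across iterations.
void emit_symbols(const std::uint64_t* __restrict bits,
                  std::uint8_t* __restrict out,
                  std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<std::uint8_t>(bits[i] >> 56); // top 8 bits of each word
}
```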
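One shape the leading-bit trick can take (the helper below and the table layout in the comment are hypothetical, not JPEG's actual tables): peek a fixed window with value(), count the run of leading one-bits, and use that count to index per-run tables, so the whole symbol is consumed with a single skip():

```cpp
#include <cstdint>

// Count the leading one-bits in a 16-bit window. This is a portable
// fallback loop; on x64 the same count can be obtained in one or two
// instructions via _BitScanReverse / __builtin_clz on the inverted window.
inline int leading_ones16(std::uint16_t window)
{
    int n = 0;
    while (n < 16 && (window & (0x8000u >> n)))
        ++n;
    return n;
}

// Hypothetical use: tables indexed by the run length k give the number of
// bits to consume and the symbol to emit, e.g.
//   int k = leading_ones16(static_cast<std::uint16_t>(reader.value(16)));
//   reader.skip(code_length[k]);
```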