Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Store one decimal digit

Tags:

c++

digits

I have a problem which concern with large number of small integers (actually decimal digits). What is the space efficient way to store such a data?

Is it good idea to use std::bitset<4> to store one decimal digit?

like image 353
dann Avatar asked Jul 08 '17 11:07

dann


People also ask

How do you store decimal numbers?

The significant digits to the left of the decimal and the significant digits to the right of the decimal are stored in separate groups of bytes. At the maximum precision specification, DECIMAL(32,s) data types can store s-1 decimal digits to the right of the decimal point, if s is an odd number.

What is a 1 place decimal number?

One decimal place to the left of the decimal point is the ones place. One decimal place to the right of the decimal place is the tenths place.

How many bits are required to store a single decimal digit?

It takes on average 3.2 bits to represent a single decimal digit - 0 through 7 can be represented in 3 bits, while 8 and 9 require 4.

How do I print to 1 decimal in Python?

Round up a number to 1 decimal place in Python #Call the math. ceil() method, passing it the number multiplied by 10 . Divide the result by 10 . The result of the calculation will be the float number rounded down to 1 decimal.


3 Answers

Depending on how space-efficient it has to be and how efficient the retrieval should be, I see two possibilities:

  • Since a vector of std::bitset<4> is (as far as I know) stored in an unpacked setting (each bitset is stored in a memory word, either 32 or 64 bit), you should probably at least use a packed representation like using a 64 bit word to store 16 digits:

    store (if the digit was not stored before):
    block |= digit << 4 * index
    load:
    digit = (block >> 4 * index) & 0xF
    reset:
    block &= ~(0xF << 4 * index);
    

A vector of these 64 bit words (uint64_t) together with some access methods should be easy to implement.

  • If your space requirements are even tighter, you could e.g. try packing 3 digits in 10 bits (at most 1024) using divisions and modulo, which would be a lot less time-efficient. Also the alignment with 64 bit words is much more difficult, so I would only recommend this if you need to get the final 16% improvement, at most you can get something like 3.3 bits per digit.
like image 176
Tobias Ribizel Avatar answered Nov 04 '22 00:11

Tobias Ribizel


If you want a very compact way, then no, using bitset<4> is a bad idea, because bitset<4> will use at least one byte, instead of 4 bits.

I'd recommend using std::vector<std::uint32_t>

You can store multiple digits in an uint32_t. Two usual ways:

  1. Use for 4 bits for each digit, and use bit operations. This way you can store 8 digits in 4 bytes. Here, set/get operations are pretty fast. Efficiency: 4bit/digit
  2. Use base 10 encoding. uint32_t max value is 256^4-1, which is capable to store 9 digits in 4 bytes. Efficiency: 3.55bit/digit. Here, if you need to set/get all the 9 digits, then it is almost as fast than the previous version (as division by 10 will be optimized by a good compiler, no actual division will be done by the CPU). If you need random access, then set/get will be slower than the previous version (you can speed it up with libdivide).

If you use uint64_t instead of uint32_t, then you can store 16 digits with the first way (same 4bit/digit efficiency), and 19 digits with the second way: 3.36bit/digit efficieny, which is pretty close to the theoretical minimum: ~3.3219bit/digit

like image 36
geza Avatar answered Nov 04 '22 00:11

geza


Is it good idea to use std::bitset<4> to store one decimal digit?

Yes, in principle that's a good idea. It's a well known optimization and called BCD encoding.

(actually decimal digits). What is the space efficient way to store such a data?

You can compact the decimal digit representation by using one nibble of the occupied byte. Also math might be applied optimized, vs. ASCII representation of digits or such.

The std::bitset<4> won't serve that well for compacting the data.
std::bitset<4> will still occupy a full byte.

An alternative data structure I could think of is a bitfield

// Maybe #pragma pack(push(1))
struct TwoBCDDecimalDigits {
    uint8_t digit1 : 4;
    uint8_t digit2 : 4;
};
// Maybe #pragma pack(pop)

There is even a library available, to convert this format to a normalized numerical format supported at your target CPU architecture:

  • XBCD_Math

Another way I could think of is to write your own class:

class BCDEncodedNumber {
    enum class Sign_t : char {
        plus = '+' ,
        minus = '-'
    };
    std::vector<uint8_t> doubleDigitsArray;
public:
    BCDEncodedNumber() = default;
    BCDEncodedNumber(int num) {
        AddDigits(num); // Implements math operation + against the
                        // current BCD representation stored in 
                        // doubleDigitsArray.
    }    
};
like image 33
user0042 Avatar answered Nov 04 '22 00:11

user0042