I have a problem which concern with large number of small integers (actually decimal digits). What is the space efficient way to store such a data?
Is it good idea to use std::bitset<4>
to store one decimal digit?
The significant digits to the left of the decimal and the significant digits to the right of the decimal are stored in separate groups of bytes. At the maximum precision specification, DECIMAL(32,s) data types can store s-1 decimal digits to the right of the decimal point, if s is an odd number.
One decimal place to the left of the decimal point is the ones place. One decimal place to the right of the decimal place is the tenths place.
It takes on average 3.2 bits to represent a single decimal digit - 0 through 7 can be represented in 3 bits, while 8 and 9 require 4.
Round up a number to 1 decimal place in Python #Call the math. ceil() method, passing it the number multiplied by 10 . Divide the result by 10 . The result of the calculation will be the float number rounded down to 1 decimal.
Depending on how space-efficient it has to be and how efficient the retrieval should be, I see two possibilities:
Since a vector of std::bitset<4>
is (as far as I know) stored in an unpacked setting (each bitset is stored in a memory word, either 32 or 64 bit), you should probably at least use a packed representation like using a 64 bit word to store 16 digits:
store (if the digit was not stored before):
block |= digit << 4 * index
load:
digit = (block >> 4 * index) & 0xF
reset:
block &= ~(0xF << 4 * index);
A vector of these 64 bit words (uint64_t) together with some access methods should be easy to implement.
If you want a very compact way, then no, using bitset<4>
is a bad idea, because bitset<4>
will use at least one byte, instead of 4 bits.
I'd recommend using std::vector<std::uint32_t>
You can store multiple digits in an uint32_t. Two usual ways:
If you use uint64_t
instead of uint32_t
, then you can store 16 digits with the first way (same 4bit/digit efficiency), and 19 digits with the second way: 3.36bit/digit efficieny, which is pretty close to the theoretical minimum: ~3.3219bit/digit
Is it good idea to use std::bitset<4> to store one decimal digit?
Yes, in principle that's a good idea. It's a well known optimization and called BCD encoding.
(actually decimal digits). What is the space efficient way to store such a data?
You can compact the decimal digit representation by using one nibble of the occupied byte. Also math might be applied optimized, vs. ASCII representation of digits or such.
The std::bitset<4>
won't serve that well for compacting the data.std::bitset<4>
will still occupy a full byte.
An alternative data structure I could think of is a bitfield
// Maybe #pragma pack(push(1))
struct TwoBCDDecimalDigits {
uint8_t digit1 : 4;
uint8_t digit2 : 4;
};
// Maybe #pragma pack(pop)
There is even a library available, to convert this format to a normalized numerical format supported at your target CPU architecture:
Another way I could think of is to write your own class:
class BCDEncodedNumber {
enum class Sign_t : char {
plus = '+' ,
minus = '-'
};
std::vector<uint8_t> doubleDigitsArray;
public:
BCDEncodedNumber() = default;
BCDEncodedNumber(int num) {
AddDigits(num); // Implements math operation + against the
// current BCD representation stored in
// doubleDigitsArray.
}
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With