Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading bit-aligned data

Tags:

c#

.net-3.5

I have been reading the SWF format available on Adobe's site and it mentions that in order to save space, variable bits are used to store integers or floats (page 17 in the pdf)

I have always worked with byte-aligned data so have not given much thought to files that are bit-aligned, or have variable alignment where the information is stored in each byte.

So for example, you may have a struct containing four 13-bit integers stored sequentially ( rather than storing them as four 16-bit integers).

The first 13bits is the first integer, the next 13 bits is the second integer, and so on. It pads the last byte appropriate to make the struct byte-aligned with the rest of the file, so 52-bits would be padded to 56-bits, requiring 7 bytes to store those four integers as opposed to 8 bytes.

  • How do I approach this kind of problem?
  • How can I work with a stream of bytes at the bit-level?
  • Is there something I can use to help make it easier to work with this data?

I imagine the solution boils down to using bit-operations on byte arrays.

An example solution for parsing the four 13-bit integers would be nice as well to demonstrate the use of your suggested method.

like image 750
MxLDevs Avatar asked Aug 16 '12 00:08

MxLDevs


People also ask

What is aligned data in bit?

For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary. Data alignment is the aligning of elements according to their natural alignment.

What does it mean to be 32-bit aligned?

Alignment refers to a piece of data's location in memory. A variable is naturally aligned if it exists at a memory address that is a multiple of its size. For example, a 32-bit type is naturally aligned if it is located in memory at an address that is a multiple of four (that is, its lowest two bits are zero).

What does it mean to be 8 byte aligned?

An object that is "8 bytes aligned" is stored at a memory address that is a multiple of 8. Many CPUs will only load some data types from aligned locations; on other CPUs such access is just faster.

What does it mean to be 16 byte aligned?

Data that's aligned on a 16 byte boundary will have a memory address that's an even number — strictly speaking, a multiple of two. Each byte is 8 bits, so to align on a 16 byte boundary, you need to align to each set of two bytes.


2 Answers

There are two ways of dealing with this that I know of. The first is to manually do it - using bit-wise operators, division, modulus etc. on byte arrays [or integer/ulong etc if you were bored]. IsBitSet Example

The other way is a BitArray - which handles most of this for you :)


It would be nice to add an example of how exactly BitArray handles getting bits 13..25 as an int, as that would be the primary operation. At a first glance I see only a loop.

Fine... I wrote a quick & dirty test proof of concept:

var rnd = new Random();
//var data = Enumerable.Range(0, 10).ToArray();
var data = Enumerable.Range(0, 10).Select(x => rnd.Next(1 << 13)).ToArray();

foreach (var n in data) Console.WriteLine(n);

Console.WriteLine(new string('-', 13));

var bits = new BitArray(data.Length * 13);

for (int i = 0; i < data.Length; i++)
{
    var intBits = new BitArray(new[] { data[i] });
    for (int b = 12; b > -1; b--)
    {
        bits[i * 13 + b] = intBits[b];
        Console.Write(intBits[b] ? 1 : 0);
    }
    Console.WriteLine();
}
Console.WriteLine(new string('-', 13));

for (int i = 0; i < bits.Length / 13; i++)
{
    int number = 0;
    for (int b = 12; b > -1; b--)
        if (bits[i * 13 + b])
            number += 1 << b;

    Console.WriteLine(number);
}
Console.ReadLine();

Which outputs:

910
3934
7326
7990
7712
1178
6380
3460
5113
7489
-------------
0001110001110
0111101011110
1110010011110
1111100110110
1111000100000
0010010011010
1100011101100
0110110000100
1001111111001
1110101000001
-------------
910
3934
7326
7990
7712
1178
6380
3460
5113
7489

The bit array doesn't do much other than simplify accessing - it's still quite manual. I expect you'd write your own classes to simply this and make it neat and reusable - for example here's another quick concept:

//Improved to take sign into account.
//Sign is in addition to bits allocated for storage in this version.
//Stored as {sign}{bits}
//E.g.  -5, stored in 3 bits signed is:
//       1 101
//E.g.   5, stored in 3 bits [with sign turned on]
//       0 101
//E.g.   5, stored in 3 bits no sign
//         101  
//This may differ from your exiting format - e.g. you may use two's compliments.
static void Main(string[] args)
{
    int bitsPerInt = 13;

    //Create your data
    var rnd = new Random();
    //var data = Enumerable.Range(-5, 10).ToArray();
    var data = Enumerable.Range(0, 10).Select(x => rnd.Next(-(1 << bitsPerInt), 1 << bitsPerInt)).ToArray();

    var bits = new BitSerlializer();

    //Add length header
    bits.AddInt(data.Length, 8, false);
    foreach (var n in data)
    {
        bits.AddInt(n, bitsPerInt);
        Console.WriteLine(n);
    }

    //Serialize to bytes for network transfer etc.
    var bytes = bits.ToBytes();

    Console.WriteLine(new string('-', 10));
    foreach (var b in bytes) Console.WriteLine(Convert.ToString(b, 2).PadLeft(8, '0'));
    Console.WriteLine(new string('-', 10));

    //Deserialize
    bits = new BitSerlializer(bytes);
    //Get Length Header
    var count = bits.ReadInt(8, false);
    for (int i = 0; i < count; i++)
        Console.WriteLine(bits.ReadInt(bitsPerInt));

    Console.ReadLine();
}

public class BitSerlializer
{
    List<byte> bytes;
    int Position { get; set; }

    public BitSerlializer(byte[] initialData = null)
    {
        if (initialData == null)
            bytes = new List<byte>();
        else
            bytes = new List<byte>(initialData);
    }

    public byte[] ToBytes() { return bytes.ToArray(); }

    public void Addbit(bool val)
    {
        if (Position % 8 == 0) bytes.Add(0);
        if (val) bytes[Position / 8] += (byte)(128 >> (Position % 8));
        Position++;
    }

    public void AddInt(int i, int length, bool isSigned = true)
    {
        if (isSigned) Addbit(i < 0);
        if (i < 0) i = -i;

        for (int pos = --length; pos >= 0; pos--)
        {
            var val = (i & (1 << pos)) != 0;
            Addbit(val);
        }
    }

    public bool ReadBit()
    {
        var val = (bytes[Position / 8] & (128 >> (Position % 8))) != 0;
        ++Position;
        return val;
    }

    public int ReadInt(int length, bool isSigned = true)
    {
        var val = 0;
        var sign = isSigned && ReadBit() ? -1 : 1;

        for (int pos = --length; pos >= 0; pos--)
            if (ReadBit())
                val += 1 << pos;

        return val * sign;
    }
}
like image 161
NPSF3000 Avatar answered Sep 20 '22 03:09

NPSF3000


On the other hand, byte-array-based approach could go like this:

    int extend(uint raw, int bits)
    {
        int sh = 32 - bits;
        int x = (int)raw << sh; // puts your sign bit in the highest bit.
        return x >> sh;  // since x is signed this is an arithmatic signed shift
    }

    int read(byte[] data, int pos, int bits, bool signed)
    {
        int fbi = pos / 8; // first byte index
        int lbi = (pos + bits - 1) / 8; // last byte index
        int cnt = lbi - fbi + 1; // bytes spanned
        if (cnt > 3 || lbi >= data.Length) { throw new ArgumentException(); }

        uint raw = (uint)(
            (data[fbi] << (24 + pos % 8)) + 
            (cnt < 2 ? 0 : data[fbi + 1] << (16 + pos % 8)) + 
            (cnt < 3 ? 0 : data[fbi + 2] << (8 + pos % 8))
            ) >> (32 - bits);
        return signed ? extend(raw, bits) : (int)raw;
    }

Test for this:

    byte[] test = { 0x55, 0xAA, 0x10 };

    string s = "";
    s += read(test, 0, 8, false) + "\r\n";
    s += read(test, 0, 8, true) + "\r\n";
    s += read(test, 8, 8, false) + "\r\n";
    s += read(test, 8, 8, true) + "\r\n";
    s += read(test, 4, 8, false) + "\r\n";
    s += read(test, 7, 9, true) + "\r\n";
    s += read(test, 7, 10, true) + "\r\n";
    s += read(test, 7, 11, true) + "\r\n";
    s += read(test, 7, 12, true) + "\r\n";
    s += read(test, 7, 13, true) + "\r\n";
    s += read(test, 7, 14, true) + "\r\n";
    s += read(test, 7, 15, true) + "\r\n";
    s += read(test, 7, 16, true) + "\r\n";
    s += read(test, 7, 17, true) + "\r\n";
    s += read(test, 18, 2, true) + "\r\n";
    s += read(test, 18, 3, true) + "\r\n";
    s += read(test, 23, 1, true) + "\r\n";
    s += read(test, 23, 2, true) + "\r\n";

The test builds the string like the following:

    85
    85
    170
    -86
    90
    -86
    -172
    -344
    -688
    -1375
    -2750
    -5500
    -11000
    -22000
    1
    2
    0

then throws an exception on the last line.

like image 28
Eugene Ryabtsev Avatar answered Sep 19 '22 03:09

Eugene Ryabtsev