Custom size array

Simple Problem Statement:

Is it possible to have an array of a custom-size data type (3/5/6/7 bytes) in C or Cython?

Background:

I have run into a major memory inefficiency while implementing a complex algorithm. The algorithm requires storing an enormous amount of data, all arranged in one contiguous block of memory (like an array). The data is simply a very long list of [usually] very large numbers, and the type of the numbers is constant for a given list (they behave much like a regular C array, where every element has the same type).

Problem:

Sometimes it is not efficient to store each number in a standard data size. The usual data types are char, short, int, long, etc. But if I use an int array to store numbers that only need 3 bytes each, I lose 1 byte per number, a 25% overhead: storing 100 million 3-byte values in 4-byte ints wastes roughly 100 MB. When you are storing millions of numbers, the effect is memory-breaking. There is unfortunately no other way to implement the solution to the algorithm, and I believe a rough implementation of a custom data size is the only way to do this.

What I tried:

I have tried to use char arrays for this, but converting between groups of 0-255 octet values and the larger numbers they represent is inefficient in most cases. There is a mathematical method of packing chars into a larger number, and of dividing a larger number back out into its individual chars. Here is an extremely inefficient version of my attempt at this, written in Cython:

def to_bytes(long long number, int length):
    # split `number` into `length` octets, least significant first
    cdef:
        list chars = []
        long long m
        long long d

    for _ in range(length):
        m = number % 256
        d = number // 256
        chars.append(m)
        number = d

    # reverse so the result is big-endian
    cdef bytearray binary = bytearray(chars)
    binary = binary[::-1]
    return binary

def from_bytes(string):
    # reassemble a big-endian octet string into an integer
    # (the original str(...).encode('hex') idiom was Python 2 only)
    cdef long long d = int.from_bytes(bytes(string), 'big')
    return d

Keep in mind I don't exactly want improvements to this algorithm, but a fundamental way of declaring an array of a certain data type, so I don't have to do this transformation.
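For reference, C does allow a genuinely 3-byte array element via a struct of three octets: on common ABIs sizeof such a struct is 3, so an array of them is a contiguous block of exactly 3 bytes per element. A minimal sketch (uint24 and its get/set helpers are illustrative names, not a standard API; the standard permits padding, hence the static assertion, and little-endian layout is assumed):

#include <stdint.h>
#include <stdio.h>

/* a 3-octet element; an array of these has no per-element waste */
typedef struct { uint8_t b[3]; } uint24;
_Static_assert(sizeof(uint24) == 3, "compiler added padding");

static uint32_t uint24_get(const uint24 *v)
    {
    /* little-endian decode */
    return (uint32_t)v->b[0] | ((uint32_t)v->b[1] << 8) | ((uint32_t)v->b[2] << 16);
    }

static void uint24_set(uint24 *v, uint32_t x)
    {
    v->b[0] = x & 0xff;
    v->b[1] = (x >> 8) & 0xff;
    v->b[2] = (x >> 16) & 0xff;
    }

int main(void)
    {
    uint24 arr[4];                  /* 12 bytes total, no padding */
    uint24_set(&arr[2], 0x123456);
    printf("%zu %x\n", sizeof arr, uint24_get(&arr[2]));
    return 0;
    }

The trade-off is that every read and write goes through the shift-and-or helpers, which is essentially the per-access cost discussed in the answer below.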

asked Jul 06 '14 by Nick Pandolfi


1 Answer

I think the important question is whether you need to access all the data at the same time.

If you only need to access one chunk of data at a time

If you only need to access one array at a time, then one Pythonic possibility is to use NumPy arrays with data type uint8 and as many columns as the element width requires. When you need to operate on the data, you expand the compressed data (here 3-octet numbers into uint32):

import numpy as np

# in this example `compressed` is an Nx3 array of octets (`uint8`)
# pad each 3-octet row with a zero octet so every row is 4 bytes wide
expanded = np.empty((compressed.shape[0], 4), dtype='uint8')
expanded[:, :3] = compressed
expanded[:, 3] = 0
# reinterpret each 4-octet row as one uint32 (assumes a little-endian machine)
expanded = expanded.view('uint32').reshape(-1)

Then the operations are performed on expanded, which is a 1-d vector of N uint32 values.

After we are done, the data can be saved back:

# recompress
compressed[:] = expanded.view('uint8').reshape(-1,4)[:,:3]

The time taken for each direction is (on my machine, in Python) approximately 8 ns per element for the example above. Using Cython may not give much of a performance advantage here, because almost all of the time is spent copying the data between buffers somewhere in the dark depths of NumPy.

This is a high one-time cost, but if you are planning to access each element at least once, it is probably less expensive to pay the one-time cost than a similar cost for each operation.


Of course, the same approach can be taken in C:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/resource.h>

#define NUMITEMS 10000000

int main(void)
    {
    uint32_t *expanded;
    uint8_t *compressed, *exp_as_octets;
    struct rusage ru0, ru1;
    uint8_t *ep, *cp, *end;
    double time_delta;

    // allocate some "compressed" data
    // (contents left uninitialized; we only measure the copying)
    compressed = (uint8_t *)malloc(NUMITEMS * 3);

    getrusage(RUSAGE_SELF, &ru0);

    // allocate the expanded buffer and copy the data into it
    exp_as_octets = (uint8_t *)malloc(NUMITEMS * 4);
    end = exp_as_octets + NUMITEMS * 4;
    ep = exp_as_octets;
    cp = compressed;
    while (ep < end)
        {
        // copy three octets, zero the fourth
        *ep++ = *cp++;
        *ep++ = *cp++;
        *ep++ = *cp++;
        *ep++ = 0;
        }
    expanded = (uint32_t *)exp_as_octets;
    (void)expanded;  // the uint32 view would be used from here on

    getrusage(RUSAGE_SELF, &ru1);
    printf("Uncompress\n");
    time_delta = ru1.ru_utime.tv_sec + ru1.ru_utime.tv_usec * 1e-6
               - ru0.ru_utime.tv_sec - ru0.ru_utime.tv_usec * 1e-6;
    printf("User: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);
    time_delta = ru1.ru_stime.tv_sec + ru1.ru_stime.tv_usec * 1e-6
               - ru0.ru_stime.tv_sec - ru0.ru_stime.tv_usec * 1e-6;
    printf("System: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);

    getrusage(RUSAGE_SELF, &ru0);
    // compress back: keep three octets, skip the zero octet
    ep = exp_as_octets;
    cp = compressed;
    while (ep < end)
        {
        *cp++ = *ep++;
        *cp++ = *ep++;
        *cp++ = *ep++;
        ep++;
        }
    getrusage(RUSAGE_SELF, &ru1);
    printf("Compress\n");
    time_delta = ru1.ru_utime.tv_sec + ru1.ru_utime.tv_usec * 1e-6
               - ru0.ru_utime.tv_sec - ru0.ru_utime.tv_usec * 1e-6;
    printf("User: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);
    time_delta = ru1.ru_stime.tv_sec + ru1.ru_stime.tv_usec * 1e-6
               - ru0.ru_stime.tv_sec - ru0.ru_stime.tv_usec * 1e-6;
    printf("System: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);

    free(exp_as_octets);
    free(compressed);
    return 0;
    }

This reports:

Uncompress
 User: 0.022650 seconds, 2.27 nanoseconds per element
 System: 0.016171 seconds, 1.62 nanoseconds per element
Compress
 User: 0.011698 seconds, 1.17 nanoseconds per element
 System: 0.000018 seconds, 0.00 nanoseconds per element

The code was compiled with gcc -Ofast and is probably reasonably close to the optimal speed. The system time is spent in malloc. To my eye this looks pretty fast, as we are doing memory reads at 2-3 GB/s. (Which also means that while making the code multi-threaded would be easy, there may not be much speed benefit.)

If you want the best performance, you'll need to code the compression/decompression routines for each data width separately. (I do not promise the C code above is the absolute fastest on any machine; I did not look at the machine code.)
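As a starting point, here is a width-generic sketch of such an expansion routine; the per-width specializations just mentioned would hard-code k and unroll the inner loop (expand_k is an illustrative name, and little-endian layout is assumed):

#include <stddef.h>
#include <stdint.h>

/* expand n k-octet little-endian values (k <= 4) into uint32_t;
   a specialized version would fix k at compile time and unroll */
static void expand_k(const uint8_t *src, uint32_t *dst, size_t n, int k)
    {
    for (size_t i = 0; i < n; i++)
        {
        uint32_t v = 0;
        for (int j = 0; j < k; j++)
            v |= (uint32_t)src[i * k + j] << (8 * j);
        dst[i] = v;
        }
    }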

If you need random access to separate values

If you, instead, need to access only one value here and another there, Python will not offer any reasonably fast method, as the array lookup overhead is huge.

In this case I suggest you create the C routines to fetch and put the data back. See technosaurus's answer. There are a lot of tricks, but the alignment problems cannot be avoided.
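Such fetch and put routines could look like the following sketch (get24 and put24 are illustrative names; a flat little-endian octet buffer is assumed, and going octet by octet sidesteps the alignment problem):

#include <stddef.h>
#include <stdint.h>

/* read element n of a packed 3-octets-per-element buffer */
static inline uint32_t get24(const uint8_t *buf, size_t n)
    {
    const uint8_t *p = buf + 3 * n;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    }

/* write element n of the same buffer */
static inline void put24(uint8_t *buf, size_t n, uint32_t x)
    {
    uint8_t *p = buf + 3 * n;
    p[0] = x & 0xff;
    p[1] = (x >> 8) & 0xff;
    p[2] = (x >> 16) & 0xff;
    }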

One useful trick when reading the odd-sized array might be (here reading 3 octets from an octet array compressed into a uint32_t value):

value = *(uint32_t *)&compressed[3 * n] & 0x00ffffff;

Then someone else will take care of the possible misalignment, and the mask discards the one octet of garbage at the end. Unfortunately this cannot be used when writing the values. And, again, this may or may not be faster or slower than any of the other alternatives.

answered by DrV