Convert RGB to RGBA in C

Tags: c, image

I need to copy the contents of a byte array representing an image in RGB byte order into another RGBA (4 bytes per pixel) buffer. The alpha channel will get filled later. What would be the fastest way of achieving this?

Yannis asked Aug 15 '11 at 18:08



2 Answers

How tricky do you want it? You could set it up to copy a 4-byte word at a time, which might be a bit faster on some 32-bit systems:

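#include <string.h>

/* Copies count RGB pixels into an RGBA buffer, 4 bytes per pixel.
   For every pixel except the last, the alpha byte receives the next
   pixel's R value; it is expected to be overwritten later. */
void fast_unpack(char* rgba, const char* rgb, const int count) {
    if(count<=0)
        return;
    /* All pixels but the last: one 4-byte copy each. memcpy avoids the
       undefined behaviour of an unaligned uint32_t* cast and usually
       compiles to a single 32-bit load/store. */
    for(int i=count; --i; rgba+=4, rgb+=3) {
        memcpy(rgba, rgb, 4);
    }
    /* Last pixel: only 3 source bytes remain, so copy them one by one. */
    for(int j=0; j<3; ++j) {
        rgba[j] = rgb[j];
    }
}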

The extra case at the end deals with the fact that the rgb array is one byte short: a 4-byte read for the last pixel would run past the end of rgb, so its three bytes are copied individually. (For every other pixel, the fourth byte written is the next pixel's red value, which is harmless because the alpha channel gets filled later.) You could also make it a bit faster using aligned moves and SSE instructions, working in multiples of 4 pixels at a time. If you're feeling really ambitious, you can try even more horribly obfuscated things like prefetching a cache line into the FP registers, for example, then blitting it across to the other image all at once. Of course the mileage you get out of these optimizations is going to be highly dependent on the specific system configuration you are targeting, and I would be really skeptical that there is much benefit at all to doing any of this instead of the simple thing.
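As a rough illustration of the SSE idea, here is a minimal sketch assuming an x86 compiler with SSSE3 enabled (for example `-mssse3` with GCC or Clang). It uses the SSSE3 byte shuffle `_mm_shuffle_epi8` (`pshufb`) to spread 12 RGB bytes (4 pixels) across 16 RGBA bytes per step. The function name `sse_unpack` is hypothetical. It uses unaligned loads and stores because the source offset advances by 12 bytes per step, so it cannot stay 16-byte aligned, and it writes 0 into each alpha byte as a placeholder. Without SSSE3 enabled, only the scalar loop is compiled.

```c
#include <stddef.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */
#endif

/* Hypothetical helper: expands count RGB pixels into RGBA, 4 pixels
   (12 source bytes) per vector step. Alpha bytes are set to 0 here;
   the caller is assumed to fill them in later. */
void sse_unpack(unsigned char *rgba, const unsigned char *rgb, size_t count) {
    size_t i = 0;
#if defined(__SSSE3__)
    /* Output byte k takes source byte mask[k]; -1 (high bit set) writes 0. */
    const __m128i mask = _mm_setr_epi8(0, 1, 2, -1, 3, 4, 5, -1,
                                       6, 7, 8, -1, 9, 10, 11, -1);
    /* Each step loads 16 source bytes but uses only 12, so only run it
       while 16 bytes remain: 3*i + 16 <= 3*count, i.e. i + 6 <= count. */
    for (; i + 6 <= count; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(rgb + 3 * i));
        _mm_storeu_si128((__m128i *)(rgba + 4 * i), _mm_shuffle_epi8(px, mask));
    }
#endif
    /* Scalar tail (or the whole image when SSSE3 is not enabled). */
    for (; i < count; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];
        rgba[4 * i + 3] = 0;
    }
}
```

The scalar loop doubles as a reference implementation, so it is easy to check the vector path against it on the same input.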

And my simple experiments confirm that this is indeed a little bit faster, at least on my x86 machine. Here is a benchmark:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

void fast_unpack(char* rgba, const char* rgb, const int count) {
    if(count<=0)
        return;
    /* one 4-byte copy per pixel; memcpy avoids unaligned-access UB */
    for(int i=count; --i; rgba+=4, rgb+=3) {
        memcpy(rgba, rgb, 4);
    }
    /* last pixel: only 3 source bytes remain */
    for(int j=0; j<3; ++j) {
        rgba[j] = rgb[j];
    }
}

void simple_unpack(char* rgba, const char* rgb, const int count) {
    /* byte-by-byte copy of R, G, B; alpha is left untouched */
    for(int i=0; i<count; ++i) {
        for(int j=0; j<3; ++j) {
            rgba[j] = rgb[j];
        }
        rgba += 4;
        rgb  += 3;
    }
}

int main(void) {
    const int count = 512*512;   /* number of pixels */
    const int N = 10000;

    char* src = (char*)malloc(count * 3);
    char* dst = (char*)malloc(count * 4);
    if(src == NULL || dst == NULL)
        return 1;
    memset(src, 0x7f, count * 3);   /* give the source defined contents */

    clock_t c0, c1;
    double t;
    printf("Image size = %d bytes\n", count);
    printf("Number of iterations = %d\n", N);

    printf("Testing simple unpack....");
    c0 = clock();
    for(int i=0; i<N; ++i) {
        simple_unpack(dst, src, count);
    }
    c1 = clock();
    printf("Done\n");
    t = (double)(c1 - c0) / (double)CLOCKS_PER_SEC;
    printf("Elapsed time: %lf\nAverage time: %lf\n", t, t/N);

    printf("Testing tricky unpack....");
    c0 = clock();
    for(int i=0; i<N; ++i) {
        fast_unpack(dst, src, count);
    }
    c1 = clock();
    printf("Done\n");
    t = (double)(c1 - c0) / (double)CLOCKS_PER_SEC;
    printf("Elapsed time: %lf\nAverage time: %lf\n", t, t/N);

    free(src);
    free(dst);
    return 0;
}

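And here are the results (compiled with g++ -O3). Note that the "Image size" line actually reports the pixel count (512 × 512 = 262144); the RGB source buffer itself is 3 × 262144 = 786432 bytes.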

Image size = 262144 bytes
Number of iterations = 10000
Testing simple unpack....Done
Elapsed time: 3.830000
Average time: 0.000383
Testing tricky unpack....Done
Elapsed time: 2.390000
Average time: 0.000239

So, on a good day, the tricky version takes about 38% less time (2.39 s vs. 3.83 s), i.e. roughly a 1.6× speedup.

Mikola answered Oct 01 '22 at 21:10



The fastest way would be to use a library that implements the conversion for you rather than writing it yourself. Which platform[s] are you targeting?

If you insist on writing it yourself for some reason, write a simple and correct version first. Use that. If the performance is inadequate, then you can think about optimizing it. In general, this sort of conversion is best done using vector permutes, but the exact optimal sequence varies depending on the target architecture.
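To illustrate how the best permute sequence depends on the architecture, here is a minimal sketch for ARM NEON (an assumed target; the answer does not name one). NEON's structure load/store intrinsics `vld3q_u8` and `vst4q_u8` de-interleave 16 RGB pixels into three registers and re-interleave them with a fourth (alpha) register, so the load and store perform the permute themselves. The name `neon_unpack` is hypothetical. Alpha is set to 0 as a placeholder, and the scalar loop handles the tail (or the whole image on non-NEON targets).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: expands count RGB pixels into RGBA, 16 pixels
   per NEON step. Alpha is written as 0 and assumed to be filled later. */
void neon_unpack(uint8_t *rgba, const uint8_t *rgb, size_t count) {
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x16_t alpha = vdupq_n_u8(0);
    /* vld3q_u8 reads exactly 48 bytes (16 RGB pixels) and splits them
       into R, G and B registers; vst4q_u8 writes 64 interleaved bytes. */
    for (; i + 16 <= count; i += 16) {
        uint8x16x3_t in = vld3q_u8(rgb + 3 * i);
        uint8x16x4_t out;
        out.val[0] = in.val[0];
        out.val[1] = in.val[1];
        out.val[2] = in.val[2];
        out.val[3] = alpha;
        vst4q_u8(rgba + 4 * i, out);
    }
#endif
    /* Scalar tail (or the whole image when NEON is not available). */
    for (; i < count; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];
        rgba[4 * i + 3] = 0;
    }
}
```

In the spirit of the advice above, the scalar loop is the simple, correct version; the vector loop is only worth keeping if measurements show the scalar one is inadequate.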

Stephen Canon answered Oct 01 '22 at 20:10
