I have a TFT display which can draw 16 bit colors, format RGB 565. I would like to add some transparency to what I display on it.
Let's say I have a black background (0x0000) and I want to draw a white foreground (0xFFFF) that is half transparent (opacity is controlled by another byte), so it will appear grey. How do I calculate that 16 bit grey color in the same RGB 565 format so I can send it to my TFT and it will display correctly (probably with some loss but I don't care)?
I need a function such as:
unsigned short calcColor_RGB565(unsigned short background_RGB565, unsigned short foreground_RGB565, unsigned char opacity)
calcColor_RGB565(0x0000, 0xFFFF, 128)
would result in 0x8410 (or 0x1084; the byte order isn't important because I send two separate bytes to the TFT, so I would just swap them if needed).
Thanks to anyone who can help me. I have tried a few things, but I can't get anywhere close to the correct result :/.
C-like pseudo code appreciated but I prefer explanations on how to do it.
Edit: forgot to say, I would like it to be as fast as possible because it's for an old microprocessor, so if it's faster to calculate the 2 bytes separately (and so I also don't have to separate them later) then I'm highly interested in such optimisations.
Edit 27 September: 5 days later, still not solved. I can convert from rgb565 to rgb8888, do the alpha blending and then convert back to rgb565, but that is too slow, there must be a better way!
My (untested) solution: I split the foreground and background colors to (red + blue) and (green) components and multiply them with a 6bit alpha value. Enjoy! (Only if it works :)
#include <stdint.h>
// rrrrrggggggbbbbb
#define MASK_RB 63519 // 0b1111100000011111
#define MASK_G 2016 // 0b0000011111100000
#define MASK_MUL_RB 4065216 // 0b1111100000011111000000
#define MASK_MUL_G 129024 // 0b0000011111100000000000
#define MAX_ALPHA 64 // 6bits+1 with rounding
uint16_t alphablend( uint16_t fg, uint16_t bg, uint8_t alpha ){
  // alpha for foreground multiplication
  // convert from 8bit to (6bit+1) with rounding
  // will be in [0..64] inclusive
  alpha = ( alpha + 2 ) >> 2;
  // "beta" for background multiplication; (6bit+1);
  // will be in [0..64] inclusive
  uint8_t beta = MAX_ALPHA - alpha;
  // alpha + beta == 64, so each weighted channel sum stays within
  // (channel max) * 64 and cannot spill into the neighbouring field
  // the uint32_t casts keep the products from overflowing a 16-bit int
  return (uint16_t)((
      ( ( alpha * (uint32_t)( fg & MASK_RB )
        + beta * (uint32_t)( bg & MASK_RB )
        ) & MASK_MUL_RB )
    |
      ( ( alpha * (uint32_t)( fg & MASK_G )
        + beta * (uint32_t)( bg & MASK_G )
        ) & MASK_MUL_G )
    ) >> 6 );
}
/*
  result masks of multiplications
  uppercase: usable bits of multiplications
  RRRRRrrrrrrBBBBBbbbbbb // 5-5 bits of red+blue
  1111100000011111       // from MASK_RB * 1
  1111100000011111000000 //   to MASK_RB * MAX_ALPHA // 22 bits!
  -----GGGGGGgggggg----- // 6 bits of green
  0000011111100000       // from MASK_G * 1
  0000011111100000000000 //   to MASK_G * MAX_ALPHA
*/
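Since the solution above is described as untested, here is a small sanity check, written as a sketch rather than a definitive test. It restates the same mask-and-multiply idea with `<stdint.h>` types and hex masks; the names `alphablend565` and `endpoints_ok` are made up for this example. It only checks that a fully opaque alpha returns the foreground unchanged and a fully transparent alpha returns the background unchanged.

```c
#include <stdint.h>

/* Same mask-and-multiply blend as above, restated for the check.
   alphablend565 is a hypothetical name used only in this sketch. */
uint16_t alphablend565(uint16_t fg, uint16_t bg, uint8_t alpha)
{
    uint32_t a = (alpha + 2u) >> 2;   /* 8-bit alpha -> 0..64 with rounding */
    uint32_t b = 64u - a;             /* background weight, a + b == 64 */
    /* red and blue share one multiply, green gets its own */
    uint32_t rb = (a * (fg & 0xF81Fu) + b * (bg & 0xF81Fu)) & 0x3E07C0u;
    uint32_t g  = (a * (fg & 0x07E0u) + b * (bg & 0x07E0u)) & 0x01F800u;
    return (uint16_t)((rb | g) >> 6);
}

/* Returns 1 if alpha 255 reproduces fg and alpha 0 reproduces bg
   for every possible 16-bit colour, 0 otherwise. */
int endpoints_ok(void)
{
    uint32_t c;
    for (c = 0; c <= 0xFFFFu; ++c) {
        if (alphablend565((uint16_t)c, 0x1234u, 255) != c) return 0;
        if (alphablend565(0x1234u, (uint16_t)c, 0) != c) return 0;
    }
    return 1;
}
```

With this version, blending white over black at alpha 128 gives 0x7BEF rather than 0x8410, because the low bits of each product are masked off (truncated) instead of rounded.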
The correct formula is something like this:
unsigned short blend(unsigned short fg, unsigned short bg, unsigned char alpha)
{
    // Split foreground into components
    unsigned fg_r = fg >> 11;
    unsigned fg_g = (fg >> 5) & ((1u << 6) - 1);
    unsigned fg_b = fg & ((1u << 5) - 1);
    // Split background into components
    unsigned bg_r = bg >> 11;
    unsigned bg_g = (bg >> 5) & ((1u << 6) - 1);
    unsigned bg_b = bg & ((1u << 5) - 1);
    // Alpha blend components
    unsigned out_r = (fg_r * alpha + bg_r * (255 - alpha)) / 255;
    unsigned out_g = (fg_g * alpha + bg_g * (255 - alpha)) / 255;
    unsigned out_b = (fg_b * alpha + bg_b * (255 - alpha)) / 255;
    // Pack result
    return (unsigned short) ((out_r << 11) | (out_g << 5) | out_b);
}
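Note that the integer division above truncates, so for the example in the question (white over black at alpha 128) this function returns 0x7BEF, not 0x8410. If the rounded value is wanted, one possible variant is sketched below. It assumes that adding 127 (just under half of 255) before dividing is acceptable; `blend_rounded` is a name invented for this example.

```c
#include <stdint.h>

/* Hypothetical rounding variant of blend(): adds 127 before dividing by 255
   so each channel rounds to the nearest value instead of truncating. */
uint16_t blend_rounded(uint16_t fg, uint16_t bg, uint8_t alpha)
{
    unsigned inv = 255u - alpha;   /* weight of the background */

    /* Split both colours into 5-6-5 components */
    unsigned fg_r = fg >> 11, fg_g = (fg >> 5) & 0x3Fu, fg_b = fg & 0x1Fu;
    unsigned bg_r = bg >> 11, bg_g = (bg >> 5) & 0x3Fu, bg_b = bg & 0x1Fu;

    /* Largest numerator is 63 * 255 + 127 = 16192, which still fits
       in a 16-bit unsigned int */
    unsigned out_r = (fg_r * alpha + bg_r * inv + 127u) / 255u;
    unsigned out_g = (fg_g * alpha + bg_g * inv + 127u) / 255u;
    unsigned out_b = (fg_b * alpha + bg_b * inv + 127u) / 255u;

    /* Pack result */
    return (uint16_t)((out_r << 11) | (out_g << 5) | out_b);
}
```

Here `blend_rounded(0xFFFF, 0x0000, 128)` yields 0x8410, the value the question expects, while alpha 0 and alpha 255 still return the background and foreground unchanged.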
There is a shortcut you can use for dividing by 255. The compiler should be able to provide some strength reduction, but you might be able to do better by using the following formula instead:
// Alpha blend components
unsigned out_r = fg_r * alpha + bg_r * (255 - alpha);
unsigned out_g = fg_g * alpha + bg_g * (255 - alpha);
unsigned out_b = fg_b * alpha + bg_b * (255 - alpha);
out_r = (out_r + 1 + (out_r >> 8)) >> 8;
out_g = (out_g + 1 + (out_g >> 8)) >> 8;
out_b = (out_b + 1 + (out_b >> 8)) >> 8;
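If you want to convince yourself that the shift-based expression really matches a division by 255, a brute-force comparison is cheap to run on a PC. The sketch below is only an illustration; `div255` and `div255_matches` are names made up here. The largest numerator the blend can produce is 63 * 255 = 16065, and the identity holds for every value up to 65534 (it first fails at 65535).

```c
#include <stdint.h>

/* The shift-and-add replacement for x / 255 used in the snippet above.
   32-bit arithmetic so the intermediate sum cannot wrap during the check. */
uint32_t div255(uint32_t x)
{
    return (x + 1u + (x >> 8)) >> 8;
}

/* Returns 1 if div255(x) equals x / 255 for every x in [0, limit],
   0 as soon as one value differs. */
int div255_matches(uint32_t limit)
{
    uint32_t x;
    for (x = 0; x <= limit; ++x) {
        if (div255(x) != x / 255u) {
            return 0;
        }
    }
    return 1;
}
```

Calling `div255_matches(63u * 255u)` covers every numerator the blend above can generate.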
Note the large number of variables in the function... this is okay. If you try to "optimize" the code by rewriting the equations so that it creates fewer temporary variables, you are only doing work that the compiler already does for you. Unless you have a really bad compiler.
If this is not fast enough, there are a few options for how to proceed. However, choosing the correct option depends on the results of profiling, how the code is used, and the target architecture.
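As an illustration of what one such option could look like (not necessarily what this answer has in mind, and only worth trying if profiling points at the blend), the sketch below spreads the three channels across a 32-bit word so a single multiply per colour handles all of them. It assumes a 5-bit alpha is precise enough and that 32-bit multiplies are reasonably cheap on the target; `blend565_packed` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical packed blend: green is moved to the upper half of a 32-bit
   word, leaving at least 5 spare bits above each channel, so a weight of
   0..32 can multiply all three channels at once without overlap. */
uint16_t blend565_packed(uint16_t fg, uint16_t bg, uint8_t alpha)
{
    const uint32_t mask = 0x07E0F81Fu;       /* 00000gggggg00000 rrrrr000000bbbbb */
    uint32_t a = (alpha + 4u) >> 3;          /* 8-bit alpha -> 0..32, rounded */
    uint32_t f = (fg | ((uint32_t)fg << 16)) & mask;
    uint32_t b = (bg | ((uint32_t)bg << 16)) & mask;

    /* a + (32 - a) == 32, so each channel sum is at most (channel max) * 32 */
    uint32_t r = ((f * a + b * (32u - a)) >> 5) & mask;

    /* Fold green back down between red and blue */
    return (uint16_t)(r | (r >> 16));
}
```

The trade-off is precision: with only 33 alpha levels and truncation, white over black at alpha 128 comes out as 0x7BEF here.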