Need to multiply one XMM register by another, but with bit masked value

In x86/SIMD assembly, I've populated an XMM register with four 32-bit pixels of a graphic image I need to convert. However, the pixels are in 10-bit packed RGB format, so each one occupies 32 bits in this form:

[  red   ][  green ][  blue  ][]
RRRRRRRRRRGGGGGGGGGGBBBBBBBBBB00

The last two bits are padding bits and are unused.

I need to multiply these pixels by another value, but the value needs to be masked so that it only affects, say, the red component. The value is constant, so it can be hard-coded. Let's say the value is 0.1234. How would I put this into another XMM register with appropriate masking so that it only affects the red portion of each 32-bit segment?

Illustrated graphically, I would like to do something like this:

XMM0 (first 32 bit segment):
[ 0.1234 ][  1.0   ][  1.0   ][]

*

XMM1 (first 32 bit segment):
RRRRRRRRRRGGGGGGGGGGBBBBBBBBBB00

The result would be the product of XMM0 and XMM1. Of course, this 32-bit segment would be duplicated across the entire XMM register; I have only shown the first 32 bits here so you get the idea.

Synthetix asked Feb 18 '13 19:02



2 Answers

If you really only want to affect the red portion, you might be able to come up with a trick that multiplies the red and part of the green by some constant (treating the register as a collection of 16-bit shorts) and then recombines just the new red part with the old green and blue.
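
A minimal sketch of one variation on this idea, written in C with SSE2 intrinsics rather than raw assembly. It assumes red sits in bits 31..22 of each pixel, as in the question's diagram. Instead of multiplying red together with part of the green, it first shifts red down so the 16-bit multiply sees nothing else. The function name `scale_red_x4` and the 0.16 fixed-point gain are assumptions made for illustration; 8087 is roughly 0.1234 × 65536, and the result is truncated, not rounded.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: scales only the red field of four packed pixels
   laid out as R[31:22] G[21:12] B[11:2] pad[1:0].
   gain_q16 is the multiplier in 0.16 fixed point (so it must be < 1.0),
   e.g. 8087 for roughly 0.1234. */
void scale_red_x4(uint32_t px[4], uint16_t gain_q16)
{
    __m128i v = _mm_loadu_si128((const __m128i *)px);

    /* Move red down to bits 9..0 of each 32-bit lane; the upper 16-bit
       half of every lane is now zero. */
    __m128i red = _mm_srli_epi32(v, 22);

    /* Treat the register as eight unsigned 16-bit values and keep the
       high half of each 16x16 product: (red * gain_q16) >> 16. */
    __m128i gain = _mm_set1_epi16((short)gain_q16);
    __m128i scaled = _mm_mulhi_epu16(red, gain);

    /* Keep the old green, blue and padding bits untouched. */
    __m128i rest = _mm_and_si128(v, _mm_set1_epi32(0x003FFFFF));

    /* Put the new red back on top and merge. */
    __m128i out = _mm_or_si128(_mm_slli_epi32(scaled, 22), rest);

    _mm_storeu_si128((__m128i *)px, out);
}
```

In assembly the same steps would map to `psrld`, `pmulhuw`, `pand`, `pslld` and `por`. Because the gain is a 16-bit fraction, this form only handles multipliers below 1.0.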

A better strategy, if you're going to operate on all of the colors, is to unpack that format into a layout the XMM registers support directly (such as 16-bit integers, 32-bit integers, or 32-bit floats) using a combination of shift and shuffle operations, possibly followed by a conversion to float. Then do all of your math and pack the result back.
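
The unpack and pack steps could look roughly like the following sketch, again in C with SSE2 intrinsics and assuming the R[31:22] G[21:12] B[11:2] layout from the question. The helper names `unpack_rgb10_ps` and `pack_rgb10_ps` are made up for illustration. This version needs only shifts and masks (no shuffles) because each channel ends up in its own register of four floats.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE float ops) */
#include <stdint.h>

/* Hypothetical helpers for the layout R[31:22] G[21:12] B[11:2] pad[1:0]. */

/* Split four packed pixels into three registers of four floats each,
   with every channel value in the range 0..1023. */
void unpack_rgb10_ps(const uint32_t px[4], __m128 *r, __m128 *g, __m128 *b)
{
    const __m128i mask10 = _mm_set1_epi32(0x3FF);
    __m128i v = _mm_loadu_si128((const __m128i *)px);

    /* Red needs no mask: the logical shift already clears the upper bits. */
    *r = _mm_cvtepi32_ps(_mm_srli_epi32(v, 22));
    *g = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 12), mask10));
    *b = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 2), mask10));
}

/* Clamp each channel to 0..1023, convert back to integers, and re-pack.
   The conversion uses the current rounding mode (round-to-nearest by
   default). The two padding bits are written as zero. */
void pack_rgb10_ps(uint32_t px[4], __m128 r, __m128 g, __m128 b)
{
    const __m128 lo = _mm_setzero_ps();
    const __m128 hi = _mm_set1_ps(1023.0f);

    __m128i ri = _mm_cvtps_epi32(_mm_min_ps(_mm_max_ps(r, lo), hi));
    __m128i gi = _mm_cvtps_epi32(_mm_min_ps(_mm_max_ps(g, lo), hi));
    __m128i bi = _mm_cvtps_epi32(_mm_min_ps(_mm_max_ps(b, lo), hi));

    __m128i out = _mm_or_si128(
        _mm_slli_epi32(ri, 22),
        _mm_or_si128(_mm_slli_epi32(gi, 12), _mm_slli_epi32(bi, 2)));

    _mm_storeu_si128((__m128i *)px, out);
}
```

Any per-channel math (for example `_mm_mul_ps` on the red register only) would go between the two calls.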

If you ever re-use values (for example, when computing a filter kernel) and you're working in float, it will be much faster to unpack and convert to float once and then re-use that value over and over. This holds even if you have to write a loop that unpacks a whole row to 32-bit float before operating on it, and then re-packs the whole row.

Ben Jackson answered Nov 05 '22 06:11



Assuming you want to use floating point to multiply your values, I would unpack the R, G and B values into individual floating-point lanes of an XMM register, dividing each value by 1023.0 (the 10-bit maximum) so that it lands in the range 0.0 to 1.0.

You may also find that it's actually easier to prepare four R, four G, and four B values, and then build a value that has the same multiplier for each of the colour values in another XMM register, and multiply by that, rather than holding R, G and B in one register. Obviously, this would require a bit of unrolling of the loop, but that tends to improve performance quite a bit anyway.
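
As a rough sketch of that arrangement in C with SSE2 intrinsics: four pixels are split into one register of reds, one of greens and one of blues, each divided by 1023.0, and each register is then multiplied by a broadcast gain. The function name `scale_rgb10_x4`, the clamping to 0.0..1.0 and the R[31:22] G[21:12] B[11:2] layout are assumptions for illustration, and only a single four-pixel step is shown, not the unrolled loop.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE float ops) */
#include <stdint.h>

/* Hypothetical helper: multiplies R, G and B of four packed pixels by
   separate gains, working on floats normalized to 0.0..1.0.
   Assumed layout: R[31:22] G[21:12] B[11:2] pad[1:0]. */
void scale_rgb10_x4(uint32_t px[4], float gain_r, float gain_g, float gain_b)
{
    const __m128i mask10 = _mm_set1_epi32(0x3FF);
    const __m128 max10 = _mm_set1_ps(1023.0f);
    const __m128 zero = _mm_setzero_ps();
    const __m128 one = _mm_set1_ps(1.0f);

    __m128i v = _mm_loadu_si128((const __m128i *)px);

    /* Four reds, four greens, four blues, each divided by 1023.0. */
    __m128 r = _mm_div_ps(_mm_cvtepi32_ps(_mm_srli_epi32(v, 22)), max10);
    __m128 g = _mm_div_ps(
        _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 12), mask10)), max10);
    __m128 b = _mm_div_ps(
        _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 2), mask10)), max10);

    /* One broadcast multiplier per channel register. */
    r = _mm_mul_ps(r, _mm_set1_ps(gain_r));
    g = _mm_mul_ps(g, _mm_set1_ps(gain_g));
    b = _mm_mul_ps(b, _mm_set1_ps(gain_b));

    /* Clamp so that a gain above 1.0 cannot overflow the 10-bit field. */
    r = _mm_min_ps(_mm_max_ps(r, zero), one);
    g = _mm_min_ps(_mm_max_ps(g, zero), one);
    b = _mm_min_ps(_mm_max_ps(b, zero), one);

    /* Back to 0..1023 integers (current rounding mode), then re-pack.
       The padding bits are written as zero. */
    __m128i ri = _mm_cvtps_epi32(_mm_mul_ps(r, max10));
    __m128i gi = _mm_cvtps_epi32(_mm_mul_ps(g, max10));
    __m128i bi = _mm_cvtps_epi32(_mm_mul_ps(b, max10));

    __m128i out = _mm_or_si128(
        _mm_slli_epi32(ri, 22),
        _mm_or_si128(_mm_slli_epi32(gi, 12), _mm_slli_epi32(bi, 2)));

    _mm_storeu_si128((__m128i *)px, out);
}
```

For the question's case, a call such as `scale_rgb10_x4(px, 0.1234f, 1.0f, 1.0f)` would scale red and leave green and blue as they were.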

Mats Petersson answered Nov 05 '22 07:11
