
SSE strangeness with Functions

I've been playing around with D's inline assembler and SSE, but found something I don't understand. When I add two float4 vectors immediately after declaring them, the result is correct. If I move the same calculation into a separate function, I get a vector of NaNs instead.

//function contents identical to code section in unittest
float4 add(float4 lhs, float4 rhs)
{
    float4 res;
    auto lhs_addr = &lhs;
    auto rhs_addr = &rhs;
    asm
    {
        mov RAX, lhs_addr;
        mov RBX, rhs_addr;
        movups XMM0, [RAX];
        movups XMM1, [RBX];

        addps XMM0, XMM1;
        movups res, XMM0;
    }
    return res;
}

unittest
{
    float4 lhs = {1, 2, 3, 4};
    float4 rhs = {4, 3, 2, 1};

    println(add(lhs, rhs)); //float4(nan, nan, nan, nan)

    //identical code starts here
    float4 res;
    auto lhs_addr = &lhs;
    auto rhs_addr = &rhs;
    asm
    {
        mov RAX, lhs_addr;
        mov RBX, rhs_addr;
        movups XMM0, [RAX];
        movups XMM1, [RBX];

        addps XMM0, XMM1;
        movups res, XMM0;
    } //end identical code
    println(res); //float4(5, 5, 5, 5)
}

The assembly is functionally identical (as far as I can tell) to this link.

Edit: I'm using a custom float4 struct (for now, it's just an array) because I want to be able to have an add function like float4 add(float4 lhs, float rhs). For the moment, that results in a compiler error like this:

Error: floating point constant expression expected instead of rhs

Note: I'm using DMD 2.071.0

Straivers asked Apr 20 '16 03:04


1 Answer

Your code is weird. Which version of DMD are you using? This works as expected:

import std.stdio;
import core.simd;

float4 add(float4 lhs, float4 rhs)
{
    float4 res;
    auto lhs_addr = &lhs;
    auto rhs_addr = &rhs;
    asm
    {
        mov RAX, lhs_addr;
        mov RBX, rhs_addr;
        movups XMM0, [RAX];
        movups XMM1, [RBX];

        addps XMM0, XMM1;
        movups res, XMM0;
    }
    return res;
}

void main()
{
    float4 lhs = [1, 2, 3, 4];
    float4 rhs = [4, 3, 2, 1];

    auto r = add(lhs, rhs);
    writeln(r.array); // prints [5, 5, 5, 5]

    //identical code starts here
    float4 res;
    auto lhs_addr = &lhs;
    auto rhs_addr = &rhs;
    asm
    {
        mov RAX, lhs_addr;
        mov RBX, rhs_addr;
        movups XMM0, [RAX];
        movups XMM1, [RBX];

        addps XMM0, XMM1;
        movups res, XMM0;
    } //end identical code
    writeln(res.array); // prints [5, 5, 5, 5]
}
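
If the inline assembler is not a hard requirement, core.simd's float4 also supports the ordinary arithmetic operators, so the same addition can be written without touching registers by hand. A minimal sketch, assuming an x86-64 target where DMD provides float4; the name addVectors is hypothetical and not part of the original code:

```d
import std.stdio;
import core.simd;

// Hypothetical helper (not from the original post): adds two vectors
// with the built-in vector operator instead of inline assembly.
float4 addVectors(float4 lhs, float4 rhs)
{
    // Element-wise addition of the four float lanes.
    return lhs + rhs;
}

void main()
{
    float4 lhs = [1, 2, 3, 4];
    float4 rhs = [4, 3, 2, 1];

    auto r = addVectors(lhs, rhs);
    writeln(r.array); // prints the four lanes as a float[4]
}
```

This leaves register allocation and argument passing to the compiler, which avoids any dependence on how the parameters are laid out inside the function.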
Kozzi11 answered Oct 17 '22 22:10