 

Is it possible to reassign a ref local?

Tags:

c#

.net

pointers

C#'s ref locals are implemented using a CLR feature called managed pointers, which come with their own set of restrictions; luckily, immutability is not one of them. That is, in ILAsm, if you have a local variable of managed pointer type, it's entirely possible to change this pointer so that it "references" another location. (C++/CLI also exposes this feature as interior pointers.)

Reading the C# documentation on ref locals, it appears to me that C#'s ref locals, even though they are based on the CLR's managed pointers, are not relocatable: once they are initialized to point to some variable, they cannot be made to point to something else. I've tried using

ref object reference = ref some_var;
ref reference = ref other_var;

and similar constructs, to no avail.

I've even tried writing a small struct in IL that wraps a managed pointer. It works as far as C# is concerned, but the CLR doesn't seem to like having a managed pointer in a struct, even though in my usage it never goes to the heap.

Does one really have to resort to using IL or tricks with recursion to overcome this? (I'm implementing a data structure that needs to keep track of which of its pointers were followed, a perfect use of managed pointers.)

John Doe the Righteous asked Sep 04 '17 11:09




1 Answer

[edit:] "ref-reassign" was on the schedule for C# 7.3, and C# 7.3 did ship it (an existing ref local can be re-pointed with r = ref x;). The 'conditional-ref' workaround, which I discuss below, shipped in C# 7.2.
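For readers on C# 7.3 or later, here is a minimal sketch of the linked-list removal discussed below, written with real ref reassignment. It assumes the array-of-structs Node layout used later in this answer (integer next links, with -1 marking the end); RefReassignDemo, RemoveAll and Render are hypothetical names chosen for illustration. The ref local link starts out aliasing the head and is then re-pointed at successive next fields, so the head needs no special case.

```csharp
using System.Text;

struct Node
{
    public int ix, next;   // 'next' is the index of the following node, or -1 at the end
    public char data;
}

static class RefReassignDemo
{
    // Unlinks every node whose data equals 'target' and returns the (possibly new) head index.
    // Requires C# 7.3 or later for the ref reassignment on the 'else' branch.
    public static int RemoveAll(int head, Node[] nodes, char target)
    {
        // 'link' always aliases the int that points at the current node:
        // first the local copy of 'head', later some node's 'next' field.
        ref int link = ref head;
        while (link != -1)
        {
            ref Node cur = ref nodes[link];
            if (cur.data == target)
                link = cur.next;        // ordinary assignment through the alias: splice 'cur' out
            else
                link = ref cur.next;    // ref reassignment: alias the next link instead
        }
        return head;
    }

    // Walks the list forward and concatenates the data characters.
    public static string Render(int head, Node[] nodes)
    {
        var sb = new StringBuilder();
        for (int i = head; i != -1; i = nodes[i].next)
            sb.Append(nodes[i].data);
        return sb.ToString();
    }
}
```

With the five nodes from the answer's sample list and head 3 (spelling ABCDE), RemoveAll(3, nodes, 'C') leaves Render(3, nodes) returning "ABDE"; removing 'A' afterwards returns the new head 1.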


I've also long been frustrated by this and just recently stumbled on a workable answer.

Essentially, in C# 7.2 you can now initialize ref locals with a ternary operator, and this can be contrived, somewhat torturously, into a simulation of ref-local reassignment. You "hand off" the ref local assignments downward through multiple variables as you move down through the lexical scopes of your C# code.

This approach requires a great deal of unconventional thinking and a lot of planning ahead. In certain situations or coding scenarios, it may not be possible to anticipate the full gamut of runtime configurations to which a conditional assignment scheme would have to apply. In that case you're out of luck, or you can switch to C++/CLI, which exposes managed tracking references. The tension here is that, for C#, the vast and indisputable gains in concision, elegance, and efficiency that are immediately realized by the conventional use of managed pointers (discussed further below) are frittered away by the degree of contortion required to overcome the reassignment problem.

The syntax that had eluded me for so long is shown next. Or, check the link I cited at the top.

C# 7.2 ref-local conditional assignment via the ternary operator ? :


ref int i_node = ref (f ? ref m_head : ref node.next);

This line is from a canonical problem case for the ref local dilemma that the questioner posed here: code which maintains back-pointers while walking a singly-linked list. The task is trivial in C/C++, as it should be (and is quite beloved by CSE101 instructors, perhaps for that particular reason), but it is entirely agonizing using managed pointers in C#.
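Restricted to C# 7.2, a rough equivalent declares a fresh ref local on every pass and chooses its target with the conditional ref expression shown above. This is a minimal sketch under assumptions, not the answer's own code: it reuses the hypothetical array-of-structs Node layout, and ConditionalRefDemo and RemoveAll are illustrative names. What gets carried from one pass to the next is a plain integer, prev (the predecessor's index, or -1 while the link is still the head); the conditional turns it back into a reference.

```csharp
struct Node
{
    public int ix, next;   // 'next' is the index of the following node, or -1 at the end
    public char data;
}

static class ConditionalRefDemo
{
    // Unlinks every node whose data equals 'target'; returns the (possibly new) head index.
    // Needs only C# 7.2: each ref local is initialized once and never re-pointed.
    public static int RemoveAll(int head, Node[] nodes, char target)
    {
        int prev = -1;   // index of the node whose 'next' points at the current node; -1 means 'head'
        while (true)
        {
            // The head special case is resolved here, once per pass, by the conditional ref.
            ref int link = ref (prev == -1 ? ref head : ref nodes[prev].next);
            if (link == -1)
                break;

            ref Node cur = ref nodes[link];
            if (cur.data == target)
                link = cur.next;   // splice 'cur' out; 'prev' stays where it is
            else
                prev = link;       // 'cur' survives and becomes the predecessor
        }
        return head;
    }
}
```

On the sample list (head 3), removing 'C' returns head 3 with the order ABDE, the same result as with ref reassignment; the price is the extra prev bookkeeping and a re-derived ref on every iteration.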

Such a complaint is entirely legitimate, too, given that Microsoft's own C++/CLI language shows us how awesome managed pointers can be in the .NET universe. Instead, most C# developers seem to end up using integer indices into arrays or, of course, full-blown native pointers with unsafe C#.

Some brief comments on the linked-list walking example, and on why one would go to so much trouble over these managed pointers. We assume all of the nodes are actually structs stored in situ in an array (ValueType), such as m_nodes = new Node[100], so each next pointer is simply an integer (an index into the array).

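using System;

struct Node
{
    public int ix;     // this node's own index in the array
    public int next;   // index of the following node, or -1 at the end of the list
    public char data;

    public override string ToString() =>
        String.Format("{0}  next: {1,2}  data: {2}", ix, next, data);
}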

As shown next, the head of the list is a standalone integer, stored apart from the records; the snippet uses the new C# 7 ValueTuple syntax to do so. Obviously it's no problem to traverse forward using these integer links, but C# has traditionally lacked an elegant way to maintain a link to the node you came from. That's a problem because one of the integers (the first one) is a special case, since it isn't embedded in a Node structure.

static (int head, Node[] nodes) L =
    (3,
    new[]
    {
        new Node { ix = 0, next = -1, data = 'E' },
        new Node { ix = 1, next =  4, data = 'B' },
        new Node { ix = 2, next =  0, data = 'D' },
        new Node { ix = 3, next =  1, data = 'A' },
        new Node { ix = 4, next =  2, data = 'C' },
    });
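For contrast, here is roughly what the conventional integer-index approach looks like on the list above, without managed pointers. It is a sketch rather than code from the answer: IndexDemo, Render and RemoveFirst are hypothetical names, and Node is repeated without its ToString override so the block stands alone. Walking forward is trivial, but removal needs a separate branch for the head, because that link is a standalone integer rather than a Node field.

```csharp
using System.Text;

struct Node
{
    public int ix, next;   // 'next' is the index of the following node, or -1 at the end
    public char data;
}

static class IndexDemo
{
    // Forward traversal: just follow the integer links.
    public static string Render(int head, Node[] nodes)
    {
        var sb = new StringBuilder();
        for (int i = head; i != -1; i = nodes[i].next)
            sb.Append(nodes[i].data);
        return sb.ToString();
    }

    // Removes the first node whose data equals 'target'; returns the (possibly new) head index.
    public static int RemoveFirst(int head, Node[] nodes, char target)
    {
        int prev = -1;   // index of the predecessor, or -1 while still at the head
        for (int i = head; i != -1; prev = i, i = nodes[i].next)
        {
            if (nodes[i].data != target)
                continue;
            if (prev == -1)
                head = nodes[i].next;              // special case: the link lives outside the array
            else
                nodes[prev].next = nodes[i].next;  // usual case: the link is a Node field
            break;
        }
        return head;
    }
}
```

With head 3 and the same five nodes, Render returns "ABCDE", and RemoveFirst(3, nodes, 'A') returns 1, the index of 'B'.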

Additionally, there's presumably a decent amount of processing work to do on each node, but you really don't want to pay the (double) performance cost of copying each (possibly large) ValueType out of its cozy array home, and then having to copy each one back when you're done! After all, surely the reason we're using value types here is to maximize performance. As I discuss at length elsewhere on this site, structs can be extremely efficient in .NET, but only if you never accidentally "lift" them out of their storage. It's easy to do, and it can immediately destroy your memory bus bandwidth.
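A small sketch of the "lifting" hazard, assuming a hypothetical Big struct as a stand-in for a large value type (LiftDemo and its methods are also illustrative names): reading an array element into an ordinary local copies the whole struct, so an edit to that copy never reaches the array unless the whole struct is copied back, which is the double cost described above.

```csharp
struct Big
{
    public long a, b, c, d;   // stand-in for a "possibly large" value type
}

static class LiftDemo
{
    // Lifts the element into a local: the whole struct is copied out of the array.
    // The increment changes only the copy, so the array element is left untouched.
    public static void BumpCopy(Big[] arr, int ix)
    {
        Big copy = arr[ix];
        copy.a++;
    }

    // Copy out, modify, copy back: correct, but it moves the whole struct twice.
    public static void BumpCopyBack(Big[] arr, int ix)
    {
        Big copy = arr[ix];   // first full copy (array -> local)
        copy.a++;
        arr[ix] = copy;       // second full copy (local -> array)
    }
}
```

Starting from a zeroed array, BumpCopy leaves arr[0].a at 0, and BumpCopyBack then raises it to 1.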

The trivial approach to not lifting the structs just repeats the array indexing, like so:

int ix = 1234;
arr[ix].a++;
arr[ix].b ^= arr[ix].c;
arr[ix].d /= (arr[lx].e + arr[ix].f);

Here, each ValueType field access is independently dereferenced. Although this "optimization" does avoid the bandwidth penalties mentioned above, repeating the same array indexing operation over and over can instead incur an entirely different set of runtime penalties. The (opportunity) costs now come from cycles wasted while .NET recomputes provably invariant physical offsets or performs redundant bounds checks on the array.

JIT optimizations in release mode may mitigate these issues somewhat, or even dramatically, by recognizing and consolidating redundancy in the code you supplied, but maybe not as much as you'd think or hope (or eventually realize you don't want): JIT optimizations are strongly constrained by strict adherence to the .NET Memory Model[1], which requires that whenever a storage location is publicly visible, the CPU must execute the relevant fetch sequence exactly as authored in the code. For the previous example, this means that if ix is shared with other threads in any way prior to the operations on arr, then the JIT must ensure that the CPU actually touches the ix storage location exactly 6 times, no more, no less.

Of course the JIT can do nothing to address the other obvious and widely-acknowledged problem with repetitive source code such as the previous example. In short, it's ugly, bug-prone, and harder to read and maintain. To illustrate this point,
              ☞   ...did you even notice the bug I intentionally put in the preceding code?

The cleaner version of the code shown next doesn't make bugs like this "easier to spot"; instead, as a class, it precludes them entirely, since there's no longer any need for an array-indexing variable at all. Variable ix doesn't need to exist in the following, since 1234 is used only once. It follows that the bug I so deviously introduced earlier cannot be propagated to this example, because it has no means of expression: what can't exist can't introduce a bug (as opposed to "what does not exist...", which most certainly could be a bug).

ref Node rec = ref arr[1234];
rec.a++;
rec.b ^= rec.c;
rec.d /= (rec.e + rec.f);
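Because Node as defined earlier has no fields a through f, here is a self-contained version of the two snippets using a hypothetical struct Rec (RecDemo, UpdateIndexed and UpdateByRef are illustrative names too). The indexed version has the lx typo corrected to ix, so both methods should leave the element in the same state; the ref-local version obtains the element's address once instead of on every access.

```csharp
struct Rec
{
    public int a, b, c, d, e, f;
}

static class RecDemo
{
    // Repeated indexing (with the 'lx' typo fixed): every access indexes the array again.
    public static void UpdateIndexed(Rec[] arr, int ix)
    {
        arr[ix].a++;
        arr[ix].b ^= arr[ix].c;
        arr[ix].d /= (arr[ix].e + arr[ix].f);
    }

    // C# 7 ref local: 'rec' aliases the element in place, so nothing is copied
    // and the element's address is computed (and bounds-checked) only once.
    public static void UpdateByRef(Rec[] arr, int ix)
    {
        ref Rec rec = ref arr[ix];
        rec.a++;
        rec.b ^= rec.c;
        rec.d /= (rec.e + rec.f);
    }
}
```

For an element with a = 1, b = 6, c = 3, d = 20, e = 2, f = 3, either method leaves a = 2, b = 5 and d = 4.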

Nobody would disagree that this is an improvement. So ideally we want to use managed pointers to read and write fields of the structure directly, in situ. One way to do this is to write all of your intensive processing code as instance member functions and properties on the ValueType itself, though for some reason many people don't seem to like this approach. In any case, the point is now moot with C# 7 ref locals...
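The instance-member approach mentioned above works because a struct method called on an array element receives this as a managed pointer to that element, so mutations land in place. A small sketch with a hypothetical Counter struct (InstanceMethodDemo and its methods are illustrative names); the contrast case calls the same method through a List<T> indexer, which returns the element by value, so the method mutates a temporary copy.

```csharp
using System.Collections.Generic;

struct Counter
{
    public int hits;

    // Mutating instance method: inside a struct, 'this' is a reference to the receiver.
    public void Hit() => hits++;
}

static class InstanceMethodDemo
{
    // Array elements are variables, so arr[0].Hit() mutates the element in place.
    public static int HitArrayElement()
    {
        var arr = new Counter[1];
        arr[0].Hit();
        return arr[0].hits;
    }

    // List<T>'s indexer returns a copy, so Hit() increments a temporary that is then discarded.
    public static int HitListElement()
    {
        var list = new List<Counter> { new Counter() };
        list[0].Hit();
        return list[0].hits;
    }
}
```

HitArrayElement() returns 1, while HitListElement() returns 0.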

                                                    ✹                   ✹                   ✹

I'm now realizing that fully explaining the type of programming required here is probably too involved to show with a toy example, and thus beyond the scope of a StackOverflow answer. So, to wrap up, I'll jump ahead and drop in a section of some working code I have that shows simulated managed pointer reassignment. It is taken from a heavily modified snapshot of HashSet<T> in the .NET 4.7.1 reference source [direct link], and I'll just show my version without much explanation:

int v1 = m_freeList;

for (int w = 0; v1 != -1; w++)
{
    ref int v2 = ref (w == 0 ? ref m_freeList : ref m_slots[v1].next);

    ref Slot fs = ref m_slots[v2];

    if (v2 >= i)
    {
        v2 = fs.next;
        fs = default(Slot);
        v1 = v2;
    }
    else
        v1 = fs.next;
}

This is just an arbitrary sample fragment from the working code so I don't expect anyone to follow it, but the gist of it is that the 'ref' variables, designated v1 and v2, are intertwined across scope blocks and the ternary operator is used to coordinate how they flow down. For example, the only purpose of the loop variable w is to handle which variable gets activated for the special case at the start of the linked-list traversal (discussed earlier).

Again, it turns out to be a very bizarre and tortured constraint on the normal ease and fluidity of modern C#. Patience, determination, and, as I mentioned earlier, a lot of planning ahead are required.



[1]
If you're not familiar with what's called the .NET Memory Model, I strongly suggest taking a look. I believe .NET's strength in this area is one of its most compelling features: a hidden gem, and the one (not-so-)secret superpower that most fatefully embarrasses those ever-strident friends of ours who still adhere to the 1980s-era ethos of bare-metal coding. Note an epic irony: imposing strict limits on wild or unbounded compiler optimization may end up enabling apps with much better performance, because stronger constraints expose reliable guarantees to developers. These, in turn, imply stronger programming abstractions or suggest advanced design paradigms, in this case relevant to concurrent systems.

For example, if one agrees that lock-free programming has languished in the margins of the native community for decades, perhaps the unruly mob of optimizing compilers is to blame? Progress in this specialty area is easily wrecked without the reliable determinism and consistency provided by a rigorous and well-defined memory model, which, as noted, is somewhat at odds with unfettered compiler optimization. So here, constraints mean that the field can at last innovate and grow. This has been my experience in .NET, where lock-free programming has become a viable, realistic, and eventually mundane daily programming vehicle.

Glenn Slayden answered Sep 29 '22 12:09