
Understanding Unsafe code and its uses

I am currently reading ECMA-334, as suggested by a friend who programs for a living. I am on the section dealing with unsafe code, but I am a bit confused by what it is talking about.

The garbage collector underlying C# might work by moving objects around in memory, but this motion is invisible to most C# developers. For developers who are generally content with automatic memory management but sometimes need fine-grained control or that extra bit of performance, C# provides the ability to write “unsafe” code. Such code can deal directly with pointer types and object addresses; however, C# requires the programmer to fix objects to temporarily prevent the garbage collector from moving them. This “unsafe” code feature is in fact a “safe” feature from the perspective of both developers and users. Unsafe code shall be clearly marked in the code with the modifier unsafe, so developers can't possibly use unsafe language features accidentally, and the compiler and the execution engine work together to ensure that unsafe code cannot masquerade as safe code. These restrictions limit the use of unsafe code to situations in which the code is trusted.

The example

using System;
class Test
{
    static void WriteLocations(byte[] arr)
    {
        unsafe
        {
            fixed (byte* pArray = arr)
            {
                byte* pElem = pArray;
                for (int i = 0; i < arr.Length; i++)
                {
                    byte value = *pElem;
                    Console.WriteLine("arr[{0}] at 0x{1:X} is {2}",
                    i, (uint)pElem, value);
                    pElem++;
                }
            }
        }
    }
    static void Main()
    {
        byte[] arr = new byte[] { 1, 2, 3, 4, 5 };
        WriteLocations(arr);
        Console.ReadLine();
    }
}

shows an unsafe block in a method named WriteLocations that fixes an array instance and uses pointer manipulation to iterate over the elements. The index, value, and location of each array element are written to the console. One possible example of output is:

arr[0] at 0x8E0360 is 1
arr[1] at 0x8E0361 is 2
arr[2] at 0x8E0362 is 3
arr[3] at 0x8E0363 is 4
arr[4] at 0x8E0364 is 5

but, of course, the exact memory locations can be different in different executions of the application.

Why is knowing the exact memory locations of, for example, this array beneficial to us as developers? And could someone explain this idea in a simplified context?

Jesse Glover asked May 05 '15 23:05



2 Answers

The fixed language feature is not so much "beneficial" as "absolutely necessary".

Ordinarily a C# user will imagine reference types as being equivalent to single-indirection pointers (e.g. for class Foo, this: Foo foo = new Foo(); is equivalent to this C++: Foo* foo = new Foo();).

In reality, references in C# behave more like double-indirection pointers: conceptually, a reference is a handle to an entry in an object table that stores the object's actual address. (The CLR actually stores the address directly in the reference and patches every reference whenever it moves the object, but the effect is the same.) The GC not only cleans up unused objects, but also moves objects around in memory to avoid memory fragmentation.

All this is well and good if you're exclusively using object references in C#. As soon as you use pointers you've got problems, because the GC could run at any point in time, even during tight-loop execution, and it does not update raw pointers when it moves an object. When the GC runs, your program's execution is also frozen (which is why the CLR and Java are not suitable for Hard Real Time applications: a GC pause can last a few hundred milliseconds in some cases).

...because of this inherent behaviour (where an object is moved during code execution) you need to prevent that object being moved, hence the fixed keyword, which instructs the GC not to move that object.

An example:

unsafe void Foo() {

    byte[] safeArray = new byte[ 50 ];
    safeArray[0] = 255;

    // The compiler refuses to take the address of an unfixed array element,
    // so to get a dangling pointer we deliberately let one escape a fixed block.
    byte* p;
    fixed( byte* pinned = safeArray ) {
        p = pinned;
    }
    // safeArray is no longer pinned here, but p still holds its old address.

    Console.WriteLine( "Pointer target: 0x{0:X}", (ulong)p );
    // Prints something like "0x12340000".

    while( executeTightLoop() ) {
        Console.WriteLine( *p );
        // Outputs "255" for as long as the GC has not moved safeArray.
    }

    // Pretend at this point that GC ran right here during execution. The safeArray object has been moved elsewhere in memory.

    Console.WriteLine( "Pointer target: 0x{0:X}", (ulong)p );
    // Still prints the old address: the GC updated the safeArray reference but not p, so p is invalid now.
    Console.WriteLine( *p );
    // the above code now prints garbage (if the memory has been reused by another allocation) or causes the program to crash (if it's in a memory page that has been released, an Access Violation)
}

So instead, by applying fixed to the safeArray object and using p only inside the fixed block, the pointer p stays valid for the whole block and will not cause a crash or read garbage data.
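
As a rough sketch of the pinned version (the names PinDemo and ReadFirstByte are made up for illustration, and the project is assumed to be compiled with unsafe code enabled), the pointer is only ever used while the array is fixed:

```csharp
using System;

static class PinDemo
{
    // Hypothetical helper: reads element 0 through a pointer while the array is pinned.
    // Assumes a non-empty array; fixed yields a null pointer for an empty one.
    public static unsafe byte ReadFirstByte(byte[] array)
    {
        fixed (byte* p = array)   // the GC may not move 'array' inside this block
        {
            GC.Collect();         // even a forced collection leaves the array in place
            return *p;            // p still points at array[0]
        }
    }

    static void Main()
    {
        byte[] safeArray = new byte[50];
        safeArray[0] = 255;
        Console.WriteLine(ReadFirstByte(safeArray)); // prints 255
    }
}
```

Once control leaves the fixed block the array is unpinned again, so the pointer should not be stored anywhere that outlives the block.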

Side-note: An alternative to fixed is to use stackalloc, but that limits the object lifetime to the scope of your function.
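
A minimal sketch of the stackalloc alternative, again assuming unsafe code is enabled; StackallocDemo and SumOfSquares are hypothetical names. The buffer lives on the stack, so the GC never moves it and no fixed block is needed, but it disappears when the method returns:

```csharp
using System;

static class StackallocDemo
{
    // Hypothetical helper: fills a stack buffer with squares and sums them.
    // Keep 'count' small; stack space is limited.
    public static unsafe int SumOfSquares(int count)
    {
        int* buffer = stackalloc int[count];  // stack memory, not tracked or moved by the GC

        for (int i = 0; i < count; i++)
            buffer[i] = i * i;

        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += buffer[i];

        return sum;  // the buffer is released automatically on return
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(5)); // 0 + 1 + 4 + 9 + 16 = 30
    }
}
```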

Dai answered Oct 27 '22 00:10


One of the primary reasons I use fixed is for interfacing with native code. Suppose you have a native function with the following signature:

double cblas_ddot(int n, double* x, int incx, double* y, int incy);

You could write an interop wrapper like this:

public static extern double cblas_ddot(int n, [In] double[] x, int incx, 
                                       [In] double[] y, int incy);

And write C# code to call it like this:

double[] x = ...
double[] y = ...
cblas_ddot(n, x, 1, y, 1);

But now suppose I wanted to operate on some data in the middle of my arrays, say starting at x[2] and y[2]. With this signature there is no way to make the call without copying the arrays.

double[] x = ...
double[] y = ...
cblas_ddot(n, x[2], 1, y[2], 1);
              ^^^^
              this wouldn't compile

In this case fixed comes to the rescue. We can change the signature of the interop and use fixed from the caller.

public unsafe static extern double cblas_ddot(int n, [In] double* x, int incx, 
                                              [In] double* y, int incy);

double[] x = ...
double[] y = ...
fixed (double* pX = x, pY = y)
{
    cblas_ddot(n, pX + 2, 1, pY + 2, 1);
}

I've also used fixed in rare cases where I needed fast loops over arrays and had to ensure that the .NET array bounds checking was not happening.
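
For illustration, such a loop might look roughly like this (FastLoopDemo and Sum are made-up names, and unsafe code must be enabled). Whether it is actually faster depends on the JIT, which can often remove bounds checks from simple for loops on its own, so it is worth measuring first:

```csharp
using System;

static class FastLoopDemo
{
    // Hypothetical helper: sums an array by walking a pointer, so no
    // per-element bounds check is performed.
    public static unsafe double Sum(double[] values)
    {
        double total = 0.0;
        fixed (double* start = values)   // pin the array for the duration of the loop
        {
            double* p = start;
            double* end = start + values.Length;  // one past the last element
            while (p < end)
            {
                total += *p;
                p++;
            }
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new double[] { 1, 2, 3, 4 })); // prints 10
    }
}
```

The price is that nothing stops the pointer from running past the end of the array, which is exactly the check being traded away.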

jaket answered Oct 27 '22 01:10