Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C data structure to mimic C#'s List<List<int>>?

I am looking to refactor a c# method into a c function in an attempt to gain some speed, and then call the c dll in c# to allow my program to use the functionality.

Currently the c# method takes a list of integers and returns a list of lists of integers. The method calculated the power set of the integers so an input of 3 ints would produce the following output (at this stage the values of the ints is not important as it is used as an internal weighting value)

1
2
3
1,2
1,3
2,3
1,2,3

Where each line represents a list of integers. The output indicates the index (with an offset of 1) of the first list, not the value. So 1,2 indicates that the element at index 0 and 1 are an element of the power set.

I am unfamiliar with c, so what are my best options for data structures that will allow the c# to access the returned data?

Thanks in advance

Update

Thank you all for your comments so far. Here is a bit of a background to the nature of the problem.

The iterative method for calculating the power set of a set is fairly straight forward. Two loops and a bit of bit manipulation is all there is to it really. It just get called..a lot (in fact billions of times if the size of the set is big enough).

My thoughs around using c (c++ as people have pointed out) are that it gives more scope for performance tuning. A direct port may not offer any increase, but it opens the way for more involved methods to get a bit more speed out of it. Even a small increase per iteration would equate to a measurable increase.

My idea was to port a direct version and then work to increase it. And then refactor it over time (with help from everyone here at SO).

Update 2

Another fair point from jalf, I dont have to use list or equivelent. If there is a better way then I am open to suggestions. The only reason for the list was that each set of results is not the same size.

The code so far...

public List<List<int>> powerset(List<int> currentGroupList)
{
    _currentGroupList = currentGroupList;
    int max;
    int count;

    //Count the objects in the group
    count = _currentGroupList.Count;
    max = (int)Math.Pow(2, count);

    //outer loop
    for (int i = 0; i < max; i++)
    {
        _currentSet = new List<int>();

        //inner loop
        for (int j = 0; j < count; j++)
        {              
            if ((i & (1 << j)) == 0)
            {
                _currentSetList.Add(_currentGroupList.ElementAt(j));                          
            }
        }
        outputList.Add(_currentSetList);
    }   
    return outputList;
}

As you can see, not a lot to it. It just goes round and round a lot!

I accept that the creating and building of lists may not be the most efficient way, but I need some way of providing the results back in a manageable way.

Update 2

Thanks for all the input and implementation work. Just to clarify a couple of points raised: I dont need the output to be in 'natural order', and also I am not that interested in the empty set being returned.

hughdbrown's implementation is intesting but i think that i will need to store the results (or at least a subset of them) at some point. It sounds like memory limitiations will apply long before running time becomes a real issue. Partly because of this, I think I can get away with using bytes instead of integers, giving more potential storage.

The question really is then: Have we reached the maximum speed for this calcualtion in C#? Does the option of unmanaged code provide any more scope. I know in many respects the answer is futile, as even if we havled the time to run, it would only allow an extra values in the original set.

like image 698
jheppinstall Avatar asked Dec 05 '08 12:12

jheppinstall


1 Answers

Also, be sure that moving to C/C++ is really what you need to do for speed to begin with. Instrument the original C# method (standalone, executed via unit tests), instrument the new C/C++ method (again, standalone via unit tests) and see what the real world difference is.

The reason I bring this up is that I fear it may be a pyrhhic victory -- using Smokey Bacon's advice, you get your list class, you're in "faster" C++, but there's still a cost to calling that DLL: Bouncing out of the runtime with P/Invoke or COM interop carries a fairly substantial performance cost.

Be sure you're getting your "money's worth" out of that jump before you do it.

Update based on the OP's Update

If you're calling this loop repeatedly, you need to absolutely make sure that the entire loop logic is encapsulated in a single interop call -- otherwise the overhead of marshalling (as others here have mentioned) will definitely kill you.

I do think, given the description of the problem, that the issue isn't that C#/.NET is "slower" than C, but more likely that the code needs to be optimized. As another poster here mentioned, you can use pointers in C# to seriously boost performance in this kind of loop, without the need for marshalling. I'd look into that first, before jumping into a complex interop world, for this scenario.

like image 76
John Rudy Avatar answered Sep 21 '22 23:09

John Rudy