hashset, dictionary, arraylist: can't see the forest for the trees

About: My mathematical program will have a giant collection of items to iterate through. Each entry will mainly consist of an item and a pointer to another item ((int) item, (int) pointer), resembling a key-value pair. However, each item will also have several other attributes of its own, like this:

(item, pointer),(attribute, attribute, attribute, ...)

Items, pointers and attributes will be added frequently.

Searching this site and others has only left me more confused about which collection to use for the program.

At this moment a Dictionary solution found on vcskicks.com seems to work best:

Dictionary<Dictionary<int,int>,Dictionary<int,int>> nestedDictionary = 
    new Dictionary<Dictionary<int,int>,Dictionary<int,int>>();

or in plain language:

Dictionary<Dictionary<item, pointer>, 
           Dictionary<attribute, attribute, ...>> nestedDictionary =
    new Dictionary<Dictionary<item, pointer>, 
                   Dictionary<attribute, attribute, ...>>();

Please note that the number of attributes is not predefined; it varies from item to item. Also, at this moment I'm reluctant to use objects because of the performance overhead.

Hashsets don't seem to fit because duplicate items will exist, although they will have different attributes. Or can a hashset have duplicate items, just not duplicate hash keys? There seems to be some confusion about this.

According to some, a hashset cannot hold both of the following entries:

11011, 0001
11011, 0011

According to others it can, because the entries will have different hash keys. This leaves me puzzled.
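
For reference, here is a minimal sketch of how HashSet<T> treats the two rows above. It assumes each row is stored as an (item, pointer) value tuple and uses binary literals, both of which need C# 7.0 or later. The names HashSetDemo and Describe are made up for illustration. A HashSet<T> rejects an element only if it is equal to one already in the set. The hash code is used to locate candidates, so two unequal elements may even share a hash code and both be stored.

```csharp
using System;
using System.Collections.Generic;

static class HashSetDemo
{
    // Adds the two rows from the question, then the first row again,
    // and reports what each Add call returned.
    public static string Describe()
    {
        var set = new HashSet<(int item, int pointer)>();

        bool first  = set.Add((0b11011, 0b0001)); // new element
        bool second = set.Add((0b11011, 0b0011)); // same item, different pointer
        bool third  = set.Add((0b11011, 0b0001)); // equal to the first element

        return $"{first} {second} {third} count={set.Count}";
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // True True False count=2
    }
}
```

If only the item were stored (a HashSet<int>), the second 11011 would be rejected, which is probably where the conflicting advice comes from.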

My question:

At the risk of being too vague: what is the best collection type to use? I'll be happy to add more details if necessary.

Edits:

Giant means: potentially millions of items. All items will have a pointer and attributes.

General use: search for a particular item, retrieve its pointer, get the item that pointer refers to, and repeat until there is no pointer left. Along the way, all attributes of each visited item are collected. Adding will be done on a regular basis, removing only occasionally.

Pointer: the pointer is the index of the item it refers to. If you have 2 items and the first is linked to the second, then the first item will have the index of the second item as its pointer.

Best is defined in terms of memory usage and speed. At the end, all found items will be compared to each other. Example:

[Item , pointer] [attribute, attribute, ...]
[11011,    1001] [ 1101,        1111 ]
[10001,    1000] [ 1110,        0101 ]
[11111,    0010] [ 1111,        1110 ]
[11011,    0001] [ 0010,        1010 ]

Thanks

user2257315 asked Apr 08 '13 11:04



1 Answer

So, basically it seems like you need to keep a collection of objects, each of which has the following properties:

  • Reference to the logical next item
  • Attributes

So an item would look something like this (just a quick example... not exactly best practice to keep everything public, but in your case you won't mind):

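// A class rather than a struct: a struct cannot contain a field of its own
// type, and "next" needs to be nullable to mark the end of a chain.
public class MyItem
{
   public Dictionary<String, String> attributes = new Dictionary<String, String>();
   public MyItem next;
}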

Then all you need is to keep a list:

List<MyItem> myList = new List<MyItem>();

When you want to add something, it's easy:

MyItem item1 = new MyItem();
item1.attributes["name"] = "Joe";
item1.next = null; // this is the default behaviour... just illustrating here
myList.Add(item1);

MyItem item2 = new MyItem();
item2.attributes["name"] = "Mary";
item2.next = item1;
myList.Add(item2);

Then, when you want to traverse, just follow next.

MyItem item = myList[1]; // start at item2 ("Mary"), which links to item1 ("Joe")
while (item != null)
{
    Console.WriteLine(item.attributes["name"]);
    item = item.next;
}
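
The question describes the pointer as the index of another item and is wary of per-object overhead. This is a different technique from the reference-based class above: a rough sketch that stores each row as a struct in a List and follows int indices instead of object references. The names Item, NoNext, ChainDemo and CollectAttributes are made up for illustration, -1 is an assumed "no pointer" marker, and the sketch assumes the chains contain no cycles.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type. A struct works here because Next is an int index,
// not a field of the struct's own type.
struct Item
{
    public const int NoNext = -1; // assumed marker for "no pointer"

    public int Value;         // the item itself, e.g. 0b11011
    public int Next;          // index of the next item in the list, or NoNext
    public int[] Attributes;  // variable-length attribute list
}

static class ChainDemo
{
    // Follows Next indices from "start" and collects every attribute on the way.
    public static List<int> CollectAttributes(List<Item> items, int start)
    {
        var collected = new List<int>();
        for (int i = start; i != Item.NoNext; i = items[i].Next)
        {
            collected.AddRange(items[i].Attributes);
        }
        return collected;
    }

    static void Main()
    {
        var items = new List<Item>
        {
            new Item { Value = 0b11011, Next = 1, Attributes = new[] { 0b1101, 0b1111 } },
            new Item { Value = 0b10001, Next = Item.NoNext, Attributes = new[] { 0b1110, 0b0101 } },
        };

        List<int> attrs = CollectAttributes(items, 0);
        Console.WriteLine(string.Join(", ", attrs)); // 13, 15, 14, 5
    }
}
```

Finding the starting index by item value is left out here. With duplicate item values, that would likely need a separate lookup, such as a Dictionary from value to a list of positions.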

Hope this helps.

Gigi answered Oct 01 '22 03:10
