
Most efficient way to store/retrieve a dictionary in C#?

I have a Dictionary<string, int[]> which I need to store on disk and retrieve as efficiently as possible.

The key (string) will typically be 1 to 60 Unicode characters long. Longer keys are possible but rare, and those entries could be discarded. Integers in the array will be in the range 1 to 100 million (typically 1 to 5 million).

My first idea was to use a delimited format:

key [tab] int,int,int,int,...
key2 [tab] int,int,int,int,...
...

and to load the dictionary as follows:

// Requires: using System; using System.Collections.Generic; using System.IO;
var dic = new Dictionary<string, int[]>();
string[] lines = File.ReadAllLines(sIndexName);
// Skip the header line of the index file.
for (int i = 1; i < lines.Length; i++)
{
    // Each line is: key [tab] int,int,int,...
    string[] keyValues = lines[i].Split('\t');
    // Split(',') also handles a single value with no comma.
    int[] iInts = Array.ConvertAll(keyValues[1].Split(','), int.Parse);
    Array.Sort(iInts);
    dic.Add(keyValues[0], iInts);
}
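
For completeness, a file in this layout could be produced along the following lines. This is only a sketch: the class name DelimitedIndexWriter and the header text are placeholders that do not appear in the original code, and it assumes keys never contain a tab or a line break.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DelimitedIndexWriter
{
    // Writes one "key [tab] int,int,int,..." line per entry, after a header line.
    public static void Save(string path, Dictionary<string, int[]> dic)
    {
        using (var writer = new StreamWriter(path))
        {
            // Header line; the loader skips it. The text itself is a placeholder.
            writer.WriteLine("key\tvalues");
            foreach (KeyValuePair<string, int[]> entry in dic)
            {
                writer.Write(entry.Key);
                writer.Write('\t');
                writer.WriteLine(string.Join(",", entry.Value));
            }
        }
    }
}
```

Every integer is written as decimal text plus a separator, so the file grows with the number of digits, and every value has to be parsed again on load.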

It works, but going over the potential size requirements, it's obvious this method is never going to scale well enough.

Is there an off-the-shelf solution for this problem or do I need to rework the algorithm completely?


Edit: I am a little embarrassed to admit it, but I didn't know dictionaries could just be serialized to binary. I gave it a test run and it's pretty much what I needed.

Here is the code (suggestions welcome):

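public static void saveToFile(Dictionary<string, List<int>> dic)
{
    // FileMode.Create truncates an existing file. OpenOrCreate would leave
    // stale bytes at the end if the new data is shorter than the old file.
    using (FileStream fs = new FileStream(_PATH_TO_BIN, FileMode.Create))
    {
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(fs, dic);
    }
}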

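public static Dictionary<string, List<int>> loadBinFile()
{
    try
    {
        // The using block closes the stream even if deserialization throws.
        using (FileStream fs = new FileStream(_PATH_TO_BIN, FileMode.Open))
        {
            BinaryFormatter bf = new BinaryFormatter();
            return (Dictionary<string, List<int>>)bf.Deserialize(fs);
        }
    }
    catch
    {
        // Returns null if the file is missing or cannot be deserialized.
        return null;
    }
}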

With a dictionary of 100k entries, each holding an array of 4k integers, serialization takes 14 seconds, deserialization takes 10 seconds, and the resulting file is 1.6 GB.

@Patryk: Please convert your comment to an answer so I can mark it as approved.

Sylverdrag asked Nov 02 '22 12:11


1 Answer

Dictionary<TKey, TValue> is marked as [Serializable] (and implements ISerializable), as its class declaration shows.

That means you can use e.g. BinaryFormatter to perform binary serialization and deserialization to and from a stream. Say, FileStream. :)
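
Note that BinaryFormatter is marked obsolete in recent .NET versions and is no longer supported there, so it may not be an option on newer runtimes. One possible alternative, shown here only as a sketch, is to write the entries by hand with BinaryWriter and read them back with BinaryReader. The class name CompactIndexFile and the file layout (entry count, then key, array length, and raw 32-bit integers for each entry) are assumptions made for this example, not something from the question.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CompactIndexFile
{
    // Assumed layout: entry count, then for each entry the key,
    // the array length, and the integers as raw 32-bit values.
    public static void Save(string path, Dictionary<string, int[]> dic)
    {
        using (var fs = new FileStream(path, FileMode.Create))
        using (var writer = new BinaryWriter(fs))
        {
            writer.Write(dic.Count);
            foreach (KeyValuePair<string, int[]> entry in dic)
            {
                writer.Write(entry.Key);           // length-prefixed UTF-8 string
                writer.Write(entry.Value.Length);
                foreach (int value in entry.Value)
                {
                    writer.Write(value);
                }
            }
        }
    }

    public static Dictionary<string, int[]> Load(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
        {
            int count = reader.ReadInt32();
            var dic = new Dictionary<string, int[]>(count);
            for (int i = 0; i < count; i++)
            {
                string key = reader.ReadString();
                int[] values = new int[reader.ReadInt32()];
                for (int j = 0; j < values.Length; j++)
                {
                    values[j] = reader.ReadInt32();
                }
                dic.Add(key, values);
            }
            return dic;
        }
    }
}
```

This layout still spends 4 bytes per integer, so 100k entries of 4k integers each would again come to roughly 1.6 GB. Whether it loads faster than BinaryFormatter would have to be measured on the real data.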

Patryk Ćwiek answered Nov 15 '22 05:11