
C#: Appropriate data structure for storing values from a CSV file (specific case)

I'm writing a program that reads two different .csv files containing the following information:

file 1                  file2
AA,2.34                BA,6.45
AB,1.46                BB,5.45
AC,9.69                BC,6.21
AD,3.6                 AC,7.56

The first column is a string and the second is a double.

So far I have had no difficulty reading those files and placing the values into lists:

firstFile = new List<KeyValuePair<string, double>>();
secondFile = new List<KeyValuePair<string, double>>();

I'm trying to instruct my program:

  • to take the first-column value from the first row of the first file (in this case AA)
  • and look for a match anywhere in the first column of the second file.
  • If a string match is found, compare the corresponding second values (doubles in this case) and, if those match too, add the entire row to a separate List.

Something similar to the below pseudo-code:

for(var i=0;i<firstFile.Count;i++)
{
    firstFile.Column[0].value[i].SearchMatchesInAnotherFile(secondFile.Column[0].values.All);
    if(MatchFound)
    {
        CompareCorrespondingDoubles();
        if(true)
        {
            AddFirstValueToList();
        }
    }
}

Instead of a List I tried using a Dictionary, but that data structure is not sorted and offers no way to access a key by index.

I'm not asking anyone to provide the exact code; rather, the question is:

What would you suggest to use as an appropriate data structure for this program so that I can investigate myself further?

TiredOfProgramming asked Apr 24 '18 13:04




1 Answer

KeyValuePair is really meant for use with dictionaries. I suggest creating your own custom type:

public class MyRow
{
    public string StringValue {get;set;}
    public double DoubleValue {get;set;}

    public override bool Equals(object o)
    {
         MyRow r = o as MyRow;
         if (ReferenceEquals(r, null)) return false;
         return r.StringValue == this.StringValue && r.DoubleValue == this.DoubleValue;
    }
    public override int GetHashCode()
    {
        // Combine both fields; a null string contributes 0 to the hash.
        unchecked { return (StringValue?.GetHashCode() ?? 0) ^ DoubleValue.GetHashCode(); }
    }
}

And store the files in lists of this type:

List<MyRow> firstFile = ...
List<MyRow> secondFile = ...

Then you can determine the intersection (all elements that occur in both lists) via LINQ's Intersect method:

var result = firstFile.Intersect(secondFile).ToList();

It's necessary to override Equals and GetHashCode, because otherwise Intersect would only make a reference comparison. Alternatively, you could implement your own IEqualityComparer<MyRow> that does the comparison and pass it to the appropriate Intersect overload.
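The comparer alternative could look roughly like the sketch below. The names MyRowComparer and ComparerDemo are made up for illustration, MyRow is repeated without the Equals/GetHashCode overrides so the block compiles on its own, and the second list contains an invented row that matches AC so that the intersection is not empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }
}

// Hypothetical comparer: two rows are equal when both columns match exactly.
public sealed class MyRowComparer : IEqualityComparer<MyRow>
{
    public bool Equals(MyRow a, MyRow b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.StringValue == b.StringValue && a.DoubleValue == b.DoubleValue;
    }

    public int GetHashCode(MyRow row)
    {
        if (row is null) return 0;
        return (row.StringValue?.GetHashCode() ?? 0) ^ row.DoubleValue.GetHashCode();
    }
}

public static class ComparerDemo
{
    // Rows present in both sequences, judged by MyRowComparer instead of by reference.
    public static List<MyRow> Common(IEnumerable<MyRow> first, IEnumerable<MyRow> second)
    {
        return first.Intersect(second, new MyRowComparer()).ToList();
    }

    public static void Main()
    {
        var firstFile = new List<MyRow>
        {
            new MyRow { StringValue = "AA", DoubleValue = 2.34 },
            new MyRow { StringValue = "AC", DoubleValue = 9.69 }
        };
        var secondFile = new List<MyRow>
        {
            new MyRow { StringValue = "BA", DoubleValue = 6.45 },
            new MyRow { StringValue = "AC", DoubleValue = 9.69 } // invented matching row
        };

        var result = Common(firstFile, secondFile);
        Console.WriteLine(result.Count); // 1 (only the AC row is in both lists)
    }
}
```

Keeping the equality logic in a separate comparer leaves MyRow untouched, which may be preferable if other code relies on the default reference equality.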


But if you can ensure that the keys (the string values) are unique, you can also use dictionaries:

Dictionary<string, double> firstFile = ...    
Dictionary<string, double> secondFile = ...

And in this case use this LINQ statement:

var result = firstFile
    .Where(x => secondFile.TryGetValue(x.Key, out double value) && value == x.Value)
    .ToDictionary(x => x.Key, x => x.Value);

Each lookup is a hash probe, so this runs in O(m+n) expected time including building the two dictionaries (m and n being the row counts of the two files). Intersect is hash-based as well, so the first solution is also roughly O(m+n); only a nested scan of one list for every row of the other would be O(m*n).
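Because the second column is a double, an exact comparison only matches values that are bit-for-bit identical after parsing. If the two files may differ by rounding, a tolerance can be used instead. The following is a minimal sketch of the dictionary-based matching, assuming unique keys; CsvMatcher, FindMatches and the default tolerance of 1e-9 are illustrative choices, not part of the original answer. The sample rows are the ones from the question, where the only shared key (AC) has different values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvMatcher
{
    // Returns the rows of 'first' whose key exists in 'second'
    // and whose values differ by at most 'tolerance'.
    public static Dictionary<string, double> FindMatches(
        IReadOnlyDictionary<string, double> first,
        IReadOnlyDictionary<string, double> second,
        double tolerance = 1e-9)
    {
        return first
            .Where(row => second.TryGetValue(row.Key, out double other)
                          && Math.Abs(other - row.Value) <= tolerance)
            .ToDictionary(row => row.Key, row => row.Value);
    }

    public static void Main()
    {
        var firstFile = new Dictionary<string, double>
        {
            ["AA"] = 2.34, ["AB"] = 1.46, ["AC"] = 9.69, ["AD"] = 3.6
        };
        var secondFile = new Dictionary<string, double>
        {
            ["BA"] = 6.45, ["BB"] = 5.45, ["BC"] = 6.21, ["AC"] = 7.56
        };

        // AC is in both files, but 9.69 and 7.56 differ, so nothing matches.
        Console.WriteLine(FindMatches(firstFile, secondFile).Count);      // 0

        // With a deliberately loose tolerance the AC row is accepted.
        Console.WriteLine(FindMatches(firstFile, secondFile, 3.0).Count); // 1
    }
}
```

Each TryGetValue call is a single hash lookup, so the second file is never scanned row by row.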

René Vogt answered Sep 22 '22 06:09
