
c# Appropriate data structure for storing values from csv file. Specific Case

Tags:

c#

data-structures

csv

I'm writing a program that simply reads two different .csv files containing the following information:

file 1                  file2
AA,2.34                BA,6.45
AB,1.46                BB,5.45
AC,9.69                BC,6.21
AD,3.6                 AC,7.56

The first column is a string and the second is a double.

So far I have no difficulty reading those files and placing the values into lists:

firstFile = new List<KeyValuePair<string, double>>();
secondFile = new List<KeyValuePair<string, double>>();

I'm trying to instruct my program to:

take the value in the first column of the first row of the first file (AA in this case),
look for a match anywhere in the first column of the second file, and
if a string match is found, compare the corresponding second values (the doubles) and, if those match as well, add the entire row to a separate List.

Something similar to the following pseudo-code:

for(var i=0;i<firstFile.Count;i++)
{
    firstFile.Column[0].value[i].SearchMatchesInAnotherFile(secondFile.Column[0].values.All);
    if(MatchFound)
    {
        CompareCorrespondingDoubles();
        if(true)
        {
            AddFirstValueToList();
        }
    }
}
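The pseudo-code above maps fairly directly onto a nested loop. The following is a minimal sketch, assuming both lists were already filled from the files as shown; the names `NaiveMatch` and `FindMatchingRows` are hypothetical. Because every row of the first list may be compared with every row of the second, it takes O(m*n) time for m and n rows.

```csharp
using System.Collections.Generic;

public static class NaiveMatch
{
    // Hypothetical helper: returns every row of firstFile whose string AND double
    // also appear together in some row of secondFile.
    public static List<KeyValuePair<string, double>> FindMatchingRows(
        List<KeyValuePair<string, double>> firstFile,
        List<KeyValuePair<string, double>> secondFile)
    {
        var matches = new List<KeyValuePair<string, double>>();

        for (var i = 0; i < firstFile.Count; i++)
        {
            var row = firstFile[i];

            // Scan the whole first column of the second file for the same string.
            foreach (var other in secondFile)
            {
                // String matches and the corresponding doubles match too.
                if (other.Key == row.Key && other.Value == row.Value)
                {
                    matches.Add(row);
                    break; // keep each row of the first file at most once
                }
            }
        }
        return matches;
    }
}
```

With the sample data from the question the result is an empty list: AC occurs in both files, but with 9.69 and 7.56.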

Instead of a List I tried to use a Dictionary, but that data structure is not sorted and there is no way to access a key by its index.

I'm not asking for exact code; rather, my question is:

What data structure would you suggest for this program, so that I can investigate further myself?

955

asked Apr 24 '18 13:04

TiredOfProgramming

1 Answer

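KeyValuePair is mainly meant as the element type of a Dictionary (it is what you get when you enumerate one), so it is an unusual choice for a list of rows. I suggest creating your own custom type: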

public class MyRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }

    // Two rows are equal when both the string and the double match.
    public override bool Equals(object o)
    {
        MyRow r = o as MyRow;
        if (ReferenceEquals(r, null)) return false;
        return r.StringValue == this.StringValue && r.DoubleValue == this.DoubleValue;
    }

    // Must agree with Equals: equal rows have to produce equal hash codes.
    public override int GetHashCode()
    {
        return (StringValue?.GetHashCode() ?? 0) ^ DoubleValue.GetHashCode();
    }
}

And store the files in lists of this type:

List<MyRow> firstFile = ...
List<MyRow> secondFile = ...

Then you can determine the intersection (all elements that occur in both lists) via LINQ's Intersect method:

var result = firstFile.Intersect(secondFile).ToList();
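To try this end to end, here is a minimal self-contained sketch. It repeats MyRow (with a null-guarded GetHashCode that calls `StringValue.GetHashCode()`) and adds a hypothetical `CsvIntersect.Parse` helper that assumes exactly one comma per line and `.` as the decimal separator. Parsing with `CultureInfo.InvariantCulture` keeps a value such as `2.34` from being misread on machines whose culture uses a decimal comma.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class MyRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }

    // Two rows are equal when both columns match.
    public override bool Equals(object o)
    {
        var r = o as MyRow;
        if (ReferenceEquals(r, null)) return false;
        return r.StringValue == StringValue && r.DoubleValue == DoubleValue;
    }

    // Equal rows must produce equal hash codes, otherwise Intersect misses them.
    public override int GetHashCode()
    {
        return (StringValue?.GetHashCode() ?? 0) ^ DoubleValue.GetHashCode();
    }
}

public static class CsvIntersect
{
    // Hypothetical helper: turns lines such as "AA,2.34" into MyRow objects.
    public static List<MyRow> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .Select(parts => new MyRow
            {
                StringValue = parts[0].Trim(),
                DoubleValue = double.Parse(parts[1], CultureInfo.InvariantCulture)
            })
            .ToList();
    }

    // Rows present in both lists (same string AND same double),
    // in the order of the first list, with duplicates removed.
    public static List<MyRow> Matching(List<MyRow> firstFile, List<MyRow> secondFile)
    {
        return firstFile.Intersect(secondFile).ToList();
    }
}
```

For real files you could pass `File.ReadLines(path)` (from `System.IO`) to `Parse`. With the sample data from the question, `Matching` returns an empty list, since no row matches in both columns.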

It's necessary to override Equals and GetHashCode, because otherwise Intersect would only compare references. Alternatively, you could implement your own IEqualityComparer<MyRow> that does the comparison and pass it to the corresponding Intersect overload.
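A minimal sketch of the comparer alternative; `MyRowComparer` and `ComparerDemo` are hypothetical names. With a comparer, MyRow can stay a plain data class, and the equality logic is passed to the `Intersect(first, second, comparer)` overload instead of living in the type itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain row type: with a comparer, no Equals/GetHashCode overrides are needed.
public class MyRow
{
    public string StringValue { get; set; }
    public double DoubleValue { get; set; }
}

// Hypothetical comparer: two rows are equal when both columns match.
public sealed class MyRowComparer : IEqualityComparer<MyRow>
{
    public bool Equals(MyRow x, MyRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.StringValue == y.StringValue && x.DoubleValue == y.DoubleValue;
    }

    // Consistent with Equals: equal rows yield equal hash codes.
    public int GetHashCode(MyRow row)
    {
        return (row.StringValue?.GetHashCode() ?? 0) ^ row.DoubleValue.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static List<MyRow> Matching(List<MyRow> firstFile, List<MyRow> secondFile)
    {
        // This overload uses the comparer instead of MyRow's own Equals/GetHashCode.
        return firstFile.Intersect(secondFile, new MyRowComparer()).ToList();
    }
}
```

Keeping the comparison outside the type is handy if you later need a different notion of equality (for example, matching on the string column only) without changing MyRow.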

But if you can ensure that the keys (the string values) are unique, you can also use one Dictionary per file:

Dictionary<string, double> firstFile = ...    
Dictionary<string, double> secondFile = ...

And in this case use this LINQ statement:

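var result = firstFile
    .Where(x => secondFile.TryGetValue(x.Key, out double value) && value == x.Value)
    .ToDictionary(x => x.Key, x => x.Value);

Each key of the first file is looked up in the second dictionary in O(1) on average, so together with building the two dictionaries this takes O(m+n) time (for m and n being the row counts of the two files). Note that Intersect is hash-based as well, so the first solution is also roughly O(m+n), provided GetHashCode is implemented correctly; a nested loop comparing every row with every other row, like the pseudo-code in the question, would be O(m*n).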
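For completeness, a minimal self-contained sketch of the dictionary approach. The class name `DictionaryMatch` and its `Parse` helper are hypothetical; `Parse` assumes exactly one comma per line and `.` as the decimal separator. Building the dictionaries with `ToDictionary` throws an `ArgumentException` when a key occurs twice, so the "keys are unique" assumption fails loudly instead of silently dropping rows.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class DictionaryMatch
{
    // Hypothetical helper: builds a Dictionary from lines such as "AA,2.34".
    // ToDictionary throws ArgumentException if the same key appears twice.
    public static Dictionary<string, double> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .ToDictionary(
                parts => parts[0].Trim(),
                parts => double.Parse(parts[1], CultureInfo.InvariantCulture));
    }

    // Entries of firstFile whose key is also in secondFile with an equal value.
    public static Dictionary<string, double> Matching(
        Dictionary<string, double> firstFile,
        Dictionary<string, double> secondFile)
    {
        return firstFile
            // TryGetValue is a single hash lookup per key.
            .Where(x => secondFile.TryGetValue(x.Key, out double value) && value == x.Value)
            .ToDictionary(x => x.Key, x => x.Value);
    }
}
```

Design note: one `TryGetValue` call per key avoids both a linear search and a second lookup through `ContainsKey` followed by the indexer.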

179

answered Sep 22 '22 06:09

René Vogt
