C# - text file with matrix - divide all entries

Tags: arrays, c#, matrix

I have a text file with a 1122 x 1122 matrix of precipitation measurements. Each measurement is represented with 4 decimal digits. Example lines look like this:

0.0234 0.0023 0.0123 0.3223 0.1234 0.0032 0.1236 0.0000 ....

(and so on: 1122 values per line, 1122 lines per file.)

I need the same text file, but with every value divided by 6 (and I have to do this for 920 such files).

I managed to do this, but in what is no doubt an atrociously inefficient and memory-hungry way:

  1. I open the text files one by one and read each file line by line
  2. I split each line into a string array with the separate values as members
  3. I go through the array, converting each value to double, dividing by 6, converting the result back to a string formatted with 4 decimal digits, and storing it as a member in a new string array.
  4. I join the array back to a line
  5. I write this line to a new text file.
  6. Voila (after an hour or so...) I have my 920 new text files.

I am sure there is a much faster and more professional way to do this. I have looked at endless sites about Matrix.Divide but don't see (or understand) a solution there for this problem. Any help will be appreciated! This is the code snippet used for each file:



    foreach (string inputline in inputfile)
    {
        int count = 0;
        string[] str_precip = inputline.Split(' ');  // holds string measurements
        string[] str_divided_precip = new string[str_precip.Length]; // will hold string measurements divided by divider (6)
        foreach (string measurements in str_precip)
        {
            str_divided_precip[count] = ((Convert.ToDouble(measurements)) / 6).ToString("F4", CultureInfo.CreateSpecificCulture("en-US"));
            count++;
        }
        string divline = string.Join(" ", str_divided_precip);
        using (System.IO.StreamWriter newfile = new System.IO.StreamWriter(@"asc_files\divfile.txt", true))
        {
            newfile.WriteLine(divline);
        }
    } 

Yossi Beck asked Oct 09 '16 15:10



2 Answers

Assuming the files are well-formed, you should essentially be able to process them a character at a time without needing to create any arrays or do any complicated string parsing.

This snippet shows the general approach:

    string s = "12.4567 0.1234\n"; // just an example
    decimal d = 0;
    foreach (char c in s)
    {
        if (char.IsDigit(c))
        {
            d *= 10;
            d += c - '0';
        }
        else if (c == ' ' || c == '\n')
        {
            d /= 60000; // divide by 10000 to get 4dps; divide by 6 here too
            Console.Write(d.ToString("F4"));
            Console.Write(c);
            d = 0;
        }
        else {
            // no special processing needed as long as input file always has 4dp
            Debug.Assert(c == '.');
        }
    }

Clearly you would be writing to a (buffered) file stream instead of the console.

You could probably roll your own faster version of ToString("F4") but I doubt it would make a significant difference to the timings. But if you can avoid creating a new array for each line of the input file by using this approach, I'd expect it to make a substantial difference. (In contrast, one array per file as a buffered writer is worthwhile, especially if it is declared big enough from the start.)
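As a rough illustration of what a hand-rolled replacement for ToString("F4") might look like, here is a minimal sketch that stays in integer arithmetic throughout. The class and method names are hypothetical, and it assumes non-negative input that has already been accumulated as an integer scaled by 10000 (as `d` is in the snippet above). It rounds half up, which may differ from ToString("F4") in exact tie cases, and whether it is measurably faster would need to be timed.

```csharp
using System;
using System.IO;

static class FixedPointFormat
{
    // Hypothetical helper, not part of the original answer.
    // 'scaled' is the input value times 10000, e.g. "0.0234" -> 234.
    // Assumes scaled >= 0.
    public static void WriteDividedBy6(TextWriter output, long scaled)
    {
        // Divide by 6, rounding half up; q is the result times 10000.
        long q = (scaled + 3) / 6;
        long whole = q / 10000;
        int frac = (int)(q % 10000);

        output.Write(whole);                        // integer part
        output.Write('.');
        output.Write((char)('0' + frac / 1000));    // four fractional digits,
        output.Write((char)('0' + frac / 100 % 10)); // zero-padded by construction
        output.Write((char)('0' + frac / 10 % 10));
        output.Write((char)('0' + frac % 10));
    }
}
```

In the character loop above, this would take the place of the division by 60000 and the ToString("F4") call, with `d` kept as an integer type rather than decimal.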

Edit (by Sani Singh Huttunen)
Sorry for editing your post, but you are absolutely correct about this.
Fixed-point arithmetic provides a significant improvement in this case.

After introducing StreamReader (~10% improvement), float (another ~35% improvement) and other improvements (yet another ~20% improvement) (see comments) this approach takes ~12 minutes (system specs in my answer):

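    // Requires: using System.Globalization; using System.IO;
    public void DivideMatrixByScalarFixedPoint(string inputFilename, string outputFilename)
    {
        using (var inFile = new StreamReader(inputFilename))
        using (var outFile = new StreamWriter(outputFilename))
        {
            // Accumulates the digits of the current value as an integer (value * 10000).
            var d = 0;

            while (!inFile.EndOfStream)
            {
                var c = (char) inFile.Read();
                if (c >= '0' && c <= '9')
                {
                    d = (d * 10) + (c - '0');
                }
                else if (c == ' ' || c == '\n')
                {
                    // divide by 10000 to get 4dps; divide by 6 here too
                    outFile.Write((d / 60000f).ToString("F4", CultureInfo.InvariantCulture.NumberFormat));
                    outFile.Write(c);
                    d = 0;
                }
                // Any other character ('.', '\r') is skipped. A value is only written
                // when a space or newline follows it, so the file must end with a newline.
            }
        }
    }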
Matthew Strawbridge answered Sep 18 '22 23:09


You open and close the output file for every line; I think we can do better! Just replace it with this code:

    using (System.IO.StreamWriter newfile = new System.IO.StreamWriter(@"asc_files\divfile.txt", true))
    {
        foreach (string inputline in inputfile)
        {
            int count = 0;
            foreach (string measurements in inputline.Split(' '))
            {
                newfile.Write((Convert.ToDouble(measurements) / 6).ToString("F4", CultureInfo.CreateSpecificCulture("en-US")));
                if (++count < 1122)
                {
                    newfile.Write(" ");
                }
            }

            newfile.WriteLine();
        }
    }

For the reading part, you may want to read one line at a time with ReadLine() instead of reading the whole file in one huge block and then splitting it in memory. This streaming approach will greatly reduce memory allocation and, depending on your hardware (how much memory you have, and how fast your disks are: HDD or SSD?), may improve performance noticeably!
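To make the streaming idea concrete, here is a minimal sketch that reads with ReadLine() and keeps a single writer open for the whole file. The class name, method name and parameters are hypothetical, and it assumes values are separated by single spaces and use '.' as the decimal separator, as in the question's example.

```csharp
using System;
using System.Globalization;
using System.IO;

static class MatrixDivider
{
    // Hypothetical helper; names and parameters are illustrative only.
    public static void DivideFile(string inputPath, string outputPath, double divisor)
    {
        // Looked up once instead of creating a culture for every value.
        CultureInfo culture = CultureInfo.InvariantCulture;

        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath)) // opened once per file
        {
            string line;
            // Only one line of the input is held in memory at a time.
            while ((line = reader.ReadLine()) != null)
            {
                string[] values = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                for (int i = 0; i < values.Length; i++)
                {
                    if (i > 0) writer.Write(' ');
                    double v = double.Parse(values[i], culture);
                    writer.Write((v / divisor).ToString("F4", culture));
                }
                writer.WriteLine();
            }
        }
    }
}
```

Using the number of values actually found on each line, rather than a hard-coded 1122, means the same method would work for matrices of other widths.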

Please let me know how it works now, I'm very curious!

pid answered Sep 19 '22 23:09