Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Manipulating lines of data

I have millions of lines generated from data updated every second which look like this:

104500 4783
104501 8930
104502 21794
104503 21927
104505 5746
104506 9968
104509 5867
104510 46353
104511 7767
104512 4903

The column on the left represents time (hhmmss format), and the column on the right is data which is updated second-by-second. As you can see however, it isn't actually second-by-second, and there are some missing times (10:45:04, 10:45:07, 10:45:08 are missing in this example). My goal is to add in the missing seconds, and to use the data from the previous second for that missing second, like this:

104500 4783
104501 8930
104502 21794
104503 21927
104504 21927 --
104505 5746
104506 9968
104507 9968 --
104508 9968 --
104509 5867
104510 46353
104511 7767
104512 4903

I don't want the "--" in the result, I just put those there to mark the added lines. So far I've tried to accomplish this using StreamReader and StreamWriter, but it doesn't seem like they're going to get me what I want. I'm a newbie programmer and a newbie to C#, so if you could just point me in the right direction, that would be great. I'm really just wondering if this is even possible to do in C#...I've spent a lot of time on MSDN and here on SO looking for a solution to this, but so far haven't found any.

Edit: The lines are in a text file, and I want to store the newly created data in a new text file.

like image 789
Brian Snow Avatar asked Dec 20 '11 16:12

Brian Snow


People also ask

What is manipulating of data?

Data manipulation refers to the process of adjusting data to make it organised and easier to read. Data manipulation language, or DML, is a programming language that adjusts data by inserting, deleting and modifying data in a database such as to cleanse or map the data.

What is manipulation in data structure?

Data manipulation is the method of organizing data to make it easier to read or more designed or structured. For instance, a collection of any kind of data could be organized in alphabetical order so that it can be understood easily.


2 Answers

There are a few things you need to put together.

  1. Read a file line-by-line: See here: Reading a Text File One Line at a Time
  2. Writing a file line-by-line : StreamWriter.WriteLine
  3. Keep track of the last read line. (Just use a variable in your while loop where you read the lines)
  4. Check whether there is a gap. Maybe by parsing the first column (string.Split) using TimeSpan.Parse. If there is a gap then write the last read line, incrementing the timespan.
like image 59
George Duckett Avatar answered Sep 29 '22 23:09

George Duckett


ok, here is the whole shooting match, tested and working against your test data:

public void InjectMissingData()
{
    DataLine lastDataLine = null;
    using (var writer = new StreamWriter(File.Create("c:\\temp\\out.txt")))
    {
        using (var reader = new StreamReader("c:\\temp\\in.txt"))
        {
            while (!reader.EndOfStream)
            {
                var dataLine = DataLine.Parse(reader.ReadLine());

                while (lastDataLine != null && dataLine.Occurence - lastDataLine.Occurence > TimeSpan.FromSeconds(1))
                {
                    lastDataLine = new DataLine(lastDataLine.Occurence + TimeSpan.FromSeconds(1), lastDataLine.Data);
                    writer.WriteLine(lastDataLine.Line);
                }

                writer.WriteLine(dataLine.Line);

                lastDataLine = dataLine;
            }
        }
    }
}

public class DataLine
{
    public static DataLine Parse(string line)
    {
        var timeString = string.Format("{0}:{1}:{2}", line.Substring(0, 2), line.Substring(2, 2),
                                       line.Substring(4, 2));

        return new DataLine(TimeSpan.Parse(timeString), long.Parse(line.Substring(7, line.Length - 7).Trim()));
    } 

    public DataLine(TimeSpan occurence, long data)
    {
        Occurence = occurence;
        Data = data;
    }

    public TimeSpan Occurence { get; private set; }
    public long Data { get; private set; }

    public string Line
    {
        get { return string.Format("{0}{1}{2} {3}", 
            Occurence.Hours.ToString().PadLeft(2, Char.Parse("0")), 
            Occurence.Minutes.ToString().PadLeft(2, Char.Parse("0")), 
            Occurence.Seconds.ToString().PadLeft(2, Char.Parse("0")),
            Data); }
    }
}
like image 36
Myles McDonnell Avatar answered Sep 30 '22 01:09

Myles McDonnell