
Parsing extra characters from Datetime

Tags: c#, .net, regex

Hi, I have the following code that reads a date from a file.

using (var reader = new StreamReader(@"C:\myfile.txt"))
{
    bool found = false;
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine().Trim();

        if (found && line.EndsWith("Test"))
        {
            var fordDate = DateTime.Parse(line.Substring(0, 19));
            Console.WriteLine("Test Date: {0}", fordDate);
            break;
        }
    }
}

The problem is that it throws an error when the date has other text attached to it. For example:

\r\n2013-03-03 12:22:02 

I am trying to change the code so that it removes "\r\n" or any other surrounding text and extracts just the date part.

J. Davidson asked Oct 05 '22 19:10



2 Answers

You should use regular expressions.

If your dates always have the same format, you can write a regular expression that extracts the date from each line and ignores anything on either side of it. To illustrate the idea, the simplest version of such an expression looks like this:

\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}

This regular expression is oversimplified: it accepts values such as 0000-00-00 99:99:99, which are not valid dates. Whether that matters depends on whether your file can contain values that look like dates but are not. A more complex (but stricter) expression would be the following, assuming the date is YYYY-MM-DD and not YYYY-DD-MM:

[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\s(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)

This one accepts years from 1000 to 2999, month numbers from 01 to 12, days from 01 to 31, and times from 00:00:00 to 23:59:59.
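To make the difference between the two expressions concrete, here is a minimal sketch that runs both against a few sample strings. The class name and the sample values are made up for illustration; only the two patterns come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    static void Main()
    {
        // The simple and the stricter pattern from the text above.
        var simple = new Regex(@"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}");
        var strict = new Regex(@"[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\s(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)");

        // Hypothetical inputs: a valid date, an obviously invalid one,
        // and one that is well-formed but not a real calendar date.
        string[] samples = { "2013-03-03 12:22:02", "0000-00-00 99:99:99", "2013-02-31 10:00:00" };

        foreach (string s in samples)
            Console.WriteLine("{0}  simple={1}  strict={2}", s, simple.IsMatch(s), strict.IsMatch(s));
    }
}
```

Note that the stricter pattern still matches 2013-02-31, because it checks each field's range independently and knows nothing about the number of days in a given month.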

To make this regular expression more useful, I'll wrap it in parentheses and give it a name, so that the date becomes a named capture group (date) that you can access in your code by name rather than by index.

// requires: using System.Text.RegularExpressions;
var rx = new Regex(@"(?<date>[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\s(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)).*Test$");
Match m = rx.Match(line);
if (m.Success)
{
    // the regex guarantees the layout, but not calendar validity
    // (e.g. 2013-02-31 still matches), so Parse can throw on such input
    fordDate = DateTime.Parse(m.Groups["date"].Value);
}

So instead of checking manually that the line ends with Test, I've included that requirement in the regular expression as well.
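Putting the pieces together, the following is a sketch of how the named group could be used end to end. The helper name TryExtract, the class name, and the sample line are assumptions made for illustration; the sketch also swaps DateTime.Parse for DateTime.TryParse so that a well-formed but impossible date does not throw.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateExtractor
{
    // Same pattern as above, created once and reused for every line.
    private static readonly Regex DateRx = new Regex(
        @"(?<date>[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\s(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)).*Test$");

    // Hypothetical helper: returns true and sets 'date' when the line
    // contains a parsable date and ends with "Test".
    public static bool TryExtract(string line, out DateTime date)
    {
        date = default(DateTime);
        Match m = DateRx.Match(line);
        return m.Success && DateTime.TryParse(
            m.Groups["date"].Value, CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    static void Main()
    {
        // Made-up sample line; the verbatim string keeps \r\n as literal backslash text.
        string line = @"\r\n2013-03-03 12:22:02 some text Test";

        DateTime fordDate;
        if (TryExtract(line, out fordDate))
            Console.WriteLine("Test Date: {0:yyyy-MM-dd HH:mm:ss}", fordDate);
    }
}
```

Because the pattern is not anchored at the start of the line, any text in front of the date is skipped, whether it is literal "\r\n" or something else.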

Robert Koritnik answered Oct 11 '22 11:10



Use this code to remove the characters you don't want:

string lineAfterReplace = line.Replace("\t", "").Replace("\r", "").Replace("\n", "");

@J. Davidson - it may also be better for you to use DateTime.TryParse (see MSDN).

Then you would have code along these lines:

// dateString is the cleaned-up text, e.g. lineAfterReplace from above
DateTime dateValue;
if (DateTime.TryParse(dateString, out dateValue))
{
    /* it was parsed without errors */
}
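As a fuller sketch of this approach, the cleanup and the parse can be combined as below. It assumes the date always has the form yyyy-MM-dd HH:mm:ss and therefore uses DateTime.TryParseExact with the invariant culture instead of plain TryParse; the class name and the input string are made up.

```csharp
using System;
using System.Globalization;

static class TryParseDemo
{
    static void Main()
    {
        // Made-up input with real control characters around the date.
        string line = "\r\n2013-03-03 12:22:02\t";
        string dateString = line.Replace("\t", "").Replace("\r", "").Replace("\n", "");

        DateTime dateValue;
        if (DateTime.TryParseExact(dateString, "yyyy-MM-dd HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out dateValue))
        {
            // Parsed without errors.
            Console.WriteLine("Test Date: {0:yyyy-MM-dd HH:mm:ss}", dateValue);
        }
        else
        {
            // No exception is thrown; the failure is reported through the return value.
            Console.WriteLine("Not a valid date: '{0}'", dateString);
        }
    }
}
```

Replace only strips the characters listed, so this works when the unwanted text consists of tabs and line breaks; other surrounding text would still make the parse fail.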
MikroDel answered Oct 11 '22 13:10
