
Parse some strange text format

Tags: c#, parsing

I'm trying to parse some data returned by a third-party app (a TSV file). I have all the data neatly parsed into individual fields (see Parse a TSV file), but I don't know how to decode some of those fields.
Sometimes the data in a field is encapsulated like this:

=T("[FIELD_DATA]")

(That's some sort of Excel formatting, I believe.)
When that happens, specific characters are escaped as CHAR(ASCII_NUM), and the rest of the string is wrapped as in the example above, but without the =, which only appears at the beginning of the field.

So, does anyone have an idea how I could parse fields that look like this:

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")&CHAR(10)&T("- Toaster ?!")&CHAR(10)&T("")&CHAR(10)&T("")&CHAR(10)&T("None")&CHAR(10)&T("")&CHAR(10)&T("None")

(any number of CHAR/T() groups).

I have been thinking of using a regex or looping over the string, but I'm not sure either approach is sound. Help, anyone?

asked Feb 20 '26 16:02 by Antoine


1 Answer

I would take an approach similar to Darin's, but his regex wasn't working for me. I would use this one:

(=T|&CHAR|&T)(\("*([A-Za-z?!0-9 -]*)"*\))+

You'll find that Groups[3] will be the data inside the () and the "", if the "" exist. (Groups[0] is the whole match, Groups[1] is the =T, &CHAR or &T prefix, and Groups[2] still includes the parentheses and quotes.) For example, this will find:

- Merge User Interface of Global Xtra Alert and EMT Alert

in:

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")

and:

10

in:

&CHAR(10)

If you have:

&T("")

it will produce an empty string in Groups[3].
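As a rough sketch of how the matches could be stitched back into a plain string: the class name ExcelFieldDecoder and the method Decode below are made up for illustration, and the code assumes CHAR() always holds a decimal character code and that the text inside T() only uses the characters the pattern allows.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ExcelFieldDecoder
{
    // Same pattern as above. Group 1 is the prefix (=T, &CHAR or &T),
    // group 3 is the data inside the parentheses and optional quotes.
    private static readonly Regex Token =
        new Regex(@"(=T|&CHAR|&T)(\(""*([A-Za-z?!0-9 -]*)""*\))+");

    public static string Decode(string field)
    {
        // Assumption: only fields starting with =T( use this encoding;
        // everything else is returned unchanged.
        if (!field.StartsWith("=T(", StringComparison.Ordinal))
            return field;

        var result = new StringBuilder();
        foreach (Match m in Token.Matches(field))
        {
            string inner = m.Groups[3].Value;
            if (m.Groups[1].Value == "&CHAR")
            {
                // CHAR(10) becomes the character with code 10 (line feed).
                if (int.TryParse(inner, out int code))
                    result.Append((char)code);
            }
            else
            {
                // T("text") contributes its text as-is (possibly empty).
                result.Append(inner);
            }
        }
        return result.ToString();
    }
}
```

Calling ExcelFieldDecoder.Decode on the sample field from the question should give the seven lines of text separated by line feeds. Note that Regex.Matches silently skips anything the pattern does not match, so a T("...") group containing a character outside [A-Za-z?!0-9 -] (a comma, for instance) would be dropped; widening the character class is probably needed for real data.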

Hope this helps.

answered Feb 23 '26 06:02 by Tim C


