I have this log line from an Exchange server:
2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,[email protected],,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",[email protected],<>,00A:
and I used this regex to match and group the fields:
(\d{4}-\d{2}-\d{2})(?:[\w\s]+)(\d+:\d+:\d+.\d+)(?:[\w+\d.]*),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(['"].*['"]|.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(?:(\d{4}-\d{2}-\d{2}\w\d{2}:\d{2}:\d{2}.\d+)(?:\w+)*)*(.*)
Basically, the information in the log is separated by the comma.
Unfortunately, if the user enters a comma in the 'email subject' field, that field is wrapped in double quotes in the log, as in the example above (note the comma in the date "Monday May 24, 2010"):
.....521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",[email protected],.....
How can I capture the whole subject, including the comma but without the surrounding double quotes, in one specific group (the 19th group)?
You mention:
Basically, the information in the log is separated by the comma...also if a comma is part of the field the field will be double quoted.
which makes it a CSV file. Parsing a CSV file is a solved problem and you need not reinvent the wheel. Use a CSV parser provided by your language library.
If you are using Perl, take a look at the Text::CSV module.
The line you gave seems to be in a CSV format. Why not parse it using a CSV parser, such as:
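The question does not say which language is in use, so the following is only a minimal sketch in Python, using the standard-library csv module. The names LOG_LINE, SUBJECT_INDEX and parse_log_line are made up for illustration, and the subject's position (zero-based index 17) is simply counted from the sample line above; it may differ for other log versions.

```python
import csv

# Sample line copied from the question (the addresses are redacted in the source).
# A raw string keeps the backslash in "EXHUB-10\Default" literal.
LOG_LINE = r"""2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,[email protected],,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",[email protected],<>,00A:"""

# Assumption: zero-based position of the subject, counted from the sample line.
SUBJECT_INDEX = 17


def parse_log_line(line):
    """Split one log line on commas, treating double-quoted fields as a unit."""
    # csv.reader accepts any iterable of lines; the default dialect uses
    # ',' as the delimiter and '"' as the quote character.
    return next(csv.reader([line]))


fields = parse_log_line(LOG_LINE)
subject = fields[SUBJECT_INDEX]
print(subject)
```

The parser strips the surrounding double quotes and keeps the embedded comma, so the subject arrives as a single field with no regex needed. The leading timestamp is fields[0] and can be split into date and time separately if required.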