 

Elegant structured text file parsing


I need to parse a transcript of a live chat conversation. My first thought on seeing the file was to throw regular expressions at the problem, but I was wondering what other approaches people have used.

I put "elegant" in the title because I've previously found that this type of task risks becoming hard to maintain when it relies on regular expressions alone.

The transcripts are generated by www.providesupport.com and emailed to an account; I then extract the plain-text transcript attachment from each email.

The reason for parsing the file is to extract the conversation text for later use, and also to identify the visitor's and operator's names so that the information can be made available via a CRM.

Here is an example of a transcript file:

    Chat Transcript

    Visitor: Random Website Visitor
    Operator: Milton
    Company: Initech
    Started: 16 Oct 2008 9:13:58
    Finished: 16 Oct 2008 9:45:44

    Random Website Visitor: Where do i get the cover sheet for the TPS report?
    * There are no operators available at the moment. If you would like to leave a message, please type it in the input field below and click "Send" button
    * Call accepted by operator Milton. Currently in room: Milton, Random Website Visitor.
    Milton: Y-- Excuse me. You-- I believe you have my stapler?
    Random Website Visitor: I really just need the cover sheet, okay?
    Milton: it's not okay because if they take my stapler then I'll, I'll, I'll set the building on fire...
    Random Website Visitor: oh i found it, thanks anyway.
    * Random Website Visitor is now off-line and may not reply. Currently in room: Milton.
    Milton: Well, Ok. But… that's the last straw.
    * Milton has left the conversation. Currently in room:  room is empty.

    Visitor Details
    ---------------
    Your Name: Random Website Visitor
    Your Question: Where do i get the cover sheet for the TPS report?
    IP Address: 255.255.255.255
    Host Name: 255.255.255.255
    Referrer: Unknown
    Browser/OS: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)
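
For comparison with a regex-heavy solution, here is a minimal sketch of one alternative: a small line-oriented state machine in Python. It assumes the attachment arrives with one entry per line (header fields, then chat lines, then a "Visitor Details" block) and that a blank line separates the header from the chat. The names `Transcript` and `parse_transcript` are made up for illustration, and the shortened sample is based on the transcript above; the real providesupport.com format may have variations this does not cover.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    header: dict = field(default_factory=dict)    # Visitor, Operator, Company, ...
    messages: list = field(default_factory=list)  # (speaker, text) tuples
    events: list = field(default_factory=list)    # system lines that start with "*"
    details: dict = field(default_factory=dict)   # the "Visitor Details" block


def _parse_chat_line(line, result):
    """Classify one non-blank line from the chat section."""
    if line.startswith("*"):
        result.events.append(line[1:].strip())
        return
    # Only names announced in the header count as speakers, so a colon
    # inside a message body is not mistaken for a new speaker.
    speakers = {result.header.get("Visitor"), result.header.get("Operator")}
    name, sep, body = line.partition(":")
    if sep and name.strip() in speakers:
        result.messages.append((name.strip(), body.strip()))
    elif result.messages:
        # Assumption: any other line continues the previous message.
        prev_name, prev_body = result.messages[-1]
        result.messages[-1] = (prev_name, prev_body + " " + line)


def parse_transcript(text):
    """Walk the file line by line, moving header -> chat -> details."""
    result = Transcript()
    state = "header"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # The first blank line after the header fields starts the chat.
            if state == "header" and result.header:
                state = "chat"
            continue
        if line == "Chat Transcript" or set(line) == {"-"}:
            continue  # title line or the dashed underline
        if line == "Visitor Details":
            state = "details"
            continue
        if state == "chat":
            _parse_chat_line(line, result)
        else:
            key, sep, value = line.partition(":")
            if sep:
                target = result.header if state == "header" else result.details
                target[key.strip()] = value.strip()
    return result


SAMPLE = """Chat Transcript

Visitor: Random Website Visitor
Operator: Milton
Company: Initech
Started: 16 Oct 2008 9:13:58
Finished: 16 Oct 2008 9:45:44

Random Website Visitor: Where do i get the cover sheet for the TPS report?
* Call accepted by operator Milton. Currently in room: Milton, Random Website Visitor.
Milton: Y-- Excuse me. You-- I believe you have my stapler?
* Milton has left the conversation. Currently in room:  room is empty.

Visitor Details
---------------
Your Name: Random Website Visitor
IP Address: 255.255.255.255
"""

transcript = parse_transcript(SAMPLE)
print(transcript.header["Visitor"], "/", transcript.header["Operator"])
for speaker, text in transcript.messages:
    print(f"{speaker}: {text}")
```

The visitor and operator names needed for the CRM come straight out of `transcript.header`, and the conversation text is in `transcript.messages`. Keeping the three sections as explicit states means a format change usually touches only one branch, which is the main maintainability argument over a single large regular expression.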