I have a sample input file as follows, with columns Id, Name, start date, end date, Age, Description, and Location:
220;John;23/11/2008;22/12/2008;28;Working as a professor in University;Hyderabad
221;Paul;30;23/11/2008;22/12/2008;He is a software engineer at MNC;Bangalore
222;Emma;23/11/2008;22/12/2008;25;Working as a mechanical engineer;Chennai
It contains 30 lines of data. My requirement is to extract only the descriptions from the above text file.
My output should contain:
Working as a professor in University
He is a software engineer at MNC
Working as a mechanical engineer
I need to find a regular expression to extract the Description, and have tried many kinds, but I haven't been able to find the solution. How can I do it?
You can use this regex:
[^;]+(?=;[^;]*$)
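[^;] matches any character except ;
+ is a quantifier that matches the preceding character or group one or more times
* is a quantifier that matches the preceding character or group zero or more times
$ is the end of the string
(?=pattern) is a lookahead which checks whether a particular pattern occurs ahead, without consuming it
Put together, the regex matches a run of non-semicolon characters that is followed by exactly one more semicolon-delimited field before the end of the line, which is the second-to-last field (the Description in your sample).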
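As a minimal sketch of how this lookahead pattern could be applied in Perl, the snippet below runs it against two of the sample lines from the question. The helper name extract_description is made up for illustration, and the sketch assumes the Description is always the second-to-last field on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the second-to-last semicolon-delimited field,
# or undef when the line contains no such field.
sub extract_description {
    my ($line) = @_;
    chomp $line;
    # [^;]+ is the field itself; the lookahead requires exactly one more
    # ';'-delimited field (possibly empty) before the end of the line.
    return $line =~ /([^;]+)(?=;[^;]*$)/ ? $1 : undef;
}

# Sample lines copied from the question.
my @lines = (
    '220;John;23/11/2008;22/12/2008;28;Working as a professor in University;Hyderabad',
    '221;Paul;30;23/11/2008;22/12/2008;He is a software engineer at MNC;Bangalore',
);

for my $line (@lines) {
    my $desc = extract_description($line);
    print defined $desc ? "$desc\n" : "no match\n";
}
```

Because the pattern is anchored to the end of the line rather than the start, it does not depend on how many fields come before the Description.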
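/^(?:[^;]+;){3}([^;]+)/
will grab the fourth group between semicolons. The {3} is the number of fields to skip before the capture, so set it to one less than the position of the Description column (for example, {5} when Description is the sixth field, as in the seven-column layout listed in the question).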
Although, as stated in my comment, you should just split the string on semicolons and grab the fourth element of the result. That's the whole point of a delimited file: you don't need complex pattern matching.
Example implementation in Perl, assuming Description is the fourth field on each line of your input:
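use strict;
use warnings;

open(my $IN, '<', 'input.txt') or die "Cannot open input.txt: $!";
while (my $line = <$IN>) {
    chomp $line;
    # Skip three semicolon-terminated fields, then capture the fourth.
    my ($desc) = $line =~ /^(?:[^;]+;){3}([^;]+)/;
    print "'$desc'\n" if defined $desc;
}
close $IN;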
yields:
'Working as a professor in University'
'He is a software engineer at MNC'
'Working as a mechanical engineer'
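The split-based approach recommended above could be sketched as follows. This is only an illustration: the helper name field_at is hypothetical, and which index holds the Description depends on the file's actual layout. The sample line here is the seven-column one from the question, where the Description is the sixth field.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a line on ';' and returns the field at the
# given zero-based index (undef if the line has fewer fields).
sub field_at {
    my ($line, $index) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @fields = split /;/, $line, -1;
    return $fields[$index];
}

my $line = '220;John;23/11/2008;22/12/2008;28;Working as a professor in University;Hyderabad';

# Index 5 is the sixth field; index -2 (second from the end) selects the same one here.
print field_at($line, 5), "\n";
print field_at($line, -2), "\n";
```

Counting from the end with a negative index is one way to stay independent of how many columns precede the Description, at the cost of assuming exactly one column follows it.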