Unfortunately, I'm not a regex expert, so I need a little help.
I'm looking for a way to grep an array of strings to get two lists of strings that do not start (1) or end (2) with a specific substring.
Let's assume we have an array of strings that follow this naming rule:
[speakerId]-[phrase]-[id].txt
e.g.
10-phraseone-10.txt 11-phraseone-3.txt 1-phraseone-2.txt 2-phraseone-1.txt 3-phraseone-1.txt 4-phraseone-1.txt 5-phraseone-3.txt 6-phraseone-2.txt 7-phraseone-2.txt 8-phraseone-10.txt 9-phraseone-2.txt 10-phrasetwo-1.txt 11-phrasetwo-1.txt 1-phrasetwo-1.txt 2-phrasetwo-1.txt 3-phrasetwo-1.txt 4-phrasetwo-1.txt 5-phrasetwo-1.txt 6-phrasetwo-3.txt 7-phrasetwo-10.txt 8-phrasetwo-1.txt 9-phrasetwo-1.txt 10-phrasethree-10.txt 11-phrasethree-3.txt 1-phrasethree-1.txt 2-phrasethree-11.txt 3-phrasethree-1.txt 4-phrasethree-3.txt 5-phrasethree-1.txt 6-phrasethree-3.txt 7-phrasethree-1.txt 8-phrasethree-1.txt 9-phrasethree-1.txt
Let's introduce variables:
$speakerId, $phrase, $id1, $id2
I would like to grep the list and obtain two arrays:
(1) elements that contain a specific $phrase, excluding those strings that both start with a specific $speakerId AND end with one of the specified ids (for instance $id1 or $id2)
(2) elements that have a specific $speakerId and $phrase but do NOT end with one of the specified ids (warning: remember not to exclude 10 or 11 when $id=1, etc.)
Maybe someone could use the following code as a starting point for the solution:
@AllEntries = readdir(INPUTDIR);
@Result1 = grep(/blablablahere/, @AllEntries);
@Result2 = grep(/anotherblablabla/, @AllEntries);
closedir(INPUTDIR);
Assuming a basic pattern to match your example:
(?:^|\b)(\d+)-(\w+)-(\d+)\.txt(?:\b|$)
Which breaks down as:
(?:^|\b) # start of the string (or line) or a word boundary
(\d+)- # speakerid and a hyphen
(\w+)- # phrase and a hyphen
(\d+) # id
\.txt # file extension
(?:\b|$) # end of the string (or line) or a word boundary
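As a rough illustration of the breakdown above, here is a minimal Perl sketch that applies the same pieces with the /x modifier and pulls out the three fields. It assumes each array element is exactly one file name, so plain ^ and $ anchors stand in for (?:^|\b) and (?:\b|$). The sample names and the variable names ($entry_re, @parsed) are made up for the example.

```perl
use strict;
use warnings;

# The basic pattern, spread out with /x so each piece can carry a comment.
# Assumption: one file name per element, so ^ and $ replace the \b alternations.
my $entry_re = qr/
    ^(\d+)-     # speakerId and a hyphen
    (\w+)-      # phrase and a hyphen
    (\d+)       # id
    \.txt$      # file extension, then end of string
/x;

# Illustrative input: two names in the expected format and one that is not.
my @samples = ('10-phraseone-10.txt', '2-phrasethree-11.txt', 'notes.txt');

my @parsed;
for my $name (@samples) {
    # In list context a successful match returns the three captures.
    if (my ($speaker, $phrase, $id) = $name =~ $entry_re) {
        push @parsed, "$speaker|$phrase|$id";
    }
}

print "$_\n" for @parsed;
```

Names that do not follow the [speakerId]-[phrase]-[id].txt rule, such as notes.txt here, simply fail the match and are skipped.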
You can assert exclusions using a negative look-ahead. For instance, to match every entry whose phrase is not phrasetwo, modify the above expression as follows:
(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)
Note how I include (?!phrasetwo). Alternatively, you can find all phrasethree entries that end in an even number by using a look-behind instead of a look-ahead:
(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)
(?<![13579]) just makes sure the last digit of the ID is even.
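Putting these ideas together for the two lists asked about in the question, one possible sketch follows. It assumes every element is a single file name (so ^ and \z anchors are used), and that the ids are passed as an array @ids rather than as $id1 and $id2. The entry 1-phraseone-10.txt is added to the sample purely to exercise the 10-versus-1 case. None of those choices come from the original post.

```perl
use strict;
use warnings;

# Sample entries: mostly taken from the question's list; 1-phraseone-10.txt is
# a made-up addition to show that id 10 is not confused with id 1.
my @AllEntries = qw(
    1-phraseone-2.txt 10-phraseone-10.txt 2-phraseone-1.txt
    11-phraseone-3.txt 1-phrasetwo-1.txt 1-phraseone-10.txt
);

# Hypothetical inputs for the example.
my ($speakerId, $phrase, @ids) = (1, 'phraseone', 1, 2);

# Alternation of the ids, e.g. "1|2"; quotemeta guards against odd characters.
my $id_alt = join '|', map { quotemeta($_) } @ids;

# A full name that has this speaker, this phrase, and one of the listed ids.
my $excluded = qr/\Q$speakerId\E-\Q$phrase\E-(?:$id_alt)\.txt\z/;

# (1) any speaker with $phrase, unless the whole name matches $excluded
my @Result1 = grep { /^(?!$excluded)\d+-\Q$phrase\E-\d+\.txt\z/ } @AllEntries;

# (2) this speaker and $phrase, but the id is not one of @ids
my @Result2 = grep {
    /^\Q$speakerId\E-\Q$phrase\E-(?!(?:$id_alt)\.txt\z)\d+\.txt\z/
} @AllEntries;

print "Result1: @Result1\n";
print "Result2: @Result2\n";
```

Anchoring the id alternation with \.txt\z inside the look-ahead is what keeps id 1 from also excluding 10 or 11: the look-ahead only fires when the whole id is one of the listed values.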