
Regex for removing percentage

Tags:

regex

perl

Hi, I would really appreciate some help forming a regex that removes a percentage from the end of a string:

Film name (2009) 58%  ->  Film name (2009)
Film name (2010) 59%  ->  Film name (2010)

The string may or may not have the bracketed year. Before the bracketed year, the film name may be alphanumeric and have multiple words.

I am using 'bulk rename utility' so am looking to fill in the 'match' and 'replace' fields.

The best I could come up with was:

([A-Z][a-z]*) \((\d*)\) (\d*\%) -->  \1 (\2)

though this only seemed to work with single-word film names, and it lost the brackets, so I had to re-add them!

I've googled, and every expression I try fails in 'Bulk Rename Utility', which I believe is based on PCRE (Bulk Rename Utility).

user1709655 asked Feb 20 '23 00:02


2 Answers

This is very simply done with

s/\s*\d+%$//

which removes a trailing string of digits followed by a percent sign, together with any preceding whitespace characters.

use strict;
use warnings;

while (<DATA>) {
  s/\s*\d+%$//;
  print;
}

__DATA__
Film name (2009) 58%
Film name (2010) 59%

output

Film name (2009)
Film name (2010)
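
Since the question says the bracketed year is optional and names can have several words, here is a minimal sketch that runs the same substitution over a few more shapes of input. The helper name `strip_trailing_percent` and the sample titles are invented for illustration; they are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing "NN%" plus any whitespace before it.
sub strip_trailing_percent {
    my ($name) = @_;
    $name =~ s/\s*\d+%$//;
    return $name;
}

# Sample names are made up for this sketch.
my @names = (
    'Film name (2009) 58%',    # year and percentage
    'Another Film Name 7%',    # no bracketed year
    '100% Wolf (2020) 71%',    # a percentage inside the title as well
    'No rating here (2011)',   # nothing to remove
);

print strip_trailing_percent($_), "\n" for @names;
```

Because of the `$` anchor, only the final percentage is removed, so the `100%` at the start of the third name is left alone and the last name passes through unchanged.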
Borodin answered Feb 21 '23 14:02


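To avoid replacing the wrong things, match

\b(100|\d{1,2})%

and replace it with nothing.

The leading \b makes the number start at a word boundary (i.e. 30% is matched but w30% is not), and the alternation accepts only 100 or the numbers 0-99. No \b is needed after the %: since % is not a word character, a \b there would only match when a letter, digit or underscore follows, so it would fail at the end of the string.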

EDIT:

If the % is the last character of the string, you can get a better result with

\b(100|\d{1,2})%$

This way you match only the percentage at the end of the line and avoid removing numbers followed by % from the title of the film.
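
To make the difference between the unanchored and the end-anchored pattern concrete, here is a small sketch in Perl. The function names and the sample title are assumptions made for illustration. The patterns use a non-capturing group and no `\b` after the `%`, because a `%` followed by a space or by the end of the string is not a word boundary.

```perl
use strict;
use warnings;

# Unanchored: removes the first 0-100 percentage that starts at a word boundary.
sub strip_any_percent {
    my ($s) = @_;
    $s =~ s/\b(?:100|\d{1,2})%//;
    return $s;
}

# Anchored: removes a percentage only when it ends the string.
sub strip_final_percent {
    my ($s) = @_;
    $s =~ s/\b(?:100|\d{1,2})%$//;
    return $s;
}

my $title = '100% Wolf (2020) 71%';    # invented sample title

# Brackets make leading and trailing spaces visible.
print '[', strip_any_percent($title),   "]\n";
print '[', strip_final_percent($title), "]\n";
print '[', strip_any_percent('w30%'),   "]\n";
```

The unanchored version removes the `100%` that belongs to the title, while the anchored one removes only the final `71%`. Note that neither pattern consumes the space in front of the percentage, so the anchored result keeps a trailing space.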

If the string is a filename and you have to replace the whole name rather than just remove part of the title, you can match

(.+?)(100|[0-9]{1,2})%$ #I think using 0-9 is accepted by more languages

and replace with

$1
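
As a rough sketch of this capture-and-replace approach in Perl: the function name `keep_title` is hypothetical, and the `^` anchor and the `\s*` before the number are additions assumed here so that the space in front of the percentage is dropped too.

```perl
use strict;
use warnings;

# Hypothetical helper: match the whole name, keep only the captured title.
sub keep_title {
    my ($filename) = @_;
    # (.+?) is lazy, so it stops as soon as the rest of the pattern
    # (optional whitespace, a 0-100 number, "%", end of string) can match.
    $filename =~ s/^(.+?)\s*(?:100|[0-9]{1,2})%$/$1/;
    return $filename;
}

print keep_title('Film name (2009) 58%'), "\n";
print keep_title('Film name (2010) 59%'), "\n";
print keep_title('Plain name'), "\n";    # no percentage: left unchanged
```

When the pattern does not match, the substitution does nothing and the name is returned as it was.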

In Perl, \1 and \2 should not be used in a replacement expression. Inside a pattern they are backreferences that match what the first and second captures matched. $1 and $2 are variables that contain what the first and second captures matched, so use those in the replacement instead.
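
The two roles can be seen side by side in a short Perl sketch. The function names and the doubled-word sample are made up for illustration: `\1` appears inside the pattern as a backreference, and `$1` appears in the replacement as the captured text.

```perl
use strict;
use warnings;

# Inside the pattern, \1 must match the same text that group 1 captured,
# so this detects a word that is immediately repeated.
sub has_doubled_word {
    my ($s) = @_;
    return $s =~ /\b(\w+) \1\b/ ? 1 : 0;
}

# In the replacement, $1 holds the captured word, so the pair
# collapses to a single copy.
sub collapse_doubled_word {
    my ($s) = @_;
    $s =~ s/\b(\w+) \1\b/$1/;
    return $s;
}

print has_doubled_word('Film Film name (2009)'), "\n";
print has_doubled_word('Film name (2009)'), "\n";
print collapse_doubled_word('Film Film name (2009)'), "\n";
```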

Gabber answered Feb 21 '23 14:02