I have a large amount of data that I need to filter using Regex. The data consists of strings that look like this:
60166213
60173866-4533
60167323-9439-1259801
NL170-2683-1262201
60174710-1-A12-4
I need them to look like this:
60166213
60173866-4533
60167323-9439
NL170-2683
60174710-1
How can I use regex to remove the 2nd dash and everything after it? The number of dashes varies, and strings that contain no more than one dash must be kept as is.
You can use a simple regex like this:
(.*?-.*?)-.*
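To see why this works: each lazy `.*?` stops at the first dash it can, so the group captures everything before the second dash and the trailing `-.*` consumes the rest. A string with fewer than two dashes does not match at all, so a substitution leaves it untouched. Here is a minimal Python sketch of that behaviour, using the sample strings from the question (`strip_after_second_dash` is just an illustrative helper name, not part of any library):

```python
import re

# Same pattern as above: group 1 holds everything before the second dash.
PATTERN = re.compile(r"(.*?-.*?)-.*")

def strip_after_second_dash(s):
    # Strings with fewer than two dashes do not match and come back unchanged.
    return PATTERN.sub(r"\1", s)

samples = [
    "60166213",
    "60173866-4533",
    "60167323-9439-1259801",
    "NL170-2683-1262201",
    "60174710-1-A12-4",
]

for s in samples:
    print(s, "->", strip_after_second_dash(s))
```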
You can use a regex tester's code generator to get the equivalent code for other languages. For PHP, which uses the PCRE (Perl Compatible Regular Expressions) engine, you can use:
$re = '/(.*?-.*?)-.*/';
$str = "60166213\n\n60173866-4533\n\n60167323-9439-1259801\n\nNL170-2683-1262201\n\n60174710-1-A12-4";
// Single-quoted '$1' refers to group 1; in a double-quoted string, "\1" would be an octal escape.
$subst = '$1';
$result = preg_replace($re, $subst, $str);
In Python:
import re
results = [re.sub(r"^([^-]+(?:-[^-]+)?).*$", r"\1", data) for data in datum]  # datum: your list of input strings
Explained
re.compile("""
^ # assert beginning of string
( # begin capturing group
[^-]+ # one or more non-hyphen characters
(?: # begin non-capturing group
- # literal hyphen
[^-]+ # followed by one or more non-hyphen characters
)? # end non-capturing group, and allow 1 or 0 of them
) # end capturing group
.* # match the rest of the string
$ # assert end of string""", re.X)
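As a quick check, the sketch below compiles the verbose pattern and applies it to the sample strings from the question. The variable names `pattern` and `datum` are only illustrative, and the optional group is written on one line for brevity.

```python
import re

# Verbose form of the pattern explained above.
pattern = re.compile(r"""
    ^              # assert beginning of string
    (              # begin capturing group
      [^-]+        # one or more non-hyphen characters
      (?:-[^-]+)?  # optionally: a hyphen plus one or more non-hyphen characters
    )              # end capturing group
    .*             # match the rest of the string
    $              # assert end of string
""", re.X)

datum = [
    "60166213",
    "60173866-4533",
    "60167323-9439-1259801",
    "NL170-2683-1262201",
    "60174710-1-A12-4",
]

# Replace each whole string with just the captured prefix.
results = [pattern.sub(r"\1", data) for data in datum]
print(results)
# ['60166213', '60173866-4533', '60167323-9439', 'NL170-2683', '60174710-1']
```

Note that this pattern expects the string to start with a non-hyphen character; a string beginning with a dash would not match and would be returned unchanged.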