I am trying to extract words [a-zA-Z]+ with one constraint: a word must contain at least one lower case letter AND at least one upper case letter (in any position within the word). Example: if input is hello 123 worLD, the only match should be worLD.
I tried to use positive lookaheads like this:
echo "hello 123 worLD" | grep -oP "(?=.*[a-z])(?=.*[A-Z])[a-zA-Z]+"
hello
This is not correct: the only match is hello instead of worLD. Then I tried this:
echo "hello 123 worLD" | grep -oP "\K((?=.*[a-z])(?=.*[A-Z])[a-zA-Z]+)"
hello
worLD
This is still incorrect: hello should not be matched.
The .* in the lookaheads checks for the presence of the letter not only in the adjacent word, but anywhere later in the string. Use [a-zA-Z]* instead, so that each lookahead can only scan letters of the current word:
echo "hello 123 worLD" | grep -oP "\\b(?=[A-Za-z]*[a-z])(?=[A-Za-z]*[A-Z])[a-zA-Z]+"
I also added a word boundary \b at the start so that the lookahead check was only performed after a word boundary.
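To see the effect of restricting the lookaheads to [A-Za-z]*, here is a minimal sketch that wraps the pattern in a small function and runs it on two inputs. The function name mixed_case_words is hypothetical (it is not part of the original answer), and the sketch assumes GNU grep built with PCRE support (the -P option).

```shell
#!/bin/sh
# Hypothetical helper, not from the original answer: prints every word that
# contains both a lowercase and an uppercase letter, one per line.
# Assumes GNU grep with PCRE support (-P).
mixed_case_words() {
  # Single quotes keep the shell from touching the backslash in \b.
  printf '%s\n' "$1" | grep -oP '\b(?=[A-Za-z]*[a-z])(?=[A-Za-z]*[A-Z])[a-zA-Z]+'
}

mixed_case_words "hello 123 worLD"           # prints: worLD
mixed_case_words "ALLCAPS fooBar lower Baz"  # prints: fooBar and Baz, one per line
```

Because each lookahead can only consume letters, it stops at the first space or digit, so hello is no longer accepted on the strength of the uppercase letters in worLD.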
Answer:
echo "hello 123 worLD" | grep -oP "\b(?=[A-Z]+[a-z]|[a-z]+[A-Z])[a-zA-Z]*"
Demo: https://ideone.com/HjLH5o
Explanation:
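The lookahead first checks whether the word starts with one or more uppercase letters followed by a lowercase letter, or vice versa (one or more lowercase letters followed by an uppercase letter). The rest of the pattern then matches any number of lowercase and uppercase letters in any order.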
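As a quick check of that explanation, the sketch below runs the alternation-based pattern on a few words covering each case. The function name mixed_case_alt is hypothetical, and GNU grep with PCRE support (-P) is assumed.

```shell
#!/bin/sh
# Hypothetical helper, not from the original answer: same task, using the
# single lookahead with an alternation. Assumes GNU grep with PCRE support (-P).
mixed_case_alt() {
  printf '%s\n' "$1" | grep -oP '\b(?=[A-Z]+[a-z]|[a-z]+[A-Z])[a-zA-Z]*'
}

mixed_case_alt "hello 123 worLD"      # prints: worLD
mixed_case_alt "ABc abC ABC abc aBc"  # prints: ABc, abC and aBc, one per line
```

A word made only of letters begins with a run of one case. If the word is mixed-case, that run must be followed directly by a letter of the other case, which is exactly what the two alternatives test.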
Performance:
This solution takes 31 steps to reach the match on the provided test string, while the accepted solution takes 47 steps.