Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Returning all characters before the first underscore

Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the string that is being returned to be in all uppercase and without any non-alpanumeric characters.

For example:

AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1

I am pretty sure I know how to return a string in all uppercase using string.upper() but I'm sure there are several ways to remove the . efficiently. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.

To further clarify, my above examples aren't the actual strings. The actual string would look like:

AG.av08_binloop_v6

With my desired output looking like:

AGAV08

And the next example would be the same. String:

TL.av1_binloopv2

Desired output:

TLAV1

Again, thanks all for the help!

like image 844
durandal Avatar asked Sep 21 '10 16:09

durandal


4 Answers

Even without re:

text.split('_', 1)[0].replace('.', '').upper()
like image 131
eumiro Avatar answered Nov 11 '22 17:11

eumiro


Try this:

re.sub("[^A-Z\d]", "", re.search("^[^_]*", str).group(0).upper())
like image 42
Gumbo Avatar answered Nov 11 '22 17:11

Gumbo


Since everyone is giving their favorite implementation, here's mine that doesn't use re:

>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print ''.join(c for c in s.split('_',1)[0] if c.isalnum()).upper()
...
AGAV08
TLAV1

I put .upper() on the outside of the generator so it is only called once.

like image 3
Steven Rumbalski Avatar answered Nov 11 '22 19:11

Steven Rumbalski


You don't have to use re for this. Simple string operations would be enough based on your requirements:

tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1
"""

for t in tests.splitlines(): 
     print t[:t.find('_')].replace('.', '').upper()

# Returns:
# AGAV08
# TLAV1

Or if you absolutely must use re:

import re 

pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)

for t in tests.splitlines():
    print re.sub(r'\.', '', pat_re.findall(t)[0]).upper()

# Returns:
# AGAV08
# TLAV1
like image 2
jathanism Avatar answered Nov 11 '22 19:11

jathanism