Returning all characters before the first underscore

Question

Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the string that is being returned to be in all uppercase and without any non-alpanumeric characters.

For example:

AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1

I am pretty sure I know how to return a string in all uppercase using string.upper() but I'm sure there are several ways to remove the . efficiently. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.

To further clarify, my above examples aren't the actual strings. The actual string would look like:

AG.av08_binloop_v6

With my desired output looking like:

AGAV08

And the next example would be the same. String:

TL.av1_binloopv2

Desired output:

TLAV1

Again, thanks all for the help!

eumiro · Accepted Answer

Even without re:

text.split('_', 1)[0].replace('.', '').upper()

Gumbo · Answer

Try this:

re.sub("[^A-Z\d]", "", re.search("^[^_]*", str).group(0).upper())

Steven Rumbalski · Answer

Since everyone is giving their favorite implementation, here's mine that doesn't use re:

>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print ''.join(c for c in s.split('_',1)[0] if c.isalnum()).upper()
...
AGAV08
TLAV1

I put .upper() on the outside of the generator so it is only called once.

jathanism · Answer

You don't have to use re for this. Simple string operations would be enough based on your requirements:

tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1
"""

for t in tests.splitlines(): 
     print t[:t.find('_')].replace('.', '').upper()

# Returns:
# AGAV08
# TLAV1

Or if you absolutely must use re:

import re 

pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)

for t in tests.splitlines():
    print re.sub(r'\.', '', pat_re.findall(t)[0]).upper()

# Returns:
# AGAV08
# TLAV1

Returning all characters before the first underscore

Tags:

python

string

regex

durandal

4 Answers

eumiro

Gumbo

Steven Rumbalski

jathanism

Recent Activity

Donate For Us

Returning all characters before the first underscore

Tags:

python

string

regex

durandal

4 Answers

eumiro

Gumbo

Steven Rumbalski

jathanism

Related questions

Recent Activity

Donate For Us