Parse 4th capital letter of line in Python?

Question

How can I parse lines of text from the 4th occurrence of a capital letter onward? For example given the lines:

adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj
oiwuewHsajlkjfasNasldjgalskjgasdIasdllksjdgaPlsdakjfsldgjQ

I would like to capture:

`ZsdalkjgalsdkjTlaksdjfgasdkgj`
`PlsdakjfsldgjQ`

I'm sure there is probably a better way than regular expressions, but I was attempting a non-greedy match, something like this:

match = re.search(r'[A-Z].*?$', line).group()
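A quick check, using the first example line, shows why the attempt above does not isolate the 4th capital. `re.search` returns the leftmost possible match, so the pattern starts at the first capital (`Y`). Because of the `$` anchor, the lazy `.*?` must then stretch to the end of the line anyway.

```python
import re

line = 'adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj'

# The leftmost match begins at the first capital 'Y'; the lazy .*? is
# forced to expand all the way to $, so nothing is skipped.
attempt = re.search(r'[A-Z].*?$', line).group()
print(attempt)  # YasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj
```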

NPE · Accepted Answer

I present two approaches.

Approach 1: all-out regex

In [1]: import re

In [2]: s = 'adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj'

In [3]: re.match(r'(?:.*?[A-Z]){3}.*?([A-Z].*)', s).group(1)
Out[3]: 'ZsdalkjgalsdkjTlaksdjfgasdkgj'

The `.*?[A-Z]` consumes characters up to, and including, the first uppercase letter.

The `(?:...){3}` repeats the above three times without creating any capture groups.

The following `.*?` matches the remaining characters before the fourth uppercase letter.

Finally, the `([A-Z].*)` captures the fourth uppercase letter and everything that follows into a capture group.
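The same pattern can be parameterised for any `n` by repeating the non-capturing group `n - 1` times. The sketch below does this with a hypothetical helper, `from_nth_capital_regex`. Unlike the one-liner above, it checks for a failed match, because `re.match` returns `None` when a line has fewer than `n` capitals, and calling `.group(1)` on `None` raises `AttributeError`. It assumes `n >= 1` and, like the original pattern, counts only ASCII `A-Z`.

```python
import re

def from_nth_capital_regex(line, n=4):
    """Return `line` from its n-th ASCII uppercase letter onward, or None.

    Hypothetical helper: builds (?:.*?[A-Z]){n-1}.*?([A-Z].*) for the given n.
    """
    pattern = r'(?:.*?[A-Z]){%d}.*?([A-Z].*)' % (n - 1)
    m = re.match(pattern, line)
    # re.match gives None if there are fewer than n capitals; guard before .group()
    return m.group(1) if m else None

lines = [
    'adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj',
    'oiwuewHsajlkjfasNasldjgalskjgasdIasdllksjdgaPlsdakjfsldgjQ',
]
for line in lines:
    print(from_nth_capital_regex(line))
# prints ZsdalkjgalsdkjTlaksdjfgasdkgj, then PlsdakjfsldgjQ
```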

Approach 2: simpler regex

In [1]: import re

In [2]: s = 'adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj'

In [3]: ''.join(re.findall(r'[A-Z][^A-Z]*', s)[3:])
Out[3]: 'ZsdalkjgalsdkjTlaksdjfgasdkgj'

This attacks the problem directly, and I think is easier to read.
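A closely related variant records only the start offset of each capital with `re.finditer` and slices the line once, instead of building and joining the intermediate pieces. This is an editor's sketch, not part of the answer above, and `from_nth_capital_finditer` is a hypothetical name. It assumes `n >= 1` and, when there are fewer than `n` capitals, returns `''`, which matches what the `findall`/`join` one-liner produces in that case.

```python
import re

def from_nth_capital_finditer(line, n=4):
    """Slice `line` at its n-th ASCII uppercase letter ('' if there is none)."""
    # Start offset of every capital letter, in left-to-right order.
    starts = [m.start() for m in re.finditer(r'[A-Z]', line)]
    return line[starts[n - 1]:] if len(starts) >= n else ''

s = 'adsgasdlkgasYasdgjaUUalsdkjgaZsdalkjgalsdkjTlaksdjfgasdkgj'
print(from_nth_capital_finditer(s))  # ZsdalkjgalsdkjTlaksdjfgasdkgj

# Agrees with the findall/join one-liner.
assert from_nth_capital_finditer(s) == ''.join(re.findall(r'[A-Z][^A-Z]*', s)[3:])
```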

jsbueno · Answer

Not using regular expressions may seem too verbose, although at the bytecode level it is a very simple algorithm, and therefore lightweight.

It may be that regexps are faster, since they are implemented in native code, but the "one obvious way to do it", though boring, certainly beats any suitable regexp in readability hands down:

def find_capital(string, n=4):
    count = 0
    for index, letter in enumerate(string):
        # The boolean value counts as 0 for False or 1 for True
        count += letter.isupper()  
        if count == n:
            return string[index:]
    return ""
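The same loop idea can also be written with `itertools.islice`, which skips the first `n - 1` capital indices and stops at the n-th, so it still exits early like the loop above. The sketch below (`find_capital_next` is a hypothetical name, and `n >= 1` is assumed) also shows one behavioural difference from the regex answers: `str.isupper` counts non-ASCII capitals such as `É`, whereas the character class `[A-Z]` does not.

```python
import itertools
import re

def find_capital_next(string, n=4):
    """Return `string` from its n-th uppercase character onward, or ''."""
    # Lazily yield indices of characters for which str.isupper() is True.
    capitals = (i for i, ch in enumerate(string) if ch.isupper())
    # Skip the first n - 1 indices; next() returns the n-th, or None if absent.
    index = next(itertools.islice(capitals, n - 1, None), None)
    return string[index:] if index is not None else ""

print(find_capital_next('oiwuewHsajlkjfasNasldjgalskjgasdIasdllksjdgaPlsdakjfsldgjQ'))
# prints PlsdakjfsldgjQ

# 'É' counts for isupper() but not for [A-Z]: 4 capitals versus 3.
word = 'aÉbCdEfGh'
print(find_capital_next(word))                           # Gh
print(repr(''.join(re.findall(r'[A-Z][^A-Z]*', word)[3:])))  # ''
```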

Tags:

python

drbunsen
