Python

Question

I am using Python 2.7 and have this in my code: regexp = re.compile('ttp_ws_sm_(\d)_')

In my loop, that pattern captures the digits after the third underscore, as in my sample data. I also need to do the same for strings matching 'ttpv1_(\d+)_'.

The two things I have tried are:

regexp = re.compile('ttp_ws_sm_(\d)_' or 'ttpv1_(\d+)_')

and

name = ('ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_')
regexp = re.compile(name)

Here's some example data:

sample file header row
date,ttp_ws_sm_001_01, , , , , , , , , , , ,117
date,ttp_ws_sm_001_blank, , , , , , , , , , , ,31
date,ttp_ws_sm_045_01, , , , , , , , , , , ,145
date,ttp_ws_sm_045_blank, , , , , , , , , , , ,55
date,ttp_ws_sm_057_blank, , , , , , , , , , , ,98
date,ttpv1_001_, , , , , , , , , , , ,67
date,ttpv1_001_01, , , , , , , , , , , ,67

complete code is:

from collections import defaultdict

import sys
import csv
import re
import os

#variables
output_path = '\\Isfs\data$\GIS Carto\TTP_Draw_Count'
source = '\\Isfs\data$\GIS Carto\TTP_Draw_Count'
name = ('ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_')

def main():
    result = defaultdict(int)
    regexp = re.compile(name)

    with open(os.path.join(source, 'TTP_13_08.csv'), 'r') as f:
        rows = csv.reader(f)

        for row in rows:
            match = regexp.search(row[1])
            if match:
                result[match.group(1)] += int(row[13])

    for key, value in result.items():
        print("Club %s %s" % (key, value))

if __name__ == '__main__':
    main()

If I drop name and put either one of the two strings directly in the compile statement, I only get one set of totals. I need both sets combined and printed per number, e.g. '001', '045'.

flornquake · Accepted Answer

If I understand you correctly, you want a regex that matches either 'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_'?

You can use the pipe character |:

re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')

(?:...) makes the first group non-capturing, so group(1) still refers to the digits.

>>> pattern = re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')
>>> pattern.match('ttpv1_001_').group(1)
'001'
>>> pattern.match('ttp_ws_sm_045_blank').group(1)
'045'

'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_' doesn't work because it is actually the same as 'ttp_ws_sm_(\d+)_'. See Max's answer for an explanation.
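
To tie the combined pattern back to the question's loop, here is a minimal sketch that sums the last column per number. It assumes the name sits in column 1 and the count in column 13, as in the sample data. The rows are inlined rather than read from TTP_13_08.csv, and the helper name sum_by_club is hypothetical.

```python
import csv
import re
from collections import defaultdict

# One pattern covers both prefixes; (?:...) groups without capturing,
# so group(1) is always the run of digits.
PATTERN = re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')

# Inlined copy of the sample rows from the question (assumption: the real
# file has the same layout).
SAMPLE = """\
date,ttp_ws_sm_001_01, , , , , , , , , , , ,117
date,ttp_ws_sm_001_blank, , , , , , , , , , , ,31
date,ttp_ws_sm_045_01, , , , , , , , , , , ,145
date,ttp_ws_sm_045_blank, , , , , , , , , , , ,55
date,ttp_ws_sm_057_blank, , , , , , , , , , , ,98
date,ttpv1_001_, , , , , , , , , , , ,67
date,ttpv1_001_01, , , , , , , , , , , ,67
"""

def sum_by_club(lines):
    """Add up column 13 for every row whose name matches PATTERN."""
    totals = defaultdict(int)
    for row in csv.reader(lines):
        match = PATTERN.search(row[1])
        if match:
            totals[match.group(1)] += int(row[13])
    return dict(totals)

totals = sum_by_club(SAMPLE.splitlines())
for key in sorted(totals):
    print("Club %s %s" % (key, totals[key]))
```

With these sample rows the totals come out as 282 for '001', 200 for '045' and 98 for '057', so the ttp_ws_sm and ttpv1 rows for '001' land in the same bucket.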

Max · Answer

You should read a Python book. You have some severe misunderstandings of the language.

'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_'

is a Boolean expression. Python interprets nonempty strings as truthy so it interprets this as (true thing or true thing). When the first part of a Boolean or is true, Python doesn't even look at the second part and just returns the first. Look:

>>> ('foo' or 'bar') == 'foo'
True

That's why it (accidentally) works inside re.compile: only the first pattern is ever compiled, and the second is silently discarded. Passing a Boolean expression to re.compile doesn't really make sense.
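
The short-circuit behaviour can be checked directly. This is a small sketch; the variable names first, second and chosen are made up for illustration.

```python
import re

first = r'ttp_ws_sm_(\d+)_'
second = r'ttpv1_(\d+)_'

# `or` returns its first truthy operand, so `chosen` is simply `first`.
chosen = first or second
regexp = re.compile(chosen)

print(chosen == first)                       # the second string was never used
print(regexp.pattern)                        # source text of the compiled pattern
print(regexp.search('ttp_ws_sm_045_blank'))  # a match object
print(regexp.search('ttpv1_001_'))           # None: this prefix is not covered
```

This also explains the asker's symptom: the ttpv1 rows are never matched, so only one set of totals is printed.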

Secondly it's not clear what you're even trying to accomplish here. A single regexp might not be appropriate or could require different capture groups.
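
If the two prefixes ever need different rules, one option is to give each alternative its own capture group and read whichever one matched. The sketch below is one possible reading of "different capture groups", not necessarily what was meant; the helper name club_number is hypothetical.

```python
import re

# Each alternative keeps its own capture group, so the two prefixes
# could use different digit rules if they ever need to.
pattern = re.compile(r'ttp_ws_sm_(\d+)_|ttpv1_(\d+)_')

def club_number(name):
    """Return the digits from whichever alternative matched, or None."""
    match = pattern.search(name)
    if match is None:
        return None
    # Only one group takes part in a match; lastindex is its number.
    return match.group(match.lastindex)

print(club_number('ttp_ws_sm_045_blank'))
print(club_number('ttpv1_001_'))
```

For the data in the question the single non-capturing alternation is simpler, since both prefixes are followed by the same kind of digit run.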

Python - can you use re.compile looking for two strings?

Tags:

regex

Mike Hirschmann
