I am using Python 2.7 and have this in my code: regexp = re.compile('ttp_ws_sm_(\d)_')
In my loop it picks out the digits after the third underscore, as in the sample below. I also need to do the same for strings matching 'ttpv1_(\d+)_'.
The two things I have tried are:
regexp = re.compile('ttp_ws_sm_(\d)_' or 'ttpv1_(\d+)_')
and
name = ('ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_')
regexp = re.compile(name)
Here's some example data:
sample file header row
date,ttp_ws_sm_001_01, , , , , , , , , , , ,117
date,ttp_ws_sm_001_blank, , , , , , , , , , , ,31
date,ttp_ws_sm_045_01, , , , , , , , , , , ,145
date,ttp_ws_sm_045_blank, , , , , , , , , , , ,55
date,ttp_ws_sm_057_blank, , , , , , , , , , , ,98
date,ttpv1_001_, , , , , , , , , , , ,67
date,ttpv1_001_01, , , , , , , , , , , ,67
The complete code is:
from collections import defaultdict
import sys
import csv
import re
import os

#variables
output_path = '\\\\Isfs\\data$\\GIS Carto\TTP_Draw_Count'
source = '\\\\Isfs\\data$\\GIS Carto\TTP_Draw_Count'
name = ('ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_')

def main():
    result = defaultdict(int)
    regexp = re.compile(name)
    with open(os.path.join(source, 'TTP_13_08.csv'), 'r') as f:
        rows = csv.reader(f)
        for row in rows:
            match = regexp.search(row[1])
            if match:
                result[match.group(1)] += int(row[13])
    for key, value in result.items():
        print ("Club %s %s" % (key, value))

if __name__ == '__main__':
    main()
If I skip name and put just one of the two strings in the compile statement, I only get one set of totals. I need both sets combined, so that a single total prints for each club number, e.g. '001' and '045'.
If I understand you correctly, you want a regex that matches either 'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_'?
You can use the pipe character |:
re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')
?: makes the first group non-capturing.
>>> pattern = re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')
>>> pattern.match('ttpv1_001_').group(1)
'001'
>>> pattern.match('ttp_ws_sm_045_blank').group(1)
'045'
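To see what ?: changes, here is a small sketch comparing the same pattern with and without it. The variable names are only for illustration. Without ?:, the prefix alternation becomes group 1 and the digits move to group 2, so the question's match.group(1) would return the prefix instead of the number.

```python
import re

# Same alternation, once as a capturing group and once as a non-capturing group.
capturing = re.compile(r'(ttp_ws_sm|ttpv1)_(\d+)_')
non_capturing = re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')

m1 = capturing.match('ttpv1_001_')
m2 = non_capturing.match('ttpv1_001_')

print(m1.groups())  # ('ttpv1', '001')
print(m2.groups())  # ('001',)
```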
'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_' doesn't work because it is actually the same as 'ttp_ws_sm_(\d+)_'. See Max's answer for an explanation.
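Plugged into the loop from the question, the combined pattern might look roughly like the sketch below. It assumes the layout of the sample rows (name in column 1, count in column 13); total_by_club and make_row are hypothetical helpers that build the sample in memory rather than reading TTP_13_08.csv.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r'(?:ttp_ws_sm|ttpv1)_(\d+)_')

def total_by_club(rows, name_col=1, count_col=13):
    """Sum the count column per club number captured from the name column."""
    totals = defaultdict(int)
    for row in rows:
        match = PATTERN.search(row[name_col])
        if match:
            totals[match.group(1)] += int(row[count_col])
    return dict(totals)

def make_row(name, count):
    # Hypothetical helper: 14 columns like the sample data, count in the last one.
    return ['date', name] + [' '] * 11 + [str(count)]

sample_rows = [
    make_row('ttp_ws_sm_001_01', 117),
    make_row('ttp_ws_sm_001_blank', 31),
    make_row('ttp_ws_sm_045_01', 145),
    make_row('ttp_ws_sm_045_blank', 55),
    make_row('ttp_ws_sm_057_blank', 98),
    make_row('ttpv1_001_', 67),
    make_row('ttpv1_001_01', 67),
]

totals = total_by_club(sample_rows)
for club in sorted(totals):
    print("Club %s %s" % (club, totals[club]))
```

With the sample rows, both naming schemes feed the same '001' total (117 + 31 + 67 + 67 = 282).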
You should read a Python book. You have some severe misunderstandings of the language.
'ttp_ws_sm_(\d+)_' or 'ttpv1_(\d+)_'
is a Boolean expression. Python interprets nonempty strings as truthy so it interprets this as (true thing or true thing). When the first part of a Boolean or is true, Python doesn't even look at the second part and just returns the first. Look:
>>> ('foo' or 'bar') == 'foo'
True
That's why it (accidentally) works inside re.compile. Passing a Boolean expression to re.compile doesn't really make sense.
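A short sketch of what that means for the question's code. The names first and second are just for illustration. The or expression evaluates to the first string, so the compiled pattern never sees the ttpv1 alternative.

```python
import re

first = r'ttp_ws_sm_(\d+)_'
second = r'ttpv1_(\d+)_'

# `or` returns its first truthy operand, so `second` is never used.
name = first or second
regexp = re.compile(name)

print(name == first)                                   # True
print(regexp.search('ttp_ws_sm_045_01') is not None)   # True
print(regexp.search('ttpv1_001_01'))                   # None
```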
Secondly it's not clear what you're even trying to accomplish here. A single regexp might not be appropriate or could require different capture groups.
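If the two name formats ever needed different handling, one possible alternative to a single regexp is to keep one compiled pattern per format and try them in order. This is only a sketch; PATTERNS and club_number are hypothetical names, not anything from the question.

```python
import re

# Hypothetical list: one compiled pattern per naming scheme, tried in order.
PATTERNS = [
    re.compile(r'ttp_ws_sm_(\d+)_'),
    re.compile(r'ttpv1_(\d+)_'),
]

def club_number(name):
    """Return the digits captured by the first pattern that matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(name)
        if match:
            return match.group(1)
    return None

print(club_number('ttp_ws_sm_045_blank'))  # 045
print(club_number('ttpv1_001_'))           # 001
print(club_number('header'))               # None
```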