Regex Python / group quantifiers

I want to match a list of variables which look like directories, e.g.:

Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34

The number of "subdirectories" varies but has an upper bound (9 in the examples above). I want to capture every subdirectory except the first one, which I called "Same" above.

The best I could come up with is:

^(?:([^/]+)/){4,8}([^/]+)=(.*)

It already matches 4-8 subdirectories, but it only captures the last one. Why is that? Is there a better solution using group quantifiers?
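This is standard behavior of Python's `re` module: when a capturing group is repeated by a quantifier, the match object keeps only the text of the last repetition. A quick check with the regex and the first example from the question (a sketch; the variable names are illustrative) shows this, and shows that `split()` keeps every component:

```python
import re

s = 'Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'

# The regex from the question: group 1 sits inside the {4,8} quantifier,
# so each repetition overwrites it and only the last one survives.
m = re.match(r'^(?:([^/]+)/){4,8}([^/]+)=(.*)', s)
print(m.groups())  # ('Temperature', 'Value', '4.123')

# split() keeps every component: separate the value, then drop the first directory.
key, value = s.split('=', 1)
parts = key.split('/')[1:]
print(parts, value)
```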

Edit: Solved. Will use split() instead.

asked Dec 13 '25 at 00:12 by runDOSrun


2 Answers

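import re

# A match starts at the beginning of the string or right after a '/',
# and extends lazily up to the next '/' or the end of the string
regx = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')


for ss in ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
           'Same/Same2/Battery/Name=SomeString',
           'Same/Same2/Home/Land/Some/More/Stuff=0.34'):

    print(ss)
    print(regx.findall(ss))
    print()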
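Note that `findall` returns every component, including the leading "Same" and the trailing "name=value" piece. As a sketch (the comparison pattern `[^/]+` is an editorial assumption, not part of the answer), for inputs like these, with no empty components and no newlines, the lookaround pattern finds the same pieces as a plain "run of non-slash characters" pattern, and slicing with `[1:]` drops the first directory:

```python
import re

samples = ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
           'Same/Same2/Battery/Name=SomeString',
           'Same/Same2/Home/Land/Some/More/Stuff=0.34')

lookaround = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')
simple = re.compile(r'[^/]+')  # runs of characters that are not '/'

for ss in samples:
    pieces = lookaround.findall(ss)
    # True when both patterns agree; [1:] drops the leading "Same"
    print(pieces == simple.findall(ss), pieces[1:])
```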

Edit 1

Now that you have given more information on what you want to obtain (_"Same/Same2/Battery/Name=SomeString becoming SAME2_BATTERY_NAME=SomeString"_), better solutions can be proposed: either with a regex, or with split() plus replace():

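import re
from os import sep

# Escape the separator for use inside a regex: '\\' on Windows, '/' elsewhere
sep2 = r'\\' if sep == '\\' else '/'

pat = '^(?:.+?%s)(.+$)' % sep2
print('pat==%s\n' % pat)

ragx = re.compile(pat)

# Raw strings, so that sequences such as '\N' are not read as escapes
for ss in (r'Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Home\Land\Some\More\Stuff=0.34'):

    print(ss)
    print(ragx.match(ss).group(1).replace(sep, '_'))  # regex: drop the first component
    print(ss.split(sep, 1)[1].replace(sep, '_'))      # split: same result without a regex
    print()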

Result, on a Windows platform (where os.sep is a backslash):

pat==^(?:.+?\\)(.+$)

Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123

Same\Same2\Battery\Name=SomeString
Same2_Battery_Name=SomeString
Same2_Battery_Name=SomeString

Same\Same2\Home\Land\Some\More\Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34

Edit 2

Re-reading your comment, I realized that I hadn't taken into account that you want to uppercase the part of the string before the '=' sign, but not the part after it.

Hence this new code, which shows 3 methods that meet this requirement. Choose whichever you prefer:

import re
from os import sep

# Escape the separator for use inside a regex: '\\' on Windows, '/' elsewhere
sep2 = r'\\' if sep == '\\' else '/'

# Two groups: the path after the first component, and the value after the last '='
pot = '^(?:.+?%s)(.+?)=([^=]*$)' % sep2
print('pot==%s\n' % pot)
rogx = re.compile(pot)

# One group that stops just before the last '=' (lookahead)
pet = '^(?:.+?%s)(.+?(?==[^=]*$))' % sep2
print('pet==%s\n' % pet)
regx = re.compile(pet)


# Raw strings, so that '\N' and '\2' are not read as escapes
for ss in (r'Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Ocean\Atlantic\North=',
           r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh'):
    print(ss + '\n' + len(ss) * '-')

    print('rogx groups  '.rjust(32), rogx.match(ss).groups())

    # Drop the first component, then split on the LAST '='
    a, b = ss.split(sep, 1)[1].rsplit('=', 1)
    print('split split  '.rjust(32), (a, b))
    print('split split join upper replace   %s=%s' % (a.replace(sep, '_').upper(), b))

    print('regx split group  '.rjust(32), regx.match(ss.split(sep, 1)[1]).group())
    print('regx split sub  '.rjust(32),
          regx.sub(lambda m: m.group(1).replace(sep, '_').upper(), ss))
    print()

Result, on a Windows platform:

pot==^(?:.+?\\)(.+?)=([^=]*$)

pet==^(?:.+?\\)(.+?(?==[^=]*$))

Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123
----------------------------------------------
                   rogx groups   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
                   split split   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split join upper replace   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
              regx split group   Same2\Foot\Ankle\Joint\Sensor\Value
                regx split sub   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123

Same\Same2\Battery\Name=SomeString
----------------------------------
                   rogx groups   ('Same2\\Battery\\Name', 'SomeString')
                   split split   ('Same2\\Battery\\Name', 'SomeString')
split split join upper replace   SAME2_BATTERY_NAME=SomeString
              regx split group   Same2\Battery\Name
                regx split sub   SAME2_BATTERY_NAME=SomeString

Same\Same2\Ocean\Atlantic\North=
--------------------------------
                   rogx groups   ('Same2\\Ocean\\Atlantic\\North', '')
                   split split   ('Same2\\Ocean\\Atlantic\\North', '')
split split join upper replace   SAME2_OCEAN_ATLANTIC_NORTH=
              regx split group   Same2\Ocean\Atlantic\North
                regx split sub   SAME2_OCEAN_ATLANTIC_NORTH=

Same\Same2\Maths\Addition\2+2=4\Simple=ohoh
-------------------------------------------
                   rogx groups   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
                   split split   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split join upper replace   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
              regx split group   Same2\Maths\Addition\2+2=4\Simple
                regx split sub   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
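The question's own variables use '/' whatever the platform, so `os.sep` is not needed for them. A minimal sketch of the split method with the separator passed in explicitly; `flatten` is a hypothetical helper name, not from the answer:

```python
def flatten(ss, sep='/'):
    """Hypothetical helper: drop the first path component, replace the
    remaining separators with '_' and uppercase everything before the LAST '='."""
    path, value = ss.split(sep, 1)[1].rsplit('=', 1)
    return '%s=%s' % (path.replace(sep, '_').upper(), value)

for ss in ('Same/Same2/Battery/Name=SomeString',
           'Same/Same2/Ocean/Atlantic/North=',
           'Same/Same2/Maths/Addition/2+2=4/Simple=ohoh'):
    print(flatten(ss))
```

Splitting on the last '=' with `rsplit('=', 1)` is what keeps the '=' inside "2+2=4" as part of the name.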
answered Dec 14 '25 at 14:12 by eyquem


I may have misunderstood exactly what you want to do, but here is how you could do it without a regex:

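# The sample variables from the question
list_of_vars = ['Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
                'Same/Same2/Battery/Name=SomeString',
                'Same/Same2/Home/Land/Some/More/Stuff=0.34']

for entry in list_of_vars:
    key, value = entry.split('=', 1)  # split only on the first '='
    key_components = key.split('/')
    if 4 <= len(key_components) <= 8:
        # here the actual work is done: drop the first component, join the rest with '_'
        print("%s=%s" % ('_'.join(key_components[1:]).upper(), value))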
