Python : regex lookbehind get word after single or double quotes

I have a file with the contents shown below. I am trying to extract the word that follows "-x" on each line and, in the end, keep only the unique values. I tried the regex below, but the output contains only single and double quote characters. When I use a regex that handles only double quotes, I get the expected words.

File Content

00 04 * * 2-6   testuser   /get_results.sh -q -x 'igp_srm_m' -s 'yesterday' -e 'yesterday' -m '2048' -b >>'/var/log/process/srm-console.log' 2>&1
00 10 * * 2-6   testuser   /get_results.sh -q -x 'igp_srm_m' -s 'yesterday' -e 'yesterday' -m '2048' -w '720' >>'/var/log/process/srm-console.log' 2>&1

00 08 * * 1-5   testuser   /get_results.sh -q -x "igp_france" -s "today" -e "today" -m "90000" -b -z partA >>"/var/log/process/france-partA-console.log" 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igp_france" -s "yesterday" -e "yesterday" -m "90000" -w "900" -z partA >>"/var/log/process/france-partA-console.log" 2>&1

00 08 * * 1-5   testuser   /get_results.sh -q -x "igp_france" -s "today" -e "today" -m "90000" -b -z partB >>"/var/log/process/france-partB-console.log" 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igp_france" -s "yesterday" -e "yesterday" -m "90000" -w "900" -z partB >>"/var/log/process/france-partB-console.log" 2>&1

00 12 * * 2-6   testuser   JAVA_OPTS='-server -Xmx512m' /merge.sh "yesterday" "igp_france" "partA,partB" >>"/var/log/process/france-console.log" 2>&1
00 08 * * 1-5   testuser   /get_results.sh -q -x "igpswitz_france" -s "today" -e "today" -m "15000" -b >>'/var/log/process/igpswitz_france-console.log' 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igpswitz_france" -s "yesterday" -e "yesterday" -m "15000" -Dapc.maxalerts=8000 -w "900" >>'/var/log/process/igpswitz_france-console.log' 2>&1

30 07 * * 2-6   testuser   /get_results.sh -q -x "igp_franced" -s 'yesterday' -e 'yesterday' -m "105000" -b >>"/var/log/process/franced-console.log" 2>&1
15 12 * * 2-6   testuser   /get_results.sh -q -x "igp_franced" -s 'yesterday' -e 'yesterday' -m "105000" -w "960" >>"/var/log/process/franced-console.log" 2>&1

Tried syntax

import re
with open ("test2") as file:
        for line in file:
                try:
                        m=re.search('(?<=\-x (\"|\'))(\w+)',line)
                        print m.group(1)
                except:
                        m = None

Expected output

igp_srm_m
igp_france
igpswitz_france
igp_franced

Received Output

'
'
"
"
"
"
"
"
"
"

I am unsure what is going wrong, because the version that handles only double quotes works correctly.

Working script only for double quotes

import re
with open ("test2") as file:
        for line in file:
                try:
                        m = re.search('(?<=\-x \")(\w*)', line)
                        print m.group(1)
                except:
                        m = None

Received Output - Search for double quotes only

igp_france
igp_france
igp_france
igp_france
igpswitz_france
igpswitz_france
igp_franced
igp_franced
iamsage asked Feb 21 '26 19:02


2 Answers

You can use a set to get the unique values.

In your pattern, the value is in group 2, but the pattern can be simplified. The single and double quote can go in a character class (["']) that is captured in group 1. A backreference \1 then requires the closing quote to be the same character as the opening one.

-x (["'])(\w+)\1
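To see what the backreference adds, here is a minimal sketch. The three sample strings are made up for illustration (the third one, with mismatched quotes, does not occur in the question's file).

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
pattern = re.compile(r"-x ([\"'])(\w+)\1")

samples = [
    "-x 'igp_srm_m' -s 'yesterday'",   # single quotes
    '-x "igp_france" -s "today"',      # double quotes
    "-x 'mismatched\" -s 'today'",     # hypothetical: opened with ' but closed with "
]

matches = [pattern.search(s) for s in samples]
words = [m.group(2) if m else None for m in matches]
print(words)  # ['igp_srm_m', 'igp_france', None]
```

The mismatched sample yields None because no closing single quote follows the word.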


import re

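result = set()  # a set keeps each value only once

# Group 1 is the opening quote, \1 requires the same closing quote,
# and group 2 is the word between them.
pattern = re.compile(r"-x ([\"'])(\w+)\1")

with open("test2") as file:
    for line in file:
        m = pattern.search(line)
        if m:  # lines without a quoted -x value are skipped
            result.add(m.group(2))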

print(result)

Output

{'igp_france', 'igp_srm_m', 'igp_franced', 'igpswitz_france'}
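A set does not preserve order, so the values may print in a different order than the expected output in the question. If first-appearance order matters, one possible variation is to collect the values as dict keys instead. This is only a sketch: the helper name unique_x_values is hypothetical, the sample lines are shortened copies of lines from the question, and it assumes Python 3.7+, where dicts keep insertion order.

```python
import re

# Same pattern as in the answer: group 1 is the quote, group 2 is the word.
PATTERN = re.compile(r"-x ([\"'])(\w+)\1")

def unique_x_values(lines):
    """Return the -x values in order of first appearance, without repeats."""
    seen = {}  # dict keys keep insertion order (Python 3.7+)
    for line in lines:
        m = PATTERN.search(line)
        if m:
            seen.setdefault(m.group(2), None)
    return list(seen)

# Shortened copies of lines from the question's file.
sample_lines = [
    "00 04 * * 2-6 testuser /get_results.sh -q -x 'igp_srm_m' -s 'yesterday'",
    '00 08 * * 1-5 testuser /get_results.sh -q -x "igp_france" -s "today"',
    '00 12 * * 2-6 testuser /get_results.sh -q -x "igp_france" -s "yesterday"',
    "00 12 * * 2-6 testuser JAVA_OPTS='-server -Xmx512m' /merge.sh \"yesterday\"",
    '00 08 * * 1-5 testuser /get_results.sh -q -x "igpswitz_france" -s "today"',
    '30 07 * * 2-6 testuser /get_results.sh -q -x "igp_franced" -s \'yesterday\'',
]

for value in unique_x_values(sample_lines):
    print(value)
```

With these sample lines, the values print as igp_srm_m, igp_france, igpswitz_france, igp_franced, one per line. The merge.sh line is skipped because it has no lowercase "-x " option.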
The fourth bird answered Feb 23 '26 08:02


In

m=re.search('(?<=\-x (\"|\'))(\w+)',line)
print m.group(1)

use group(2) instead of group(1):

m=re.search('(?<=\-x (\"|\'))(\w+)',line)
print m.group(2)

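Trying the pattern on https://regex101.com/ shows that group 1 captures the quote character ('), while group 2 gives the required output. The parenthesized alternation (\"|\') inside the lookbehind is itself a capturing group, so it takes number 1 and (\w+) becomes number 2.

The double-quotes-only version works because its pattern has a single capturing group, so the required output is already in group 1.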
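The group numbering can be checked with a short sketch like the one below. The sample line is a shortened copy of the first line of the question's file. The second pattern is a possible alternative, not something from the question: it swaps the alternation for a character class so that no extra group is created.

```python
import re

line = "/get_results.sh -q -x 'igp_srm_m' -s 'yesterday'"

# The question's pattern: the alternation inside the lookbehind is group 1.
m = re.search(r"(?<=-x (\"|'))(\w+)", line)
print(repr(m.group(1)))  # the quote character captured inside the lookbehind
print(repr(m.group(2)))  # the word that follows it

# Alternative: a character class does not capture, so the whole match
# (group 0) is already the word.
m2 = re.search(r"(?<=-x [\"'])\w+", line)
print(m2.group(0))
```

Here m.group(1) is the single quote and m.group(2) is igp_srm_m, which matches the output the question reports.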

Pratyush Goyal answered Feb 23 '26 08:02


