
issue with Backslash plague and re.split

Tags:

python

regex

csv

Question: Why, as described below, is the code cutting off the last character of some fields?

I have a string that I need to parse, split up and then import as key/value pairs into a dict. The issue is that one field may contain multiple embedded comma-separated subfields, and in those cases there are three backslashes in front of the comma. I have the code 99% working, but for some reason the following code (which I think SHOULD work) strips the last character from all the other fields. I think I understand the "backslash plague" in Python regex, and I tried this several ways, but I cannot find a pattern that both leaves "ConfigChangeData" unsplit and doesn't drop the last character of the other fields.

First, here is the string I start with (in a variable called data):

2015-10-05 18:08:47,186 root         INFO     <181>Oct  5 17:09:10 someservername Administrative_and_Operational_Audit 0000419602 1 0 2015-10-05 17:09:10.841 -05:00 0000006065 52001 NOTICE Configuration-Changes: Changed configuration, Version=someversion.x86_64, ConfigVersionId=150, AdminInterface=GUI, AdminIPAddress=192.168.1.77, AdminSession=46CE916D0502A641592B105FF7CB3B70, AdminName=admin, ConfigChangeData='RADIUS:Shared Secret'='********'\\\,'TACACS+:Shared Secret'='********'\\\,'IP Address'='127.0.0.91/32', ObjectType=Network Device, ObjectName=testclient, ObjectId=4072, inLocalMode=false,

Here is my code:

##split the syslog data into CSV's in a list
#Here be dragons: One field, "ConfigChangeData" can have multiple embedded 
#subfields. This is indicated by three trailing backslashes
#The following line needs to split on commas NOT preceded by a backslash
csvlist=re.split("[^\\\\],", data)
AVPdict=dict()
##Create an Attribute/value pair by analysing the CSV values
##If the CSV value represents an AVP pair (detected by presence of an = sign)
##add it to the AVP dict
for csv in csvlist:
    logger.debug("csv: %s" %(csv))
    if re.search("=", csv):
        csv=csv.strip() # clear out some embedded whitespace
        attribute,value=csv.split("=", 1)
        AVPdict[attribute]=value

Here is output from logging:

2015-10-05 18:08:47,189 root         DEBUG    csv:  Version=someversion.x86_6
2015-10-05 18:08:47,190 root         DEBUG    csv:  ConfigVersionId=15
2015-10-05 18:08:47,190 root         DEBUG    csv:  AdminInterface=GU
2015-10-05 18:08:47,190 root         DEBUG    csv: AdminIPAddress=192.168.7  
2015-10-05 18:08:47,191 root         DEBUG    csv:   AdminSession=46CE916D0502A641592B105FF7CB3B7
2015-10-05 18:08:47,191 root         DEBUG    csv:  AdminName=admi
2015-10-05 18:08:47,191 root         DEBUG    csv:  ConfigChangeData='RADIUS:Shared Secret'='********'\\\,'TACACS+:Shared Secret'='********'\\\,'IP Address'='127.0.0.91/32
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectType=Network Devic
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectName=testclien
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectId=407
2015-10-05 18:08:47,193 root         DEBUG    csv:  inLocalMode=fals
2015-10-05 18:08:47,193 root         DEBUG    csv:
wvunathans asked Dec 04 '25 18:12


1 Answer

Your regex pattern is consuming the last character before your comma because that character is part of the pattern you're splitting on. It's the character matched by the ugly [^\\\\] bit of the pattern.
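To make the problem concrete, here is a minimal sketch using a shortened, made-up sample in the same shape as the log line from the question (the variable names `sample` and `consumed` are illustrative, not from the original code). It shows that the character class and the comma are removed together:

```python
import re

# Shortened, made-up sample shaped like the log line in the question.
sample = r"AdminName=admin, ConfigChangeData='x'='1'\\\,'y'='2', ObjectId=4072,"

# The pattern [^\\], matches TWO characters: one non-backslash character
# followed by the comma. re.split discards everything the pattern matches,
# so that extra character disappears from the end of each field.
consumed = re.split("[^\\\\],", sample)

print(consumed[0])  # AdminName=admi
print(consumed[2])  #  ObjectId=407
```

The escaped commas inside ConfigChangeData are left alone, as intended, but every other field loses its final character.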

I think you want a negative lookbehind. This lets you check that the preceding character was not a backslash without actually including that character in the match.

csvlist=re.split(r"(?<!\\),", data)

Note that I'm using a raw string, so you only need two backslashes rather than the four you were originally using.
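As a rough end-to-end sketch, the lookbehind split can be combined with the dict-building loop from the question. The helper name `parse_avps` and the shortened sample string are assumptions made for illustration; the real log line is longer but has the same shape.

```python
import re

def parse_avps(data):
    """Hypothetical helper: split on unescaped commas and build a dict."""
    avps = {}
    # (?<!\\) asserts that the comma is not preceded by a backslash,
    # without consuming the preceding character.
    for field in re.split(r"(?<!\\),", data):
        field = field.strip()  # clear out surrounding whitespace
        if "=" in field:
            # Split only on the first "=" so values may contain "=" themselves.
            attribute, value = field.split("=", 1)
            avps[attribute] = value
    return avps

# Shortened, made-up sample shaped like the log line in the question.
sample = r"AdminName=admin, ConfigChangeData='x'='1'\\\,'y'='2', ObjectId=4072,"
result = parse_avps(sample)

print(result["AdminName"])  # admin
print(result["ObjectId"])   # 4072
```

With this pattern the ConfigChangeData value stays in one piece, backslashes included, and the other fields keep their last character.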

Blckknght answered Dec 06 '25 08:12