Split complicated strings in Python dynamically

Question

I have been having difficulty writing a function that handles strings the way I want. I have looked through a handful of previous questions (1, 2, 3, among others). Here is the setup: I have well-structured but variable data that needs to be split from a string read from a file into an array of strings. The following are some examples of the data I am dealing with:

('Vdfbr76','gsdf','gsfd','',NULL),
('Vkdfb23l','gsfd','gsfg','ggg@df.gf',NULL),
('4asg0124e','Lead Actor/SFX MUA/Prop designer','John Smith','jsmith@email.com',NULL),
('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),
...
(492,'E1asegaZ1ox','Nysdag_5YmD','145872325372620',1,'long, string, with, commas'),

I want to split these strings on commas; however, commas occasionally appear inside the quoted strings, which causes problems. In addition, developing an accurate re.split(regex, line) is difficult because the number of items in each line changes throughout the file.

Some solutions that I have tried up to this point.

import re

def splitLine(text, fields, delimiter):
    # Start with one lazy group, then add one greedy group per remaining field
    regex_string = "(.*?),"

    for i in range(0, len(fields) - 1):
        regex_string += "(.*)"

        # Put the delimiter between groups, but not after the last one
        if i < len(fields) - 2:
            regex_string += delimiter

    return re.split(regex_string, text)

This will give a result where we have the following output

 regex_string
 return_line

However, the main problem with this is that it occasionally lumps two fields together, in this case the 3rd value in the array:

(.*?),(.*),(.*),(.*),(.*),(.*)
['', '\t(222', "'Vy1asdfnuJkA','Ndfbyz3_YMD'", "'14541242640005471'", '2', "'Hello World!')", '', '\n']

Where the ideal result would look like:

['', '\t(222', "'Vy1asdfnuJkA'", "'Ndfbyz3_YMD'", "'14541242640005471'", '2', "'Hello World!')", '', '\n']

It is a small change, but it has a huge influence on the result. I tried manipulating the regex string to better suit what I was trying to do, but with each case I solved, another broke it unfortunately.
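A likely cause of the lumping is that `.` also matches commas, so a greedy `(.*)` group can swallow a separator and the field after it. The following is a minimal sketch, not part of the original attempt, that matches the fields themselves instead of the separators. The names `FIELD` and `tokenize` are hypothetical, and the sketch assumes that string fields are single-quoted and that unquoted fields are never empty.

```python
import re

# A field is either a single-quoted string (backslash escapes allowed)
# or a run of characters that are not commas, parentheses, or whitespace.
FIELD = re.compile(r"'(?:[^'\\]|\\.)*'|[^,()\s]+")


def tokenize(line):
    # findall returns the matched fields in order; separators, the outer
    # parentheses, and surrounding whitespace are simply skipped.
    return FIELD.findall(line)


line = "\t(222,'Vy1asdfnuJkA','Ndfbyz3_YMD','14541242640005471',2,'Hello World!'),\n"
tokens = tokenize(line)
print(tokens)
```

Because the quoted alternative is tried first, commas inside quotes stay within their field, and the number of fields per line does not need to be known in advance.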

Another approach I played around with came from user Aaron Cronin in this post (4), which looks like this:

def split_at(text, delimiter, opens='<([', closes='>)]', quotes='"\''):
    result = []
    buff = ""
    level = 0
    is_quoted = False

    for char in text:
        if char in delimiter and level == 0 and not is_quoted:
            result.append(buff)
            buff = ""
        else:
            buff += char

            if char in opens:
                level += 1
            if char in closes:
                level -= 1
            if char in quotes:
                is_quoted = not is_quoted

    if not buff == "":
        result.append(buff)

    return result

The results of this look like so:

["\t('Vk3NIasef366l','gsdasdf','gsfasfd','',NULL),\n"]

The main problem is that the line comes back as a single, unsplit string, which puts me in a feedback loop.

The ideal result would look like:

[	('Vk3NIasef366l','gsdasdf','gsfasfd','',NULL),
]
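A likely reason the whole line comes back unsplit is that each row is itself wrapped in parentheses: `(` is in `opens`, so `level` stays at 1 for every comma in the row and nothing is ever split. The following is a minimal sketch, assuming each row has exactly one outer pair of parentheses and an optional trailing comma, that strips the wrapper before calling `split_at`. The helper name `split_row` is hypothetical.

```python
def split_at(text, delimiter, opens='<([', closes='>)]', quotes='"\''):
    """Split text on delimiter, ignoring delimiters inside brackets or quotes."""
    result = []
    buff = ""
    level = 0
    is_quoted = False
    for char in text:
        if char in delimiter and level == 0 and not is_quoted:
            result.append(buff)
            buff = ""
        else:
            buff += char
            if char in opens:
                level += 1
            if char in closes:
                level -= 1
            if char in quotes:
                is_quoted = not is_quoted
    if buff != "":
        result.append(buff)
    return result


def split_row(line):
    # Hypothetical helper: drop surrounding whitespace, the trailing comma,
    # and the outer parentheses so the field commas sit at nesting level 0.
    inner = line.strip().rstrip(",")
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    return split_at(inner, ",")


fields = split_row("\t('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),\n")
print(fields)
```

Note that `split_at` still counts brackets that appear inside quoted fields, so a value containing an unbalanced `(` or an apostrophe would need extra handling.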

Any help is appreciated; I am not sure what the best approach is in this scenario. I am happy to clarify anything that is unclear. I tried to be as complete as possible.

UltraInstinct · Accepted Answer

Use ast's literal_eval!

from ast import literal_eval

s = """('Vdfbr76','gsdf','gsfd','',NULL),
('Vkdfb23l','gsfd','gsfg','ggg@df.gf',NULL),
('4asg0124e','Lead Actor/SFX MUA/Prop designer','John Smith','jsmith@email.com',NULL),
('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),
(492,'E1asegaZ1ox','Nysdag_5YmD','145872325372620',1,'long, string, with, commas'),
"""

for line in s.split("\n"):
    # Drop whitespace and the trailing comma, and turn SQL NULL into Python None
    line = line.strip().rstrip(",").replace("NULL", "None")
    if line:
        print(list(literal_eval(line)))  # list(...) is just an example

Output:

['Vdfbr76', 'gsdf', 'gsfd', '', None]
['Vkdfb23l', 'gsfd', 'gsfg', 'ggg@df.gf', None]
['4asg0124e', 'Lead Actor/SFX MUA/Prop designer', 'John Smith', 'jsmith@email.com', None]
['asdguIux', 'Director, Camera Operator, Editor, VFX', 'John Smith', '', None]
[492, 'E1asegaZ1ox', 'Nysdag_5YmD', '145872325372620', 1, 'long, string, with, commas']
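One caveat with the approach above is that `.replace("NULL", "None")` also rewrites the text NULL when it appears inside a quoted field. The following is a minimal sketch of a variant that only replaces bare NULL tokens. The names `parse_rows` and `_TOKEN` are hypothetical, and the sketch assumes fields are single-quoted with backslash escapes (not doubled quotes) and that every row has at least two fields.

```python
import re
from ast import literal_eval

# Matches either a whole single-quoted string or a bare NULL token.
# Quoted strings are consumed first, so NULL inside quotes is left alone.
_TOKEN = re.compile(r"'(?:[^'\\]|\\.)*'|\bNULL\b")


def _null_to_none(match):
    token = match.group(0)
    return "None" if token == "NULL" else token


def parse_rows(text):
    rows = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line:
            continue  # skip blank lines
        line = _TOKEN.sub(_null_to_none, line)
        rows.append(list(literal_eval(line)))
    return rows


sample = """('Vdfbr76','gsdf','gsfd','',NULL),
('NULL is just text here','a, b',NULL),
(492,'E1asegaZ1ox',1,'long, string, with, commas'),
"""
rows = parse_rows(sample)
for row in rows:
    print(row)
```

`literal_eval` only accepts Python literals, so a malformed line raises an exception rather than being evaluated as code.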

Tags: python, string, regex, split

msleevi
