
Python parsing CSV string with possible sets

I have a CSV string in which some items may be enclosed in {} and contain commas themselves. I want to collect the string values in a list.

What is the most pythonic way to collect the values in a list?

Example 1: 'a,b,c', expected output ['a', 'b', 'c']

Example 2: '{aa,ab}, b, c', expected output ['{aa,ab}','b','c']

Example 3: '{aa,ab}, {bb,b}, c', expected output ['{aa,ab}', '{bb,b}', 'c']

I tried s.split(','). It works for example 1 but breaks examples 2 and 3, because it also splits on the commas inside the braces.

I believe this question (How to split but ignore separators in quoted strings, in python?) is very similar to my problem, but I can't figure out the proper regex syntax to use.

rph asked Jan 25 '26 10:01


2 Answers

The solution is very similar in fact:

import re
PATTERN = re.compile(r'''\s*((?:[^,{]|\{[^{]*\})+)\s*''')
data = '{aa,ab}, {bb,b}, c'
print(PATTERN.split(data)[1::2])

will give:

['{aa,ab}', '{bb,b}', 'c']
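
To see why the [1::2] slice is needed, it can help to print the full list that split returns. Because the pattern contains a capturing group, re.split interleaves the captured fields with the text left between matches (here, the commas), so the fields sit at the odd indices. A small sketch reusing the pattern above; the helper name split_fields is only an illustration, not part of the answer:

```python
import re

# Same pattern as in the answer: a field is a run of non-comma, non-brace
# characters and/or complete {...} groups, with surrounding whitespace skipped.
PATTERN = re.compile(r'''\s*((?:[^,{]|\{[^{]*\})+)\s*''')


def split_fields(data):
    """Hypothetical helper: keep only the captured fields (odd indices)."""
    return PATTERN.split(data)[1::2]


# The raw split result alternates: leftover text, field, leftover text, ...
raw = PATTERN.split('{aa,ab}, {bb,b}, c')
print(raw)  # ['', '{aa,ab}', ',', '{bb,b}', ',', 'c', '']

for example in ['a,b,c', '{aa,ab}, b, c', '{aa,ab}, {bb,b}, c']:
    print(split_fields(example))
```

The even indices hold whatever the pattern did not consume, which for well-formed input is just the separating commas.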
Marco Pantaleoni answered Jan 27 '26 01:01


A more readable way (at least to me) is to describe what you are looking for: either something between braces { } or a run of word characters (letters, digits, and underscores):

import re 

examples = [
  'a,b,c',
  '{aa,ab}, b, c',
  '{aa,ab}, {bb,b}, c'
]

for example in examples:
  print(re.findall(r'(\{.+?\}|\w+)', example))

It prints

['a', 'b', 'c']
['{aa,ab}', 'b', 'c']
['{aa,ab}', '{bb,b}', 'c']
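
One caveat with the pattern above: the \w+ branch only matches word characters, so a value such as foo-bar would come back as two items. If that matters, a regex-free alternative is to scan the string and split only on commas that are outside braces. This is a different technique from both answers, sketched under the assumption that braces are balanced; the function name split_outside_braces is made up for illustration:

```python
def split_outside_braces(text):
    """Split on commas that are not inside {...}, stripping whitespace.

    Assumes braces are balanced; depth counts how many are currently open.
    """
    fields = []
    current = []
    depth = 0
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current field.
            fields.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush the last field, which has no trailing comma.
    fields.append(''.join(current).strip())
    return fields


for example in ['a,b,c', '{aa,ab}, b, c', '{aa,ab}, {bb,b}, c']:
    print(split_outside_braces(example))
```

Since it does not restrict which characters a value may contain, an input like 'foo-bar, {x,y}' keeps foo-bar as a single item.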
Guybrush answered Jan 26 '26 23:01


