 

Python - Split a row into columns - csv data

I am trying to read data from a CSV file and split each row into its respective columns.

But my regex fails when a particular column contains commas within itself.

For example: a,b,c,"d,e, g,",f

I want a result like:

a    b    c    "d,e, g,"    f  

which is 5 columns.

Here is the regex I am using to split the string by commas:

,(?=(?:"[^"]?(?:[^"])*))|,(?=[^"]+(?:,)|,+|$)

but it fails for a few strings while it works for others.

All I am looking for is this: when I read data from a CSV file into a DataFrame/RDD using PySpark, I want to load and preserve all the columns without any mistakes.

Thank you.

Alekhya Vemavarapu asked Dec 04 '25 21:12


2 Answers

This is much easier with the newer third-party regex module (install it with pip install regex):

import regex as re

string = 'a,b,c,"d,e, g,",f'
rx = re.compile(r'"[^"]*"(*SKIP)(*FAIL)|,')

parts = rx.split(string)
print(parts)
# ['a', 'b', 'c', '"d,e, g,"', 'f']

The regex module supports the (*SKIP)(*FAIL) verbs: a quoted section such as "d,e, g," is matched first and then discarded, so the commas between double quotes are never used as split points.
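To make the skip-the-quoted-part idea concrete, here is a minimal plain-Python sketch (no regex module needed) that performs the same split by tracking whether the scan is currently inside double quotes. The function name split_outside_quotes is hypothetical, and the sketch assumes quotes are balanced and never escaped.

```python
from __future__ import annotations


def split_outside_quotes(line: str, sep: str = ",") -> list[str]:
    """Split `line` on `sep`, ignoring separators inside double quotes.

    Quotes are kept in the fields, like the regex split above.
    Assumes quotes are balanced and never escaped.
    """
    fields: list[str] = []
    current: list[str] = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # every quote opens or closes a quoted run
            current.append(ch)
        elif ch == sep and not in_quotes:
            fields.append("".join(current))  # separator outside quotes ends a field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # the last field has no trailing separator
    return fields


parts = split_outside_quotes('a,b,c,"d,e, g,",f')
print(parts)  # ['a', 'b', 'c', '"d,e, g,"', 'f']
```

A single pass with one boolean flag is easy to reason about, but it does not handle escaped quotes; for real CSV data the csv module is the safer choice.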


If you have escaped double quotes, you could use:
import regex as re

# raw string, so the backslash before the quote is actually kept in the data
string = r'''a,b,c,"d,e, g,",f, this, one, with "escaped \"double",quotes:""'''
rx = re.compile(r'".*?(?<!\\)"(*SKIP)(*FAIL)|,')
parts = rx.split(string)
print(parts)
# ['a', 'b', 'c', '"d,e, g,"', 'f', ' this', ' one', ' with "escaped \\"double"', 'quotes:""']

See a demo for the latter on regex101.com.


Given the nearly 50 points on offer, I feel I should provide the csv module approach as well:
import csv
string = '''a,b,c,"d,e, g,",f, this, one, with "escaped \"double",quotes:""'''

# just make up an iterable, normally a file would go here
for row in csv.reader([string]):
    print(row)
    # ['a', 'b', 'c', 'd,e, g,', 'f', ' this', ' one', ' with "escaped "double"', 'quotes:""']
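How the csv module treats a quote inside a quoted field depends on its dialect parameters. A minimal sketch, assuming your data uses either the standard doubled-quote convention or backslash escapes: the default dialect already reads "" inside a quoted field as one literal quote, while backslash-escaped quotes need escapechar. io.StringIO stands in for a real file here.

```python
import csv
import io

# Standard CSV convention: a literal quote inside a quoted field is doubled.
doubled = 'a,"say ""hi"", ok",b\n'
row_doubled = next(csv.reader(io.StringIO(doubled)))
print(row_doubled)  # ['a', 'say "hi", ok', 'b']

# Backslash-escaped quotes: tell the reader which escape character is used.
backslashed = 'a,"d,e \\"q\\" g",f\n'
row_backslashed = next(csv.reader(io.StringIO(backslashed), escapechar="\\"))
print(row_backslashed)  # ['a', 'd,e "q" g', 'f']
```

Note that, unlike the regex splits, csv.reader removes the surrounding quotes from quoted fields.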
Jan answered Dec 06 '25 11:12


Try this regex:

\,(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)

I used this answer, which explains how to match everything that is not inside quotes while ignoring escaped quotes, and tested the pattern on http://regexr.com/.
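If you want to use this lookahead with Python's built-in re.split, keep in mind that re.split also returns the text of any capturing groups in the pattern, and this pattern contains several. A sketch that rewrites them as non-capturing (?:...) groups (SPLIT_RX is just an illustrative name; the leading comma no longer needs a backslash):

```python
import re

# Same lookahead, but every group is non-capturing (?:...), so re.split
# returns only the fields and not the text captured inside the lookahead.
SPLIT_RX = re.compile(r',(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)')

parts = SPLIT_RX.split('a,b,c,"d,e, g,",f')
print(parts)  # ['a', 'b', 'c', '"d,e, g,"', 'f']

# A backslash-escaped quote inside a quoted field does not end the field.
escaped = SPLIT_RX.split(r'a,"x\",y",z')
print(escaped)  # ['a', '"x\\",y"', 'z']
```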

Note that, as other answers to your question state, there are better ways to parse CSV than using a regex.
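For the RDD case in the question, one possible approach (a sketch, assuming every CSV record sits on a single line) relies on csv.reader accepting any iterable of strings, so it can be handed whole partitions of lines, for example rdd.mapPartitions(parse_csv_lines). parse_csv_lines is a hypothetical helper, shown here on a plain list so it runs without Spark. For DataFrames, Spark's built-in CSV reader (spark.read.csv) already treats double-quoted fields as single values.

```python
import csv
from typing import Iterable, Iterator, List


def parse_csv_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Parse an iterable of CSV text lines with the stdlib csv module.

    Hypothetical helper: in PySpark it could be passed to
    rdd.mapPartitions(parse_csv_lines) (untested assumption).
    """
    return csv.reader(lines)


rows = list(parse_csv_lines(['a,b,c,"d,e, g,",f', '1,2,3,"4,5",6']))
print(rows)  # [['a', 'b', 'c', 'd,e, g,', 'f'], ['1', '2', '3', '4,5', '6']]
```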

Erwin Rooijakkers answered Dec 06 '25 10:12


