Python: open a CSV document that has different types of separators

Question

I have a txt document with the following structure:

1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537

The first number, followed by a colon, is the index, and I don't know how to specify it when I open the file. In fact, I would like to remove it. The rest of the data is separated by commas and semicolons, and I would like each number in its own column, regardless of whether the separator is a comma or a semicolon. How can I do this?

Dani Mesejo · Accepted Answer

Use pd.read_csv with a regular-expression separator to load the file:

import pandas as pd

df = pd.read_csv("data.csv",  # path to the file; change it to your filename
                 sep="[,;:]",  # regular expression: split on comma, semicolon or colon
                 engine="python",  # the Python engine is required for a regex separator
                 usecols=range(1, 7),  # keep columns 1-6, dropping column 0 (the index)
                 header=None  # the file has no header row
                 )
print(df)

Output

         1        2        3        4        5        6
0  0.84722  0.52855  0.65268  0.24792  0.66525  0.46562
1  0.84722  0.52855  0.65231  0.24513  0.66482  0.46548
2  0.84722  0.52855  0.65197  0.24387  0.66467  0.46537
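If you would rather not depend on pandas, the same split can be sketched with the standard library alone. This is an alternative, not part of the answer: re.split applies the same [,;:] pattern to each line and the first field (the index) is discarded. The helper name parse_rows is hypothetical, and io.StringIO only stands in for the file so the snippet runs on its own; with a real file you would pass the open file object instead.

```python
import io
import re

# Sample rows from the question; io.StringIO stands in for the real file
SAMPLE = """1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537
"""


def parse_rows(lines):
    """Hypothetical helper: split each line on ',', ';' or ':' and drop the index."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r"[,;:]", line)
        # fields[0] is the leading index, so keep only the numbers after it
        rows.append([float(x) for x in fields[1:]])
    return rows


rows = parse_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

With a real file this becomes `with open("data.csv") as f: rows = parse_rows(f)`.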

Note
Once the file is loaded, I advise saving it as a proper CSV file (using to_csv).
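A minimal sketch of that round trip, assuming in-memory buffers (io.StringIO) in place of real files; with files you would pass paths instead, for example a hypothetical "data_clean.csv". index=False keeps pandas' own row numbers out of the output, and the cleaned file then loads with plain read_csv defaults.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for the original file
SAMPLE = """1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537
"""

# Load the mixed-separator file as in the answer
df = pd.read_csv(io.StringIO(SAMPLE), sep="[,;:]", engine="python",
                 usecols=range(1, 7), header=None)

# Write a plain comma-separated copy (header "1,...,6", no row-index column)
buf = io.StringIO()  # replace with a path such as "data_clean.csv" (hypothetical)
df.to_csv(buf, index=False)

# The clean copy reads back with the default separator and default C engine
buf.seek(0)
clean = pd.read_csv(buf)
print(clean)
```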

ojdo · Answer

As you are already using pandas.read_csv, have a look at its documentation for the sep argument:

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

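So in your case, simply calling pandas.read_csv(..., sep='[,;:]') should do the trick. Because the colon is also treated as a separator, the index ends up in its own first column, which you can then drop.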
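A sketch of this approach that removes the index after loading instead of using usecols. Two assumptions worth stating: header=None is added because the file has no header row (otherwise the first data row would become the column names), and engine="python" is passed explicitly so pandas does not emit a ParserWarning about falling back from the C engine. io.StringIO again stands in for the file.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for the real file
SAMPLE = """1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537
"""

# Split on comma, semicolon or colon; no header row in the file
df = pd.read_csv(io.StringIO(SAMPLE), sep="[,;:]", engine="python", header=None)

# Column 0 holds the 1, 2, 3 indices: drop it and renumber the rest from 0
df = df.drop(columns=0)
df.columns = range(df.shape[1])
print(df)
```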

Tags: python, pandas, csv

Cristina Dominguez Fernandez
