I have a txt document with the following structure:
1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537
The first number, followed by a colon, is the index, and I don't know how to indicate that when I open the file. In fact, I would like to drop it. The remaining data is separated by commas and semicolons, and I would like each number in its own column, regardless of whether the separator is a comma or a semicolon. How can I do this?
Use the following to load the csv using pd.read_csv:
import pandas as pd

df = pd.read_csv(
    "data.csv",           # the file path, change it to your filename
    sep="[,;:]",          # regular expression: split on comma, semicolon, or colon
    engine="python",      # the Python engine is required for a regex separator
    usecols=range(1, 7),  # keep columns 1 to 6, dropping the index in column 0
    header=None,          # the file has no header row
)
print(df)
Output
         1        2        3        4        5        6
0  0.84722  0.52855  0.65268  0.24792  0.66525  0.46562
1  0.84722  0.52855  0.65231  0.24513  0.66482  0.46548
2  0.84722  0.52855  0.65197  0.24387  0.66467  0.46537
Note
Once you have loaded the file, I advise saving it (using to_csv) as a proper csv file.
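As a minimal sketch of that save step: the snippet below parses two of the sample rows from the question and writes them back out with a single comma separator. It uses in-memory buffers so that it is self-contained. With a real file you would pass paths instead, for example a hypothetical "clean.csv" as the first argument to to_csv.

```python
import io

import pandas as pd

# Two sample rows from the question, held in memory instead of a file.
raw = (
    "1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562\n"
    "2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="[,;:]",
    engine="python",
    usecols=range(1, 7),
    header=None,
)

# Write a regular comma-separated file, without the pandas index
# and without the numeric column labels.
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
clean_csv = buf.getvalue()
print(clean_csv)
```

Dropping index and header here is a choice, not a requirement. If you prefer named columns, assign df.columns before saving and leave header at its default.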
As you are already using pandas.read_csv, simply have a look at its documentation for the sep argument:
Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
So in your case, simply calling pandas.read_csv(..., sep='[,;:]') should do the trick.
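To see why the character class [,;:] works as a separator, here is a small sketch that applies the same pattern to one line from the question with the standard re module. This only illustrates the splitting behaviour. It is not a claim about how pandas implements it internally.

```python
import re

line = "1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562"

# Split wherever a comma, semicolon, or colon occurs.
fields = re.split(r"[,;:]", line)

# fields[0] is the index before the colon; the rest are the data values.
values = [float(x) for x in fields[1:]]

print(fields)
print(values)
```

The line splits into seven fields, which is why the first answer keeps columns 1 to 6 with usecols=range(1, 7) to discard the index.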