I have a CSV file like this:
12012;My Name is Mike. What is your's?;3;0
1522;In my opinion: It's cool; or at least not bad;4;0
21427;Hello. I like this feature!;5;1
I want to get this data into a pandas.DataFrame. But read_csv(sep=";") throws exceptions due to the semicolon in the user-generated message column in line 2 (In my opinion: It's cool; or at least not bad). All remaining columns always have numeric dtypes.
What is the most convenient method to manage this?
In Python's csv module, delimiter specifies the character used to separate each field. The default is the comma (','). quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').
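To illustrate how delimiter and quotechar interact, here is a minimal sketch using csv.reader on an in-memory string. The sample line is an assumption: it is the problematic row from the question with the message field already quoted, which is not how the original file looks.

```python
import csv
import io

# Assumed sample: the message field is quoted, so the semicolon inside it
# is treated as text rather than as a field separator.
sample = '1522;"In my opinion: It\'s cool; or at least not bad";4;0\n'

reader = csv.reader(io.StringIO(sample), delimiter=";", quotechar='"')
row = next(reader)

print(len(row))  # number of fields parsed
print(row[1])    # the message, with its inner semicolon intact
```

Without the quotes around the message, the same line would split into five fields instead of four, which is the situation described in the question.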
We can use the pandas Series.str.split() function to split the strings in a column around a given separator/delimiter. It is similar to Python's str.split() method but applies to every element of the column.
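As a rough sketch of how that could apply here (this is not the approach taken in the answer that follows), the raw lines can be held in a Series and split once from the left and twice from the right, so any semicolons inside the message are left alone. It assumes every line has one leading column and two trailing columns, as in the question; the variable names are made up for illustration.

```python
import pandas as pd

# The three raw lines from the question, each held as a single string.
lines = pd.Series([
    "12012;My Name is Mike. What is your's?;3;0",
    "1522;In my opinion: It's cool; or at least not bad;4;0",
    "21427;Hello. I like this feature!;5;1",
])

# Split off the first column: at most one split, counted from the left.
left = lines.str.split(";", n=1, expand=True)

# Split off the last two columns: at most two splits, counted from the right.
right = left[1].str.rsplit(";", n=2, expand=True)

# Reassemble into four columns labelled 0..3.
df = pd.concat([left[0], right], axis=1, ignore_index=True)

# Everything is still a string at this point; convert the numeric columns.
df[[0, 2, 3]] = df[[0, 2, 3]].astype(int)
print(df)
```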
Dealing with unquoted delimiters is always a nuisance. In this case, since it looks like the broken text is known to be surrounded by three correctly-encoded columns, we can recover. TBH, I'd just use the standard Python reader and build a DataFrame once from that:
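import csv
import pandas as pd

with open("semi.dat", "r", newline="") as fp:
    reader = csv.reader(fp, delimiter=";")
    # Keep the first field and the last two; rejoin everything in between
    # with ';' so a message that was split on its own semicolons is restored.
    rows = [x[:1] + [';'.join(x[1:-2])] + x[-2:] for x in reader]

df = pd.DataFrame(rows)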
which produces
       0                                              1  2  3
0  12012               My Name is Mike. What is your's?  3  0
1   1522  In my opinion: It's cool; or at least not bad  4  0
2  21427                    Hello. I like this feature!  5  1
Then we can immediately save it and get something quoted correctly:
In [67]: df.to_csv("fixedsemi.dat", sep=";", header=None, index=False)
In [68]: more fixedsemi.dat
12012;My Name is Mike. What is your's?;3;0
1522;"In my opinion: It's cool; or at least not bad";4;0
21427;Hello. I like this feature!;5;1
In [69]: df2 = pd.read_csv("fixedsemi.dat", sep=";", header=None)
In [70]: df2
Out[70]:
       0                                              1  2  3
0  12012               My Name is Mike. What is your's?  3  0
1   1522  In my opinion: It's cool; or at least not bad  4  0
2  21427                    Hello. I like this feature!  5  1
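One detail worth noting: csv.reader yields every field as a string, so a DataFrame built directly from those rows holds strings in all columns, whereas read_csv infers numeric dtypes when it reads the fixed file back. If the round trip through a file is not wanted, a sketch along these lines might convert the numeric columns in place. The rows are typed in by hand here so the snippet runs on its own, and the column names are hypothetical (the original data has no header).

```python
import pandas as pd

# Rows as csv.reader would deliver them after the rejoin step: all strings.
rows = [
    ["12012", "My Name is Mike. What is your's?", "3", "0"],
    ["1522", "In my opinion: It's cool; or at least not bad", "4", "0"],
    ["21427", "Hello. I like this feature!", "5", "1"],
]

# Hypothetical column names, chosen only for readability.
df = pd.DataFrame(rows, columns=["id", "message", "rating", "flag"])

# Convert the columns known to be numeric; the message column is left alone.
numeric_cols = ["id", "rating", "flag"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.dtypes)
```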