I have a csv file like
id,body,category,subcategory,number,smstype,smsflag
50043,"Dear Customer,Thank you for registering",,,DM-YEBA,inbox,0
30082,Congrats! Your account has been activated.,,,DM-SBAW,inbox,0
when i'm using pd.read_csv()
then the whole first observation is included in the id
column and is not separated among other columns due to the double quotes used for the message body, while in second observation the line is properly separated among the columns.
What should I do such that the first observation is seperated among all columns like in this image
see what actually pd.read_csv is doing. It's taking the whole observation in id column
when i am opening the csv file in notepad it's adding extra quotation marks to the whole row which is eventually causing the fiasco and the quotation mark already in the file are escaped with another ' " ' as shown below.
id,body,category,subcategory,number,smstype,smsflag
"50043,""Dear Customer,Thank you for registering"",,,DM-YEBA,inbox,0"
30082,Congrats! Your account has been activated.,,,DM-SBAW,inbox,0
The main problem lies in the way csv file of microsoft excel is actually saved. When the same csv file is opened in notepad it adds extra quotation marks in the lines which have quotes.
1) It adds quote at the starting and ending of the line.
2) It escapes the existing quotes with one more quote. Hence, when we import our csv file in pandas it takes the whole line as one string and thus it ends up all in the first column.
To tackle this-
I imported the csv file and corrected the csv by applying regex substitution and saved it as text file. Then i imported this text file as pandas dataframe. Problem Solved.
with open('csvdata.csv','r+') as csv_file:
for line in csv_file:
# removing starting and ending quotes of a line
pattern1 = re.compile(r'^"|"$',re.MULTILINE)
line = re.sub(r'^"|"$',"",line)
# substituting escaped quote with a single quote
pattern2 = re.compile(r'""')
line = re.sub(r'""','"',line)
corrected_csv = open("new_csv.txt",'a')
corrected_csv.write(line)
corrected_csv.close()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With