Pandas escape carriage return in to_csv

Tags: python, pandas

I have a string column that sometimes has carriage returns in the string:

import pandas as pd
from io import StringIO

datastring = StringIO("""\
country  metric           2011   2012
USA      GDP              7      4
USA      Pop.             2      3
GB       GDP              8      7
""")
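# raw string avoids an invalid-escape warning; a regex separator needs the python engine
df = pd.read_table(datastring, sep=r'\s\s+', engine='python')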
df.metric = df.metric + '\r'  # append carriage return

print(df)
  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7

When I write the dataframe to CSV and read it back, it gets corrupted:

df.to_csv('data.csv', index=False)

print(pd.read_csv('data.csv'))
  country metric  2011  2012
0     USA    GDP   NaN   NaN
1     NaN      7     4   NaN
2     USA   Pop.   NaN   NaN
3     NaN      2     3   NaN
4      GB    GDP   NaN   NaN
5     NaN      8     7   NaN
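
To see where the extra rows come from, it can help to look at the raw text that to_csv produces. The sketch below rebuilds the example frame from a dict (an assumption made so the snippet is self-contained) and prints the CSV text with repr() so control characters are visible. Depending on the platform and Python version, the '\r' may be written without quotes, and in that case the default read_csv parser treats it as a row boundary.

```python
import pandas as pd

# Same data as in the question, built directly so the snippet runs on its own.
df = pd.DataFrame({
    "country": ["USA", "USA", "GB"],
    "metric": ["GDP\r", "Pop.\r", "GDP\r"],
    "2011": [7, 2, 8],
    "2012": [4, 3, 7],
})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# repr() shows '\r' and '\n' explicitly instead of rendering them.
print(repr(csv_text))
```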

Question

What's the best way to fix this? The one obvious method is to just clean the data first:

df.metric = df.metric.str.replace('\r', '')
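
If more than one column might be affected, the same cleanup can be applied to every string column before writing. This is only a sketch: strip_carriage_returns is a hypothetical helper name, the frame is rebuilt from a dict so the snippet is self-contained, and it assumes each text column holds only strings (no missing values).

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    "country": ["USA", "USA", "GB"],
    "metric": ["GDP\r", "Pop.\r", "GDP\r"],
    "2011": [7, 2, 8],
    "2012": [4, 3, 7],
})


def strip_carriage_returns(frame):
    """Return a copy of frame with carriage returns removed from string columns."""
    cleaned = frame.copy()
    for col in cleaned.columns:
        # Only touch columns that pandas recognises as holding strings.
        if pd.api.types.is_string_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].str.replace("\r", "", regex=False)
    return cleaned


cleaned = strip_carriage_returns(df)

# Round-trip through an in-memory buffer instead of a file on disk.
buffer = StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

The downside is that the carriage returns are gone for good, so this only fits if they carry no meaning in the data.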

Kamil Sindi asked Dec 31 '15 18:12


1 Answer

Specify the lineterminator when reading, so that only '\n' ends a row:

print(pd.read_csv('data.csv', lineterminator='\n'))

  country  metric  2011  2012
0     USA   GDP\r     7     4
1     USA  Pop.\r     2     3
2      GB   GDP\r     8     7

UPDATE:

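A note on spelling (the original answer is from 2015): read_csv takes lineterminator, with no underscore. The underscore spelling, line_terminator, belonged to DataFrame.to_csv, where it was renamed to lineterminator in pandas 1.5.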
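
For a round trip that does not depend on the platform default, here is a rough sketch of two options, assuming pandas 1.5 or later (for the lineterminator keyword in to_csv) and using an in-memory buffer instead of data.csv. The first pins the row terminator on both the write and the read. The second, which is not part of the answer above, quotes non-numeric fields on write so the embedded '\r' sits inside quotes and a default read_csv keeps it.

```python
import csv
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    "country": ["USA", "USA", "GB"],
    "metric": ["GDP\r", "Pop.\r", "GDP\r"],
    "2011": [7, 2, 8],
    "2012": [4, 3, 7],
})

# Option 1: use '\n' as the row terminator when writing and when reading.
buf = StringIO()
df.to_csv(buf, index=False, lineterminator="\n")
buf.seek(0)
via_terminator = pd.read_csv(buf, lineterminator="\n")

# Option 2: quote every non-numeric field, then read with default settings.
buf = StringIO()
df.to_csv(buf, index=False, quoting=csv.QUOTE_NONNUMERIC)
buf.seek(0)
via_quoting = pd.read_csv(buf)

# repr() makes the preserved carriage returns visible.
print(repr(via_terminator["metric"].tolist()))
print(repr(via_quoting["metric"].tolist()))
```

Option 2 has the advantage that other CSV readers which honour quoting should also keep the field intact, at the cost of a slightly larger file.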

Mike Müller answered Sep 30 '22 18:09