Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python pandas read_excel: sep parameter available?

Tags:

I'm trying to read a .xlsx into a pandas dataframe using
pd.read_excel("C:/...").

The problem is I only get one column which include all the data seperated by ",".

|---| "Country","Year","Export" |  
|---|---------------------------|  
| 0 | Canada,2017,3002          |  
| 1 | Bulgaria,2016,3960        |  
| 2 | Germany,2015,3818         |

But this is not the format that I want... I'd like to get three columns like the table below.

|---| "Country"    | "Year"   | "Export"   |  
|---|--------------|----------| -----------|  
|1  | Canada       | 2017     |       3002 |  
|2  | Bulgaria     | 2016     |       3960 |  
|3  | Germany      | 2015     |       3818 |

So i am looking for the sep=',' or delimiter=',' parameter as contained in pd.read_csv. I allready worked through the documentation of pandas.read_excel but have not found a parameter which handle this problem...

Thanks!

like image 659
stomar02 Avatar asked Apr 29 '18 21:04

stomar02


2 Answers

One option is to save your .xlsx as a csv file. If you open this in a text editor you should see the annoying column is saved within quotes, but has values separated with commas like:

"Country,Year,Export",...  
"Canada,2017,3002",...
"Bulgaria,2016,3960",...        
"Germany,2015,3818",... 

Then you can read this file with pd.read_csv() and it will create a column named 'Country,Year,Export' which looks like

  Country,Year,Export
0    Canada,2017,3002
1  Bulgaria,2016,3960
2   Germany,2015,3818

You can then split this into separate columns with str.split()

df[['Country', 'Year', 'Export']] = pd.DataFrame(df['Country,Year,Export'].str.split(',').tolist())

  Country,Year,Export   Country  Year Export
0    Canada,2017,3002    Canada  2017   3002
1  Bulgaria,2016,3960  Bulgaria  2016   3960
2   Germany,2015,3818   Germany  2015   3818
like image 184
ALollz Avatar answered Sep 28 '22 18:09

ALollz


Your Excel file might contain all its data in a single Excel column as comma-separated strings. read_excel doesn't have a sep= parameter, since it's designed to parse Excel files that don't use user-visible characters to demarcate columns.

You can either fix the Excel file, or fix the malformed DataFrame (named df) like this:

df2 = df.iloc[:, 0].str.split(',', expand=True)
df2.columns = [n.replace('"', '') for n in df.columns.str.split(',')[0]]
like image 37
Peter Leimbigler Avatar answered Sep 28 '22 18:09

Peter Leimbigler